Extra: Intro to vector

WUR Geoscripting WUR logo

Extra: Intro to vector

Learning objectives

  • Create point, line and polygon objects from scratch
  • Explore the structure of sf classes for spatial vector data
  • Plot spatial vector data
  • Transform between datums and map projections
  • Write and read spatial vector formats (e.g. KML, GML, GeoJSON, shapefile)
  • Apply basic operations on vector data, such as buffering, intersection and area calculation

Vector R basics

Working with spatial vector data in R

In this tutorial we will use the sf package. The sf package focuses solely on vector data. It provides a standardized encoding of vector data and uses GDAL to read and write data, GEOS for geometrical operations and Proj for projection conversions and datum transformations.

The possiblities are huge. During the course (and also in this extra tutorial) we can only scratch the surface with some essentials, which hopefully invite you to experiment further and use them in your research. Details can be found in the book Applied Spatial Data Analysis with R and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio.

Creating and manipulating geometries

The package sf has geometry types, such as point (for one point), linestring (for one line) and polygon (for one polygon). For sets of spatial classes it has the multipoint (multiple points), multilinestring, multipolygon and geometrycollection. The geometrycollection data class allows to have mixed geometries in one object. The package sf has even more special geometry types (check here). The most common geometry types for spatial vector data are shown below:

Overview of sf package geometry types.
Geometry type description
point POINT single point ; zero-dimensional geometry
points MULTIPOINT set of points
line LINESTRING sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry
lines MULTILINESTRING set of linestrings
rings POLYGON sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring; geometry with a positive area (two-dimensional);
rings MULTIPOLYGON set of polygons
mix GEOMETRYCOLLECTION set of geometries of any type except GEOMETRYCOLLECTION

Simple feature objects are stored in data classes. Three data classes are used in sf to represent features:

  • simple feature geometry (sfg): the feature geometry of an individual simple feature;
  • simple feature collections (sfc): a list or collection of simple feature geometries;
  • simple features (sf): simple feature collections linked to a data.frame with feature attributes;

We will go through a few examples of creating geometries from scratch to familiarize yourself with these geometry types and data classes.

First, go to Google Maps on your computer, click on a location (e.g. a building or road), look at the bottom of the banner that pops up and copy the longitude and latitude of two points in Wageningen that are relevant to you. Use a decimal degree notation with at least 4 digits after the decimal point.

Question 1: In what coordinate reference system [CRS] are your points from Google Maps? What is the EPSG belonging to that CRS?

Points

The example below shows how you can create spatial point objects from these coordinates. Type ?function name (e.g. ?st_point) for finding help on the functions used.

# Check for the sf package and install if missing
if(!"sf" %in% installed.packages()){install.packages("sf")}

# Load the sf package
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
# Create two simple feature geometry points
# Get coordinates of two points identified in Google Maps, for example:
point1_sfg <- st_point(c(5.664190, 51.993843))  # enter your own lat/long
point2_sfg <- st_point(c(5.673310, 51.982752))   # enter your own lat/long

# Combine point features into simple feature collection with CRS: mypoints_sfc
mypoints_sfc <- st_sfc(point1_sfg, point2_sfg, crs = 4326)
# Inspect the geometry types and data class of your objects
point1_sfg
class(point1_sfg)

mypoints_sfc
class(mypoints_sfc)
# Create and display some attribute data and store in a data frame
mydata <- data.frame(cbind(
    id = c(1, 2),
    name = c("my first point","my second point")))
mydata
##   id            name
## 1  1  my first point
## 2  2 my second point
# Combine simple feature collection and dataframe to create simple feature: mypoints_sf
mypoints_sf <- st_set_geometry(mydata, mypoints_sfc)
mypoints_sf
class(mypoints_sf) # notice how simple features are two classes
class(mypoints_sf) == "data.frame"

Question 2: What is the the difference between the objects mypoints_sfc and mypoints_sf?

# When plotting a simple feature it makes a plot for every column
# Plot only the name column
plot(mypoints_sf[, "name"], col = c("orange", "blue"))

Lines

Now let us connect the two points by a straight line. Consult how to make a line: ?st_linestring.

## Now make the line by creating a matrix and linestring
points_matrix <- rbind(point1_sfg, point2_sfg) # create matrix
myline_sfg <- st_linestring(points_matrix) # create sfg
myline_sfc <- st_sfc(myline_sfg, crs = 4326) # create sfc
# Check the class and structure of the objects again.
class(points_matrix)
class(myline_sfg)
class(myline_sfc)
myline_sfc
# Add feature attributes by giving the line an id and a name
myline_df <- data.frame(id = 1, name = "Straight line") # create data.frame
myline_sf <- st_set_geometry(myline_df, myline_sfc) # create sf
# Check class and geometry type
class(myline_df)
class(myline_sf)
myline_sf
plot(myline_sf[1], main = myline_sf$name, col = "red", lwd = 3, lty = 5, graticule = TRUE, axes = TRUE)

Try to understand the above code and its results by studying help: ?plot_sf

Try to add the points together with the lines on the same map.

Writing and reading spatial vector data

What now follows is a brief intermezzo before we continue with the classes for polygons. We will use the OGR functionality of GDAL available through the package sf.

# Create subdirectory data within current working directory
dir.create("data", showWarnings = FALSE)

# Write to KML
st_write(obj = mypoints_sf, dsn = "data/mypoints.kml")
st_write(obj = myline_sf, dsn = "data/myline.kml")

The function st_write requires entries for the arguments obj (object of class sf) and dsn (data source name). The function detects what driver it needs based on the file extension of the dsn. Please study details in the help file: ?st_write. Similarly the function st_read allows reading OGR compatible data into a suitable simple feature object.

Check (in Google Maps) whether the attribute data were written correctly to the KML output (use My Maps from Google). You need to be logged in on your Google account, go to My Maps, click get started, click the + button to create a new digital map. On the untitled layer click import, import the mypoints.kml, add a new layer and import myline.kml. Cool! You just created a webmap with your point and line features.

Now let us see if we can create a line in the webmap and import it to R. Digitize a route (e.g. a walking route) between two points of interest in Wageningen: click on the draw a line symbol (between the place marker and the direction symbol), click Add walking route, draw your route in Wageningen and name the new layer myroute.

Click on the main menu in the left panel (top … symbol), click export to KML, check box export to .KML file, select your route layer and download the kml (check here how to export). Save the KML in the data directory under the name myroute.kml. We will read this file into a simple features object.

## Read the KML as a simple feature
myroute <- st_read(dsn = "data/myroute.kml")
## Reading layer `route' from data source 
##   `C:\Users\marti\OneDrive - Wageningen University & Research\Bureaublad\Geoscripting 2023\lesson_extra1\data\myroute.kml' 
##   using driver `KML'
## Simple feature collection with 3 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XYZ
## Bounding box:  xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84
# What is the geometry type of this object?
myroute 
## Simple feature collection with 3 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XYZ
## Bounding box:  xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84
##                                                                                         Name
## 1 Directions from Churchillweg 141-185, 6708 Wageningen, Netherlands to Bloemhof, Wageningen
## 2                                         Churchillweg 141-185, 6708 Wageningen, Netherlands
## 3                                                                       Bloemhof, Wageningen
##   Description                       geometry
## 1             LINESTRING Z (5.67109 51.97...
## 2               POINT Z (5.671093 51.9777 0)
## 3              POINT Z (5.665383 51.97358 0)
# Select name column only
names(myroute)
## [1] "Name"        "Description" "geometry"
myroute = myroute[, "Name"] 

# What geometry types do you see?
plot(myroute) 

Now we have the data, but some of our friends want these exact data too. One friend of ours is a software engineer and he wants a GeoJSON. Another friend is a GIS-analyst in QGIS and as a backup he wants the file in two formats: Geographic Markup Language (GML) and a Shapefile. These fileformats (KML, GeoJSON, GML and Shapefile) are commonly used in spatial analysis. Let’s try to give them the files in those formats!

# Try to export the simple feature to a GeoJSON. What happens?
try(st_write(myroute, dsn = "data/route.geojson"))

Okay success! Let us continue with the GML.

# Try to export the simple feature to a GML. What happens?
try(st_write(myroute, dsn = "data/route.gml"))

Works! Now transform to a Shapefile.

## Again export to a Shapefile. What happens now?
try(st_write(myroute, dsn = "data/route.shp"))

Writing the myroute object to a Shapefile worked only for the linestring, but did not work for the points. Shapefiles do not handle multiple geometry types well. We can solve this! Have a look at ?st_is.

# Select linestring route
myroute_line <- myroute[st_is(myroute, type = "LINESTRING"),]

# Select start and end point route
myroute_points <- myroute[st_is(myroute, type = "POINT"),]
# Let us try to write the simple features to a Shapefile
try(st_write(myroute_line, dsn = "data/route_line.shp"))
try(st_write(myroute_points, dsn = "data/route_points.shp"))

We run into another issue; Shapefiles do not support 3D line strings and points. We have fixed the geometry type, but now need to work on the dimensions of the data. Let us look into dimensions.

Dimensions

All geometries are composed of points. Points are coordinates in a 2-, 3- or 4-dimensional space. All points in a geometry have the same dimensionality. In addition to X and Y coordinates, there are two optional additional dimensions:

  • a Z coordinate, denoting altitude
  • a M coordinate (rarely used), denoting a measure that is associated with the point; examples could be time of measurement, or measurement error of the coordinates

The four possible cases are:

  1. two-dimensional points XY
  2. three-dimensional points XYZ
  3. three-dimensional points XYM
  4. four-dimensional points XYZM

So we need to find a function that changes dimensions from XYZ to XY. Feel free to script your own function that can change the dimensions. Otherwise look into ?st_zm.

## Convert XYZ dimension to XY dimension
myroute_points_xy <- st_zm(myroute_points)
myroute_line_xy <- st_zm(myroute_line)

## Check your geometry type and dimension. Are they point/linestring and XY?
myroute_points_xy
## Simple feature collection with 2 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 5.665383 ymin: 51.97358 xmax: 5.671093 ymax: 51.9777
## Geodetic CRS:  WGS 84
##                                                 Name                  geometry
## 2 Churchillweg 141-185, 6708 Wageningen, Netherlands  POINT (5.671093 51.9777)
## 3                               Bloemhof, Wageningen POINT (5.665383 51.97358)
myroute_line_xy
## Simple feature collection with 1 feature and 1 field
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## Geodetic CRS:  WGS 84
##                                                                                         Name
## 1 Directions from Churchillweg 141-185, 6708 Wageningen, Netherlands to Bloemhof, Wageningen
##                         geometry
## 1 LINESTRING (5.67109 51.9777...
# Try to export the simple feature to a Shapefile
try(st_write(myroute_points_xy, dsn = "data/route_points.shp"))

# Overwrite your previous Shapefile by adding delete_layer = TRUE
try((st_write(myroute_line_xy, dsn = "data/route.shp", delete_layer = TRUE)))

Hooray, it worked! Now our friends have the data they wanted. Try to understand the above code and results. Feel free to display the data in QGIS.

Transformation of coordinate system

Transformations between coordinate systems are crucial to many GIS applications. The Keyhole Markup Language (KML) used by Google Earth uses latitude and longitude in a polar WGS84 coordinate system (i.e. geographic coordinates). However, in some of the examples below we will use metric distances (i.e. projected coordinates).There are two types of coordinate systems that you need to recognise: projected coordinate systems and geographic coordinate systems.

One of the challenges of working with geo-spatial data is that geodetic locations (points on the Earth surface) are mapped into a two-dimensional cartesian plane using a cartographic projection. Projected coordinates are coordinates that refer to a point on a two-dimensional map that represents the surface of the Earth (i.e. projected coordinate system). Latitude and longitude values are an example of an unprojected coordinate system. These are coordinates that directly refer to a point on the Earth’s surface. One way to deal with this is by transforming the data to a planar coordinate system. In R this can be achieved via bindings to the PROJ - Cartographic Projections Library (https://proj.org/), which are available in sf.

Central to spatial data in the sf package is that they have a coordinate reference system, which is coded as a numeric epsg code and a character string describing the projection by the PROJ projection library. Operations on different spatial data sets need to have compatible coordinate reference systems (i.e., identical). An interface to the PROJ library is available through the sf package.

We will transform our spatial data to the Dutch grid (Rijksdriehoekstelsel), often referred to as RD or RD_New. Please note that:

## Define CRS object for RD projection
# See how the transformation from unprojected WGS84 is defined to a projected coordinate system
prj_string_RD <- st_crs("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.2369,50.0087,465.658,-0.406857330322398,0.350732676542563,-1.8703473836068,4.0812 +units=m +no_defs")

# Perform the coordinate transformation from WGS84 to RD
myroute_rd_string <- st_transform(myroute_line_xy, crs = prj_string_RD)

# Instead of having to define the projection ourselves, we can make use of the projection library by using an EPSG code
myroute_rd_num <- st_transform(myroute_line_xy, crs = 28992)
myroute_points_rd <- st_transform(myroute_points_xy, crs = 28992)

# Check if your transformed routes are the same
identical(myroute_rd_string, myroute_rd_num)
## [1] FALSE

Question 3: Why are your routes not identical? Hint: inspect the data

After transforming the line, plot the line using the basic plot command.

plot(myroute_rd_num, main = myroute_rd_num$name, col = "red")
box()

Now that the geometries are projected to a planar coordinate system, the length can be computed. Let’s use different methods to calculate the distance between the two points. Before we calculate distances we need the package geosphere.

# Check for the geosphere package and install if missing
if(!"geosphere" %in% installed.packages()){install.packages("geosphere")}

# Load package
library(geosphere)

# Network distance: WGS84, RD_New
(dist_netw_wgs = st_length(myroute_line_xy)) # line with WGS84 CRS
## 982.5135 [m]
(dist_netw_rd = st_length(myroute_rd_num)) # line with RD_new CRS
## 984.3013 [m]
## Euclidean distance: WGS84, RD_New
# Get data from matrix position [1, 2]
(dist_eucl_wgs = st_distance(myroute_points_xy)[1, 2])
## 601.6798 [m]
(dist_eucl_rd = st_distance(myroute_points_rd)[1, 2])
## 602.6681 [m]

Okay as expected the network distance is bigger than a straight line between the two points, but a question remains:

Question 4: Why does the distance slightly differ between objects with CRS WGS84 and RD_New?

Now we know the network and euclidean distance between the two points. As you might have noticed the st_length() and st_distance() functions gave a unit behind the number. It was a meter in our case, but maybe we want to know how many feet or inch the distance was. The sf package allows unit conversions by bindings to the units package.

units::set_units(dist_netw_rd, km)
## 0.9843013 [km]
units::set_units(dist_netw_rd, feet)
## 3229.335 [feet]
units::set_units(dist_netw_rd, inch)
## 38752.02 [inch]

Calculating units can also be performed for surfaces or volumes. Later you can give a try to calculate from square meters to hectares for a surface area.

Polygons

After the intermezzo, we now continue with sf classes for polygon objects. The idea is to illustrate the classes; the data are meaningless. Let us create overlapping circles around our points.

# Create sfc with geometry in RD New
# We need to have projected (RD_New) data to be able to measure features in 2D (e.g. meters)
point1_sfg_rd <- st_point(c(174023.833719082, 445086.995949553))
point2_sfg_rd <- st_point(c(174655.068279838, 443855.475880421))

# Combine point features into simple feature collection with CRS
mypoints_sfc_rd <- st_sfc(point1_sfg_rd, point2_sfg_rd, crs = 28992)

# Create individual points
pnt1_rd <- mypoints_sfc_rd[[1]]
pnt2_rd <- mypoints_sfc_rd[[2]]

# Calculate distance between points
dist_points_rd <- st_distance(mypoints_sfc_rd)[1, 2]
# Make circles around points, with radius equal to distance between points
# Define a series of angles going from 0 to 2pi
ang <- pi * 0:200 / 100

# Distance needs to be converted from units m class to double class
circle1x <- pnt1_rd[1] + cos(ang) * as.double(dist_points_rd)
circle1y <- pnt1_rd[2] + sin(ang) * as.double(dist_points_rd)
circle2x <- pnt2_rd[1] + cos(ang) * as.double(dist_points_rd)
circle2y <- pnt2_rd[2] + sin(ang) * as.double(dist_points_rd)
c1 <- cbind(circle1x, circle1y)
c2 <- cbind(circle2x, circle2y)

You can plot these points using basic R plot commands:

plot(c1, pch = 19, cex = 0.2, col = "red", ylim = range(circle1y, circle2y), xlim = range(circle1x, circle2x))
points(c2, pch = 19, cex = 0.2, col = "blue")
plot(mypoints_sfc_rd, pch = 3, col = "darkgreen", add = TRUE)

See how we calculated a circle around every point. Although it looks like a closed ring, the visualized circle is not a polygon. Let’s try to create a circular polygon. As before we step through the process of creating a simple feature:

  • Simple feature geometry (sfg)
  • Simple feature collections (sfc)
  • Dataframe (df)
  • Simple features (sf)
# Bind columns together into matrix and create a list of matrix
# ?cbind combines columns and ?rbind combines rows
circle1_sfg <- st_polygon(list(cbind(circle1x, circle1y)))
circle2_sfg <- st_polygon(list(cbind(circle2x, circle2y)))

circles_sfc <- st_sfc(circle1_sfg, circle2_sfg, crs = 28992)

circles_df <- data.frame(name = c("circle1", "circle2"), row.names = c("1", "2"))
circles_sf <- st_set_geometry(circles_df, circles_sfc)
## Plot the circle
plot(circles_sf)

Now we calculated the points of the circle to obtain a perfect circle and then made it into a polygon, but we can also create a circular polygon by using the function st_buffer().

The sf package can do more spatial operations by spatially selecting certain features and changing the geometry. Examples of spatial selections are st_intersects, st_contains, st_within, st_covers etc.

The package can change geometries of features with st_difference (clip), st_intersection (intersect), st_union, st_sym_difference, st_centroid, st_segmentize, st_convex_hull and st_buffer. More methods can be found with the command: methods(class = "sf").

The final results of st_buffer can be plotted using basic R plotting commands:

# Buffer
point_buff <- st_buffer(mypoints_sfc_rd, dist = dist_points_rd)

# Plot
plot(point_buff, col = c("#2b8cbe", "#31a354"), lwd = 2)
plot(mypoints_sfc_rd, add = TRUE, col = "red", pch = 19, cex = 1.5)
plot example of final results
# Buffer again with the nquadsegs option
circle_quadsegs <- st_buffer(mypoints_sfc_rd[1], dist = dist_points_rd, nQuadSegs = 2)
circles_diff <- st_difference(circles_sfc[1], circle_quadsegs)

# Plot (see the difference?)
plot(circles_sfc[1], col = "red")
plot(circle_quadsegs, add = TRUE, lty = 3, lwd = 2, col = "grey")

Question 5: What happens if you change nQuadSegs to a higher number?

# Check areal difference
st_area(circles_diff)
## 598753.3 [m^2]
# What is the difference in hectares?
units::set_units(st_area(circles_diff), hectare)
## 59.87533 [hectare]
# Intersect
circles_intersect <- st_intersection(circles_sfc[1], circles_sfc[2])

# Plot
plot(circles_sfc, col = "grey")
plot(circles_intersect, col = "red", add = TRUE)
# What percentage of the two circles overlaps?
(paste(round(100 * st_area(circles_intersect) / st_area(circles_sfc[1]), 2), "%"))
## [1] "39.1 %"

Question 6: Do you understand the script? What is the difference between st_difference and st_intersection ?

More info