sf
classes for spatial vector
dataIn this tutorial we will use the sf package. The sf package focuses solely on vector data. It provides a standardized encoding of vector data and uses GDAL to read and write data, GEOS for geometrical operations and Proj for projection conversions and datum transformations.
The possiblities are huge. During the course (and also in this extra tutorial) we can only scratch the surface with some essentials, which hopefully invite you to experiment further and use them in your research. Details can be found in the book Applied Spatial Data Analysis with R and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio.
The package sf has geometry types, such as
point
(for one point), linestring
(for one
line) and polygon
(for one polygon). For sets of spatial
classes it has the multipoint
(multiple points),
multilinestring
, multipolygon
and
geometrycollection
. The geometrycollection
data class allows to have mixed geometries in one object. The package
sf has even more special geometry types (check
here). The most common geometry types for spatial vector data are
shown below:
Geometry | type | description |
---|---|---|
point | POINT | single point ; zero-dimensional geometry |
points | MULTIPOINT | set of points |
line | LINESTRING | sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry |
lines | MULTILINESTRING | set of linestrings |
rings | POLYGON | sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring; geometry with a positive area (two-dimensional); |
rings | MULTIPOLYGON | set of polygons |
mix | GEOMETRYCOLLECTION | set of geometries of any type except GEOMETRYCOLLECTION |
Simple feature objects are stored in data classes. Three data classes are used in sf to represent features:
sfg
): the feature geometry of
an individual simple feature;sfc
): a list or collection
of simple feature geometries;sf
): simple feature collections linked
to a data.frame with feature attributes;We will go through a few examples of creating geometries from scratch to familiarize yourself with these geometry types and data classes.
First, go to Google Maps on your computer, click on a location (e.g. a building or road), look at the bottom of the banner that pops up and copy the longitude and latitude of two points in Wageningen that are relevant to you. Use a decimal degree notation with at least 4 digits after the decimal point.
Question 1: In what coordinate reference system [CRS] are your points from Google Maps? What is the EPSG belonging to that CRS?
The example below shows how you can create spatial point objects from
these coordinates. Type ?function name
(e.g. ?st_point
) for finding help on the functions
used.
# Check for the sf package and install if missing
if(!"sf" %in% installed.packages()){install.packages("sf")}
# Load the sf package
library(sf)
# Create two simple feature geometry points
# Get coordinates of two points identified in Google Maps, for example:
point1_sfg <- st_point(c(5.664190, 51.993843)) # enter your own lat/long
point2_sfg <- st_point(c(5.673310, 51.982752)) # enter your own lat/long
# Combine point features into simple feature collection with CRS: mypoints_sfc
mypoints_sfc <- st_sfc(point1_sfg, point2_sfg, crs = 4326)
# Inspect the geometry types and data class of your objects
point1_sfg
class(point1_sfg)
mypoints_sfc
class(mypoints_sfc)
# Create and display some attribute data and store in a data frame
mydata <- data.frame(cbind(
id = c(1, 2),
name = c("my first point","my second point")))
mydata
## id name
## 1 1 my first point
## 2 2 my second point
# Combine simple feature collection and dataframe to create simple feature: mypoints_sf
mypoints_sf <- st_set_geometry(mydata, mypoints_sfc)
mypoints_sf
class(mypoints_sf) # notice how simple features are two classes
class(mypoints_sf) == "data.frame"
Question 2: What is the the difference between the objects
mypoints_sfc
andmypoints_sf
?
# When plotting a simple feature it makes a plot for every column
# Plot only the name column
plot(mypoints_sf[, "name"], col = c("orange", "blue"))
Now let us connect the two points by a straight line. Consult how to
make a line: ?st_linestring
.
## Now make the line by creating a matrix and linestring
points_matrix <- rbind(point1_sfg, point2_sfg) # create matrix
myline_sfg <- st_linestring(points_matrix) # create sfg
myline_sfc <- st_sfc(myline_sfg, crs = 4326) # create sfc
# Check the class and structure of the objects again.
class(points_matrix)
class(myline_sfg)
class(myline_sfc)
myline_sfc
# Add feature attributes by giving the line an id and a name
myline_df <- data.frame(id = 1, name = "Straight line") # create data.frame
myline_sf <- st_set_geometry(myline_df, myline_sfc) # create sf
# Check class and geometry type
class(myline_df)
class(myline_sf)
myline_sf
plot(myline_sf[1], main = myline_sf$name, col = "red", lwd = 3, lty = 5, graticule = TRUE, axes = TRUE)
Try to understand the above code and its results by studying
help: ?plot_sf
Try to add the points together with the lines on the same map.
What now follows is a brief intermezzo before we continue with the classes for polygons. We will use the OGR functionality of GDAL available through the package sf.
# Create subdirectory data within current working directory
dir.create("data", showWarnings = FALSE)
# Write to KML
st_write(obj = mypoints_sf, dsn = "data/mypoints.kml")
st_write(obj = myline_sf, dsn = "data/myline.kml")
The function st_write
requires entries for the arguments
obj
(object of class sf) and dsn
(data source
name). The function detects what driver it needs based on the file
extension of the dsn. Please study details in the help file:
?st_write
. Similarly the function st_read
allows reading OGR compatible data into a suitable simple feature
object.
Check (in Google Maps) whether the attribute data were written
correctly to the KML output (use My Maps from
Google). You need to be logged in on your Google account, go to My Maps, click
get started
, click the +
button to create a
new digital map. On the untitled layer click import
, import
the mypoints.kml
, add a new layer and import
myline.kml
. Cool! You just created a webmap with your point
and line features.
Now let us see if we can create a line in the webmap and import it to
R. Digitize a route (e.g. a walking route) between two points of
interest in Wageningen: click on the draw a line
symbol
(between the place marker and the direction symbol), click
Add walking route
, draw your route in Wageningen and name
the new layer myroute
.
Click on the main menu in the left panel (top … symbol), click
export to KML
, check box export to .KML file
,
select your route layer and download the kml (check here
how to export). Save the KML in the data directory under the name
myroute.kml
. We will read this file into a simple features
object.
## Read the KML as a simple feature
myroute <- st_read(dsn = "data/myroute.kml")
## Reading layer `route' from data source
## `C:\Users\marti\OneDrive - Wageningen University & Research\Bureaublad\Geoscripting 2023\lesson_extra1\data\myroute.kml'
## using driver `KML'
## Simple feature collection with 3 features and 2 fields
## Geometry type: GEOMETRY
## Dimension: XYZ
## Bounding box: xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
# What is the geometry type of this object?
myroute
## Simple feature collection with 3 features and 2 fields
## Geometry type: GEOMETRY
## Dimension: XYZ
## Bounding box: xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
## Name
## 1 Directions from Churchillweg 141-185, 6708 Wageningen, Netherlands to Bloemhof, Wageningen
## 2 Churchillweg 141-185, 6708 Wageningen, Netherlands
## 3 Bloemhof, Wageningen
## Description geometry
## 1 LINESTRING Z (5.67109 51.97...
## 2 POINT Z (5.671093 51.9777 0)
## 3 POINT Z (5.665383 51.97358 0)
# Select name column only
names(myroute)
## [1] "Name" "Description" "geometry"
myroute = myroute[, "Name"]
# What geometry types do you see?
plot(myroute)
Now we have the data, but some of our friends want these exact data too. One friend of ours is a software engineer and he wants a GeoJSON. Another friend is a GIS-analyst in QGIS and as a backup he wants the file in two formats: Geographic Markup Language (GML) and a Shapefile. These fileformats (KML, GeoJSON, GML and Shapefile) are commonly used in spatial analysis. Let’s try to give them the files in those formats!
# Try to export the simple feature to a GeoJSON. What happens?
try(st_write(myroute, dsn = "data/route.geojson"))
Okay success! Let us continue with the GML.
# Try to export the simple feature to a GML. What happens?
try(st_write(myroute, dsn = "data/route.gml"))
Works! Now transform to a Shapefile.
## Again export to a Shapefile. What happens now?
try(st_write(myroute, dsn = "data/route.shp"))
Writing the myroute object to a Shapefile worked only for the
linestring, but did not work for the points. Shapefiles do not handle
multiple geometry types well. We can solve this! Have a look at
?st_is
.
# Select linestring route
myroute_line <- myroute[st_is(myroute, type = "LINESTRING"),]
# Select start and end point route
myroute_points <- myroute[st_is(myroute, type = "POINT"),]
# Let us try to write the simple features to a Shapefile
try(st_write(myroute_line, dsn = "data/route_line.shp"))
try(st_write(myroute_points, dsn = "data/route_points.shp"))
We run into another issue; Shapefiles do not support 3D line strings and points. We have fixed the geometry type, but now need to work on the dimensions of the data. Let us look into dimensions.
All geometries are composed of points. Points are coordinates in a 2-, 3- or 4-dimensional space. All points in a geometry have the same dimensionality. In addition to X and Y coordinates, there are two optional additional dimensions:
The four possible cases are:
XY
XYZ
XYM
XYZM
So we need to find a function that changes dimensions from XYZ to XY.
Feel free to script your own function that can change the dimensions.
Otherwise look into ?st_zm
.
## Convert XYZ dimension to XY dimension
myroute_points_xy <- st_zm(myroute_points)
myroute_line_xy <- st_zm(myroute_line)
## Check your geometry type and dimension. Are they point/linestring and XY?
myroute_points_xy
## Simple feature collection with 2 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 5.665383 ymin: 51.97358 xmax: 5.671093 ymax: 51.9777
## Geodetic CRS: WGS 84
## Name geometry
## 2 Churchillweg 141-185, 6708 Wageningen, Netherlands POINT (5.671093 51.9777)
## 3 Bloemhof, Wageningen POINT (5.665383 51.97358)
myroute_line_xy
## Simple feature collection with 1 feature and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## Geodetic CRS: WGS 84
## Name
## 1 Directions from Churchillweg 141-185, 6708 Wageningen, Netherlands to Bloemhof, Wageningen
## geometry
## 1 LINESTRING (5.67109 51.9777...
# Try to export the simple feature to a Shapefile
try(st_write(myroute_points_xy, dsn = "data/route_points.shp"))
# Overwrite your previous Shapefile by adding delete_layer = TRUE
try((st_write(myroute_line_xy, dsn = "data/route.shp", delete_layer = TRUE)))
Hooray, it worked! Now our friends have the data they wanted. Try to understand the above code and results. Feel free to display the data in QGIS.
Transformations between coordinate systems are crucial to many GIS applications. The Keyhole Markup Language (KML) used by Google Earth uses latitude and longitude in a polar WGS84 coordinate system (i.e. geographic coordinates). However, in some of the examples below we will use metric distances (i.e. projected coordinates).There are two types of coordinate systems that you need to recognise: projected coordinate systems and geographic coordinate systems.
One of the challenges of working with geo-spatial data is that geodetic locations (points on the Earth surface) are mapped into a two-dimensional cartesian plane using a cartographic projection. Projected coordinates are coordinates that refer to a point on a two-dimensional map that represents the surface of the Earth (i.e. projected coordinate system). Latitude and longitude values are an example of an unprojected coordinate system. These are coordinates that directly refer to a point on the Earth’s surface. One way to deal with this is by transforming the data to a planar coordinate system. In R this can be achieved via bindings to the PROJ - Cartographic Projections Library (https://proj.org/), which are available in sf.
Central to spatial data in the sf package is that they have a coordinate reference system, which is coded as a numeric epsg code and a character string describing the projection by the PROJ projection library. Operations on different spatial data sets need to have compatible coordinate reference systems (i.e., identical). An interface to the PROJ library is available through the sf package.
We will transform our spatial data to the Dutch grid (Rijksdriehoekstelsel), often referred to as RD or RD_New. Please note that:
## Define CRS object for RD projection
# See how the transformation from unprojected WGS84 is defined to a projected coordinate system
prj_string_RD <- st_crs("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.2369,50.0087,465.658,-0.406857330322398,0.350732676542563,-1.8703473836068,4.0812 +units=m +no_defs")
# Perform the coordinate transformation from WGS84 to RD
myroute_rd_string <- st_transform(myroute_line_xy, crs = prj_string_RD)
# Instead of having to define the projection ourselves, we can make use of the projection library by using an EPSG code
myroute_rd_num <- st_transform(myroute_line_xy, crs = 28992)
myroute_points_rd <- st_transform(myroute_points_xy, crs = 28992)
# Check if your transformed routes are the same
identical(myroute_rd_string, myroute_rd_num)
## [1] FALSE
Question 3: Why are your routes not identical? Hint: inspect the data
After transforming the line, plot the line using the basic plot command.
Now that the geometries are projected to a planar coordinate system, the length can be computed. Let’s use different methods to calculate the distance between the two points. Before we calculate distances we need the package geosphere.
# Check for the geosphere package and install if missing
if(!"geosphere" %in% installed.packages()){install.packages("geosphere")}
# Load package
library(geosphere)
# Network distance: WGS84, RD_New
(dist_netw_wgs = st_length(myroute_line_xy)) # line with WGS84 CRS
## 982.5135 [m]
(dist_netw_rd = st_length(myroute_rd_num)) # line with RD_new CRS
## 984.3013 [m]
## Euclidean distance: WGS84, RD_New
# Get data from matrix position [1, 2]
(dist_eucl_wgs = st_distance(myroute_points_xy)[1, 2])
## 601.6798 [m]
(dist_eucl_rd = st_distance(myroute_points_rd)[1, 2])
## 602.6681 [m]
Okay as expected the network distance is bigger than a straight line between the two points, but a question remains:
Question 4: Why does the distance slightly differ between objects with CRS WGS84 and RD_New?
Now we know the network and euclidean distance between the two
points. As you might have noticed the st_length()
and
st_distance()
functions gave a unit behind the number. It
was a meter in our case, but maybe we want to know how many feet or inch
the distance was. The sf package allows unit conversions by
bindings to the units package.
units::set_units(dist_netw_rd, km)
## 0.9843013 [km]
units::set_units(dist_netw_rd, feet)
## 3229.335 [feet]
units::set_units(dist_netw_rd, inch)
## 38752.02 [inch]
Calculating units can also be performed for surfaces or volumes. Later you can give a try to calculate from square meters to hectares for a surface area.
After the intermezzo, we now continue with sf classes for polygon objects. The idea is to illustrate the classes; the data are meaningless. Let us create overlapping circles around our points.
# Create sfc with geometry in RD New
# We need to have projected (RD_New) data to be able to measure features in 2D (e.g. meters)
point1_sfg_rd <- st_point(c(174023.833719082, 445086.995949553))
point2_sfg_rd <- st_point(c(174655.068279838, 443855.475880421))
# Combine point features into simple feature collection with CRS
mypoints_sfc_rd <- st_sfc(point1_sfg_rd, point2_sfg_rd, crs = 28992)
# Create individual points
pnt1_rd <- mypoints_sfc_rd[[1]]
pnt2_rd <- mypoints_sfc_rd[[2]]
# Calculate distance between points
dist_points_rd <- st_distance(mypoints_sfc_rd)[1, 2]
# Make circles around points, with radius equal to distance between points
# Define a series of angles going from 0 to 2pi
ang <- pi * 0:200 / 100
# Distance needs to be converted from units m class to double class
circle1x <- pnt1_rd[1] + cos(ang) * as.double(dist_points_rd)
circle1y <- pnt1_rd[2] + sin(ang) * as.double(dist_points_rd)
circle2x <- pnt2_rd[1] + cos(ang) * as.double(dist_points_rd)
circle2y <- pnt2_rd[2] + sin(ang) * as.double(dist_points_rd)
c1 <- cbind(circle1x, circle1y)
c2 <- cbind(circle2x, circle2y)
You can plot these points using basic R plot commands:
plot(c1, pch = 19, cex = 0.2, col = "red", ylim = range(circle1y, circle2y), xlim = range(circle1x, circle2x))
points(c2, pch = 19, cex = 0.2, col = "blue")
plot(mypoints_sfc_rd, pch = 3, col = "darkgreen", add = TRUE)
See how we calculated a circle around every point. Although it looks like a closed ring, the visualized circle is not a polygon. Let’s try to create a circular polygon. As before we step through the process of creating a simple feature:
Simple feature geometry (sfg)
Simple feature collections (sfc)
Dataframe (df)
Simple features (sf)
# Bind columns together into matrix and create a list of matrix
# ?cbind combines columns and ?rbind combines rows
circle1_sfg <- st_polygon(list(cbind(circle1x, circle1y)))
circle2_sfg <- st_polygon(list(cbind(circle2x, circle2y)))
circles_sfc <- st_sfc(circle1_sfg, circle2_sfg, crs = 28992)
circles_df <- data.frame(name = c("circle1", "circle2"), row.names = c("1", "2"))
circles_sf <- st_set_geometry(circles_df, circles_sfc)
Now we calculated the points of the circle to obtain a perfect circle
and then made it into a polygon, but we can also create a circular
polygon by using the function st_buffer()
.
The sf package can do more spatial operations by spatially
selecting certain features and changing the geometry. Examples of
spatial selections are st_intersects
,
st_contains
, st_within
, st_covers
etc.
The package can change geometries of features with
st_difference
(clip), st_intersection
(intersect), st_union
, st_sym_difference
,
st_centroid
, st_segmentize
,
st_convex_hull
and st_buffer
. More methods can
be found with the command: methods(class = "sf")
.
The final results of st_buffer
can be plotted using
basic R plotting commands:
# Buffer
point_buff <- st_buffer(mypoints_sfc_rd, dist = dist_points_rd)
# Plot
plot(point_buff, col = c("#2b8cbe", "#31a354"), lwd = 2)
plot(mypoints_sfc_rd, add = TRUE, col = "red", pch = 19, cex = 1.5)
# Buffer again with the nquadsegs option
circle_quadsegs <- st_buffer(mypoints_sfc_rd[1], dist = dist_points_rd, nQuadSegs = 2)
circles_diff <- st_difference(circles_sfc[1], circle_quadsegs)
# Plot (see the difference?)
plot(circles_sfc[1], col = "red")
plot(circle_quadsegs, add = TRUE, lty = 3, lwd = 2, col = "grey")
Question 5: What happens if you change
nQuadSegs
to a higher number?
# Check areal difference
st_area(circles_diff)
## 598753.3 [m^2]
# What is the difference in hectares?
units::set_units(st_area(circles_diff), hectare)
## 59.87533 [hectare]
# Intersect
circles_intersect <- st_intersection(circles_sfc[1], circles_sfc[2])
# Plot
plot(circles_sfc, col = "grey")
plot(circles_intersect, col = "red", add = TRUE)
# What percentage of the two circles overlaps?
(paste(round(100 * st_area(circles_intersect) / st_area(circles_sfc[1]), 2), "%"))
## [1] "39.1 %"
Question 6: Do you understand the script? What is the difference between
st_difference
andst_intersection
?