Go through the whole lesson and try to answer the questions below. We will address the questions in the lesson and discuss them.

Learn how to handle vector data

In today's lecture, we will explore the basics of handling spatial vector data in R. There are several R packages for this purpose but we will focus on using `sf`

and some related packages. At the end of the lecture, you should be able to:

- create point, line and polygon objects from scratch;
- explore the structure of
`sf`

classes for spatial vector data; - plot spatial vector data;
- transform between datums and map projections;
- write and read spatial vector formats (e.g. KML, GML, GeoJSON, shapefile);
- apply basic operations on vector data, such as buffering, intersection and area calculation;

In this tutorial we will use the *sf* package. It is a spatial vector data package, that builds on top of *sp*, *rgdal* and *rgeos* packages.

The *sf* package focuses solely on vector data. It provides a standardized encoding of vector data and uses GDAL to read and write data, GEOS for geometrical operations and Proj.4 for projection conversions and datum transformations.

Another package that commonly is used, is the *sp* package. It provides classes for importing, manipulating and exporting spatial data in R, and methods for doing so. It can handle both vectors and rasters. It is often the foundation for other spatial packages, such as *raster*.

The *rgdal* package includes bindings to parts of the **OGR Simple Feature Library** which provides access to a variety of vector file formats such as GML, GeoJSON, ESRI Shapefiles and KML. The OGR library is part of the widely used **Geospatial Data Abstraction Library** (GDAL). The GDAL library is the most useful freely-available library for reading and writing geospatial data. The GDAL library is well-documented (http://gdal.org/), but with a catch for R and Python programmers. The GDAL (and associated OGR) library and command line tools are all written in C and C++. Bindings are available that allow access from a variety of other languages including *R* and *Python* but the documentation is all written for the C++ version of the libraries. This can make reading the documentation rather challenging. Fortunately, the *sf* package, providing GDAL bindings in R, is also well documented with lots of examples. The same is valid for the Python libaries.

Similarly, *rgeos* is an interface to the powerful **Geometry Engine Open Source (GEOS)** library for all kinds of operations on geometries (buffering, overlaying, area calculations, etc.).

Thus, functionality that you commonly find in expensive GIS software is also available within R, using free but very powerful software libaries.

The possiblities are huge. In this course we can only scratch the surface with some essentials, which hopefully invite you to experiment further and use them in your research. Details can be found in the book *Applied Spatial Data Analysis with R* and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio.

The package *sf* has geometry types, such as `point`

(for one point), `linestring`

(for one line) and `polygon`

(for one polygon). For sets of spatial classes it has the `multipoint`

(multiple points), `multilinestring`

, `multipolygon`

and `geometrycollection`

. The geometry collection data class allows to have mixed geometries in one object. The package *sf* has even more special geometry types (check here). The most common geometry types for spatial vector data are shown below:

Geometry | type | description |
---|---|---|

point | POINT |
single point ; zero-dimensional geometry |

points | MULTIPOINT |
set of points |

line | LINESTRING |
sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry |

lines | MULTILINESTRING |
set of linestrings |

rings | POLYGON |
sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring; geometry with a positive area (two-dimensional); |

rings | MULTIPOLYGON |
set of polygons |

mix | GEOMETRYCOLLECTION |
set of geometries of any type except GEOMETRYCOLLECTION |

Simple feature objects are stored in data classes. Three data classes are used in *sf* to represent features:

- simple feature geometry (sfg): the feature geometry of an individual simple feature;
- simple feature collections (sfc): a list or collection of simple feature geometries;
- simple features (sf): simple feature collections linked to a data.frame with feature attributes;

We will go through a few examples of creating geometries from scratch to familiarize yourself with these geometry types and data classes.

First, go to Google Maps on your computer, click on a location (e.g. a building or road), look at the bottom of the banner that pops up and copy the longitude and latitude of two points in Wageningen that are relevant to you. Use a decimal degree notation with at least 4 digits after the decimal point.

Question 1:In what coordinate reference system [CRS] are your points from Google Maps? What is the EPSG belonging to that CRS?

The example below shows how you can create spatial point objects from these coordinates. Type `?function name`

(e.g. `?st_point`

) for finding help on the functions used.

```
## Load the sf package
library(sf)
```

```
## Create two simple feature geometry points
## Get coordinates of two points identified in Google Maps, for example:
point1_sfg <- st_point(c(5.664190, 51.993843)) # enter your own lat/long
point2_sfg <- st_point(c(5.673310, 51.982752)) # enter your own lat/long
## Combine point features into simple feature collection with CRS: mypoints_sfc
mypoints_sfc <- st_sfc(point1_sfg, point2_sfg, crs = 4326)
mypoints_sfc_wgs <- st_sfc(point1_sfg, point2_sfg, crs = "+proj=longlat +datum=WGS84 +no_defs")
# These two functions do the same. Which one do you like more?
```

```
## Are they identical?
identical(mypoints_sfc, mypoints_sfc_wgs)
```

```
## Inspect the geometry types and data class of your objects
point1_sfg
class(point1_sfg)
mypoints_sfc
class(mypoints_sfc)
```

```
## Create and display some attribute data and store in a data frame
mydata <- data.frame(cbind(
id = c(1,2),
name = c("my first point","my second point")))
mydata
```

```
## id name
## 1 1 my first point
## 2 2 my second point
```

```
## Combine simple feature collection and dataframe to create simple feature: mypoints_sf
mypoints_sf <- st_set_geometry(mydata, mypoints_sfc)
```

```
mypoints_sf
class(mypoints_sf) # notice how simple features are two classes
class(mypoints_sf) == "data.frame"
```

Question 2:What is the the difference between the objects`mypoints_sfc`

and`mypoints_sf`

?

```
## When plotting a simple feature it makes a plot for every column
## Plot only the name column
plot(mypoints_sf[,"name"], col = c("orange","blue"))
```

Now let us connect the two points by a straight line. Consult how to make a line: `?st_linestring`

.

```
## Now make the line by creating a matrix and linestring
points_matrix <- rbind(point1_sfg, point2_sfg) # Create matrix
myline_sfg <- st_linestring(points_matrix) # Create sfg
myline_sfc <- st_sfc(myline_sfg, crs = 4326) # Create sfc
```

```
## Check the class and structure of the objects again.
class(points_matrix)
class(myline_sfg)
class(myline_sfc)
myline_sfc
```

```
## Add feature attributes by giving the line an id and a name
myline_df <- data.frame(id = 1, name = "Straight line") # Create data.frame
myline_sf <- st_set_geometry(myline_df, myline_sfc) #Create sf
```

```
## Check class and geometry type
class(myline_df)
class(myline_sf)
myline_sf
```

`plot(myline_sf[1], main = myline_sf$name, col = "red", lwd = 3, lty = 5, graticule = TRUE, xlab = "latitude", ylab = "longitude", axes = TRUE)`

*Try to understand the above code and its results by studying help: ?plot_sf*

*Try to add the points together with the lines on the same map.*

What now follows is a brief intermezzo before we continue with the classes for polygons. We will use the OGR functionality of GDAL available through the package *sf*.

```
## Create subdirectory data within current working directory
#getwd() #if needed
#setwd()
dir.create("data", showWarnings = FALSE)
## Write to KML
st_write(obj = mypoints_sf, dsn = "./data/mypoints.kml")
st_write(obj = myline_sf, dsn = "./data/myline.kml")
```

The function `st_write`

requires entries for the arguments `obj`

(object of class sf) and `dsn`

(data source name). The function detects what driver it needs based on the file extension of the dsn. Please study details in the help file: `?st_write`

. Similarly the function `st_read`

allows reading OGR compatible data into a suitable simple feature object.

Check (in Google Maps) whether the attribute data were written correctly to the KML output (use My Maps from Google). You need to be logged in on your google account, go to My Maps, click `get started`

, click the `+`

button to create a new digital map. On the untitled layer click `import`

, import the mypoints.kml, add a new layer and import myline.kml. Cool! You just created a webmap with your point and line features.

Now let us see if we can create a line in the webmap and import it to R. Digitize a route (e.g. a walking route) between two points of interest in Wageningen: click on the `draw a line`

symbol (between the place marker and the direction symbol), click `Add walking route`

, draw your route in Wageningen and name the new layer `myroute`

.

Click on the main menu in the left panel (top ... symbol), click `export to KML`

, check box `export to .KML file`

, select your route layer and download the kml (check here how to export). Save the KML in the data folder within the working directory under the name `route.kml`

. We will read this file into a simple features object.

```
## Read the KML as a simple feature
myroute <- st_read(dsn = "./data/myroute.kml")
```

```
## Reading layer `route' from data source `/home/david/Documents/Geoscripting/IntroToVector/data/myroute.kml' using driver `LIBKML'
## Simple feature collection with 3 features and 11 fields
## geometry type: GEOMETRY
## dimension: XYZ
## bbox: xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
```

`myroute #What is the geometry type of this object? `

```
## Simple feature collection with 3 features and 11 fields
## geometry type: GEOMETRY
## dimension: XYZ
## bbox: xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## Name
## 1 Directions from Churchillweg 141-185, 6708 Wageningen, Netherlands to Bloemhof, Wageningen
## 2 Churchillweg 141-185, 6708 Wageningen, Netherlands
## 3 Bloemhof, Wageningen
## description timestamp begin end altitudeMode tessellate extrude
## 1
``` 1 0
## 2 -1 0
## 3 -1 0
## visibility drawOrder icon geometry
## 1 -1 NA LINESTRING Z (5.67109 51.97...
## 2 -1 NA POINT Z (5.6710926 51.97769...
## 3 -1 NA POINT Z (5.6653834 51.97358...

`names(myroute)`

```
## [1] "Name" "description" "timestamp" "begin"
## [5] "end" "altitudeMode" "tessellate" "extrude"
## [9] "visibility" "drawOrder" "icon" "geometry"
```

```
myroute = myroute[,"Name"] # Select name column only
plot(myroute) # What geometry types do you see?
```

Now we have the data, but some of our friends want this exact data too. One friend of ours is a software engineer and he wants a GeoJSON. Another friend is a GIS-analyst in QGIS and as a backup he wants the file in two formats: Geographic Markup Language (GML) and a shapefile. These fileformats (KML, GeoJSON, GML and shapefile) are commonly used in spatial analysis. Let's try to give them the files in those formats!

```
## Try to export the simple feature to a GeoJSON. What happens?
try(st_write(myroute, dsn = "./data/route.geojson"))
```

Okay success! Let us continue with the GML.

```
## Try to export the simple feature to a GeoJSON. What happens?
try(st_write(myroute, dsn = "./data/route.gml"))
```

Works! Now transform to a shapefile.

```
## Again export to a shapefile. What happens now?
try(st_write(myroute, dsn = "./data/route.shp"))
```

Writing the myroute object to a shapefile worked only for the linestring, but did not work for the points. Shapefiles do not handle multiple geometry types well. We can solve this! Have a look at `?st_is`

.

```
## Select linestring route
myroute_line <- myroute[st_is(myroute, type = "LINESTRING"),]
## Select start and end point route
myroute_points <- myroute[st_is(myroute, type = "POINT"),]
```

```
## Let us try to write the simple features to a shapefile
try(st_write(myroute_line, dsn = "./data/route_line.shp"))
try(st_write(myroute_points, dsn = "./data/route_points.shp"))
```

We run into another issue; shapefiles do not support 3D line strings and points. We have fixed the geometry type, but now need to work on the dimensions of the data. Let us look into dimensions.

All geometries are composed of points. Points are coordinates in a 2-, 3- or 4-dimensional space. All points in a geometry have the same dimensionality. In addition to X and Y coordinates, there are two optional additional dimensions:

- a Z coordinate, denoting altitude
- a M coordinate (rarely used), denoting a measure that is associated with the point; examples could be time of measurement, or measurement error of the coordinates

The four possible cases are:

- two-dimensional points
`XY`

- three-dimensional points
`XYZ`

- three-dimensional points
`XYM`

- four-dimensional points
`XYZM`

So we need to find a function that changes dimensions from XYZ to XY. Feel free to script your own function that can change the dimensions. Otherwise look into `?st_zm`

.

```
## Convert XYZ dimension to XY dimension
myroute_points_xy <- st_zm(myroute_points)
myroute_line_xy <- st_zm(myroute_line)
## Check your geometry type and dimension. Are they point/linestring and XY?
myroute_points_xy
```

```
## Simple feature collection with 2 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: 5.665383 ymin: 51.97358 xmax: 5.671093 ymax: 51.9777
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## Name
## 2 Churchillweg 141-185, 6708 Wageningen, Netherlands
## 3 Bloemhof, Wageningen
## geometry
## 2 POINT (5.6710926 51.9776967)
## 3 POINT (5.6653834 51.9735844)
```

`myroute_line_xy`

```
## Simple feature collection with 1 feature and 1 field
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 5.66538 ymin: 51.97307 xmax: 5.67242 ymax: 51.9777
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## Name
## 1 Directions from Churchillweg 141-185, 6708 Wageningen, Netherlands to Bloemhof, Wageningen
## geometry
## 1 LINESTRING (5.67109 51.9777...
```

```
## Try to export the simple feature to a shapefile
try(st_write(myroute_points_xy, dsn = "./data/route_points.shp"))
#Overwrite your previous shapefile by adding delete_layer = TRUE
try((st_write(myroute_line_xy, dsn = "./data/route.shp", delete_layer = TRUE)))
```

Hooray, it worked! Now our friends have the data they wanted. Try to understand the above code and results. Feel free to display the data in QGIS.

Transformations between coordinate systems are crucial to many GIS applications. The **Keyhole Markup Language (KML)** used by Google Earth uses latitude and longitude in a polar WGS84 coordinate system (i.e. geographic coordinates). However, in some of the examples below we will use metric distances (i.e. projected coordinates).There are two types of coordinate systems that you need to recognise: **projected coordinate systems** and **geographic coordinate systems**.

One of the challenges of working with geo-spatial data is that geodetic locations (points on the Earth surface) are mapped into a two-dimensional cartesian plane using a cartographic projection. Projected coordinates are coordinates that refer to a point on a two-dimensional map that *represents* the surface of the Earth (i.e. **projected coordinate system**). Latitude and Longitude values are an example of an **unprojected coordinate system**. These are coordinates that directly refer to a point on the Earth's surface. One way to deal with this is by transforming the data to a planar coordinate system. In `R`

this can be achieved via bindings to the **PROJ.4 - Cartographic Projections Library** (http://trac.osgeo.org/proj/), which are available in *sf*.

Central to spatial data in the *sf* package is that they have a coordinate reference system, which is coded as a numeric epsg code and a character string describing the projection by the *PROJ.4* projection library. Operations on different spatial data sets need to have compatible coordinate reference systems (i.e., identical). An interface to the *PROJ.4* library is available through the *sf* package.

We will transform our spatial data to the Dutch grid (Rijksdriehoekstelsel), often referred to as RD or RD_New. Please note that:

- Some widely spread definitions of the Dutch grid (
**EPSG: 28992**) are incomplete (see e.g. http://www.spatialreference.org and search for the EPSG number); - The transformation used below is approximate. Details can be found at http://nl.wikipedia.org/wiki/Rijksdriehoekscoordinaten.
- The PROJ.4 details can be found here: http://www.spatialreference.org/ref/epsg/28992/proj4/

```
## Define CRS object for RD projection
# See how the transformation from unprojected WGS84 is defined to a projected coordinate system
prj_string_RD <- st_crs("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.2369,50.0087,465.658,-0.406857330322398,0.350732676542563,-1.8703473836068,4.0812 +units=m +no_defs")
## Perform the coordinate transformation from WGS84 to RD
myroute_rd_string <- st_transform(myroute_line_xy, crs = prj_string_RD)
# Instead of having to define the projection ourselves, we can make use of the projection library by using an EPSG code
myroute_rd_num <- st_transform(myroute_line_xy, crs = 28992)
myroute_points_rd <- st_transform(myroute_points_xy, crs = 28992)
# Check if your transformed routes are the same
identical(myroute_rd_string, myroute_rd_num)
```

```
## [1] FALSE
```

Question 3:Why are your routes not identical? Hint: inspect the data

After transforming the line, plot the line using the basic plot command.

Now that the geometries are projected to a planar coordinate system, the length can be computed.

```
## Let us use different methods to calculate the distance between the two points
## Before we calculate distances we need the package geosphere
try(install.packages("geosphere"))
```

```
library(geosphere)
## Network distance: WGS84, RD_New
(dist_netw_wgs = st_length(myroute_line_xy)) # line with WGS84 CRS
```

```
## 984.3845 m
```

`(dist_netw_rd = st_length(myroute_rd_num)) # line with RD_new CRS`

```
## 984.3013 m
```

```
## Euclidean distance: WGS84, RD_New
# Get data from matrix position [1, 2]
(dist_eucl_wgs = st_distance(myroute_points_xy)[1,2])
```

```
## 602.7191 m
```

`(dist_eucl_rd = st_distance(myroute_points_rd)[1,2]) `

```
## 602.6681 m
```

Okay as expected the network distance is bigger than a straight line between the two points, but a question remains:

Question 4:Why does the distance slightly differ between objects with CRS WGS84 and RD_New?

Now we know the network and euclidean distance between the two points. As you might have noticed the `st_length()`

and `st_distance()`

functions gave a unit behind the number. It was a meter in our case, but maybe we want to know how many feet or inch the distance was. The *sf* package allows unit conversions by bindings to the *units* package.

`units::set_units(dist_netw_rd, km)`

```
## 0.9843013 km
```

`units::set_units(dist_netw_rd, feet)`

```
## 3229.335 feet
```

`units::set_units(dist_netw_rd, inch)`

```
## 38752.02 inch
```

Calculating units can also be performed for surfaces or volumes. Later you can give a try to calculate from square meters to hectares for a surface area.

After the intermezzo, we now continue with *sf* classes for polygon objects. The idea is to illustrate the classes; the data are meaningless. Let us create overlapping circles around our points.

```
## Create sfc with geometry in RD New
## We need to have projected (RD_New) data to be able to measure features in 2D (e.g. meters)
point1_sfg_rd <- st_point(c(174023.833719082, 445086.995949553))
point2_sfg_rd <- st_point(c(174655.068279838, 443855.475880421))
## Combine point features into simple feature collection with CRS
mypoints_sfc_rd <- st_sfc(point1_sfg_rd, point2_sfg_rd, crs = 28992)
## Create individual points
pnt1_rd <- mypoints_sfc_rd[[1]]
pnt2_rd <- mypoints_sfc_rd[[2]]
## Calculate distance between points
dist_points_rd <- st_distance(mypoints_sfc_rd)[1,2]
```

```
## Make circles around points, with radius equal to distance between points
## Define a series of angles going from 0 to 2pi
ang <- pi*0:200/100
# distance needs to be converted from units m class to double class
circle1x <- pnt1_rd[1]+ cos(ang) * as.double(dist_points_rd)
circle1y <- pnt1_rd[2] + sin(ang) * as.double(dist_points_rd)
circle2x <- pnt2_rd[1]+ cos(ang) * as.double(dist_points_rd)
circle2y <- pnt2_rd[2] + sin(ang) * as.double(dist_points_rd)
c1 <- cbind(circle1x, circle1y)
c2 <- cbind(circle2x, circle2y)
```

You can plot these points using basic R plot commands:

```
plot(c1, pch = 19, cex = 0.2, col = "red", ylim = range(circle1y, circle2y), xlim = range(circle1x, circle2x))
points(c2, pch = 19, cex = 0.2, col = "blue")
plot(mypoints_sfc_rd, pch = 3, col= "darkgreen", add = TRUE)
```

See how we calculated a circle around every point. Although it looks like a closed ring, the visualized circle is not a polygon. Let's try to create a circular polygon. As before we step through the process of creating a simple feature:

`Simple feature geometry (sfg)`

`Simple feature collections (sfc)`

`Dataframe (df)`

`Simple features (sf)`

```
## Bind columns together into matrix and create a list of matrix
# ?cbind combines columns and ?rbind combines rows
circle1_sfg <- st_polygon(list(cbind(circle1x, circle1y)))
circle2_sfg <- st_polygon(list(cbind(circle2x, circle2y)))
circles_sfc <- st_sfc(circle1_sfg, circle2_sfg, crs = 28992)
circles_df <- data.frame(name = c("circle1", "circle2"), row.names=c("1", "2"))
circles_sf <- st_set_geometry(circles_df, circles_sfc)
```

Just now we calculated the points of the circle to obtain a perfect circle and then made it into a polygon, but we can also create a circular polygon by using the function `st_buffer()`

.

The *sf* package can do more spatial operations by spatially selecting certain features and changing the geometry. Examples of spatial selections are `st_intersects`

, `st_contains`

, `st_within`

, `st_covers`

etc.

The package can change geometries of features with `st_difference`

(clip), `st_intersection`

(intersect), `st_union`

, `st_sym_difference`

, `st_centroid`

, `st_segmentize`

, `st_convex_hull`

and `st_buffer`

. More methods can be found with the command: `methods(class = "sf")`

.

The final results of `st_buffer`

can be plotted using basic R plotting commands:

```
## Buffer and plot
point_buff <- st_buffer(mypoints_sfc_rd, dist = dist_points_rd)
plot(point_buff, col = c("#2b8cbe", "#31a354"), lwd = 2)
plot(mypoints_sfc_rd, add = TRUE, col="red", pch=19, cex=1.5)
```

```
## Buffer again with the nquadsegs option. See the difference?
circle_quadsegs <- st_buffer(mypoints_sfc_rd[1], dist=dist_points_rd, nQuadSegs=2)
circles_diff <- st_difference(circles_sfc[1], circle_quadsegs)
plot(circles_sfc[1], col = "red")
plot(circle_quadsegs, add = TRUE, lty = 3, lwd = 2, col = "grey")
```

Question 5:What happens if you change`nQuadSegs`

to a higher number?

```
## Check areal difference
st_area(circles_diff)
```

```
## 598753.3 m^2
```

```
# What is the difference in hectares?
units::set_units(st_area(circles_diff),hectare)
```

```
## 59.87533 hectare
```

```
circles_intersect <- st_intersection(circles_sfc[1], circles_sfc[2])
plot(circles_sfc, col = "grey")
plot(circles_intersect, col="red", add = TRUE)
```

```
# What percentage of the two circles overlaps?
(paste(round(100 * st_area(circles_intersect) / st_area(circles_sfc[1]),2), "%"))
```

```
## [1] "39.1 %"
```

Question 6:Do you understand the script? What is the difference between`st_difference`

and`st_intersection`

?

We learned about:

- The geometry types of the
**sf**package: points, linestrings, polygons etc. - The data classes of the
**sf**package: sfg, sfc and sf - How to read/write data and change vector data format (KML, GML, shapefile and GeoJSON)
- Visualize spatial vector data in R and in Google Maps
- How to perform simple spatial operations on Geometries in R using the
**sf**package

First, download and unzip the following shape files manually from this website:

- Places
- Railways

Second, create a clear and documented script that:

- Selects the "industrial" (
`type == "industrial"`

) railways - Buffers the "industrial" railways with a buffer of 1000m (hint:
`st_buffer`

) - Find the place (i.e. a city) that
`intersects`

with this buffer. - Create a
`plot`

that shows the buffer, the points, and the name of the city - Write down the name of the city and the
`population`

of that city as one comment at the end of the script.

Think about project structure and use of git!

Bonus (optional!): if you can also download and unzip this within the script.