Week 2, Lesson 6: Intro to vector with R

WUR Geoscripting WUR logo

Authors: Jan Verbesselt, Sytze de Bruyn, Loïc Dutrieux, Valerio Avitabile, Dainius Masiliunas

12 January, 2017

Week 2, Lesson 6: Intro to vector with R

ToDo in the morning

Go through the whole lesson and try to answer the questions below. We will address the questions in the lesson and discuss them.

Objective of today

Learn how to handle vector data

Learning outcomes of today:

In today's lecture, we will explore the basics of handling spatial vector data in R. There are several R packages for this purpose but we will focus on using sp, rgdal, rgeos and some related packages. At the end of the lecture, you should be able to:

  • create point, line and polygon objects from scratch;
  • explore the structure of sp classes for spatial vector data;
  • plot spatial vector data;
  • transform between datums and map projections;
  • apply basic operations on vector data, such as buffering, intersection and area calculation;
  • write spatial vector data to a kml file;

Vector R basics

Some packages for working with spatial vector data in R

The packages sp and rgdal are widely used throughout this course. Both packages not only provide functionality for raster data but also for vector data.

The sp package provides classes for importing, manipulating and exporting spatial data in R, and methods for doing so. It is often the foundation for other spatial packages, such as raster.

The rgdal package includes bindings to parts of the OGR Simple Feature Library which provides access to a variety of vector file formats such as ESRI Shapefiles and kml. The OGR library is part of the widely used Geospatial Data Abstraction Library (GDAL). The GDAL library is the most useful freely-available library for reading and writing geospatial data. The GDAL library is well-documented (http://gdal.org/), but with a catch for R and Python programmers. The GDAL (and associated OGR) library and command line tools are all written in C and C++. Bindings are available that allow access from a variety of other languages including R and Python but the documentation is all written for the C++ version of the libraries. This can make reading the documentation rather challenging. Fortunately, the rgdal package, providing GDAL bindings in R, is also well documented with lots of examples. The same is valid for the Python libaries.

Similarly, rgeos is an interface to the powerful Geometry Engine Open Source (GEOS) library for all kinds of operations on geometries (buffering, overlaying, area calculations, etc.).

Thus, functionality that you commonly find in expensive GIS software is also available within R, using free but very powerful software libaries.

The possiblities are huge. In this course we can only scratch the surface with some essentials, which hopefully invite you to experiment further and use them in your research. Details can be found in the book Applied Spatial Data Analysis with R and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio. Owing to time constraints, this lecture cannot cover the related package spacetime with classes and methods for spatio-temporal data [@Bivand:2013ux].

Creating and manipulating geometries

The package sp provides classes for spatial-only geometries, such as SpatialPoints (for points), and combinations of geometries and attribute data, such as a SpatialPointsDataFrame. The following data classes are available for spatial vector data [@Edzer:2005ux]:

Overview of sp package spatial-only geometry classes.
Geometry class attribute
points SpatialPoints No
points SpatialPointsDataFrame data.frame
line Line No
lines Lines No
lines SpatialLines No
lines SpatialLinesDataFrame data.frame
rings Polygon No
rings Polygons No
rings SpatialPolygons No
rings SpatialPolygonsDataFrame data.frame

We will go through a few examples of creating geometries from scratch to familiarize yourself with these classes.

First, start Google Earth on your computer and make a note of the longitude and latitude of two points in Wageningen that are relevant to you. Use a decimal degree notation with at least 4 digits after the decimal point. To change the settings in Google Earth click Tools | Options and change the Show Lat/Long setting on the 3D View Tab.

Points: SpatialPoints, SpatialPointsDataFrame

The example below shows how you can create spatial point objects from these coordinates. Type ?function name (e.g. ?cbind) for finding help on the functions used.

Question 1: See ?CRS and have a look at the help.
## Load the sp and rgdal packages

## Coordinates of two points identified in Google Earth, for example:
pnt1_xy <- cbind(5.6660, 51.9872)   # enter your own coordinates
pnt2_xy <- cbind(5.6643, 51.9668)   # enter your own coordinates

## Combine coordinates in single matrix
coords <- rbind(pnt1_xy, pnt2_xy)

## Make spatial points object
prj_string_WGS <- CRS("+proj=longlat +datum=WGS84")
mypoints <- SpatialPoints(coords, proj4string=prj_string_WGS)
## Inspect object
## Create and display some attribute data and store in a data frame
mydata <- data.frame(cbind(id = c(1,2), 
                Name = c("my first point", 
                         "my second point")))

## Make spatial points data frame
mypointsdf <- SpatialPointsDataFrame(
  coords, data = mydata, 
class(mypointsdf) # Inspect and plot object
spplot(mypointsdf, zcol="Name", col.regions = c("red", "blue"), 
       xlim = bbox(mypointsdf)[1, ]+c(-0.01,0.01), 
       ylim = bbox(mypointsdf)[2, ]+c(-0.01,0.01),
       scales= list(draw = TRUE))
## Play with the spplot function!
## What is needed to make the following work?
spplot(mypointsdf, col.regions = c(1,2))

Question 2: What is the the difference between the objects mypoints and mypointsdf?


Now let us connect the two points by a straight line. First find information on the classes for lines that are available in sp. The goal is to create SpatialLinesDataFrame but we have to go through some other classes.

## Consult help on SpatialLines class
(simple_line <- Line(coords))
## An object of class "Line"
## Slot "coords":
##        [,1]    [,2]
## [1,] 5.6660 51.9872
## [2,] 5.6643 51.9668
(lines_obj <- Lines(list(simple_line), "1"))
## An object of class "Lines"
## Slot "Lines":
## [[1]]
## An object of class "Line"
## Slot "coords":
##        [,1]    [,2]
## [1,] 5.6660 51.9872
## [2,] 5.6643 51.9668
## Slot "ID":
## [1] "1"
(spatlines <- SpatialLines(list(lines_obj), proj4string=prj_string_WGS))
## An object of class "SpatialLines"
## Slot "lines":
## [[1]]
## An object of class "Lines"
## Slot "Lines":
## [[1]]
## An object of class "Line"
## Slot "coords":
##        [,1]    [,2]
## [1,] 5.6660 51.9872
## [2,] 5.6643 51.9668
## Slot "ID":
## [1] "1"
## Slot "bbox":
##       min     max
## x  5.6643  5.6660
## y 51.9668 51.9872
## Slot "proj4string":
## CRS arguments:
##  +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
(line_data <- data.frame(Name = "straight line", row.names="1"))
##            Name
## 1 straight line
(mylinesdf <- SpatialLinesDataFrame(spatlines, line_data))
## An object of class "SpatialLinesDataFrame"
## Slot "data":
##            Name
## 1 straight line
## Slot "lines":
## [[1]]
## An object of class "Lines"
## Slot "Lines":
## [[1]]
## An object of class "Line"
## Slot "coords":
##        [,1]    [,2]
## [1,] 5.6660 51.9872
## [2,] 5.6643 51.9668
## Slot "ID":
## [1] "1"
## Slot "bbox":
##       min     max
## x  5.6643  5.6660
## y 51.9668 51.9872
## Slot "proj4string":
## CRS arguments:
##  +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Question 3: What is the difference between Line and Lines?
spplot(mylinesdf, col.regions = "blue", 
       xlim = bbox(mypointsdf)[1, ]+c(-0.01,0.01), 
       ylim = bbox(mypointsdf)[2, ]+c(-0.01,0.01),
       scales= list(draw = TRUE))

Try to understand the above code and its results by studying help.

Try to add the points together with the lines on the same map.

Writing and reading spatial vector data using OGR

What now follows is a brief intermezzo before we continue with the classes for polygons. Let us first export the objects created as KML files that can be displayed in Google Earth. We will use the OGR functionality available through the package rgdal.

## Write to KML; below we assume a subdirectory data within the current 
#  working directory.
dir.create("data", showWarnings = FALSE) 
writeOGR(mypointsdf, file.path("data","mypointsGE.kml"), 
       "mypointsGE", driver="KML", overwrite_layer=TRUE)
writeOGR(mylinesdf, file.path("data","mylinesGE.kml"), 
       "mylinesGE", driver="KML", overwrite_layer=TRUE)

Check (in Google Earth) whether the attribute data were written to the KML output.

The function readOGR allows reading OGR compatible data into a suitable Spatial vector object. Similar to writeOGR, the function requires entries for the arguments dsn (data source name) and layer (layer name). The interpretation of these enties vary by driver. Please study details in the help file.

Digitize a path (e.g. a bicycle route) between the two points of interest you selected earlier in Google Earth. This can be achieved using the Add Path functionality of Google Earth (see here for more info). Save the path in the data folder within the working directory under the name route.kml. We will read this file into a spatial lines object and add it to the already existing SpatialLinesDataFrame object.

dsn = file.path("data","route.kml")
ogrListLayers(dsn) # To find out what the layers are
myroute <- readOGR(dsn, layer = ogrListLayers(dsn))
## Put both in single data frame
proj4string(myroute) <- prj_string_WGS
## Warning in `proj4string<-`(`*tmp*`, value = ): A new CRS was assigned to an object with an existing CRS:
## +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## without reprojecting.
## For reprojection, use function spTransform
## [1] "Name"        "Description"
myroute$Description <- NULL # delete Description
# mylinesdf <- rbind(mylinesdf, myroute)
# Note: some problems were reported with this step, see Q&A
mylinesdf <-  rbind.SpatialLines(mylinesdf, myroute)

Try to understand the above code and results. Feel free to display the data and export to Google Earth.

Transformation of coordinate system

Transformations between coordinate systems are crucial to many GIS applications. The Keyhole Markup Language (kml) used by Google Earth uses latitude and longitude in a polar WGS84 coordinate system (i.e. geographic coordinates). However, in some of the examples below we will use metric distances (i.e. carthographic coordinates).There are two types of coordinate systems that you need to recognise: projected coordinate systems and unprojected coordinates systems

One of the challenges of working with geo-spatial data is that geodetic locations (points on the Earth surface) are mapped into a two-dimensional cartesian plane using a cartographic projection. Projected coordinates are coordinates that refer to a point on a two-dimensional map that represents the surface of the Earth (i.e. projected coordinate system). Latitude and Longitude values are an example of an unprojected coordinate system. These are coordinates that directly refer to a point on the Earth's surface. One way to deal with this is by transforming the data to a planar coordinate system. In R this can be achieved via bindings to the PROJ.4 - Cartographic Projections Library (http://trac.osgeo.org/proj/), which are available in rgdal.

Central to spatial data in the sp package is that they have a coordinate reference system, which is coded in object of CRS class. Central to operations on different spatial data sets is that their coordinate reference system is compatible (i.e., identical). This CRS can be a character string describing a reference system in a way understood by the PROJ.4 projection library, or a (character) missing value. An interface to the PROJ.4 library is available only if the R package rgdal is present.

We will transform our spatial data to the Dutch grid (Rijksdriehoekstelsel), often referred to as RD. Please note that:

## Define CRS object for RD projection
prj_string_RD <- CRS("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.2369,50.0087,465.658,-0.406857330322398,0.350732676542563,-1.8703473836068,4.0812 +units=m +no_defs")

## Perform the coordinate transformation from WGS84 to RD
mylinesRD <- spTransform(mylinesdf, prj_string_RD)

You can always plot the line using the basic plot command:

plot(mylinesRD, col = c("red", "blue"))

Now that the geometries are projected to a planar coordinate system, the length can be computed using a function from the package rgeos.

## Use rgeos for computing the length of lines 
## rgeos version: 0.3-21, (SVN revision 540)
##  GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084 
##  Linking to sp version: 1.2-3 
##  Polygon checking: TRUE
(mylinesdf$length <- gLength(mylinesRD, byid=T))
##        1        0 
## 2272.656 3201.050

Feel free to export the updated lines to Google Earth or to inspect the contents of the data slot of the object mylinesdf:

# or


Polygons with the sp package

We now continue with sp classes for polygon objects. The idea is to illustrate the classes; the data are meaningless. Let us create overlapping circles around the points you defined earlier.

## Perform the coordinate transformation from WGS84 (i.e. not a projection) to RD (projected)"
#  This step is necessary to be able to measure objectives in 2D (e.g. meters)
(mypointsRD <- spTransform(mypointsdf, prj_string_RD))
##            coordinates id            Name
## 1   (174151, 444348.4)  1  my first point
## 2 (174042.9, 442078.3)  2 my second point
pnt1_rd <- coordinates(mypointsRD)[1,]
pnt2_rd <- coordinates(mypointsRD)[2,]

## Make circles around points, with radius equal to distance between points
## Define a series of angles going from 0 to 2pi
ang <- pi*0:200/100
circle1x <- pnt1_rd[1] + cos(ang) * mylinesdf$length[1]
circle1y <- pnt1_rd[2] + sin(ang) * mylinesdf$length[1]
circle2x <- pnt2_rd[1] + cos(ang) * mylinesdf$length[1]
circle2y <- pnt2_rd[2] + sin(ang) * mylinesdf$length[1] 
c1 <- cbind(circle1x, circle1y)
c2 <- cbind(circle2x, circle2y)

You can plot everything again using basic R plot commands:

plot(c2, pch = 19, cex = 0.2, col = "red", ylim = range(circle1y, circle2y))
points(c1, pch = 19, cex = 0.2, col = "blue")
points(mypointsRD, pch = 3, col= "darkgreen")

Now, we create subsequently

  • Polygons
  • SpatialPolygons
  • SpatialPolygonsDataFrame
## Iterate through some steps to create SpatialPolygonsDataFrame object
circle1 <- Polygons(list(Polygon(cbind(circle1x, circle1y))),"1")
circle2 <- Polygons(list(Polygon(cbind(circle2x, circle2y))),"2")
spcircles <- SpatialPolygons(list(circle1, circle2), proj4string=prj_string_RD)
circledat <- data.frame(mypointsRD@data, row.names=c("1", "2"))
circlesdf <- SpatialPolygonsDataFrame(spcircles, circledat)

Similar results can be obtained using the function gBuffer of the package rgeos, as demonstrated below. Notice the use of two overlay functions from the package rgeos.

The final results can be plotted using basic R plotting commands:

plot(circlesdf, col = c("gray60", "gray40"))
plot(mypointsRD, add = TRUE, col="red", pch=19, cex=1.5)
plot(mylinesRD, add = TRUE, col = c("green", "yellow"), lwd=1.5)

Here is an example of a plot of the results which employs a few more advanced options of spplot.

spplot(circlesdf, zcol="Name", col.regions=c("gray60", "gray40"), 
       sp.layout=list(list("sp.points", mypointsRD, col="red", pch=19, cex=1.5), 
                      list("sp.lines", mylinesRD, lwd=1.5)))

Try to understand how spplot works by breaking it down in simple steps e.g.

spplot(circlesdf, zcol="Name", col.regions=c("gray60", "gray40"))

Question 4: Which plotting options do you prefer? E.g spplot or plot?

Polygon operations with rgeos (buffer, intersect, difference)

## Expand the given geometry to include the area within the specified width with specific styling options
buffpoint <- gBuffer(mypointsRD[1,], width=mylinesdf$length[1], quadsegs=2)
mydiff <- gDifference(circlesdf[1,], buffpoint)

plot(circlesdf[1,], col = "red")
plot(buffpoint, add = TRUE, lty = 3, lwd = 2, col = "blue")
gArea(mydiff) ## what is the area of the difference?
## [1] 1614821
plot(mydiff, col = "red")
myintersection <- gIntersection(circlesdf[1,], buffpoint)

plot(myintersection, col="blue")
## [1] 14608733
print(paste("The difference in area =", round(100 * gArea(mydiff) / 
                                             gArea(myintersection),2), "%"))
## [1] "The difference in area = 11.05 %"
  • Question 5: What happens if you change quadsegs to a higher number?
  • Question 6: Do you understand the script? What is the difference between gIntersection and gDifference?

Today's summary

We learned about:

  • The spatial classes of the sp package
  • How to read/write data and change data format with rgdal package (readOGR() and writeOGR())
  • Visualize spatial vector data in R and in Google Earth
  • How to perform simple operations on Geometries in R using the rgeos package

Excercise of today

First, download and unzip the following shape files manually from this website:

  • Places
  • Railways

Second, create a clear and documented script that:

  • Selects the "industrial" (type == "industrial") railways
  • Buffers the "industrial" railways with a buffer of 1000m (hint: gBuffer with byid=TRUE)
  • Find the place (i.e. a city) that intersects with this buffer.
  • Create a plot that shows the buffer, the points, and the name of the city
  • write down the name of the city and the population of that city as one comment at the end of the script.

Think about project structure and use of git!

Bonus (optional!): if you can also download and unzip this within the script.


Put your project on your GitHub account as a new repository post the clone url in the forum before tommorow!

More info

About projections and code: https://www.nceas.ucsb.edu/scicomp/recipes/projections