Authors: Jan Verbesselt, Sytze de Bruyn, Loïc Dutrieux, Valerio Avitabile, Dainius Masiliunas
12 January, 2017
Go through the whole lesson and try to answer the questions below. We will address the questions in the lesson and discuss them.
Learn how to handle vector data
In today's lecture, we will explore the basics of handling spatial vector data in R. There are several R packages for this purpose but we will focus on using
rgeos and some related packages. At the end of the lecture, you should be able to:
spclasses for spatial vector data;
The packages sp and rgdal are widely used throughout this course. Both packages not only provide functionality for raster data but also for vector data.
The sp package provides classes for importing, manipulating and exporting spatial data in R, and methods for doing so. It is often the foundation for other spatial packages, such as raster.
The rgdal package includes bindings to parts of the OGR Simple Feature Library which provides access to a variety of vector file formats such as ESRI Shapefiles and kml. The OGR library is part of the widely used Geospatial Data Abstraction Library (GDAL). The GDAL library is the most useful freely-available library for reading and writing geospatial data. The GDAL library is well-documented (http://gdal.org/), but with a catch for R and Python programmers. The GDAL (and associated OGR) library and command line tools are all written in C and C++. Bindings are available that allow access from a variety of other languages including R and Python but the documentation is all written for the C++ version of the libraries. This can make reading the documentation rather challenging. Fortunately, the rgdal package, providing GDAL bindings in R, is also well documented with lots of examples. The same is valid for the Python libaries.
Similarly, rgeos is an interface to the powerful Geometry Engine Open Source (GEOS) library for all kinds of operations on geometries (buffering, overlaying, area calculations, etc.).
Thus, functionality that you commonly find in expensive GIS software is also available within R, using free but very powerful software libaries.
The possiblities are huge. In this course we can only scratch the surface with some essentials, which hopefully invite you to experiment further and use them in your research. Details can be found in the book Applied Spatial Data Analysis with R and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio. Owing to time constraints, this lecture cannot cover the related package spacetime with classes and methods for spatio-temporal data [@Bivand:2013ux].
The package sp provides classes for spatial-only geometries, such as
SpatialPoints (for points), and combinations of geometries and attribute data, such as a
SpatialPointsDataFrame. The following data classes are available for spatial vector data [@Edzer:2005ux]:
We will go through a few examples of creating geometries from scratch to familiarize yourself with these classes.
First, start Google Earth on your computer and make a note of the longitude and latitude of two points in Wageningen that are relevant to you. Use a decimal degree notation with at least 4 digits after the decimal point. To change the settings in Google Earth click Tools | Options and change the Show Lat/Long setting on the 3D View Tab.
The example below shows how you can create spatial point objects from these coordinates. Type
?function name (e.g.
?cbind) for finding help on the functions used.
?CRSand have a look at the help.
## Load the sp and rgdal packages library(sp) library(rgdal) ## Coordinates of two points identified in Google Earth, for example: pnt1_xy <- cbind(5.6660, 51.9872) # enter your own coordinates pnt2_xy <- cbind(5.6643, 51.9668) # enter your own coordinates ## Combine coordinates in single matrix coords <- rbind(pnt1_xy, pnt2_xy) ## Make spatial points object prj_string_WGS <- CRS("+proj=longlat +datum=WGS84") mypoints <- SpatialPoints(coords, proj4string=prj_string_WGS)
## Inspect object class(mypoints) str(mypoints)
## Create and display some attribute data and store in a data frame mydata <- data.frame(cbind(id = c(1,2), Name = c("my first point", "my second point"))) ## Make spatial points data frame mypointsdf <- SpatialPointsDataFrame( coords, data = mydata, proj4string=prj_string_WGS)
class(mypointsdf) # Inspect and plot object names(mypointsdf) str(mypointsdf)
spplot(mypointsdf, zcol="Name", col.regions = c("red", "blue"), xlim = bbox(mypointsdf)[1, ]+c(-0.01,0.01), ylim = bbox(mypointsdf)[2, ]+c(-0.01,0.01), scales= list(draw = TRUE))
## Play with the spplot function! ## What is needed to make the following work? spplot(mypointsdf, col.regions = c(1,2))
Question 2: What is the the difference between the objects
Now let us connect the two points by a straight line. First find information on the classes for lines that are available in sp. The goal is to create
SpatialLinesDataFrame but we have to go through some other classes.
## Consult help on SpatialLines class (simple_line <- Line(coords))
## An object of class "Line" ## Slot "coords": ## [,1] [,2] ## [1,] 5.6660 51.9872 ## [2,] 5.6643 51.9668
(lines_obj <- Lines(list(simple_line), "1"))
## An object of class "Lines" ## Slot "Lines": ## [] ## An object of class "Line" ## Slot "coords": ## [,1] [,2] ## [1,] 5.6660 51.9872 ## [2,] 5.6643 51.9668 ## ## ## ## Slot "ID": ##  "1"
(spatlines <- SpatialLines(list(lines_obj), proj4string=prj_string_WGS))
## An object of class "SpatialLines" ## Slot "lines": ## [] ## An object of class "Lines" ## Slot "Lines": ## [] ## An object of class "Line" ## Slot "coords": ## [,1] [,2] ## [1,] 5.6660 51.9872 ## [2,] 5.6643 51.9668 ## ## ## ## Slot "ID": ##  "1" ## ## ## ## Slot "bbox": ## min max ## x 5.6643 5.6660 ## y 51.9668 51.9872 ## ## Slot "proj4string": ## CRS arguments: ## +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
(line_data <- data.frame(Name = "straight line", row.names="1"))
## Name ## 1 straight line
(mylinesdf <- SpatialLinesDataFrame(spatlines, line_data))
## An object of class "SpatialLinesDataFrame" ## Slot "data": ## Name ## 1 straight line ## ## Slot "lines": ## [] ## An object of class "Lines" ## Slot "Lines": ## [] ## An object of class "Line" ## Slot "coords": ## [,1] [,2] ## [1,] 5.6660 51.9872 ## [2,] 5.6643 51.9668 ## ## ## ## Slot "ID": ##  "1" ## ## ## ## Slot "bbox": ## min max ## x 5.6643 5.6660 ## y 51.9668 51.9872 ## ## Slot "proj4string": ## CRS arguments: ## +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
spplot(mylinesdf, col.regions = "blue", xlim = bbox(mypointsdf)[1, ]+c(-0.01,0.01), ylim = bbox(mypointsdf)[2, ]+c(-0.01,0.01), scales= list(draw = TRUE))
Try to understand the above code and its results by studying help.
Try to add the points together with the lines on the same map.
What now follows is a brief intermezzo before we continue with the classes for polygons. Let us first export the objects created as KML files that can be displayed in Google Earth. We will use the OGR functionality available through the package rgdal.
library(rgdal) ## Write to KML; below we assume a subdirectory data within the current # working directory. dir.create("data", showWarnings = FALSE) writeOGR(mypointsdf, file.path("data","mypointsGE.kml"), "mypointsGE", driver="KML", overwrite_layer=TRUE) writeOGR(mylinesdf, file.path("data","mylinesGE.kml"), "mylinesGE", driver="KML", overwrite_layer=TRUE)
Check (in Google Earth) whether the attribute data were written to the KML output.
readOGR allows reading OGR compatible data into a suitable Spatial vector object. Similar to
writeOGR, the function requires entries for the arguments
dsn (data source name) and
layer (layer name). The interpretation of these enties vary by driver. Please study details in the help file.
Digitize a path (e.g. a bicycle route) between the two points of interest you selected earlier in Google Earth. This can be achieved using the Add Path functionality of Google Earth (see here for more info). Save the path in the data folder within the working directory under the name
route.kml. We will read this file into a spatial lines object and add it to the already existing
dsn = file.path("data","route.kml") ogrListLayers(dsn) # To find out what the layers are myroute <- readOGR(dsn, layer = ogrListLayers(dsn))
## Put both in single data frame proj4string(myroute) <- prj_string_WGS
## Warning in `proj4string<-`(`*tmp*`, value =
): A new CRS was assigned to an object with an existing CRS: ## +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs ## without reprojecting. ## For reprojection, use function spTransform
##  "Name" "Description"
myroute$Description <- NULL # delete Description # mylinesdf <- rbind(mylinesdf, myroute) # Note: some problems were reported with this step, see Q&A mylinesdf <- rbind.SpatialLines(mylinesdf, myroute)
Try to understand the above code and results. Feel free to display the data and export to Google Earth.
Transformations between coordinate systems are crucial to many GIS applications. The Keyhole Markup Language (kml) used by Google Earth uses latitude and longitude in a polar WGS84 coordinate system (i.e. geographic coordinates). However, in some of the examples below we will use metric distances (i.e. carthographic coordinates).There are two types of coordinate systems that you need to recognise: projected coordinate systems and unprojected coordinates systems
One of the challenges of working with geo-spatial data is that geodetic locations (points on the Earth surface) are mapped into a two-dimensional cartesian plane using a cartographic projection. Projected coordinates are coordinates that refer to a point on a two-dimensional map that represents the surface of the Earth (i.e. projected coordinate system). Latitude and Longitude values are an example of an unprojected coordinate system. These are coordinates that directly refer to a point on the Earth's surface. One way to deal with this is by transforming the data to a planar coordinate system. In
R this can be achieved via bindings to the PROJ.4 - Cartographic Projections Library (http://trac.osgeo.org/proj/), which are available in rgdal.
Central to spatial data in the sp package is that they have a coordinate reference system, which is coded in object of
CRS class. Central to operations on different spatial data sets is that their coordinate reference system is compatible (i.e., identical). This CRS can be a character string describing a reference system in a way understood by the PROJ.4 projection library, or a (character) missing value. An interface to the PROJ.4 library is available only if the R package
rgdal is present.
We will transform our spatial data to the Dutch grid (Rijksdriehoekstelsel), often referred to as RD. Please note that:
## Define CRS object for RD projection prj_string_RD <- CRS("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.2369,50.0087,465.658,-0.406857330322398,0.350732676542563,-1.8703473836068,4.0812 +units=m +no_defs") ## Perform the coordinate transformation from WGS84 to RD mylinesRD <- spTransform(mylinesdf, prj_string_RD)
You can always plot the line using the basic plot command:
Now that the geometries are projected to a planar coordinate system, the length can be computed using a function from the package rgeos.
## Use rgeos for computing the length of lines library(rgeos)
(mylinesdf$length <- gLength(mylinesRD, byid=T))
## 1 0 ## 2272.656 3201.050
Feel free to export the updated lines to Google Earth or to inspect the contents of the
data slot of the object
mylinesdf@data # or data.frame(mylinesdf)
We now continue with sp classes for polygon objects. The idea is to illustrate the classes; the data are meaningless. Let us create overlapping circles around the points you defined earlier.
## Perform the coordinate transformation from WGS84 (i.e. not a projection) to RD (projected)" # This step is necessary to be able to measure objectives in 2D (e.g. meters) (mypointsRD <- spTransform(mypointsdf, prj_string_RD))
## coordinates id Name ## 1 (174151, 444348.4) 1 my first point ## 2 (174042.9, 442078.3) 2 my second point
pnt1_rd <- coordinates(mypointsRD)[1,] pnt2_rd <- coordinates(mypointsRD)[2,] ## Make circles around points, with radius equal to distance between points ## Define a series of angles going from 0 to 2pi ang <- pi*0:200/100 circle1x <- pnt1_rd + cos(ang) * mylinesdf$length circle1y <- pnt1_rd + sin(ang) * mylinesdf$length circle2x <- pnt2_rd + cos(ang) * mylinesdf$length circle2y <- pnt2_rd + sin(ang) * mylinesdf$length c1 <- cbind(circle1x, circle1y) c2 <- cbind(circle2x, circle2y)
You can plot everything again using basic R plot commands:
plot(c2, pch = 19, cex = 0.2, col = "red", ylim = range(circle1y, circle2y)) points(c1, pch = 19, cex = 0.2, col = "blue") points(mypointsRD, pch = 3, col= "darkgreen")
Now, we create subsequently
## Iterate through some steps to create SpatialPolygonsDataFrame object circle1 <- Polygons(list(Polygon(cbind(circle1x, circle1y))),"1") circle2 <- Polygons(list(Polygon(cbind(circle2x, circle2y))),"2") spcircles <- SpatialPolygons(list(circle1, circle2), proj4string=prj_string_RD) circledat <- data.frame(mypointsRD@data, row.names=c("1", "2")) circlesdf <- SpatialPolygonsDataFrame(spcircles, circledat)
Similar results can be obtained using the function gBuffer of the package rgeos, as demonstrated below. Notice the use of two overlay functions from the package rgeos.
The final results can be plotted using basic R plotting commands:
plot(circlesdf, col = c("gray60", "gray40")) plot(mypointsRD, add = TRUE, col="red", pch=19, cex=1.5) plot(mylinesRD, add = TRUE, col = c("green", "yellow"), lwd=1.5) box()
Here is an example of a plot of the results which employs a few more advanced options of
spplot(circlesdf, zcol="Name", col.regions=c("gray60", "gray40"), sp.layout=list(list("sp.points", mypointsRD, col="red", pch=19, cex=1.5), list("sp.lines", mylinesRD, lwd=1.5)))
Try to understand how
spplot works by breaking it down in simple steps e.g.
spplot(circlesdf, zcol="Name", col.regions=c("gray60", "gray40"))
Question 4: Which plotting options do you prefer? E.g
library(rgeos) ## Expand the given geometry to include the area within the specified width with specific styling options buffpoint <- gBuffer(mypointsRD[1,], width=mylinesdf$length, quadsegs=2) mydiff <- gDifference(circlesdf[1,], buffpoint) plot(circlesdf[1,], col = "red") plot(buffpoint, add = TRUE, lty = 3, lwd = 2, col = "blue")
gArea(mydiff) ## what is the area of the difference?
##  1614821
plot(mydiff, col = "red")
myintersection <- gIntersection(circlesdf[1,], buffpoint) plot(myintersection, col="blue")
##  14608733
print(paste("The difference in area =", round(100 * gArea(mydiff) / gArea(myintersection),2), "%"))
##  "The difference in area = 11.05 %"
quadsegsto a higher number?
We learned about:
First, download and unzip the following shape files manually from this website:
Second, create a clear and documented script that:
type == "industrial") railways
intersectswith this buffer.
plotthat shows the buffer, the points, and the name of the city
populationof that city as one comment at the end of the script.
Think about project structure and use of git!
Bonus (optional!): if you can also download and unzip this within the script.
Put your project on your GitHub account as a new repository post the clone url in the forum before tommorow!