Raster Vector integration in R

WUR Geoscripting WUR logo

Today

  • Morning: Self-study & answer the questions within the lesson
  • 13:30-14:30: Questions and clarifications
  • Rest of the afternoon: Do/finalise the exercise.

Today's learning objectives

At the end of the lecture, you should be able to:

  • Have an overview of what can be achieved when combining raster and vector data in R
  • Transform simple feature objects of the sf package into spatial object of the sp package
  • Be able to extract raster data using vector data

Introduction

In the previous lectures we saw how to deal with raster data using R, as well as how to deal with vector data and the multiple classes of the sf package.

In this previous vector tutorial we used sf, but another popular package that handles vector data in R is sp. Sp is similar to sf, but it is able to work with vector data and raster data through the raster package. In the current lesson, we'll see what can be done when the two worlds of vector data and raster data intersect. Secondly we will learn how to change sf objects to sp objects.

Working with sp

The package sp provides classes for spatial-only geometries, such as SpatialPoints (for points), and combinations of geometries and attribute data, such as a SpatialPointsDataFrame. These data classes are very similar to data classes in sf package. The following data classes are available for spatial vector data in sp package [@Edzer:2005ux]:

Overview of sp package spatial-only geometry classes.
Geometry class attribute
points SpatialPoints No
points SpatialPointsDataFrame data.frame
line Line No
lines Lines No
lines SpatialLines No
lines SpatialLinesDataFrame data.frame
rings Polygon No
rings Polygons No
rings SpatialPolygons No
rings SpatialPolygonsDataFrame data.frame

The functions of the sp package work together with functions in rgdal, rgeos and raster package to do format changes, geomety selections or transformations and raster/vector format changes.

Conversions

You will find some utilities in R to convert data from raster to vector format and vice-versa. However, whenever you start converting objects, you should wonder whether you are taking the right approach to solve your problem. An approach that does not involve converting your data from vector to raster or the opposite should almost always be preferred.

As a result, because these functions are only useful for some very particular situations, I only give a brief description of them below.

Vector to raster

There is one function that allows to convert an object in vector to a raster object. It is the rasterize() function.

Raster to vector

Three functions allow to convert raster data to vector; the rasterToPoints(), rasterToContour(), and rasterToPolygons() functions. The latter can be useful to convert the result of a classification. In that case, set dissolve = to TRUE so that the polygons with the same attribute value will be dissolved into multi-polygon regions. This option requires the rgeos package. Note: These methods are known to perform poorly under R. Calling gdal_translate directly or through the gdalUtils package can be much faster.

Geometric operations

Raw raster data do not usually conform to any notion of administrative or geographical boundaries. Vector data (and extents) can be used to mask or crop data to a desired region of interest.

Crop

Cropping consists in reducing the extent of a spatial object to a smaller extent. As a result, the output of crop() will automatically be rectangular and will not consider any features such as polygons to perform the subsetting. It is often useful to crop data as tightly as possible to the area under investigation to reduce the amount of data and have a more focused view when visualizing the data.

Crop uses objects of class extent to define the new extent, or any object that can be coerced to an extent (see ?extent for more info on this). This means that practically all spatial objects (raster or vector) can be used directly in crop. Considering two raster objects r1 and r2 with r2 smaller than r1, you can simply use crop(r1, r2) in order to crop r1 to the extent of r2.

You can easily define an extent interactively (by clicking) thanks to the drawExtent() function.

Mask

mask() can be used with almost all spatial objects to mask (= set to NA) values of a raster object. When used with a SpatialPolygon object, mask will keep values of the raster overlayed by polygons and mask the values outside of polygons.

Note the very useful inverse= argument of mask(), which allows to mask the inverse of the area covered by the features. We will use this feature of mask later in the tutorial to exclude water areas of a raster, defined in an independent SpatialPolygons object.

Extract

The most common operation when combining vector and raster data is the extraction. It simply consists in extracting the values of a raster object for locations specified by a vector object. The object can be one of the class of the sp package, or an extent object.

When using extract() with SpatialPolygons or SpatialLines, individual features of the vector layer may overlay or intersect several pixels. In that case a function (fun =) can be used to summarize the values into one. Note that although most often the function min, max, mean and median are used for the spatial aggregation, also any custom-made function can be used. extract() provides many options for the return object, such as a data frame or a new sp object with extracted values appended to the attribute table. See ?extract() example section.

Examples

A simple land cover classification of Wageningen from Landsat 8 data

In the example below we will do a simple supervised land cover classification of Wageningen. The example uses the same data as you used in the exercise of the raster lesson.

Step by step we will:

  • Download the Landsat 8 data of Wageningen
  • Download and prepare administrative boundary data of the Netherlands
  • Download Water area data of Wageningen
  • Mask the data to match the boundaries of the city
  • Mask the data to exclude water bodies
  • Build a calibration dataset using Google Maps
  • Export the KML file and import it in R
  • Extract the surface reflectance values for the calibration pixels
  • Calibrate a model with the classifier
  • Predict the land cover using a Landsat image

Prepare the data

library(raster)
## Download, unzip and load the data
download.file(url = 'https://raw.githubusercontent.com/GeoScripting-WUR/VectorRaster/gh-pages/data/landsat8.zip', destfile = 'landsat8.zip', method = 'auto')

unzip('landsat8.zip')
## Identify the right file
landsatPath <- list.files(pattern = glob2rx('LC8*.grd'), full.names = TRUE)

wagLandsat <- brick(landsatPath)
## Loading required package: sp

We can start by visualizing the data. Since it is a multispectral image, we can use plotRGB() to do so.

# plotRGB does not support negative values, so they need to be removed
wagLandsat[wagLandsat < 0] <- NA
plotRGB(wagLandsat, 5, 4, 3)
## Download municipality boundaries
nlCity <- raster::getData('GADM',country='NLD', level=2)
class(nlCity)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
## Investigate the structure of the object
head(nlCity@data)
##   OBJECTID ID_0 ISO      NAME_0 ID_1  NAME_1 ID_2        NAME_2   HASC_2
## 1        1  158 NLD Netherlands    1 Drenthe    1   Aa en Hunze NL.DR.AH
## 2        2  158 NLD Netherlands    1 Drenthe    2         Assen NL.DR.AS
## 3        3  158 NLD Netherlands    1 Drenthe    3 Borger-Odoorn NL.DR.BO
## 4        4  158 NLD Netherlands    1 Drenthe    4     Coevorden NL.DR.CO
## 5        5  158 NLD Netherlands    1 Drenthe    5     De Wolden NL.DR.DW
## 6        6  158 NLD Netherlands    1 Drenthe    6         Emmen NL.DR.EM
##   CCN_2 CCA_2   TYPE_2    ENGTYPE_2 NL_NAME_2 VARNAME_2
## 1    NA       Gemeente Municipality                    
## 2    NA       Gemeente Municipality                    
## 3    NA       Gemeente Municipality                    
## 4    NA       Gemeente Municipality                    
## 5    NA       Gemeente Municipality                    
## 6    NA       Gemeente Municipality

It seems that the municipality names are in the NAME_2 column. So we can subset the SpatialPolygonsDataFrame to the city of Wageningen alone. To do so we can use simple data frame manipulation/subsetting syntax.

nlCity@data <- nlCity@data[!is.na(nlCity$NAME_2),] # Remove rows with NA
wagContour <- nlCity[nlCity$NAME_2 == 'Wageningen',]

We can use the resulting wagContour object, to mask the values out of Wageningen, but first, since the two objects are in different coordinate systems, we need to reproject the projection of one to the other.

** Question 1 **: Would you rather reproject a raster or a vector layer? Give two reasons why you would choose to reproject a raster or vector.

## Load rgdal library (needed to reproject data)
library(rgdal)
wagContourUTM <- spTransform(wagContour, CRS(proj4string(wagLandsat)))

Now that the two objects are in the same CRS, we can do the masking and visualize the result. Let's first crop and then mask, to see the difference.

wagLandsatCrop <- crop(wagLandsat, wagContourUTM)
wagLandsatSub <- mask(wagLandsat, wagContourUTM)

## Set graphical parameters (one row and two columns)
opar <- par(mfrow=c(1,2))
plotRGB(wagLandsatCrop, 5, 4, 3)
plotRGB(wagLandsatSub, 5, 4, 3)
plot(wagContourUTM, add = TRUE, border = "green", lwd = 3)
## Reset graphical parameters
par(opar)

In the figure above, the left panel displays the output of crop, while the second panel shows the result of masking the Landsat scene using the contour of Wageningen as input.

We also have a water mask of Wageningen in vector format. Let's download it and also reproject it to the CRS of the Landsat data.

download.file(url = 'https://raw.githubusercontent.com/GeoScripting-WUR/VectorRaster/gh-pages/data/wageningenWater.zip', destfile = 'wageningenWater.zip', method = 'auto')
unzip('wageningenWater.zip')
## Check the names of the layers for input in readOGR()
ogrListLayers('Water.shp')
water <- readOGR('Water.shp', layer = 'Water')
waterUTM <- spTransform(water, CRS(proj4string(wagLandsat)))
## OGR data source with driver: ESRI Shapefile 
## Source: "data/Water.shp", layer: "Water"
## with 632 features
## It has 32 fields
## Integer64 fields read as strings:  TOP10_ID HOOGTENIVE TDN_CODE VISUALISAT

Note the use of inverse = TRUE in the code below, to mask the pixels that intersect with the features of the vector object.

wagLandsatSubW <- mask(wagLandsatSub, mask = waterUTM, inverse = TRUE)
plotRGB(wagLandsatSubW, 5, 4, 3)
plot(waterUTM, col = 'blue', add = TRUE, border = 'blue', lwd = 2)

For the rest of the example, we'll use the wagLandsatCrop object, for I have a few doubts about the spatial accuracy of the two vector layers we used in the masking steps. You can check for yourself by converting them to KML and opening them in Google My Maps. (Let us know during the lesson, what do you think? Any solutions?)

Build a calibration dataset in Google Maps

Below we'll see how we can deal with a calibration dataset in R, for calibrating a classifier. It involves working with SpatialPointsDataFrame classes, extracting reflectance data for the corresponding calibration samples, manipulating data frames and building a model that can be used to predict land cover. But first we need to build the calibration dataset (ground truth), and for that we will use Google Maps.

Open Google My Maps, click get started, login on your google account, create a new map by clicking on the + sign, name the map training_landcover, name the untitled layer landcover_points, change the basemap to a satellite map, click add marker, draw points on top of a few landcover types, and name the point as the land cover type. Try to keep your points close to Wageningen (otherwise they are out of the extent of the landsat tile). Keep it to a few classes, such as agric, forest, water, urban. When you are done (15 - 30 points), export the file to kml.

Note: in the approach described above, we get to decide where the calibration samples are. Another approach would be to automatically generate randomly distributed samples. This can be done very easily in R using the sampleRandom() function, which automatically returns a SpatialPoints object of any given size. In the family of spatial sampling functions, there is also sampleRegular() (for regular sampling) and sampleStratified(), which can be used on a categorical raster (e.g. a land cover classification), to ensure that all classes are equally represented in the sample.

Transform sf object into sp object

Until now in this tutorial we worked with sp vector objects. But in the previous tutorial we learnt how to use sf functionality. When combining raster and vector data with functions such as extract, mask and crop, you need to work with sp and raster packages instead of sf. Let's transform a sf object into a sp object.

Load the newly created KML file using the st_read() function.

library(sf)
## Linking to GEOS 3.5.0, GDAL 2.1.3, proj.4 4.9.2, lwgeom 2.2.1 r14555
samples <- st_read(dsn = './data/sampleLandcover.kml')
## Reading layer `landcover_points' from data source `/home/david/Documents/Geoscripting/VectorRaster/data/sampleLandcover.kml' using driver `LIBKML'
## Simple feature collection with 22 features and 11 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 5.626974 ymin: 51.95011 xmax: 5.70611 ymax: 52.00109
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

Okay nice the data has been read as a simple feature. Re-project the object to the CRS of the Landsat data and then change from simple feature point object to a spatial points data frame with as(). The as() function is a simple function to coerce an object to belong to a class.

## Re-project SpatialPointsDataFrame
samples_utm <- st_transform(samples, crs = proj4string(wagLandsat))

## Convert simple feature from *sf* package 
# to spatial feature of *sp* package
samples_utm_sp <- as(samples_utm, "Spatial")
## Check if the classes are different
class(samples_utm)
## [1] "sf"         "data.frame"
class(samples_utm_sp)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
## Check if the markers are on top of the cropped satellite image
plot(samples_utm_sp, col = "red", pch = 4)
plot(wagLandsatCrop$band1, alpha = 0.5, legend = F, add = T)
# alpha arguments sets transparency on a scale of 0.0 to 1.0

Luckily all the points are within the boundaries of the cropped satellite image. If some points fall outside, remove those markers in your KML in google maps and add some new ones.

Calibrate the classifier

# The extract function does not understand why the object would have 3 coord columns, so we need to edit this field
samples_utm_sp@coords <- coordinates(samples_utm_sp)[,-3]

## Extract the surface reflectance 
calib <- extract(wagLandsatCrop, samples_utm_sp, df=TRUE) ## df=TRUE i.e. return as a data.frame
## Combine the newly created dataframe to the description column of the calibration dataset
calib2 <- cbind(samples_utm_sp$Name, calib)
## Change the name of the first column, for convenience
colnames(calib2)[1] <- 'lc'
## Inspect the structure of the dataframe
str(calib2)
## 'data.frame':    22 obs. of  9 variables:
##  $ lc   : Factor w/ 4 levels "Agriculture",..: 3 3 3 4 4 4 1 1 1 1 ...
##  $ ID   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ band1: int  492 379 279 146 181 147 102 367 104 134 ...
##  $ band2: int  542 438 358 210 235 208 134 404 130 160 ...
##  $ band3: int  741 662 585 361 357 345 448 528 404 474 ...
##  $ band4: int  884 765 513 229 217 199 172 648 157 207 ...
##  $ band5: int  1944 1447 2529 216 151 132 5473 1607 5587 5129 ...
##  $ band6: int  1820 1543 1701 118 48 37 1494 1908 1586 1600 ...
##  $ band7: int  1455 1437 1358 82 29 22 631 1318 670 691 ...

Note: the use of df = TRUE in the extract() call is so that we get a data frame in return. Data frame is the most common class to work with all types of models, such as linear models (lm()) or random forest models as we use later.

Now we will calibrate a random forest model using the extracted data frame. Do not focus too much on the algorithm used, the important part for this tutorial is the data extraction and the following data frame manipulation. More details will come about random forest classifiers tomorrow.

if(!require(randomForest)) {
  install.packages("randomForest")
}
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
library(randomForest)
## Calibrate model
model <- randomForest(lc ~ band1 + band2 + band3 + band4 + band5 + band6 + band7, data = calib2)
## Use the model to predict land cover
lcMap <- predict(wagLandsatCrop, model = model)

Let's visualize the output. The function levelplot() from the rasterVis package is a convenient function to plot categorical raster data.

library(rasterVis)
## Loading required package: lattice
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
levelplot(lcMap, main = "Landcover Map of Wageningen", col.regions = c('lightgreen', 'darkgreen', 'orange', 'blue'))

OK, we've seen better land cover maps of Wageningen, but given the amount of training data we used (22 in my case), it is not too bad. A larger calibration dataset would certainly result in a better accuracy.

Extract raster values along a transect

Another use of the extract() function can be to visualize or analyse data along transects. In the following example, we will run a transect across Belgium and visualize the change in elevation.

Let's first download the elevation data of Belgium, using the getData() function of the raster package.

## Download data
bel <- getData('alt', country='BEL', mask=TRUE)
## Display metadata
bel
## class       : RasterLayer 
## dimensions  : 264, 480, 126720  (nrow, ncol, ncell)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 2.5, 6.5, 49.4, 51.6  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : /home/david/Documents/Geoscripting/VectorRaster/BEL_msk_alt.grd 
## names       : BEL_msk_alt 
## values      : -105, 691  (min, max)

bel is a RasterLayer.

We can start by visualizing the data.

plot(bel)

Everything seems correct.

We want to look at a transect, which we can draw by hand by selecting two points by clicking. The drawLine() function will help us do that. Once you run the function, you will be able to click in the plotting window of R (The bel object should already be present in the plot panel before running drawLine()). Click Finish or right-click on the plot once you have selected the two extremities of the line.

line <- drawLine()