Loïc Dutrieux, Jan Verbesselt, Dainius Masiliunas and David Swinkels

2024-09-18

WUR Geoscripting

Vector - Raster

Learning objectives

  • Plot spatial vector and raster data;
  • Transform vector and raster data;
  • Write and read spatial vector formats (e.g. KML, GML, GeoJSON and Shapefile);
  • Apply basic operations on vector data, such as masking and cropping;
  • Be able to extract raster data using vector data.

Introduction

In the previous tutorials we saw how to deal with raster data using R. This tutorial is an introduction to handling vector data in R, as well as to combining vector and raster data.

Some packages for working with vector and raster data in R

In this tutorial we will use the sf package. This package focuses solely on vector data. It provides a standardized encoding of vector data and uses GDAL to read and write data, GEOS for geometrical operations and PROJ for projection conversions and datum transformations.

The GDAL library is well-documented (http://gdal.org/), but with a catch for R and Python programmers. The GDAL (and associated OGR) library and command line tools are all written in C and C++. Bindings are available that allow access from a variety of other languages, including R and Python, but the documentation is all written for the C++ version of the libraries. This can make reading the documentation rather challenging. Fortunately, the sf package, which provides the GDAL bindings in R, is also well documented, with lots of examples. The same holds for the Python libraries.
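
If you are curious which versions of GDAL, GEOS and PROJ your sf installation is linked against, you can check from within R. A minimal sketch, assuming only that sf is installed:

library(sf)

# Print the versions of GEOS, GDAL and PROJ that sf was built against
sf_extSoftVersion()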

Thus, functionality that you commonly find in expensive GIS software is also available within R, using free but very powerful software libraries. Here is a handy ‘cheatsheet’ for spatial operations with sf. The functions of the sf package are prefixed by st_, short for ‘spatial type’.
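
To get a feel for how much the package offers, you can list all of its functions that carry the st_ prefix. A quick sketch, assuming sf is loaded:

library(sf)

# List all exported sf functions whose names start with "st_"
ls("package:sf", pattern = "^st_")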

The possibilities are huge. In this course we can only scratch the surface with some essentials, which hopefully invites you to experiment further and use them in your research. Details can be found in the book Applied Spatial Data Analysis with R and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio. This book can be accessed for free through the following link!

Raster and vector integration and conversion

Historically, the first package handling vectors in R was the sp package, and the first package handling rasters was the raster package. They were cross-integrated, so that you could perform operations such as cropping a raster by a vector or extracting raster information at vector locations. However, the sp package was deprecated and sf was developed as its successor, with much easier handling of the data (as a regular data.frame). Many of the packages that previously handled sp objects, including raster, were updated to also handle sf objects. Next, the stars package was developed by the creators of the sf package as a means of handling multidimensional rasters. However, the older raster package was still the go-to solution for raster handling in R. Finally, as you noticed in a previous tutorial, in 2020 the raster package was deprecated and replaced by the terra package, which is a much faster C++ implementation of raster, but which also includes its own definition of vector data.

The result is that some packages provide or work with sp, sf and terra vectors, and some provide or work with raster, stars and terra rasters. This can be quite confusing. Therefore, in the course we currently use only the sf package for handling vectors and the terra package for handling rasters. The workflow is then to convert any object that is not sf or terra into sf or terra, do processing in sf and/or terra, and optionally convert the objects back if integration with some other package is necessary (e.g. sf to SpatVector for use in terra in some cases). Below is a matrix showing how to convert the various objects.

From/To                                           sf           terra     raster
sp: SpatialPoints/SpatialLines/SpatialPolygons    st_as_sf()   vect()    None
sf: sf data.frame                                 None         vect()    None
terra: SpatVector                                 st_as_sf()   None      None
raster: RasterLayer/RasterStack/RasterBrick       None         rast()    None
stars: stars                                      None         rast()    None
terra: SpatRaster                                 None         None      brick()
raster: extent                                    st_bbox()    ext()     None
sf: bbox                                          None         ext()     extent()
terra: SpatExtent                                 st_bbox()    None      None so far
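
For example, the sf/terra vector coercions from this matrix are the ones you will use most often in this course. A minimal sketch with a small hypothetical polygon (it assumes sf and terra are loaded):

library(sf)
library(terra)

# A hypothetical one-polygon sf object
pol <- st_sf(geometry = st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0)))), crs = 4326))

# sf -> terra SpatVector
polTerra <- vect(pol)

# terra SpatVector -> sf
polSf <- st_as_sf(polTerra)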

The matrix above shows lossless conversions, also known as casting or coercing data types. You can also convert data from raster to vector format and vice-versa. However, whenever you start converting between rasters and vectors, you should wonder whether you are taking the right approach to solve your problem. An approach that does not involve converting your data from vector to raster or the opposite should almost always be preferred.

Because these functions are only useful in some very particular situations, we only give a brief description of them below.

  • Vector to raster:
    • There is one function that converts a vector object into a raster object: the rasterize() function.
  • Raster to vector:
    • Three functions convert raster data to vector: as.points(), as.contour() and as.polygons(), all of which produce SpatVector (terra) objects. The conversion to polygons can be useful to vectorise the result of a classification. In that case, set dissolve = TRUE so that polygons with the same attribute value are dissolved into multi-polygon regions (see the short sketch after this list).
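
As an illustration of both directions, here is a minimal sketch with a small hypothetical raster and polygon (it assumes terra is loaded; the object names are made up):

library(terra)

# Hypothetical raster template and polygon
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10, crs = "EPSG:4326")
p <- vect("POLYGON ((2 2, 8 2, 8 8, 2 2))", crs = "EPSG:4326")

# Vector -> raster: burn the polygon into the raster grid
pRast <- rasterize(p, r, field = 1)

# Raster -> vector: back to polygons (cells with equal values are merged by default)
pPols <- as.polygons(pRast)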

Geometric operations

Raw raster data do not usually conform to any notion of administrative or geographical boundaries. Vector data (and extents) can be used to mask or crop data to a desired region of interest.

  • Crop:
    • Cropping consists of reducing the extent of a spatial object to a smaller extent. As a result, the output of crop() will automatically be rectangular and will not consider any features such as polygons to perform the subsetting. It is often useful to crop data as tightly as possible to the area under investigation, to reduce the amount of data and have a more focused view when visualizing the data.
    • crop() uses objects of class SpatExtent to define the new extent, or any object that can be coerced to an extent (see ?ext for more info on this). This means that practically all spatial objects (raster or vector) can be used directly in crop. Considering two rasters r1 and r2 with r2 smaller than r1, you can simply use crop(r1, r2) in order to crop r1 to the extent of r2.
    • You can easily define an extent interactively (by clicking) thanks to the draw() function.
  • Mask:
    • mask() can be used with almost all spatial objects to mask (i.e. set to NA) values of a raster object. When used with a polygon object, mask() will keep the values of the raster overlaid by polygons and mask the values outside the polygons.
    • Note the very useful inverse = argument of mask(), which allows you to mask the inverse of the area covered by the features. We will use this feature of mask() later in the tutorial to exclude water areas from a raster, based on an independent polygon object.
  • Buffer:
    • buffer() calculates a buffer around all cells that are not NA in a SpatRaster, or around the geometries of a SpatVector (currently only implemented for points).
    • Note that the distance unit of the buffer width parameter is meters if the CRS is geographic (+proj=longlat), and map units (typically also meters) otherwise.
    • See ?buffer for more info.
  • Extract:
    • The most common operation when combining vector and raster data is extraction, using the function extract(). It simply consists of extracting the values of a raster object at locations specified by a vector object (e.g. points).
    • When using extract() with polygons or lines, individual features of the vector layer may overlay or intersect several pixels. In that case a function (fun =) can be used to summarize the values into one. Note that although the functions min, max, mean and median are most often used for this spatial aggregation, any custom-made function can be used as well. The result is a matrix of raster values at the given locations. See the example section of ?extract, and the short sketch after this list.
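
To make the last two operations a bit more concrete, here is a small sketch of buffering points and extracting one summary value per buffer. The raster and point objects are hypothetical; it assumes terra is loaded:

library(terra)

# Hypothetical raster with random values and two point locations
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10, crs = "EPSG:4326")
values(r) <- runif(ncell(r))
pts <- vect(cbind(c(2, 7), c(3, 8)), type = "points", crs = "EPSG:4326")

# Buffer: 1 km around each point (width is in meters for a lon/lat CRS)
ptsBuf <- buffer(pts, width = 1000)

# Extract: one summarised value (here the mean) per buffer polygon
extract(r, ptsBuf, fun = mean, na.rm = TRUE)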

Visualise a spatial vector layer together with a Landsat 8 image of Wageningen

Step by step we will:

  1. Download the Landsat 8 data of Wageningen;
  2. Download and prepare administrative boundary data of the Netherlands;
  3. Download water area data of Wageningen;
  4. Mask the data to match the boundaries of the city;
  5. Mask the data to exclude water bodies.

Prepare the data

# Install necessary packages and load them
if(!"terra" %in% installed.packages()){install.packages("terra")}
if(!"sf" %in% installed.packages()){install.packages("sf")}

library(terra)
## terra 1.7.46
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
# Create data directory if it does not yet exist
if (!dir.exists("data")) {
  dir.create("data")
}

# Download to data directory
download.file(url = 'https://github.com/GeoScripting-WUR/VectorRaster/releases/download/tutorial-data/landsat8.zip', destfile = 'data/landsat8.zip')

# Unzip to data directory
unzip('data/landsat8.zip', exdir = "data")

# Identify the right file
landsatPath <- list.files(path = "data/", pattern = glob2rx('LC8*.tif'), full.names = TRUE)
wagLandsat <- rast(landsatPath)

We can start by visualizing the data. Since it is a multispectral image, we can use plotRGB() to do so.

# plotRGB does not support negative values, so we remove them
wagLandsat[wagLandsat < 0] <- NA

# Band names can be changed here
names(wagLandsat) <- c("band1", "band2", "band3", "band4", "band5", "band6", "band7")

# Select which bands to assign to the red, green, and blue colour channels
plotRGB(wagLandsat, 5, 4, 3)

We have chosen to visualize the Landsat image as a false color composite, meaning that the chosen bands do not match the RGB channels. Indeed, we have plotted the near-infrared band as red, the red as green, and the green as blue.

# Download and unzip the Dutch municipalities boundaries
download.file(url = 'https://github.com/GeoScripting-WUR/Scripting4GeoIntro/releases/download/data/data.zip', destfile = 'data/GADM.zip')
unzip('data/GADM.zip', exdir = "data")

# Load level 2 dataset
nlCitySf <- st_read("data/gadm41_NLD_2.json")
## Reading layer `gadm41_NLD_2' from data source 
##   `C:\Users\tijug\Desktop\geoscripting\VectorRaster\data\gadm41_NLD_2.json' 
##   using driver `GeoJSON'
## Simple feature collection with 355 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 3.3608 ymin: 50.7235 xmax: 7.227 ymax: 53.5546
## Geodetic CRS:  WGS 84
# Investigate the structure of the object
head(nlCitySf)
## Simple feature collection with 6 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 6.2237 ymin: 52.6132 xmax: 7.0923 ymax: 53.0942
## Geodetic CRS:  WGS 84
##       GID_2 GID_0     COUNTRY   GID_1  NAME_1 NL_NAME_1        NAME_2 VARNAME_2
## 1 NLD.1.1_1   NLD Netherlands NLD.1_1 Drenthe        NA     AaenHunze        NA
## 2 NLD.1.2_1   NLD Netherlands NLD.1_1 Drenthe        NA         Assen        NA
## 3 NLD.1.3_1   NLD Netherlands NLD.1_1 Drenthe        NA Borger-Odoorn        NA
## 4 NLD.1.4_1   NLD Netherlands NLD.1_1 Drenthe        NA     Coevorden        NA
## 5 NLD.1.5_1   NLD Netherlands NLD.1_1 Drenthe        NA      DeWolden        NA
## 6 NLD.1.6_1   NLD Netherlands NLD.1_1 Drenthe        NA         Emmen        NA
##   NL_NAME_2   TYPE_2    ENGTYPE_2 CC_2   HASC_2                       geometry
## 1        NA Gemeente Municipality   NA NL.DR.AH MULTIPOLYGON (((6.5699 52.9...
## 2        NA Gemeente Municipality   NA NL.DR.AS MULTIPOLYGON (((6.6408 53.0...
## 3        NA Gemeente Municipality   NA NL.DR.BO MULTIPOLYGON (((6.7457 52.8...
## 4        NA Gemeente Municipality   NA NL.DR.CO MULTIPOLYGON (((6.8716 52.6...
## 5        NA Gemeente Municipality   NA NL.DR.DW MULTIPOLYGON (((6.2732 52.6...
## 6        NA Gemeente Municipality   NA NL.DR.EM MULTIPOLYGON (((7.0923 52.8...

It seems that the municipality names are in the NAME_2 column. So we can subset the sf data.frame to the city of Wageningen alone. To do so we can use simple data frame manipulation/subsetting syntax.

# Remove rows with NA
nlCitySf <- nlCitySf[!is.na(nlCitySf$NAME_2),]

# Filter Wageningen
wagContour <- nlCitySf[nlCitySf$NAME_2 == 'Wageningen',]

We can use the resulting wagContour object to mask out the values outside of Wageningen. But first, since the two objects are in different coordinate systems, we need to reproject one to the CRS of the other.

Question 1: Would you rather reproject a raster or a vector layer? Give two reasons why you would choose to reproject a raster or vector.

# Get the target CRS from the wagLandsat raster object
targetCRS <- st_crs(wagLandsat)

# Use sf to transform wagContour to the same projection as wagLandsat
wagContourUTM <- st_transform(wagContour, targetCRS)

Now that the two objects are in the same CRS, we can do the masking and visualize the result. Let’s first crop and then mask, to see the difference.

Crop, mask and visualise

wagLandsatCrop <- crop(wagLandsat, wagContourUTM)
wagLandsatSub <- mask(wagLandsat, wagContourUTM)

# Set graphical parameters (one row and two columns)
opar <- par(mfrow = c(1, 2))
plotRGB(wagLandsatCrop, 5, 4, 3)
plotRGB(wagLandsatSub, 5, 4, 3)
plot(st_geometry(wagContourUTM), add = TRUE, border = 'green', col = 'transparent', lwd = 3) # set fill colour to transparent

# Reset graphical parameters
par(opar)

In the figure above, the left panel displays the output of crop(), while the right panel shows the result of masking the Landsat scene using the contour of Wageningen as input.

We also have a water mask of Wageningen in vector format. Let’s download it and also reproject it to the CRS of the Landsat data.

Important functions here are st_read() and st_write(). These are very powerful functions that enable reading and writing simple features or layers from and to files or databases.

# Download and unzip files
download.file(url = 'https://github.com/GeoScripting-WUR/VectorRaster/releases/download/tutorial-data/wageningenWater.zip', destfile = 'data/wageningenWater.zip')
unzip('data/wageningenWater.zip', exdir = "data")

# Load the Water Shapefile directly as an sf object
water <- st_read('data/Water.shp')
## Reading layer `Water' from data source 
##   `C:\Users\tijug\Desktop\geoscripting\VectorRaster\data\Water.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 632 features and 32 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 156404.9 ymin: 437540.3 xmax: 181084.9 ymax: 448260.9
## Projected CRS: Amersfoort / RD New
names(water) # check attribute columns
##  [1] "IDENTIFICA" "TOP10_ID"   "BREEDTE"    "BREEDTEKLA" "BRUGNAAM"  
##  [6] "FUNCTIE"    "FYSIEKVOOR" "FYSIEKVO_1" "HOOFDAFWAT" "NAAM_FR_CS"
## [11] "NAAM_NL_CS" "SCHEEPSLAA" "SLUISNAAM"  "STROOMRICH" "TYPEINFRAS"
## [16] "TYPEWATER"  "VOORKOMENW" "HOOGTENIVE" "STATUS"     "DIMENSIE"  
## [21] "OBJECTBEGI" "OBJECTEIND" "VERSIEBEGI" "VERSIEEIND" "BRONACTUAL"
## [26] "BRONBESCHR" "BRONNAUWKE" "BRONTYPE"   "TDN_CODE"   "VISUALISAT"
## [31] "GEOMETRIE_" "GEOMETRI_1" "geometry"
# Transform the water object to match the projection of our other data
waterUTM <- st_transform(water, targetCRS)

Note the use of inverse = TRUE in the code below, to mask the pixels that intersect with the features of the vector object.

# Mask the Landsat data
wagLandsatSubW <- mask(wagLandsatSub, mask = waterUTM, inverse = TRUE)

# Plot
plotRGB(wagLandsatSubW, 5, 4, 3)
plot(st_geometry(waterUTM), col = 'lightblue', add = TRUE, border = '#3A9AF0', lwd = 1) # use a site such as https://htmlcolorcodes.com/ to find the hexadecimal code for colours

Some of our friends would like to have these exact data as well (e.g. the water polygon object).

One friend of ours is a software engineer and he wants a GeoJSON. Another friend is a GIS analyst working in QGIS and, as a backup, he wants the file in the Geography Markup Language (GML). These file formats (GeoJSON and GML, but also KML and Shapefile) are commonly used in spatial analysis. Let’s try to give them the files in those formats!

You can try for yourself and e.g. start by converting them to KML and opening them in Google My Maps.

# Create output directory if it does not yet exist
if (!dir.exists("output")) {
  dir.create("output")
}

# Export the simple feature to a KML. 
try(st_write(water, "output/water.kml", driver = "kml", delete_dsn = file.exists("output/water.kml")))

# Note that the code above checks for existence of the KML file and sets the delete_dsn option to true or false accordingly
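
The GeoJSON and GML files for our friends can be written in much the same way, since st_write() infers the driver from the file extension. A possible sketch (the output file names are made up):

# Export the same water object to GeoJSON and GML
st_write(water, "output/water.geojson", delete_dsn = file.exists("output/water.geojson"))
st_write(water, "output/water.gml", delete_dsn = file.exists("output/water.gml"))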

You can now open the created water.kml file in Google My Maps or Google Earth Pro, and try out the other formats too. Another option for visualisation is an interactive map using mapview, which in turn is based on leaflet. The output of the simple example below can be viewed here.

# Install and load library
if(!"mapview" %in% installed.packages()){install.packages("mapview")}
if(!"webshot" %in% installed.packages()){install.packages("webshot")}
library(mapview)

# Plot the water polygons
map <- mapview(water)
map

# Save to file
mapshot(map, url = 'mapview.html')

Extract raster values along a transect

Another use of the extract() function can be to visualize or analyse data along transects. In the following example, we will run a transect across Belgium and visualize the change in elevation.

Let’s first download the elevation data of Belgium:

# Download to data directory
download.file(url = 'https://github.com/GeoScripting-WUR/VectorRaster/releases/download/tutorial-data/belgium_elev.zip', destfile = 'data/belgium_elev.zip')

# Unzip to data directory
unzip('data/belgium_elev.zip', exdir = "data")

# Load the data
bel <- rast('data/belgium_elev.tif')

# Display metadata
bel
## class       : SpatRaster 
## dimensions  : 264, 480, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 2.5, 6.5, 49.4, 51.6  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : belgium_elev.tif 
## name        : BEL_msk_alt 
## min value   :        -105 
## max value   :         691

We can start by visualizing the data.

plot(bel)

Everything seems correct.

We want to look at a transect, which we can draw by hand by clicking two points. The draw('line') function will help us do that. Once you run the function, you will be able to click in the plotting window of R (the bel object should already be present in the plot panel before running draw('line')). Press Esc once you have selected the two extremities of the line.

line <- draw('line')
## Loading required package: sp
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
##      (status 2 uses the sf package in place of rgdal)

Let’s first write this line to a file, using a few different file formats. Note that draw() returns a SpatVector. That means we cannot (directly) use st_write() here like we did before. Using the overview table at the start of this tutorial, we can change that using st_as_sf(). After writing to disk, you might try opening these new files in R, or in different software, such as QGIS or ArcGIS.

# To a GeoJSON
st_write(st_as_sf(line), 'output/line.geojson')

# To an ESRI Shapefile
st_write(st_as_sf(line), 'output/line.shp')

Now the elevation values can simply be extracted using extract().

alt <- extract(bel, line)

We can plot the result as follows.

plot(alt$BEL_msk_alt, type = 'l', ylab = "Altitude (m)")

Question 2: Take a look at the x-axis of the graph above. What unit corresponds to the given values? Can you think of interpretation problems when following the method we apply here?

Extra

Check out this overview of examples for creating static and interactive maps in R, making use of packages like mapview and leaflet.