Jan Verbesselt, Loïc Dutrieux and Dainius Masiliūnas

2023-09-04

WUR Geoscripting

Intro to functions and refresher on R

Learning objectives

  • Install packages and create directories in R
  • Learn to write a function
  • Know how to visualize data via a spatial map using an R script
  • Adopt some good scripting/programming habits
  • Use control flow for efficient function writing

RStudio setup

For this course you can use RStudio to write and run scripts. You can access it from e.g. the Applications → Programming → RStudio menu. In case you are working on your own computer and would like to know how to install R and RStudio, see the RStudio website.

If you are new to RStudio take a look at the following summary YouTube video about how to use RStudio: Intro to RStudio (6 min). In it you will learn how to navigate the RStudio environment and even run some code. This will be helpful for later in the course.

Additionally, the first time you open RStudio it’s helpful to change some global settings. To do this go to Tools → Global Options… in the top menu bar of RStudio. The following box should appear. Set the settings to match those of the screenshot, namely, Never save the workspace to .RData file on exit and uncheck the boxes under History.

Global Options

Basic R knowledge useful for Geo-Scripting

Scripting with R to handle spatial problems requires a core set of basic R skills. To prepare you well for the coming weeks, we’ve summarized what we think are important skills for geoscripting below.

Note that this tutorial uses the basic programming structure with lists, strings, and more. For a refresher on these elements click here.

Package installation and creating new directories

To code efficiently, we use packages that contain ready-made features for specific applications. The relevant packages must first be installed before using their features. In this tutorial, we will be using the terra and sf packages. We can therefore go ahead and install them now.

# Check for the terra and sf packages and install if missing
if(!"terra" %in% installed.packages()){install.packages("terra")}
if(!"sf" %in% installed.packages()){install.packages("sf")}

We first check if the package has already been installed, to avoid unnecessary steps. If the package is missing, then we install it.

# Load the terra and sf packages 
library(terra)
## terra 1.7.39
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE

Similarly, we can create new directories for storing our data using the functions dir.exists and dir.create. Again, we check if these directories are missing before creating them, to avoid redundancy.

if(!dir.exists("data")){dir.create("data")}

Vector handling and vector arithmetic

In R we call a one-dimensional array of values a vector, and a two-dimensional array of values a matrix.

This can be a bit confusing, because R vectors have nothing to do with vectors as you know them in GIS. The latter will be introduced in a later tutorial, as will the terra package and the SpatRaster class that has been developed as part of that package. Since rasters are 2D grids of values, all objects that belong to the class SpatRaster are matrices with a few extra spatial attributes.

As a consequence, handling vectors and matrices is a crucial skill for processing raster data in R.

A reference manual for vector handling and vector arithmetic can be found here.

# Create a vector
a <- c(3, 6, 8, 1)
a
## [1] 3 6 8 1
# Any mathematical operation can be performed on vectors
(b <- a * 2)
## [1]  6 12 16  2
(d <- a + 6)
## [1]  9 12 14  7
# Two vectors of same length can also be added with each other
(e <- a + b)
## [1]  9 18 24  3
# It works even if their lengths differ, in which case values are "recycled"
(f <- a + b[1:2])
## [1]  9 18 14 13

Vectors of numbers are called numeric vectors, and matrices of numbers are numeric matrices. Calculations with matrices work the same way as with vectors, i.e. element-wise. So a multiplication between two matrices is element-wise value multiplication, not mathematical matrix multiplication.
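To illustrate, here is a small sketch with arbitrary values:

# Two small matrices with arbitrary values
m1 <- matrix(1:6, nrow = 2, ncol = 3)
m2 <- matrix(rep(2, 6), nrow = 2, ncol = 3)

# "*" multiplies element by element
m1 * m2
##      [,1] [,2] [,3]
## [1,]    2    6   10
## [2,]    4    8   12
# Mathematical matrix multiplication uses the %*% operator instead
m1 %*% t(m2)
##      [,1] [,2]
## [1,]   18   18
## [2,]   24   24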

Value replacement

## Values in a vector that satisfy a certain condition can be replaced by other values
a <- c(2, 5, 2, 5, 6, 9, 2, 12)

## Values inferior or equal to 5 are replaced by 0
a[a <= 5] <- 0
a
## [1]  0  0  0  0  6  9  0 12
## Condition can be defined using another vector of equal length
a <- c(2, 5, 2, 5, 6, 9, 2, 12)
b <- c(1, 1, 0, 1, 0, 0, 1, 0)

## Change the values of a based on b values
a[b == 0] <- NA
a
## [1]  2  5 NA  5 NA NA  2 NA

More complex value replacement:

a <- c(2, 5, 2, 5, 6, 9, 2, 12)
b <- c(1, 1, 2, 1, 0, 0, 1, 2)

## a values at which b is equal to either 0 or 1 are replaced by NA
a[b %in% c(0, 1)] <- NA
a
## [1] NA NA  2 NA NA NA NA 12

Question 1: How do I replace values in b by 3 at which a is 6, 9 and 12?

Character vector handling

When working with real data, such as satellite imagery or KML files, the data always needs to be read from files. File names and file paths define the location of a file on a computer hard drive. A great advantage of scripting is that locating, reading and writing data can be fairly easily automated (generating file names automatically, recognizing patterns in file names, etc.). That requires some basic character vector (string) handling skills from the user.

Key functions for handling character strings are listed below:

  • list.files(): to list files in a directory.
  • glob2rx(): to select files using a wildcard. Note: if you know regular expressions, you do not need that function. But most people are more comfortable working with wildcards. For a list of wildcards click here.
  • paste(), paste0(), sprintf(): useful to combine vectors e.g. to create a file name.
  • strsplit(): to split up a string.
  • switch(): to return a value based on a match.

Example of list.files()

# Check your working directory
getwd()

# List the files available in this directory
list.files()

Question 2: List the directories in your working directory.

Example of glob2rx()

# List all .txt files in working directory
list.files(getwd(), pattern = glob2rx("*.txt"))

Example of paste() and paste0()

# Two handy examples
paste("Today is", date())
## [1] "Today is Mon Sep  4 01:06:57 2023"
paste0("A", 1:6)
## [1] "A1" "A2" "A3" "A4" "A5" "A6"

Question 3: Create one variable containing a sentence “geoscripting is fun”, by combining a, b, and c:

a <- "geoscripting"
b <- "is"
c <- "fun"

Example of strsplit()

# I have the following character string
name <- 'today_is_friday_12-12-2014'

# I want to extract the date contained in it. I can split the string on the underscores; the fourth element should be the date
date0 <- unlist(strsplit(name, split = '_'))[4]

# Which can then be formatted as a date object (until now it is a character string)
class(date0)
## [1] "character"
(date <- as.Date(date0, format = '%m-%d-%Y'))
## [1] "2014-12-12"
class(date)
## [1] "Date"

Question 4: How do we select friday from the name variable?

Example of switch()

Let’s say we want to return a custom country code based on the country name:

# Define country name
country = "Belgium"

# switch()
countrycode = switch(country, "Netherlands" = 1, "Belgium" = 2, "Germany" = 3)

countrycode
## [1] 2

See also ?substr; it can be handy too.
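For instance (the string here is just an arbitrary example):

# Extract characters 1 to 3 from a string
substr("geoscripting", 1, 3)
## [1] "geo"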

Reading and writing data

In later tutorials we will show you how you can read and write different spatial objects (e.g. vector and raster files).

Here, an example is given how you can read (import into R) and write (export from R) a text file.

The most common way to read in spreadsheet tables is with the read.csv() command. However, you can read in virtually any type of text file. Type ?read.table in your console for some other examples.
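As a minimal sketch of read.table() (reading from an inline string via the text argument, so the example is self-contained; the column names are made up):

# Read a small whitespace-separated table from a string instead of a file
read.table(text = "id value
1 10
2 20", header = TRUE)
##   id value
## 1  1    10
## 2  2    20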

# Check your working directory
getwd() 
## [1] "/home/osboxes/Documents/Geoscripting/Tutorials2/Scripting4GeoIntro"
# Create some dummy data
(test <- c(1:5, "6, 7", "8, 9, 10"))
## [1] "1"        "2"        "3"        "4"        "5"        "6, 7"     "8, 9, 10"
# Write to your working directory
write.csv(test, file = "testing.csv")

# Remove the variable "test" from the R working environment
rm(test)

# Check that the object is no longer in the working environment
ls() 
##  [1] "a"           "b"           "country"     "countrycode" "d"          
##  [6] "date"        "date0"       "e"           "f"           "name"
# Read from your working directory
(test <- read.csv("testing.csv"))
##   X        x
## 1 1        1
## 2 2        2
## 3 3        3
## 4 4        4
## 5 5        5
## 6 6     6, 7
## 7 7 8, 9, 10

Writing a function

It is hard to unleash the full potential of R without writing your own functions. Luckily it’s very easy to do. Here are some trivial examples:

# Put the function arguments in () and the evaluation in {}
add <- function(x){
  x + 1
}

add(4)
## [1] 5

An example of setting the default argument values for your function:

add <- function(x = 5) {
  z <- x + 1
  return(z)
}

add()
## [1] 6
add(6)
## [1] 7

That’s about all there is to it. The function will generally return the result of the last line that was evaluated.

Question 5: How do you write a function that returns x and z?

Now, let’s declare a new object, a new function, newfunc (this is just a name; if you like you can give this function another name). The argument list in the parentheses specifies (in this case) two names. The body of the function appears within the curly braces, where the process applied to the named objects from the argument list is defined.

newfunc <- function(x, y) {
  z <- 2*x + y
  return(c(z, x, y))
}

Next, a new object a2b is created, which contains the result of applying newfunc to two values. The second R command below prints this new object to the console.

a2b <- newfunc(2, 4)
a2b
## [1] 8 2 4

Finally, you can now remove the objects you have created to make room for the next exercise by selecting and running the last line of the code.

rm(a, newfunc, a2b)

Good scripting/programming habits

Increasing your scripting/programming efficiency goes through adopting good scripting habits. Following a couple of guidelines will ensure that your work:

  • Can be understood and used by others.
  • Can be understood and reused by you in the future.
  • Can be debugged with minimal effort.
  • Can be re-used across different projects.
  • Is easily accessible by others.

In order to achieve these objectives, you should try to follow a few good practices. The list below is not exhaustive, but already constitutes a good basis that will help you getting more efficient now and in the future when working on R projects.

  • Comment your code.
  • Write functions for code you need more than once:
    • Make your functions generic and flexible, using control flow.
    • Document your functions.
  • Follow an R style guide. This will make your code more readable! Most important are:
    • Meaningful and consistent naming of files, functions, variables…
    • Indentation (like in Python: use spaces or tab to indent code in functions or loops etc.).
    • Consistent use of the assignment operator: either <- or = in all your code. The former is used by core R and allows assigning in function calls, the latter is shorter and consistent with most other programming languages.
    • Consistent placement of curly braces.
  • Make your own packages.
  • Keep a similar directory structure across your projects.
  • Use version control to develop/maintain your projects and packages.
  • Never place rm(list = ls()) anywhere in your script. If you do so, you may find your computer set on fire.
  • Set your working directory relative to where you saved your R script: go to Session in the RStudio menu bar → Set Working Directory → To Source File Location.

Warning: Using setwd() in a script is not good practice unless you have a very, very good reason. Users know their own environment and working directories, and a script should not try to outsmart them. Never put either of the following in your script, as it would make the script not reproducible by others: setwd("/some/path/on/your/computer") or setwd("../some/path/outside/your/repository").

Protip: You can use relative paths to load your data, for example by setting the path to a file "my-table.csv" in the directory "data", relative to your script file: datafile <- file.path("data", "my-table.csv"). Then you can read in the file with, for example: mytable <- read.csv(datafile).

Note that R IDEs like RStudio make a lot of these good practices a lot easier and you should try to take maximum advantage of them. Take a moment to explore the menus of the RStudio session that should already be open on your machine.

Below is an example of a function written with good practices and without. First the good example:

ageCalculator <- function(x) {
    # Function to calculate age from birth year
    # x (numeric) is the year you were born
    if(!is.numeric(x)) {
        stop("x must be of class numeric")
    } else { # x is numeric
        # Get today's date
        date <- Sys.Date()
        # extract year from date and subtract
        year <- as.numeric(format(date, "%Y"))
        if(year <= x) {
            stop("You aren't born yet")
        }
        age <- year - x
    }
    return(age)
}

ageCalculator(1985)
## [1] 38

What a beautiful age for learning geoscripting!

Then the bad example:

# DON'T DO THIS, BAD EXAMPLE!!!
funTest_4 <- function(x) {
if( !is.numeric(x))
{
stop("x must be of class numeric"  )
 }
else {
a = Sys.Date()
b<- as.numeric( format( a,"%Y"))
b-x
}
}

funTest_4(1985)
## [1] 38

Note that this also does work. But which of the two is the easiest to read, understand, and modify if needed? … Exactly, the first one. So let’s look back at the examples and identify some differences:

  • Function name: Not very self descriptive in the second example.
  • Function description: Missing in the second example.
  • Arguments description: Missing in the second example.
  • Comments: The second example has none (okay, the first one really has a lot, but that’s for the example).
  • Variables naming: use of a and b not very self descriptive in second example.
  • Indentation: Missing in the second example.
  • Control flow: Second example does not check for implausible dates.
  • Consistency: Second example uses spaces, assignment operators and curly braces inconsistently.

Haven’t fully understood what control flow is, or not yet fully comfortable with function writing? We’ll see more of that in the following sections.

Function writing

A function is a sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. -Wikipedia

The objective of this section is to provide some help on effective function writing. That is functions that are:

  • simple,
  • generic, and
  • flexible.

They should integrate well in a processing/analysis chain and be easily re-used in a slightly different chain if needed. More flexibility in your functions can be achieved through some easy control flow tricks. The following section develops this concept and provides examples.

Control flow

Control flow refers to the use of conditions in your code that redirect the flow in different directions depending on variable values or classes. Make use of that in your code, as this will make your functions more flexible and generic.
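A minimal sketch of control flow in R (the variable and threshold below are arbitrary examples):

# The branch that runs depends on the value of x
x <- -3
if (x >= 0) {
  print("x is non-negative")
} else {
  print("x is negative")
}
## [1] "x is negative"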

Object classes and Control flow

You have seen in a previous tutorial already that every variable in your R working environment belongs to a class. You can take advantage of that, using control flow, to make your functions more flexible. First, let’s introduce a new class.

# Check for the terra package and install if missing
if(!"terra" %in% installed.packages()){install.packages("terra")}
library(terra)

c <- rast(ncol = 10, nrow = 10)
class(c)
## [1] "SpatRaster"
## attr(,"package")
## [1] "terra"

Here we used the function rast from the terra package to create an object of class SpatRaster. SpatRaster is a class for rasters (or matrices) that contains the necessary elements for spatial data. With terra and SpatRaster, complex spatial data can be processed more efficiently, using functions specifically tailored to SpatRaster objects. This will be studied further in another tutorial.

Controlling the class of input variables of a function

One way of making functions more auto-adaptive is by adding checks of the input variables. Using object class can greatly simplify this task. For example let’s imagine that you just wrote a simple Hello World function.

HelloWorld <- function (x) {
    hello <- sprintf('Hello %s', x)
    return(hello)
}

# Let's test it
HelloWorld('john')
## [1] "Hello john"
HelloWorld(2.5)
## [1] "Hello 2.5"
HelloWorld(c("Devis", "Martin"))
## [1] "Hello Devis"  "Hello Martin"
HelloWorld(NULL)
## character(0)
HelloWorld(data.frame(a = 1:10, b = rep("Name", 10)))
## [1] "Hello 1:10"                                                                                                 
## [2] "Hello c(\"Name\", \"Name\", \"Name\", \"Name\", \"Name\", \"Name\", \"Name\", \"Name\", \"Name\", \"Name\")"

Surprisingly, R is smart enough to give intuitive output in most of the tested cases, since the sprintf function automatically casts non-character variables into character ones, and is also vectorised to produce output when the input is a vector. However, in the last two cases, the output is not intuitive. We may want to only allow passing character vectors to this function. We can do this with a small change:

HelloWorld <- function (x) {
    if (!is.character(x))
      stop('Object of class "character" expected for x')

    hello <- sprintf('Hello %s', x)
    return(hello)
}

HelloWorld(21)
## Error in HelloWorld(21): Object of class "character" expected for x

The function now throws an informative error when something unsupported is requested. These function argument “sanity checks” are useful to avoid lengthy processing when we know that the output of the function is going to be wrong anyway because of invalid arguments. As an alternative to stop(), the function could throw a warning() but still return some value.
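As a sketch of that alternative, the hypothetical variant below warns and coerces the input instead of stopping (this is just one possible design, not part of the original HelloWorld):

HelloWorldWarn <- function (x) {
    if (!is.character(x)) {
      warning('x is not of class "character"; coercing it')
      x <- as.character(x)
    }
    hello <- sprintf('Hello %s', x)
    return(hello)
}

HelloWorldWarn(21)
## Warning in HelloWorldWarn(21): x is not of class "character"; coercing it
## [1] "Hello 21"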

Question 6: In which cases should you use stop(), warning(), message(), and when should you return a string?

Note that most common object classes have their own logical function (that returns TRUE or FALSE) to check what class it is. For example:

# This
is.character('john')
## [1] TRUE
# is similar to
class('john') == 'character'
## [1] TRUE
is.character(32)
## [1] FALSE
is.numeric(32)
## [1] TRUE

You should always try to take maximum advantage of these small utilities and check for classes and properties of your objects. This is important in cases that you might not think of in advance; for instance, consider an object with more than one class:

a = list(a = 1:10, b = "b")

# We can also set a class (only do that if you make your own class!)
class(a) = c("myclass", "list")
# Wrong
if (class(a) == "list") {
  print("a is a list")
} else {
  print("a is not a list")
}
## Warning in if (class(a) == "list") {: the condition has length > 1 and only the
## first element will be used
## [1] "a is not a list"
# Right
if (is.list(a)) {
  print("a is a list")
} else {
  print("a is not a list")
}
## [1] "a is a list"

Also note that is.character(32) == TRUE is equivalent to is.character(32). Therefore when checking logical arguments, you don’t need to use the == TRUE. As an example, a function may have an argument (say, plot) that, if set to TRUE will generate a plot, and if set to FALSE does not generate a plot. It means that the function certainly contains an if statement. if(plot) in that case is equivalent to if(plot == TRUE), it’s just shorter (and very slightly faster).

Below is an example: a function that subtracts two SpatRasters, with the option to plot the resulting SpatRaster or not.

library(terra)

# Function to subtract 2 SpatRasters
minusRaster <- function(x, y, plot=FALSE) {
    z <- x - y
    if (plot) {
        plot(z, 1) # Plots the first layer of the resulting SpatRaster
    }
    return(z)
}

# Let's generate 2 SpatRasters. The first one is the R logo raster
# converted to the terra package file format
r <- rast(system.file("ex/logo.tif", package = "terra"))

# The second SpatRaster is derived from the initial SpatRaster in order
# to avoid issues of non matching extent or resolution, etc
r2 <- r

# Now we fill the second SpatRaster with new values
# The /10 simply makes the result more spectacular
r2[] <- (1:ncell(r2)) / 10

# Simply performs the calculation
r3 <- minusRaster(r, r2)

# Now performs the calculation and plots the resulting SpatRaster
r4 <- minusRaster(r, r2, plot=TRUE)

Vectorised functions

A lot of core R functions are vectorised, i.e. they are capable of taking vectors, rather than individual values, as input. This allows very simple and powerful syntax without needing to use loops, for instance:

NumVec = 1:10

# We do not need to run the function on each element individually
as.character(NumVec)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
# Functions that reduce input into fewer numbers are also vectorised
mean(NumVec)
## [1] 5.5
range(NumVec)
## [1]  1 10
# All math operators are vectorised
NumVec + 100
##  [1] 101 102 103 104 105 106 107 108 109 110
NumVec ^ 2
##  [1]   1   4   9  16  25  36  49  64  81 100

Because most base functions are already vectorised, it is easy to write new functions that are themselves vectorised. For the most part, all you need to do is test your functions not just on single values but also on vectors of values, and think about whether the result makes sense.

Vectorisation also allows us to write short and versatile code. For instance, to check whether the input is a positive number:

is.positive.number = function(x) {
  is.numeric(x) & x > 0
}

is.positive.number(100)
## [1] TRUE
is.positive.number(c(-100, 10, NA))
## [1] FALSE  TRUE    NA
is.positive.number(c(TRUE, FALSE))
## [1] FALSE FALSE

Question 7: Why does the following function call return different and counter-intuitive results?

is.positive.number(c(TRUE, -5, 10))
## [1]  TRUE FALSE  TRUE

Type consistency

We just became familiar with functions for checking the types (classes) of variables. You may have noticed that there is one way in which they are all consistent: no matter the input, they return a logical value (TRUE, FALSE, or also NA). This is called type consistency and is a useful property. Consider code like this:

AddNewSubject = function(NewBirthYear) {
  # Birth years of previous subjects
  BirthYears = c(1980, 1985, 1987, 1990, 1993, 1994, 1998, 2000)
  return(c(BirthYears, NewBirthYear))
}

# Works as expected
NewSubjects = AddNewSubject(c(1995, 1998, 1999, 2000))
sum(NewSubjects >= 2000) # How many subjects are born on or after 2000
## [1] 2
# Whoops!
NewSubjects = AddNewSubject(c("MCMXCV", "1998", "1999", "2000"))
sum(NewSubjects >= 2000)
## [1] 3

Question 8: What happened, and what would happen if we didn’t include the Roman numerals?

AddNewSubject is not type consistent: depending on the input, the output class changes. This is sometimes convenient, but in the case above the function succeeds yet gives completely misleading output. This has gone horribly right: a lot of code in R is flexible and can handle multiple input types, so you may only notice that your output is wrong when you inspect it yourself. If the function AddNewSubject checked its input and always returned integers, we would be sure that comparing its output against numbers and summing them is safe. Similarly, the is.numeric() etc. functions are type consistent, and thus it is safe to use them in if statements, which always require a logical input. This means we do not need extra type checking after running such a function. In addition, the function name hints at the type consistency: the is. prefix makes us realise that it will give a yes/no answer.
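As a sketch of such an input check (one possible fix, hypothetical rather than the tutorial’s official solution):

AddNewSubjectChecked = function(NewBirthYear) {
  # Refuse anything that is not numeric, so the output class is predictable
  if (!is.numeric(NewBirthYear)) {
    stop("NewBirthYear must be numeric")
  }
  # Birth years of previous subjects
  BirthYears = c(1980, 1985, 1987, 1990, 1993, 1994, 1998, 2000)
  return(c(BirthYears, NewBirthYear))
}

AddNewSubjectChecked(c("MCMXCV", "1998"))
## Error in AddNewSubjectChecked(c("MCMXCV", "1998")): NewBirthYear must be numeric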

try() and debugging

Use of try() for error handling

The try() function may help you write functions that do not stop with a cryptic error whenever they encounter something unexpected. Anything (sub-function, piece of code) that is wrapped in try() will not interrupt the bigger function that contains try(). For instance, this is useful if you want to apply a function sequentially but independently over a large set of raster files, and you already know that some of the files are corrupted and might return an error. By wrapping your function in try() you allow the overall process to continue until its end, regardless of the success of individual layers. So try() is a perfect way to deal with heterogeneous/unpredictable input data.

Also, try() returns an object of a different class when it fails. You can take advantage of that at a later stage of your processing chain to make your function more adaptive. See the example below, which illustrates the use of try() for sequentially calculating frequencies on a list of auto-generated SpatRasters.

library(terra)

# Create a SpatRaster and fill it with "randomly" generated integer values
a <- rast(nrow = 50, ncol = 50)
a[] <- floor(rnorm(n = ncell(a)))

# The freq() function returns the frequency of a certain value in a SpatRaster
# We want to know how many times the value -2 is present in the SpatRaster
freq(a, value = -2)$count
## [1] 342
# Let's imagine that you want to run this function over a whole list of SpatRaster
# but some elements of the list are unpredictably corrupted, so the list looks as follows
b <- a
c <- NA
rasterList <- list(a, b, c)

# Now, b and a are SpatRasters, and c is ''corrupted''
# Running freq(c) would return an error and stop the whole process
out <- list()
for(i in 1:length(rasterList)) {
    out[i] <- freq(rasterList[[i]], value = -2)$count
}
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'freq' for signature '"logical"'
# If you wrap the call in a try(), you still get an error, but it's non-fatal
out <- list()
for(i in 1:length(rasterList)) {
    out[i] <- try(freq(rasterList[[i]], value = -2)$count)
}
## Error in (function (classes, fdef, mtable)  : 
##   unable to find an inherited method for function 'freq' for signature '"logical"'
out
## [[1]]
## [1] 342
## 
## [[2]]
## [1] 342
## 
## [[3]]
## [1] "Error in (function (classes, fdef, mtable)  : \n  unable to find an inherited method for function 'freq' for signature '\"logical\"'\n"
# By building a function that includes a try() we are able to catch the error 
# without having it printed, allowing the process to handle the error gracefully.
fun <- function(x, value) {
    tr <- try(freq(x = x, value = value)$count, silent=TRUE)
    if (class(tr) == 'try-error') {
        return('This object returned an error')
    } else {
        return(tr)
    }
}

# Let's try to run the loop again
out <- list()
for(i in 1:length(rasterList)) {
    out[i] <- fun(rasterList[[i]], value = -2)
}
out
## [[1]]
## [1] 342
## 
## [[2]]
## [1] 342
## 
## [[3]]
## [1] "This object returned an error"
# Note that using a function of the apply family would be a more
# elegant/shorter way to obtain the same result
(out <- sapply(X = rasterList, FUN = fun, value = -2))
## [1] "342"                           "342"                          
## [3] "This object returned an error"

Function debugging

Debugging a single line of code is usually relatively easy; simply double checking the classes of all input arguments often gives good pointers to why the line crashes. But when writing more complicated functions where objects created within the function are reused later on in that same function or in a nested function, it is easy to lose track of what is happening, and debugging can then become a nightmare. A few tricks can be used to make that process less painful.

traceback() and debugonce()

Here are the manual commands, which also work with RStudio and other IDEs:

  • The first thing to investigate right after an error occurs is to run the traceback() function; just like that without arguments.
  • Carefully reading the return of that function will tell you where exactly in your function the error occurred.
foo <- function(x) {
    x <- x + 2
    print(x)
    bar(2)
}

bar <- function(x) {
    x <- x + a.variable.which.does.not.exist
    print(x)
}

foo(2)
# gives an error

traceback()
## 2: bar(2) at #1
## 1: foo(2)
# Ah, bar() is the problem

# Debug it by declaring what to debug and running it
debugonce(bar)
foo(2)

Depending on the IDE you are using, you may be presented with tools for stepping through the function line by line, as well as a Browse console, which allows you to query the state of the variables involved so that you can identify exactly what is going on in the function call. For instance, in RKWard, the Debugging Frames pane on the right shows which line you are stepping through.

For another example see: rfunction.com.

RStudio

RStudio has integration with the debugging tools in R, so you can use a point-and-click interface. However, some parts of it are specific to the RStudio IDE.

  • To force them to catch every error, select Debug → On Error → Break in Code in the main menu.
  • Run again foo(2).
  • RStudio will stop the execution where the error happened. The traceback appears in a separate pane on the right.
  • You can then use the little green “Next” button to go line by line through the code, or the red Stop button to leave the debugging mode.
  • Reset the On Error behaviour to Error Inspector. In this default setting, RStudio will try to decide whether the error is complex enough for debugging, and then offer the options to “traceback” or “rerun the code with debugging” with two buttons in the console.

Finally, solve the problem:

# Redefine bar
bar <- function(x) {
    x + 5
}
foo(2)
## [1] 4
## [1] 7

Refer to the reference section of this document for further information on function debugging.

(optional) Writing packages

The next step in writing re-usable code is packaging it, so others can simply install and use it. If you have followed the steps up to here, this step is not a big one anymore! For this course, it is optional. Find instructions here and in the references below.

Creating a map within R - a simple demo

Here is an example of how you can create a map in R. We will make use of a subset of the Global Administrative Areas database (GADM):

# Create a data directory and download data from URL
data_URL <- "https://github.com/GeoScripting-WUR/Scripting4GeoIntro/releases/download/gadm-data/gadm41_PHL_2.zip"
data_dir <- "data"

if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

if (!file.exists('data/data.zip')) {
  download.file(url = data_URL, destfile = file.path(data_dir, "data.zip"))
  unzip('data/data.zip', exdir = 'data')
}

We load the administrative boundaries of the Philippines using the sf package, to which you will be introduced in more depth in a later tutorial:

# Check for the sf package and install if missing
if(!"sf" %in% installed.packages()){install.packages("sf")}
library(sf)
  
# Load the data and subset its geometry
adm <- st_read(file.path(data_dir, "gadm41_PHL_2.json"), quiet = TRUE)
adm_geom <- st_geometry(adm)

# Create an example plot
plot(adm_geom[adm$NAME_1 == "Tarlac"])

Try to understand the code below, and let us know if you have questions. Feel free to use this code as an example for exercise 3.

mar <- adm_geom[adm$NAME_1 == "Marinduque"]
plot(mar, bg = "dodgerblue", axes = TRUE)
plot(mar, lwd = 10, border = "skyblue", add = TRUE)
plot(mar, col = "green4", add = TRUE)
grid()
box()
invisible(text(st_coordinates(st_centroid(mar)),
               labels = as.character(adm$NAME_2[adm$NAME_1 == "Marinduque"]), cex = 1.1, col = "white", font = 2))
mtext(side = 3, line = 1, "Provincial Map of Marinduque", cex = 2)
mtext(side = 1, "Longitude", line = 2.5, cex = 1.1)
mtext(side = 2, "Latitude", line = 2.5, cex = 1.1)
mtext(side = 1, line = -2,
"Projection: Geographic\n
Coordinate System: WGS 1984    \n
Data Source: GADM.org    ", adj = 1, cex = 0.5, col = "grey20")

References and more info