Week 1, Lesson 3: Carrying out your R project


Good morning! Here is what you will do today:

Time                   Activity
Morning                Self-study: go through the following tutorial
13:30 to 14:30         Presentation and discussion
Rest of the afternoon  Do/finalise the exercise.


During the previous lecture, you saw some general aspects of the R language, such as an introduction to the syntax, object classes, reading of external data and function writing.

Today it's about carrying out a geoscripting project. This tutorial is about R, but a lot of it can be applied to other languages!

Scripting means that you often go beyond easy things and therefore face challenges. It is normal that you will have to look for help. This lesson will guide you through ways of finding help. It continues with a couple of "good practices" for scripting, debugging and geoscripting projects. This includes using version control and project management.

Learning objectives

At the end of the lecture, you should be able to

  • Use version control to develop, maintain, and share your code with others
  • Find help for R related issues
  • Produce a reproducible example
  • Adopt some good scripting/programming habits
  • Use control flow for efficient function writing

Version control

Important note: you need to have git installed and properly configured on your computer to do the following. Visit the system setup page for more details. Git is preinstalled in the PC lab and on virtual machines already.

What is version control?

Have you ever worked on a project and ended up having so many versions of your work that you didn't know which one was the latest, and what were the differences between the versions? Does the image below look familiar to you? Then you need to use version control (also called revision control). You will quickly understand that although it is designed primarily for big software development projects, being able to work with version control can be very helpful for scientists as well.

file name

The video below explains some basic concepts of version control and what the benefits of using it are.

What is VCS? (Git-SCM) • Git Basics #1 from GitHub on Vimeo.

So to sum up, version control allows you to keep track of:

  • When you made changes to your files
  • Why you made these changes
  • What you changed

Additionally, version control:

  • Facilitates collaboration with others
  • Allows you to keep your code archived in a safe place (the cloud)
  • Allows you to go back to a previous version of your code
  • Allows you to find out what changes broke your code
  • Allows you to have experimental branches without breaking your code
  • Allows you to keep different versions of your code without having to worry about file names and archiving organization

The three most popular version control systems are Git, Mercurial (abbreviated as hg) and Subversion (abbreviated as svn). Git is by far the most popular one, so we will only use Git in this course.

Git

What git does

Git keeps track of changes in a local repository you set up on your computer. Typically that is a folder that contains all your code and optionally the data your code needs in order to run. The local repository contains all your files, but also (in a hidden folder) all the changes to the files you have made. It does not keep track of all files automatically: you need to tell git which files to track and which not. Therefore a repository contains your current tracked files (workspace), an index of files that are being tracked, and the version history.

Every time you make significant changes to the files in your workspace, you have to add the changed files to the index, which selects the files whose changes you want to save, and commit them, which means saving the changes to the history tracking of your local repository.

Often you also set up a remote repository, stored on an online platform like GitHub, GitLab or others. It is simply a remotely-hosted mirror of your local repository and allows you to have your work stored in a safe place and accessible from your other computers and to potential collaborators. Once in a while (at the end of the day, or after every new commit if you want) you can push your commits, which means sending them to the remote repository so it stays in sync with your local one. When you want to update your local repository based on the content of a remote repository, you have to pull the commits from the remote repository.

Summary of git semantics

  • add: Tell git that you want a file or changes to be tracked. These files/changes are not yet saved in the repository! They are listed as "staged" in the index or staging area for the next commit.
  • commit: Save the staged changes to your local repository. This is like putting a milestone or taking a snapshot of your project at that moment. A commit describes what has been changed, why and when. In the future you can always revert all tracked files to the state they were at when you created the commit.
  • push: Send previous changes you committed to the local repository to the remote repository.
  • pull: Update your local repository (and your workspace) with all new stuff from the remote repository. This command is simple, but it merges the remote changes into your files straight away, which can cause merge conflicts if the same files were also changed locally. Hence it is not available in the Git GUI.
    • fetch: Get information about the latest commits from the remote repository, but do not apply them to your local repository automatically. This is always safe as it does not change your workspace.
    • merge: Merges two versions (branches) into one, applying the result to the workspace. This includes merging commits from the remote repository with the commits of the local repository. In effect, a fetch followed by a merge is the same as a pull, but it allows you more fine-grained control and is available through the Git GUI.
  • clone: Copy the content of a remote repository locally for the first time.
  • more advanced:
    • branch: Create a branch (a parallel version of the code in the repository)
    • checkout: Load the status of a branch into your workspace
git flows

Setting up a Git project

The first step in using Git is to set up a remote repository. GitHub is the most popular host, but in order to facilitate the assignment submission process, we will use the GitLab instance of Wageningen UR throughout this course. Note that the university instance is the same as what you can find on the GitLab website, except that it is managed by the university's IT department (so you do not need to register) and you may choose to make projects visible only to others from the university. If you are taking the course externally, you can use the public GitLab instance or GitHub.

Account setup

  1. Log into GitLab

Go to WUR GitLab and log in. The username is your WUR email address and the password is your WUR account password.

  2. Launch Git GUI

Launch a program called Git GUI from your start menu (or command line: git gui). Git GUI is a graphical interface to Git that comes with Git itself, and is thus cross-platform and always available. When launched, it looks something like this:

Main screen of Git GUI

  3. Create an SSH key pair

In order for GitLab (and other services) to identify that the machine connecting to it is indeed owned by you, there are two options: using a password, or using an SSH key. SSH keys are much more secure than passwords, and they don't require you to enter a password every time you try to communicate with the server. Therefore throughout the course we will use SSH keys.

You can generate a new SSH key pair in Git GUI by going to Help → Show SSH Key and pressing the Generate Key button. It will ask you for a passphrase. This is not a password and is completely optional: it is useful in case your SSH key is stolen, for instance by a thief stealing your laptop or by a virus. However, SSH keys are specific to each machine and the private key is never sent over the network, so most of the time it is completely fine to leave the passphrase empty. If you keep it empty, you will not need to enter it every time you try to push your changes, yet the connection will be even more secure than when using a password.

Once done, you will see your new public key:

SSH public key generated

  4. Enrol the public key to your user account

The SSH key pair is used to identify that you own the machine. On WUR Windows PCs, the keys are stored on your M: drive, so they will follow you on any computer in the university. On WUR Linux virtual desktop instances (VDIs) via MyWorkspace/VMware Horizon, the keys are stored on your personal VDI, so you can also access them from anywhere.

Now you need to tell GitLab about your new key. To do that, copy the public key from the dialog, then in GitLab click on your avatar in the top right and go to Settings → SSH keys. Give it a title describing your machine, paste the public key in the box, and press Add key. You might need to confirm the key by email for added security.

This only has to be done once (per machine/OS you use GitLab on).

Creating a new project

  1. Create remote repository

Now we are ready to start making new repositories for our projects! In GitLab, press the New... button ("⊞" button at the top, to the right) and select New project. Give it a descriptive name and a short description (leave Create from template as Blank), and choose the visibility of the project.

New project creation on GitLab

If you were to do this on GitHub, you would also be asked to provide a license for your code. That is a good idea in general, as choosing a license is crucial to let others know what you allow them to do with your code. Code without a license is under exclusive copyright by default, and thus nobody is allowed to make use of your code or contribute to it. For real projects, you will want to set a more permissive license so that others can make use of your code. See Choose a License for a quick overview of what licenses are available.

  2. Configure project settings

Explore your new blank project a bit. On the left sidebar, you can see that the project can have issues and merge requests assigned to it. Issues are what will be used to review your work, and what you will need to use to review the work of others, so try to make a few issues and close them.

Next, check out the project settings. Under the Members tab of Settings, you can invite other people to collaborate on your project. Go ahead and invite your team member (and give them the Master role).

Example issue on GitLab

  3. Clone your new repository

Now that you have a remote repository, it's time to create a local repository that links to it! Copy the SSH address of your new repository by copying the URL in the SSH box (if it's showing HTTPS, click it and switch to SSH).

Blank GitLab repository

Go back to Git GUI, and press Clone Existing Repository. Paste the URL you just copied into the Source Location field, and in the Target Directory field choose the folder in which you want to store your code. Note: the Target Directory must not already exist! Git GUI will create it for you.

Once you click Clone, you will get a question about whether you trust the remote machine (if you ran git gui from a command line, it will appear stuck, but actually the question will appear in the terminal and you need to answer it there). You need to answer this with yes (the full word). This puts the GitLab server into a list of trusted servers, to guard against potential impostor servers.

You will end up in an empty Git GUI window:

Git GUI in an empty directory

Working with Git GUI

  1. Make changes

To see Git in action, you need to make some changes in your repository. Try it by creating a new R script file in the directory where you cloned your new project.

Once you are done, go back to Git GUI. If you closed the window, you can get back to your repository by launching Git GUI and clicking on its path in the Open Recent Repository list. If you did not close it, click the Rescan button. You will see some changes:

Changes pending in Git GUI

At the top left corner, in the Unstaged Changes panel, you can see all the files that changed in your workspace. If you click on the name of a file, the main panel will show you what changed since the last commit, unless it is a non-text (data) file, in which case it will just note that something has changed. Note: Git is very efficient with text files: it can show line-by-line differences between versions and compresses their history well. However, it does not deal efficiently with non-text files, and thus you should limit the amount and size of such files as much as possible.

If you click on the icon of the file in the Unstaged Changes panel, the file changes will be staged and appear in the Staged Changes (Will Commit) panel. These are the file changes you want to save and send to GitLab. You don't have to stage all files for each commit, only those you actually want to be tracked by git. You can safely ignore some files such as manual backups, temporary files, and the like, and they will remain untracked by git as long as you never stage them. If you do want to stage everything, you can press the Stage Changed button. If you staged more than you wanted to, you can click on the file icon in the Staged Changes panel to unstage it.

Remember: clicking the name of the file shows the changes you made, clicking the icon of the file stages or unstages the change!

  2. Commit changes

Once you staged the files that you want to commit, you need to fill out the commit message. This is a brief description of what changes you made between the last commit and the one you are about to create. The first line you enter is the name of the commit, keep that one short. Subsequent lines are the description. You may notice that the Commit message box does not have a horizontal scrollbar: that is intentional, because your commit message should fit within that box without the need for scrolling. Use new lines to break the text.

If it is the first time you use Git GUI to make a commit, it might complain about not knowing who you are. You should go to Edit → Options... and fill out the Global (All Repositories) options User Name and Email Address. These will be displayed on GitLab.

Next press the Commit button and your commit will be saved locally. A commit is like a saved state: you are always able to roll back the contents of your tracked files to the state they were in when you committed the changes.

In the case you made a mistake (a mistake in the message, forgot to stage something, etc.), you can press the Amend Last Commit button and get right back to where you were when you made the last commit; but use this functionality very sparingly, as it does not work with changes that have already been pushed to GitLab.

  3. Push changes to the server

Press the Push button, and confirm the push, to send all your changes to your GitLab repository. You can now refresh the GitLab page to see your changes. Well done!

GitLab repository with content

  4. Pull changes from the server

One of the major uses of Git is collaboration and the ability to synchronise changes across different devices. Multiple users can make changes in the same Git repository (as long as you change the repository settings in GitLab to allow another user to do that), and you can work on the same code on different devices yourself. In both cases, it is important to keep all local repositories in sync with the remote repository. That is done via Git GUI by using Fetch and Merge. If you like, you can test it by cloning the same repository in another folder, making changes and pushing them to the server, then using fetch in the other copy.

If there are any changes on GitLab that are not in your local copy yet, in Git GUI go to Remote → Fetch from → origin to download all changes. This will not apply them yet, however.

To attempt to apply the changes, go to Merge → Local Merge.... If all goes well, the changes will be applied.

There may be cases where files go out of sync in incompatible ways, however, like two people editing one file at the same time. In that case you may hit a merge conflict. It is best to try to avoid them. In case it happens, you need to go through the conflicting files in a text editor and edit them by hand, keeping the parts of the files you need. The conflicting parts will be in between lines of <<<<<<< and >>>>>>> symbols, with a line of ======= separating the two versions. Once you remove the parts you don't need (including the separators), you can resolve the conflict by committing the changes.

Other Git GUI functionality

Git GUI not only provides a way to make, push and pull commits, but also to visualise the commit history of your repository in a tree graph. Go to Repository → Visualise Master's History to see it. For larger and more complex projects with lots of contributors and merges, it might look like some sort of a subway map:

Git GUI history (gitk)

You might run into a situation where you have made changes in tracked files, but do not want to keep some of the changes. You can revert one file by selecting it in Git GUI, then clicking Commit → Revert changes.

Project structure

Try to keep a consistent structure across your projects, so that it is easier for you to switch from one project to the other and immediately understand how things work. You may use the following structure:

  • A main.R script at the root of the project. This script performs step by step the different operations of your project. It is the only non-generic part of your project (it contains paths, already set variables, etc).
  • An R/ subdirectory: This directory should contain the functions you have defined as part of your project. These functions should be as generic as possible and are sourced and called by the main.R script.
  • A data/ subdirectory: This directory contains the data sets of the project. Since Git is not as efficient with non-text files, and remote hosts such as GitLab have storage limits, you should only put small data sets in that directory (<2-3 MB). These can be shapefiles, small rasters, csv files, but perhaps even better is to use the R archives. R offers two types of archives to store the important variables of the environment, .rda and .rds.
  • An output/ subdirectory (when applicable).
project structure
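The two archive types mentioned in the list above can be sketched as follows. This is only a minimal illustration: the folder and file names (myProject, example.rda, cars.rds) are hypothetical, and a temporary folder is used so that nothing is written into a real project.

```r
# Hypothetical data/ folder inside a temporary project directory
dataDir <- file.path(tempdir(), "myProject", "data")
dir.create(dataDir, recursive = TRUE, showWarnings = FALSE)

x <- 1:10
y <- head(cars)  # cars is a data set that ships with R

# .rda: stores one or more objects together with their names
rdaFile <- file.path(dataDir, "example.rda")
save(x, y, file = rdaFile)
rm(x, y)
load(rdaFile)  # x and y are restored under their original names

# .rds: stores a single object; you choose its name when reading it back
rdsFile <- file.path(dataDir, "cars.rds")
saveRDS(cars, file = rdsFile)
carsCopy <- readRDS(rdsFile)
identical(carsCopy, cars)
```

As a rule of thumb, .rds is convenient when you want to control the object name in your script, while .rda is convenient for bundling several related objects.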

Example main.R file

Typically the header of your main script will look like this:

# John Doe
# January 2017
# Import packages
# Source functions
# Load datasets 

# Then the actual commands

Bigger data

The data/ directory of your project should indeed only contain relatively small data sets. When handling bigger remote sensing data sets, these should stay out of the project, where you store the rest of your data.


  • Create 3 files in your R/ directory (ageCalculator.R, HelloWorld.R and minusRaster.R) into which you will copy and paste the respective functions.
  • Create a main.R script at the root of your project and add some code to it. The content of the main.R in that case could be something as below.
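# Name
# Date

# Import packages
library(raster)
# Source functions
source("R/ageCalculator.R")
source("R/HelloWorld.R")
source("R/minusRaster.R")

# import dataset
r <- raster(system.file("external/rlogo.grd", package="raster")) 
r2 <- r 
# Filling the rasterLayer with new values.
r2[] <- (1:ncell(r2)) / 10
# Performs the calculation
r3 <- minusRaster(r, r2) 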

RStudio projects

RStudio has a functionality called projects that allows organising your files a bit better. You may have learned that one of the first things to do when opening an R session is to set your working directory, using the setwd() command. When creating a new project, the working directory is automatically set to the root of the project, where your main.R is located. When working with RStudio projects you should not change the working directory. If you want to access files stored in your data/ subdirectory, simply prepend data/ to the name of the file you want to load.
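A minimal sketch of working with paths relative to the project root is given below. The project folder (demoProject) and file name (cars_subset.csv) are hypothetical, and setwd() is only called here to simulate what RStudio does for you when you open a project.

```r
# Simulate a project root in a temporary folder (hypothetical layout)
projectRoot <- file.path(tempdir(), "demoProject")
dir.create(file.path(projectRoot, "data"), recursive = TRUE, showWarnings = FALSE)
oldWd <- setwd(projectRoot)  # RStudio sets this automatically for a project

# Write and read a file using a path relative to the project root
write.csv(head(cars), file = file.path("data", "cars_subset.csv"), row.names = FALSE)
carsSubset <- read.csv(file.path("data", "cars_subset.csv"))

setwd(oldWd)  # restore the previous working directory
```

Using file.path() rather than pasting strings together keeps the path separators correct on every operating system.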

Note: RStudio projects are specific to RStudio and are not usable with base R or with other R IDEs. The use of RStudio projects is optional and is merely for convenience. When using other IDEs you can assume the user will set the working directory to where the script is located.

RStudio itself has integration with Git, and when creating a project there is an option to make it a git repository as well. However, in this lesson we will not be using this method, since it is specific to RStudio and does not work for Python or in other R IDEs. Git GUI, in contrast, is language-agnostic and standalone. So if you do create a project with RStudio, create it inside your cloned Git repository and do not select the option to create a new git repository, then use Git GUI to handle all the changes you make in the repository. This will save you confusion about how to handle it when we come to the Python lessons.

Finding help

Sources for help

The most important helper is the R documentation. In the R console, just enter ?function or help(function) to get the manual page of the function you are interested in.
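A few ways of reaching the documentation from the console are sketched below, using the base function mean() purely as an example topic. Besides ? and help(), the sketch uses help.search(), apropos() and example(), which are also part of a standard R installation.

```r
# Open the manual page of a function (two equivalent forms)
?mean
help("mean")

# Search the installed documentation when you do not know the function name
help.search("standard deviation")

# List objects whose names match a pattern
meanFunctions <- apropos("^mean")
meanFunctions

# Run the examples given at the bottom of a help page
example("mean")
```

The Examples section at the end of a help page is often the quickest way to understand how a function is meant to be used.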

There are many places where help can be found on the internet. So in case the function or package documentation is not sufficient for what you are trying to achieve, a search engine like Google is your best friend. Most likely by searching the right key words relating to your problem, the search engine will direct you to the archive of the R mailing list, or to some discussions on Stack Exchange. These two are reliable sources of information, and it is quite likely that the problem you are trying to figure out has already been answered before.

However, it may also happen that you discover a bug or something that you would qualify as abnormal behavior, or that you really have a question that no one has ever asked (corollary: has never been answered). In that case, you may submit a question to one of the R mailing lists. For general R questions there is a general R mailing list, while the spatial domain has its own mailing list (R SIG GEO). Geo related questions should be posted to this latter mailing list.

Note: these mailing lists have heavy mail traffic. Use your mail client efficiently and set filters, otherwise the volume will quickly become a nuisance.

These mailing lists have a few rules, and it's important to respect them in order to ensure that:

  • no one gets offended by your question,
  • people who are able to answer the question are actually willing to do so,
  • you get the best quality answer.

So, when posting to the mail list:

  • Be courteous.
  • Provide a brief description of the problem and why you are trying to do that.
  • Provide a reproducible example that illustrates the problem, reproducing the error if there is one.
  • Sign with your name and your affiliation.
  • Do not expect an immediate answer (although well presented questions often get answered fairly quickly).

Reproducible examples

Indispensable when asking a question to the online community, being able to write a reproducible example has many advantages:

  • It may ensure that when you present a problem, people are able to answer your question without guessing what you are trying to do.
  • Reproducible examples are not only to ask questions; they may help you in your thinking, developing or debugging process when writing your own functions.
    • For instance, when developing a function to do a certain type of raster calculation, start by testing it on a small auto-generated RasterLayer object, and not directly on your actual data that might be covering the whole world.

Example of a reproducible example

Well, one could define a reproducible example by:

  • A piece of code that can be executed by anyone who has R, independently of the data present on their machine or any preloaded variables.
  • The computation time should not exceed a few seconds and if the code automatically downloads data, the data volume should be as small as possible.

So basically, if you can quickly start an R session on your neighbour's computer while they are on a break, copy-paste the code without making any adjustments and see almost immediately what you want to demonstrate: congratulations, you have created a reproducible example.

Let's illustrate this with an example. I want to perform value replacements of one raster layer, based on the values of another raster layer. (We haven't covered raster analysis in R as part of the course yet, but you will quickly understand that for certain operations rasters are analogous to vectors of values.)

## Create two RasterLayer objects of similar extent
library(raster)
## Loading required package: sp
r <- s <- raster(ncol=50, nrow=50)
## Fill the raster with values
r[] <- 1:ncell(r)
s[] <- 2 * (1:ncell(s))
s[200:400] <- 150
s[50:150] <- 151
## Perform the replacement
r[s %in% c(150, 151)] <- NA
## Visualise the result
plot(r)

Useful to know when writing a reproducible example: instead of generating your own small data sets (vectors or RasterLayers, etc.) as part of your reproducible example, you can use some of the data sets that ship with R and its main packages. Some popular data sets are: cars, meuse.grid_ll, Rlogo, iris, etc. The auto-completion menu of the data() function will give you an overview of the data sets available. (In most script editing environments, including the R console and RStudio, auto-completion can be invoked by pressing the tab key; use it without moderation.)

## Import the variable "cars" in the working environment
data(cars)
class(cars)
## [1] "data.frame"
## Visualise the first six rows of the variable
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
# The plot function on this type of dataset (class = data.frame, 2 column)
# automatically generates a scatterplot
plot(cars)

Another famous data set is the meuse data set, providing all sorts of spatial variables spread across a part of the Meuse watershed. The following example is compiled from the help pages of the sp package.

## Example using built-in dataset from the sp package
library(sp)
## Load required datasets
data(meuse)
# The meuse dataset is not by default a spatial object
# but its x, y coordinates are part of the data.frame
class(meuse)
## [1] "data.frame"
coordinates(meuse) <- c("x", "y")
class(meuse)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Now that the object belongs to a spatial class, we can plot it using one of the vector plotting functions of the sp package. See the result in the figure below.

bubble(meuse, "zinc", maxsize = 2.5,
       main = "zinc concentrations (ppm)", key.entries = 2^(-1:4))

The sp package help page contains multiple examples of how to explore its meuse built-in data set. Another example of multiple plots using meuse.grid is given in the figure below.

## Load meuse.riv dataset
data(meuse.riv)
## Create an object of class SpatialPolygons from meuse.riv
meuse.sr <- SpatialPolygons(list(Polygons(list(Polygon(meuse.riv)),"meuse.riv")))
## Load the meuse.grid dataset
data(meuse.grid)
## Assign coordinates to the dataset and make it a grid
coordinates(meuse.grid) = c("x", "y")
gridded(meuse.grid) = TRUE
## Plot all variables of the meuse.grid dataset in a multiple window spplot
spplot(meuse.grid, col.regions=bpy.colors(), main = "meuse.grid",
       sp.layout=list(
           list("sp.polygons", meuse.sr),
           list("sp.points", meuse, pch="+", col="black")
       )
)

Good scripting/programming habits

Increasing your scripting/programming efficiency comes from adopting good scripting habits. Following a couple of guidelines will ensure that your work:

  • Can be understood and used by others.
  • Can be understood and reused by you in the future.
  • Can be debugged with minimal effort.
  • Can be re-used across different projects.
  • Is easily accessible by others.

In order to achieve these objectives, you should try to follow a few good practices. The list below is not exhaustive, but already constitutes a good basis that will help you become more efficient now and in the future when working on R projects.

  • Comment your code.
  • Write functions for code you need more than once:
    • Make your functions generic and flexible, using control flow.
    • Document your functions.
  • Follow an R style guide. This will make your code more readable! Most important are:
    • Meaningful and consistent naming of files, functions, variables...
    • Indentation (like in Python: use spaces or tab to indent code in functions or loops etc.).
    • Consistent use of the assignment operator: either <- or = in all your code. The former is used by core R and allows assigning in function calls, the latter is shorter and consistent with most other programming languages.
    • Consistent placement of curly braces.
  • Make your own packages.
  • Work with projects.
  • Keep a similar directory structure across your projects.
  • Use version control to develop/maintain your projects and packages.
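The remark in the list above that <- "allows assigning in function calls" can be illustrated with a small sketch. The variable names (values, med1, med2) are made up for the illustration.

```r
# '<-' inside a call assigns the vector AND passes it as the first argument
med1 <- median(values <- c(3, 1, 2))
values  # 'values' now exists in the workspace

# '=' inside a call only matches the argument named 'x';
# no variable is created by this call
med2 <- median(x = c(30, 10, 20))
```

Whichever operator you prefer for ordinary assignments, the point of the guideline is to use it consistently throughout your code.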

Note that R IDEs like RStudio make a lot of these good practices a lot easier and you should try to take maximum advantage of them. Take a moment to explore the menus of the RStudio session that should already be open on your machine. Particular emphasis is given in this tutorial to projects, project structure and use of version control.

Below is an example of the same function written with and without good practices. First the good example:

ageCalculator <- function(x) {
    # Function to calculate age from birth year
    # x (numeric) is the year you were born
    if(!is.numeric(x)) {
        stop("x must be of class numeric")
    } else { # x is numeric
        # Get today's date
        date <- Sys.Date()
        # extract year from date and subtract
        year <- as.numeric(format(date, "%Y"))
        if(year <= x) {
            stop("You aren't born yet")
        }
        age <- year - x
    }
    return(age)
}

# Example call; the result depends on the year in which you run it
ageCalculator(1985)
## [1] 33

33, what a beautiful age for learning geo-scripting.

Then the bad example:

funTest_4 <- function(x) {
if( !is.numeric(x))
stop("x must be of class numeric"  )
else {
a = Sys.Date()
b<- as.numeric( format( a,"%Y"))
b-x
}
}

funTest_4(1985)
## [1] 33

It also works, but which of the two is the easiest to read, understand, and modify if needed? ... Exactly, the first one. So let's look back at the examples and identify some differences:

  • Function name: Not very self descriptive in the second example.
  • Function description: Missing in the second example.
  • Arguments description: Missing in the second example.
  • Comments: The second example has none (okay, the first one really has a lot, but that's for the example).
  • Variables naming: use of a and b not very self descriptive in second example.
  • Indentation: Missing in the second example.
  • Control flow: Second example does not check for implausible dates.
  • Consistency: Second example uses spaces, assignment operators and curly braces inconsistently.

You haven't fully understood what control flow is or you are not fully comfortable with function writing yet? We'll see more of that in the following sections.

Function writing

A function is a sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. -Wikipedia

The objective of this section is to provide some help on effective function writing. That is functions that are:

  • simple,
  • generic, and
  • flexible.

They should integrate well in a processing/analysis chain and be easily re-used in a slightly different chain if needed. More flexibility in your function can be achieved through some easy control flow tricks. The following section develops this concept and provides examples.

Control flow

Control flow refers to the use of conditions in your code that redirect the flow in different directions depending on the values or classes of variables. Make use of that in your code, as this will make your functions more flexible and generic.
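As a minimal sketch of this idea, the hypothetical function below (describeObject is not part of any package) follows a different branch depending on the class of its input.

```r
# Hypothetical helper: describe an object differently depending on its class
describeObject <- function(x) {
    if (is.numeric(x)) {
        # Numeric input: report the mean
        out <- sprintf("numeric with mean %.2f", mean(x))
    } else if (is.character(x)) {
        # Character input: report the number of elements
        out <- sprintf("character with %d element(s)", length(x))
    } else {
        # Any other class: report the class name(s)
        out <- sprintf("object of class %s", paste(class(x), collapse = "/"))
    }
    return(out)
}

describeObject(c(1, 2, 3))
describeObject("hello")
describeObject(TRUE)
```

The final else branch acts as a catch-all, so the function always returns something meaningful instead of failing on unexpected input.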

Object classes and Control flow

You have seen in a previous lesson already that every variable in your R working environment belongs to a class. You can take advantage of that, using control flow, to make your functions more flexible.

A quick reminder on classes:

library(raster)

# 5 different objects belonging to 5 different classes
a <- 12
class(a)
## [1] "numeric"
b <- "I have a class too"
class(b)
## [1] "character"
c <- raster(ncol=10, nrow=10)
class(c)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
d <- stack(c, c)
class(d)
## [1] "RasterStack"
## attr(,"package")
## [1] "raster"
e <- brick(d)
class(e)
## [1] "RasterBrick"
## attr(,"package")
## [1] "raster"

Controlling the class of input variables of a function

One way of making functions more adaptive is to add checks on the input variables. Using the object class can greatly simplify this task. For example, let's imagine that you just wrote a simple Hello World function.

HelloWorld <- function (x) {
    hello <- sprintf('Hello %s', x)
    return(hello)
}

# Let's test it
HelloWorld('john')
## [1] "Hello john"

Obviously, the user is expected to pass a character vector to x. With any other input, the function may fail or return something unintended (a number, for instance, is silently converted to text). You can make the function handle such cases gracefully and print an informative message by checking the class of the input variable. For example:

HelloWorld <- function (x) {
    if (is.character(x)) {
      hello <- sprintf('Hello %s', x)
    } else {
      hello <- warning('Object of class character expected for x')
    }
    return(hello)
}

HelloWorld(21)
## Warning in HelloWorld(21): Object of class character expected for x
## [1] "Object of class character expected for x"

The function does not crash anymore, but returns a warning instead.
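
If a wrong input should halt the computation rather than only warn, base R's stop() can be used in the same place as warning(). The sketch below uses a hypothetical strict variant, HelloWorldStrict(), which is not part of the lesson. The failing call is wrapped in try() (covered later in this lesson) only so that the script itself keeps running.

```r
# Hypothetical strict variant: stop() raises an error instead of a warning
HelloWorldStrict <- function(x) {
    if (!is.character(x)) {
        stop('Object of class character expected for x')
    }
    sprintf('Hello %s', x)
}

HelloWorldStrict('john')

# try() keeps this script alive; a failed call yields class "try-error"
res <- try(HelloWorldStrict(21), silent=TRUE)
class(res)
```

Whether a warning or an error is more appropriate depends on whether the rest of the processing chain can do anything sensible with the returned value.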

Note that most common object classes have their own logical function, that returns TRUE or FALSE. For example.

is.character('john')
## [1] TRUE
# is equivalent to 
class('john') == 'character'
## [1] TRUE
is.character(32)
## [1] FALSE
is.numeric(32)
## [1] TRUE

You should always try to take maximum advantage of these small utilities and check for classes and properties of your objects.

Also note that is.character(32) == TRUE is equivalent to is.character(32). Therefore when checking logical arguments, you don't need to use the == TRUE. As an example, a function may have an argument (say, plot) that, if set to TRUE will generate a plot, and if set to FALSE does not generate a plot. It means that the function certainly contains an if statement. if(plot) in that case is equivalent to if(plot == TRUE), it's just shorter (and very slightly faster).
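
A small sketch of such a logical switch, using a hypothetical function report() with an argument named verbose instead of plot, so that no graphics device is needed:

```r
# Hypothetical function with a logical switch argument
report <- function(x, verbose=FALSE) {
    total <- sum(x)
    if (verbose) {    # same meaning as if (verbose == TRUE)
        message('Sum of ', length(x), ' values: ', total)
    }
    return(total)
}

# Silent by default
report(1:4)
# Prints an extra message, returns the same value
report(1:4, verbose=TRUE)
```

Giving the switch a default value (FALSE here) means that users who do not care about the option never have to mention it.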

An example, with a function that subtracts 2 RasterLayers, with the option to plot the resulting RasterLayer, or not.

## Function to subtract 2 RasterLayers
minusRaster <- function(x, y, plot=FALSE) { 
    z <- x - y
    if (plot) {
        plot(z)
    }
    return(z)
}

# Let's generate 2 rasters 
# The first one is the R logo raster
# converted to the raster package file format.
r <- raster(system.file("external/rlogo.grd", package="raster")) 
# The second RasterLayer is derived from the initial RasterLayer in order
# to avoid issues of non matching extent or resolution, etc
r2 <- r
## Filling the rasterLayer with new values
# The /10 simply makes the result more spectacular
r2[] <- (1:ncell(r2)) / 10
## Simply performs the calculation
r3 <- minusRaster(r, r2) 
## Now performs the calculation and plots the resulting RasterLayer
r4 <- minusRaster(r, r2, plot=TRUE) 

try and debugging

Use of try for error handling

The try() function may help you write functions that do not stop with a cryptic error whenever they encounter unexpected input. Anything (sub-function, piece of code) that is wrapped in try() will not interrupt the bigger function that contains it. For instance, this is useful if you want to apply a function sequentially but independently over a large set of raster files, and you already know that some of the files are corrupted and might return an error. By wrapping your function in try() you allow the overall process to continue until its end, regardless of the success of individual layers. So try() is a convenient way to deal with heterogeneous/unpredictable input data.

Also, try() returns an object of a different class (try-error) when it fails. You can take advantage of that at a later stage of your processing chain to make your function more adaptive. See the example below, which illustrates the use of try() for sequentially calculating frequency on a list of auto-generated RasterLayers.


## Create a raster layer and fill it with "randomly" generated integer values
a <- raster(nrow=50, ncol=50)
a[] <- floor(rnorm(n=ncell(a)))

# The freq() function returns the frequency of a certain value in a RasterLayer
# We want to know how many times the value -2 is present in the RasterLayer
freq(a, value=-2)
## [1] 362
# Let's imagine that you want to run this function over a whole list of RasterLayer
# but some elements of the list are unpredictably corrupted
# so the list looks as follows
b <- a
c <- NA
list <- c(a,b,c)
# In that case, b and a are raster layers, c is ''corrupted''
## Running freq(c) would return an error and stop the whole process
out <- list()
for(i in 1:length(list)) {
    out[i] <- freq(list[[i]], value=-2)
}
## If you wrap the call in a try(), you still get an error, but it's non-fatal
out <- list()
for(i in 1:length(list)) {
    out[i] <- try(freq(list[[i]], value=-2))
}
# By building a function that includes a try()
# we are able to catch the error without having it printed,
# allowing the process to handle the error gracefully.
fun <- function(x, value) {
    tr <- try(freq(x=x, value=value), silent=TRUE)
    # inherits() is a robust way to test for a failed try()
    if (inherits(tr, 'try-error')) {
        return('This object returned an error')
    } else {
        return(tr)
    }
}

## Let's try to run the loop again
out <- list()
for(i in 1:length(list)) {
    out[i] <- fun(list[[i]], value=-2)
}
out
## [[1]]
## [1] 362
## [[2]]
## [1] 362
## [[3]]
## [1] "This object returned an error"
# Note that using a function of the apply family would be a more
# elegant/shorter way to obtain the same result
(out <- sapply(X=list, FUN=fun, value=-2))
## [1] "362"                           "362"                          
## [3] "This object returned an error"
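
Note that in the last output sapply() simplified the result to a character vector, so the counts became the strings "362". The sketch below, in base R only, shows one way to keep the results numeric. The names safeLog() and inputs are hypothetical: log() stands in for freq(), and a character string stands in for the corrupted layer.

```r
# Hypothetical stand-in for fun(): log() fails on non-numeric input
safeLog <- function(x) {
    tr <- try(log(x), silent=TRUE)
    if (inherits(tr, 'try-error')) {
        return(NA_real_)   # numeric missing value instead of a message
    }
    return(tr)
}

inputs <- list(1, exp(2), 'corrupted')

# lapply() always returns a list, so no coercion takes place
res <- lapply(inputs, safeLog)
sapply(res, class)

# vapply() simplifies to a vector but enforces the declared type
vapply(inputs, safeLog, FUN.VALUE=numeric(1))
```

Returning NA for failed elements keeps the output usable in later numeric steps, at the cost of losing the explanatory message.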

Function debugging

Debugging a single line of code is usually relatively easy; simply double checking the classes of all input arguments often gives good pointers to why the line crashes. But when writing more complicated functions where objects created within the function are reused later on in that same function or in a nested function, it is easy to lose track of what is happening, and debugging can then become a nightmare. A few tricks can be used to make that process less painful.

traceback() and debugonce()

Here are the manual commands, which also work with RStudio and other IDEs:

  • The first thing to do right after an error occurs is to run the traceback() function; just like that, without arguments.
  • Carefully reading the return of that function will tell you where exactly in your function the error occurred.

foo <- function(x) {
    x <- x + 2
    print(x)
    bar(2)
}

bar <- function(x) { 
    x <- x + a.variable.which.does.not.exist 
}

foo(2)
## gives an error

traceback()
## 2: bar(2) at #1
## 1: foo(2)
# Ah, bar() is the problem

# Debug it by declaring what to debug and running it
debugonce(bar)
foo(2)

Depending on the IDE you are using, you may be presented with tools for stepping through the function line by line, as well as a Browse console, which allows you to query the state of the variables involved so that you can identify exactly what is going on in the function call. For instance, in RKWard, the Debugging Frames pane on the right shows which line you are stepping through.

For another example see: rfunction.com.
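
When no interactive session is available (for example in a batch script), part of this information can also be collected programmatically. The sketch below is an assumption-laden illustration and not the lesson's method: it uses base R's tryCatch() to capture the error object, and outerFun() and innerFun() are hypothetical names mirroring the foo()/bar() pattern.

```r
# Hypothetical functions mirroring the foo()/bar() pattern
outerFun <- function(x) {
    innerFun(x + 2)
}
innerFun <- function(x) {
    x + a.variable.which.does.not.exist
}

# Capture the error object instead of letting it stop the script
err <- tryCatch(outerFun(2), error=function(e) e)

inherits(err, 'error')
conditionMessage(err)        # the error message as a character string
deparse(conditionCall(err))  # the call in which the error was raised
```

This does not replace traceback() or debugonce(), which give far more context, but it can be handy for logging errors from long-running jobs.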


RStudio has integration with the debugging tools in R, so you can use a point-and-click interface. However, some parts of it are specific to the RStudio IDE.

  • To force RStudio to catch every error, select Debug - On Error - Break in Code in the main menu.
  • Run foo(2) again.
  • RStudio will stop the execution where the error happened. The traceback appears in a separate pane on the right.
  • You can use the little green "Next" button to go line by line through the code, or the red Stop button to leave the debugging mode.
  • Reset the On Error behaviour to Error Inspector. In this default setting, RStudio will try to decide whether the error is complex enough for debugging, and then offer the options to "traceback" or "rerun the code with debugging" with two buttons in the console.

Finally, solve the problem:

## redefine bar
bar <- function(x) {
    x + 5
}
foo(2)
## [1] 4
## [1] 7

Refer to the reference section of this document for further information on function debugging.

(optional) Writing packages

The next step towards re-usable code is packaging it, so others can simply install and use it. If you have followed the steps up to here, this step is not a big one anymore! For this course, it is optional. Find instructions here and in the references below.

Exercise 3

Your task

Create an RStudio project, with git version control. The project should contain a simple function to calculate whether or not a year is a leap year. Use control flow, and provide some examples of how the function works in main.R. The function should behave as follows:

> is.leap(2000)
[1] TRUE

> is.leap(1581)
[1] "1581 is out of the valid range"

> is.leap(2002)
[1] FALSE

> is.leap('john') #should return an error 
Error: argument of class numeric expected

Useful resources


Assessment will consider whether the function works as intended, but also its readability and completeness (try as much as possible to use all good practices mentioned in this lecture). The structure of your project and the appropriate use of git will also be assessed.

How to submit?

Create a private GitLab project with the name Geoscripting-Exercise<id>-<teamname>, where <id> is the number of the exercise (3 in today's case) and <teamname> is the name of your team. In the Members section of the project, add the staff member(s) responsible for checking your exercises (in 2018 it is @swink019 and @almei006, see Blackboard for any updates) as members of the project and grant them "Master" privileges. Finish the exercise by 17:15. The staff will check it and publish your answers on the student group Geoscripting<year> on GitLab after the deadline.

You will need to give the team you are reviewing feedback on their exercise solution before 11:00 the next day (use the review team generator Shiny app to know who you are reviewing and who you are reviewed by). Answers from other groups will be available on the Geoscripting<year> group on GitLab. For reviewing other teams' answers:

  • Clone the repository of the team you have to review to your computer and test it.
  • Add an issue to their project and write out your review. Make sure to mention your team name in the review.

This is the way the exercises will need to be submitted and reviewed from this lesson on.