Week 1, Lesson 1: Introduction to the geo-scripting course

10 January, 2018

Good morning! Here is what you will do today:

Time                    Activity
Morning                 Self-study: go through the tutorial below and answer the questions mentioned in bold within the self-study section.
13:30 to 14:30          Course introduction, planning and team selection (C3020, ORION).
Rest of the afternoon   Do/finalise the exercise.

To Do in the morning: self-study

You need to complete the following before 13:00 today:

  • See information provided via blackboard in the section course description: lesson 1!

In the morning you can work from home or come to rooms PC8020 and PC8045 in ORION. You need to have RStudio Desktop (Free) and R (Free) installed (both are already available in the PC lab). For more information about R and RStudio installation, see Geo-scripting system set-up.

If you have questions you can use the Question and Answer Website of the Geo-scripting course. Try to ask or answer a question today. We will explain how this can be used during this course at 13:30 today.

Objective and Learning outcomes of today:

  • Refreshing R skills and assessing scripting skills to see if you have the necessary scripting skills to continue with the course
  • Being able to write a function
  • Knowing how to visualise data (a spatial map) in R

Basic R and RStudio setup

This preliminary section covers some basic details about R. For this course we will use RStudio to write and run scripts. If you are working on your own computer and would like to know how to install R and RStudio, see https://geoscripting-wur.github.io/system_setup/.

Getting started with RStudio and R on your own computer.

A summary YouTube video about how to use RStudio and how to write a function is here: Intro to RStudio. Watch it if you do not know how to work with RStudio, and then do the following section. See the following tutorial for a short introduction to the RStudio interface.

Now open RStudio and type the following script in the R console of RStudio:

rm(list = ls()) # Clear the workspace!
ls() # No objects left in the workspace

Clearing the workspace like this is a good way to start most R scripts.

a <- 1
a
## [1] 1

The first line you passed to the console created a new object named a in memory. The symbol '<-' is the assignment operator. It is somewhat equivalent to an equal sign but is recommended, as it is the form used internally. In the second line you printed a to the console by simply typing its name.

What is the class of this object?

class(a)
## [1] "numeric"

You have now requested the class attribute of a and the console has returned the attribute: numeric. R possesses a simple mechanism to support an object-oriented style of programming. All objects (a in this case) have a class attribute assigned to them. R is quite forgiving and will assign a class to an object even if you haven't specified one (as you didn't in this case). Classes are a very important feature of the R environment. Any function or method that is applied to an object takes into account its class and uses this information to determine the correct course of action.
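To make the role of the class attribute concrete, here is a minimal sketch with made-up object names (num, fac and flag are not part of the lesson). It shows that the same call to summary() behaves differently depending on the class of its argument.

```r
# Illustrative objects (the names are arbitrary)
num  <- c(1.5, 2, 10)
fac  <- factor(c("a", "b", "a"))
flag <- TRUE

class(num)   # "numeric"
class(fac)   # "factor"
class(flag)  # "logical"

# summary() chooses what to do based on the class of its argument
summary(num)  # minimum, quartiles, mean and maximum
summary(fac)  # number of elements per level
```

The numeric vector is summarised with descriptive statistics, while the factor is summarised as counts per level.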

Set Your Working Directory

Let's do some basic set up first.

  • Create a folder which will be your working directory e.g. Lesson1
  • Create an R script within that folder
  • Set your working directory to the Lesson1 folder
  • Create a data folder within your working directory

In the code block below type in the file path to where your data is being held and then (if you want) use the setwd() (set working directory) command to give R a default location to look for data files.

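## setwd() sets the working directory (where R looks for files)
# setwd("path/to/Lesson1") # Placeholder: replace with the path to your own folder
getwd() # Double check your working directory
datdir <- file.path("data") # Path to the data folder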
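As a small sketch of why a data directory variable is handy: file.path() can combine it with a file name into a path that is relative to the working directory. The file name example.csv is hypothetical and only serves as an illustration.

```r
datdir <- file.path("data")

# Build a path relative to the working directory
# ("example.csv" is a hypothetical file name)
infile <- file.path(datdir, "example.csv")
infile
## [1] "data/example.csv"

# Check whether the folder and the file are actually there
dir.exists(datdir)
file.exists(infile)
```

Because the path is relative, the same script keeps working when the whole Lesson1 folder is moved to another computer.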

Basic R knowledge useful for Geo-Scripting

Scripting with R to handle spatial problems requires a core set of basic R skills. To prepare you well for the coming weeks, we've summarised below what we think are the important skills for geo-scripting.

Vector handling and vector arithmetic

In a following lesson you'll be introduced to the raster package and the raster classes that have been developed as part of that package. Objects that belong to the raster classes can be thought of as vectors with a few more spatial attributes.

In R we call a collection of numbers a vector.

As a consequence, handling vectors is a crucial skill for processing raster data in R.

Reference manual for vector handling and vector arithmetic can be found here.

## Create a vector
a <- c(3,6,8,1)
a
## [1] 3 6 8 1
## Mathematical operations are applied to every element of a vector
(b <- a * 2)
## [1]  6 12 16  2
(d <- a + 6)
## [1]  9 12 14  7
## Two vectors of the same length can also be added to each other
(e <- a + b)
## [1]  9 18 24  3
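A related behaviour worth knowing, shown here as a short sketch with made-up vectors: when two vectors differ in length, R recycles the shorter one. This is also what happens when a single number is added to a vector, as in a + 6.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# The shorter vector is repeated until it matches the longer one
long + short
## [1] 11 22 13 24

# Comparisons are element-wise too and return a logical vector
long > 2
## [1] FALSE FALSE  TRUE  TRUE
```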

Value replacement

## Values in a vector that satisfy a certain condition can be replaced by other values
a <- c(2,5,2,5,6,9,2,12)

## Values less than or equal to 5 are replaced by 0
a[a <= 5] <- 0
a
## [1]  0  0  0  0  6  9  0 12
## The condition can also be defined using another vector of equal length
a <- c(2,5,2,5,6,9,2,12)
b <- c(1,1,0,1,0,0,1,0)

## Change the values of a based on b values
a[b == 0] <- NA
a
## [1]  2  5 NA  5 NA NA  2 NA

More complex value replacement:

a <- c(2,5,2,5,6,9,2,12)
b <- c(1,1,2,1,0,0,1,2)
## a values at which b is equal to either 0 or 1 are replaced by NA
a[b %in% c(0,1)] <- NA
a
## [1] NA NA  2 NA NA NA NA 12
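Conditions can also be combined. The sketch below uses made-up vectors (temp and valid are illustrative names, not part of the lesson data) to show the & operator, ifelse() and which().

```r
temp  <- c(12, 25, 31, 8, 19)
valid <- c(TRUE, TRUE, FALSE, TRUE, TRUE)

# Combine two conditions with & (and); use | for "or"
temp[temp > 20 & valid] <- 20
temp
## [1] 12 20 31  8 19

# ifelse() returns a new vector instead of modifying the original
label <- ifelse(temp >= 19, "warm", "cold")
label
## [1] "cold" "warm" "warm" "cold" "warm"

# which() gives the positions at which a condition holds
which(!valid)
## [1] 3
```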

Question 1: How do I replace the values in b by 3 wherever a is 6, 9 or 12? (Just note down these questions, as they will be discussed during the discussion session at 13:30 today.)

Character vector handling

When working with real data, such as shapefiles or satellite imagery, the data always needs to be read from files. File names and file paths define the location of a file on a computer hard drive. A great advantage of scripting is that locating, reading and writing data can be automated fairly easily (generating file names automatically, recognising patterns in file names, etc.). This requires some basic string handling skills from the user.

Key functions for handling character strings are listed below:

  • list.files(): to list files in a directory.
  • glob2rx(): to select files using a wildcard. Note: if you know regular expressions, you do not need that function. But most people are more comfortable working with wildcards.
  • paste(), paste0(), sprintf(): useful for combining strings, e.g. to create a file name.
  • strsplit(): to split up a string.
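As an illustration of generating file names automatically, here is a minimal sketch using sprintf(). The naming pattern (ndvi_<year>_tile<number>.tif) is invented for this example.

```r
years <- 2001:2003

# %d inserts an integer; %02d pads it with zeros to two digits
fnames <- sprintf("ndvi_%d_tile%02d.tif", years, 7L)
fnames
## [1] "ndvi_2001_tile07.tif" "ndvi_2002_tile07.tif" "ndvi_2003_tile07.tif"
```

sprintf() is vectorised, so one call produces a file name for every year.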

Example of list.files()

getwd() # check your working directory
list.files() # list the files available in this directory

Question 2: List the directories in your working directory.

Example of glob2rx()

## List all .txt files in working directory
list.files(getwd(), pattern = glob2rx("*.txt"))
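To see what glob2rx() actually does, the sketch below prints the regular expression it produces and applies it to a vector of hypothetical file names rather than to a real directory listing.

```r
# glob2rx() translates a wildcard into a regular expression
glob2rx("*.txt")
## [1] "^.*\\.txt$"

# Hypothetical file names, used here instead of a real directory listing
files <- c("notes.txt", "map.tif", "readme.txt", "txt_backup.csv")
txtfiles <- files[grepl(glob2rx("*.txt"), files)]
txtfiles
## [1] "notes.txt"  "readme.txt"
```

Note that txt_backup.csv is not selected: the pattern only matches names that end in .txt.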

Example of paste() and paste0()

## two handy examples
paste("Today is", date())
## [1] "Today is Wed Jan 10 17:44:55 2018"
paste0("A", 1:6)
## [1] "A1" "A2" "A3" "A4" "A5" "A6"

Question 3: Create one variable containing a sentence "geo-scripting is fun", by combining a, b, and c:

a <- "geo-scripting"
b <- "is"
c <- "fun"

Example of strsplit()

# I have the following character string
name <- 'today_is_friday_12-12-2014'

# I want to extract the date contained in it. I can split the string on the underscores; the fourth element should be the date
date0 <- unlist(strsplit(name, split = '_'))[4]

# Until now it is a character string
class(date0)
## [1] "character"
# It can then be formatted as a date object
(date <- as.Date(date0, format = '%m-%d-%Y'))
## [1] "2014-12-12"
class(date)
## [1] "Date"

Question 4: How do we select friday from the name variable?

See also ?substr, this can be handy too.
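A short sketch of substr(), which extracts characters by position instead of by separator. The file name below is hypothetical.

```r
fname <- "raster_2014.tif"  # hypothetical file name

# Characters 8 to 11 hold the year
year <- substr(fname, start = 8, stop = 11)
year
## [1] "2014"

# The result is still a character string; convert it before doing arithmetic
as.numeric(year) + 1
## [1] 2015

nchar(fname)  # total number of characters
## [1] 15
```

substr() is convenient when file names have a fixed layout; strsplit() is more robust when the parts vary in length.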

Reading and writing data

In the following lessons we will show you how you can read and write different spatial objects (e.g. vector and raster files).

Here is an example of how you can write a text file (i.e. export from R) and read it back (import into R).

The most common way to read in spreadsheet tables is with the read.csv() command. However, you can read in virtually any type of text file. Type ?read.table in your console for some other examples.

# getwd() ## Check your working directory
(test <- c(1:5, "6,7", "8,9,10"))
## [1] "1"      "2"      "3"      "4"      "5"      "6,7"    "8,9,10"
write.csv(test, file = "testing.csv") # Write to your working directory
rm(test) # Remove the variable "test" from the R working environment
ls() # Check the objects in the working environment
## [1] "a"     "b"     "d"     "date"  "date0" "e"     "name"
(test <- read.csv("testing.csv")) # Read from your working directory
##   X      x
## 1 1      1
## 2 2      2
## 3 3      3
## 4 4      4
## 5 5      5
## 6 6    6,7
## 7 7 8,9,10
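The extra column X in the output above holds the row names that write.csv() stores by default. The sketch below, which uses a made-up table and a temporary file, shows a round trip that avoids this with row.names = FALSE.

```r
# A small made-up table; note the comma inside one of the labels
tab <- data.frame(id = 1:3, label = c("a", "b,c", "d"),
                  stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")             # temporary file, removed with the session
write.csv(tab, file = tmp, row.names = FALSE) # do not write the row names
tab2 <- read.csv(tmp, stringsAsFactors = FALSE)
tab2
##   id label
## 1  1     a
## 2  2   b,c
## 3  3     d

all.equal(tab, tab2)  # the table survives the round trip unchanged
```

Values that contain a comma are quoted in the file, so they are read back as a single field.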

Question 5: Do you know how to read an Excel file into R? What are the different options? How can you find help?

Writing a function

It is hard to unleash the full potential of R without writing your own functions. Luckily it's very easy to do. Here are some trivial examples:

## Put the function arguments in () and the evaluation in {}
add <- function(x){
  x + 1
}
add(4)
## [1] 5

Set the default argument values for your function:

add <- function(x = 5) {
  z <- x + 1
  return(z)
}
add()
## [1] 6
add(6)
## [1] 7

That's about all there is to it. The function will generally return the result of the last line that was evaluated.
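To illustrate the difference between relying on the last evaluated line and calling return() explicitly, here is a small sketch with two hypothetical helper functions (c2f and safe_sqrt are made up for this example).

```r
# Hypothetical helper: convert degrees Celsius to Fahrenheit
c2f <- function(celsius = 0) {
  celsius * 9 / 5 + 32  # last expression: its value is returned
}
c2f()
## [1] 32
c2f(100)
## [1] 212

# return() ends the function early, here for invalid input
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)
  }
  sqrt(x)
}
safe_sqrt(-4)
## [1] NA
safe_sqrt(9)
## [1] 3
```

If the last line of a function is an assignment, the value is still returned but not printed automatically, which is one reason to end with return() or with the object name.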

Question 6: How do you write a function that returns x and z?

Now, let's declare a new object, a new function, newfunc (this is just a name and if you like you can give this function another name). Appearing in the first set of brackets is an argument list that specifies (in this case) two names. The value of the function appears within the second set of brackets where the process applied to the named objects from the argument list is defined.

newfunc <- function(x, y) {
  z <- 2*x + y
  return(c(z, x, y))
}
a2b <- newfunc(2, 4)
a2b
## [1] 8 2 4

Here a new object a2b is created, which contains the result of applying newfunc to the values 2 and 4. The last command prints this new object to the console. Finally, you can remove the objects you have created to make room for the next exercise by running the following line.

rm(a, newfunc, a2b)

Creating a map within R - a simple demo

Here is an example of how you can create a map in R. It uses getData(), a function in the raster package that retrieves public geographic data for anywhere in the world. See the help of the getData function (load the raster package and type ?getData in the console).


Read the help to find out how we can find the country codes.

Question 7: What is the country code of Belgium?

For more information about the datasets, see the Global Administrative Areas database (GADM).


Now we will download the administrative boundaries of the Philippines:

library(raster) # Also attaches sp, which provides the plot method used below
datdir <- 'data'
dir.create(datdir, showWarnings = FALSE)
adm <- raster::getData("GADM", country = "PHL",
                       level = 2, path = datdir)
plot(adm[adm$NAME_1 == "Tarlac",])

Try to understand the code below, and let me know if you have questions. Feel free to use this code as an example for the exercise below.

mar <- adm[adm$NAME_1 == "Marinduque",]
plot(mar, bg = "dodgerblue", axes=TRUE)
plot(mar, lwd = 10, border = "skyblue", add=TRUE)
plot(mar, col = "green4", add = TRUE)
invisible(text(sp::coordinates(mar), # Label points of the polygons
labels = as.character(mar$NAME_2), cex = 1.1, col = "white", font = 2))
mtext(side = 3, line = 1, "Provincial Map of Marinduque", cex = 2)
mtext(side = 1, "Longitude", line = 2.5, cex=1.1)
mtext(side = 2, "Latitude", line = 2.5, cex=1.1)
mtext(side = 1, line = -2,
"Projection: Geographic\n
Coordinate System: WGS 1984    \n
Data Source: GADM.org    ", adj = 1, cex = 0.5, col = "grey20")

Lesson 1 exercise

Write your own function to create a map. Submit a clear, reproducible, and documented script containing a function to create a spatial map for a country of your choice:

  • Define a function.
  • Demonstrate the function (i.e. use it to plot a map of a country and a certain level as an example).
  • Keep it simple (!) e.g. just plot the administrative boundaries.
  • The function should accept country and level as input arguments.
  • Optional (for a bonus): Getting the administrative labels and legend correct.
  • Use the script review center in BlackBoard to submit your "documented script" on time, before 17:15. See the description in BlackBoard and follow the file naming precisely.
  • Tomorrow your team has to review one script of the other team before 11:00.

# Name: Team name and members of the team
# Date: 8 January 2018

# Import packages

# Define the function

# An example based on that function

  • Tips:
    • Do not use any paths that are specific to your computer, e.g. mydocument/John/blabla. In this exercise you should not need any paths.
    • Do not set the working directory (it is also specific to your computer; see above).
    • Use the script template above.

Assessment and evaluation criteria for the exercise

It needs to work (yes/no) when we test your script on our computer within R.

References and more info