Week 3: Python for geo-scripting

WUR Geoscripting WUR logo

"Week 3: Python for geo-scripting"

Good morning! Today we will start working with Python for geo-scripting and do a refresher of functions in Python. First complete the Intro to Python course in Datacamp and then go through today's tutorial.

Today's schedule

  • Follow DataCamp Intro to Python course
  • Learn how to work with virtual environments: Conda
  • Learn how to create Jupyter Notebook
  • Refresher for python

Using Python within Linux:

  • Wide user community and support
  • Free
  • Flexiblility
  • Open-source

How?! via:

  • GDAL/OGR
  • GEOS
  • Rasterio
  • Shapely
  • Mapplotlib
  • Folium
  • Fiona
  • GeoPandas
  • ArcPy (requires ArcMap license)

Have a look at this question on GIS StackExchange:

Python editors and IDEs

  • Most modern text editors do nice python highlighting, e.g. Sublime Text can be set up nicely for Python.
  • Jupyter notebook or Ipython Notebook are good choices for short scripts. The notebooks allows you to have your source code, equations, visualizations, results and comments in one document! Jupyter Notebook is a continuation of Ipython Notebook. The name Jupyter was inspired by the leading open languages for science (that is, Julia, Python, and R) (blog about history of Jupyter Notebook). See here for a simple Jupyter notebook example. More info.
  • There are a number of proper Integrated Development Environments [IDE] for Python. An IDE is a software application that provides facilities for software development. Personally I have good experience with PyCharm. For running on a server, rodeo gives a similar interface as RStudio server.
  • Spyder is nice a lightweight IDE and can be installed from the terminal (sudo apt-get install spyder).

Python package management

For Python, a set of tools co-exist for installing and managing packages. From most to least desirable, you should try to install packages by:

  • Using the distribution's package manager (on Ubuntu, that's sudo apt-get install python-*). This is the easiest and guaranteed to work right, although the package version might be older than you expect.
  • Using conda to install packages and to keep separate sets of packages. With Python, often the dependencies and versions can differ from project to project. If you want to specifally use a combination of packages with certain versions and keep that package set saved in an environment, then conda has conda environments to save this set of packages. A conda environment is similar to a virtual environment, but different from a virtual machine. Conda is available for Windows, macOS and Linux.
  • Using pip to manage packages and virtualenv to manage your environment. pip is a python package installer, that is pretty much standard nowadays. It is recommended to run it in a virtualenv to prevent conflicts, however, virtualenvs can be problematic when it comes to package dependencies. virtualenv can be used to keep separate sets of packages in its own directory tree. A demonstration of pip and virtualenv can be found here here

Anaconda or Miniconda

Anaconda is a Python virtual environment and package manager that is very useful to automatically keep track of package versions and dependencies. Anaconda itself comes already preinstalled with a large number of packages such as numpy and gdal. Instead of using the whole Anaconda, you can instead use the base version called Miniconda. Miniconda has the same features, but only installs the base package and requires manual installation for extra packages.

It is useful to learn how to use Conda, because it is an easy and cross-platform way of installing latest Python packages without affecting any Python installations that you already have on your system. In addition, Anaconda is required to set up sen2cor, the Sentinel-2 imagery atmospheric correction tool.

Miniconda installation

To install Miniconda in your Linux environment, we have prepared a short Bash script for you. Just run the following lines of code, line by line, in a new terminal window.

MINICONDA_VERSION="Miniconda3-latest-Linux-x86_64"
pushd /tmp
curl -O https://repo.continuum.io/miniconda/${MINICONDA_VERSION}.sh
## This installation script will require user input
bash ${MINICONDA_VERSION}.sh
rm ${MINICONDA_VERSION}.sh
popd

When prompted, you can just use the default options (i.e. press Enter). However, if you don't want Conda to replace the default Python interpreter in your system, you should say No in the last prompt. In that case, every time you wish to use Conda you need to run the following line of code in each terminal window.

# $HOME expands to your home directory
export PATH=$HOME/miniconda3/bin:$PATH

For installation instructions in other operating systems, please go to the Miniconda installation page.

Usage

The basic usage of Conda, after installed, is as follows.

To search for a package:

conda search spyder

This would give you a list of all packages that have "spyder" in the name and list all available versions.

conda install spyder

This would install the latest version of the spyder package (Python IDE). Note that this would install it into your user's root virtual environment (by default it is $HOME/miniconda3). Conda is able to create any number of isolated virtual environments, for example:

conda create --name geotest python=2.7 numpy

This would create a new environment called geotest with Python 2.7 and numpy installed into it. To list the available environments:

conda info --envs

Conda puts an asterisk (*) in front of the active environment. To activate an environment:

## Linux, macOS
source activate geotest
## Windows
activate geotest

After this, the current environment is shown in (parentheses) or [brackets] in front of your prompt ((astrolab)$). To deactivate the environment and go back to the default one:

## Linux, macOS
source deactivate
## Windows
deactivate

To remove the environment geotest:

conda remove --name geotest --all

Note that the activated environment is only valid for the shell in which you activated it. For instance, if you close the shell window and open a new one you will have to activate it again. Additionally, if you use sudo commands to call Python, it will use the system's Python interpreter and not the active environment for security reasons. There should be no reason to call Python code in Conda with sudo rights in any case, since all packages are installed with your user permissions rather than root's.

In addition, as you saw before, Conda is able to install some non-Python packages that have Python bindings, such as Spyder and GDAL. This is useful for making sure your Python and binary versions match and do not interfere with the system-wide ones. However, since those packages are installed into a virtual environment, they will not be accessible from your system menu. Instead, to run e.g. Spyder from within a Conda environment called ide, you would need to do something like this:

source activate ide
spyder

It is useful to check whether the executable comes from the system or a virtual environment by using the which or type commands:

type gdalinfo
## gdalinfo is /usr/bin/gdalinfo

This shows that you would be running GDAL from the system rather than the Conda virtual environment, otherwise the path would include miniconda and the virtual environment name.

Some helpful utilities are:

  • conda list to check which packages are installed in root or in the active environment;
  • which python or python --version to check which python verison is used in the environment;
  • conda install --name astrolab matplotlib to install extra modules in your (running) conda environment.

Getting started with Python within a Linux OS

  • Launch a Linux virtual machine and login.

  • Open the Terminal and type the following to check the installed GDAL version:

## from R: system("gdal-config --version")
## From the terminal:
python2 --version
python3 --version
gdal-config --version
## Python 2.7.13
## Python 3.6.3
## 2.2.3
  • Type the following to start Python 2.7:
python3 # type this in the terminal to start the python interpreter

An example script to find out what the installed Python version is (more info in question asked on stackoverflow)

import sys
print(sys.version)
## 3.6.3 (default, Oct 11 2017, 14:49:33) [GCC]
To exit Python in the terminal:
exit()
# or 
quit()

Running Python in Jupyter Notebooks

You can program Python in your terminal, but more facilities are available to make coding and documenting in Python easier through notebooks or Python IDEs. Today we will have a go with Jupyter Notebooks. In other lessons you can use the Python IDE Spyder. For now try the following commands in your terminal:

# Set directory at home
cd

# Create conda environment
conda create -n geoscripting numpy jupyter # geoscripting is name of your new conda environment

# Activate conda environment
source activate geoscripting

By creating your conda environment, you created a set of packages and a Python version to be used only in that specific conda environment. In your terminal you can see the name of your conda environment at the start of the command line before your user information. Now that we are working in our conda environment with all the necessary packages with correct versions, we can start with setting up our notebook.

# Start a Jupyter Notebook from the terminal
jupyter notebook

If everything goes according to plan, Jupyter will pop up in your browser. You will see a menu with all the files in your working directory. Note: the Jupyter notebook will only be able to see files that are accessible from the working directory in which from which you launch it! So keep track of the working directory in your terminal. A good practice is to start it in your project's directory.

Once you are in the desired working directory, in the right top click on New <86><92> Folder (if you don't have folder/project structure yet) and/or click on Python 3 to create a Jupyter Notebook. Click on help and have a go at the User Interface Tour. Give your notebook a name.

These are the basic functions you will need today:

  • Save and checkpoint
  • Insert cell below
  • Run
  • Code/Markdown/Heading

Similar to RMarkdown, Jupyter Notebooks has code cells (called Code) and text cells (called Markdown). Insert some extra cells by clicking the + button and change the first cell from code to markdown. Enter some documentation for your code (e.g. your team name, exercise and date). Leave the other cells on code. To run code in a code cell, select it and press the Run button.

A short Python refresher

Finding help

Now we can try some coding. First we learn how to look for help while coding in Python. In the second cell type the code below and run it (ctrl + enter is shortcut for run cell).

import sys
help(sys)
print("-------------------------------------")
help(1)

See how the functions in the sys module got listed and how we got information how to work with integers. Sometimes you also need to use the internet to find information.

Question 1: What does this mean __ __ around words: e.g: __doc__?

Try out the following!!!

help('hamster')
## No Python documentation found for 'hamster'.
## Use help() to get the interactive help utility.
## Use help(str) for help on the str class.

See also:

Finding information via Pydoc

Type the script below in your terminal to start a HTTP server with information from pydoc or go to https://docs.python.org/2/library/pydoc.html.

pydoc -p 1234
echo "pydoc server ready at http://localhost:1234/"

Then go to http://localhost:1234/ via your preferred browser. You can see a list of built-in modules and available modules.

Numbers and variables

We continue working in Python.

Question 2: What is the difference between 10 and 10.0 when dealing with data types in Python?

print(int(10.6))
## 10

Variable is a storage location or symbolic name to a value e.g.:

building = 'Gaia'
buildingNumber = 101
'Gaia'
"doesn't"
'Gaia' + ' is in Wageningen'

There is no need to say or define the datatype, python has a loose type variable declaration.

If it walks like a duck, swims like a duck and quacks like a duck I call it a duck

Python is basically a list of objects.

Lists

Now we will have a go with lists.

Tip: Variables, functions and methods that you define in one of your Jupyter Notebook cells can be used in other cells too.

Run this code in one cell:
campus = ['Gaia', 'Lumen', 'Radix', 'Forum']
# how to can we print Forum?
print(campus[3])
# how to access the end of the list (while having no idea how big it is)
print(campus[-1])
# how to access the first 3 items
print(campus[0:3])
## Forum
## Forum
## ['Gaia', 'Lumen', 'Radix']
Run this code in another cell. We will do some appending, inserting, extending and steps:
campus.append("Atlas")
campus.insert(1,"SoilMuseum")
campus.extend(["Action", "Vitae", "Zodiac"])
print(campus)
print(campus[::2])
## list[start:end:step]

See how the notebook remembered how you set the variable campus.

Question 3: What are the major differences between Append/Extend?

Question 4: What building is campus[-2]?

Dictionaries, loops, if/else

Let there be dictionaries... A dictionary is an unordered set of key:value pairs. Like in the dictionary, 'food':'voedsel'.

# dictionary
campusDic = {101:'Gaia',
             100:'Lumen',
             107:'Radix',
             102:'Forum',
             104:'Altas'}
print(campusDic[102])
## Forum

Loops: watch out here with code indentation. Python uses indentation to define code blocks and not {} like other languages. print(building) in the following code has to be indented by 1 tab or 4 spaces ( recommended ).

campus = ['Gaia','Lumen', 'Radix', 'Forum']
for building in campus:
    print(building)
## Gaia
## Lumen
## Radix
## Forum

Here, building is a variable that will contain any item in the campus list.

Generic loops in Python have to interact over a sequence of objects e.g.

range(5)
for number in range(5):
    print(number)
## 0
## 1
## 2
## 3
## 4

Object interaction and functional programming is an important part of Python programming and its tools are extensive. if/else:

x = 3
if x < 3:
    print("below 3")
else:
    print("above 3")
## above 3
x = 3
if x == 1:
    print("it is one")
elif x==2:
    print("it is two")
elif x==3:
    print("it is three")
else: 
    print("above 3")
## it is three

Functions

A function is a section of code that does something specific that you want to use multiple times without having to type the full function again but just call the function by its name.

def printPotato():
    print("potato")

printPotato()
## potato

Functions accept arguments and return variables e.g.:

def printHelloName(name):
    print("Good morning " + name)

printHelloName("Jan")
## Good morning Jan

return is used to indicate what you want to obtain from the function, you can return multiple items and return can be used to assign output to variables outside of the function.

def times3(number):
    tmp = number*3
    return tmp, number
    
print(times3(4))

output, input = times3(4)
print(output)
print(input)
## (12, 4)
## 12
## 4

Importing modules

Try this!

import this

This poem is called the Zen of Python and describes how Python should be used. It is an inside joke, but has some good practices to it. There are more best practice guides for Python best of best practices guide in Python.

from __future__ import braces

Another inside joke ... where your Python says that it will never delimit coding blocks by braces instead of indentation. Let's continue with more serious programming.

import math
print(dir(math)) #show names in math module
## ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
  • The best way to learn a module is by checking documentation.
  • Modules are Python's butter and bread.
  • A module contains code that can be used or excuted.
Basically there are three ways to load a module:
import math
print(math.pi)

from math import pi
print(pi)

import numpy as np
print(np.pi)
## 3.141592653589793
## 3.141592653589793
## 3.141592653589793

Question 5: Which is the best way to import modules?

Some important internal modules:

  • os: Access to operating system features
  • os.path: Manipulating of file names
  • sys: System specific configuration
  • glob: Filename pattern matching
  • math: Mathametical functions
  • datetime: Date/Time manipulation

Question 6: What is the difference between os and os.path?

Some examples:

import glob
glob.glob("*")
from datetime import timedelta, date
delta = timedelta(days=7)
print(date.today())
print(date.today()+delta)
## 2017-12-18
## 2017-12-25

File access

File access is very simple for 99% of the cases.

Write something to file:

fileObj = open('test.txt','w')
fileObj.write('some simple text')
fileObj.close()

And read something from a file:

fileObj = open('test.txt','r')
a = fileObj.read()
print(a)
fileObj.close()
## some simple text

Question 7: What does w and r mean?

Error handling

Sometime problems occur... Errors detected during execution are called exceptions.

Good code deals with exceptions:

open("/foo0")
## Traceback (most recent call last):
##   File "", line 1, in 
## FileNotFoundError: [Errno 2] No such file or directory: '/foo0'

The file doesn't exist, so the script stops and outputs an ugly message.

How to deal with this I/O error? Good programming!


try:
    open("foo")
except IOError:
    print("no file")

## we can be more precise:
try:
    open("/foo")
except IOError:
    print("no file")
## no file
## no file

Visualization

Jupyter Notebooks can display output of your code, such as graphs, images and maps in the notebook. A lot of cool visualizations including code in Python are available from the Python Graph Gallery. Before we can do the visualizations, we want to add some Python modules via the terminal.

## Add Matplotlib and seaborn modules
conda install --name geoscripting matplotlib seaborn
# as conda install --name env_name pythonmodule pythonmodule

We will make a graph with the Seaborn module and plot it with Matplotlib. Give it a try! (If you have just installed it, you might need to close the Jupyter notebook and reopen it.)

# Load library
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = sns.load_dataset('iris')
 
# Create plot
sns.pairplot(df, kind = "reg", hue = "species")
plt.show()

Nice visualization huh! Good job.

Now make a map with Folium. Note: Folium is very new and so is not in the main conda channel, so install it with conda install folium -c conda-forge.

import folium

SF_COORDINATES = (51.9871868, 5.6593948)

map = folium.Map(location=SF_COORDINATES, tiles='Mapbox Control Room', zoom_start=5)
display(map)

Exiting the Jupyter Notebook

Your Jupyter Notebook is automatically saved as an .ipynb file (extension comes from the historic name "IPython Notebook") on your computer, but you can also download it as a python script, pdf or html. You can also save it manually. To exit a notebook properly, use File <86><92> Close and Halt.

By pressing Ctrl + c in the terminal where Jupyter notebook server is running, you cancel the running process. The terminal goes back to command line and you can exit the virtual environment by typing source deactivate.

source deactivate

Python statements

  • if
  • for
  • while
  • try
  • type shows type (e.g. int, float, str) of object
  • class which executes a block of code and attaches its local names to a class, for use in object oriented programming
  • def which defines a function or statement
  • with which encloses a code block within a context manager
  • pass statement, which serves as a NOP (no operation)
  • assert, used during debugging to check for conditions that ought to apply
  • yield
  • import

Python from R

By the way, as you may have noticed, this document is created from an RMarkdown source, which can also include Python code blocks (have a look at the source on github). Similarly Jupyter Notebooks can also use R and other languages!

Assignment

The assignment for today is to finish the datacamp course: Intro to Python for Data Science. If you finished early and still want to write more scripts, then you can follow one of the fun tutorials below.

More info