Week 3: Python for geo-scripting

Python week Friday to Wednesday:

Schedule Overview

  • Friday:
    • Follow DataCamp Intro to Python course.
    • See Blackboard: course description section
  • Monday:
    • Morning: Vector. This also contains an assignment!
    • Afternoon: Lecture by Jorge on Python
  • Tuesday:
    • Tutorial: Raster. This also contains an assignment!
    • Afternoon: Presentation by Erik van Schaik about cloud computing
  • Wednesday:
    • In the afternoon you will get an intro to ArcPy by Aldo. On Blackboard you can find examples and a small assignment!
    • In the morning check out the following IPython Notebooks created by Aldo Bergsma, to compare the open source implementation with ArcPy:

Reminder: Self-study is critical for the completion of the exercises at the end of the tutorial!

Learning outcomes

  • Knowing how to handle spatial data using Python:
    • vector data handling
      • creating a point, writing and modifying a shape file
    • raster data handling
      • reading and writing raster data
      • calculating indices
      • projecting raster data

Intro

Using Python within Linux:

  • Wide user community and support
  • Free
  • Flexibility
  • Open-source

How? Via:

  • GDAL/OGR
  • GEOS

Have a look at these questions on GIS StackExchange:

  • https://gis.stackexchange.com/questions/34509/alternatives-to-using-arcpy
  • https://gis.stackexchange.com/questions/16657/clipping-raster-with-vector-layer-using-gdal

Python editors and IDEs

  • Most modern text editors do nice Python highlighting, e.g. Sublime Text can be set up nicely for Python
  • Jupyter notebook is a good choice for short scripts, e.g. the exercises here. It lets you combine source code, results and comments in one document! See here for a simple Jupyter notebook example. More info
  • There are a number of proper IDEs for Python. Personally, I have had good experiences with PyCharm. For running on a server, Rodeo gives a similar interface to RStudio Server.
  • Spyder is nice and lightweight and can be installed from the terminal (sudo apt-get install spyder)

Getting started with Python within a Linux OS

  • Launch a Linux virtual machine and login.

  • Open the Terminal and type the following to check the installed GDAL version:

## from R: system("gdal-config --version")
## From the terminal:
which python
gdal-config --version
## /usr/bin/python
## 2.1.0
  • Type the following to start Python and find out what the Python version is:
python # type this in the terminal to start the python interpreter

An example script to find out which Python version is installed (more info in a question asked on Stack Overflow):

import sys
print(sys.version)  # the parentheses are required in Python 3
## 2.7.12 (default, Nov 19 2016, 06:48:10) 
## [GCC 5.4.0 20160609]
To exit python:
exit()
  • Open the python script within a Python editor

A short Python refresher

Finding help

import sys
help(sys)
help(1)

Question: What do the double underscores around some names mean, e.g. __doc__?

Try out the following!!!

help('hamster')
## no Python documentation found for 'hamster'

see also:

Finding information via Pydoc

Go to: https://docs.python.org/2/library/pydoc.html

pydoc -p 1234
echo "pydoc server ready at http://localhost:1234/"

Then go to http://localhost:1234/ via your preferred browser.

Numbers and variables

Question 1: What is the difference between 10 and 10.0 when dealing with datatypes in Python?

print(int(10.6))
## 10

A variable is a storage location with a symbolic name that refers to a value, e.g.:

building = 'Gaia'
buildingNumber = 101
'Gaia'
"doesn't"
'Gaia' + ' is in Wageningen'

There is no need to declare the datatype of a variable: Python is dynamically typed, so the type follows from the value that is assigned.

If it walks like a duck, swims like a duck and quacks like a duck I call it a duck
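
A minimal sketch of what this means in practice. The helper name describe and the sample values are made up for illustration, and the code is written with print() so that it should run under both Python 2.7 and Python 3. The same name can be rebound to values of different types, and a function accepts any object that supports the operations it uses.

```python
# The same variable name can refer to values of different types over time.
building = 'Gaia'
type_before = type(building).__name__   # 'str'
building = 101
type_after = type(building).__name__    # 'int'

def describe(items):
    """Hypothetical helper: works on anything that has a length and can be looped over."""
    return "%d items, first: %s" % (len(items), next(iter(items)))

# A list and a string both "quack" like a sequence, so both are accepted.
from_list = describe(['Gaia', 'Lumen'])
from_string = describe('Forum')
print(from_list)
print(from_string)
```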

Much of Python programming revolves around collections of objects. Lists are ordered, and their items are accessed by index, e.g.:

Lists

campus = ['Gaia','Lumen', 'Radix', 'Forum']
# how can we print Forum?
print(campus[3])
# how to access the end of the list (while having no idea how big it is)
print(campus[-1])
# how to access the first 3 items
print(campus[0:3])
## Forum
## Forum
## ['Gaia', 'Lumen', 'Radix']
Appending, inserting, extending and steps:
campus = ['Gaia','Lumen', 'Radix', 'Forum']
campus.append("Atlas")
campus.insert(1,"SoilMuseum")
campus.extend(["Action","Vitae", "Zodiac"])
print(campus)
print(campus[::2])
## list[start:end:step]
## ['Gaia', 'SoilMuseum', 'Lumen', 'Radix', 'Forum', 'Atlas', 'Action', 'Vitae', 'Zodiac']
## ['Gaia', 'Lumen', 'Forum', 'Action', 'Zodiac']

Question 2: What are the major differences between append and extend?

Question 3: What building is campus[-2]?

Dictionaries, loops, if/else

Let there be dictionaries... A dictionary is an unordered set of key:value pairs, like an entry in a translation dictionary: 'food':'voedsel'.

# dictionary
campusDic = {101:'Gaia',
             100:'Lumen',
             107:'Radix',
             102:'Forum',
             104:'Atlas'}
print(campusDic[102])
## Forum
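
A few more dictionary operations, as a small sketch rather than a complete overview. The key 999 and the default 'unknown' are made-up values for illustration.

```python
campusDic = {101: 'Gaia', 100: 'Lumen', 107: 'Radix', 102: 'Forum'}

# Loop over the keys in sorted order, because a dictionary itself
# does not promise any particular order in Python 2.
for number in sorted(campusDic):
    print("%d: %s" % (number, campusDic[number]))

# .get() returns a default instead of raising KeyError for a missing key.
missing = campusDic.get(999, 'unknown')
print(missing)

# Adding a new key:value pair.
campusDic[104] = 'Atlas'
print(len(campusDic))
```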

Loops: watch out for code indentation. Python uses indentation to define code blocks, not {} as in other languages. The print line inside the loop has to be indented by 1 tab or 4 spaces (recommended).

campus = ['Gaia','Lumen', 'Radix', 'Forum']
for building in campus:
    print(building)
## Gaia
## Lumen
## Radix
## Forum

Here, building is a variable that holds each item of the campus list in turn.

Generic loops in Python have to iterate over a sequence of objects, e.g.:

range(5)
for number in range(5):
    print(number)
## 0
## 1
## 2
## 3
## 4
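
When both the position and the item are needed, the built-in enumerate() avoids a manual counter. The sketch below also shows a list comprehension, which is not covered in the text above and is included only as an illustration of a loop combined with a condition.

```python
campus = ['Gaia', 'Lumen', 'Radix', 'Forum']

# enumerate() yields (index, item) pairs, so no manual counter is needed.
labels = []
for index, building in enumerate(campus):
    labels.append("%d-%s" % (index, building))
print(labels)

# A loop combined with a condition: keep only names longer than 4 letters.
long_names = [name for name in campus if len(name) > 4]
print(long_names)
```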

Iterating over objects and functional programming are an important part of Python programming, and the tools for them are extensive.

if/else:

x = 3
if x < 3:
    print("below 3")
else:
    print("3 or above")
## 3 or above
x = 3
if x == 1:
    print("it is one")
elif x == 2:
    print("it is two")
elif x == 3:
    print("it is three")
else:
    print("it is something else")
## it is three

Functions

A function is a section of code that does something specific. If you want to use that code multiple times, you do not have to type it again: you just call the function by its name.

def printPotato():
    print("potato")
printPotato()
## potato

Functions accept arguments and return values, e.g.:

def printPotato(something):
    print(something)
printPotato("test")
## test

return is used to indicate what you want to obtain from the function, and you can return multiple items.

def times3(number):
    tmp = number*3
    return tmp, number
print(times3(4))
## (12, 4)
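
The multiple items come back as one tuple, which can be unpacked into separate names. The second function, times, with its default argument factor, is a hypothetical variation added here only for illustration.

```python
def times3(number):
    tmp = number * 3
    return tmp, number   # packed into one tuple

# Unpack the returned tuple into two separate names.
tripled, original = times3(4)
print(tripled)
print(original)

# Hypothetical variation with a default argument value.
def times(number, factor=3):
    return number * factor

print(times(4))
print(times(4, factor=10))
```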

Importing modules

Try this!

import this
from __future__ import braces
import math
print(dir(math))
## ['__doc__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'hypot', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']
  • The best way to learn about a module is to check the documentation: https://docs.python.org/2/.
  • Modules are Python's bread and butter.
  • A module contains code that can be used or executed.
There are basically two ways to load a module:
import math
print(math.pi)

from math import pi
print(pi)
## 3.14159265359
## 3.14159265359
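
Both forms can also be combined with an alias or with several names at once. A short sketch, where the alias m and the circle radius of 2 are arbitrary choices:

```python
import math as m              # whole module under a shorter alias
from math import sqrt, pi     # only the selected names

area = pi * 2 ** 2            # area of a circle with radius 2
print(round(area, 2))
print(sqrt(16))
print(m.pi == pi)             # both names refer to the same value
```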

Question 4: Which is the best way to import modules?

Some important internal modules:

  • os: Access to operating system features
  • os.path: Manipulation of file names
  • sys: System specific configuration
  • glob: Filename pattern matching
  • math: Mathematical functions
  • datetime: Date/Time manipulation

Question 5: What is the difference between os and os.path?

Some examples:

import glob
glob.glob("*")
from datetime import timedelta,date
delta = timedelta(days=7)
print(date.today())
print(date.today() + delta)
## 2017-01-20
## 2017-01-27
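
A similar sketch for os and os.path. The folder name data and the file name ndvi.tif are hypothetical; the file does not have to exist for the path manipulation to work.

```python
import os

# Build a path from its parts; 'data' and 'ndvi.tif' are hypothetical names.
path = os.path.join('data', 'ndvi.tif')

folder, filename = os.path.split(path)         # directory part and file part
name, extension = os.path.splitext(filename)   # base name and extension
print(folder)
print(name)
print(extension)

# os itself gives access to the operating system, e.g. the working directory.
print(os.getcwd())
print(os.path.exists(path))
```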

File access

File access is very simple for 99% of the cases.

Write something to a file and read it back:

fileObj = open('test.txt','w')
fileObj.write('some simple text')
fileObj.close()
fileObj = open('test.txt','r')
a = fileObj.read()
print(a)
fileObj.close()
## some simple text

Question 6: What do r and w mean?
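
The same file access can also be written with the with statement, which closes the file automatically. A minimal sketch, assuming a temporary directory is acceptable as the location of test.txt:

```python
import os
import tempfile

# Write to a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w') as fileObj:
    fileObj.write('first line\n')
    fileObj.write('second line\n')
# The file is closed here automatically, even if an error occurred above.

with open(path, 'r') as fileObj:
    lines = [line.rstrip('\n') for line in fileObj]
print(lines)
```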

Error handling

Sometimes problems occur... Errors detected during execution are called exceptions.

Good code deals with exceptions:

open("/foo0")
## Traceback (most recent call last):
##   File "<stdin>", line 1, in <module>
## IOError: [Errno 2] No such file or directory: '/foo0'

The file doesn't exist, so the script stops and outputs an ugly message.

How to deal with this I/O error? Good programming!


try:
    open("foo")
except Exception:  # catches any kind of error
    print("no file")

## we can be more precise:
try:
    open("/foo")
except IOError:  # catches only input/output errors
    print("no file")
## no file
## no file
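
A try statement can also have else and finally branches, and the exception object itself can be inspected. The sketch below uses a hypothetical helper, read_or_default, and assumes that /foo0 does not exist, as in the example above.

```python
def read_or_default(path, default=''):
    """Hypothetical helper: return the file content, or a default if it cannot be opened."""
    try:
        fileObj = open(path)
    except IOError as error:
        # The exception object carries the details of what went wrong.
        print("could not open %s: %s" % (path, error.strerror))
        return default
    else:
        # Runs only when no exception was raised in the try block.
        try:
            return fileObj.read()
        finally:
            # Runs in every case, so the file is always closed.
            fileObj.close()

content = read_or_default('/foo0', default='no file')
print(content)
```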

Python statements

  • if
  • for
  • while
  • try
  • class, which executes a block of code and attaches its local names to a class, for use in object-oriented programming
  • def, which defines a function or method
  • with, which encloses a code block within a context manager
  • pass, which serves as a NOP (no operation)
  • assert, used during debugging to check for conditions that ought to apply
  • yield
  • import
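
Some of the statements that have not appeared yet can be sketched together in a few lines. The names countdown and Placeholder are made up for this illustration.

```python
def countdown(start):
    """Generator: yield start, start - 1, ..., 1."""
    assert start >= 0, "start must not be negative"
    while start > 0:
        yield start      # hand back one value and pause until the next one is requested
        start -= 1

class Placeholder(object):
    pass   # nothing to do yet; pass keeps the empty block valid

numbers = list(countdown(3))
print(numbers)
```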

R from Python

A very simple example using the Python rpy2 module (only possible on Mac/Linux). This works on your OSGeo-Live Linux virtual machine. Try it out!

import rpy2.robjects as robjects
pi = robjects.r['pi']
print(pi[0])
## 3.14159265359

By the way, as you may have noticed, this document is created from an RMarkdown source, which can also include Python code blocks (have a look at the source on GitHub). Similarly, the newest version of the IPython notebook can now also use R and other languages!

Python package management

For Python, several tools co-exist for installing and managing packages. From most to least desirable, you should try to install packages:

  • Using the distribution's package manager (on Ubuntu, that's sudo apt-get install python-*). This is the easiest and guaranteed to work right, although the package version might be older than you expect.
  • With Python, the dependencies and versions often differ from project to project. Also, installing packages outside the distribution's package manager can cause conflicts between packages. Virtual environments can be used to keep separate sets of packages, each in its own directory tree. See an intro to pip and virtualenv here
  • pip is a Python package installer that is pretty much standard nowadays. It is recommended to run it in a virtualenv to prevent conflicts; however, virtualenvs can be problematic when it comes to package dependencies.
  • A different solution is conda, which makes it very easy to install and maintain different external dependencies such as GDAL and even R across different operating systems. However, the same caveats as for pip apply, and it is not available as a package in Ubuntu.
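
Whichever tool is used, it can help to check from Python whether a package is actually importable in the current environment. A minimal sketch with a hypothetical helper, is_installed; osgeo is the import name of the GDAL/OGR Python bindings, and the result depends on your machine.

```python
import importlib

def is_installed(module_name):
    """Hypothetical helper: True if the module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True

# 'math' ships with Python; 'osgeo' and 'rpy2' depend on what is installed.
for module_name in ['math', 'osgeo', 'rpy2']:
    print("%s: %s" % (module_name, is_installed(module_name)))
```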

Handling Vector data with Python

Go to Vector Data handling in Python

Raster data with Python

Go to Raster Data handling in Python

Note: this is an IPython notebook (now called Jupyter Notebook!)

More info