Good morning! Today we will start working with Python for geo-scripting and do a refresher of functions in Python. First complete the Intro to Python course on DataCamp, and then go through today's tutorial.
Using Python within Linux:
How?! Via the terminal, or via a Python IDE such as Spyder. Have a look at this question on GIS StackExchange. Spyder can be installed with:
sudo apt-get install spyder
For Python, a set of tools co-exist for installing and managing packages. From most to least desirable, you should try to install packages by:
1. The Linux distribution's package manager (sudo apt-get install python-*). This is the easiest option and is guaranteed to work right, although the package version might be older than you expect.
2. pip to manage packages and virtualenv to manage your environment. pip is a Python package installer that is pretty much standard nowadays. It is recommended to run it inside a virtualenv to prevent conflicts; however, virtualenvs can be problematic when it comes to package dependencies. virtualenv can be used to keep separate sets of packages in their own directory tree. A demonstration of pip and virtualenv can be found here.
3. Anaconda, a Python virtual environment and package manager that is very useful to automatically keep track of package versions and dependencies. Anaconda itself comes preinstalled with a large number of packages such as numpy and gdal. Instead of the whole Anaconda distribution, you can use its base version called Miniconda. Miniconda has the same features, but only installs the base packages and requires manual installation of extra packages.
It is useful to learn how to use Conda, because it is an easy, cross-platform way of installing the latest Python packages without affecting any Python installations that you already have on your system. In addition, Anaconda is required to set up sen2cor, the Sentinel-2 imagery atmospheric correction tool.
To install Miniconda in your Linux environment, we have prepared a short Bash script for you. Just run the following lines of code, line by line, in a new terminal window.
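MINICONDA_VERSION="Miniconda3-latest-Linux-x86_64"
# Work in /tmp and remember the current directory
pushd /tmp
# Download the installer
curl -O https://repo.continuum.io/miniconda/${MINICONDA_VERSION}.sh
## This installation script will require user input
bash ${MINICONDA_VERSION}.sh
# Remove the installer and return to the previous directory
rm ${MINICONDA_VERSION}.sh
popd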
When prompted, you can just use the default options (i.e. press Enter). However, if you don't want Conda to replace the default Python interpreter on your system, you should answer No in the last prompt. In that case, you need to run the following line of code in each terminal window every time you wish to use Conda.
# $HOME expands to your home directory
export PATH=$HOME/miniconda3/bin:$PATH
For installation instructions in other operating systems, please go to the Miniconda installation page.
The basic usage of Conda, once installed, is as follows.
To search for a package:
conda search spyder
This would give you a list of all packages that have "spyder" in the name and list all their available versions.
conda install spyder
This would install the latest version of the spyder package (a Python IDE). Note that this would install it into your user's root virtual environment (by default $HOME/miniconda3). Conda is able to create any number of isolated virtual environments, for example:
conda create --name geotest python=2.7 numpy
This would create a new environment called geotest with Python 2.7 and numpy installed into it. To list the available environments:
conda info --envs
Conda puts an asterisk (*) in front of the active environment. To activate an environment:
## Linux, macOS
source activate geotest
## Windows
activate geotest
After this, the current environment is shown in (parentheses) or [brackets] in front of your prompt, e.g. (geotest)$. To deactivate the environment and go back to the default one:
## Linux, macOS
source deactivate
## Windows
deactivate
To remove the environment geotest:
conda remove --name geotest --all
Note that the activated environment is only valid for the shell in which you activated it. For instance, if you close the shell window and open a new one, you will have to activate it again. Additionally, if you use sudo commands to call Python, it will use the system's Python interpreter and not the active environment, for security reasons. There should be no reason to call Python code in Conda with sudo rights in any case, since all packages are installed with your user permissions rather than root's.
In addition, as you saw before, Conda is able to install some non-Python packages that have Python bindings, such as Spyder and GDAL. This is useful for making sure your Python and binary versions match and do not interfere with the system-wide ones. However, since those packages are installed into a virtual environment, they will not be accessible from your system menu. Instead, to run e.g. Spyder from within a Conda environment called ide, you would need to do something like this:
source activate ide
spyder
It is useful to check whether an executable comes from the system or from a virtual environment by using the which or type commands:
type gdalinfo
## gdalinfo is /usr/bin/gdalinfo
This shows that you would be running GDAL from the system rather than from the Conda virtual environment; otherwise the path would include miniconda and the virtual environment name.
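The same check can be done from inside Python, which is handy in a notebook where you do not see the shell prompt. The sketch below relies only on the standard library: sys.executable is the path of the running interpreter, sys.prefix is its installation directory, and shutil.which mimics the shell's which. The helper name describe_environment is made up for this example.

```python
import shutil
import sys


def describe_environment(tool="gdalinfo"):
    """Report which Python interpreter runs this code and where `tool` is found.

    describe_environment is a hypothetical helper, not part of any library.
    """
    return {
        # Full path of the running interpreter; inside a Conda
        # environment this path typically contains 'miniconda3'
        "python": sys.executable,
        # Installation prefix of that interpreter
        "prefix": sys.prefix,
        # Like `which` in the shell: a path string, or None if not on PATH
        tool: shutil.which(tool),
    }


info = describe_environment()
for name, location in info.items():
    print(name, "->", location)
```

If the gdalinfo entry points to /usr/bin while python points into miniconda3, the two come from different installations.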
Some helpful utilities are:
- conda list to check which packages are installed in root or in the active environment;
- which python or python --version to check which Python version is used in the environment;
- conda install --name geotest matplotlib to install extra modules in your (running) conda environment.
Launch a Linux virtual machine and log in.
Open the Terminal and type the following to check the installed Python and GDAL versions:
## from R: system("gdal-config --version")
## From the terminal:
python2 --version
python3 --version
gdal-config --version
## Python 2.7.13
## Python 3.6.3
## 2.2.3
python3 # type this in the terminal to start the python interpreter
An example script to find out which Python version is installed (more info in this question asked on Stack Overflow):
import sys
print(sys.version)
## 3.6.3 (default, Oct 11 2017, 14:49:33) [GCC]
exit()
# or
quit()
You can program Python in your terminal, but more facilities are available to make coding and documenting in Python easier through notebooks or Python IDEs. Today we will have a go with Jupyter Notebooks. In other lessons you can use the Python IDE Spyder. For now try the following commands in your terminal:
# Set directory at home
cd
# Create conda environment
conda create -n geoscripting numpy jupyter # geoscripting is name of your new conda environment
# Activate conda environment
source activate geoscripting
By creating your conda environment, you created a set of packages and a Python version to be used only in that specific conda environment. In your terminal you can see the name of your conda environment at the start of the command line before your user information. Now that we are working in our conda environment with all the necessary packages with correct versions, we can start with setting up our notebook.
# Start a Jupyter Notebook from the terminal
jupyter notebook
If everything goes according to plan, Jupyter will pop up in your browser. You will see a menu with all the files in your working directory. Note: the Jupyter notebook will only be able to see files that are accessible from the working directory from which you launch it! So keep track of the working directory in your terminal. A good practice is to start it in your project's directory.
Once you are in the desired working directory, click on New in the top right corner to create a new notebook; the User Interface Tour in the Help menu gives an overview of the interface. Give your notebook a name.
These are the basic functions you will need today:
Save and checkpoint
Insert cell below
Run
Code/Markdown/Heading
Similar to RMarkdown, Jupyter Notebooks have code cells (called Code) and text cells (called Markdown). Insert some extra cells by clicking the + button and change the first cell from code to markdown. Enter some documentation for your code (e.g. your team name, exercise and date). Leave the other cells as code cells. To run code in a code cell, select it and press the Run button.
Now we can try some coding. First we learn how to look for help while coding in Python. In the second cell, type the code below and run it (Ctrl + Enter is the shortcut for running a cell).
import sys
help(sys)
print("-------------------------------------")
help(1)
See how the functions in the sys module got listed, and how we got information on how to work with integers. Sometimes you also need to use the internet to find information.
Question 1: What do the double underscores (__ __) around words mean, e.g. __doc__?
Try out the following!!!
help('hamster')
## No Python documentation found for 'hamster'.
## Use help() to get the interactive help utility.
## Use help(str) for help on the str class.
See also:
Type the command below in your terminal to start an HTTP server with documentation from pydoc, or go to https://docs.python.org/2/library/pydoc.html.
pydoc -p 1234
echo "pydoc server ready at http://localhost:1234/"
Then go to http://localhost:1234/ in your preferred browser. You can see a list of built-in modules and available modules.
We continue working in Python.
Question 2: What is the difference between 10 and 10.0 when dealing with data types in Python?
print(int(10.6))
## 10
A variable is a storage location, or a symbolic name for a value, e.g.:
building = 'Gaia'
buildingNumber = 101
'Gaia'
"doesn't"
'Gaia' + ' is in Wageningen'
There is no need to declare the data type: Python is dynamically typed, and what matters is how an object behaves (duck typing):
If it walks like a duck, swims like a duck and quacks like a duck, I call it a duck.
In Python, basically everything is an object.
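To make dynamic typing and duck typing concrete, here is a small sketch. The name value is simply rebound to objects of different types, and the hypothetical function describe_length works with any object that supports len(), without checking its type.

```python
# Dynamic typing: the same name can refer to values of different types
value = 101
print(type(value))   # <class 'int'>
value = "Gaia"
print(type(value))   # <class 'str'>


# Duck typing: describe_length (a made-up example function) only needs
# an object that supports len(); it never checks the type itself
def describe_length(thing):
    return "{} has {} elements".format(repr(thing), len(thing))


print(describe_length("Gaia"))             # 'Gaia' has 4 elements
print(describe_length(["Gaia", "Lumen"]))  # ['Gaia', 'Lumen'] has 2 elements
```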
Now we will have a go with lists.
Tip: Variables, functions and methods that you define in one of your Jupyter Notebook cells can be used in other cells too.
Run this code in one cell:
campus = ['Gaia', 'Lumen', 'Radix', 'Forum']
# how can we print Forum?
print(campus[3])
# how to access the end of the list (while having no idea how big it is)
print(campus[-1])
# how to access the first 3 items
print(campus[0:3])
## Forum
## Forum
## ['Gaia', 'Lumen', 'Radix']
campus.append("Atlas")
campus.insert(1,"SoilMuseum")
campus.extend(["Action", "Vitae", "Zodiac"])
print(campus)
print(campus[::2])
## list[start:end:step]
See how the notebook remembered how you set the variable campus.
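A few more slicing examples following the list[start:end:step] pattern. This sketch uses a separate, illustrative list called buildings so that the campus list in your notebook is left untouched.

```python
buildings = ['Gaia', 'Lumen', 'Radix', 'Forum']

print(buildings[1:3])   # ['Lumen', 'Radix']  (the end index 3 is excluded)
print(buildings[:2])    # ['Gaia', 'Lumen']   (an omitted start means "from the beginning")
print(buildings[::-1])  # ['Forum', 'Radix', 'Lumen', 'Gaia']  (a negative step walks backwards)
print(len(buildings))   # 4
```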
Question 3: What are the major differences between append and extend?
Question 4: What building is campus[-2]?
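Let there be dictionaries... A dictionary is a set of key:value pairs, just like a real dictionary: 'food':'voedsel'. Before Python 3.7, dictionaries were not guaranteed to keep any order; since Python 3.7 they preserve insertion order.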
# dictionary
campusDic = {101:'Gaia',
100:'Lumen',
107:'Radix',
102:'Forum',
104:'Atlas'}
print(campusDic[102])
## Forum
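Besides looking up a single key, you will often want to test whether a key exists, supply a default for a missing key, or loop over all pairs. A minimal sketch with the campus dictionary, redefined here so the cell runs on its own:

```python
campusDic = {101: 'Gaia', 100: 'Lumen', 107: 'Radix', 102: 'Forum', 104: 'Atlas'}

# Test whether a key exists before using it
print(105 in campusDic)               # False

# .get() returns a default instead of raising a KeyError for a missing key
print(campusDic.get(105, 'unknown'))  # unknown

# Loop over key:value pairs; sorted() makes the order explicit (by key)
for number, name in sorted(campusDic.items()):
    print(number, name)
```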
Loops: watch out with code indentation here. Python uses indentation to define code blocks, not {} like many other languages. print(building) in the following code has to be indented by 1 tab or 4 spaces (recommended).
campus = ['Gaia', 'Lumen', 'Radix', 'Forum']
for building in campus:
    print(building)
## Gaia
## Lumen
## Radix
## Forum
Here, building is a variable that takes the value of each item in the campus list in turn.
Generic loops in Python iterate over a sequence of objects, e.g. range(5):
for number in range(5):
    print(number)
## 0
## 1
## 2
## 3
## 4
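range() also accepts a start and a step, and enumerate() gives you the position of each item while looping. A short sketch (the list name buildings is illustrative):

```python
# range(start, stop, step): stop is excluded, just like in slicing
for number in range(2, 11, 4):
    print(number)   # prints 2, then 6, then 10

# enumerate() yields (position, item) pairs
buildings = ['Gaia', 'Lumen', 'Radix', 'Forum']
for index, building in enumerate(buildings):
    print(index, building)
```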
Iterating over objects and functional programming are important parts of Python programming, and the tools for them are extensive. Conditional statements use if/else:
x = 3
if x < 3:
    print("below 3")
else:
    print("3 or above")
## 3 or above
x = 3
if x == 1:
    print("it is one")
elif x == 2:
    print("it is two")
elif x == 3:
    print("it is three")
else:
    print("above 3")
## it is three
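Loops and conditionals are often combined. As a sketch, the loop below classifies the numbers produced by range(5) and collects the labels in a list (the name labels is illustrative):

```python
labels = []
for x in range(5):
    # Exactly one of these branches runs for each x
    if x < 3:
        labels.append("below 3")
    elif x == 3:
        labels.append("three")
    else:
        labels.append("above 3")

print(labels)
# ['below 3', 'below 3', 'below 3', 'three', 'above 3']
```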
A function is a section of code that does something specific and that you want to use multiple times: instead of typing the full code again, you just call the function by its name.
def printPotato():
    print("potato")
printPotato()
## potato
Functions can accept arguments, e.g.:
def printHelloName(name):
    print("Good morning " + name)
printHelloName("Jan")
## Good morning Jan
return is used to indicate what you want to obtain from the function. You can return multiple items, and the returned values can be assigned to variables outside of the function.
def times3(number):
    tmp = number*3
    return tmp, number
print(times3(4))
output, input = times3(4)
print(output)
print(input)
## (12, 4)
## 12
## 4
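Arguments can also have default values, and a string placed directly below the def line (a docstring) is what help() shows for your own functions. A sketch of a variant of times3; the name times_n and its factor parameter are made up for this example:

```python
def times_n(number, factor=3):
    """Return number multiplied by factor (default 3), and the original number."""
    return number * factor, number


print(times_n(4))            # (12, 4): uses the default factor
print(times_n(4, factor=5))  # (20, 4): overrides it with a keyword argument
help(times_n)                # shows the docstring written above
```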
Try this!
import this
This poem is called the Zen of Python and describes how Python should be used. It is an inside joke, but it contains some good practices. There are more best-practice guides for Python, e.g. the best of best practices guide in Python.
from __future__ import braces
Another inside joke: Python answers with "SyntaxError: not a chance", meaning that it will never delimit code blocks with braces instead of indentation. Let's continue with more serious programming.
import math
print(dir(math)) #show names in math module
## ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
import math
print(math.pi)
from math import pi
print(pi)
import numpy as np
print(np.pi)
## 3.141592653589793
## 3.141592653589793
## 3.141592653589793
Question 5: Which is the best way to import modules?
Some useful modules from the Python standard library:
- os: access to operating system features
- os.path: manipulation of file names and paths
- sys: system-specific configuration
- glob: filename pattern matching
- math: mathematical functions
- datetime: date/time manipulation
Question 6: What is the difference between os and os.path?
Some examples:
import glob
glob.glob("*")
from datetime import timedelta, date
delta = timedelta(days=7)
print(date.today())
print(date.today()+delta)
## 2017-12-18
## 2017-12-25
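The os.path module is especially useful together with glob when handling geodata files. In this sketch the folder and file names (data/landsat/scene_01.tif) are hypothetical, so os.path.exists will report False unless you create that file:

```python
import glob
import os

# Build a path in an operating-system-independent way
path = os.path.join("data", "landsat", "scene_01.tif")
print(path)                       # data/landsat/scene_01.tif on Linux

# Take a path apart
print(os.path.basename(path))     # scene_01.tif
print(os.path.splitext(path)[1])  # .tif
print(os.path.exists(path))       # False unless the file actually exists

# Combine glob and os.path: list all .tif files in the working directory
for filename in glob.glob(os.path.join(os.getcwd(), "*.tif")):
    print(os.path.basename(filename))
```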
File access is very simple in 99% of the cases.
Write something to file:
fileObj = open('test.txt','w')
fileObj.write('some simple text')
fileObj.close()
And read something from a file:
fileObj = open('test.txt','r')
a = fileObj.read()
print(a)
fileObj.close()
## some simple text
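A common, safer alternative to calling close() yourself is the with statement, which closes the file automatically, also when an error occurs halfway. A sketch that overwrites the same test.txt with two lines and reads them back line by line:

```python
# The with statement closes the file when the indented block ends
with open('test.txt', 'w') as fileObj:
    fileObj.write('some simple text\n')
    fileObj.write('a second line\n')

# Iterating over a file object gives one line at a time
with open('test.txt', 'r') as fileObj:
    lines = [line.rstrip('\n') for line in fileObj]

print(lines)           # ['some simple text', 'a second line']
print(fileObj.closed)  # True
```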
Question 7: What do w and r mean?
Sometimes problems occur... Errors detected during execution are called exceptions.
Good code deals with exceptions:
open("/foo0")
## Traceback (most recent call last):
## File "<string>", line 1, in <module>
## FileNotFoundError: [Errno 2] No such file or directory: '/foo0'
The file doesn't exist, so the script stops and outputs an ugly message.
How to deal with this I/O error? Good programming!
try:
    open("foo")
except IOError:
    print("no file")
## we can be more precise: FileNotFoundError is the specific subclass of IOError raised here
try:
    open("/foo")
except FileNotFoundError:
    print("no file")
## no file
## no file
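try can be extended with else (runs only if no exception occurred) and finally (always runs). The sketch below wraps this in a hypothetical helper read_text and catches FileNotFoundError, which in Python 3 is a subclass of OSError (IOError is an alias of OSError), so only a missing file is handled:

```python
def read_text(filename):
    """Return the content of filename, or None if the file does not exist."""
    try:
        fileObj = open(filename, 'r')
    except FileNotFoundError:
        # Only a missing file is handled; other errors still stop the script
        print("no file:", filename)
        return None
    else:
        # Runs only when open() succeeded; with closes the file afterwards
        with fileObj:
            return fileObj.read()
    finally:
        # Runs in every case, after the except or else block
        print("finished trying", filename)


content = read_text("/foo")
print(content)  # None, if /foo does not exist
```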
Jupyter Notebooks can display the output of your code, such as graphs, images and maps, inside the notebook. Many cool visualizations, including their Python code, are available from the Python Graph Gallery. Before we can make visualizations, we need to add some Python modules via the terminal.
## Add Matplotlib and seaborn modules
conda install --name geoscripting matplotlib seaborn
# as conda install --name env_name pythonmodule pythonmodule
We will make a graph with the Seaborn module and plot it with Matplotlib. Give it a try! (If you have just installed them, you might need to close the Jupyter notebook and reopen it.)
# Load library
import matplotlib.pyplot as plt
import seaborn as sns
# Load dataset
df = sns.load_dataset('iris')
# Create plot
sns.pairplot(df, kind = "reg", hue = "species")
plt.show()
Nice visualization huh! Good job.
Now make a map with Folium. Note: Folium is very new and so is not in the main conda channel, so install it with conda install folium -c conda-forge.
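import folium
# Latitude and longitude of Wageningen
WAGENINGEN_COORDINATES = (51.9871868, 5.6593948)
# 'OpenStreetMap' is folium's default tile layer and needs no API key
m = folium.Map(location=WAGENINGEN_COORDINATES, tiles='OpenStreetMap', zoom_start=5)
display(m)  # display() is available inside Jupyter notebooks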
Your Jupyter Notebook is automatically saved as an .ipynb file (extension comes from the historic name "IPython Notebook") on your computer, but you can also download it as a python script, pdf or html. You can also save it manually. To exit a notebook properly, use File
By pressing Ctrl + C in the terminal where the Jupyter notebook server is running, you stop the running process. The terminal goes back to the command line and you can exit the virtual environment by typing:
source deactivate
Summary of important Python statements and built-ins:
- if
- for
- while
- try
- type, which shows the type (e.g. int, float, str) of an object
- class, which executes a block of code and attaches its local names to a class, for use in object-oriented programming
- def, which defines a function or method
- with, which encloses a code block within a context manager
- pass statement, which serves as a NOP (no operation)
- assert, used during debugging to check for conditions that ought to apply
- yield
- import
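Here are a few of the less familiar keywords from this list in action. This is an illustrative sketch: the generator even_numbers and the class Building are made up for the example.

```python
def even_numbers(limit):
    # yield turns this function into a generator: values are produced one at a time
    for number in range(limit):
        if number % 2 == 0:
            yield number


class Building:
    pass  # an empty body still needs a statement; pass is a placeholder (NOP)


evens = list(even_numbers(10))
print(evens)                       # [0, 2, 4, 6, 8]
print(type(evens))                 # <class 'list'>

# assert stops the program with an AssertionError if the condition is False
assert all(n % 2 == 0 for n in evens), "only even numbers expected"

gaia = Building()
gaia.name = "Gaia"                 # attributes can be attached to an instance
print(isinstance(gaia, Building))  # True
```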
By the way, as you may have noticed, this document is created from an RMarkdown source, which can also include Python code blocks (have a look at the source on GitHub). Similarly, Jupyter Notebooks can also use R and other languages!
The assignment for today is to finish the DataCamp course Intro to Python for Data Science. If you finish early and still want to write more scripts, you can follow one of the fun tutorials below.