Tutorial 11: Python and Google Earth Engine
A note on running this tutorial
We recommend using Jupyter Notebook to run the code in this tutorial.
The code today makes use of interactive widgets, which will display
inline when Jupyter Notebooks. You can run the code in an IDE like
Spyder, but where the variable Map
is called, you will have
to export the map and look at the exported html in for example a
browser. To export the map to html you can use the following code:
Map.to_html(filename=‘map.html’, title=‘My Map’, width=‘100%’,
height=‘880px’)
Learning goals for today:
- Be able to read javascript documentation
- Visualize raster and vector data using Google Earth Engine in python
- Use the python api for Google Earth Engine for doing spatial analysis
You may already be familiar with or have explored Google Earth (https://earth.google.com/web/). Within Google Earth, you can virtually explore the Earth from your computer. The application offers various maps and information, including the familiar Google Maps base map and additional layers such as the age of the sea floor. Behind Google Earth lies it’s engine. It is a powerful cloud-based computational platform that can be used for analyzing all data available in Google Earth and more. It provides a wealth of information and tools that are accessible to everyone with a Google account.
Javascript, API’s and Object Oriented Programming
Google Earth Engine (GEE) is the underlying platform behind Google Earth, initially developed and released as a JavaScript API. But what exactly does that mean? Let’s break it down into JavaScript and API.
API
An API stands for “Application Programming Interface” and serves as a standardized language or set of rules that allows different software components to communicate with each other. APIs are commonly used for tasks such as accessing data (e.g., weather data from OpenWeatherMap) or interacting with a command line tool: GDAL with python or R, as we covered in the previous tutorials.
JavaScript
JavaScript (JS) is a widely used programming language primarily employed for web development. It follows an object-oriented programming (OOP) paradigm, which emphasizes creating reusable code through objects. Objects in JS can encapsulate both data (properties) and functionality (methods) within a single entity. OOP differs somewhat from the programming style we have used thus far, although Python can also be used as an OOP language. Working with the GEE Python API offers a fantastic opportunity to become familiar with object-oriented programming while leveraging the immense power of Google’s engine.
Google Earth Engine
Google Earth Engine (GEE) is a comprehensive platform that offers a vast collection of geospatial data along with powerful computational capabilities. It enables users to analyze and visualize geospatial data on a global scale, making it a valuable tool for various research domains such as environmental monitoring and resource management.
One of the prominent features of GEE is its extensive collection of raster data sources. You can explore the catalog of available datasets here. Additionally, GEE provides access to raster time series and non-raster data as well. For specific use cases and examples, you can refer to the article by Pérez-Cutillas and colleagues published this year here. They conducted a systematic review of research conducted using GEE. From their article:
The results of the meta-analysis following the systematic review showed that: (i) the Landsat 8 was the most widely-used satellite (25%); (i) the non-parametric classification methods, mainly Random Forest, were the most recurrent algorithms (31%); and (iii) the water resources assessment and prediction were the most common methodological applications (22%).
– Pérez-Cutillas et al., 2023
Nighttime light emission
In this tutorial, we will conduct a relatively straightforward analysis to demonstrate commonly used tools in Google Earth Engine (GEE). Our main focus is to show you how to work with GEE in Python, locate tools and documentation, and read the JavaScript (JS) documentation.
The analysis we will perform involves comparing the amount of light emitted at night in the Netherlands over time. However, you can choose a different area for your analysis, as we will be using a global dataset from 2012 to the present.
Environment set up
To set up the required environment, you will need the following
dependencies. Save the following YAML code in a new file named
env.yaml
then create and activate the environment using
mamba
name: pythonGee
channels:
- conda-forge
dependencies:
- python=3.10
- jupyter
- earthengine-api
- geemap
- pygadm
- geopandas
- geojson
- spyder
Once the environment is activated, run the code
jupyter notebook
in your terminal to open up a Jupyter
Notebook for your project (similiar to how you open Spyder). You can
refer to the Python Refresher or the Jupyter
Notebook Documentation for a quick guide on how to use the
notebooks. Once you have the notebook open, create a new file with the
Python3 kernel. You will write and run the rest of your code here.
To test the environment, try importing the following packages:
To work with Google Earth Engine (GEE), you will need a Google
account for authorization. The authentication process can be initiated
by calling the Authenticate
function from the
ee
module. Follow these steps:
- After running the
Authenticate
function, a URL will be printed in the output. - Click on the URL, which will redirect you to a Google account login page.
- Log in to your Google account to create an authorization token.
- When prompted with a warning stating that Google has not verified the app, click “Continue”. This warning is shown because the code we are about to run is not verified by Google, ensuring your account’s security.
- On the authorization page, select all the checkboxes. This grants permission to access GEE data and manage data and permissions in the cloud storage where the analysis will be performed.
- Finally, you will see an authorization code. Copy the code and paste
it into the text box that appears after running the
Authenticate
function. Press Enter to complete the authentication process.
You now have (hopefully) succesfully authenticated. Next we have to
initialize using the Initialize()
method:
If you do not encounter any warnings or errors, you have successfully authenticated and are now ready to work with the Earth Engine library in Python! Congratulations!
The next time you work in the same environment, you won’t need to
authenticate again. Simply run the Initialize
function, and
you’ll be all set to use the Earth Engine library.
Some notes on documentation
In the documentation we will refer to today, you will notice that there are examples available for both JavaScript and Python (Google Colab). This is because Google is working on implementing all the JavaScript functionality in Python. Not all the documentation or examples are at this time available in Python. Luckily, although the languages are quite different, the JavaScript and Python implementation for this package are very similair. It is a good skill to be able to read some JavaScript. Additionally, we are not sure what documentation is available for python at the time of following this tutorial. Therefor we will refer to the JavaScript documentation, but we will highlight any important differences and provide guidance on using this code in Python.
Some common differences are the following. To initialize a variable, in Python you can simply type:
In Javascript we can initialize a variable in various ways. The difference is where this variable in the code will be known:
Another often occuring difference in the documentation is how variables are passed to a function. In python we can do that as we are used to:
In Javascript we have to pass an object, similar to a dictionary in python:
var multiStats = img.reduceRegions({
collection: regionCol,
reducer: reducer,
scale: 30,
crs: 'EPSG:3310',
});
You do not need to understand this code itself for now, just note the differences in syntax.
Accessing Night Light Data
To analyze the development of light emission over the past 10 years, we will compare the amount of emitted light per province on a monthly basis.
For this analysis, we will need data on emitted light. We will use
the VIIRS Nighttime data, which you can learn more about here.
This dataset provides monthly average radiance composite images using
nighttime data from the Visible Infrared Imaging Radiometer Suite
(VIIRS) Day/Night Band (DNB). You can also explore the data in this
Esri viewer. To work with this data, we can create an
ImageCollection
object. You can refer to the documentation
for more details.
To create an ImageCollection
, you can use the following
command:
# Import the night time light emision data.
viirs_image_collection = ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG")
Note that this is the exact same code as in the JS documentation, but
adapted for this tutorial. In Python, to retrieve information about an
object in the form of a dictionary, we need can call the
getInfo()
method.
In Python, we use the following command:
Here, we call the getInfo
method to convert the object
information to a dictionary that we can extract useful information from.
However, when working in Jupyter notebooks, we can simply call the
object itself to display it inline:
# Note: we don't recomend running this line if you are using VirtualBox virtual machines
# as they might cause it to crash, however, we still would like to show this option of getting info,
# as it is displayed more clearly within Jupyter Notebook.
# ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG")
In Jupyter, this will display the ImageCollection object.
Inspecting the ImageCollection object, we can see that it has several
properties. It includes type
, id
,
version
, an empty list for bands
, 23
properties
, and 132 features
. These properties
are similar to what you would expect in a GeoJSON representation of a
FeatureCollection.
The type
, id
, and version
provide information about the data and the object type. At this point,
the object has an empty list for bands, indicating that there are no
bands defined yet. It has 23 properties, which describe information
about the collection as a whole (e.g., it is a monthly collection). The
collection contains features, which are individual images. Each feature
(image) has its own type
, id
,
version
, bands
, and properties
.
There are no features within these images themselves, as are no
collections.
Earth Engine as a viewer
Now, let’s explore how we can use Google Earth Engine as a viewer. As
mentioned earlier, Google Earth Engine is not only a data collection but
also a platform with tools and a viewer. To interact with the viewer, we
will utilize a package called geemap
. Geemap also provides
useful tools, such as data conversion to and from GEE native objects.
We’ll cover this in more detail later.
To create a map, we will follow a few steps outlined in the geemap getting started page, modifying it for our use case:
- Initialize a map with a starting position and zoom level. This map will serve as our base, and we will add layers to it.
- Select one image from our image collection.
- Add the selected image to the map.
- Add a boundary of the Netherlands to the same map.
Initialize the map
By following these steps, we will create a map visualization of the selected image.
# set our initial zoom position for the netherlands
center_lon = 51.962589
center_lat = 5.669627
zoomlevel = 8
# The following line will initialize the map
Map = geemap.Map(center=[center_lon, center_lat], zoom=zoomlevel)
# The following line will actually call and show the map:
Map
The map will stay and remain in this window unless we instruct it
otherwise. It is an interactive map with 1 standard basemap at the
moment. To start over, you can call Map.clear()
, which will
remove all layers and other objects from the map, requiring you to
initialize it again. Sometimes, when working with an interactive widget
like this map, Jupyter might become unresponsive. If you no longer
receive any response (you can test this by running 1 + 1
in
a new code block), you might need to restart the kernel and begin
again.
Select 1 image and add it to the map
Step 1 is complete! The next step is to select one image from the image collection and add it to the map.
# Initial date of interest (inclusive).
from_date = '2017-01-01'
# Final date of interest (exclusive).
to_date = '2023-01-01'
image = viirs_image_collection\
.filterDate(from_date, to_date)\
.sort("system:time_start", True)\
.first()
visualization_params = {
'min': 0,
'max': 10,
'opacity': 0.5,
'bands': ['avg_rad']
}
Map.addLayer(ee_object = image,
vis_params = visualization_params,
name = 'My First Image')
In the code snippet provided, several steps are performed to add an image to the map. In the first line the following happened:
Filtering Images: The
viirs_image_collection
is filtered to select only the images between two specified dates using thefilterDate(from_date, to_date)
method.Sorting the Collection: The filtered image collection is sorted based on the
"system:time_start"
property in ascending order using thesort("system:time_start", ascending)
method. The resulting collection is then assigned to the variableimage
.Selecting an image: We selected the first image out of the collection using the
first()
method.
We then define visualization parameters and add the image to the map:
Visualization Parameters: Visualization parameters are defined to specify how the image should be displayed on the map. This includes the desired range of values, opacity, and the specific band to visualize (
avg_rad
in this case).Adding the Image to the Map: The
image
is added to the map using theAddLayer
method. This method requires the Earth Engine object (image
), the visualization parameters, and an optional name for the layer.
By following these steps, the code successfully adds the selected image to the map, great!
Add the boundaries of the Netherlands to the map
The next code will add the boundaries of the Netherlands to the map.
To get the boundary data we use the pygadm
package.
pygadm
is a dataset project from the University of Berkeley
that provides administrative boundary data on multiple scales for
various regions worldwide. Be careful! The returned data is not always
completely correct, but it suits our purposes for now.
# Get data
netherlands = pygadm.get_items(admin='NLD', content_level=1)
# Define the projection
netherlands.crs = "EPSG:4326"
# Import gpd dataframe to earth engine
ee_netherlands = geemap.geopandas_to_ee(netherlands)
# Add data to map
Map.addLayer(ee_object = ee_netherlands,
name = 'Netherlands Boundaries')
Map.addLayerControl()
Using the get_items
function, we can obtain the
boundaries at content level 1, which represent the provinces in the
Netherlands. This data is retrieved in the
GeoPandas GeoDataFrame
format, which cannot be added to the
map directly. However, we can utilize a convenient conversion tool
provided by the geemap
package to convert it into an Earth
Engine feature. The resulting object can then be easily placed on the
map.
For additional examples and tools, we recommend checking out the examples
provided by geemap
themselves! It’s worth noting that you
can also export
a map to HTML, which might be useful for your project.
Take a look at these examples to further enhance your project and
explore the capabilities offered by geemap
.
Great, we have now made a simple map in google earth engine. We added an image for the whole world and we imported external data and also visualized that. And it works very quick!
Analysis in GEE
For the analysis we want to clip the imagery to the boundaries. After that we want to take the average of all the pixels per province. This will result in a single value per province per month. Schematically the steps will look like this:
- Select data: from the ImageCollection, select the band we want to do the analysis on and filter the collection to the time range we want to do the analysis on.
- Single image: Collapse the image collection into a single image with many bands
- Clip: Clip the images on the boundaries
- Calculate the statistics per province
Selecting the data and collapsing it
The first step can be done in one line. Several times we call a method on an image collection, which will result in an image collection. On this collection we will call another method and so fort. Look at the following line of code and try to understand what is happening before you read the explanation. Use the documentation!
viirs_image = viirs_image_collection.select('avg_rad') \
.filterDate(from_date, to_date) \
.toBands()
4 methods are called in one line. In this code we go from an image collection over all available time steps to a single image of the time we are interested in, with one band representing an image per month. Why we collapse it into this single image will become clear in the next step. The separate methods are described below.
- First, we select the band we are interested in, the
avg_rad
band, containing the average nighttime radiation per month. For this we use theselect
method. - We then select the images from the dates we are interested in using
the
filterDate
method. - Finally we call the
toBands
method. Note that this method results in anImage
instead of anImageCollection
.
When you run the code above note that the code is run instantly, no
matter the processing power of your laptop or virtual machine, while
actually it is quite a large computation. How can this be? Actually the
code and the calculation is not done on your computer, in fact there is
no computing done at all yet for this code. This code creates an API
Call, which will be sent to the Earth Engine cloud platform once it
needs to be called. For example when you now call the
viirs_image
variable, it takes longer. The API call is sent
and the results are returned to your environment.
Clipping the data
Next, we will clip the image to the desired area, the boundary of the
Netherlands. This can be done by using the clip
method. We could have also added this to the already long line of code
above. Functionally, this makes no difference. The decision on how to do
this is personal; every programmer might do it differently. The most
important aspect of this choice is to make it clear for yourself and
others what each line is doing and to define clear variable names.
# Function to clip an image to the specified geometry
viirs_image_clipped = viirs_image.clip(ee_netherlands)
viirs_image_clipped
now contains an image with 72 bands.
Please verify this in your own environment.
Question 1: Do you understand the band names?
Click for answer
One way of checking the amount of bands returned will be with the lineprint(len(viirs_image_clipped.getInfo()[‘bands’]))
. The
naming should start with something like “20170101_avg_rad” the first
part refers to the date of the image Jan 01, 2017, and the later is an
abbreviation of average radiation.
Calculating statistics
Now, we want to calculate the average radiance per province. We can
achieve this using a reducer, which we apply over an area. A
reducer is an algorithm that is applied over a set of values. In our
case, the set of values will be the pixel values within a province. We
will use a method called reduceRegions
to gather this set of values. According to the documentation, this
method requires a collection and a reducer as inputs. We will provide
the polygons of the provinces in the Netherlands, referred to as
ee_netherlands
, as the input collection. As for the
reducer, there are various options available. To explore the available
options, you can search for reducer
using the filter in the
documentation.
Since we want to take the average over the collection, we will
utilize the ee.Reducer.mean()
function. It is important to
note that this function itself is not a reducer, but rather it
returns a reducer. Therefore, we need to actually call the
function as part of the input. Additionally, we provide the optional
parameter scale
with a value of 463.83 meters.
Question 2: Why do we choose 463.83 meters as scale? Find out using the documentation of the imagery and the documentation of
reduceRegions
.
Click for answer
In case you had trouble finding the reduceRegions documentation you can find it here. The reducer asks for a scale so that it knows which units to do the analysis in.External analysis and visualization
All right! We now have a FeatureCollection
with the
average radiation per province, per month over the given time period. To
perform analysis and visualization, our next step is to calculate the
largest difference within the time period, classify this difference, and
display the results on a map. While this is all possible within Earth
Engine, since you are familiar with working with GeoPandas, it makes
sense use what we know. Exporting the output to a GeoPandas DataFrame
and continuing from there is not a problem at all, so let’s do that.
The steps we will follow are as follows: 1. Export the statistics FeatureCollection to a GeoPandas DataFrame. 2. Calculate the difference in radiation per province. 3. Classify this difference into three classes: small difference, medium difference, and large difference. 4. Convert the resulting DataFrame back into an Earth Engine object. 5. Visualize the results.
Exporting
Geemap
provides useful tools for exporting data. The
most convenient format for us is a GeoPandas
object, but there are many other options available as well. Please refer
to the documentation for additional options.
It is, obviously, also possible to export other earth engine objects like images or image collections, as is shown in the documentation. At the end of this tutorial we will show how to export the data used in this tutorial to a geotiff file.
Calculating differences
From here calculating the difference is straight forward, since we know (geo)pandas well. We now also have all the functionality that comes with GeoPandas, such as very easily writing the results to a geojson!
Classifying and visualizing trends
Now to see how the radiation behaved over time we have to do a little more work, since the dataframe in this format is not very useful for this. Read and understand the code below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
# Clean the dataframe a little
clean_df = df.drop(['GID_0', 'GID_1', 'ISO_1', 'HASC_1', 'NAME_0', 'NL_NAME_1', 'VARNAME_1', 'TYPE_1','CC_1', 'ENGTYPE_1'], axis=1)
# Reformat the dataframe so the rows depict measurements per timestamp per province, instead of an attribute per timestamp
melted_df = pd.melt(clean_df, id_vars=['NAME_1', 'geometry', 'diff'] , var_name='date', value_name='value')
# Convert the time string to a datetime format
melted_df['short_date'] = melted_df['date'].str[:8].str.strip()
melted_df['datetime'] = pd.to_datetime(melted_df['short_date'], format="%Y%m%d")
# Initialize the plot
fig, ax = plt.subplots(figsize=(9, 6))
# Create an empty list to save all lines in for the legend
lines = []
# For each province, add a line with a different color
for prov in melted_df['NAME_1'].unique():
# Create a dataframe for the province of interest
d = melted_df[melted_df['NAME_1'] == prov]
# Generate a random color
random_rgb_values = list(np.random.random(3))
random_color = matplotlib.colors.rgb2hex(random_rgb_values, keep_alpha=False)
# Add a line for each province
lines += ax.plot( d["datetime"].values, d["value"].values, color= random_color, lw=1, zorder=10, label=prov)
# Add the legend
ax.legend(handles = lines)
We can see that there are some differences per province, but that there does not seem to a nationwide trend. We can also see that in the summer there is no data for the Netherlands, or in fact for entire northern europe. This has to do with the nights being so long here in the summer. Per province there might be a larger difference. To find out, we can add a trendline to the graph, but this is out of scope for this tutorial.
Classify differences
For the visualization we now want to categorize the differences in 3
classes, a little difference, medium difference and a large difference.
To do this we will use a numpy function called linspace
and
a pandas function called cut
. This results in an array with
datatype CategorizalClasses
, which earth engine does not
understand so we convert it to a string.
# Create 3 classes
breaks = np.linspace(df['diff'].min() - 0.0001, df['diff'].max(), 4)
# Assign values to classes
classes = ['Low', 'Mid', 'High']
df['class'] = pd.cut(df['diff'], bins=breaks, labels=classes, right=True)
# Convert the datatype to strings so earth engine understands
df['class'] = df['class'].astype('str')
Show results on a map
So the next step is to show the differences on an earth engine map. To do this, we have to convert the geopandas dataframe back to a earth engine object. Before we do this we have to tell geopandas what coordinate refrence system the geometries are in. Strangely the coordinate system is not exported from earth engine, but it does need it to be assigned to be imported…
df = df.set_crs(4326)
new_ee_statistics = geemap.geopandas_to_ee(df)
# If you run the entire tutorial in one go, the map will show at the same location as it was first initialized.
# To show it hear, clear the map and create a new one.
# If the map was not initizalized before, this line will throw an error.
Map.clear()
Style the map
Now we have the polygons back in the right format with the classes. To visualize we define a palette, which tells earth engine which color we will give to each class. For the legend we can use this dictionary directly. Of course we also want to add the latest night time imagery behind it and we want to add a layer control.
Map = geemap.Map(zoomlevel=8)
palette = {
'High': 'B81D13', # red
'Mid': 'EFB700', # yellow
'Low': '008450' # green
}
visualization_params = {
'min': 0,
'max': 10,
'opacity': 0.7,
'bands': ['20221201_avg_rad']
}
Map.addLayer(ee_object = viirs_image_clipped,
vis_params = visualization_params,
name = 'Night Time')
Map.add_styled_vector(new_ee_statistics, column="class", palette=palette, layer_name="Styled vector")
Map.center_object(new_ee_statistics)
Map.add_legend(legend_title="Legend", legend_dict=palette, position = 'topright')
Map.addLayerControl()
Map
Exporting rasters
To file
As promised earlier in this tutorial we will show how to export raster data to a local file for visualization or analysis outside of GEE. We will use the tools as described in the geemap tutorials. Before we export an image we have to get some data from GEE. We will use the same ata as used in the rest of the tutorial.
# Initial date of interest (inclusive).
from_date = '2017-01-01'
# Final date of interest (exclusive).
to_date = '2023-01-01'
image = viirs_image_collection\
.filterDate(from_date, to_date)\
.sort("system:time_start", True)\
.first()
# Get geometry data and convert to earth engine object
netherlands = pygadm.get_items(admin='NLD', content_level=1)
netherlands.crs = "EPSG:4326"
ee_netherlands = geemap.geopandas_to_ee(netherlands)
After collecting the data we can use a simple command for downloading
the data. In this case the image we are working with consists of two
bands. Setting the file_per_band
parameter to
True
will result in a file for each of the bands.
To python
For further analysis within python it might be easier to convert a
raster to a numpy.array
. This stackexchange
question and answer by Justin Braaten shows nicely how this can be
done. Notice that for further spatial analysis this might be
inconvenient, since the resulting numpy array contains only the pixel
values, without any spatial information. When importing to, for example,
rasterio we need additional sptial information such as the origin (north
west corner) and the pixel size, however. For more information see also
this stackexchange
thread
Conclusion
That marks the end of todays tutorial. Today you learned about many different things. We started with understanding JavaScript a little better and we can now read the documentation of the Earth Engine API, to apply the functionality in Python. We then used the Python implementation of this API to create maps in Earth Engine and we did some analysis about light radiation in the Netherlands. We exported and imported earth engine objects to other formats we can easier work with, so we can use the tools we know in combination with the power offered by Google Earth Engine.