Tutorial 11: Python and Google Earth Engine

A note on running this tutorial

We recommend using Jupyter Notebook to run the code in this tutorial. The code today makes use of interactive widgets, which will display inline when Jupyter Notebooks. You can run the code in an IDE like Spyder, but where the variable Map is called, you will have to export the map and look at the exported html in for example a browser. To export the map to html you can use the following code:

Map.to_html(filename=‘map.html’, title=‘My Map’, width=‘100%’, height=‘880px’)

Learning goals for today:

Be able to read javascript documentation
Visualize raster and vector data using Google Earth Engine in python
Use the python api for Google Earth Engine for doing spatial analysis

You may already be familiar with or have explored Google Earth (https://earth.google.com/web/). Within Google Earth, you can virtually explore the Earth from your computer. The application offers various maps and information, including the familiar Google Maps base map and additional layers such as the age of the sea floor. Behind Google Earth lies it’s engine. It is a powerful cloud-based computational platform that can be used for analyzing all data available in Google Earth and more. It provides a wealth of information and tools that are accessible to everyone with a Google account.

Javascript, API’s and Object Oriented Programming

Google Earth Engine (GEE) is the underlying platform behind Google Earth, initially developed and released as a JavaScript API. But what exactly does that mean? Let’s break it down into JavaScript and API.

API

An API stands for “Application Programming Interface” and serves as a standardized language or set of rules that allows different software components to communicate with each other. APIs are commonly used for tasks such as accessing data (e.g., weather data from OpenWeatherMap) or interacting with a command line tool: GDAL with python or R, as we covered in the previous tutorials.

JavaScript

JavaScript (JS) is a widely used programming language primarily employed for web development. It follows an object-oriented programming (OOP) paradigm, which emphasizes creating reusable code through objects. Objects in JS can encapsulate both data (properties) and functionality (methods) within a single entity. OOP differs somewhat from the programming style we have used thus far, although Python can also be used as an OOP language. Working with the GEE Python API offers a fantastic opportunity to become familiar with object-oriented programming while leveraging the immense power of Google’s engine.

Google Earth Engine

Google Earth Engine (GEE) is a comprehensive platform that offers a vast collection of geospatial data along with powerful computational capabilities. It enables users to analyze and visualize geospatial data on a global scale, making it a valuable tool for various research domains such as environmental monitoring and resource management.

One of the prominent features of GEE is its extensive collection of raster data sources. You can explore the catalog of available datasets here. Additionally, GEE provides access to raster time series and non-raster data as well. For specific use cases and examples, you can refer to the article by Pérez-Cutillas and colleagues published this year here. They conducted a systematic review of research conducted using GEE. From their article:

The results of the meta-analysis following the systematic review showed that: (i) the Landsat 8 was the most widely-used satellite (25%); (i) the non-parametric classification methods, mainly Random Forest, were the most recurrent algorithms (31%); and (iii) the water resources assessment and prediction were the most common methodological applications (22%).

– Pérez-Cutillas et al., 2023

Nighttime light emission

In this tutorial, we will conduct a relatively straightforward analysis to demonstrate commonly used tools in Google Earth Engine (GEE). Our main focus is to show you how to work with GEE in Python, locate tools and documentation, and read the JavaScript (JS) documentation.

The analysis we will perform involves comparing the amount of light emitted at night in the Netherlands over time. However, you can choose a different area for your analysis, as we will be using a global dataset from 2012 to the present.

Environment set up

To set up the required environment, you will need the following dependencies. Save the following YAML code in a new file named env.yaml then create and activate the environment using mamba

name: pythonGee
channels:
  - conda-forge
dependencies:
  - python=3.10
  - jupyter
  - earthengine-api
  - geemap
  - pygadm
  - geopandas
  - geojson
  - spyder

Once the environment is activated, run the code jupyter notebook in your terminal to open up a Jupyter Notebook for your project (similiar to how you open Spyder). You can refer to the Python Refresher or the Jupyter Notebook Documentation for a quick guide on how to use the notebooks. Once you have the notebook open, create a new file with the Python3 kernel. You will write and run the rest of your code here.

To test the environment, try importing the following packages:

import ee
import geemap 
import pygadm
import geopandas as gpd
import geojson

To work with Google Earth Engine (GEE), you will need a Google account for authorization. The authentication process can be initiated by calling the Authenticate function from the ee module. Follow these steps:

After running the Authenticate function, a URL will be printed in the output.
Click on the URL, which will redirect you to a Google account login page.
Log in to your Google account to create an authorization token.
When prompted with a warning stating that Google has not verified the app, click “Continue”. This warning is shown because the code we are about to run is not verified by Google, ensuring your account’s security.
On the authorization page, select all the checkboxes. This grants permission to access GEE data and manage data and permissions in the cloud storage where the analysis will be performed.
Finally, you will see an authorization code. Copy the code and paste it into the text box that appears after running the Authenticate function. Press Enter to complete the authentication process.

ee.Authenticate()

You now have (hopefully) succesfully authenticated. Next we have to initialize using the Initialize() method:

ee.Initialize()

If you do not encounter any warnings or errors, you have successfully authenticated and are now ready to work with the Earth Engine library in Python! Congratulations!

The next time you work in the same environment, you won’t need to authenticate again. Simply run the Initialize function, and you’ll be all set to use the Earth Engine library.

Some notes on documentation

In the documentation we will refer to today, you will notice that there are examples available for both JavaScript and Python (Google Colab). This is because Google is working on implementing all the JavaScript functionality in Python. Not all the documentation or examples are at this time available in Python. Luckily, although the languages are quite different, the JavaScript and Python implementation for this package are very similair. It is a good skill to be able to read some JavaScript. Additionally, we are not sure what documentation is available for python at the time of following this tutorial. Therefor we will refer to the JavaScript documentation, but we will highlight any important differences and provide guidance on using this code in Python.

Some common differences are the following. To initialize a variable, in Python you can simply type:

variablename = 'value'

In Javascript we can initialize a variable in various ways. The difference is where this variable in the code will be known:

var variablename = 'value'
let variablename = 'value'
const variablename = 'value'

Another often occuring difference in the documentation is how variables are passed to a function. In python we can do that as we are used to:

multiStats = img.reduceRegions(
  collection=regionCol,
  reducer=reducer,
  scale=30,
  crs='EPSG:3310',
)

In Javascript we have to pass an object, similar to a dictionary in python:

var multiStats = img.reduceRegions({
  collection: regionCol,
  reducer: reducer,
  scale: 30,
  crs: 'EPSG:3310',
});

You do not need to understand this code itself for now, just note the differences in syntax.

Accessing Night Light Data

To analyze the development of light emission over the past 10 years, we will compare the amount of emitted light per province on a monthly basis.

For this analysis, we will need data on emitted light. We will use the VIIRS Nighttime data, which you can learn more about here. This dataset provides monthly average radiance composite images using nighttime data from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB). You can also explore the data in this Esri viewer. To work with this data, we can create an ImageCollection object. You can refer to the documentation for more details.

To create an ImageCollection, you can use the following command:

# Import the night time light emision data.
viirs_image_collection = ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG")

Note that this is the exact same code as in the JS documentation, but adapted for this tutorial. In Python, to retrieve information about an object in the form of a dictionary, we need can call the getInfo() method.

print('Image collection from a string', ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG"));

In Python, we use the following command:

print(ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG").getInfo()['properties'])

Here, we call the getInfo method to convert the object information to a dictionary that we can extract useful information from. However, when working in Jupyter notebooks, we can simply call the object itself to display it inline:

# Note: we don't recomend running this line if you are using VirtualBox virtual machines
#   as they might cause it to crash, however, we still would like to show this option of getting info, 
#   as it is displayed more clearly within Jupyter Notebook.
# ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG")

In Jupyter, this will display the ImageCollection object.

Inspecting the ImageCollection object, we can see that it has several properties. It includes type, id, version, an empty list for bands, 23 properties, and 132 features. These properties are similar to what you would expect in a GeoJSON representation of a FeatureCollection.

The type, id, and version provide information about the data and the object type. At this point, the object has an empty list for bands, indicating that there are no bands defined yet. It has 23 properties, which describe information about the collection as a whole (e.g., it is a monthly collection). The collection contains features, which are individual images. Each feature (image) has its own type, id, version, bands, and properties. There are no features within these images themselves, as are no collections.

Earth Engine as a viewer

Now, let’s explore how we can use Google Earth Engine as a viewer. As mentioned earlier, Google Earth Engine is not only a data collection but also a platform with tools and a viewer. To interact with the viewer, we will utilize a package called geemap. Geemap also provides useful tools, such as data conversion to and from GEE native objects. We’ll cover this in more detail later.

To create a map, we will follow a few steps outlined in the geemap getting started page, modifying it for our use case:

Initialize a map with a starting position and zoom level. This map will serve as our base, and we will add layers to it.
Select one image from our image collection.
Add the selected image to the map.
Add a boundary of the Netherlands to the same map.

Initialize the map

By following these steps, we will create a map visualization of the selected image.

# set our initial zoom position for the netherlands
center_lon = 51.962589
center_lat = 5.669627
zoomlevel = 8

# The following line will initialize the map
Map = geemap.Map(center=[center_lon, center_lat], zoom=zoomlevel)

# The following line will actually call and show the map:
Map

The map will stay and remain in this window unless we instruct it otherwise. It is an interactive map with 1 standard basemap at the moment. To start over, you can call Map.clear(), which will remove all layers and other objects from the map, requiring you to initialize it again. Sometimes, when working with an interactive widget like this map, Jupyter might become unresponsive. If you no longer receive any response (you can test this by running 1 + 1 in a new code block), you might need to restart the kernel and begin again.

Select 1 image and add it to the map

Step 1 is complete! The next step is to select one image from the image collection and add it to the map.

# Initial date of interest (inclusive).
from_date = '2017-01-01'
# Final date of interest (exclusive).
to_date = '2023-01-01'

image = viirs_image_collection\
            .filterDate(from_date, to_date)\
            .sort("system:time_start", True)\
            .first()

visualization_params = {
    'min': 0,
    'max': 10,
    'opacity': 0.5,
    'bands': ['avg_rad']
}

Map.addLayer(ee_object = image, 
             vis_params = visualization_params,
             name = 'My First Image')

In the code snippet provided, several steps are performed to add an image to the map. In the first line the following happened:

Filtering Images: The viirs_image_collection is filtered to select only the images between two specified dates using the filterDate(from_date, to_date) method.
Sorting the Collection: The filtered image collection is sorted based on the "system:time_start" property in ascending order using the sort("system:time_start", ascending) method. The resulting collection is then assigned to the variable image.
Selecting an image: We selected the first image out of the collection using the first() method.

We then define visualization parameters and add the image to the map:

Visualization Parameters: Visualization parameters are defined to specify how the image should be displayed on the map. This includes the desired range of values, opacity, and the specific band to visualize (avg_rad in this case).
Adding the Image to the Map: The image is added to the map using the AddLayer method. This method requires the Earth Engine object (image), the visualization parameters, and an optional name for the layer.

By following these steps, the code successfully adds the selected image to the map, great!

Add the boundaries of the Netherlands to the map

The next code will add the boundaries of the Netherlands to the map. To get the boundary data we use the pygadm package. pygadm is a dataset project from the University of Berkeley that provides administrative boundary data on multiple scales for various regions worldwide. Be careful! The returned data is not always completely correct, but it suits our purposes for now.

# Get data
netherlands = pygadm.get_items(admin='NLD', content_level=1)

# Define the projection
netherlands.crs = "EPSG:4326"

# Import gpd dataframe to earth engine
ee_netherlands = geemap.geopandas_to_ee(netherlands)

# Add data to map
Map.addLayer(ee_object = ee_netherlands, 
             name = 'Netherlands Boundaries')
Map.addLayerControl()

Using the get_items function, we can obtain the boundaries at content level 1, which represent the provinces in the Netherlands. This data is retrieved in the GeoPandas GeoDataFrame format, which cannot be added to the map directly. However, we can utilize a convenient conversion tool provided by the geemap package to convert it into an Earth Engine feature. The resulting object can then be easily placed on the map.

For additional examples and tools, we recommend checking out the examples provided by geemap themselves! It’s worth noting that you can also export a map to HTML, which might be useful for your project.

Take a look at these examples to further enhance your project and explore the capabilities offered by geemap.

Great, we have now made a simple map in google earth engine. We added an image for the whole world and we imported external data and also visualized that. And it works very quick!

Analysis in GEE

For the analysis we want to clip the imagery to the boundaries. After that we want to take the average of all the pixels per province. This will result in a single value per province per month. Schematically the steps will look like this:

Select data: from the ImageCollection, select the band we want to do the analysis on and filter the collection to the time range we want to do the analysis on.
Single image: Collapse the image collection into a single image with many bands
Clip: Clip the images on the boundaries
Calculate the statistics per province

Selecting the data and collapsing it

The first step can be done in one line. Several times we call a method on an image collection, which will result in an image collection. On this collection we will call another method and so fort. Look at the following line of code and try to understand what is happening before you read the explanation. Use the documentation!

viirs_image = viirs_image_collection.select('avg_rad') \
                .filterDate(from_date, to_date) \
                .toBands()

4 methods are called in one line. In this code we go from an image collection over all available time steps to a single image of the time we are interested in, with one band representing an image per month. Why we collapse it into this single image will become clear in the next step. The separate methods are described below.

First, we select the band we are interested in, the avg_rad band, containing the average nighttime radiation per month. For this we use the select method.
We then select the images from the dates we are interested in using the filterDate method.
Finally we call the toBands method. Note that this method results in an Image instead of an ImageCollection.

When you run the code above note that the code is run instantly, no matter the processing power of your laptop or virtual machine, while actually it is quite a large computation. How can this be? Actually the code and the calculation is not done on your computer, in fact there is no computing done at all yet for this code. This code creates an API Call, which will be sent to the Earth Engine cloud platform once it needs to be called. For example when you now call the viirs_image variable, it takes longer. The API call is sent and the results are returned to your environment.

viirs_image

Clipping the data

Next, we will clip the image to the desired area, the boundary of the Netherlands. This can be done by using the clip method. We could have also added this to the already long line of code above. Functionally, this makes no difference. The decision on how to do this is personal; every programmer might do it differently. The most important aspect of this choice is to make it clear for yourself and others what each line is doing and to define clear variable names.

# Function to clip an image to the specified geometry
viirs_image_clipped = viirs_image.clip(ee_netherlands)

viirs_image_clipped now contains an image with 72 bands. Please verify this in your own environment.

Question 1: Do you understand the band names?

Click for answer

One way of checking the amount of bands returned will be with the line print(len(viirs_image_clipped.getInfo()[‘bands’])). The naming should start with something like “20170101_avg_rad” the first part refers to the date of the image Jan 01, 2017, and the later is an abbreviation of average radiation.

Calculating statistics

Now, we want to calculate the average radiance per province. We can achieve this using a reducer, which we apply over an area. A reducer is an algorithm that is applied over a set of values. In our case, the set of values will be the pixel values within a province. We will use a method called reduceRegions to gather this set of values. According to the documentation, this method requires a collection and a reducer as inputs. We will provide the polygons of the provinces in the Netherlands, referred to as ee_netherlands, as the input collection. As for the reducer, there are various options available. To explore the available options, you can search for reducer using the filter in the documentation.

Since we want to take the average over the collection, we will utilize the ee.Reducer.mean() function. It is important to note that this function itself is not a reducer, but rather it returns a reducer. Therefore, we need to actually call the function as part of the input. Additionally, we provide the optional parameter scale with a value of 463.83 meters.

Question 2: Why do we choose 463.83 meters as scale? Find out using the documentation of the imagery and the documentation of reduceRegions.

Click for answer

In case you had trouble finding the reduceRegions documentation you can find it here. The reducer asks for a scale so that it knows which units to do the analysis in.

province_statistics = viirs_image_clipped.reduceRegions(
    collection = ee_netherlands,
    reducer = ee.Reducer.mean(),
    scale = 463.83
)

External analysis and visualization

All right! We now have a FeatureCollection with the average radiation per province, per month over the given time period. To perform analysis and visualization, our next step is to calculate the largest difference within the time period, classify this difference, and display the results on a map. While this is all possible within Earth Engine, since you are familiar with working with GeoPandas, it makes sense use what we know. Exporting the output to a GeoPandas DataFrame and continuing from there is not a problem at all, so let’s do that.

The steps we will follow are as follows: 1. Export the statistics FeatureCollection to a GeoPandas DataFrame. 2. Calculate the difference in radiation per province. 3. Classify this difference into three classes: small difference, medium difference, and large difference. 4. Convert the resulting DataFrame back into an Earth Engine object. 5. Visualize the results.

Exporting

Geemap provides useful tools for exporting data. The most convenient format for us is a GeoPandas object, but there are many other options available as well. Please refer to the documentation for additional options.

df = geemap.ee_to_geopandas(province_statistics)

It is, obviously, also possible to export other earth engine objects like images or image collections, as is shown in the documentation. At the end of this tutorial we will show how to export the data used in this tutorial to a geotiff file.

Calculating differences

From here calculating the difference is straight forward, since we know (geo)pandas well. We now also have all the functionality that comes with GeoPandas, such as very easily writing the results to a geojson!

df['diff'] = df['20170101_avg_rad'] - df['20221201_avg_rad']
df.to_file('test.geojson', drive='GeoJSON')

Classifying and visualizing trends

Now to see how the radiation behaved over time we have to do a little more work, since the dataframe in this format is not very useful for this. Read and understand the code below.

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import matplotlib

# Clean the dataframe a little
clean_df = df.drop(['GID_0', 'GID_1', 'ISO_1', 'HASC_1', 'NAME_0', 'NL_NAME_1', 'VARNAME_1', 'TYPE_1','CC_1', 'ENGTYPE_1'], axis=1)

# Reformat the dataframe so the rows depict measurements per timestamp per province, instead of an attribute per timestamp
melted_df = pd.melt(clean_df, id_vars=['NAME_1', 'geometry', 'diff'] , var_name='date', value_name='value')

# Convert the time string to a datetime format
melted_df['short_date'] = melted_df['date'].str[:8].str.strip()
melted_df['datetime'] = pd.to_datetime(melted_df['short_date'], format="%Y%m%d")


# Initialize the plot
fig, ax = plt.subplots(figsize=(9, 6))

# Create an empty list to save all lines in for the legend 
lines = []

# For each province, add a line with a different color 
for prov in melted_df['NAME_1'].unique():    
    
    # Create a dataframe for the province of interest
    d = melted_df[melted_df['NAME_1'] == prov]
    
    # Generate a random color
    random_rgb_values = list(np.random.random(3))
    random_color = matplotlib.colors.rgb2hex(random_rgb_values, keep_alpha=False)
    
    # Add a line for each province
    lines += ax.plot( d["datetime"].values, d["value"].values, color= random_color, lw=1, zorder=10, label=prov)
    
    
# Add the legend    
ax.legend(handles = lines)

We can see that there are some differences per province, but that there does not seem to a nationwide trend. We can also see that in the summer there is no data for the Netherlands, or in fact for entire northern europe. This has to do with the nights being so long here in the summer. Per province there might be a larger difference. To find out, we can add a trendline to the graph, but this is out of scope for this tutorial.

Classify differences

For the visualization we now want to categorize the differences in 3 classes, a little difference, medium difference and a large difference. To do this we will use a numpy function called linspace and a pandas function called cut. This results in an array with datatype CategorizalClasses, which earth engine does not understand so we convert it to a string.

# Create 3 classes
breaks = np.linspace(df['diff'].min() - 0.0001, df['diff'].max(), 4)

# Assign values to classes
classes = ['Low', 'Mid', 'High']
df['class'] = pd.cut(df['diff'], bins=breaks, labels=classes, right=True)

# Convert the datatype to strings so earth engine understands
df['class'] = df['class'].astype('str')

Show results on a map

So the next step is to show the differences on an earth engine map. To do this, we have to convert the geopandas dataframe back to a earth engine object. Before we do this we have to tell geopandas what coordinate refrence system the geometries are in. Strangely the coordinate system is not exported from earth engine, but it does need it to be assigned to be imported…

df = df.set_crs(4326)
new_ee_statistics = geemap.geopandas_to_ee(df)

# If you run the entire tutorial in one go, the map will show at the same location as it was first initialized.
# To show it hear, clear the map and create a new one. 
# If the map was not initizalized before, this line will throw an error. 
Map.clear()

Style the map

Now we have the polygons back in the right format with the classes. To visualize we define a palette, which tells earth engine which color we will give to each class. For the legend we can use this dictionary directly. Of course we also want to add the latest night time imagery behind it and we want to add a layer control.

Map = geemap.Map(zoomlevel=8)

palette = {
    'High': 'B81D13', # red
    'Mid': 'EFB700',  # yellow
    'Low': '008450'   # green
}


visualization_params = {
    'min': 0,
    'max': 10,
    'opacity': 0.7,
    'bands': ['20221201_avg_rad']
}

Map.addLayer(ee_object = viirs_image_clipped, 
             vis_params = visualization_params,
             name = 'Night Time')

Map.add_styled_vector(new_ee_statistics, column="class", palette=palette, layer_name="Styled vector")
Map.center_object(new_ee_statistics)

Map.add_legend(legend_title="Legend", legend_dict=palette, position = 'topright')

Map.addLayerControl()

Map

Exporting rasters

To file

As promised earlier in this tutorial we will show how to export raster data to a local file for visualization or analysis outside of GEE. We will use the tools as described in the geemap tutorials. Before we export an image we have to get some data from GEE. We will use the same ata as used in the rest of the tutorial.

# Initial date of interest (inclusive).
from_date = '2017-01-01'
# Final date of interest (exclusive).
to_date = '2023-01-01'

image = viirs_image_collection\
            .filterDate(from_date, to_date)\
            .sort("system:time_start", True)\
            .first() 

# Get geometry data and convert to earth engine object
netherlands = pygadm.get_items(admin='NLD', content_level=1)
netherlands.crs = "EPSG:4326"
ee_netherlands = geemap.geopandas_to_ee(netherlands)

After collecting the data we can use a simple command for downloading the data. In this case the image we are working with consists of two bands. Setting the file_per_band parameter to True will result in a file for each of the bands.

# Export to geotiff
geemap.ee_export_image(image, 
                       filename='clipped_image.tif', 
                       region=ee_netherlands.geometry(), 
                       scale=285,
                       file_per_band=True
                       )

To python

For further analysis within python it might be easier to convert a raster to a numpy.array. This stackexchange question and answer by Justin Braaten shows nicely how this can be done. Notice that for further spatial analysis this might be inconvenient, since the resulting numpy array contains only the pixel values, without any spatial information. When importing to, for example, rasterio we need additional sptial information such as the origin (north west corner) and the pixel size, however. For more information see also this stackexchange thread

Conclusion

That marks the end of todays tutorial. Today you learned about many different things. We started with understanding JavaScript a little better and we can now read the documentation of the Earth Engine API, to apply the functionality in Python. We then used the Python implementation of this API to create maps in Earth Engine and we did some analysis about light radiation in the Netherlands. We exported and imported earth engine objects to other formats we can easier work with, so we can use the tools we know in combination with the power offered by Google Earth Engine.

Arno Timmer

04 October, 2023