Introduction to GEE#

Background#

So far in this course, we have covered the use of Python for handling geodata. Specifically, we have learned how to manage (large) raster and vector data, and we have come into contact with a wide variety of tools and techniques. What all of these had in common was that you were usually provided with data prepared by us. In addition, there was always enough computing power and disk space available to you. However, in your own research or in work outside the university, you will likely want to work with large data but (a) not have the infrastructure to process them, and (b) face restrictions because you access the data as a private person. This is why, over the next few weeks, we want to introduce you to Google Earth Engine (GEE) as a huge data archive and processing engine that you can use for your own work. GEE has been around since ~2012 and received a lot of attention with the publication by Hansen et al. (2013) in Science. Since then, the number of studies using GEE has exploded, and the data catalogue seems to grow day by day.

There are many tutorials and packages that help you carry out some sort of analysis. Many of them use JavaScript, the language GEE was initially designed for, but increasingly you find tutorials and tools that use the Python API. What these tools and tutorials have in common is that they help you run an analysis in GEE and visualize the results. This can include time series analyses, image classifications, or simply the visualization of datasets (of which there are a lot in GEE). They are designed to visualize the results directly in the browser and, as we will learn, compute the results on-the-fly when you zoom in/out or move to a different place on Earth. This is all pretty fantastic, and - as already said - there are a lot of tutorials out there, many of which we will provide links to. However, GEE also has a number of limitations, such as the limited number of implemented algorithms. While this is not too big a problem for “simple” classifications, in our daily research we often want to do more, for example design new methods and tools that use image classifications or regressions as input. In the majority of these cases, GEE does not offer the flexibility to implement and test new analysis methods. Thus, at some point in our analysis we need to export large amounts of pre-processed data from GEE to our own drives and continue working on them there. This is where it becomes tricky, as GEE has mechanisms in its architecture that prevent the export of large amounts of data at once.

This part of the course will therefore not provide yet another tutorial on image classification in GEE. Instead, we want to continue with the general theme of large-scale processing. Specifically, in these sessions we want to teach you how to use GEE as a large pre-processing engine from which you export intermediate results to your own computer. Focusing on this part will, in our eyes, nicely complement existing tutorials, which we will provide links to.

../../_images/gee_overview.png

Fig. 4 Overview of the different components that GEE offers. The figure is derived from the GEE website.#

We will focus on the first two elements (from the left) and use Python as our API. In short, what GEE provides us with is:

  • An extensive archive of data, including many complete satellite image archives (e.g., Landsat, Sentinel-1/2, MODIS, Planet-NICFI mosaics) as well as land-cover products (e.g., Global Forest Watch, ESA WorldCover), both at the global and national (primarily United States) scale.

  • Simple (e.g., linear regression, random forests) & more advanced processing algorithms, particularly in the satellite image domain (e.g., LandTrendr, CCDC)

  • Nearly unlimited computing power

Conversely, what it does not provide:

  • Advanced processing solutions, particularly with respect to remote sensing data processing (e.g., FORCE), and a broader range of classification and regression algorithms (e.g., gradient boosting regression)

  • Full transparency: although GEE provides some detail on how it handles data and applies algorithms, in many instances it remains a “black box”.

  • Nice cartography (although this is getting better)

  • DIY-parallelization (“The price of liberation from these details is that the user is unable to influence them.” (Gorelick, et al. 2017))

  • Easy extrapolation across large geographic spaces and export/download of these processing results

With these elements in mind, what we want to do in this course is:

  • Advertise GEE as a great resource for large-area processing (though not only for that)

  • Show its power, but also its limitations, and develop ways to work around them

  • Motivate you to use the best of both worlds: (1) the data archives and processing power of GEE, and (2) the advanced functionality of local Python processing with gdal and osr

There are many important aspects of GEE that you will probably learn as you keep using it. One fundamental principle, however, is important for our understanding: the distinction between server-side and client-side operations. In general, we work with a client-side library that translates complex geospatial analyses into EE requests:

../../_images/server_client.png

Fig. 5 Schematic representation of the distinction between server-side and client-side operations (Source: https://geohackweek.github.io/GoogleEarthEngine/).#

This is an important aspect, as we always have to think about which operations are done locally on our computer and which operations we want GEE to do. In other words, on the client side (i.e., our computers) we create and manipulate “proxy objects”, which do not contain any data but are just handles for objects on the server side. This is really important to consider, because it means that, compared to the workflows we have used so far, we can hardly create and modify variables step by step.
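To make the idea of a proxy object more tangible, the following pure-Python sketch mimics the behaviour with a toy class. It is only an analogy and not how Earth Engine is implemented: the class `ToyProxy` and its methods `add` and `multiply` are hypothetical names chosen for this illustration, and only the method name `getInfo()` is borrowed from the real API. The point is that each operation returns a new "recipe", and nothing is computed until the result is explicitly requested.

```python
class ToyProxy:
    """Hypothetical stand-in for a server-side object: stores a recipe, not data."""

    def __init__(self, description, compute):
        self.description = description   # human-readable recipe
        self._compute = compute          # deferred computation (the "server side")

    def add(self, value):
        # Returns a NEW proxy; nothing is computed yet.
        return ToyProxy(f"({self.description} + {value})",
                        lambda: self._compute() + value)

    def multiply(self, value):
        # Again only extends the recipe.
        return ToyProxy(f"({self.description} * {value})",
                        lambda: self._compute() * value)

    def getInfo(self):
        # Only this call triggers the computation and returns a plain Python value.
        return self._compute()


number = ToyProxy("1", lambda: 1)
result = number.add(2).multiply(10)   # still only a recipe

recipe = result.description
value = result.getInfo()
print(recipe)   # the chain of operations recorded so far
print(value)    # the actual number, available only after getInfo()
```

In this toy version the "server" is just a chain of Python functions, but the workflow is the same as with real EE objects: we chain operations on handles and request the result once, at the end.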

Client library#

The client library lets you run processing tasks on the Google servers. The library can be accessed via a JavaScript or a Python API (Application Programming Interface). To better understand the API, check out the documentation. Below is a list of some examples.

Client Libraries:

  • ee.Algorithms

  • ee.Array

  • ee.Blob

  • ee.Classifier

  • ee.Image

  • ee.ImageCollection

  • ee.Reducer

Some libraries such as ee.Algorithms and ee.Reducer contain functions that are applied to other EE objects. They work like a collection of tools. You can access them as follows:

ee.Algorithms.HillShadow(image, azimuth, zenith, neighborhoodSize, hysteresis)

Other libraries correspond to EE data types that have their own functions (methods), e.g., ee.Dictionary, ee.Image.

Data types#

There are many good GEE tutorials online, including this one. We do not want to re-invent those tutorials and possibly create a terrible new version of them. Instead, we want to focus on doing these types of analyses and extracting the data to our own drive - both for individual locations and across large geographic extents. Here, we briefly introduce the basic geographic data types that we will work with in this course. For a more complete overview of Earth Engine’s data types see here.

The most fundamental geographic data types we will be working with are Images and Features.

  1. Images: An ee.Image consists of bands and a dictionary of properties. Many images together form an ImageCollection. An ImageCollection could be, for example, all Landsat-9 imagery within a region of interest.

  2. Features: An ee.Feature can generally be considered a vector element (i.e., a point, line, or polygon). A feature consists of a Geometry and a dictionary of properties. Many features together form a FeatureCollection. An example of a FeatureCollection could be a collection of points.

In addition, there are a number of other important data structures in GEE, for example (including the syntax to create them; np refers to NumPy, imported via import numpy as np):

  1. Dictionaries: ee.Dictionary({'e': np.e, 'pi': np.pi})

  2. Lists: ee.List([1, 2, 3, 4, 5])

  3. Dates: ee.Date('2022-12-06') or ee.Date.fromYMD(2017, 1, 13)

  4. Numbers: ee.Number(1) or ee.Number(np.pi)

  5. Strings: ee.String("Today is Nikolaus")

Interactive and batch environment#

GEE data types are server-side objects that are created and processed entirely on GEE’s cloud infrastructure. This means that your Jupyter server does not know anything about the content of ee-objects in your script unless you explicitly request information about them. At the same time, you must explicitly request the information in order for GEE to compute it. Earth Engine provides two different processing environments for you to request information: 1) an interactive environment optimized for speedy and small requests, and 2) a batch environment optimized for high-latency parallel processing of large amounts of data.

Interactive environment

  • print() returns a representation of a server-side object to the client.

  • getInfo() fetches the actual data or metadata of an Earth Engine object from the servers.

  • ee.data.computePixels() extracts and processes image data for a specific region of interest and returns them as a NumPy array or in an image file format

  • Map display

Batch environment

  • ee.batch.Export in the Python API (e.g., ee.batch.Export.image.toDrive())

The batch environment lets you export large datasets to your Earth Engine assets, Google Drive, or Google Cloud Storage. To access server-side objects on-the-fly, i.e., inside your Python code, you must use the interactive environment. The function print() is useful for quick checks, debugging, and understanding the structure of Earth Engine objects. It does not necessarily trigger a full server-side computation, so it will not catch all potential errors. The function getInfo() is essential when you need to work with the actual data of an object on the client, for example in local calculations or visualizations. Unlike print(), getInfo() blocks the rest of your code until the information is fetched, so it should be used with caution. It is therefore mostly used to fetch small results of calculations or properties of objects, such as image bands, projections, and other metadata. As you can imagine, if your request is large (e.g., because you have a large study area and want to compute the number of Landsat images inside it), the time needed to compute the response can be long, so use such requests sparingly.
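The blocking behaviour of getInfo() can be made visible with a small timing wrapper. The sketch below is an illustration under assumptions: `timed_get_info` and `FakeServerObject` are hypothetical names, and the fake object only imitates a slow server response so that the example runs without an Earth Engine account. The wrapper itself relies on nothing but the presence of a `getInfo()` method, so it should also work with real EE objects in an initialized session.

```python
import time


def timed_get_info(ee_object):
    """Call getInfo() on an object and report how long the client was blocked."""
    start = time.perf_counter()
    info = ee_object.getInfo()          # blocks until the result has arrived
    elapsed = time.perf_counter() - start
    return info, elapsed


class FakeServerObject:
    """Hypothetical stand-in so the sketch runs without an Earth Engine account."""

    def __init__(self, value, delay_seconds):
        self._value = value
        self._delay = delay_seconds

    def getInfo(self):
        time.sleep(self._delay)         # pretend the server needs time to compute
        return self._value


info, elapsed = timed_get_info(FakeServerObject(["B2", "B3", "B4"], delay_seconds=0.2))
print(info, f"(client blocked for {elapsed:.2f} s)")
```

Timing a few typical requests this way gives a feeling for which ones are cheap enough for the interactive environment and which ones are better handed to the batch environment.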

Data converters#

Data converters are a relatively recent development for getting data into your Python session. They convert Earth Engine objects into Python objects (e.g., pandas DataFrames and xarray datasets) for interactive exploration and visualization. The function ee.data.computePixels() is one such example. The amount of data that can be piped through data converters is limited. To download larger amounts of data, the export functions of the batch environment may be more appropriate.
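One common way to stay below such size limits is to split a large region into smaller tiles and request each tile separately. The following is a minimal pure-Python sketch of that idea, not a recommended or official workflow: the function name `split_bbox` and the 2 x 2 grid are assumptions made for illustration. Each tile comes back as [xMin, yMin, xMax, yMax], the four-number order that ee.Geometry.Rectangle accepts.

```python
def split_bbox(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into nx * ny tiles of equal size.

    Each tile is a list [xMin, yMin, xMax, yMax] in the same units as the input.
    """
    if nx < 1 or ny < 1:
        raise ValueError("nx and ny must be at least 1")
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("expected xmin < xmax and ymin < ymax")

    dx = (xmax - xmin) / nx   # tile width
    dy = (ymax - ymin) / ny   # tile height

    tiles = []
    for i in range(nx):           # columns, west to east
        for j in range(ny):       # rows, south to north
            tiles.append([xmin + i * dx, ymin + j * dy,
                          xmin + (i + 1) * dx, ymin + (j + 1) * dy])
    return tiles


# Example: a one-degree box split into four half-degree tiles
tiles = split_bbox(-61, -23, -60, -22, nx=2, ny=2)
for tile in tiles:
    print(tile)
```

How many tiles are needed depends on the resolution and the number of bands requested, so the grid size is something to tune per dataset.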

Visualization with geemap#

The package we will use for visualizations in this course is geemap. This package has become quite popular in recent years. The author/developer is Qiusheng Wu, a professor at the University of Tennessee. Originally a thematic expert in wetland mapping, he has become an important developer for GEE using Python. The strength of his tool is that it brings QGIS-like functionality into a Jupyter notebook (ipynb). Many tools and tricks for visualization are available through his package, and we really recommend checking it out. For example, the package allows you to download timelapse videos (see some examples of what a timelapse is here) as well as some image composites and image classifications. The package also builds on different visualization backends, each of which offers different functionality. You can find out more about this here. As you may guess, the downside is that it is not ideal if we want to extract large amounts of data from GEE, as the package also has to deal with the memory and computation restrictions that GEE imposes. So it is not perfect for all applications - but for visualizing and creating some easy applications it is great, and I use it a lot (e.g., for training data collection based on image interpretation).

We start by importing the package and by authenticating our computer with Google. This basically creates a connection between your Google account and your computer and stores the connection details in something similar to a cookie. This is done by the code line ee.Authenticate() and only has to be executed once. What you need to do every time you start working, however, is to instantiate a GEE session via ee.Initialize(). Below is a code chunk that you can use as a standard chunk inside your scripts.

import ee
import geemap.foliumap as geemap
# Try to start a session with stored credentials; authenticate only if that fails
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize()

The last thing we do in this session is to show how to create a map object and how to load some simple images, in this particular example the Global Forest Watch (GFW) data. We can subdivide this into several steps:

Initialization of a Map object#

Map = geemap.Map(zoom=10) # instantiates a map object with an initial zoom level

Load the GFW data#

The GFW data are stored in GEE as an image with multiple bands. Here, we want to load both the % tree cover and the annual loss layer. We clip the images to our geometry. This is not strictly necessary and is done mainly for visualization purposes. After defining the images, we add them to the map as layers. Have a look at how we define the visualization parameters. In addition, we load the Google Maps satellite base layer so that we can see the GFW data on top of a satellite image.

# Region of interest as [xMin, yMin, xMax, yMax] in degrees (lon/lat)
geom = ee.Geometry.Rectangle([-61, -23, -60, -22])
# Load the GFW image once and select the two bands of interest
gfw = ee.Image("UMD/hansen/global_forest_change_2022_v1_10")
tc = gfw.select(['treecover2000']).clip(geom)
lossYR = gfw.select(['lossyear']).clip(geom)
Map.add_basemap('SATELLITE')
# Visualization parameters: value range and color palette per layer
Map.addLayer(tc, {'min': 0, 'max': 100, 'palette': ['black', 'green']}, 'Tree Cover')
Map.addLayer(lossYR, {'min': 1, 'max': 22, 'palette': ['yellow', 'red']}, 'Loss Year')
Map.addLayer(geom, {'color': '000000', 'width': 1, 'lineType': 'solid', 'fillColor': '00000000'}, 'Geometry')
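The range 1 to 22 in the visualization parameters of the loss layer reflects how the lossyear band is encoded in this dataset version: a value of 0 marks pixels without detected loss, and the values 1 to 22 stand for the years 2001 to 2022. A small helper for translating band values into calendar years could look as follows; the function name `loss_value_to_year` is hypothetical, and the upper bound is specific to the 2022 release used above.

```python
def loss_value_to_year(value, last_code=22):
    """Translate a lossyear band value into a calendar year.

    Returns None for 0 (no loss detected). The default upper bound of 22
    matches the 2022 release (v1_10) of the dataset.
    """
    if value == 0:
        return None
    if not 1 <= value <= last_code:
        raise ValueError(f"lossyear value {value} outside 0..{last_code}")
    return 2000 + value


for code in (0, 1, 22):
    print(code, "->", loss_value_to_year(code))
```

Such a translation is mostly useful on the client side, for example when labelling a legend or summarising exported pixel values per year.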

Run the map#

Now we visualize the map. It is important to know that the request is only sent to GEE for processing when we call the Map object. Until then, all the variables we defined are only containers holding the rules. Before we call the Map object, we center the view on our geometry:

Map.centerObject(geom)
Map

And that is it. Pretty cool! Although we haven’t done any analysis, I hope this already shows you how powerful GEE can be simply for visualization. What years ago meant (a) downloading a bunch of files, (b) loading them into QGIS, and (c) finding good visualizations can now be done in just a few seconds. In addition, using the tools in the map window, we can also do some simple digitizing, etc. What you should do now is load a few more datasets and get used to the environment. Below are a number of sources that will show you the richness of the available data. Try to build yourself a quick script that allows you to examine any region of the world in just a few seconds - a script that you can call whenever you want.
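As a possible starting point for such a script, the sketch below wraps the steps from this section into a function. It is an assumption-laden outline rather than a finished tool: `bbox_around` and `quick_look` are hypothetical names, the default dataset and visualization parameters are simply the ones used above, and `quick_look` assumes that ee.Initialize() has already been called. The Earth Engine imports sit inside `quick_look`, so the pure-Python helper can be tried without an account.

```python
def bbox_around(lon, lat, half_size_deg=0.5):
    """Return [xMin, yMin, xMax, yMax] for a square window around a point."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("lon/lat outside the valid geographic range")
    return [lon - half_size_deg, lat - half_size_deg,
            lon + half_size_deg, lat + half_size_deg]


def quick_look(lon, lat, half_size_deg=0.5,
               asset_id="UMD/hansen/global_forest_change_2022_v1_10",
               band="treecover2000", vis=None):
    """Build a map of one band around a point. Assumes ee.Initialize() was called."""
    # Imported here so that bbox_around() can be used without Earth Engine installed
    import ee
    import geemap.foliumap as geemap

    vis = vis or {'min': 0, 'max': 100, 'palette': ['black', 'green']}
    geom = ee.Geometry.Rectangle(bbox_around(lon, lat, half_size_deg))
    image = ee.Image(asset_id).select([band]).clip(geom)

    m = geemap.Map()
    m.add_basemap('SATELLITE')
    m.addLayer(image, vis, band)
    m.centerObject(geom)
    return m


# The window around the centre of the example rectangle used in this section
window = bbox_around(-60.5, -22.5)
print(window)
```

In a notebook with an active session, calling quick_look(-60.5, -22.5) as the last statement of a cell should display the map; swapping asset_id, band, and vis lets you explore other datasets.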

Data sources#

Further readings#