Introduction to GEE#

Background#

So far in this course, we have covered the use of Python for handling geodata. Specifically, we have learned how to manage (large) raster and vector data, and we have come into contact with a wide variety of tools and techniques. What all of these had in common was that you were usually provided with some kind of data, prepared by us. In addition, there was always enough computing power and disk space available to you. However, in your own research or in work you do outside of the university, you will likely face the issue that you want to work with large data but (a) won’t have the infrastructure to process them, and (b) may face restrictions because you want to access data as a private person. This is why, over the next few weeks, we want to introduce you to Google Earth Engine (GEE) as a huge data and processing engine that you can use for your own work.

There are a lot of different tutorials and packages that help you do some sort of analysis. Many of them use JavaScript, the language GEE was initially built for, but increasingly you find tutorials and tools that use the Python API. What these tools and tutorials have in common is that they help you run an analysis in GEE and visualize the results. This can include time series analyses, image classifications, or simply the visualization of datasets (of which there are a lot in GEE). They are designed to visualize the results directly in the browser and, as we will learn, compute the results on the fly when zooming in/out or moving to a different place on the Earth. This is all pretty fantastic, and - as already said - there are a lot of tutorials out there, many of which we will provide links to. However, GEE also has a number of limitations, such as the limited number of implemented algorithms that can be applied. While this is not too big a problem for “simple” classifications, in our daily research we often want to do more, for example design new methods and tools that use image classifications or regressions as input. In the majority of these cases, GEE does not have the capabilities to flexibly implement and test new analysis methods. Thus, at some point in our analysis we need to export large amounts of pre-processed data from GEE to our own drives and work on them further. This is where it becomes tricky, as GEE has mechanisms in its architecture that prevent the export of large data at once.

This part of the course will therefore not provide yet another tutorial on image classification in GEE; instead, we want to continue with the general theme of large-scale processing. Specifically, in these sessions we want to teach you how to use GEE as a large pre-processing engine from which you export intermediate results to your own computer. Focusing on this part will, in our eyes, nicely complement existing tutorials, to which we will provide extensive links.

Introduction#

GEE has been around since ~2012 and received a lot of attention with the publication by Hansen et al. (2013) in Science. Since then, the number of studies using GEE has exploded, and the data catalogue seems to grow day by day.

../../_images/gee_overview.png

Fig. 5 Overview of the different components that GEE offers (figure derived from the GEE website).#

We will focus on the first two elements (from the left), and we will use Python as our API. In short, what GEE provides to us is:

  • An extensive archive of data, including many complete satellite image archives (e.g., Landsat, Sentinel-1/2, MODIS, Planet-NICFI mosaics) as well as land-cover products (e.g., Global Forest Watch, ESA WorldCover), both at the global and the national (primarily United States) scale.

  • Simple (e.g., linear regression, random forests) & more advanced processing algorithms, particularly in the satellite image domain (e.g., LandTrendr, CCDC)

  • Nearly unlimited computing power

In contrast, what it does not provide to us:

  • Advanced processing solutions, particularly with respect to remote sensing data processing (e.g., FORCE), as well as a greater diversity of classification and regression algorithms (e.g., gradient boosting regression)

  • Full transparency: although GEE provides some detail on how it handles data and applies algorithms, in many instances it remains a “black box”.

  • Nice cartography (although this is getting better)

  • DIY-parallelization (“The price of liberation from these details is that the user is unable to influence them.” (Gorelick et al. 2017))

  • Easy extrapolation across large geographic spaces and export/download of these processing results

With these elements in mind, what we want to do in this course is:

  • Advertise GEE as a great resource for large-area processing (but not only for that)

  • Show its power, but also its limitations, and develop ways to work around them

  • Motivate you to use the best of both worlds: (1) data archives and processing power in GEE, (2) advanced functionality of local Python processing and the use of gdal and osr
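One possible way to work around export limits is to split a large extent into smaller tiles and process them one by one. The sketch below covers only the tiling arithmetic in plain Python. The function name split_extent and the tile size are illustrative assumptions, not part of GEE or of the course material.

```python
import math


def split_extent(x_min, y_min, x_max, y_max, tile_size):
    """Split a bounding box into tiles of at most tile_size units per side.

    Returns a list of [xMin, yMin, xMax, yMax] lists, row by row starting at
    the lower-left corner. Edge tiles are cropped to the original extent.
    """
    n_cols = math.ceil((x_max - x_min) / tile_size)
    n_rows = math.ceil((y_max - y_min) / tile_size)
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            tx = x_min + col * tile_size
            ty = y_min + row * tile_size
            tiles.append([tx, ty,
                          min(tx + tile_size, x_max),
                          min(ty + tile_size, y_max)])
    return tiles


# A 2 x 2 degree extent split into 1-degree tiles
tiles = split_extent(-62, -24, -60, -22, tile_size=1)
print(len(tiles))  # 4
print(tiles[0])    # [-62, -24, -61, -23]
```

Each returned list has the order xMin, yMin, xMax, yMax, so it could be passed to a rectangle geometry and processed per tile. With non-integer tile sizes, floating-point rounding can affect the tile count, so treat this as a starting point only.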

There are many important aspects of GEE that you will probably learn if you keep using it. One fundamental principle, however, is important for our understanding: the distinction between server-side and client-side operations. In general, we work with a client-side library that translates complex geospatial analyses into EE requests:

../../_images/server_client.png

Fig. 6 Schematic representation of the distinction between server-side and client-side operations (Source: https://geohackweek.github.io/GoogleEarthEngine/).#

This is an important aspect, as we always have to think about which operations are done locally on our computer and which operations we want GEE to do. In other words, on the client side (i.e., our computers) we create and manipulate “proxy objects”, which do not contain any data but are just handles for objects on the server side. This is really important to consider, because it means that, compared to the workflows we have been used to so far, we can hardly create and modify variables step by step.
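To make the idea of a proxy object more tangible, here is a toy model in plain Python. It is not the Earth Engine library. The class Proxy and its method get_info are hypothetical names that only mimic the behaviour described above: chaining operations builds a recipe, and nothing is evaluated until the result is explicitly requested.

```python
class Proxy:
    """Toy stand-in for a server-side handle: it stores a recipe, not a value."""

    def __init__(self, description, compute):
        self.description = description  # text form of what would be requested
        self._compute = compute         # deferred computation

    def add(self, other):
        # Chaining only builds a larger recipe; nothing is computed here.
        return Proxy(f"add({self.description}, {other})",
                     lambda: self._compute() + other)

    def __repr__(self):
        return f"Proxy({self.description})"

    def get_info(self):
        # Only this call triggers the evaluation (locally, in this toy model).
        return self._compute()


number = Proxy("1", lambda: 1).add(2)
print(number)             # Proxy(add(1, 2))
print(number.get_info())  # 3
```

Printing the object shows the recipe, while get_info() returns the value. In this toy model the evaluation happens locally; in GEE it would happen on the server.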

Data types and objects we use in our course#

Using the sources above, you will get extensive background knowledge on GEE. You will see that many of the sources end with, for example, an image classification and a subsequent visualization. Again: what we want to do in this course (with very limited time) is not to reinvent those tutorials and possibly create a terrible new version of them. Instead, we want to focus on doing these types of analyses and extracting the data to our own drive, both for individual locations and across large geographic extents. Still, we briefly introduce the basic elements of geographic features that we will work with in the course. We caution that this is by no means complete, and we refer you to the sources provided above. The most fundamental geographic data types we will be working with are Images and Features.

  1. Images: Images in GEE consist of bands and a dictionary of properties. Many images together form an ImageCollection. An ImageCollection could be, for example, all Landsat 9 imagery within a region of interest.

  2. Features: can generally be considered vector elements (i.e., points, lines and polygons). A feature consists of a Geometry and a dictionary of properties. Many features together form a FeatureCollection. An example of a FeatureCollection could be a collection of points.

In addition, there are a number of other important data structures in GEE, for example (including the syntax to create them; np refers to NumPy, imported via import numpy as np):

  1. Dictionaries: ee.Dictionary({'e': np.e, 'pi': np.pi})

  2. Lists: ee.List([1, 2, 3, 4, 5])

  3. Dates: ee.Date('2022-12-06') or ee.Date.fromYMD(2017, 1, 13)

  4. Numbers: ee.Number(1) or ee.Number(np.pi)

  5. Strings: ee.String("Today is Nikolaus")

What is important - we already briefly mentioned it above - is that when you instantiate these in your notebook (ipynb), they are only containers for server-side objects. This means that your Jupyter server does not know anything about the contents of the objects in your script unless you explicitly request information about them. At the same time, GEE won’t compute them unless you specifically request the information. You can imagine that if your request is large (e.g., because you have a large study area and you want to compute something such as the number of Landsat images inside it), the time needed to compute the response can be long. As such, you will need to be careful with your requests. Generally, we will use two ways of requesting information:

  1. print() shows the request (as JSON) that describes the object, i.e., the instructions for the server rather than the computed result

  2. *.getInfo() sends the request to the server and returns the contents of the container
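Because a getInfo() call can take a while, it can help to measure how long a request takes before scaling it up. The helper below is a minimal sketch: timed_get_info is a hypothetical name, and it only assumes that the object passed in offers a getInfo() method, as the EE objects listed above do.

```python
import time


def timed_get_info(server_object):
    """Call getInfo() on a server-side object and measure the round trip.

    Works with any object that offers a getInfo() method, for example an
    ee.Number or ee.List once a session has been initialized.
    """
    start = time.perf_counter()
    value = server_object.getInfo()  # the request is sent and evaluated here
    elapsed = time.perf_counter() - start
    return value, elapsed


# Usage inside an initialized session (not executed here):
# value, seconds = timed_get_info(ee.Number(1).add(2))
# print(value, f"{seconds:.2f} s")
```

Trying a request on a small area first and watching the elapsed time gives a rough feeling for whether the full request is reasonable.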

With this information we are all set, and we can have a look at using GEE for some simple visualizations in Python. We will use the package geemap to do so. Until then, we recommend reading a bit further on data types and methods to become more familiar with the structure of GEE.

Starting with the package geemap#

The package we will use for visualizations in this course is geemap, which has become quite popular in recent years. Its author and developer is Qiusheng Wu, a professor at the University of Tennessee. Originally a thematic expert in wetland mapping, he has become an important developer for GEE with Python. The strength of his tool is that it brings functionality known from QGIS into a Jupyter notebook. Many tools and tricks for visualization are available through his package, and we really recommend checking it out. For example, the package allows you to download timelapse videos (see some examples of what a timelapse is here) as well as image composites and image classifications. The package also builds on different visualization backends, each with different functionality. You can find out more about this here. As you may guess, the downside is that it is not ideal if we want to extract large amounts of data from GEE, as the package also has to deal with the memory and computation restrictions that GEE imposes. So it is not perfect for all applications, but for visualizing and creating some easy applications it is great, and I am using it a lot (e.g., for training data collection based on image interpretation).

We start by importing the package and by authenticating our computer with Google. This basically creates a connection between your computer and your Google account and stores the connection details in something similar to a cookie. This is done by the code line ee.Authenticate() and has to be executed only once. What you need to do every time you start working, however, is to instantiate a GEE session via ee.Initialize(). Below is a code chunk that you can use as a standard chunk in your scripts.

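import ee
import geemap.foliumap as geemap

# Try to start a session with stored credentials; authenticate only if that fails.
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()  # one-time authentication flow
    ee.Initialize()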

The last thing we do in this session is to show how to create a map object and how to load some simple images, in this particular example the Global Forest Watch data. We can subdivide this into several steps:

Initialization of a Map object#

Map = geemap.Map(zoom=10)  # instantiates a map object with a zoom level

Load the GFW data#

The GFW data are stored in GEE as an image with multiple bands. Here we want to load both the % tree cover and the annual loss layer. We clip the images to our geometry. This is not strictly necessary and is done mainly for visualization purposes. After defining the images, we add them to the map as layers. Have a look at how we define the visualization parameters. In addition, we load the Google Maps satellite base layer so that we can see the GFW data on top of a satellite image.

# Bounding box given as [xMin, yMin, xMax, yMax] in degrees
geom = ee.Geometry.Rectangle([-61, -23, -60, -22])
# Load the GFW image once, then select the two bands of interest
gfc = ee.Image("UMD/hansen/global_forest_change_2022_v1_10")
tc = gfc.select(['treecover2000']).clip(geom)
lossYR = gfc.select(['lossyear']).clip(geom)
Map.add_basemap('SATELLITE')
Map.addLayer(tc, {'min': 0, 'max': 100, 'palette': ['black', 'green']}, 'Tree Cover')
Map.addLayer(lossYR, {'min': 1, 'max': 22, 'palette': ['yellow', 'red']}, 'Loss Year')
Map.addLayer(geom, {'color': '000000', 'width': 1, 'lineType': 'solid', 'fillColor': '00000000'}, 'Geometry')

Run the map#

Now, we visualize the map. It is important to know that only when calling the Map object is the request actually sent to GEE to process the data. Until then, all the variables we defined are only containers with the rules. Before we call the Map object, we center the view on our geometry:

Map.centerObject(geom)
Map

And that is it. Pretty cool! Although we haven’t done any analysis, I hope this already shows you how powerful GEE can be simply for visualization. What years ago meant (a) downloading a bunch of files, (b) loading them into QGIS and (c) finding good visualizations can now be done in just a few seconds. In addition, using the tools in the map window, we can also do some simple digitizing, etc. What you should do now is load a few more datasets and get used to the environment. Below are a number of sources that will show you the richness of the available data. Try to build yourself a quick script that lets you examine any region of the world in just a few seconds - a script that you can call whenever you want.
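As a possible starting point for such a script, the sketch below wraps the steps from this session into two functions. The names bbox_around and quick_look and the default window size of half a degree are illustrative assumptions. The GEE and geemap calls are the same ones used above and require an initialized session, which is why they are imported inside the function.

```python
def bbox_around(lon, lat, half_width_deg=0.5):
    """Return [xMin, yMin, xMax, yMax] centred on (lon, lat).

    The box is clamped to valid longitude and latitude ranges.
    """
    x_min = max(lon - half_width_deg, -180.0)
    x_max = min(lon + half_width_deg, 180.0)
    y_min = max(lat - half_width_deg, -90.0)
    y_max = min(lat + half_width_deg, 90.0)
    return [x_min, y_min, x_max, y_max]


def quick_look(lon, lat, half_width_deg=0.5):
    """Build a tree-cover map around a point (needs an initialized EE session)."""
    import ee
    import geemap.foliumap as geemap

    geom = ee.Geometry.Rectangle(bbox_around(lon, lat, half_width_deg))
    gfc = ee.Image("UMD/hansen/global_forest_change_2022_v1_10")
    tc = gfc.select(["treecover2000"]).clip(geom)

    view = geemap.Map()
    view.add_basemap("SATELLITE")
    view.addLayer(tc, {"min": 0, "max": 100, "palette": ["black", "green"]}, "Tree Cover")
    view.centerObject(geom)
    return view


# The bounding box helper runs without a GEE session:
print(bbox_around(-60.5, -22.5))  # [-61.0, -23.0, -60.0, -22.0]
```

In a notebook, calling quick_look(-60.5, -22.5) in the last line of a cell should display the map for the region used in this session; other datasets can be swapped in by changing the image ID and the visualization parameters.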

Sources for datasets in GEE#

Below are the, in my view, best overviews of the datasets you can access in GEE. Explore them - many of them also have some code snippets that give you hints on how to properly visualize the data.

  1. Google Earth Engine Data Catalog

  2. Awesome GEE datasets

Further readings#