Getting data from Google Earth Engine#
In the previous section we looked at the different data types and how to create, filter, and visualize them. This is a good start, but eventually we want to either retrieve these data for further local analyses, or actually apply operations to the data in GEE. The latter we will cover in a different chapter. Here, we will show some basic ways to quickly extract data from GEE based on information we have stored locally.
To better understand how we can extract information, we need to reiterate the following: all GEE data types are server-side objects that are created and processed entirely on GEE’s cloud infrastructure. This means that your Jupyter server does not know anything about the content of ee objects (e.g., ee.Image(), ee.Feature()) in your script unless you explicitly request information about them, and GEE only computes that information once you request it. Earth Engine provides two different processing environments for such requests: (1) an interactive environment optimized for small, speedy requests, and (2) a batch environment optimized for high-latency parallel processing of large amounts of data. In this section, we introduce the interactive environment; later in the course we will exemplify the use of batch processing. First, we initialize our GEE session:
import ee

# Initialize the GEE session; if no valid credentials are found,
# authenticate first and then initialize
try:
    ee.Initialize()
except Exception as e:
    ee.Authenticate()
    ee.Initialize()
Interactive environment#
Vector data#
The interactive environment is ideal for working in .ipynb notebooks, as we do in this course: the structure of individual cells (instead of one continuous script) lets us look at intermediate results after each operation. Let’s assume we have the ee.FeatureCollection from the previous chapter that contains two points, stored in fc:
fc
<ee.featurecollection.FeatureCollection at 0x285a6442fe0>
The output indicates that this is a server-side object. If we had not created the feature collection ourselves, we would not know what the variable fc contains. To access the information of the object, we have two options. (1) Using print() we get a string representation of the server-side object on the client. This is useful, e.g., when we only want to understand what a variable contains but do not need to extract information from it and store it in a variable:
print(fc)
ee.FeatureCollection({
"functionInvocationValue": {
"functionName": "Collection",
"arguments": {
"features": {
"arrayValue": {
"values": [
{
"functionInvocationValue": {
"functionName": "Feature",
"arguments": {
"geometry": {
"functionInvocationValue": {
"functionName": "GeometryConstructors.Point",
"arguments": {
"coordinates": {
"constantValue": [
-60,
-20
]
}
}
}
},
"metadata": {
"constantValue": {
"ID": 1
}
}
}
}
},
{
"functionInvocationValue": {
"functionName": "Feature",
"arguments": {
"geometry": {
"functionInvocationValue": {
"functionName": "GeometryConstructors.Point",
"arguments": {
"coordinates": {
"constantValue": [
-61,
-21
]
}
}
}
},
"metadata": {
"constantValue": {
"ID": 2
}
}
}
}
}
]
}
}
}
}
})
(2) If we want to actually create a client-side copy of the server-side object, we call getInfo(). This command fetches the actual data or metadata of an Earth Engine object from the server to our client:
fc_client = fc.getInfo()
fc_client
{'type': 'FeatureCollection',
'columns': {'ID': 'Integer', 'system:index': 'String'},
'features': [{'type': 'Feature',
'geometry': {'type': 'Point', 'coordinates': [-60, -20]},
'id': '0',
'properties': {'ID': 1}},
{'type': 'Feature',
'geometry': {'type': 'Point', 'coordinates': [-61, -21]},
'id': '1',
'properties': {'ID': 2}}]}
You probably recognize the structure of the object: it is a dictionary. This is one important reason why we have been reiterating the importance of dictionaries and their suitability. In GEE, all information we acquire through getInfo() comes in the form of dictionaries (more precisely: GeoJSON objects). All that would be left now is to convert the dictionary into, e.g., a .gpkg file to store the data locally. Since we started creating this online book, GEE has become much more convenient for extracting information. The most recent development are so-called data converters. They address the increasing popularity of Python and make it easy to get data into your Python session: data converters convert Earth Engine objects into Python objects (e.g., a pandas DataFrame or an xarray dataset) for interactive exploration and visualization. In the case of our fc, we may for example want to convert it into a data frame that we can use in pandas:
df = ee.data.computeFeatures({
'expression': fc,
'fileFormat': 'PANDAS_DATAFRAME'})
df
|   | geo | ID |
|---|---|---|
| 0 | {'type': 'Point', 'coordinates': [-60, -20]} | 1 |
| 1 | {'type': 'Point', 'coordinates': [-61, -21]} | 2 |
Pretty cool! With the tools you learned in the session on pandas, you could now work with the data and prepare some visualizations. Admittedly, this table does not contain much meaningful information yet, but we will learn more tools through which we can extract actual information.
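For comparison, the getInfo() route shown earlier also lends itself to building a table: since the client-side copy is plain GeoJSON, we can flatten it ourselves. The snippet below is a minimal sketch that uses a hard-coded copy of fc_client, so it runs without an active GEE session:

```python
import pandas as pd

# Hard-coded copy of the GeoJSON dictionary returned by fc.getInfo() above,
# so this sketch runs without an active GEE session
fc_client = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [-60, -20]},
         'properties': {'ID': 1}},
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [-61, -21]},
         'properties': {'ID': 2}},
    ],
}

# Flatten each GeoJSON feature into one row: properties plus coordinates
rows = [
    {**feat['properties'],
     'lon': feat['geometry']['coordinates'][0],
     'lat': feat['geometry']['coordinates'][1]}
    for feat in fc_client['features']
]
df = pd.DataFrame(rows)
print(df)
```

From such a data frame it is only a small step to a geopandas GeoDataFrame, which could then be written to a .gpkg file as mentioned above.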
Raster data#
Now, let’s move on to raster data. We use the Global Forest Watch data as an example to extract information. I pre-select the two bands we used in the previous section: treecover2000 and lossyear:
gfw = ee.Image("UMD/hansen/global_forest_change_2023_v1_11").select(['treecover2000', 'lossyear'])
gfw.getInfo()
{'type': 'Image',
'bands': [{'id': 'treecover2000',
'data_type': {'type': 'PixelType',
'precision': 'int',
'min': 0,
'max': 255},
'dimensions': [1440000, 560000],
'crs': 'EPSG:4326',
'crs_transform': [0.00025, 0, -180, 0, -0.00025, 80]},
{'id': 'lossyear',
'data_type': {'type': 'PixelType',
'precision': 'int',
'min': 0,
'max': 255},
'dimensions': [1440000, 560000],
'crs': 'EPSG:4326',
'crs_transform': [0.00025, 0, -180, 0, -0.00025, 80]}],
'version': 1711144720400111,
'id': 'UMD/hansen/global_forest_change_2023_v1_11',
'properties': {'system:time_start': 946684800000,
'system:footprint': {'type': 'LinearRing',
'coordinates': [[-180, -90],
[180, -90],
[180, 90],
[-180, 90],
[-180, -90]]},
'system:time_end': 1672444800000,
'system:asset_size': 1358313037330}}
We can see that only the metadata have been converted into a client-side object, whereas the actual image data were omitted. This is for good reason: as you can imagine, transferring the pixels would quickly result in a massive amount of data. In this example - without any geographic focus - we would have requested global coverage of a raster dataset at 30 m spatial resolution for two bands! Instead, what we get is some basic information, such as (1) the extent of the dataset, (2) the data format, and (3) the coordinate system. Already useful - take a minute and check out the information!
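To make these metadata concrete, we can do a little arithmetic on the crs_transform and dimensions entries from the dictionary above (values hard-coded here so the sketch runs offline):

```python
# crs_transform follows the pattern [x_scale, 0, x_origin, 0, y_scale, y_origin],
# values copied from the getInfo() output above
crs_transform = [0.00025, 0, -180, 0, -0.00025, 80]
dimensions = [1440000, 560000]  # pixels in x and y

pixel_size_deg = crs_transform[0]
west, north = crs_transform[2], crs_transform[5]
east = west + dimensions[0] * pixel_size_deg
south = north + dimensions[1] * crs_transform[4]  # y_scale is negative
print(f'Extent: lon {west}..{east}, lat {south}..{north}')

# One degree of latitude is ~111.32 km, so 0.00025 degrees is roughly:
approx_res_m = pixel_size_deg * 111_320
print(f'~{approx_res_m:.0f} m pixels')
```

This confirms that the dataset spans longitudes -180 to 180 and latitudes -60 to 80 at a pixel size of roughly 28 m, i.e., the familiar "30 m" Landsat-based resolution.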
Now, similar to the vector case, we can also enjoy the recent development of data converters here and get the data in the form of a NumPy structured array. As a test, we run it using the polygon_geom from the previous chapter, which we first convert into an ee.FeatureCollection():
polygon_geom = ee.Geometry.Polygon([[[-60, -20], [-62, -20], [-62, -22], [-60, -22], [-60, -20]]])
polygon_feat = ee.Feature(polygon_geom, {'ID': 3})
polygon_fc = ee.FeatureCollection([polygon_feat])
gfw_sa = gfw.select(['lossyear']).clipToBoundsAndScale(
geometry=polygon_geom, scale=30)
gfw_sa_npy = ee.data.computePixels({
'expression': gfw_sa,
'fileFormat': 'NUMPY_NDARRAY'})
---------------------------------------------------------------------------
HttpError Traceback (most recent call last)
File c:\Users\baumamat\AppData\Local\miniforge3\envs\py310\lib\site-packages\ee\data.py:406, in _execute_cloud_call(call, num_retries)
405 try:
--> 406 return call.execute(num_retries=num_retries)
407 except googleapiclient.errors.HttpError as e:
File c:\Users\baumamat\AppData\Local\miniforge3\envs\py310\lib\site-packages\googleapiclient\_helpers.py:130, in positional.<locals>.positional_decorator.<locals>.positional_wrapper(*args, **kwargs)
129 logger.warning(message)
--> 130 return wrapped(*args, **kwargs)
File c:\Users\baumamat\AppData\Local\miniforge3\envs\py310\lib\site-packages\googleapiclient\http.py:938, in HttpRequest.execute(self, http, num_retries)
937 if resp.status >= 300:
--> 938 raise HttpError(resp, content, uri=self.uri)
939 return self.postproc(resp, content)
HttpError: <HttpError 400 when requesting https://earthengine.googleapis.com/v1/projects/earthengine-legacy/image:computePixels? returned "Total request size (110350296 bytes) must be less than or equal to 50331648 bytes.". Details: "Total request size (110350296 bytes) must be less than or equal to 50331648 bytes.">
During handling of the above exception, another exception occurred:
EEException Traceback (most recent call last)
Cell In[14], line 3
1 gfw_sa = gfw.select(['lossyear']).clipToBoundsAndScale(
2 geometry=polygon_geom, scale=30)
----> 3 gfw_sa_npy = ee.data.computePixels({
4 'expression': gfw_sa,
5 'fileFormat': 'NUMPY_NDARRAY'})
File c:\Users\baumamat\AppData\Local\miniforge3\envs\py310\lib\site-packages\ee\data.py:952, in computePixels(params)
948 params['fileFormat'] = _cloud_api_utils.convert_to_image_file_format(
949 converter.expected_data_format()
950 )
951 _maybe_populate_workload_tag(params)
--> 952 data = _execute_cloud_call(
953 _get_cloud_projects_raw()
954 .image()
955 .computePixels(project=_get_projects_path(), body=params)
956 )
957 if converter:
958 return converter.do_conversion(data)
File c:\Users\baumamat\AppData\Local\miniforge3\envs\py310\lib\site-packages\ee\data.py:408, in _execute_cloud_call(call, num_retries)
406 return call.execute(num_retries=num_retries)
407 except googleapiclient.errors.HttpError as e:
--> 408 raise _translate_cloud_exception(e)
EEException: Total request size (110350296 bytes) must be less than or equal to 50331648 bytes.
This error is shown intentionally. The important line is the one at the very bottom:
EEException: Total request size (110350296 bytes) must be less than or equal to 50331648 bytes.
It points to the fact that there is a maximum size of data that can be requested from GEE at a time - a sensible limitation, as otherwise even Google's servers might collapse. All packages and wrapper functions (e.g., geemap or Xee) face the same issue and will throw an error as soon as the requested data size is too large. This also means that for large-scale applications - which are at the core of this course - we have to rethink how we use these functions (hint: the same limitation applies to the vector downloads above). This will be part of the lab. To show that the function actually works, we simply make the polygon a bit smaller:
# Build a smaller polygon
polygon2_geom = ee.Geometry.Polygon([[[-60, -20], [-61, -20], [-61, -21], [-60, -21], [-60, -20]]])
polygon2_feat = ee.Feature(polygon2_geom, {'ID': 1})
polygon2_fc = ee.FeatureCollection([polygon2_feat])
# Run the extraction
gfw_sa = gfw.select(['lossyear']).clipToBoundsAndScale(
geometry=polygon2_geom, scale=30)
gfw_sa_npy = ee.data.computePixels({
'expression': gfw_sa,
'fileFormat': 'NUMPY_NDARRAY'})
gfw_sa_npy.shape
(3715, 3711)
This worked - and really fast, too. The result is a NumPy array we can easily work with using the tools and methods you have learned previously! Carefully examine the functions again, and start thinking about how you could scale this up.
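As a quick plausibility check on that shape: one degree of latitude is roughly 111.32 km, so at a 30 m scale we expect about 3711 pixels per side, which is close to the (3715, 3711) array above (the small difference presumably comes from how clipToBoundsAndScale snaps the bounds):

```python
# One degree of latitude is ~111.32 km; at a 30 m pixel size this gives
expected_pixels = 111_320 / 30
print(round(expected_pixels))  # 3711
```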