Using GEE as a large data processing engine#
Introduction#
By now we have gained some experience in working with GEE. If you have gone through the Introduction and the chapter on image workflows, you are already well equipped with techniques and tools that allow you to do some easy visualizations in GEE, to upload and download features and feature collections, and to do some advanced analyses, such as masking, reducing, or image classification.
What this (last) section addresses is the following: while GEE is meant to be run over large areas, it is much harder to get data for large areas and/or long time series down to your local machine. Very commonly, one receives a computation timeout or a user memory error. In the beginning this is very frustrating, as one would expect that Google’s servers should be able to deal with large data. And they do; the only problem is getting the data out. The built-in tools in the geemap package are not really made for that, and many online courses and tutorials do not consider this issue. Yet one often wants to do some additional (secondary) analysis (e.g., advanced segmentation using the skimage library) for which the available methods in GEE do not suffice.
This chapter is specifically designed to provide some tools and techniques for use with GEE that allow you to do exactly that. Many of these tools were born and developed through experience and by constantly being confronted with this problem. This also means that they are neither perfect nor coded in the most efficient way. Keep this in mind when going through this tutorial.
The elements that we will apply in this context are:
How to split large tasks into smaller chunks?
How to access exported datasets in GEE assets and GDrive?
How to keep track of running export tasks and submit tasks continuously?
As usual, however, we start the engine and load the geemap package:
import ee
import geemap.foliumap as geemap

# Initialize GEE; authenticate first if no valid credentials are found
try:
    ee.Initialize()
except Exception as e:
    ee.Authenticate()
    ee.Initialize()
Split large tasks into smaller chunks#
This is strongly connected to the lab we have been working on in the context of our vector sessions (here or here). The idea is to subdivide a point dataset into chunks of a relatively limited geographic extent, do the job/extract the data, and then puzzle the results together again. Besides the chapters mentioned before, we need the following GEE techniques, which you can look up in the introduction to GEE:
How to create geometries - from scratch and from existing .gpkg files
How to convert local vector files into feature collections.
Have a look in the respective chapters. Here, we won’t repeat these, but encourage you to have a look at them yourself if you feel unsure.
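The chunking idea can be sketched without any GEE call: cut the bounding box of the study area into a regular grid and process one cell at a time. The function name `make_grid`, its arguments, and the example coordinates are illustrative assumptions and not part of GEE or geemap.

```python
def make_grid(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into nx * ny cells, each [xmin, ymin, xmax, ymax]."""
    dx = (xmax - xmin) / nx  # cell width
    dy = (ymax - ymin) / ny  # cell height
    cells = []
    for i in range(nx):
        for j in range(ny):
            cells.append([xmin + i * dx, ymin + j * dy,
                          xmin + (i + 1) * dx, ymin + (j + 1) * dy])
    return cells


# Example: a 1 x 1 degree box cut into 2 x 2 tiles
tiles = make_grid(13.0, 52.0, 14.0, 53.0, nx=2, ny=2)
print(len(tiles))
print(tiles[0])
```

Each cell can then be turned into a server-side geometry with `ee.Geometry.Rectangle(cell)` and used, for instance, in `filterBounds()` to select the points that belong to one chunk.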
Asset management#
In the chapter on image workflows we learned that we can export datasets into assets. What is important to know here is that we can do this with both raster data (i.e., images) and vector data (i.e., ee.FeatureCollection()). An example for vector data is presented below:
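# Export vector data to assets
# (my_fc, 'my_export' and asset_id are placeholders; replace them with your own)
exporttask = ee.batch.Export.table.toAsset(
    collection=my_fc,  # the feature collection to export
    description='my_export',  # a description for the export task
    assetId=asset_id)  # the full path of the target asset
exporttask.start()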
The question here is how to best manage the assets. One way I like, and probably the easiest, is to work with a string that points to your asset folder. In the case of this book/script, this could be for example:
asset = "projects/ee-matthiasbaumann84/assets/geopy"
Now, with some easy techniques we have learned already (e.g., listing files, iterating), we can do with assets the same things we do in our local file system. In this concrete example, we can list all files inside the asset folder. The two files I prepared in the folder are the same shapefiles I used in the chapter on image classification in GEE. Important here is that these are server-side objects that we can work with in GEE, so we are much less likely to run into memory and timeout issues (except when using .getInfo()).
Let’s get a list of the feature collections in the asset folder and see what we get:
fc_list = ee.data.getList({'id': asset})
print(fc_list)
[{'type': 'Table', 'id': 'projects/ee-matthiasbaumann84/assets/geopy/01_ROI_shape'}, {'type': 'Table', 'id': 'projects/ee-matthiasbaumann84/assets/geopy/02_RandomPoints_1000'}]
We have a list of assets (as expected). What we do now is the following: we access the first element (i.e., the ROI) so that we can work with it. To do that, we need to pass the string identifying the asset to ee.FeatureCollection():
fc = ee.FeatureCollection(fc_list[0]['id'])
fc
- type: FeatureCollection
- id: projects/ee-matthiasbaumann84/assets/geopy/01_ROI_shape
- version: 1703063418961277
- columns:
  - id: Long
  - system:index: String
- properties:
  - system:asset_size: 7836
- features:
  - type: Feature
  - id: 00000000000000000000
  - geometry:
    - type: Polygon
    - coordinates (outer ring):
      - [13.21344907908601, 52.41519894344392]
      - [13.25764981406319, 52.41519893048363]
      - [13.301850773898002, 52.415198968888156]
      - [13.301850718069888, 52.487324890321936]
      - [13.257649857986694, 52.48732494115902]
      - [13.213449051922053, 52.48732492617653]
      - [13.21344907908601, 52.41519894344392]
  - properties:
    - id: 1
Pretty simple, and also in line with everything we have learned so far. We will have an exercise like this in the course, but think for the moment about the following questions:
1. How to use the ee.batch.Export.table.toAsset() function to export e.g., STMs for many points across large areas? Think again about grids and how they help you subdivide a large task into many smaller tasks.
2. Using the fc_list and the possibility to iterate over it: would it be possible to merge many small assets into one single larger one? This would essentially be the reverse of step 1. Have a look at how to merge multiple feature collections and then how to export those again.
3. Assume that you only want the result of 2. and want to delete the results of 1., as they are only temporary elements that you won’t need anymore. Below is some code that helps you delete assets.
# Delete every asset inside the asset folder (this cannot be undone)
listToDelete = ee.data.getList({'id': asset})
for coll in listToDelete:
    file = coll['id']
    ee.data.deleteAsset(file)
Now you are ready to work with feature collections and images as assets in the same way you would in a local file system. This helped me immensely in organizing large data (e.g., at the continental scale) and in using GEE as a gigantic processing and data engine.
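As a starting point for the second question above, here is a minimal sketch of how several table assets could be combined into one feature collection. The helper names `table_ids` and `merge_tables` are hypothetical; the sketch assumes the listing has the same structure as the output of `ee.data.getList()` shown earlier, and `merge_tables()` only works after `ee.Initialize()`. Chaining `.merge()` calls would be an alternative to `flatten()`.

```python
def table_ids(asset_list):
    """Return the ids of all entries of type 'Table' in an asset listing."""
    return [a['id'] for a in asset_list if a.get('type') == 'Table']


def merge_tables(ids):
    """Combine several table assets into one server-side feature collection."""
    import ee  # imported here so that table_ids() also works without Earth Engine
    collections = [ee.FeatureCollection(i) for i in ids]
    # A collection of collections, flattened into one collection of features
    return ee.FeatureCollection(collections).flatten()


# Listing in the form returned by ee.data.getList() above
listing = [
    {'type': 'Table', 'id': 'projects/ee-matthiasbaumann84/assets/geopy/01_ROI_shape'},
    {'type': 'Table', 'id': 'projects/ee-matthiasbaumann84/assets/geopy/02_RandomPoints_1000'},
]
print(table_ids(listing))
```

The merged collection returned by `merge_tables()` could then be passed as `collection` to `ee.batch.Export.table.toAsset()`, after which the small temporary assets are no longer needed.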
Google Drive Management#
You can of course do the same with exports to your Google Drive. We already know how to export images to your GDrive; we learned this in previous sections. Below is again an example of this, and you can imagine doing the same with feature collections, right?
# my_image, my_region and the strings below are placeholders; replace them with your own
task = ee.batch.Export.image.toDrive(
    image=my_image,  # the image object you want to export
    folder='my_folder',  # the folder in your GDrive to work in
    description='my_export',  # some description for the export
    region=my_region,  # some geometry that tells GEE the ROI
    scale=30)  # the spatial resolution in meters for the export
task.start()
So far, so good. But how do we access our GDrive so that we can do the same operations (e.g., list files, download files, delete files) as we do with assets and in our local file system? To do this, we need to set some parameters and go through a few steps. Specifically:
Install and load the package pydrive and its submodules
Create a client_secrets file and authenticate with your GDrive
Access the folder in your GDrive
Install and load pydrive#
This is an easy one. You know how to install packages, and we need two submodules of this package. Afterwards you can authenticate with your Google Drive.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
Before we can connect and authenticate with our GDrive, we need to create a client_secrets file. To do this, follow the steps outlined on this website or alternatively here. After that, store your client_secrets file wherever you want and start the authentication:
GoogleAuth.DEFAULT_SETTINGS['client_config_file'] = "PATH_TO_YOUR_FILE/client_secrets.json"
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth?client_id=9291758365-jcrgrd0a3gfv10ipqfc5vq675ua5hve4.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&access_type=offline&response_type=code
Authentication successful.
Now we can start managing the data. To do this, however (and I have not found a better solution yet), we need to provide the physical address, i.e., the folder ID, of the folder in your GDrive. For that reason I have created a folder in my GDrive with a name that clearly indicates that GEE stuff will be stored there (e.g., geopy). Once you have created it, open the folder in your browser and copy the physical address from the address field; it is the long string at the end of the URL.
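If you prefer not to copy the ID by hand, it can also be cut out of the folder URL. The sketch below assumes the usual URL pattern `https://drive.google.com/drive/folders/<ID>`; the helper names `folder_id_from_url` and `folder_query` are hypothetical and not part of pydrive.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the last path component of a Drive folder URL (the folder ID)."""
    path = urlparse(url).path  # drops query strings such as ?usp=sharing
    return path.rstrip('/').split('/')[-1]


def folder_query(folder_id):
    """Build the search string expected by drive.ListFile()."""
    return f"'{folder_id}' in parents and trashed=false"


url = "https://drive.google.com/drive/folders/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz"
print(folder_query(folder_id_from_url(url)))
```

The resulting string is what goes under the `'q'` key of the dictionary passed to `drive.ListFile()`.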
With this, we can now access the files in GDrive:
file_list = drive.ListFile({'q': "'1O_SIblsy57bc6G_9DYctnnQnab3nkiFz' in parents and trashed=false"}).GetList()
file_list
[GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH/permissions/me', 'etag': '"Yy5GsXS7YmvEvwOmp0KzkBLA6uk"', 'pendingOwner': False}, 'fileExtension': 'shx', 'md5Checksum': 'eec4664bbaeb1b157837a0c60b8cbdfb', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRTHp1aTdZeGxnckhDMGlVR29IMWxPbXN2WVY4PQ', 'copyable': True, 'etag': '"MTY3MjY3OTM1MjAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH&export=download', 'fileSize': '8100', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1NTTAt3xwO_-P0HZ0R9a-oOSPgXCnMnvH', 'title': '02_RandomPoints_1000.shx', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T17:09:12.000Z', 'modifiedByMeDate': '2023-01-02T17:09:12.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '8100', 'version': '1', 'originalFilename': '02_RandomPoints_1000.shx', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk/permissions/me', 'etag': '"-p-AOmPYeSOL2aUYnU2krufMQdU"', 'pendingOwner': False}, 'fileExtension': 'shp', 'md5Checksum': '4e3e98457fdf2ccb0202368610179ef7', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRdGJLL1JQYUxHRDFuMTJwN0pPOXBodC9JazRjPQ', 'copyable': True, 'etag': '"MTY3MjY3OTM1MjAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk&export=download', 'fileSize': '28100', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1lXxg6DAvttlP4_VwC8rdm4LmC2gokSGk', 'title': '02_RandomPoints_1000.shp', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.106Z', 'modifiedDate': '2023-01-02T17:09:12.000Z', 'modifiedByMeDate': '2023-01-02T17:09:12.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.106Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '28100', 'version': '1', 'originalFilename': '02_RandomPoints_1000.shp', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr/permissions/me', 'etag': '"rnebP488S1hVlkcEOChrIvxk66s"', 'pendingOwner': False}, 'fileExtension': 'prj', 'md5Checksum': 'c742bee3d4edfc2948a2ad08de1790a5', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRdk5EMUI1LzBJbmwxVGovSGdZeXlBMDY1ZEVVPQ', 'copyable': True, 'etag': '"MTY3MjY3OTM1MTAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr&export=download', 'fileSize': '145', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1y6dNDyT2YuW2jhoYlx7_-jBX_bx1snCr', 'title': '02_RandomPoints_1000.prj', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T17:09:11.000Z', 'modifiedByMeDate': '2023-01-02T17:09:11.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '145', 'version': '1', 'originalFilename': '02_RandomPoints_1000.prj', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo/permissions/me', 'etag': '"Xatx2jHR2JV8GOOCMxdfNc5WJcE"', 'pendingOwner': False}, 'fileExtension': 'dbf', 'md5Checksum': 'f4d6e8e9f131774c739e545c4bddc6fc', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRZU1kL25EbGsxRjNaZkFYRFdYQW84TVFoMWE4PQ', 'copyable': True, 'etag': '"MTY3MjY3OTM1MTAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo&export=download', 'fileSize': '21066', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1ytpac-M_M36uDd5XRHuEcK5poeJfk6zo', 'title': '02_RandomPoints_1000.dbf', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T17:09:11.000Z', 'modifiedByMeDate': '2023-01-02T17:09:11.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '21066', 'version': '1', 'originalFilename': '02_RandomPoints_1000.dbf', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA/permissions/me', 'etag': '"iLSqHlahRvHR0Qjjaj1mTGsAZ34"', 'pendingOwner': False}, 'fileExtension': 'shp', 'md5Checksum': '948f7db2e214e0371c1b49f780a0fbc4', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRRU02bVR0Z21USnhGZ3dNaGlGVmN6S2dJUTVZPQ', 'copyable': True, 'etag': '"MTY3MjY3NDQxNDAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA&export=download', 'fileSize': '236', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1sDd7ODR3HpHBQc8qEE6BZRodQ98YeECA', 'title': '01_ROI_shape.shp', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T15:46:54.000Z', 'modifiedByMeDate': '2023-01-02T15:46:54.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '236', 'version': '1', 'originalFilename': '01_ROI_shape.shp', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N/permissions/me', 'etag': '"KTenL6rfFHJGdDPTA6hNCCahz2c"', 'pendingOwner': False}, 'fileExtension': 'shx', 'md5Checksum': 'fe690051714c00499c00efc9eaea7de2', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRUHBSdVRDL3M5dDQyQ1ZFSnJWWFlTbVVYV2o4PQ', 'copyable': True, 'etag': '"MTY3MjY3NDQxNDAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N&export=download', 'fileSize': '108', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1rjIcwoYmyCJ06kngU7-EWy_hN57ta25N', 'title': '01_ROI_shape.shx', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T15:46:54.000Z', 'modifiedByMeDate': '2023-01-02T15:46:54.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '108', 'version': '1', 'originalFilename': '01_ROI_shape.shx', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO/permissions/me', 'etag': '"UrSc2gjj9I8evijv5xG2CCZEG5w"', 'pendingOwner': False}, 'fileExtension': 'dbf', 'md5Checksum': '19f50c389d308341b27af5f6487c93f3', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRZWJQSlhkY3d2YTgzT0pmNWFRYVdHelM4aERZPQ', 'copyable': True, 'etag': '"MTY3MjY3NDQxNDAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO&export=download', 'fileSize': '77', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1_Tw7-7hdSn9wWcbXfx2HDLzGQqoSUaGO', 'title': '01_ROI_shape.dbf', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T15:46:54.000Z', 'modifiedByMeDate': '2023-01-02T15:46:54.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '77', 'version': '1', 'originalFilename': '01_ROI_shape.dbf', 'capabilities': {'canEdit': True, 'canCopy': True}}),
GoogleDriveFile({'kind': 'drive#file', 'userPermission': {'id': 'me', 'type': 'user', 'role': 'owner', 'kind': 'drive#permission', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T/permissions/me', 'etag': '"yfBH8DYQ73OXEKpZAlF6sB5EBHo"', 'pendingOwner': False}, 'fileExtension': 'prj', 'md5Checksum': 'c742bee3d4edfc2948a2ad08de1790a5', 'selfLink': 'https://www.googleapis.com/drive/v2/files/1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T', 'ownerNames': ['Matthias Baumann'], 'lastModifyingUserName': 'Matthias Baumann', 'editable': True, 'writersCanShare': True, 'downloadUrl': 'https://www.googleapis.com/drive/v2/files/1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T?alt=media&source=downloadUrl', 'mimeType': 'application/octet-stream', 'parents': [{'selfLink': 'https://www.googleapis.com/drive/v2/files/1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T/parents/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'id': '1O_SIblsy57bc6G_9DYctnnQnab3nkiFz', 'isRoot': False, 'kind': 'drive#parentReference', 'parentLink': 'https://www.googleapis.com/drive/v2/files/1O_SIblsy57bc6G_9DYctnnQnab3nkiFz'}], 'appDataContents': False, 'iconLink': 'https://drive-thirdparty.googleusercontent.com/16/type/application/octet-stream', 'shared': False, 'lastModifyingUser': {'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}, 'owners': [{'displayName': 'Matthias Baumann', 'kind': 'drive#user', 'isAuthenticatedUser': True, 'permissionId': '07917266221281590687', 'emailAddress': 'matthias.baumann84@gmail.com', 'picture': {'url': 'https://lh3.googleusercontent.com/a/ACg8ocLu9eMixfRmjTw8KO3g_zH0jFrrzSHCVSMMbatP7SE9yEzm=s64'}}], 'headRevisionId': '0B9hZR9DKK3xRTXB1Tlp6NjB5OXFpNDRwU1A4R3N6TFowbVE4PQ', 'copyable': True, 'etag': '"MTY3MjY3NDExODAwMA"', 'alternateLink': 
'https://drive.google.com/file/d/1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T/view?usp=drivesdk', 'embedLink': 'https://drive.google.com/file/d/1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T/preview?usp=drivesdk', 'webContentLink': 'https://drive.google.com/uc?id=1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T&export=download', 'fileSize': '145', 'copyRequiresWriterPermission': False, 'spaces': ['drive'], 'id': '1gCIXdHLUr435KBpAt4RrrG3P9CMJYr1T', 'title': '01_ROI_shape.prj', 'labels': {'viewed': True, 'restricted': False, 'starred': False, 'hidden': False, 'trashed': False}, 'explicitlyTrashed': False, 'createdDate': '2023-12-20T10:01:58.986Z', 'modifiedDate': '2023-01-02T15:41:58.000Z', 'modifiedByMeDate': '2023-01-02T15:41:58.000Z', 'lastViewedByMeDate': '2023-12-20T10:01:58.986Z', 'markedViewedByMeDate': '1970-01-01T00:00:00.000Z', 'quotaBytesUsed': '145', 'version': '1', 'originalFilename': '01_ROI_shape.prj', 'capabilities': {'canEdit': True, 'canCopy': True}})]
This output looks complicated. If you go through the list once, however, you will recognize some patterns and dictionary keys, such as the ['id'] and the ['title'] keys. Below is an example of how to use them to get the file names:
for file in file_list:
    file_id = drive.CreateFile({'id': file['id']})
    file_name = file['title']
    print(file_name)
02_RandomPoints_1000.shx
02_RandomPoints_1000.shp
02_RandomPoints_1000.prj
02_RandomPoints_1000.dbf
01_ROI_shape.shp
01_ROI_shape.shx
01_ROI_shape.dbf
01_ROI_shape.prj
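Since each entry in the list behaves like a dictionary, the titles and IDs can also be collected into a small lookup table. The following is only a minimal sketch: the helper name `build_title_index` and the sample IDs are hypothetical, and plain dictionaries stand in for the Drive file entries so that the snippet runs without Drive access.

```python
def build_title_index(file_list):
    """Map each file title to its Drive file ID."""
    return {entry['title']: entry['id'] for entry in file_list}


# Plain dicts stand in for the Drive file entries (hypothetical sample IDs)
sample_list = [
    {'id': 'id-001', 'title': '01_ROI_shape.shp'},
    {'id': 'id-002', 'title': '01_ROI_shape.prj'},
]

title_to_id = build_title_index(sample_list)
print(title_to_id['01_ROI_shape.prj'])
```

Keep in mind that Google Drive allows several files with the same title in one folder. In that case, a lookup like this one keeps only the last entry per title.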
Last, we amend this code in two ways:
For downloading the files in the list. To do this, we need to combine the output folder and the file name into a local path.
For deleting the files on your GDrive, i.e., for cleaning up GDrive.
import os

# 1. Download the files
outFolder = "PATH_TO_YOUR_LOCAL_FOLDER"
for file in file_list:
    file_id = drive.CreateFile({'id': file['id']})
    fname = file["title"]  # This is the name under which the file is stored on Google Drive
    # Define output file name and download
    outName = os.path.join(outFolder, fname)
    file_id.GetContentFile(outName)
# 2. Delete the files on Google Drive
for file in file_list:
    file_id = drive.CreateFile({'id': file['id']})
    file_id.Delete()
And that is it. Now we are ready to work with GDrive in the same way we know it from file systems. This has the advantage that we can think of loops (e.g., while-loops) that continuously check GDrive for newly arrived files and then download them. This can be helpful, for example, if you don’t have a paid Google One account and hence only limited storage capacity, which might not be sufficient to store all data from your study area.
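The idea of a loop that keeps checking GDrive could be sketched as follows. This is a sketch under assumptions, not a finished tool: the function names `download_new_files` and `poll_drive` and the parameters `n_expected`, `wait_seconds` and `max_checks` are hypothetical, and `drive` is assumed to be the authenticated Drive object used above. Only the calls already shown in this chapter are used (`ListFile`, `GetList`, `CreateFile`, `GetContentFile`, `Delete`).

```python
import os
import time


def download_new_files(drive, folder_id, out_folder, delete_after=True):
    """Download all files currently in a Drive folder and return their names."""
    query = f"'{folder_id}' in parents and trashed=false"
    downloaded = []
    for entry in drive.ListFile({'q': query}).GetList():
        drive_file = drive.CreateFile({'id': entry['id']})
        drive_file.GetContentFile(os.path.join(out_folder, entry['title']))
        if delete_after:
            drive_file.Delete()  # free Drive storage once the local copy exists
        downloaded.append(entry['title'])
    return downloaded


def poll_drive(drive, folder_id, out_folder, n_expected,
               wait_seconds=60, max_checks=1000):
    """Check the folder repeatedly until n_expected files were downloaded."""
    collected = []
    for _ in range(max_checks):
        collected += download_new_files(drive, folder_id, out_folder)
        if len(collected) >= n_expected:
            break
        time.sleep(wait_seconds)  # wait before looking into the folder again
    return collected
```

A call such as `poll_drive(drive, "YOUR_FOLDER_ID", outFolder, n_expected=8)` would then return once eight files have been fetched. The `max_checks` limit is there so that the loop cannot run forever if an export fails.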
Managing number of processes#
The last bit we do is to look at some basic code lines to manage processes. The scenario is as follows: suppose you subdivide your study region using a grid with a large number of grid cells (e.g., several hundred). Now, you want to export STMs or a classification for all tiles to your GDrive and from there download them to your local computer. Below you find some basic elements. We describe the basic idea/functionality, but you will probably need to adjust this to your personal processing engine.
GEE allows you to submit many tasks, but only a limited number (2-5) are processed in parallel. The first reflex would be to submit all tasks and then let GEE process them all. With unlimited GDrive storage this would be an option, but many people (incl. me) only have limited storage (e.g., 15GB in the free version, 100GB in the basic paid version). So, you will continuously submit tasks and at the same time clean up your GDrive by downloading and deleting the files. Below is a loop construct that allows you to do this. The basic idea of this loop is as follows:
Define a maximum number of tasks that you want GEE to have either running or in the queue. I define this through maxTasks = 10, meaning that the sum of the two should not exceed 10.
The outer for-loop then submits all tasks in theory, but it checks whether the number of currently running/queued tasks n_tasks is larger than or equal to maxTasks. If this is the case, it stops and sleeps for 60 seconds before attempting it again.
If n_tasks < maxTasks, another tile will be submitted, and GDrive will be checked for new files that arrived. Those will be downloaded and stored locally.
It looks a bit complicated at first, but in the end it is mostly task management. Have a look, but be aware that the variable names have not been defined previously in the code block, so you will have to think about how to manage the loop and what the process is. In other words: this is a generic loop, which you will have to adapt!
import os
import time

maxTasks = 10
# Initialize the number of running/queued tasks
n_tasks = 0
# Loop over the missing tile IDs
for tile in tileIDs:
    # Wait as long as the maximum number of tasks is reached
    while n_tasks >= maxTasks:
        time.sleep(60)  # Check every 60 seconds whether a new task could be started
        try:
            # Count the task states in the string representation of the task list
            task_list = str(ee.batch.Task.list())
            n_running = task_list.count('RUNNING')
            n_ready = task_list.count('READY')
            n_tasks = n_running + n_ready
        except Exception:
            # The request failed; n_tasks is unchanged, so the loop tries again
            time.sleep(5)
    # Submit the next classification
    # Begin of the code example you want to run
    # -----------------
    # End of the code example you want to run
    # Update the number of running/queued tasks after the submission
    task_list = str(ee.batch.Task.list())
    n_running = task_list.count('RUNNING')
    n_ready = task_list.count('READY')
    n_tasks = n_running + n_ready
    # Check in Google Drive whether new files arrived and download them
    drive_list = drive.ListFile({'q': "'1O_SIblsy57bc6G_9DYctnnQnab3nkiFz' in parents and trashed=false"}).GetList()
    for file in drive_list:
        file_id = drive.CreateFile({'id': file['id']})
        fname = file["title"]
        outName = os.path.join(outFolder, fname)
        file_id.GetContentFile(outName)
        # Delete the file on Google Drive once it is stored locally
        file_id.Delete()
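The loop above counts the words RUNNING and READY in the string representation of the task list. This works, but it could miscount if, for instance, a task description happened to contain one of these words. A possible alternative is to look at the state of each task individually. The sketch below assumes that every task object exposes a `state` attribute holding values such as 'READY' or 'RUNNING'. The helper name `count_active_tasks` is hypothetical, and simple stand-in objects are used so that the snippet runs without an Earth Engine session.

```python
from types import SimpleNamespace

# States that occupy a slot: queued (READY) or currently processed (RUNNING)
ACTIVE_STATES = ('READY', 'RUNNING')


def count_active_tasks(tasks):
    """Count the tasks that are either queued or running."""
    return sum(1 for task in tasks if task.state in ACTIVE_STATES)


# Stand-in objects for demonstration only; in the loop one would pass
# ee.batch.Task.list() instead (assumption: each task has a `state` attribute)
demo_tasks = [
    SimpleNamespace(state='RUNNING'),
    SimpleNamespace(state='READY'),
    SimpleNamespace(state='COMPLETED'),
]
print(count_active_tasks(demo_tasks))
```

In the loop, `n_tasks = count_active_tasks(ee.batch.Task.list())` would then replace the four lines that build and search the string.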
This is it. We have identified and explored the key elements needed to successively submit jobs to GEE and to keep a loop running until all jobs are submitted, while at the same time downloading the files from GDrive.