You have three weeks for project work in teams. Use this time to develop and carry out a research objective in a guided setting; we will give feedback iteratively. The overall goal is to strengthen your problem-solving skills and thereby increase your confidence in working with remote sensing and other geospatial datasets.


Project phase I

Goals

  • Narrow down research questions and objectives.
  • Draft a mini project proposal.
  • Connect to the HU-Desktop.
  • First look at study site & data.

Today you will start working on your projects! As a first step, you need to define a clear research question and objective(s) and sketch a workflow. The outcome of this block will be a small project proposal. The proposal should be brief, but it should contain your motivation & objectives, study area & period, datasets (image data, training/validation data), and a description of the main methods used for your analysis. Some of this information has already been briefly laid out in the project topics, but you should refine it further and modify the research goals as you see fit.


Practical guidelines

Here is how you get there in three broad steps:

1) Specifying the scope of your project:

  1. Specify a research question or hypothesis around which you can develop one or two main objectives (e.g., one related to methods, one related to LULC or global change processes). You may wish to refine the objectives presented to you with the topics.

  2. Search for relevant literature in your project context. Stick to recent publications in established journals (e.g., Remote Sensing of Environment, Remote Sensing, Applied Earth Observation and Geoinformation, IEEE JSTARS).

  3. Once you have made a first selection, evaluate the relevance of the selected publications in more detail and discuss this with your group. Go through the papers and identify aspects of each study that deserve further research. These should also guide you towards further relevant studies, or you may look at studies that cite a particular publication of interest.


2) Defining data requirements and methods:

  1. Specify the exact study area (location, extent) and time frame relevant for your analysis.

  2. Define the sensor and data product of your choice (e.g. Landsat BOA, Sentinel-2 BOA).

  3. Develop a class catalogue including clear and precise class definitions. You may consider a hierarchical approach and aggregate thematically irrelevant classes wherever it appears useful.

  4. Define the required temporal resolution (e.g. 5-yearly, annual, intra-annual).

  5. Choose a suitable method (e.g. compositing, spectral-temporal metrics, time series analysis).
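
A hierarchical class catalogue (step 3 above) can be kept as a simple lookup table. The following is only a minimal sketch: the class names and the `aggregate_labels` helper are hypothetical and need to be replaced by the definitions of your own project.

```python
# Hypothetical two-level class catalogue: detailed class -> aggregated class.
CLASS_CATALOGUE = {
    "broadleaved forest": "forest",
    "coniferous forest": "forest",
    "cropland": "agriculture",
    "pasture": "agriculture",
    "built-up": "other",
    "water": "other",
}


def aggregate_labels(labels, catalogue=CLASS_CATALOGUE):
    """Map detailed class labels to their aggregated parent class."""
    unknown = sorted(set(labels) - set(catalogue))
    if unknown:
        raise KeyError(f"labels missing from catalogue: {unknown}")
    return [catalogue[label] for label in labels]


detailed = ["cropland", "water", "coniferous forest", "pasture"]
print(aggregate_labels(detailed))
```

Collecting samples at the detailed level and aggregating afterwards keeps the option open to revise the aggregation later without relabelling.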


3) Investigating the study area and screening data:

  1. Connect to the HU-Desktop using the instructions posted on Moodle. The data for your project will be found on the “O” drive: O:\SS21_EO. For now this includes shapefiles delineating your study region.

  2. Look at your study site in Google Earth. Note the different vegetation types and patterns as well as how they change over time. You may also search online for other relevant data sources which may provide further context to your study, e.g. climate maps, fire frequency, or existing land cover maps.

  3. To screen all available image data, you may also visit the USGS EarthExplorer or any other data distribution service you might know. Browse through the available data for your study region and time frame. What is the image availability for your time period? How about cloudiness in the study region? Any other limiting factors?
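
If you export the search results as a table, image availability and cloudiness can be summarized with a few lines of code. This is a minimal sketch: the `scenes` list and the 50 % cloud threshold are made-up assumptions, not values from a real archive query.

```python
from collections import Counter
from datetime import date

# Hypothetical scene metadata: (acquisition date, scene cloud cover in %).
scenes = [
    (date(2019, 3, 14), 12.0),
    (date(2019, 7, 2), 85.0),
    (date(2019, 9, 20), 4.5),
    (date(2020, 1, 8), 60.0),
    (date(2020, 6, 16), 9.0),
]


def usable_scenes_per_year(scenes, max_cloud=50.0):
    """Count scenes per year with cloud cover at or below max_cloud."""
    counts = Counter(d.year for d, cloud in scenes if cloud <= max_cloud)
    return dict(sorted(counts.items()))


print(usable_scenes_per_year(scenes))
```

Note that scene-level cloud cover refers to the whole scene; cloudiness over your study region may differ.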


Developing a research idea is an iterative process. Running into problems and realizing that things don't work as expected is part of research. If you notice that your project idea will not work out, try to identify alternative pathways out of “the cloud”.

“The cloud” (Alon 2009)

By 21 June: Submit your project overview on Moodle.


Project phase II

Goals

  • Convert raw satellite data to analysis-ready datasets.
  • Develop a protocol for training data collection.

Practical guidelines

By now, you have developed a research idea and acquired remote sensing datasets for further analysis. The next steps vary considerably depending on the specific project workflow, so the following guidelines might not suit every project. Feel free to follow, modify, or ignore them as you like.

1) Produce analysis-ready data

In this step, you should perform the necessary pre-processing steps to prepare your downloaded data for the next steps. Often these data come as compressed archives with individual band and metadata files. To avoid unnecessary processing time in further steps, we need a cloud-masked stack covering the extent of our study region.

To get there, we need to do the following:

  • Unpack the archives.
  • Stack the image bands of interest.
  • Crop the image(s) to the study region extent.
  • Mask clouds & cloud shadows in the image(s).
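
The stacking, cropping, and masking steps can be sketched on arrays as follows. This is an illustration under assumptions: the bands are synthetic stand-ins for rasters you would read from the unpacked archives, the study region is given as a pixel window, and the quality band coding (1 = cloud or shadow) is hypothetical, so check the product documentation for the real flags.

```python
import numpy as np

# Synthetic stand-ins for single-band rasters read from disk (rows x cols).
rng = np.random.default_rng(0)
red = rng.integers(0, 10000, size=(6, 8)).astype("float32")
nir = rng.integers(0, 10000, size=(6, 8)).astype("float32")
qa = np.zeros((6, 8), dtype="uint8")  # hypothetical quality band
qa[1:3, 2:5] = 1                      # 1 = cloud or cloud shadow (assumed coding)

# 1) Stack the bands of interest: shape (bands, rows, cols).
stack = np.stack([red, nir])

# 2) Crop to the study region, given here as a pixel window.
row_min, row_max, col_min, col_max = 1, 5, 1, 7
stack = stack[:, row_min:row_max, col_min:col_max]
qa_crop = qa[row_min:row_max, col_min:col_max]

# 3) Mask clouds and shadows by setting flagged pixels to NaN in all bands.
masked = np.where(qa_crop == 1, np.nan, stack)

print(masked.shape, int(np.isnan(masked[0]).sum()), "masked pixels per band")
```

Cropping before masking keeps the arrays small, which is the point of restricting the stack to the study region extent.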

2) Higher-level pre-processing

Producing meaningful features for classification or regression algorithms is a critical step towards good results. Make use of the methods and skillsets you acquired during the course. Depending on your research project, it might make sense to perform pixel-based compositing, or to calculate spectral-temporal metrics. Remember, many of these steps can be performed using spectral bands, band indices, or Tasseled Cap components. Take enough time to discuss these issues in your team before making a final choice.
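
As an illustration, spectral-temporal metrics of a band index can be computed per pixel along the time axis. The sketch below uses a synthetic, cloud-masked reflectance time series; the array sizes and the choice of NDVI, median, quartiles, and standard deviation are assumptions for demonstration only.

```python
import numpy as np

# Synthetic cloud-masked time series: (time, rows, cols), NaN = masked.
rng = np.random.default_rng(1)
red = rng.uniform(0.02, 0.15, size=(10, 4, 4))
nir = rng.uniform(0.20, 0.50, size=(10, 4, 4))
red[3, 0, 0] = np.nan  # one masked observation
nir[3, 0, 0] = np.nan

# Band index per observation.
ndvi = (nir - red) / (nir + red)

# Spectral-temporal metrics: per-pixel statistics along the time axis,
# ignoring masked observations.
stm = np.stack([
    np.nanmedian(ndvi, axis=0),
    np.nanpercentile(ndvi, 25, axis=0),
    np.nanpercentile(ndvi, 75, axis=0),
    np.nanstd(ndvi, axis=0),
])
n_clear = np.sum(~np.isnan(ndvi), axis=0)  # clear observations per pixel

print(stm.shape, n_clear[0, 0])
```

The count of clear observations is worth keeping as an extra layer, since metrics based on very few observations are less reliable.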

3) Training data collection

You have already done this a couple of times by now. Here are some guidelines that can make your life easier.

  • Make sure your classes are clearly defined. Aggregate thematically irrelevant classes wherever it appears useful.
  • Define a minimum number of samples per class.
  • Set up an efficient work environment, e.g., a QGIS project containing your image data, other ancillary datasets, and a VHR baselayer (e.g., using the QuickMapServices plugin).
  • Make use of VHR data from Google Earth, but always consult the historic imagery toolbar to verify the acquisition dates of these data.
  • Save your progress after each collected point or polygon.
  • Explore the spectral and/or temporal characteristics of your training locations. Do they behave as expected? Are there outliers? Are your image features good discriminators for your classes of interest?
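
Two of the checks above, the minimum number of samples per class and the search for outliers, can be sketched as follows. The sample values, the minimum of three samples, and the median-based outlier rule are illustrative assumptions, not course requirements.

```python
from collections import defaultdict
from statistics import median

# Hypothetical training samples: (class label, feature value, e.g. median NDVI).
samples = [
    ("forest", 0.82), ("forest", 0.79), ("forest", 0.85), ("forest", 0.31),
    ("cropland", 0.45), ("cropland", 0.52), ("cropland", 0.48),
    ("water", -0.10), ("water", -0.05),
]
MIN_SAMPLES = 3  # assumed minimum number of samples per class

by_class = defaultdict(list)
for label, value in samples:
    by_class[label].append(value)


def summarize(values, k=3.0):
    """Return sample count, median, and values far from the class median.

    A value counts as an outlier if it deviates from the median by more
    than k scaled median absolute deviations (MAD).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    outliers = [v for v in values if mad > 0 and abs(v - med) > k * 1.4826 * mad]
    return {"n": len(values), "median": med, "outliers": outliers}


summary = {label: summarize(values) for label, values in by_class.items()}
for label, info in summary.items():
    flag = "too few samples" if info["n"] < MIN_SAMPLES else "ok"
    print(label, info, flag)
```

A flagged value is only a prompt to revisit the sample in the imagery; it may be a labelling error or a legitimate variant of the class.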

By 30 June: Have your input features and training data ready, so we can proceed with producing results. Decide on an algorithm to use in your classification/regression problem. Discuss the parameterization of this algorithm in your team, i.e., which parameters are required and how to determine useful parameter settings.


Project phase III

Goals

  • Generate, investigate, and improve your results.
  • Design and perform an accuracy assessment.
  • Prepare a project presentation.

Practical guidelines

1) Produce results using classification or regression

Depending on your problem, choose a suitable classification / regression algorithm. Most of you will probably see Random Forest as the logical option. If it is of interest to you, try other classifiers, e.g., Support Vector Machines. Aim to make informed decisions during the parameterization of the algorithm, e.g., by conducting a sensitivity analysis based on out-of-bag (OOB) errors or cross-validation. Beware of overfitting.
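
A sensitivity analysis based on OOB errors could look like the sketch below. It assumes scikit-learn is available and uses synthetic data in place of your training samples; the tested numbers of trees are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for training data: 300 samples, 8 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# Vary one parameter, hold the others fixed, and compare the OOB error.
oob_error = {}
for n_trees in (50, 100, 250, 500):
    rf = RandomForestClassifier(
        n_estimators=n_trees, max_features="sqrt",
        oob_score=True, random_state=0,
    )
    rf.fit(X, y)
    oob_error[n_trees] = 1.0 - rf.oob_score_

for n_trees, err in oob_error.items():
    print(f"{n_trees:4d} trees: OOB error = {err:.3f}")
```

The same loop can be repeated for other parameters, such as `max_features`. Keep in mind that OOB errors are optimistic if your training samples are spatially clustered.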

Once your model is trained, perform a prediction. Write your results to disk and take some time for visual assessment of the result in a GIS environment. Some of you may encounter frequently occurring errors, such as misclassification of specific land surfaces. To overcome these, try to understand where the errors come from.

  • Is the error a result of erroneous or incomplete training data?
  • Can you think about additional input features which allow the classifier to better separate your classes?
  • Could you revise your class nomenclature in order to aggregate the confused classes while still being able to answer your research questions?
  • Is post-processing a useful idea? You could, e.g., remove illogical class transitions in change analyses, or use a spatial filter to remove “salt and pepper” effects.

In some of the above cases, it will be necessary to train a new model and conduct another prediction. Iterate as needed.
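
The spatial filter mentioned as a post-processing option can be sketched as a simple majority filter. This is a plain, unoptimized illustration on a tiny made-up class map; the window size and the tie-breaking rule are assumptions, and a GIS tool may offer a faster equivalent.

```python
import numpy as np


def majority_filter(class_map, size=3):
    """Replace each pixel by the most frequent class in its size x size window."""
    pad = size // 2
    padded = np.pad(class_map, pad, mode="edge")  # repeat edge pixels at the border
    out = np.empty_like(class_map)
    rows, cols = class_map.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + size, c:c + size]
            values, counts = np.unique(window, return_counts=True)
            out[r, c] = values[np.argmax(counts)]  # ties: lowest class value wins
    return out


class_map = np.array([
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 3],
    [3, 3, 3, 3],
])
filtered = majority_filter(class_map)
print(filtered)
```

Filtering also removes small but real features, so compare the filtered map with the original before using it for area estimates.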

2) Validation

Perform an accuracy assessment following the method for area-adjusted accuracy assessment discussed in this course. The materials for session 06 contain a lot of information on the broad steps.

Broadly speaking, you can perform an area-adjusted accuracy assessment if you are able to generate reference information at any random location of your study area from available datasets. If you can, a stratified random sample is advised in order to control the number of samples used per class. A method by Cochran (1977) can help you decide how many samples should be collected overall. See the second tab of the Excel spreadsheet used earlier.
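
For orientation, a commonly used simplified form of the stratified sample-size formula is n ≈ (Σ W_i · S_i / S(O))², where W_i is the mapped area proportion of class i, S_i = sqrt(U_i · (1 − U_i)) is based on the anticipated user's accuracy U_i, and S(O) is the target standard error of overall accuracy. The sketch below assumes this simplified form (large population, no finite population correction); all input values are made up, and the result should be checked against the spreadsheet.

```python
import math


def stratified_sample_size(weights, users_acc, target_se):
    """Total sample size n = (sum(W_i * S_i) / S(O))**2, rounded up.

    S_i = sqrt(U_i * (1 - U_i)) for the anticipated user's accuracy U_i.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("stratum weights must sum to 1")
    s = [math.sqrt(u * (1.0 - u)) for u in users_acc]
    weighted = sum(w * si for w, si in zip(weights, s))
    return math.ceil((weighted / target_se) ** 2)


# Hypothetical four-class map: area proportions and anticipated user's accuracies.
weights = [0.40, 0.30, 0.20, 0.10]
users_acc = [0.90, 0.85, 0.80, 0.70]

n_total = stratified_sample_size(weights, users_acc, target_se=0.01)
allocation = [round(n_total * w) for w in weights]  # proportional allocation
print(n_total, allocation)
```

Proportional allocation leaves rare classes with few samples, so in practice you may want to raise their share to a sensible minimum.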

Feel free to use any software environment or tool for the area-adjusted accuracy assessment, including the Excel spreadsheet. Remember that it was designed for a four-class problem, so you need to adjust it if necessary. If you do, please share the adjusted tables with the other groups.
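
If you prefer code over the spreadsheet, the core of the calculation can be sketched for any number of classes. The function below is a minimal illustration, not a replacement for the course materials: it weights each row of the confusion matrix by the mapped area proportion of that class and omits confidence intervals. The example counts and weights are made up.

```python
def area_adjusted_accuracy(counts, weights):
    """Area-adjusted accuracy estimates from a sample confusion matrix.

    counts[i][j]: samples mapped as class i with reference class j.
    weights[i]:   mapped area proportion of class i (must sum to 1).
    Assumes every map class has at least one sample.
    """
    k = len(counts)
    row_totals = [sum(row) for row in counts]
    # Estimated area proportions p_ij = W_i * n_ij / n_i.
    p = [[weights[i] * counts[i][j] / row_totals[i] for j in range(k)]
         for i in range(k)]
    col_sums = [sum(p[i][j] for i in range(k)) for j in range(k)]
    return {
        "overall": sum(p[i][i] for i in range(k)),
        "users": [p[i][i] / sum(p[i]) for i in range(k)],
        "producers": [p[j][j] / col_sums[j] for j in range(k)],
        "area_proportions": col_sums,  # area-adjusted share of each class
    }


# Hypothetical two-class example: rows = map, columns = reference.
counts = [[45, 5],
          [10, 40]]
weights = [0.3, 0.7]
result = area_adjusted_accuracy(counts, weights)
for name, value in result.items():
    print(name, value)
```

Multiplying the area proportions by the total study area gives the area-adjusted class areas.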

In some cases, area-adjusted accuracy assessment is not applicable, e.g., when the classes of interest cannot be identified from available data sources. Common examples are crop types or tree species. In these cases, you have two options:

  • Split your reference (e.g., field) data into training and validation datasets. Make sure to make the two datasets as independent as possible, e.g., by introducing a minimum distance criterion, or by avoiding training and validation samples from the same polygon.

  • Rely on cross-validation, e.g., k-fold cross-validation. This could be beneficial if, e.g., the number of reference datapoints is too small to allow for splitting.
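
Both options can be combined: the sketch below builds k folds such that samples from the same polygon never end up in both training and validation. The `grouped_k_fold` helper and the polygon IDs are hypothetical; a minimum distance criterion would need additional spatial logic that is not shown here.

```python
import random


def grouped_k_fold(polygon_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) so that no polygon is split across both sets."""
    groups = sorted(set(polygon_ids))
    if k > len(groups):
        raise ValueError("k must not exceed the number of polygons")
    random.Random(seed).shuffle(groups)
    # Assign whole polygons to folds in round-robin order.
    fold_of = {g: i % k for i, g in enumerate(groups)}
    for fold in range(k):
        test = [i for i, g in enumerate(polygon_ids) if fold_of[g] == fold]
        train = [i for i, g in enumerate(polygon_ids) if fold_of[g] != fold]
        yield train, test


# Hypothetical reference samples: each belongs to one field polygon.
polygon_ids = ["A", "A", "B", "B", "B", "C", "D", "D", "E", "F"]
for train, test in grouped_k_fold(polygon_ids, k=3):
    print("train:", train, "test:", test)
```

Using a single fold as a hold-out set gives the simple training/validation split; using all folds gives k-fold cross-validation.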

Most importantly, be aware of which validation strategy is appropriate for your case and make sure you understand why.

3) Presentation

During the final session of Earth Observation, we will organize a mini-symposium. We kindly ask the presenting groups to upload their presentation in advance. Presentations must consist of at most 6 slides and include project background and study area, a comprehensive graph of the project workflow, three main results, as well as three discussion items. Please use the conference slide template for your presentation.

The schedule is very tight and presenters are urged to strictly stay within pre-defined time slots. For every group, an oral presentation of 15 minutes is scheduled. Presentations exceeding this time limit will be interrupted. The organizers further grant a maximum of 5 minutes for discussion.

By 12 July: Prepare your presentation in PowerPoint and PDF format and upload both as a *.zip file to Moodle.


Concerning the MAP

Your project work will result in a written homework in the style of a scientific article. The written work should follow the typical structure of a scientific paper (Abstract, Introduction, Data & Methods, Results, Discussion, and Conclusion). Feel free to screen online materials for scientific writing guidelines.

The work should contain a maximum of 20,000 characters (not counting spaces, references, and appendix). To allow for transparent grading, we kindly ask you to disentangle the individual contributions in a table stating each member's contribution to the respective part of the work (in %):

Component                       | Name 1 | Name 2 | Name 3 | Name 4
Abstract + Introduction         | XX %   | XX %   | XX %   | XX %
Methods + Results               | XX %   | XX %   | XX %   | XX %
Discussion + Conclusion         | XX %   | XX %   | XX %   | XX %
Analyses (Code / Documentation) | XX %   | XX %   | XX %   | XX %

The submission deadline for the MAP is 31 August 2021.



Copyright © 2020 Humboldt-Universität zu Berlin. Department of Geography.