Best-pixel compositing


Learning goals

  • Understand the fundamentals of best-pixel image compositing
  • Parameterize a compositing function and create your own composite from multiple Landsat images

Background

The availability of Landsat images varies by region. Even in areas with high observation density, such as North America, coverage can be reduced by clouds, and the quality of observations can be degraded by haze or sensor saturation.

Global availability of Landsat images. Source: Wulder et al. 2016; https://doi.org/10.1016/j.rse.2015.11.032

Using best-pixel compositing, we can integrate multiple observations (i.e., from multiple Landsat images) by combining the ‘best’ observations to create one consistent image composite. One of the first global Landsat composites was produced using the WELD compositing approach (Roy et al. 2010, https://doi.org/10.1016/j.rse.2009.08.011).

Monthly global WELD surface reflectance composite for June 2009. Source: https://worldview.earthdata.nasa.gov/

Pixel-based compositing allows us to create cloud-free, radiometrically and phenologically consistent image composites that are contiguous over large areas. A set of parameters is used to determine the observations best suited for analysis. These parameters include, for example, the distance to clouds in the image or the temporal proximity to a target day of the year (DOY). Defining scoring functions allows for flexible parameterization according to users’ needs and study area characteristics. Different parameters can also be weighted according to their relevance for a given application.

Creating one best-pixel composite from three images.


Best-pixel compositing in seven steps

1) Determine compositing parameters.

Common compositing parameters include the target DOY (e.g., June 15 +/- 50 days), the target year (e.g., 2015 +/- 1 year), and the distance of a given pixel to clouds (e.g., at least 100 pixels away from clouds for a full score). In the next steps, we will use these parameters to determine the ‘best’ pixel for our composite.

2) Calculate image suitability.

We will first calculate DOY and year suitability scores for each image. These scores range from 0 to 1 and are based on the image acquisition year and DOY.

Five images to be used for best-pixel compositing and their DOY and year offsets

We will use a Gaussian function to determine the DOY suitability of an observation. In the plot of the DOY scoring function below, we choose a parameterization that favors observations closest to the target DOY and treats observations more than 50 days from the target date as unsuitable. For example, a May 16, 2015 observation is 30 days from the target DOY and receives a suitability score of 0.19.

A linear function determines the suitability score for the year of observation. Again, observations close to the target year are assigned high values, while observations acquired five or more years from the target year are considered unsuitable. The resulting suitability score for the May 16, 2015 observation is 1.

Scoring functions for the DOY and year suitability scores
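To make the scoring concrete, here is a minimal R sketch of both image-level scoring functions. The helper names doy_score() and year_score() are ours; the formulas mirror those in the parametric_compositing() function provided in the assignment below (a Gaussian with a standard deviation of max_DOY_offset/3, and a linear year score).

library(lubridate)

# Gaussian DOY score: 1 at the target DOY, dropping towards 0 with distance
doy_score <- function(DOY, target_DOY, max_DOY_offset) {
  s <- exp(-0.5 * ((DOY - target_DOY) / (max_DOY_offset / 3))^2)
  s[abs(DOY - target_DOY) > max_DOY_offset] <- NA  # outside the valid window
  s
}

# Linear year score: 1 in the target year, 0 at max_year_offset years away
# (the provided function additionally handles the special case max_year_offset = 0)
year_score <- function(year, target_year, max_year_offset) {
  s <- 1 - abs(target_year - year) / max_year_offset
  s[s < 0] <- NA  # acquired too many years from the target year
  s
}

# Worked example from the text: May 16, 2015 observation, target June 15, 2015
doy_score(yday(ymd('20150516')), yday(ymd('20150615')), 50)  # 0.198 (~0.19)
year_score(2015, 2015, 5)                                    # 1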

3) Calculate pixel-level suitability.

Next, we calculate each pixel’s suitability score, again ranging from 0 to 1. The only pixel-level score we will consider here is the distance to clouds.

Distance to next cloud in pixels

To minimize the likelihood of pixels being affected by clouds, we favor observations further away from clouds. The linear cloud distance scoring function below assigns a suitability score of 1 as soon as the distance to the next cloud is equal to or greater than 100 pixels, and excludes observations closer than a minimum distance of 10 pixels. An observation with a distance of 60 pixels (approximately 1,800 m at 30 m resolution) to the next cloud therefore receives a cloud distance suitability score of (60 - 10) / (100 - 10) ≈ 0.56.

Linear function for the cloud suitability score
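The cloud distance score can be sketched the same way. The helper name cloud_score() is ours; a min_cloud_dist of 10 pixels (the example value used in the assignment further below) reproduces the 0.56 from the text.

cloud_score <- function(dist, min_cloud_dist, max_cloud_dist) {
  s <- (dist - min_cloud_dist) / (max_cloud_dist - min_cloud_dist)
  s[dist < min_cloud_dist] <- NA  # too close to a cloud: excluded
  s[s > 1] <- 1                   # at or beyond max_cloud_dist: full score
  s
}

cloud_score(60, 10, 100)  # (60 - 10) / (100 - 10) = 0.56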

4) Define which criteria are most important for your application.

We can weight the different parameters according to our needs by defining a weight (W) for each of the suitability scores, for instance by prioritizing the DOY score over the year score:

\[W_{DOY} = 0.5\] \[W_{year} = 0.2\] \[W_{CloudDist} = 0.3\]

5) Calculate a score for each pixel in each image.

The overall observation score is the weighted sum of the individual suitability scores (S):

\[score = S_{DOY} * W_{DOY} + S_{year} * W_{year} + S_{CloudDist} * W_{CloudDist}\]

Using the values from the example above yields:
\[score = 0.19 * 0.50 + 1.00 * 0.20 + 0.56 * 0.30\] \[score = 0.46\]
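In R, this weighted sum reduces to one line (the vectors S and W are our illustrative shorthand):

S <- c(DOY = 0.19, year = 1.00, cloud_dist = 0.56)  # suitability scores
W <- c(DOY = 0.50, year = 0.20, cloud_dist = 0.30)  # weights, summing to 1
sum(S * W)  # 0.463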

Resulting suitability scores for each image

6) Use the best observation (i.e., the one with the highest score) for each pixel to create the final composite.
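At pixel level, this step is a per-pixel argmax across the per-image score layers. In the raster package this is which.max(), exactly as used in the compositing function below; here is a toy sketch with made-up score values:

library(raster)

# Three 2x2 score layers, one per candidate image (illustrative values)
scores <- stack(raster(matrix(c(0.2, 0.5, 0.1, 0.9), 2)),
                raster(matrix(c(0.6, 0.4, 0.3, 0.2), 2)),
                raster(matrix(c(0.1, 0.8, 0.7, 0.5), 2)))

select <- which.max(scores)  # per pixel: index of the highest-scoring image
as.matrix(select)            # 2, 3, 3, 1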

7) Evaluate the results by checking for seasonal/annual consistency and cloud distance (and re-iterate).

Final Landsat composite and Pixel Based Composite (PBC) information


Assignment

The goal of this week’s assignment is to combine several images into best-pixel composites using a parametric compositing function. We provide the data for this assignment in our repository. After unpacking, you will find the following folders:

  • sr_data: a total of 43 cloud-masked bottom-of-atmosphere (BOA) image chips for our study region, acquired in various seasons and years and with different degrees of cloud cover.
  • cloud_dist: data on the distance (in pixels) to the closest cloud for each of the BOA images.
  • fmask: the cloud masks used to mask the BOA images and to derive the cloud distance layers.

Today we will be working with pre-defined scripts. We provide both a script containing the compositing function and a main script which sources and executes the compositing function. Moving functions, especially complex ones, into separate files improves the readability of scripts and allows the same functions to be reused in different scripts. Check the help for source() if you are not familiar with sourcing R code in your script. You can find both scripts below. Please copy each into an empty R script and save it to disk.
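For example, if you save the function script as parametric_compositing.R (the file name used in assignment 3 below), loading it looks like this:

source('parametric_compositing.R')  # defines parametric_compositing() in your session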

The parametric compositing function:
#############################################################################
# MSc Earth Observation Exercise 4
# Function for creating cloud-free composites of multiple Landsat images
# Requires an input data.frame (here img_list) and eight compositing 
# parameters. Please see exercise sheet for further details.
#############################################################################

#############################################################################
# Loading required packages here...
library(raster)
library(lubridate)
library(ggplot2)

# Change raster options to store large rasters in temp files on disk
# Only use if function fails due to memory issues!
# rasterOptions(maxmemory = 1e12)

#############################################################################
# Function definition starts here
parametric_compositing <- function(img_list, target_date, 
                                   W_DOY, W_year, W_cloud_dist, 
                                   max_DOY_offset, max_year_offset, 
                                   min_cloud_dist, max_cloud_dist) {
  #...
  tic <- Sys.time()
  print(paste('Start of compositing process: ', tic))
  print(paste('Target date: ', target_date))
  
  # Extract target DOY and year from target_date
  target_DOY <- yday(target_date)
  target_year <- year(target_date)
  
  #...
  if(sum(W_DOY, W_year, W_cloud_dist)!=1) { stop('Error: something wrong.') }
  
  #############################################################################
  # Calculate the scores for the DOY, year, and cloud distance criteria
  print('Calculating compositing scores')
  
  ### DOY
  #...
  img_list$DOY_score <- exp(-0.5 * ((img_list$DOY - target_DOY) / (max_DOY_offset/3))^2)
  
  # Remove DOYs outside valid range
  img_list$DOY_score[abs(img_list$DOY-target_DOY)>max_DOY_offset] <- NA
  
  # Create data frame with daily score values 
  score_fct <- data.frame('DOY' = c(1:365),
                          'offset' = abs(target_DOY - c(1:365)),
                          'score' = exp(-0.5 * ((c(1:365) - target_DOY) / (max_DOY_offset/3))^2))
  
  ### Year
  #...
  img_list$year_score <- 1-(abs(target_year - img_list$year) / max_year_offset)
  
  if (max_year_offset == 0) {img_list$year_score[img_list$year==target_year] <- 1}
  img_list$year_score[img_list$year_score < 0] <- NA
  
  # Visualize DOY scoring function and available images
  p <- ggplot(score_fct, aes(x=DOY, y=score)) +
          geom_line() +
          geom_point(data=img_list, aes(x=DOY, y=DOY_score, color=as.factor(year))) +
          scale_y_continuous('Score') +
          scale_x_continuous('DOY') +
          scale_color_discrete('Acquisition years') +
          theme_bw()
  print(p)  
  
  # Get candidate images within max_DOY_offset and max_year_offset
  ix <- which(!is.na(img_list$DOY_score) & !is.na(img_list$year_score))
  if (length(ix)>1) { print(paste(length(ix),  'candidate images selected, calculating scores.')) }
  if (length(ix)<2) { stop('Another error because something is wrong.') }
  
  # Stack cloud distance layers of candidate images and reclassify 
  # values < min_cloud_dist to NA, and values > max_cloud_dist to max_cloud_dist
  print('Reclassifying cloud distances')
  cloud_dist <- stack(as.character(img_list$cloud_dist_files[ix]))
  cloud_dist <- reclassify(cloud_dist, rcl=c(0, min_cloud_dist, NA), right=NA, datatype='INT2S')
  cloud_dist <- reclassify(cloud_dist, rcl=c(max_cloud_dist, sqrt(nrow(cloud_dist)^2 + ncol(cloud_dist)^2), max_cloud_dist), right=NA, datatype='INT2S')
  
  #...
  cloud_score <- (cloud_dist - min_cloud_dist) / (max_cloud_dist - min_cloud_dist)
  
  #...
  obs_score <- img_list$DOY_score[ix] * W_DOY + img_list$year_score[ix] * W_year + cloud_score * W_cloud_dist
  
  #...
  select <- which.max(obs_score)
  
  #...
  candidates <- unique(select)
  
  #############################################################################
  # Fill composite image with pixels from the candidate images
  for (i in candidates){
    
    #...
    fill_image <- brick(as.character(img_list$image_files[ix[i]]), datatype='INT2S')
    
    #...
    if (i == min(candidates)) { 
      composite <- brick(fill_image, values=FALSE) 
      dataType(composite) <- 'INT2S'
      values(composite) <- 0
    }
    
    print(paste0('Filling raster with acquisition from date ', img_list$date[ix[i]]))
    fill_image.masked <- mask(fill_image, select, maskvalue=i, inverse=T, updatevalue=0, datatype='INT2S')
    fill_image.masked[is.na(fill_image.masked)] <- 0
    composite <- composite + fill_image.masked
    
  }
  
  #...
  composite_na <- mask(composite, select, maskvalue=NA, datatype='INT2S')
  
  #############################################################################
  #...
  print(paste('Remaining NAs: ', round(freq(composite_na[[1]], value=NA)/ncell(composite_na[[1]])*100, digits=3), ' %'))
  
  #...
  rcl_DOY <- matrix(ncol=2, data=c(candidates, img_list$DOY[ix[candidates]]))
  select_DOY <- reclassify(select, rcl_DOY, datatype = 'INT2S')
  
  #...
  rcl_year <- matrix(ncol=2, data=c(candidates, img_list$year[ix[candidates]]))
  select_year <- reclassify(select, rcl_year)
  
  #...
  output <- stack(composite_na, select_DOY, select_year)
  print(paste('End of compositing process: ', Sys.time()))
  
  #...
  return(output)
  
}
The script calling the compositing function:
#############################################################################
# MSc Earth Observation Assignment 4
# [Your Name]
#############################################################################

#############################################################################
library(rgdal)
library(raster)
library(lubridate)
library(ggplot2)
source('') # path to the parametric_compositing function script, including the file name

# In case you run into memory issues
# change raster options to store large rasters in temp files on disk
# rasterOptions(maxmemory = 1e12)

######## Define the folder that contains your data...
data.path <- 'course.dir/S04/data/'

#############################################################################
# 1)
#############################################################################

sr <- list.files(paste0(data.path, 'sr_data'), pattern='.tif$', full.names=T, recursive=F)
fmask <- list.files(paste0(data.path, 'fmask'), pattern='.tif$', full.names=T, recursive=F)
cd <- list.files(paste0(data.path, 'cloud_dist'), pattern='.tif$', full.names=T, recursive=F)

# Extract acquisition dates from the file names: the 7-character YYYYDOY
# date string starts right after the sensor + path/row prefix (e.g., LT05228082)
sta <- nchar(paste0(data.path,'sr_data/LT05228082')) + 1
end <- sta + 6

dates <- as.Date(substr(sr, sta, end), format='%Y%j')

sr.sorted <- sr[order(dates)]
cd.sorted <- cd[order(dates)]

img_list <- data.frame('image_files'=as.character(sr.sorted), 
                       'cloud_dist_files'=as.character(cd.sorted),
                       'date'=sort(dates), 
                       'DOY'=yday(sort(dates)), 
                       'year'=year(sort(dates)))

#############################################################################
# 2)
#############################################################################
target_date_1 <- ymd('YYYYMMDD')
target_date_2 <- ymd('YYYYMMDD')

W_DOY <- 0.0
W_year <- 0.0
W_cloud_dist <- 0.0
  
max_DOY_offset <- 0
max_year_offset <- 0

min_cloud_dist <- 0
max_cloud_dist <- 0

composite_1 <- parametric_compositing(img_list, target_date_1, 
                                       W_DOY, W_year, W_cloud_dist, 
                                       max_DOY_offset, max_year_offset, 
                                       min_cloud_dist, max_cloud_dist)

1) Parameterization

The function above allows you to produce cloud-free best-pixel composites for a pre-defined target DOY. It requires one input data.frame (see the code template for how to create it) and eight input parameters:

  • img_list: a data frame containing five variables for each of the Landsat images:

    $image_files: the full paths to the files in …sr_data.
    $cloud_dist_files: the full paths to the files in …cloud_dist.
    $date: the acquisition day in Date format (YYYY-MM-DD).
    $DOY: the acquisition day of the year.
    $year: the acquisition year.

  • target_date: the target date for your composite in Date format (YYYY-MM-DD)

  • W_DOY, W_year, W_cloud_dist: weights for the three available parameters DOY, year, and distance to clouds. Each weight must lie between 0 and 1, and the weights must sum to 1; the higher the weight, the higher the importance of the criterion.

  • max_DOY_offset, max_year_offset: thresholds for the maximum allowed differences between target DOY and acquisition DOY, as well as between target year and acquisition year. Images exceeding these thresholds (further away in time) will be ignored entirely. For instance, max_year_offset = 0 will not allow observations from any year other than the target year. By choosing these parameters, you determine whether you prefer seasonal consistency (observations close to the target DOY but from different years) over annual consistency (observations from the same year but potentially distant DOYs). Discuss the parametrization in your group; an illustrative parameter block follows after this list.

  • min_cloud_dist, max_cloud_dist: The minimum and maximum distance to clouds. min_cloud_dist = 10 will exclude all observations which are less than 10 pixels away from a cloud. The cloud scores are linearly scaled between the minimum (score = 0) and maximum cloud distance (score = 1). Pixels with distances above max_cloud_dist will receive a score of 1.
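For orientation, a parameterization mirroring the worked example from the background section could look as follows. These values are purely illustrative; choose your own based on your group's discussion:

target_date_1 <- ymd('20150615')  # target: June 15, 2015

W_DOY <- 0.5         # seasonal consistency is most important here
W_year <- 0.2
W_cloud_dist <- 0.3  # the three weights must sum to 1

max_DOY_offset <- 50   # accept observations within +/- 50 days of the target DOY
max_year_offset <- 1   # accept observations within +/- 1 year of the target year

min_cloud_dist <- 10   # exclude pixels closer than 10 px to the next cloud
max_cloud_dist <- 100  # full cloud score from 100 px distance onwards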


2) Defining target dates and parameters

Building on last week's insights from the training data collection, define two target days of the year (DOY) that capture contrasting phenological stages of the different forest types. These will later be used as the target_date parameters. Next, decide on the compositing parameters explained in the background section.


3) Exploring the script and adding documentation

  1. First, let's try to understand how the function operates. Open the parametric_compositing.R script and take the time to read through it in your group. Define the necessary parameters and objects (see above) and run the script line by line. Investigate the outcome of each line and discuss questions in your group. Consult the help pages of functions you don't know.

  2. Next, run the parametric_compositing() function with your parameters and write the result to disk. Include the target DOY in the filename. While the function executes, proceed with the next assignment.

  3. The developer did not spend sufficient time on documentation. Make the script more user-friendly by adding the missing comments (#...). Make sure your comments explain what happens in each step of the function, and why. Are there bugs or sections that you think could be improved?


4) Visual inspection and evaluation of results

Visually inspect the quality of your compositing results in QGIS. Look at the bands containing the DOY and year flags (bands 7 and 8). What worked well? What did not? How could the quality of the composites be improved? Re-iterate with different parameters if you wish.


5) Voluntary assignment: Improving the user-friendliness of the script

Make the compositing function more user-friendly. Insert a few plot and print commands to enable users to follow the progress of the compositing while the function is running.

For instance, print() how many images were used for the final composite, their acquisition dates, etc. You might also want to plot() the composited image after each iteration. You could add further status messages telling the user how long individual steps took.

Don't forget to save the script and re-run the source() command in your main script to update the function after you make these changes.


6) Voluntary assignment: Second run for target date 2

Repeat the compositing for the second target DOY you specified in 2) and write the results to disk.


Reading materials

For the next session, please read the following paper:

Maxwell et al. (2018): Implementation of machine-learning classification in remote sensing: an applied review. International Journal of Remote Sensing 39(9), 2784-2817

In this review, Maxwell and colleagues outline the use of several machine learning classification algorithms for remote sensing applications. Please read the paper thoroughly, make sure you understand the underlying concepts, and note questions and points to discuss in our next session. Feel free to omit the details of ANNs and boosted DTs. Try to answer the following questions:

  • What is the motivation for this article?
  • What are the key differences between the presented classification algorithms from a user perspective?
  • What do the following terms refer to: “overfitting”, “imbalanced data”, “ensemble classifiers”, “parameter optimization”?


Copyright © 2020 Humboldt-Universität zu Berlin. Department of Geography.