Estimating fractional cover


Learning goals

  • Understand the basic concept of linear spectral unmixing and generation of synthetic training data
  • Use support vector regression (SVR) models to estimate fractional cover of photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV) and soil for agricultural fields in northeastern Germany.
  • Evaluate fractional cover estimates from Sentinel-2 with reference fractional cover

Background

The mixed pixel problem

In this course, you have already learned how to make use of multispectral data to generate land cover maps. However, you may have noticed that often a pixel contains more than one cover type. This happens when the land cover is heterogeneous on the sub-pixel scale. Classification approaches assign discrete labels to a pixel and can thus be of limited use for estimating the true distribution of a surface material. In many cases (but very much depending on the training dataset), the dominant class label will be assigned by the classification model. We thereby underestimate the proportion(s) of the minority class(es), while overestimating the proportion of the dominant class.

A good example is urban areas, where e.g. rooftops, streets and single trees fall within a single Sentinel-2 or Landsat pixel. On agricultural fields, pixels often contain mixtures of photosynthetically active vegetation, dead plant material (e.g. litter or standing dead biomass) and open soil. As cover fractions on croplands vary over time, we can use fractional cover to better understand vegetation changes in response to, e.g., land management and weather extremes.

Figure by Ali et al. (2016), DOI: 10.1093/jpe/rtw005

With fraction mapping we attempt to estimate the fraction of each cover class within a pixel. This can be done either with spectral mixture analysis or with regression approaches.


Spectral unmixing with regression models

Instead of assigning discrete class labels (as with classification methods previously explored), regression models predict a continuous, ordered target variable (e.g. fractional cover) based on a set of predictor variables (e.g. surface reflectance). We will therefore use regression-based unmixing in this assignment to model fractional vegetation and soil cover in an agricultural system. A variety of machine learning models are suitable for this task, for example, Random Forest Regression, Support Vector Regression or Kernel Ridge Regression. We will focus on the use of Support Vector Regression in this assignment.


Support Vector Regression

Support Vector Regression (SVR) can be used to solve regression problems efficiently. Rather than using all training samples, a fitted SVR model relies only on a subset of the training samples, the ‘support vectors’. Unlike Support Vector Classification, SVR ignores samples that lie within an error margin of \(\epsilon\) around the regression function, because the predicted values of these samples are already close to their actual values. We can think of the aim of an SVR as fitting as many training data points as possible inside this \(\epsilon\)-insensitive “tube”; the support vectors are the samples on or outside its boundary. Deviations beyond \(\epsilon\) are captured by slack variables \(\zeta\), and the constant \(C\) controls the trade-off between the flatness of the model function and the degree to which such deviations are tolerated (they are penalized linearly in the figure below).

Figure by Smola and Schölkopf (2003): A Tutorial on Support Vector Regression

Like Support Vector Classification, SVR can solve non-linear problems by implicitly transforming the data into a higher-dimensional feature space (the ‘kernel trick’). This makes it a very flexible regression method.
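To make \(\epsilon\), \(C\) and the kernel choice tangible, here is a minimal toy sketch using the e1071 package (our own illustration with synthetic data, not part of the assignment):

library(e1071)

# Toy data: a noisy sine curve
set.seed(42)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, mean = 0, sd = 0.2)

# The radial (RBF) kernel applies the kernel trick to this non-linear relation;
# epsilon sets the width of the insensitive tube, cost corresponds to C
svr <- svm(y ~ x, kernel = "radial", epsilon = 0.1, cost = 1)

plot(x, y, col = "grey")
lines(x, predict(svr, data.frame(x = x)), lwd = 2)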


Synthetic training data

We need representative data to train the regression models. Obtaining representative continuous training data for several cover types is often expensive and sometimes not feasible at all for the time and area of interest. Instead of using, e.g., training data collected in the field, we can therefore create synthetic training data. Under the linear mixing assumption of spectral unmixing, the proportion of a class’s spectral signature in a mixed pixel corresponds to its cover fraction on the ground. We can therefore create the spectral signature of any cover fraction between 0 and 100% from the pure (i.e. 100%) spectral signature of the class. These pure spectra are often called “endmembers”. Endmembers can be spectral measurements collected in the field or the laboratory and compiled in a spectral library (e.g. the USGS HRSL), or they can be extracted from the satellite data directly (image endmembers).
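Formally, the linear mixing model expresses a mixed spectrum as the fraction-weighted sum of the endmember spectra (a standard formulation, matching the weighting used in the assignment below):

\[ x = \sum_{k=1}^{K} f_k e_k + \varepsilon \quad \text{with} \quad \sum_{k=1}^{K} f_k = 1, \; f_k \geq 0 \]

where \(x\) is the observed mixed spectrum, \(e_k\) is the pure spectrum of endmember \(k\), \(f_k\) is its cover fraction and \(\varepsilon\) is a residual term.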

Example spectral libraries from multispectral and hyperspectral images for photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV), soil and shade.

By assigning weights between 0 and 1 (i.e. 0-100% cover) and combining the weighted pure spectra linearly, training samples for the regression models can be generated:

Linear mixtures of the two Sentinel-2 library spectra PV (target class) and NPV with randomly assigned weights under the constraint of summing to 1.


Evaluation of quantitative model predictions

To evaluate the results of the regression model, we need independent validation data. There are a variety of sources and methods for obtaining validation data for fractional cover. Some examples are:

  • field data collection (e.g. point sampling)
  • visual analysis of high resolution data (e.g. Google Earth)
  • image segmentation of, e.g., datasets from Unmanned Aerial Vehicles (UAV)
  • regression-based unmixing of datasets with sufficient spatial and spectral resolution (e.g. hyperspectral images)

For model evaluation, it is often helpful to plot the observed and predicted values in the form of a scatterplot. Ideally, we expect a perfect linear relation between observed and predicted values. A simple linear regression of the observed on the predicted values therefore provides the information needed to evaluate their agreement.

Several measures of agreement can be calculated from predicted and observed data:

Regression slope and intercept

\[ y_i = \beta_0 + \beta_1\hat{y}_i + \epsilon_i \] where \(y_i\) is the observed value, \(\hat{y}_i\) is the predicted value, \(\beta_0\) is the intercept, \(\beta_1\) is the slope and \(\epsilon_i\) is the random error term.

  • for the slope, \(\beta_1\), a value close to 1 indicates that observed and predicted values vary consistently within their range
  • the intercept, \(\beta_0\), can only be interpreted as the bias when the slope is 1.

Coefficient of determination

  • describes the percentage of the variance in the observed values explained by the variance of the predicted values
  • \(R^2\) is also the square of Pearson’s correlation coefficient \(r\) in the case of simple linear regression with one predictor. \(R^2\) is easier to interpret than Pearson’s correlation coefficient: e.g., a model with \(R^2 = 0.8\) explains twice as much variance as one with \(R^2 = 0.4\)

\[ R^2 = 1 - \frac{ \sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \overline{y})^2} \] where \(y_i\) is the observed value, \(\hat{y}_i\) is the predicted value and \(\overline{y}\) is the mean of the observed values. The numerator in this formula is the sum of squares residual (SSR) and the denominator is the sum of squares total (SST).

Mean Absolute Error

  • describes the average error of the predicted values without considering the error direction
  • modeling unit is preserved

\[ MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i| \]

Root Mean Square Error

  • is the mean deviation of predicted values with respect to the observed values
  • modeling unit is preserved
  • by squaring the errors, RMSE penalizes large deviations more strongly than the MAE, i.e. it is always greater than or equal to the MAE (equal only when all errors have the same magnitude)

\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n(y_i - \hat{y}_i)^2}\]

Prediction bias

  • is the systematic error of the predicted values with respect to the observed data

\[ Bias = \frac{1}{n} \sum_{i=1}^n y_i - \frac{1}{n} \sum_{i=1}^n \hat{y}_i\]
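The measures above translate directly into a few lines of R. A sketch with made-up example vectors (obs and pred are placeholder names):

# Made-up observed and predicted fractional cover values
obs  <- c(0.10, 0.35, 0.50, 0.75, 0.90)
pred <- c(0.15, 0.30, 0.55, 0.70, 0.95)

m <- lm(obs ~ pred)                  # observed ~ predicted (see the note below)
coef(m)                              # intercept (beta_0) and slope (beta_1)
summary(m)$r.squared                 # coefficient of determination R^2

mae  <- mean(abs(obs - pred))        # Mean Absolute Error
rmse <- sqrt(mean((obs - pred)^2))   # Root Mean Square Error
bias <- mean(obs) - mean(pred)       # prediction bias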

Note: When creating a scatterplot of observed and predicted data and evaluating the linear regression function, the observed values correspond to the y-axis and the predicted values to the x-axis. Interchanging these positions will lead to incorrect slope and intercept values.


Assignment

In this assignment, you will learn how you can use SynthMix and Support Vector Regression to obtain fractional vegetation and soil cover for crop and grassland fields in northeastern Germany from 10m Sentinel-2 data. You will also learn how to quantitatively evaluate your fractional cover estimates with a reference dataset.

Here you can download the data needed for completing the assignment.

The zip-file contains these files:

  • Sentinel-2 image, BOA reflectance, cloud-masked, date of acquisition: 26.07.2019
  • Spectral library with pure spectra of PV, NPV and soil from the Sentinel-2 image

Please install and load the packages needed for this assignment. We use three packages from the tidyverse (ggplot2, tidyr and dplyr). These packages are very useful for handling and manipulating your data in R. If you are interested, you can find a very short overview of the capabilities of the tidyverse packages here.

pck_list <- c("assertthat","ggplot2", "tidyr", "dplyr","raster", "e1071", "rgdal", "viridisLite")
lapply(pck_list, require, character.only = TRUE)

1) Linear spectral mixtures

  • Load the spectral library containing the three class spectra of PV, NPV and soil which we already extracted from the Sentinel-2 image on 26th July 2019. BOA reflectance was scaled by 10000.
  • Visualize the spectral signatures of PV, NPV, soil and shade. For a better visualization, you might use the central wavelengths of the Sentinel-2 bands (find them here).
  • Comment in your script: How do the spectra differ between the classes and why?

We consider these spectra as representative of the three target classes of PV, NPV, and soil and add a shade spectrum to account for variation in brightness caused by, e.g. high vegetation.
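A minimal sketch of the loading and plotting steps (the file name and CSV format are assumptions; adapt them to the file in the zip archive):

# A sketch, assuming the library is a CSV with one column per class plus
# band number and central wavelength (file name hypothetical)
sli <- read.csv("sli_s2.csv")   # columns: band, cwl, PV, NPV, soil, shade

# Long format: one row per band and class for plotting
sli_long <- pivot_longer(sli, -c(band, cwl))

ggplot(sli_long) +
  geom_line(aes(x = cwl, y = value, color = name)) +
  labs(x = "Wavelength (nm)", y = "Reflectance (scaled by 10,000)") +
  theme_bw()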

# Let's now model a "mixed pixel".
# We can simply do this by defining proportions of the different surface components.
fraction_PV <- 0.1
fraction_NPV <- 0.8
fraction_soil <- 0.05
fraction_shade <- 0.05

# Do we violate the assumption that all surface components represent 100% of the surface area?
# (all.equal avoids an exact floating-point comparison)
if (!isTRUE(all.equal(fraction_PV + fraction_NPV + fraction_soil + fraction_shade, 1))) print("Fractions don't sum to 1.")

# Create a linear mixture of the endmember spectra, based on the defined proportions.
model_spectrum <- fraction_PV * sli$PV + 
                  fraction_NPV * sli$NPV +
                  fraction_soil * sli$soil + 
                  fraction_shade * sli$shade

# We could simulate imperfect measurements by adding random noise
# (one value per band; sd = 200 corresponds to 2% reflectance on the 0-10,000 scale).
noise <- rnorm(nrow(sli), mean = 0, sd = 200)

# Append the modeled spectrum to the endmembers data.frame
sli$model_spectrum <- model_spectrum + noise

# Convert the spectra into long format for plotting with ggplot2
sli_vis <- pivot_longer(sli, -c(band, cwl)) 

# Visualize the modeled spectrum in comparison to the endmember spectra
ggplot(sli_vis) + 
  geom_line(aes(x=cwl, y=value, color=name, linetype=name))+
  scale_color_manual(values=c("black", "steelblue", "darkgreen", "darkgrey","firebrick"), name="Spectrum")+
  scale_linetype_manual(values=c("dotdash", rep("solid", 4)), name="Spectrum")+
  scale_y_continuous(labels=seq(0,1,0.1), breaks=seq(0,10000,1000))+
  theme_bw()+
  theme(panel.grid=element_blank())+
  labs(x="Wavelength (nm)", y="Reflectance")

2) Generate synthetic training data using the spectral library

  • You can use the function below to generate synthetic training data. First, try to follow the steps of the function. It was written specifically for this assignment in R and does not offer full flexibility for creating synthetic mixtures (e.g. multiple library spectra per class, within-class mixtures, etc.). The full functionality is available via the EnMAP-Box plugin in QGIS and Python.

  • For each of the three target classes, use the synthMix function to create an individual data.frame with 1,000 linear mixtures. Use a 0.5/0.5 proportion of 2- and 3-endmember mixtures. The four pure library spectra are added automatically to the final set of spectra.

synthMix <- function(
           sli, # spectral library dataframe containing 
                # spectra (columns) and reflectances (rows)
           n_mix, # desired number of mixtures
           mix_complexity, # numbers of spectra within a mixture
                           # vector, same length as mix_likelihood
           mix_likelihood, # proportion of mixtures with respective complexity as
                           # vector, same length as mix_complexity, must sum to 1
           target_class, # mixtures will be calculated for this class
           other_classes){ # other classes which should be included in the mixtures

    assert_that(sum(mix_likelihood)==1) 

    # total number of classes
    em_total <- length(c(target_class, other_classes)) 

    # empty dataframe to store training data
    ds_training <- setNames(data.frame(matrix(ncol = nrow(sli) + 1, nrow = 0)), 
                            c(paste("B", c(1:nrow(sli)), sep = ""), "fraction")) 

    # iterator for generating each synthetic mixture 
    for (i in 1:n_mix) { 
      
      if (length(mix_likelihood) == 1) {
        n_em <- mix_complexity
      } else {
        # sample mixing complexity based on mixing likelihoods
        n_em <- sample(as.vector(mix_complexity),
                       size = 1,
                       prob = as.vector(mix_likelihood))
      }
      # select EMs which will be included in the mixture
      sample_em <- sample(other_classes, n_em - 1) 

      # get target EM and mixing EMs from SLI
      df <- sli[, c(target_class, sample_em)] 

      #randomly sample weight for target endmember including 0
      w1 <- (sample.int(10001, size = 1) - 1) 
      
      # calculate weight for remaining endmember 
      if (n_em == 2) {
        w2 <- 10000 - w1
        ws = c(w1, w2)
        
        assert_that(sum(ws)==10000)

      }

      # sample weights for two other endmembers
      if (n_em == 3) { 
        # remaining weight
        wr = (10000 - w1) 
        # randomly sample weight for second endmember including 0
        w2 = (sample.int(wr + 1, size = 1) - 1) 
        w3 = 10000 - (w1 + w2)
        ws = c(w1, w2, w3)
        
        w_sum <- sum(ws)

        assert_that(w_sum==10000)

      }
      # scale weights between 0 and 1
      ws <- ws/10000 
      # multiply spectra by their weight and 
      # calculate mixed spectrum as sum of weighted reflectances
      mixture = rowSums(t(t(df) * ws)) 

      # add weight (i.e. the cover fraction)
      train_mix <- c(mixture, ws[1]) 
      # turn  into dataframe
      train_mix <- as.data.frame(t(train_mix)) 
      colnames(train_mix) <- c(paste("B", c(1:nrow(sli)), sep = ""), "fraction") 

      # add mixed spectrum to training dataset
      ds_training <- rbind(ds_training, train_mix) 
      
    }
    # add original library spectra to the training data
    # vector of target class spectrum and cover fraction of 1
    em_target <- c(sli[, target_class], 1) 
    
    # all other class spectra get a cover value of 0 for the target class 
    em_rest <- data.frame((t(sli[, other_classes])), 
                          "fraction"=rep(0,length(other_classes))) 

    em_set <- rbind(em_target, em_rest)
    rownames(em_set) <- NULL
    colnames(em_set) <- c(paste("B", c(1:nrow(sli)), sep=""),"fraction")

    ds_training <- rbind(ds_training, em_set)

    return(ds_training)

  }
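The three calls could then look like this (a sketch; the train_* object names are our own):

# Keep only the four pure class spectra (drops band, cwl and the modeled spectrum)
sli_classes <- sli[, c("PV", "NPV", "soil", "shade")]

# 1,000 mixtures per target class, half 2-EM and half 3-EM mixtures
train_pv   <- synthMix(sli_classes, n_mix = 1000,
                       mix_complexity = c(2, 3), mix_likelihood = c(0.5, 0.5),
                       target_class = "PV", other_classes = c("NPV", "soil", "shade"))

train_npv  <- synthMix(sli_classes, n_mix = 1000,
                       mix_complexity = c(2, 3), mix_likelihood = c(0.5, 0.5),
                       target_class = "NPV", other_classes = c("PV", "soil", "shade"))

train_soil <- synthMix(sli_classes, n_mix = 1000,
                       mix_complexity = c(2, 3), mix_likelihood = c(0.5, 0.5),
                       target_class = "soil", other_classes = c("PV", "NPV", "shade"))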

3) Modeling fractional cover

  • Train a support vector regression model for the first target class using the synthetic training data. Try the svm() function from the e1071 package. The function infers the model type from the type of the data and sets several default parameters. What do the parameters mean?

  • Support Vector Machines are sensitive to the values of these parameters. To get good results, it is necessary to tune the model by testing different parameter combinations. Use the tune.svm() function to perform a grid search with 10-fold cross-validation for the parameters cost and gamma. It is advised to vary the parameters between 10^-3 and 10^3 in logarithmic steps (i.e. 0.001, 0.01, 0.1, etc.). The function internally determines the best parameter set using the mean squared error. Set the epsilon parameter to 0.001. A sketch of the full workflow follows after this list.

  • Extract the best model from the resulting object with $best.model. Use predict() to estimate the fractional cover of each class for the Sentinel-2 image. The model expects predictors named identically to the input data you used for training (i.e. ‘Bxx’). Rename the bands of the Sentinel-2 image (here s2) with names(s2) <- paste("B", seq(1,10), sep="").

  • Stack and export the fractional cover predictions with writeRaster. Visualize the fractional cover as RGB color combination (e.g. soil, PV, NPV) in QGIS. Remember to set the stretch from 0-1. Based on this, can you identify fields in different stages of the crop phenology?
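A sketch of this workflow for the PV class (the image file name is hypothetical; train_pv comes from step 2):

# Load the Sentinel-2 image and rename its bands to match the training data
s2 <- brick("sentinel2_20190726.tif")   # file name hypothetical
names(s2) <- paste("B", seq(1, 10), sep = "")

# Grid search with 10-fold cross-validation (the tune.svm default)
svr_tune <- tune.svm(fraction ~ ., data = train_pv,
                     gamma = 10^(-3:3), cost = 10^(-3:3),
                     epsilon = 0.001)
svr_pv <- svr_tune$best.model

# Pixel-wise prediction of PV cover for the whole image
pred_pv <- predict(s2, svr_pv)

# After repeating the above for NPV and soil:
# pred_stack <- stack(pred_pv, pred_npv, pred_soil)
# writeRaster(pred_stack, "fcover_s2.tif", overwrite = TRUE)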


4) Evaluation of fractional cover

  • Load the shapefile containing the validation points and the corresponding fractional cover values. Extract the values of the Sentinel-2 predictions.
  • The values of the validation data range between 0 and 1. Set negative predicted values to 0 and predicted values above 1 to 1 (see the sketch at the end of this section).
  • Create a scatterplot of predicted and observed fractional cover values for each class.
  • Calculate \(R^2\), \(MAE\), \(RMSE\) and slope and intercept of the simple linear regression and add them to your plot. How do they differ between PV, NPV and soil?
# You can use this function to calculate the simple linear regression coefficients
# of observed and predicted data and annotate them in your plots
lm_eqn <- function(pred_x, obs_y, df){
  m <- lm(obs_y ~ pred_x, df)
  eq <- substitute(y == a + b * x,
                   list(a = format(as.numeric(round(coef(m)[1], 2))),
                        b = format(as.numeric(round(coef(m)[2], 2)))))
  as.character(as.expression(eq))
}

# For a better visualization, you can use this function to calculate the point
# density and display it as the color in your scatterplots:
get_density <- function(x, y,...) { # set x and y to predicted and observed values and n = 100
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}
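Putting the pieces together for one class, a sketch (obs_pv and pred_pv_vals are placeholder names for the extracted validation and prediction values):

# Clamp predictions to the valid range [0, 1]
pred_pv_vals <- pmin(pmax(pred_pv_vals, 0), 1)

df_val <- data.frame(pred = pred_pv_vals, obs = obs_pv)
df_val$density <- get_density(df_val$pred, df_val$obs, n = 100)

# Predicted values on the x-axis, observed values on the y-axis (see the note above)
ggplot(df_val, aes(x = pred, y = obs)) +
  geom_point(aes(color = density)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +  # 1:1 line
  annotate("text", x = 0.15, y = 0.95, parse = TRUE,
           label = lm_eqn(df_val$pred, df_val$obs, df_val)) +
  scale_color_viridis_c() +
  coord_equal(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Predicted PV cover", y = "Observed PV cover") +
  theme_bw()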


Copyright © 2020 Humboldt-Universität zu Berlin. Department of Geography.