Accuracy assessment and area estimation


Learning goals

  • Improve your understanding of area-adjusted accuracy assessment
  • Assess the map errors of your forest type classification
  • Calculate error-adjusted class area estimates

Background

Independent map accuracy assessment aims at getting an unbiased estimate of map accuracy. Generally, we identify a number of sample locations and compare the predicted class in the map with “true” class, which we determine using high quality reference information (e.g., very high resolution imagery or field surveys). The following section requires understanding of the basic concepts of map accuracy assessment, i.e. the confusion matrix, overall, producer’s, and user’s accuracy. Read more here if you don´t know what these are.

Forest type map with stratified random sample


Why is area-adjustment necessary?

Sample-based estimates of map accuracy are statistical estimates - we infer information about the total population (i.e. all pixels in the map) from a sub-population (i.e. the sample pixels). In order to get a statistically sound estimate of map accuracy, we need to pay attention to the way our sample locations are selected. Sample selection should be random and all pixels should have an inclusion probability greater than zero.

Typical sampling schemes include pure random, systematic random, and stratified random sampling. When is it useful to use stratified sampling over a systematic or pure random sample? Imagine a map forest loss in an area dominated by forest, where deforestation accounts for roughly 0.2% of the area. In order to get a sufficiently large sample (let´s assume 60 samples) for the deforestation class using pure random sampling, we would approximately need to sample 30,000 locations. Stratified random sample helps us to overcome this issue by precisely allocating the number of desired samples within each class (or region) of interest.

Illustration of pure random, systematic, and stratified random sampling designs. Source: gsp.humboldt.edu

Let´s bring this into the context of our study site. A map of our study region reveals imbalances in class extent, e.g., the deciduous forest class accounts for only 7% of the study area, while coniferous forests account for 40%.

Class name Proportion (\(w_i\))
Deciduous forest \(0.07\)
Mixed forest \(0.25\)
Coniferous forest \(0.40\)
Non-forest \(0.28\)

Stratified random sampling helps us to validate the map using a sufficient amount of samples for each for the four classes, let´s say 50. Below, you find a confusion matrix of the map validation. Rows here relate to the map classes \(i\) and columns indicate reference classes \(j\). Note that the sum of samples per class differs quite substantially between map (\(n_i\)) and reference (\(n_j\)).

Class name Deciduous forest Mixed forest Coniferous forest Non-forest \(n_i\)
Deciduous forest \(39\) \(5\) \(1\) \(5\) \(50\)
Mixed forest \(15\) \(19\) \(10\) \(6\) \(50\)
Coniferous forest \(0\) \(7\) \(39\) \(4\) \(50\)
Non-forest \(11\) \(1\) \(4\) \(34\) \(50\)
\(n_j\) \(65\) \(32\) \(54\) \(49\) \(200\)

From this table, we can derive the unadjusted accuracy values, such as the overall accuracy as follows:

\(OA_{unadjusted} = \frac{(39 + 19 + 39 + 34)}{200} = 0.655 = 65.5\%\).

The above estimate of overall accuracy implicitly weighs each class according to the number of correctly classified samples in it. The more samples, the higher the influence on the overall accuracy score. The unadjusted overall accuracy is thus independent of the true area proportions of the map classes.

Furthermore, the unadjusted overall accuracy does not account for the sampling bias introduced by the stratified sampling. You can imagine this bias as varying sampling densities between the map classes. Let´s assume our map contains a total of 10,000 pixels, we have mapped 700 pixels as deciduous forest and 4,000 pixels as coniferous forest. By verifying 50 samples of each class, we assessed \(\frac{50}{700} = 0.071 = 7.1\%\) of the deciduous forest class pixels, while only \(\frac{50}{4,000} = 0.013 = 1.3\%\) of the coniferous forest class were investigated. The other way around, one sample of the deciduous forest class is considered representative for \(\frac{700}{50} = 14\) pixels, while one sample of the coniferous forest class represents \(\frac{4,000}{50} = 80\) pixels.

An assumption: The samples drawn from the deciduous forest stratum represent a much smaller area in our map (7%), as compared to the coniferous forest class (40%). In terms of overall accuracy, this means that the 39 correctly classified pixels of the deciduous forest class are probably less “relevant” than the 39 correctly classified pixels in the coniferous class. According to the numbers above, one sample of coniferous forest shold thus “weigh” \(\frac{80}{14} = 5.7\) times as much as a deciduous forest pixel. We have to account for this bias when estimating map accuracies from a stratified random sample.


How to do it?

This can be accomplished by populating the confusion matrix with probabilities of encountering the combination of map class \(i\) and reference class \(j\), expressed as:

\(p_{ij} = \frac{n_{ij}}{n_i} * w_i\)

\(n_{ij}\) = Number of pixels belonging to map class \(i\) and reference class \(j\)

\(n_i\) = Total number of pixels of map class \(i\)

\(w_i\) = Proportion of map class \(i\)

In the resulting matrix, each cell value represents the probability of occurrence of map class \(i\) and reference class \(j\).

Class name Deciduous forest Mixed forest Coniferous forest Non-forest
Deciduous forest \(\frac{39}{50}0.07\) \(\frac{5}{50}0.07\) \(\frac{1}{50}0.07\) \(\frac{5}{50}0.07\)
Mixed forest \(\frac{15}{50}0.25\) \(\frac{19}{50}0.25\) \(\frac{10}{50}0.25\) \(\frac{6}{50}0.25\)
Coniferous forest \(\frac{0}{50}0.40\) \(\frac{7}{50}0.40\) \(\frac{39}{50}0.40\) \(\frac{4}{50}0.40\)
Non-forest \(\frac{11}{50}0.28\) \(\frac{1}{50}0.28\) \(\frac{4}{50}0.28\) \(\frac{34}{50}0.28\)

Which yields the confusion matrix populated with probabilities:

Class name Deciduous forest Mixed forest Coniferous forest Non-forest \(\sum p_{i}\)
Deciduous forest \(0.055\) \(0.007\) \(0.001\) \(0.007\) \(0.070\)
Mixed forest \(0.075\) \(0.095\) \(0.050\) \(0.030\) \(0.250\)
Coniferous forest \(0.000\) \(0.056\) \(0.312\) \(0.032\) \(0.400\)
Non-forest \(0.062\) \(0.006\) \(0.022\) \(0.190\) \(0.280\)
\(\sum p_{j}\) \(0.192\) \(0.164\) \(0.385\) \(0.259\) \(1.000\)

This table sums to 1 and the row sums correspond to the map class proportions from above. We can now confirm our assumption from above. The 39 correct pixels of the deciduous forest class represent a smaller map proportion as compared to the 39 correctly classified pixels of the coniferous forest class (5.5% vs. 31.2%). The coniferous forest samples are therefore more relevant for the area-adjusted overall map accuracy than the deciduous forest samples. More precisely, they are \(\frac{0.312}{0.055} = 5.7\) times more important.

Based on this probabilistic confusion matrix, we can calculate the overall and class-wise accuracy scores as usual, while implicitly accounting for different class proportions and the sampling bias.

\(OA_{adjusted} = 0.055 + 0.095 + 0.312 + 0.190 = 0.652\)

We can notice that our overall accuracy did not change a lot after area-adjustment. Now let´s have a look at the user´s and producer´s accuracy of the deciduous forest class. First, we calculate the unadjusted accuracy scores:

\(UA_{unadjusted} = \frac{39}{50} = 0.78 = 78\%\)

\(PA_{unadjusted} = \frac{39}{65} = 0.60 = 60\%\)

Let´s do the same using the probability matrix:

\(UA_{adjusted} = \frac{0.055}{0.070} = 0.78 = 78\%\)

\(PA_{adjusted} = \frac{0.055}{0.192} = 0.29 = 29\%\)

We can see that the user´s accuracy stays the same after adjustment. This makes sense, as we do not incorporate the sampling bias when considering only samples drawn from one map stratum. For the producer´s accuracy, we combine information from samples across all strata. We hence automatically consider the different class proportions and sampling densities when calculating the producer´s accuracy from the confusion matrix populated with probabilities. The producer´s accuracy is relatively low after adjustment, owing to the fact that we likely omit a lot of deciduous forest, which we falsely classified as mixed forest or non-forest.


Error-adjusted area estimates

There are several methods to obtain class-wise area estimates. They can either be derived directly from the map by multiplying the map´s class proportions with the total size of the study area. Assuming a study area of \(S = 10,000 ha\), our area estimates for deciduous forest would thereby be:

\(area_{DF} = w_{i=DF} * S\)

\(area_{DF} = 0.070 * 10,000 ha = 700 ha\)

However, we already know that the reference data at hand is of better quality as compared to our map. Therefore class-area should preferably be estimated from the reference data:

\(area_{DF} = \sum_{p_{j=DF}}^i * S\)

\(area_{DF} = 0.192 * 10,000 ha = 1,920 ha\)

So the unbiased area estimate for deciduous forest in our study region is 1,920 ha, as compared to the map-based estimate of 700 ha. Here again, the substantial omission of deciduous forest in our map was taken into account, which in turn largely increases the area estimate for this class.

Wrapping it up, area adjustment can make quite a substantial difference for overall accuracy and producer´s accuracy. User´s accuracy remains unaffected. Producing area estimates directly from the reference data is advised in order to produce more accurate estimations of land cover, land use, or land change.

All procedures which were only briefly described here and are explained in more detail in Olofsson et al. (2014). We recommend this read, as it is documenting the current state-of-the-art in regards to accuracy assessment in high detail. Another source of background information is the technical documentation of the AREA² tools for accuracy assessment. An R implementation of the area adjusted accuracy assessment is provided in the mapac package.


Assignment

In this assignment, you will perform an area-adjusted accuracy assessment of your forest type map and estimate error-adjusted area for each forest type class. In a first step, the accuracy scores will be investigated in an Excel spreadsheet.


1) Producing reference data

Load your forest type map in R. Create a stratified reference sample using spatSample(). We want 1) a stratified sample, 2) spatial points as output and 3) ignore unclassified (NA) regions (if existing). Screen the help page and specify the three function arguments accordingly.

    The next part differs if you work on your own OR if we are in class.
  • If you do this assignment on your own, we want to have a stratified equalized sample of 25 pixels per class. Specify the size parameter accordingly.
  • If we do this assignment together in class, create a stratified equalized sample of 5 pixels per class. We will join forces here, speed up your individual task, and merge our datasets to create a larger validation sample. Now, strictly speaking, we are ‘violating’ a truely independent sampling design for your specific map. Why?

Please also make sure not to include the classified values in your vector dataset. Write the points to disk using writeVector().

Load the point shapefile in QGIS and add a field refID of type Integer. Visit each point and determine the class label according to the high resolution imagery in Google Earth and your Landsat data. Install and use the Send2GE plugin in QGIS to identify the precise pixel location. Enter the class labels in the attribute table. Work efficiently through the points and save your changes regularly.

Class name refID
Deciduous forest 1
Mixed forest 2
Coniferous forest 3
Non-forest 4
    If we do this assignment together in class, and once your are finished, upload your validation .gpkg here.

2) Area-adjusted accuracy assessment

Assess the class proportions of your map in R. To do so, use the freq() function to get pixel counts. Use these for calculating class proportions (0-1). Remember to exclude NAs, so the proportions of your 4 classes sum up to 1.

Load the reference point shapefile in R and create a confusion matrix from the resulting data.frame using table(). Copy & paste the values from your confusion matrix as well as the class proportions into the Excel spreadsheet. Next, answer the following questions:

  1. Which class has the highest / lowest user´s accuracy?
  2. Which class has the highest / lowest producer´s accuracy?
  3. How does the overall accuracy differ after area-adjustment? Why?
  4. How do the map-based area estimates differ from those obtained using the reference data?

3) Knowledge transfer

Implement three basic components of the accuracy assessment in R.

  1. Generate the confusion matrix containing probabilities. In this matrix, each cell value represents the probability of occurrence on map class \(i\) and reference class \(j\):

\[p_{ij} = \frac{n_{ij}}{n_i} * w_i\]

\(n_{ij}\) = Number of pixels belonging to map class \(i\) and reference class \(j\)

\(n_i\) = Total number of pixels of map class \(i\)

\(w_i\) = Proportion of map class \(i\)

  1. Calculate overall accuracy and class-wise user´s and producer´s accuracy from the confusion matrix.

  2. Produce error-adjusted area estimates from the confusion matrix.

To do this, track the underlying formulas of the respective sections in the Excel sheet and transfer them to R. Use helper functions such as diag(), sum() and apply(). Avoid for loops.

Compare the results with the values in the Excel table. If they differ, something went wrong. Check again!



Reading materials

In the next session, we would like to discuss Okujeni et al. (2015).

In this paper, Akpona Okujeni and colleagues use synthetically mixed training data for regression-based estimation of urban land cover fractions from simulated EnMAP data. Please read the article and note questions and points to discuss. While reading, try to focus on the following questions:

  • What are the main methodological steps to map fractional cover as presented in the paper?
  • What are advantages and disadvantages of using synthetic mixtures compared to the use of “traditional” training data from e.g. in-situ data or visual interpretation for machine learning models?
  • Which fractional cover classes differ most between Landsat and EnMAP and why? Beyond the paper: Do you have an idea how to tackle the problem of higher uncertainties from Landsat fractional cover based on the content of the EO course?

Copyright © 2020 Humboldt-Universität zu Berlin. Department of Geography.