Error propagation • mapac

It is good practice to report uncertainties (e.g., standard errors) associated with land cover maps. As such, the mapac package reports standard errors for overall accuracy, and User’s and Producer’s accuracy of individual classifications. However, sometimes it may be desirable to estimate a change in accuracy or combine accuracy (e.g. F1-measure). If the errors that we want to combine are uncorrelated, we can estimate the uncertainties of the combined outcome with a little bit of calculus. If the uncertainty components are correlated, then it is not correct to use the simple error propagation method shown here. The combined uncertainty would then be too small or too large, depending on the sign of the correlation. Now, errors in spatial maps are rarely uncorrelated. However, for the purpose described here, and because of the challenges associated with more complex error propagation methods (e.g. covariance estimates are needed), the independence assumption may be a reasonable way to go forward.

Basic formula

Let’s assume, we want to estimate the uncertainty of $x$ , which itself is a function of $u$ and $v$ .

$x= f(u,v,...)$

Now, the variance of $x$ (denoted as $\sigma_x^2$ ), can be approximated from the variance of $u$ ( $\sigma_u^2$ ) and $v$ ( $\sigma_v^2$ ) using partial derivatives:

$\sigma_x^2 \simeq \sigma_u^2 \left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2 \left(\frac{\partial x}{\partial v}\right)^2$

Subtraction (and addition)

This example equally applies to addition. But let’s say we want to estimate the difference between two User’s accuracies, i.e., model #1 ( $u$ ) and model #2 ( $v$ ). In other words, we want to estimate the increase or decrease in User’s accuracy, denoted here as $x$ .

$x = u - v$

To estimate the standard error of $x$ , we need to calculate the partial derivatives, which are 1 and -1, respectively.

$\sigma_x^2 \simeq \sigma_u^2 (1)^2 + \sigma_v^2 (-1)^2$

which reduces to..

$\sigma_x^2 \simeq \sigma_u^2 + \sigma_v^2$

Because these formula apply to variances, we need to take the square root to convert them to standard errors $\sigma$ :

$\sigma_x \simeq \sqrt{\sigma_u^2 + \sigma_v^2}$

Division (and multiplication)

If $x$ is the quotient of $u$ and $v$ ,

$x = \frac{u}{v}$

then the partial derivatives are as follows:

$\sigma_x^2 \simeq \sigma_u^2 \left(\frac{1}{v}\right)^2 + \sigma_v^2 \left(\frac{-u}{v^2}\right)^2$

we can summarize this and divide both sides by $x^2= \left(\frac{u}{v}\right)^2$ :

$\frac{\sigma_x^2}{x^2} \simeq \frac{\sigma_u^2 \frac{1}{v^2}}{u^2/v^2} +\frac{\sigma_v^2 \frac{u^2}{{v^4}}}{u^2/v^2}$

After canceling out terms, we get:

$\frac{\sigma_x^2}{x^2} \simeq \frac{\sigma_u^2}{u^2} +\frac{\sigma_v^2}{v^2}$

again, taking the square root we obtain the general formula for multiplying and dividing standard errors. The big difference to the addition-subtraction example, is that it incorporates $u$ and $v$ . Hence, we sum the relative variances for those parameters.

$\sigma_x \simeq \sqrt{x^2 \left(\frac{\sigma_u^2}{u^2} +\frac{\sigma_v^2}{v^2} \right)}$

F1-score

A more complex example is the formula of the F-score:

$x = 2 \cdot \frac{u \cdot v }{u + v}$

where $x$ is the F-score and $u$ and $v$ are Users’s and Producer’s accuracy, respectively. As a reminder here the partial derivatives that we need to solve:

$\sigma_x^2 \simeq \sigma_u^2 \left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2 \left(\frac{\partial x}{\partial v}\right)^2$

Let’s solve the partial derivatives in R. The D() function takes an expression object and the character name of the variable that we want to derive.

f <- expression(2 * u * v / (u + v))
pdu <- D(f, 'u')
pdu
#> 2 * v/(u + v) - 2 * u * v/(u + v)^2

pdv <- D(f, 'v')
pdv
#> 2 * u/(u + v) - 2 * u * v/(u + v)^2

which is:

$\sigma_x^2 \simeq \sigma_u^2 \left( \frac{2v^2}{(u+v)^2} \right)^2 + \sigma_v^2 \left(\frac{2u^2}{(u+v)^2} \right)^2$

and further simplified:

$\sigma_x = \sqrt{4 \cdot \frac{\sigma_u^2 v^4 + \sigma_v^2 u^4}{(u + v)^4}}$

where $\sigma_x$ is the standard error of the F-score, $\sigma_u$ the standard error of the User’s accuracy, and $\sigma_v$ the standard error of the Producer’s accuracy.

library(mapac)

exdata <- aa_examples("stehman2014")
strata <- exdata[["stratum"]]
ref <- exdata[["reference"]]
map <- exdata[["map"]]

result <-  aa_stratified(strata, ref, map, 
                         h=exdata[["h"]], 
                         N_h=exdata[["N_h"]])

result$stats
#>   class        ua     ua_se        pa     pa_se        f1      f1_se
#> 1     A 0.7419355 0.1645627 0.6571429 0.1477318 0.6969697 0.11034620
#> 2     B 0.5744681 0.1248023 0.7941176 0.1165671 0.6666667 0.09354009
#> 3     C 0.5000000 0.2151657 0.3000000 0.1504438 0.3750000 0.13219833
#> 4     D 0.7000000 0.1527525 0.6363636 0.1623242 0.6666667 0.11284328

Let’s use the previous equation to calculate the F1 standard error:

f1_uncertainty <- function(ua, ua_se, pa, pa_se) {
  sqrt( 4*(ua_se^2*pa^4+pa_se^2*ua^4) / (ua+pa)^4 )
}

f1_uncertainty(result$stats$ua, result$stats$ua_se, 
               result$stats$pa, result$stats$pa_se)
#> [1] 0.11034620 0.09354009 0.13219833 0.11284328

References

Drosg, M. (2009). Dealing with Uncertainties: A Guide to Error Analysis. 2nd ed. Edition. Springer. ISBN 978-3-642-01384-3

McRoberts, R.E. (2014). Post-classification approaches to estimating change in forest area using remotely sensed auxiliary data. Remote Sensing of Environment, 151, 149-156