Error propagation
error_propagation.Rmd
It is good practice to report uncertainties (e.g., standard errors)
associated with land cover maps. As such, the mapac
package
reports standard errors for overall accuracy, and User’s and Producer’s
accuracy of individual classifications. However, sometimes it may be
desirable to estimate a change in accuracy or combine accuracy
(e.g. F1-measure). If the errors that we want to combine are
uncorrelated, we can estimate the uncertainties of the combined outcome
with a little bit of calculus. If the uncertainty components are
correlated, then it is not correct to use the simple error propagation
method shown here. The combined uncertainty would then be too small or
too large, depending on the sign of the correlation. Now, errors in
spatial maps are rarely uncorrelated. However, for the purpose described
here, and because of the challenges associated with more complex error
propagation methods (e.g. covariance estimates are needed), the
independence assumption may be a reasonable way to go forward.
Basic formula
Let’s assume, we want to estimate the uncertainty of , which itself is a function of and .
Now, the variance of (denoted as ), can be approximated from the variance of () and () using partial derivatives:
Subtraction (and addition)
This example equally applies to addition. But let’s say we want to estimate the difference between two User’s accuracies, i.e., model #1 () and model #2 (). In other words, we want to estimate the increase or decrease in User’s accuracy, denoted here as .
To estimate the standard error of , we need to calculate the partial derivatives, which are 1 and -1, respectively.
which reduces to..
Because these formula apply to variances, we need to take the square root to convert them to standard errors :
Division (and multiplication)
If is the quotient of and ,
then the partial derivatives are as follows:
we can summarize this and divide both sides by :
After canceling out terms, we get:
again, taking the square root we obtain the general formula for multiplying and dividing standard errors. The big difference to the addition-subtraction example, is that it incorporates and . Hence, we sum the relative variances for those parameters.
F1-score
A more complex example is the formula of the F-score:
where is the F-score and and are Users’s and Producer’s accuracy, respectively. As a reminder here the partial derivatives that we need to solve:
Let’s solve the partial derivatives in R. The D()
function takes an expression
object and the character name
of the variable that we want to derive.
f <- expression(2 * u * v / (u + v))
pdu <- D(f, 'u')
pdu
#> 2 * v/(u + v) - 2 * u * v/(u + v)^2
pdv <- D(f, 'v')
pdv
#> 2 * u/(u + v) - 2 * u * v/(u + v)^2
which is:
and further simplified:
where is the standard error of the F-score, the standard error of the User’s accuracy, and the standard error of the Producer’s accuracy.
library(mapac)
exdata <- aa_examples("stehman2014")
strata <- exdata[["stratum"]]
ref <- exdata[["reference"]]
map <- exdata[["map"]]
result <- aa_stratified(strata, ref, map,
h=exdata[["h"]],
N_h=exdata[["N_h"]])
result$stats
#> class ua ua_se pa pa_se f1 f1_se
#> 1 A 0.7419355 0.1645627 0.6571429 0.1477318 0.6969697 0.11034620
#> 2 B 0.5744681 0.1248023 0.7941176 0.1165671 0.6666667 0.09354009
#> 3 C 0.5000000 0.2151657 0.3000000 0.1504438 0.3750000 0.13219833
#> 4 D 0.7000000 0.1527525 0.6363636 0.1623242 0.6666667 0.11284328
Let’s use the previous equation to calculate the F1 standard error:
f1_uncertainty <- function(ua, ua_se, pa, pa_se) {
sqrt( 4*(ua_se^2*pa^4+pa_se^2*ua^4) / (ua+pa)^4 )
}
f1_uncertainty(result$stats$ua, result$stats$ua_se,
result$stats$pa, result$stats$pa_se)
#> [1] 0.11034620 0.09354009 0.13219833 0.11284328
References
Drosg, M. (2009). Dealing with Uncertainties: A Guide to Error Analysis. 2nd ed. Edition. Springer. ISBN 978-3-642-01384-3
McRoberts, R.E. (2014). Post-classification approaches to estimating change in forest area using remotely sensed auxiliary data. Remote Sensing of Environment, 151, 149-156