# Data quality

## Learning goals

• Interpret the Landsat Collection 1 quality bands

## Background

### Landsat data quality

Think about the following question and exchange with your team members: Which issues affect the data quality of optical satellite images?

How can you find out if a pixel is affected by any of these issues? Luckily, the USGS provides quality bands for the Landsat Collection 1 products.

If we look up the Landsat QA band website to find out what this band contains, we find the following statement:

“Each pixel in the QA band contains unsigned integers that represent bit-packed combinations of surface, atmospheric, and sensor conditions that can affect the overall usefulness of a given pixel.”

In the following, you will learn to understand what that means. Please read this section carefully and discuss with your team if necessary.

### The binary system

Do you know the binary numeral system (0 / FALSE or 1 / TRUE)?

The binary system represents numbers through sequences of bits. Each bit has a position and a state. The positions are numbered from 0 to the total number of bits, read from right to left. The state of a bit can be either, 0, or 1. Consider the meaning of 0 as “FALSE” or “NO”, 1 equals “TRUE” or “YES”.

Additionally, each bit position is assigned a value represented as 2^n, where n is the bit position. So for bit position 0, we have a value of 2^0 = 1, for bit position 1 we have a value of 2^1 = 2, and so on. When extracting values from the entire bit sequence, we can simply add together all bit positions which have a state of 1.

In the example above, five bit positions are given. We can see that different combinations of states and positions enable us to represent any integer number, limited only by the number of bits in the sequence. Following this example with 5 bits, we can for instance represent these numbers:

# 00001
0 * 2^4 + 0 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
## [1] 1
# 00111
0 * 2^4 + 0 * 2^3 + 1 * 2^2 + 1 * 2^1 + 1 * 2^0
## [1] 7
# 11111
1 * 2^4 + 1 * 2^3 + 1 * 2^2 + 1 * 2^1 + 1 * 2^0
## [1] 31

The more bits in a sequence we have, the higher the numbers we can generate. With 5 bits, we can represent the numbers 0 – 31 (i.e., 32 unique values), with 16 bits, we can represent the numbers 0 - 65,535 (i.e., 65,536 unique values).

Such a system can be used to assign digital numbers or reflectance values to a pixel in a raster dataset. Alternatively, bit positions can be assigned any meaning a user likes. For example, letters can be assigned based on their position in the alphabet, as seen with the encoded message on the Perseverance rover’s parachute during its Mars landing.

In the next section, we will see how bit positions in the Landsat Quality Assessment (QA) band can be used to determine a pixel’s quality based on a number of different criteria.

### Landsat QA band

The Landsat QA band is coded in 16 bit, which are arranged from right to left. Each bit has a specific meaning in terms of data quality, e.g., bit 4 (or the fifth from the right) means “cloud”. If bit 4 has the state 1 (or “YES”, or “TRUE”), there was a cloud detected in that pixel. If it has the state 0 (or “NO”, or “FALSE”), there were no clouds detected in that pixel.

In the Landsat QA band, there are additionally single and double bits. Single bits inform about one condition in a binary manner:

Single bits (bit 0, 1, and 4):

• 0 = “No” = The condition does not exist.
• 1 = “Yes” = The condition does exist.

Double bits can represent conditions in more detail. Some double bits (bits 5+6, 7+8, 9+10, 11+12, read from left to right) represent levels of confidence that a condition exists:

• 00 = “Not Determined” = Algorithm did not test for the condition.
• 01 = “Low” = Algorithm has low confidence that this condition exists (probability of 0-33%).
• 10 = “Medium” = Algorithm has medium confidence that this condition exists (probability of 34-66%).
• 11 = “High” = Algorithm has high confidence that this condition exists (probability of 67-100%).

The radiometric saturation bits (2-3), read from left to right, represent how many bands contain radiometric saturation:

• 00 - No bands contain saturation.
• 01 - 1-2 bands contain saturation.
• 10 - 3-4 bands contain saturation.
• 11 - 5 or more bands contain saturation.

In addition to the meaning assigned to each bit position, the entire bit sequence of the QA band can be read as an integer value just like any of the reflectance bands. For example, take a pixel where bit 4 has the state 1 and all other bits have the state 0. This indicates a cloudy pixel, but the resulting integer value of the entire bit sequence (0000000000010000) will be 2^4 = 16. If the pixel also has more than five saturated bands, bits 2-3 will read “11”. Thus the integer value of the new bit sequence (0000000000011100) will be 2^4 + 2^3 + 2^2 = 28. Following this logic, different combinations of true/false conditions will result in different integer values.

Please note that bit designations vary between sensors and data products. See for instance table 6.2 in the Landsat 8 Collection 1 Level-2 product guide. If you work with Landsat Collection 2 data in the future, also note that some of the QA bit designations have changed.

### De-coding quality band values

Let´s look at the integer value 2804 to illustrate how this works. R offers a good way of dealing with bit-packed information. The intToBits() function converts integer numbers into sequences of bits, whereas 32 double-digit bits are returned by default.

# Convert integer to bit sequence
intToBits(2804)
##  [1] 00 00 01 00 01 01 01 01 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00
# Convert the double-digit into single-digit bits
as.numeric(intToBits(2804))
##  [1] 0 0 1 0 1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# We´re looking at 16 bit data, so let´s deprecate the unused bits
as.numeric(intToBits(2804)[1:16])
##  [1] 0 0 1 0 1 1 1 1 0 1 0 1 0 0 0 0
# Bit sequences are read from right to left, so we need to reverse the order
rev(as.numeric(intToBits(2804)[1:16]))
##  [1] 0 0 0 0 1 0 1 0 1 1 1 1 0 1 0 0

Now we can compare with the Landsat 8 Collection 1 Level 1 QA band bit designation. What´s the pixel quality information of the integer value 2804?

Solution: We are looking at a cloud pixel with radiometric saturation affecting 1-2 bands.

## Assignment

### 1) Investigating the QA band

Let´s have a look at the Landsat Collection 1 Quality Band (BQA). First, load the BQA band with raster() and find the three most frequent values using freq().

As you can see, the band contains integer values. By decoding the integer values into 16 bit binary strings, we can read the quality information for each pixel. Use the intToBits() function to decode the three most frequent BQA values and decipher their meaning using the Landsat quality band documentation. As shown above, intToBits() returns 32 bits by default. Make sure you only look at bits 1 to 16. Also, keep in mind the right-to-left order when comparing the decoded bits with the table. You might want to use rev() to invert the order of the outputs by intToBits(). Note your findings as a comment in the script:

# What do the most frequent values mean? Decode each integer value into bits and
# describe their meaning here:

# Most frequent value:
# Second most frequent value:
# Third most frequent value: 

### 2) Creating a cloud mask

By converting integer values into binary bits, we can extract specific attributes from the BQA. Let´s to this in a systematic manner by defining a function, which yields TRUE for binary codes with bit 0 = 1:

# Define function to find fill values from Landsat BQA
fill_pixels <- function(x) {intToBits(x)[1] == T}

Next, use indexing and Boolean expressions to define new functions which return TRUE for:

1. high confidence clouds or high confidence cloud shadows or fill values
2. high and medium confidence clouds or high and medium confidence cloud shadows or fill values

Create a mask using the above functions in calc(). Plot the mask and check if clouds, cloud shadows, and fill values are labeled as 1 and clear observations as 0. Discuss and decide on the appropriate data type for binary masks and write them to disk.

Open both masks in QGIS, together with an RGB representation of the image. Which mask is more reliable in your opinion?

Next, load the BOA data as a stack (VIS, nIR, swIR) and use the mask() function to mask clouds, cloud shadows and fill values from the image. Use the mask which you found to be more accurate. Make sure to specify the maskvalue argument accordingly. Write the masked BOA stack to disk in the GTiff format.