In the last session you were introduced to raster data, heard about some basic properties and data formats, and learned how objects appear in true- and false-colour representations. Our overarching task remains. Yet before we can jump straight into mapping and analyzing the urban greenness of Berlin, we must first …
Last week you learned that raster data stored in image formats are
essentially a collection of pixels stored along rows and columns in
matrices or tables. But how do we deal with such data in R? The
terra
package offers useful classes and functions which
facilitate raster data handling in R. It allows us to access file
characteristics before loading data into memory, manipulate coordinate
reference systems and spatial extents. We can use it to perform raster
algebra, to combine raster and vector datasets (e.g. ESRI shapefiles),
or to convert raster files into matrices, which then become compatible
with base-R functions to manipulate data, do statistics, develop models,
and much more. The terra
package supersedes the older
raster
package developed by the same team of authors.
Because raster
has been around for a longer time,
dependencies with some packages may still exist. But development of
raster
will eventually be stopped at some point.
terra
’s classes and functions are well described in the package
documentation and on the package’s
webpage.
The most important classes within the package are:
SpatRaster
to store raster dataSpatVector
to store spatial vector data, i.e., points,
lines, polygons.SpatExtent
to store information on spatial extentsThere are a couple of things that the terra package does not provide or that are simply better done in a GUI like QGIS. For example, collecting training and validation data are preferably done in a GIS environment and processing large data volumes is preferably achieved in other languages than R.
terra
worldWith your R console opened in RStudio you can install and then load the terra package like any other package in R as follows:
# install the terra package
install.packages('terra')
# load the package to get access to the package's routines/functions in your environment
library(terra)
Once the library is loaded we can start reading geospatial data into
R. Let’s start with last week’s Landsat-8 image covering Berlin. We need
to provide the filepath to the image and then use terra’s
rast()
-function to read the file into a
SpatRaster
object.
# create a new string variable containing the path to your image file
path_landsat <- "data/Landsat-8/LC08_L1TP_193023_20190726_20200827_02_T1_int16.tif"
# read the file into raster object
img_landsat <- rast(path_landsat)
# print metadata to console
print(img_landsat)
## class : SpatRaster
## dimensions : 1840, 2171, 6 (nrow, ncol, nlyr)
## resolution : 30.00639, 30.00728 (x, y)
## extent : 360793.1, 425937, 5794470, 5849683 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633)
## source : LC08_L1TP_193023_20190726_20200827_02_T1_int16.tif
## names : BLU, GRN, RED, NIR, SW1, SW2
## min values : 639, 371, 227, 60, 23, 16
## max values : 5603, 5984, 6728, 7454, 11152, 8946
You can see that the print(img_landsat)
command prints a
bunch of image properties to the console, including min-max statistics
for each band.
Next, we can visualize our image in the plot window of RStudio using
the plot()
function:
plot(img_landsat)
By default, this creates a single plot for each band without contrast
enhancement and using a single-band pseudocolour
representation. If we would like to create a multi-band RGB composite,
we can use plotRGB
to achieve this.
Exercise 1
help(plotRGB)
or ?plotRGB
into the console to
learn how to use this function, i.e. the arguments of the
function. Hint: you need 5 arguments of the function and the
index, i.e. the position of each band along the z-dimension of
the image. Below you find the desired visual result.
plotRGB(img_landsat, r=6, g=4, b=3, stretch = "lin", axes=TRUE)
Recall our overarching goal to map the land cover of Berlin to assess the urban greenness of the city. By considering different data sources, a colleague of yours came to the conclusion to use Sentinel-2 data for the recent land cover map. The reasoning is that Sentinel-2 compared to Landsat-8 offers up to 10 meter spatial resolution (compared to the 30 meter Landsat data), allowing for capturing a more heterogenous mosaic of objects such as in an urban environment.
Exercise 2
The Sentinel-2 data is already available to you
FE1_data_S02-03 > Sentinel-2
. Let’s start with a visual
assessment in QGIS:
.tif
files of your choice (a window is
likely to pop-up, asking which coordinate transformation to apply. Leave
at default and click OK)In the following we will learn about 1) creating a multiband raster, 2) write/save data to disc, 3) choosing datatypes, 4) manipulating pixel values (raster algebra) and 5) reprojecting.
The Sentinel-2 data is currently stored in single band images for each band. Our goal is now to create a multiband image - like the Landsat image - that contains all the bands in a single file. This process of merging bands into a multiband image is frequently called raster stacking.
There are a few prerequesites to successfully merge single bands into one image. First, the data must share the same spatial resolution because pixels must match each other. Second, the spatial extent of the images must be identical because also the overall dimensions must also match each other.
Let us take a look how to create a raster stack in R. We will see -
as often with programming tasks - there are multiple ways of achieving
the same result. Analogous to the code from above we could create a
SpatRaster
object by providing the filepath of each band to
the rast()
function:
# Create a vector using c() containing the filepaths;
# note that we explicitly consider the order, i.e. we start with blue and
# end with shortwave-infrared 2 (index = shorter to longer wavelengths)
l_bands <- c(
'data/Sentinel-2/SEN2A_BOA_20190726_BLU_flt32.tif', 'data/Sentinel-2/SEN2A_BOA_20190726_GRN_flt32.tif',
'data/Sentinel-2/SEN2A_BOA_20190726_RED_flt32.tif', 'data/Sentinel-2/SEN2A_BOA_20190726_NIR_flt32.tif',
'data/Sentinel-2/SEN2A_BOA_20190726_SW1_flt32.tif', 'data/Sentinel-2/SEN2A_BOA_20190726_SW2_flt32.tif'
)
# provide vector of filenames to rast() function; check "?rast" for details
img_s2 <- rast(l_bands)
# print result to console
print(img_s2)
## class : SpatRaster
## dimensions : 4977, 6279, 6 (nrow, ncol, nlyr)
## resolution : 10, 10 (x, y)
## extent : 4524946, 4587736, 3248660, 3298430 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035)
## sources : SEN2A_BOA_20190726_BLU_flt32.tif
## SEN2A_BOA_20190726_GRN_flt32.tif
## SEN2A_BOA_20190726_RED_flt32.tif
## ... and 3 more source(s)
## names : SEN2A~flt32, SEN2A~flt32, SEN2A~flt32, SEN2A~flt32, SEN2A~flt32, SEN2A~flt32
## min values : ? , ? , ? , 0.0065, 0.0056, ?
## max values : ? , ? , ? , 0.9975, 0.9874, ?
l_bands
containing the
relative paths from the current working directory to the
individual files. Here we make sure the order of the bands matches the
order we would like them to be in the final image, i.e., starting with
shorter wavelengths to longer wavelengths or from blue to short-wave
infrared. Once the vector is defined we can immediately read the
single-band images into a multi-band
SpatRaster
by providing the vector l_bands
to
the rast()
function. Using help(rast)
specifies for its argument x
: filename (character),
missing, SpatRaster, SpatRasterDataset, SpatExtent, SpatVector, matrix,
array, list of SpatRaster objects. For other types it will be attempted
to create a SpatRaster via (‘as(x, “SpatRaster”)’
Now that we created a raster stack we would like to save it to disc,
since it is currently only stored in memory. We can do this using the
writeRaster
function.
Exercise 3
Take a look at the writeRaster
help page. There are two
mandatory arguments which the function expects. Try writing your raster
stack into your Sentinel-2 folder and choose a meaningful filename.
# provide the necessary arguments to save your image to disc
writeRaster()
Once your image is stored on disc, open it in QGIS. Select the same
visualization settings as for the Landsat image and compare pixel values
using the “Identify features” button at
the top panel and clicking somewhere into the map view. If you
right-click, select “Identify All” to inspect the values of both images.
According to the function’s help page we need at least two arguments:
x
and filename
, i.e. the image we would like
to save (x) and the associated filename(+path). We can use the same
basename as the individual files but swap the band naming to
STACK. In case the file already exists, and we would like to
overwrite it, we can specify overwrite = TRUE
.
# provide the necessary arguments to save your image to disc
writeRaster(x = img_s2, filename = "data/Sentinel-2/SEN2A_BOA_20190726_STACK_flt32.tif",
overwrite = TRUE)
We can also access the image matrix or single values directly in R. Recall the image properties from last session:
Below you find some terra
functions to access some of
these properties and particularily how to access individual bands or
pixels inside an image:
As you can see from the above figure, you can imagine the image as a
three-dimensional array in which each pixel has a xyz-coordinate or
index which we can access using square brackets []
on our
raster object. For example, to retrieve the values from the red band at
row = 1000 and column = 500 we would write:
# raster[[zdim / band]][ydim / row, xdim / column]
img_s2[[3]][1000, 500]
## SEN2A_BOA_20190726_RED_flt32
## 1 0.0369
img_landsat[[3]][1000, 500]
## RED
## 1 541
[[3]][1000, 500]
not point to the same place on
earth?
Note how the Landsat data is stored as whole numbers and Sentinel as floating point number. It is time to revisit the data types from last week.
Recall from just now that we quantized the radiometric resolution of a sensor in bits:
If the radiometric resolution is high, smaller and more subtle differences in reflected electromagnetic radiation can be differentiated by the sensor. The measured energy is then converted and stored as digital numbers, the pixel values in the imagery. The range of unique values that can be stored in each pixel is determined by the data type, thus the radiometric resolution greatly governs the needed data type to store the data. The unit in which the radiometric resolution and data type are measured is bits, i.e. short for binary digit.
The binary system
The binary system represents numbers
through sequences of bits (binary
digits). Each bit has a position and a
state. The positions are numbered from 0 to the total
number of bits, usually read from right to left. The state of an
individual bit can either be 0 (FALSE) or 1 (TRUE). From an electric
circuits perspective, electricity is either off (0) or on (1). Consider
the meaning of 0 as “FALSE” or “NO”, 1 equals “TRUE” or “YES”.
This is the very fundamental logic on which all software is currently
built upon. This is because we can represent all sort of concepts in
binary, including - of course - numbers and letters. A number in binary
is a number expressed in the base-2 numeral system (= 0
and 1, instead of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 in decimal). Just like in
the decimal system, however, if we want to represent quantities beyond
the amount of unique digits available (0-1 for binary, 0-9 for decimal),
we ‘flip over’ to an additional position, re-starting the count on the
previous position (in decimal this would be
1, 2, ..., 9, -> flip -> 10, 11, ..., 99 -> flip -> 100, 101, ..., 999 -> flip -> 1000, ...
).
Hence if we run out of symbols, we add a new position.
Let us image we start counting from 0 to 10 in binary:
-> 0
-> 1
- now we already run out of symbols, so for
the next count we add a position-> 10
-> 11
- again, we add a position for the next
count-> 100
-> 101
-> 110
-> 111
- again, we add a position for the next
count-> 1000
Since a binary system operates in base 2, each bit position is assigned a unique value defined by \(2^{n}\), where n is the bit position starting at 0. So for bit position 0, we have a value of \(2^{0} = 1\), for bit position 1 we have a value of \(2^{1} = 2\), for bit position 2 we have a value of \(2^{2} = 4\), and so on. When extracting values from the entire bit sequence, we can simply add together all bit positions which have a state of 1 (TRUE). The visualization of the described looks as follows:
In the example above, five bit positions are given, hence the quantization is 5-bit. We can see that different combinations of states and positions enable us to represent any integer number, limited only by the number of bits in the sequence.
The more bits in a sequence we have, the higher the numbers we can generate. With 5 bits, we can represent the numbers 0–31 (i.e., 32 unique values), with 16 bits, we can represent the numbers 0 - 65,535 (i.e., 65,536 unique values).
Common data types of digital remote sensing imagery
Data type (name) | Typical bit length | Value range | R datatype |
---|---|---|---|
byte / int8 | 1 byte = 8 bit | -128 to 127 (signed); 0 to 255 (unsigned) | 'INT1S' ; 'INT1U' |
int16 | 2 byte = 16 bit | -32768 to 32767 (signed); 0 to 65535 (unsigned) | 'INT2S' ; 'INT2U' |
float | 4 byte = 32 bit | -3,4e^38 to 3,4e^38, ‘single precision’; floating point number according to IEEE 754 | 'FLT4S' |
double | 8 byte = 64 bit | -Inf to +Inf, ‘double precision’ | 'FLT8S' |
The data types byte and int16 store whole numbers while float and double store floating-point values, i.e., values that have potential decimal places. For instance, if you wanted to store the number 3.14 in integer format, it would be in the potential range of values for both types, but the decimal positions would be truncated and therefore only a value of 3 would be stored.
Now you may ask why we would bother about choosing the datatype at all, if there are obvious choices such as float / double that can capture a large range of unique values and also allow for decimal storage? Well, a greater bit length (8bit vs. 32bit) also means we need to store more information on our data storage device, possibly without needing to, because the information could be stored more efficiently with a smaller bit length.
The
file size or memory needed to store an image can be
calculated as follows:
file size = number-of-bands \(\times\) number-of-columns \(\times\) numbers-of-rows \(\times\) bit-length
Consider what the above means for the Sentinel-2 data. Our image has the following dimensions:
dim(img_s2)
## [1] 4977 6279 6
The pixel values are currently stored as 32bit floating point numbers. The uncompressed filesize in megabyte is thus approx. equal to:
# n_pixels * bit width
n_pixels <- prod(dim(img_s2))
bytes <- 4 # 32 bit integer
print( paste0( "Size (Mb) 32bit: ", round( n_pixels * bytes / 1024**2, digits=2 ) ) )
## [1] "Size (Mb) 32bit: 715.27"
If we can somehow change the data as 16bit integer, the file size reduces to:
# n_pixels * bit width
n_pixels <- prod(dim(img_s2))
bytes <- 2 # 16 bit integer
print( paste0( "Size (Mb) 16bit: ", round( n_pixels * bytes / 1024**2, digits=2 ) ) )
## [1] "Size (Mb) 16bit: 357.63"
With growing amount of images, it is inevitably useful to think about datatypes and use tricks to store data more efficiently. Often we convert data from float (% reflectance between 0-1) to integer by simply multiplying by a constant value of e.g. 10,000, where a value of, for instance, 0.353 (35.5%) becomes 3530, allowing for an int16 data type. This way we can represent the decimal reflectance numbers as integer and save a lot of memory on our drives and improving computing performance.
The task to save disc space gives us a good reason to learn how to change pixel values of raster objects. Recall from the R Basics section that vector/array arithmetic works element-wise:
a_vector <- c(1, 2, 3, 4, 5)
print(a_vector * 10)
## [1] 10 20 30 40 50
You can add, subtract, multiply or divide a constant from every element in the array.
We can also use these operations on multiple arrays of the same size. For example multiplying two arrays together, the first element of the first array is multiplied by the first element of the second array, and so on.
When we apply an operation on our SpatRaster
, the
terra
package performs the matrix operations for us in the
back, meaning operations on SpatRaster
objects work
element-wise, too.
datatype
” in the table above showing “common data types of
digital remote sensing imagery”, and select the appropriate datatype
that is needed to save the data. Either index or plot your result to
check if it worked. Then, check ?writeRaster
and and write
your raster to disc with the appropriate datatype. Last, compare the two
raster stacks in your folder in terms of their filesize.
We know from the website and the “R Basics” section that similar to
vectors, the matrices containing the data in our SpatRaster
can be modified using simple arithmetic. We need an integer
compatible value range, thus move from our reflectance values
stored between 0 and 1 to whole numbers. From the table above we can
infer that we have two potential options here: 8bit integer
(INT1S
/INT1U
) and 16bit integer
(INT2S
/INT2U
). If we used INT1U
we could transfer our values to a range from 0 to 100. However, this
would disregard any sensitivity (radiometrically) beyond the second
decimal place. Since this is unreasonable for data measured originally
in 12bit (radiometric resolution of Sentinel-2), we opt for
INT2U
instead, thus transfer our values to a range from 0
to 10000. Whilst we are at it, we can also change the bandnames of the
raster using names()
.
# we can define a new variable of overwrite the existing
img_s2 <- img_s2 * 10000
# rename bands (not in task description)
names(img_s2) <- c("BLU", "GRN", "RED", "NIR", "SW1", "SW2")
# plot bands to check value range
plot(img_s2)
From help(writeRaster)
we can infer that there is a
datatype
argument allowing us to explicitly define the
datatype when writing to disc.
writeRaster(
x = img_s2,
filename = "data/Sentinel-2/SEN2A_BOA_20190726_stack_int16.tif",
datatype = "INT2U",
overwrite = TRUE
)
Our final toolset for today will deal with
reprojections and spatial subsetting.
Above you already had a look at berlin_10m_epsg32633.tif
delineating Berlin as our study area.
When you compare the Coordinate Reference System
(CRS) of the mask and Sentinel image, you notice they
are different. We thus need to reproject the Sentinel-2 image to match
the projection and grid of berlin_10m_epsg32633.tif
. The
following figure depicts a zoom-in comparing the 10x10m
target grid of berlin_10m_epsg32633.tif
shown in red, to
the 10x10m
grid of your Sentinel-2 image.
We would like to transform the Sentinel-2 onto the red grid. But since the grids obviously do not match perfectly, how do we compute the values in the new grid? In technical terms this process is called resampling. We must choose a resampling method which defines the way the values in the output raster will be computed from the input raster. The three most commonly used techniques are Nearest Neighbour, Bilinear Interpolation and Cubic Convolution:
In today’s homework assignment you will…
Reproject image
Reproject your 16bit integer Sentinel-2
image using berlin_10m_epsg32633.tif
as reference image for
projection, extent and resolution. To do this, make use of terra’s
project()
function. Seek the help page
(?project
) and identify the correct values for the
arguments x and y. Create two separate results, once
using 'near'
and once using 'cubic'
for the
method argument. Print your result to console and check if the
extent and dimensions match as expected. Last, write both results to
disc.
Assess results in QGIS
Compare the results for both
resampling methods in QGIS. Zoom in and out to search for
differences/patterns. Considering the above theory on resampling
methods, your visual assessment, and if necessary additional resources
online: Which method is more appropriate? To exemplify your words,
include at least one screenshot in your .Rmd
file using the
following syntax:
![Figure desription](relative/path/to/figure.png){width=100%}
In the lecture you have learned about the influence of the atmosphere on the measured signal. You heard that we try to eliminate the atmosphere and calculate the surface reflectance as if there was no atmosphere. Data that is not atmospherically corrected is refered to as Top-Of-Atmosphere (TOA) reflectance. Atmospherically corrected data represents surface conditions and is thus called Bottom-Of-Atmsophere (BOA).
In our case, we have worked with two different products: The Landsat-8 data is TOA reflectance, and the Sentinel-2 data is BOA reflectance.
With both datasets opened in QGIS, use the the “Identify features” at the top-panel to right-click selected pixels and select “Identify All”. Change the view from “Table” to “Graph” to see the spectral profiles for both images on top of each other.
Now select different pixels and compare the spectral profiles of the two images to each other. Identify an exemplary pixel for 1) woody vegetation (e.g. forests), 2) built-up and 3) water, and provide a screenshot for each of the profiles. For each profile then describe the general pattern of reflectance in the different wavelengths and identify the most notable differences between TOA and BOA. Based on what you learned in the lecture, explain the most prominent difference between TOA and BOA.
Document your answers to the questions above with bullet points
and the help of your screenshots in your .Rmd
document
knitted to PDF.
Upload your documentation as a PDF file on Moodle.
General submission notes: Submission deadline for the weekly assignment is always the following Monday at 10am. Please use the naming convention indicating session number and family name of all students in the respective team, e.g. ‘s01_surname1_surname2_surname3_surname4.pdf’. Each team member has to upload the assignment individually. Provide single file submissions, in case you have to submit multiple files, create a *.zip archive.
Copyright © 2023 Humboldt-Universität zu Berlin. Department of Geography.