Rasters in R

Learning goals

  • Read and write multispectral satellite images
  • Manipulate the spatial extent of a raster
  • Extract a cell value and visualize a spectral profile


The raster package

The raster package offers useful classes and functions which facilitate raster data handling in R. It allows us to access file characteristics before loading data into memory, manipulate coordinate reference systems and spatial extents. We can use it to perform raster algebra, to combine raster and vector datasets (e.g. ESRI shapefiles), or to convert raster files into matrices, which are compatible with the base functions we use to subset data, access image statistics, develop models, and much more.

There are a couple of things that the raster package does not provide. For example, collecting training and validation data are preferably done in a GIS environment (e.g. QGIS). Furthermore, processing large data volumes in R can be quite time-consuming (we often use Python instead - its syntax is quite similar to R). Advanced visualization of spatial datasets can be done in R using additional packages (e.g., ggmap, tmap, mapview) but is not (yet) covered in this course.


1) Welcome to the raster world

Today, you will make use of R’s raster package classes and functions which are well described in the package documentation. Install the raster package (like any other package) as shown below. All dependencies will automatically be installed, but you may need to install rgdal manually.

# Install the raster and rgdal packages

# Load the package

Use the package documentation to get acquainted with the following classes and functions and take notes on what they are useful for: - raster(), stack() - extent(), CRS(), projectRaster(), crop() - extract(), plotRGB(), writeRaster()

2) Reading data

Navigate to one of the folders containing an unpacked Landsat image. Did you take a close look at the Landsat file naming convention? Practically, it provides some basic meta-information. For instance, LC08_L1TP_189025_20140310_20170425_01_T1 is a sequence of information on the sensor (LC08), processing level (L1TP), WRS path and row (189/025), acquisition date (20140310), processing date (20170425), collection (01), and collection tier (T1), separated by ’_’.

As you can see, the Landsat images are delivered as single-band files. The single bands should be stacked for further analyses. For stacking, all input files must have matching extents and the identical projection. Create a stack() for each of the two Landsat 8 images.

Important: Please include only the following bands: blue, green, red, near infrared, shortwave infrared 1, shortwave infrared 2 (in this order). Check the list of Landsat spectral bands for a recap. Always keep the band designations in mind, as this can cause confusion, e.g. when combining Landsat 5 and Landsat 8 data.

Band designations for Landsat satellites

Make extensive use of the functions paste0() and list.files() (seek help page to read about the pattern, recursive, and full.names arguments). Try to create the two stacks with a minimum amount of code. Who can make it under five lines?

3) Manipulating data

Investigate the stack. In which projection is the data delivered?

Compare the extent of the two images. You will notice that they vary. Trying to stack images of different extent will cause an error message claiming:

image.stack <- stack(image.one, image.two)
Error in compareRaster(x) : different extent

To stack both images, we need to crop (i.e. clip, or cut) the images to their common extent. Find an efficient way to identify the common extent of the images, defined as common.extent <- c(xmin, xmax, ymin, ymax) (in projected coordinates).

Identifying the common extent of several images

The common extent of the two images is relatively large. In order to reduce the amount of data for the next steps, please crop() the images to our region of interest, a part of the Western Beskids, defined by roi.extent <- c(327945, 380325, 5472105, 5521095)

4) Writing data

Write the cropped stacks to your folder using writeRaster(). Use the GTiff format. Additionally, read about the available datatypes in R, and discuss which one is appropriate for this dataset in your group. Why is this important?

5) Plotting image data

Open the cropped images in QGIS. Seek the symbology to create a true-color (red, green, blue) and a false-color representation (e.g., RGB: swIR1, nIR, red) of each image. Make sure to properly consider the order of bands in your stack (1 = blue, 2 = green, 3 = red, 4 = nIR, 5 = swIR 1, 6 = swIR 2) in relation to your computer screen´s color channels (1 = red, 2 = green, 3 = blue).

Use the plotRGB() function in R to create a false-color visualization of the images in R. You may need to specify the stretch argument.

False color visualization (RGB: nIR, red, green)

6) Extracting spectral profiles

Use the extract() function to get spectral profiles from the stack containing both images.

# Define coordinate (units = m)
coordinate <- data.frame('x' = 355623, 'y' = 5486216) 

# Extract cell values from image_stack
cellvals <- extract(image_stack, coordinate)

7) Plotting a spectral profile

We can visualize the results in R. You may want to create a plot of the two spectral profiles (one for each image in the stack), while accounting for the wavelengths and acquisition dates. ggplot2 offers a nice syntax for such tasks. Read and execute the code below.

# Load ggplot2 package (or install if needed)

# Create a data frame combining extracted values with 
# band wavelengths and acquisition date 
profile <- data.frame('wavelength'=rep(c(482, 561, 655, 865, 1609, 2201),2), 
                      'date'=as.factor(c(rep('10 March', 6), rep('10 July', 6))),


# Plot spectral profiles as line plot, grouped by factor date
ggplot(profile, aes(x=wavelength, y=values, group=date)) + 
      geom_line(aes(color=date)) + 
      # Add axis & legend names
      scale_y_continuous(name='DN') + 
      scale_x_continuous(name='Wavelength (nm)') + 
      scale_colour_manual(name='Acquisition DOY', values=c('darkgreen', 'brown')) +
      theme_bw() # Chose black/white layout

##    wavelength     date values
## 1         482 10 March   7311
## 2         561 10 March   6601
## 3         655 10 March   6347
## 4         865 10 March   8061
## 5        1609 10 March   7897
## 6        2201 10 March   6766
## 7         482  10 July   8596
## 8         561  10 July   7660
## 9         655  10 July   6576
## 10        865  10 July  16698
## 11       1609  10 July   9971
## 12       2201  10 July   6935

Can you guess what type of surface we are looking at?

8) Visualizing data

We can visualize the results in R. You may want to create a plot of the two spectral profiles (one for each image in the stack), while accounting for the wavelengths and acquisition dates. ggplot2 offers a nice syntax for such tasks. Read and execute the code below.

Reading materials

In the next session, we would like to discuss the following paper:

Zhu, Z., & Woodcock, C.E. (2012). Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment, 118, 83-94.

This is a rather technical reading, which introduces the Fmask algorithm for automated cloud and cloud shadow detection. It has been widely used for cloud detection on Landsat TM and ETM+ data, and was enhanced for the use with Landsat OLI and Sentinel 2 data (documented in Zhu et al. 2015 and Qiu et al. 2019).

While reading focus on the following questions:

  • Why do we need automated cloud masking?
  • Where are the limitations?
  • How will different error types impact further analyses?

Copyright © 2020 Humboldt-Universität zu Berlin. Department of Geography.