Trend analysis

Trend analysis#

Packages#

import ee
import geemap
import pandas as pd
import numpy as np
import geopy_ts_fun as geopy

try:
    ee.Initialize()
except Exception as e:
    ee.Authenticate()
    ee.Initialize()

Set up map window#

In this exercise, we are going to map trends. Hence, we must define a region, a coordinate reference system crs, and a scale(pixel size) or affine transformation crsTransform. These parameters are usually needed at the end of a script, i.e., when you export or visualize your results. However, it is useful to define them up-front in the script.

region = ee.Geometry.Polygon(
        [[[12.901495385988614, 53.14672196059798],
          [12.901495385988614, 52.850010520689125],
          [13.426092553957364, 52.850010520689125],
          [13.426092553957364, 53.14672196059798]]])
scale = 20
crs = 'EPSG:3035'

Map = geemap.Map()
Map.centerObject(region, zoom=11)
# Map.addLayerControl() creates issues
Map

Visualization parameters#

# Sentinel-2 surface reflectance
s2Vis = {"bands": ["B4", "B3", "B2"], "min":0, "max": 1200}

# NDVI color palette
ndviVis = {"palette": ['white', 'green'], "min":0.2, "max": 1}

Sentinel-2 data#

The code block below filters the Sentinel-2 collection, masks clouds and calculats NDVI. The cloud masking approach uses the AI-based Cloud Score+ available in the Cloud Score+ Harmonized-S2 dataset. The Sentinel-2 harmonized dataset fixes different scaling factors in the original Sentinel-2 dataset.

dateFilter = ee.Filter.And(ee.Filter.dayOfYear(200, 365), 
                           ee.Filter.date(ee.Date.fromYMD(2018, 1, 1), 
                                          ee.Date.fromYMD(2021, 12, 31)))

# Harmonized Sentinel-2 Level 2A collection.
s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')

# Cloud Score+ image collection. Note Cloud Score+ is produced from Sentinel-2
# Level 1C data and can be applied to either L1C or L2A collections.
csPlus = ee.ImageCollection('GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED')

# Use 'cs' or 'cs_cdf', depending on your use case; see docs for guidance.
QA_BAND = 'cs_cdf'

# The threshold for masking; values between 0.50 and 0.65 generally work well.
# Higher values will remove thin clouds, haze & cirrus shadows.
CLEAR_THRESHOLD = 0.60

def mask_s2(image):
    return image.updateMask(image.select(QA_BAND).gte(CLEAR_THRESHOLD))

s2m = s2.filterBounds(region) \
        .filter(dateFilter) \
        .linkCollection(csPlus, [QA_BAND]) \
        .map(mask_s2)

# Create NDVI Collection from masked Sentinel-2 data
ndviS2 = s2m.map(geopy.calcNdviS2)

Harmonic function#

In this exercise, I fit a first-order harmonic model of the form:

\(\hat{y}(t) = \beta_0 + \beta_1 t + \beta_2 cos(2\pi \frac{t}{T}) + \beta_3 sin(2\pi \frac{t}{T})\)

The harmonic model is fitted to every pixel, i.e., we will estimate the parameters \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\beta_3\) for every pixel. The \(\beta\)´s are parameters for \(t\) and different transformations of \(t\), where \(t\) is time in days or (fractional) years and \(T\) is the period. The period \(T\) describes how many \(t\) it takes to complete a full harmonic cylce. For example, \(T\) is 1 if \(t\) is in years and 365 if \(t\) is in days. The model estimates \(\hat{y}\), which is the NDVI in our case.

The first step is to assemble the response (NDVI) and predictor variables. The function addPredictorVariables() adds four bands to an NDVI image. The four bands are the intercept and the predictors, i.e., \(t\) and its harmonic tranformations. The band constant has the pixel value 1 and is required to fit an intercept. The band time, associated with parameter \(\beta_1\), is time \(t\) expressed in years since January 1st 1970. It is a fractional value with decimal places. Now, the predictor variables do not vary for a specific NDVI image, i.e., \(t\) is constant for a specific image. However, recall that we fit time series to individual pixels. You also see that we can create an image with a scalar value simply by calling ee.Image(scalar). Earth engine will expand the scalar value in space as needed, i.e., defined by the output region.

def addPredictorVariables(image):
    
    date = ee.Date(image.get('system:time_start'))
    time = date.difference(ee.Date('1970-01-01'), 'year')
    sin = time.multiply(2 * np.pi ).sin()
    cos = time.multiply(2 * np.pi ).cos()
    
    predictors = ee.Image([sin, cos, time, 1]).float()

    return predictors.addBands(image.select(['ndvi'])) \
                     .rename(['sin', 'cos', 'time', 'constant', 'ndvi']) \
                     .set('system:time_start', image.get('system:time_start'))

trainingCollection = ndviS2.map(addPredictorVariables)

Now, we can estimate the \(\beta\)´s using linear regression. The reducer is not aware of variable names and their order in the image. It assumes the last variable to be the response variable. So, we make sure that NDVI (responseName) is selected last.

The output of the regression reduction (trend) contains two bands: coefficients and residuals. However, trend is an array image and not a standard image. An array image can store an entire vector/array in each pixel not just a scalar value. For example, an array image can store a six-band image in a single band, where each pixel would contain a vector of length six. However, pixels in an array image do not have to store vectors of the same length. In our case, the coefficients band stores the coefficients in a 4x1 array in each pixel. To turn this into a four band image, we need to apply arrayProject (removes the second axis of length 1) and arrayFlatten (puts every array element into its own band).

responseName = ee.String('ndvi')
predictorNames = ee.List(['constant', 'time', 'cos', 'sin'])

trend = trainingCollection.select(predictorNames.add(responseName)) \
            .reduce(ee.Reducer.linearRegression(predictorNames.length(), 1))

coefficients = trend.select('coefficients') \
                    .arrayProject([0]) \
                    .arrayFlatten([predictorNames])

Visualizing the coefficients reveals different patterns in seasonality and trend (time) for different land cover types. The harmonic model compresse the NDVI time series into a few parameters that still represent the spectral-temporal dynamics. NOTE: the visualization of the map can take a while. The pixelwise trend fitting is processing intenstive. Keep the zoom level steady. The process restarts every time you zoom in or out.

Map.addLayer(coefficients.select(['sin']), {"min": -0.23, "max":0.1}, 'sin')
Map.addLayer(coefficients.select(['cos']), {"min": -0.23, "max":0.1}, 'cos')
Map.addLayer(coefficients.select(['time']), {"min": -0.03, "max":0.03, "palette": ['red', 'white', 'green']}, 'time')

Fit model#

We can obtain the fitted NDVI values to better understand how our models fits the data. To do this, we need to write a funciton that takes a training image (containg NDVI, \(t\) and the harmonic terms) and applies the estimated coefficients. We then map that function to the training collection to obtain a collection of fitted NDVI images.

def linearFit(image):
    fit = image.select(predictorNames) \
               .multiply(coefficients) \
               .reduce('sum') \
               .rename('fitted')
    return image.addBands(fit)

fitted = trainingCollection.map(linearFit)

Extract points#

pt_dbf = ee.Geometry.Point([13.18496707443565,53.06189771226909])
pt_ag1 = ee.Geometry.Point([13.134970698398053, 53.06835763719942])
pt_ag2 = ee.Geometry.Point([13.047938380282819, 53.04421500010952])
pt_ag3 = ee.Geometry.Point([13.0798673963961, 53.010351264908145])

ee_vals = ndviS2.getRegion(geometry=pt_dbf, scale=scale, crs=crs)
vals = ee.data.computeValue(ee_vals)
df = pd.DataFrame(vals[1:], columns=vals[0]).dropna()
df['date'] = pd.to_datetime(df.id.str.slice(0, 8))
df.index  = df.date
df.head()

	id	longitude	latitude	time	ndvi	date
date
2018-07-21	20180721T102021_20180721T102024_T32UQD	13.18485	53.061957	1532168424461	0.883285	2018-07-21
2018-07-21	20180721T102021_20180721T102024_T33UUU	13.18485	53.061957	1532168424461	0.879113	2018-07-21
2018-07-23	20180723T101019_20180723T101036_T32UQD	13.18485	53.061957	1532340636891	0.903789	2018-07-23
2018-07-23	20180723T101019_20180723T101036_T33UUU	13.18485	53.061957	1532340636891	0.909469	2018-07-23
2018-07-26	20180726T102019_20180726T102150_T32UQD	13.18485	53.061957	1532600510652	0.886746	2018-07-26

The dataset may contain duplicate values due to the overlapping Sentinel-2 tiles. That is, there should only be one observation for a given day. So, let’s drop duplicate values.

df = df.drop_duplicates(subset=['date'])
df.head()

	id	longitude	latitude	time	ndvi	date
date
2018-07-21	20180721T102021_20180721T102024_T32UQD	13.18485	53.061957	1532168424461	0.883285	2018-07-21
2018-07-23	20180723T101019_20180723T101036_T32UQD	13.18485	53.061957	1532340636891	0.903789	2018-07-23
2018-07-26	20180726T102019_20180726T102150_T32UQD	13.18485	53.061957	1532600510652	0.886746	2018-07-26
2018-07-31	20180731T102021_20180731T102024_T32UQD	13.18485	53.061957	1533032424462	0.914293	2018-07-31
2018-08-07	20180807T101021_20180807T101024_T32UQD	13.18485	53.061957	1533636624457	0.902853	2018-08-07

Now we can plot column ndvi against the dataframe index, which is date.

df.ndvi.plot(style='.', figsize=(5,3));

../../_images/a35bd09035f3158347613acff1008a7dbc4008d0c0ebea78320ec327cb248e00.png

Extract coefficients#

We can extract the coefficients of the harmonic model for a single pixel using a reducer. Coefficients is an ee.Image, which has the sampleRegions reducer. You may get an exception “User memory limit exceeded”. To avoid this error, you can limit the processed area size or save intermediate results to your assets.

# returns feature collection
ee_coef = coefficients.sampleRegions(collection=pt_dbf, scale=scale, 
                                     projection=crs)

# extracts feature collection (server-side) to pandas dataFrame (client-side)
pt_coef = ee.data.computeFeatures({'expression': ee_coef, 'fileFormat': 'PANDAS_DATAFRAME'})
pt_coef.drop(columns="geo", inplace=True)
pt_coef

	constant	cos	sin	time
0	0.723115	-0.162878	-0.213892	-0.002603

Extract fitted values#

Let’s extract the fitted NDVI values for a single pixel. Note, fitted is an ee.ImageCollection, which we can reduce with getRegion(). The output is a list that we can turn into a Pandas DataFrame for further analysis.

ee_fit = fitted.getRegion(geometry=pt_dbf, scale=scale, crs=crs)
pt_fit = ee.data.computeValue(ee_fit)

dfit = pd.DataFrame(pt_fit[1:], columns=pt_fit[0]).dropna() # create DataFrame and drop NA
dfit["date"] = pd.to_datetime(dfit.id.str.slice(0, 8)) # extract date from tile-id
dfit.index = dfit.date # put date into index to facilitate plotting
dfit = dfit.drop_duplicates(subset=['date']) # EE fits to duplicate values
dfit = dfit.iloc[:, np.delete(np.arange(dfit.shape[1]), 4)] # drop first 'time' column
dfit.iloc[:,4:].head(3)

	cos	time	constant	ndvi	fitted	date
date
2018-07-21	-0.946117	48.552486	1	0.883285	0.820100	2018-07-21
2018-07-23	-0.934459	48.557941	1	0.903789	0.825084	2018-07-23
2018-07-26	-0.914793	48.566177	1	0.886746	0.832095	2018-07-26

Visualize extracted values#

dfit[["ndvi", "fitted"]].plot(style=['.', 'x'], figsize=(6,4));

../../_images/707054f20d5e2085cd291cbbd306beac0cc6a5dd438897354567043bbf77f222.png

Make predictions#

def fit_harmonic(t, constant, time, sin, cos, period=1):
    
    result = constant + time*t + \
                sin*np.sin(2.0 * np.pi * (t / period)) + \
                cos*np.cos(2.0 * np.pi * (t / period))
    return result

def fit_trend(t, constant, time, sin, cos ):
    result = constant + time*t
    return result
    

# date difference in days
t = pd.date_range('2018-01-01', '2021-12-31', freq='D') 

# date difference in yeas
td = (t - pd.to_datetime('1970-01-01')) / np.timedelta64(1, 'D') / 365.25

# convert params stored in data frame to dictionary
params = pt_coef.to_dict(orient='records')[0]

# the result is a pandas.core.indexes.numeric.Float64Index
# we can extract the values of t with t.values
fitted_harm = fit_harmonic(td.values, **params)

fitted_trend = fit_trend(td.values, **params)

dfm = pd.DataFrame({'ndvi_harm':fitted_harm, 
                    'ndvi_trend': fitted_trend}, index=t)

ax = dfit[["ndvi", "fitted"]].plot(style=['.', 'x'], figsize=(6,4))
dfm.plot(ax=ax);

../../_images/a0e1955d4fa3d481b75fac19804a7cf9b31a4eb511d95b90485d283d5c61bfb5.png

Extract pixels to drive#

ptc = ee.FeatureCollection([pt_dbf, pt_ag1, pt_ag2, pt_ag3])

paramS2 = {"reducer": ee.Reducer.mean(),
           "crs": crs,
           "scale": scale,
           "bands": ['ndvi'],
           "bandsRename": ['ndvi'],
           "imgProps": None,
           "imgPropsRename": None,
           "datetimeName": 'date',
           "datetimeFormat": 'YYYY-MM-dd'
}

s2_spectra = geopy.extract_pixels_toDrive(ndviS2, ptc, paramS2)

You can observe the progress in the tasks tab of the web-based IDE.

task = ee.batch.Export.table.toDrive(collection=s2_spectra, folder='geopy', 
                                     description='s2_sample',
                                     fileFormat='CSV')
#task.start()