Point intersections#
By that point in the class, we got to know the general structure of raster data as well as vector data; and for both cases we got to know common ways of how we use these types of data in our ever-day research lifes. Now, what we want to do here is to (a) bring the two concepts together, and (b) pave the way towards the next session, which will be centered around image classifications and regressions. For this type of work, what we often need is a number of points, often randomly distributed over the landscape. For each of these points, we want the information of some variables
. In remote sensing, these are often spectral indices or even time series of them, in other applications we want information stemming from other spatial layers, e.g., elevation, or whether the point is located inside a certain a spatial polygon or not. The outcome of such an analysis is often an array
, a csv-file
(e.g., to import it into R) or a simple list
.
This (comparatively short) chapter gives you some tools on hand, that will help you reach this goal. We will show here some small example code snippets showing you the main idea behind it. We won’t go again over the structure of raster and vector data again. If you get stuck here, then you most likely should go one step back to these chapters and have another look there.
What we give you an example are the following files:
A point shapefile, containing some points
A raster file, representing elevation
A polygon shapefile, indicating the country borders of Ukraine and Romania
As usual, we load all the packages and files required for our work:
from osgeo import ogr, osr, gdal
import pandas as pd
folder = "D:/OneDrive - Conservation Biogeography Lab/_TEACHING/__Classes-Modules_HUB/M8_Geoprocessing-with-python/data/"
pts = ogr.Open(folder + "ROM_UKR_pointIntersect/Points_sub.shp")
ptsLYR = pts.GetLayer()
ctr = ogr.Open(folder + "ROM_UKR_pointIntersect/UKR_ROM.gpkg")
ctrLYR = ctr.GetLayer()
elev = gdal.Open(folder + "ROM_UKR_pointIntersect/Elevation.tif")
Attention:
We ommit here of doing all the necessary checks one should/could do to better get to know the data, such as getting an overview of the spatial references, numbers of features, size of the raster data, etc. Make sure that you will do that carefully every time you work with spatial data!
You noticed probably, that we loaded another package through import pandas as pd
. The package pandas
is a very efficient tool to handle tables and large datasets. It is a bit less intuitive at the beginning compared to data.frames
in R, but to me they are much more powerful, as they have a numpy
array in the background and thus inherit much of its functionality. Check the chapter on pandas
in the script to learn more about it.
We will use pandas
here to create our output table. I name the columns according to our layers:
df = pd.DataFrame(columns=['PointID', 'Elev', 'Ctr'])
df
PointID | Elev | Ctr |
---|
That was simple. And the rest will also be simple - there is one additional thing you will newly learn, but the rest is already known and practiced, if you have carefully followed the previous sections of the script. Next, I will show, how to extract for the first feature in our pts
layer how to get the values. For the shapefile
, we know this from the previous chapter. Specifically, we will apply a SpatialFilter
and use GetField()
to get the value. Since we are working with points, there is for any layer only one single value. We do need to pay attention, however, to potential differences in coordinate systems:
firstPNT = ptsLYR[0]
firstGEOM = firstPNT.geometry()
To pay attention to different coordinate systems, we build a transformation rule.The field that we need to access in the country
shapefile is NAME_0
. Pay close attention to the different functions we use here, such as .GetNextFeature()
, .ResetReading()
- those are very important in this context. Can you explain why?
fromCS = firstGEOM.GetSpatialReference()
toCS = ctrLYR.GetSpatialRef()
cT = osr.CoordinateTransformation(fromCS, toCS)
# Build a clone, transform
firstGEOM_cl = firstGEOM.Clone()
firstGEOM_cl.Transform(cT)
# Apply the spatial filter
ctrLYR.SetSpatialFilter(firstGEOM_cl)
ctrFEAT = ctrLYR.GetNextFeature()
name = ctrFEAT.GetField('NAME_0')
ctrLYR.ResetReading()
name
'Romania'
Ok, this was indeed simple. Nothing that we did not know about. For the rasters it is equally simple. We will introduce two different ways on how to do this:
One that directly access the pure raster data in memory, and returns the hexadecimal code of the value. This then requires an additional function (that we have to write ourselves) to convert this code into a numeric element that we can work with.
One that uses the
.ReadAsArray()
method we got to know in the array manipulation and the introduction on gdal chapters of this script.
Let’s work again with the same point geometry from above. Here, for simplicity, we have prepared both layers in the same coordinate system. Let’s first have a look at the coordinates of the point:
mx, my = firstGEOM.GetX(), firstGEOM.GetY()
mx, my
(24.638622055428186, 47.74465006838611)
As expected, these are geo-coordinates which we will have to convert into array (or image) coordinates. We learned before, that we can make use of the GeoTransform
to do so
gt = elev.GetGeoTransform()
px = int((mx - gt[0]) / gt[1])
py = int((my - gt[3]) / gt[5])
px, py
(787, 411)
With these coordinates, we can now move on and get the data from the raster images. Let’s start doing this using the two different ways - we start with the ReadRaster
function:
# Using ReadRaster
val = elev.GetRasterBand(1).ReadRaster(px, py, 1, 1)
val
bytearray(b'w\x03')
Oha! This reads weird - it is how the data are stored on the disc. NOw what we need to do is to translate this into a something we can work with. As we know from other applications, elevtion is stored as 16bit Singed Integer
values. The struct
package can help us translating the hexadecimal code into a “pure” value:
import struct
val_unPack = struct.unpack('h', val)
val_unPack
(887,)
The function argument 'H'
represents the code for the correct datatype (i.e., 16 bit singed integer). See more on this below, but for now the outcome is a tupel, where only the first value is of interest. By indexing we can get the only value of interest for us:
val_unPack[0]
887
This really requires us to think more about data types, and we need to find a way to make this more eficient. One way to to think about this is to understand which data types exist and then think about a potential function that helps us apply the correct code (in the example above the 'H'
). I prepared a function below that takes a raster file as argument and returns the data type, translated into a value that can be read by the package struct
. Have a look here to understand the integer
codes for all raster types we know from gdal
, and here for more information on the struct
package.
def GetDataTypeHexaDec(rasFile):
rb = rasFile.GetRasterBand(1)
dType = rb.DataType
dTypes = [[1, 'b'], [2, 'H'], [3, 'h'], [4, 'I'], [5, 'i'], [6, 'f'], [7, 'd']]
for dT in dTypes:
if dT[0] == dType:
band_dType = dT[1]
return band_dType
so, for our elevation
file this returns
GetDataTypeHexaDec(elev)
'h'
This value we can plug into the struct.unpack()
call, and we are done!
Now, let’s finish here with the second option. This, as already mentioned, we already know from previous chapters of the script.
val = elev.GetRasterBand(1).ReadAsArray(px, py, 1, 1)
val
array([[887]], dtype=int16)
As expected, we get a numpy
array as a return value. Using simple indexing and a conversion into an integer value, we can get to the real value.
int(val[[0]])
887
Which one to use? Well, this is really hard to say. According to ChatGPT
, the ReadRaster
call has advantages when the number of iterations are large, as it runs (together with the struct.unpack
function) a bit faster. Within a few hundred or thousand of points we probably don’ see much of a difference. So, you should move on with the alternative you feel most comfortable with. Try it out!
Check it out!
The examples here are really only made for one single point feature. Yet, looking some cells above we have instatiated a pandas data frame to be filled with the values of all points. Thus, what we need to do is the part of the iteration. In the homework associated with this chapter, you will have to do this. Though, can you already think now how you would implement it?