# NDVI

This document computes NDVI (Normalized Difference Vegetation Index) scores for each of the enclosed tessellations (ETs) in Great Britain.

In [41]:
import geopandas
import rioxarray, xarray
from numpy import percentile

! echo "Run this notebook using version $GDS_ENV_VERSION of the gds_env"

Run this notebook using version 6.0alpha of the gds_env


We inspect the details of the mosaic to choose a chunk size:

In [4]:
! rio info /home/jovyan/data/GHS-composite-S2/GHS-composite-S2.vrt

{"blockxsize": 128, "blockysize": 128, "bounds": [-222823.73719089525, -213574.25107683009, 996789.2497132053, 1612237.380579703], "colorinterp": ["gray", "undefined", "undefined", "undefined"], "count": 4, "crs": "EPSG:27700", "descriptions": [null, null, null, null], "driver": "VRT", "dtype": "uint16", "height": 182437, "indexes": [1, 2, 3, 4], "lnglat": [-2.211309842042783, 56.18643258743896], "mask_flags": [["nodata"], ["nodata"], ["nodata"], ["nodata"]], "nodata": 0.0, "res": [10.007902079383749, 10.007902079383749], "shape": [182437, 121865], "tiled": true, "transform": [10.007902079383749, 0.0, -222823.73719089525, 0.0, -10.007902079383749, 1612237.380579703, 0.0, 0.0, 1.0], "units": [null, null, null, null], "width": 121865}


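Since the mosaic is tiled in blocks of 128 by 128 pixels, we pick a chunk size ten times larger along each dimension (1280 by 1280), so that chunks align with block boundaries. We also load the tessellation cells and cast their IDs to integers: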

In [60]:
tst = geopandas.read_parquet("/home/jovyan/data/tess_6.pq")
tst["uID"] = tst["uID"].astype(int)
r = rioxarray.open_rasterio("/home/jovyan/data/GHS-composite-S2/GHS-composite-S2.vrt",
                            chunks={"x": 1280, "y": 1280}
                           )
r

|       | Array               | Chunk           |
|-------|---------------------|-----------------|
| Bytes | 177.86 GB           | 13.11 MB        |
| Shape | (4, 182437, 121865) | (4, 1280, 1280) |
| Count | 13729 Tasks         | 13728 Chunks    |
| Type  | uint16              | numpy.ndarray   |
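
The figures in the array summary can be checked by hand from the raster shape and the chunk size. A minimal sketch, using only values reported above (the variable names are illustrative):

```python
import math

bands, height, width = 4, 182437, 121865   # shape reported for the mosaic
chunk_y = chunk_x = 1280                    # chunk size chosen above
bytes_per_value = 2                         # uint16 takes two bytes

# Chunks along each axis are rounded up; edge chunks are smaller than 1280
n_chunks = math.ceil(height / chunk_y) * math.ceil(width / chunk_x)
chunk_bytes = bands * chunk_y * chunk_x * bytes_per_value
total_bytes = bands * height * width * bytes_per_value

print(n_chunks)                      # 13728
print(round(chunk_bytes / 1e6, 2))   # 13.11 (MB per full chunk)
print(round(total_bytes / 1e9, 2))   # 177.86 (GB for the whole mosaic)
```

Each chunk holds all four bands, which is why the chunk shape is `(4, 1280, 1280)`.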


In [117]:
tst_sub = tst.cx[315891.95:330000, 213727.69:250000]
tst_sub.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 2049 entries, 15006 to 258237
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   uID          2049 non-null   int64   
 1   geometry     2049 non-null   geometry
 2   enclosureID  2049 non-null   int64   
dtypes: geometry(1), int64(2)
memory usage: 64.0 KB


Then, we express the NDVI calculation, `(NIR - Red) / (NIR + Red)`, with the expression below taking band 4 as near-infrared and band 1 as red. No computation takes place at this point thanks to `xarray`/Dask's lazy evaluation:

In [39]:
ndvi = (r.sel(band=4) - r.sel(band=1)) / (r.sel(band=4) + r.sel(band=1))
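
One caveat worth checking: the mosaic is stored as `uint16`, and unsigned integer subtraction wraps around instead of going negative, so pixels where band 1 exceeds band 4 would not yield a negative NDVI. The sketch below illustrates this on two made-up pixels with plain NumPy and shows one possible remedy, casting to floating point first. The `ndvi_float` helper and the pixel values are hypothetical, and whether the issue matters in practice depends on the data. With `xarray`, the analogous step would be calling `.astype("float32")` on the bands before the arithmetic.

```python
import numpy as np

# Two made-up pixels: one where NIR > red and one where red > NIR
nir = np.array([4000, 1000], dtype="uint16")
red = np.array([1200, 3000], dtype="uint16")

# uint16 arithmetic wraps around: 1000 - 3000 becomes 65536 - 2000
wrapped = nir - red
print(wrapped.tolist())  # [2800, 63536]


def ndvi_float(nir, red):
    """NDVI computed in floating point (hypothetical helper)."""
    nir = nir.astype("float32")
    red = red.astype("float32")
    # 0 / 0 (for example, nodata pixels stored as 0) yields NaN;
    # the context manager only silences the accompanying warning
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - red) / (nir + red)


print(ndvi_float(nir, red))  # roughly 0.538 and exactly -0.5
```

The sum `nir + red` can overflow in `uint16` as well once it exceeds 65535, which the cast also avoids.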

With the NDVI expressed, we define a function that takes a row of our `GeoDataFrame` and returns the average NDVI within its geometry:

In [56]:
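def geom2ndvi(row, ndvi):
    """Return the mean NDVI within the geometry of `row`."""
    val = (
        ndvi.rio.clip_box(*row["geometry"].bounds)  # subset to the bounding box first
        .rio.clip([row["geometry"]])                # then mask to the exact polygon
        .mean()
        .values
        .tolist()
    )
    return val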

An alternative is to rasterise the geometries with `make_geocube` and compute [`geocube`'s zonal stats](https://corteva.github.io/geocube/stable/examples/zonal_statistics.html):

In [57]:
from geocube.api.core import make_geocube

Before rasterising, we select the segment of the mosaic that overlaps the subset of tessellations (no data is read into memory at this point; the selection is evaluated lazily):

In [118]:
ndvi_segment = ndvi.rio.clip_box(*tst_sub.total_bounds)

Next, we rasterise the features, burning each cell's `uID` onto the same grid as the NDVI segment:

In [119]:
%%time
out_grid = make_geocube(
    vector_data = tst_sub,
    measurements=["uID"],
    like=ndvi_segment
)

CPU times: user 318 ms, sys: 5.81 ms, total: 324 ms
Wall time: 323 ms


This creates a `Dataset` object with a rasterised version of the tessellations in `tst_sub`. Now we append the NDVI:

In [120]:
out_grid["ndvi"] = ndvi_segment

And with both aligned, we can group by each `uID` and calculate average NDVI:

In [121]:
%%time
g = out_grid.drop("spatial_ref")\
            .groupby(out_grid["uID"])

CPU times: user 598 ms, sys: 2.23 ms, total: 601 ms
Wall time: 598 ms


The mean of each group then gives the average NDVI per `uID`:

In [127]:
%%time
ndvi_mean = g.mean()

CPU times: user 6.21 s, sys: 2.89 ms, total: 6.22 s
Wall time: 6.21 s
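
To make the grouped mean concrete, the same zonal-statistics idea can be sketched on two toy arrays with plain NumPy: one holds zone labels (playing the role of the rasterised `uID`) and the other holds values (playing the role of NDVI). The `zonal_mean` helper and the arrays are hypothetical, and dropping NaN pixels is a choice of this sketch rather than a statement about how the cells above treat them.

```python
import numpy as np

# Toy label raster (NaN = pixel not covered by any zone) and value raster
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [3, 3, np.nan]])
values = np.array([[0.2, 0.4, 0.6],
                   [0.6, 0.8, np.nan],
                   [0.1, 0.3, 0.9]])


def zonal_mean(zones, values):
    """Mean of `values` per zone label, ignoring NaN in either array."""
    valid = ~np.isnan(zones) & ~np.isnan(values)
    ids, inverse = np.unique(zones[valid], return_inverse=True)
    sums = np.bincount(inverse, weights=values[valid])  # per-zone sum
    counts = np.bincount(inverse)                       # per-zone pixel count
    return dict(zip(ids.astype(int).tolist(), (sums / counts).tolist()))


means = zonal_mean(zones, values)
print({k: round(v, 3) for k, v in means.items()})  # {1: 0.4, 2: 0.7, 3: 0.2}
```

Because every pixel is visited once regardless of the number of zones, this style of computation scales with raster size rather than with the number of geometries.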


In [132]:
mn = ndvi_mean.to_dataframe()[["ndvi"]]


In [149]:
geom2ndvi(tst_sub.query("uID == 6712717").iloc[0], ndvi)


0.6968843539763994

In [153]:
%time out = tst_sub.head().apply(lambda r: geom2ndvi(r, ndvi), axis=1)


CPU times: user 1.6 s, sys: 14.6 ms, total: 1.62 s
Wall time: 1.6 s



In [144]:
tst_sub.query("uID == 6712717")

|       | uID     | geometry                                          | enclosureID |
|-------|---------|---------------------------------------------------|-------------|
| 22988 | 6712717 | POLYGON Z ((329700.369 234529.467 0.000, 32945... | 658102      |


In [141]:
mn.head()

| uID       | ndvi     |
|-----------|----------|
| 6712717.0 | NaN      |
| 6712718.0 | 0.682241 |
| 6712719.0 | 0.675091 |
| 6712720.0 | NaN      |
| 6712721.0 | 0.537544 |
