# NDVI

This document shows how NDVI scores are computed for each of the enclosed tessellations (ETs) in GB.

## Set up

In [1]:
import sys
sys.path.insert(0, "../")
import utils
import os
import fsspec
import pandas
import geopandas
import rioxarray, xarray
from dask import dataframe as dd
from numpy import percentile

# Local run (alternative to cluster)
from dask.distributed import Client, LocalCluster
import dask.array as da
from dask.diagnostics import ProgressBar

In [2]:
cluster = LocalCluster(
    n_workers=4,
    threads_per_worker=2
)
client = Client(address=cluster)
client

Client: Scheduler tcp://127.0.0.1:38277, Dashboard http://127.0.0.1:8787/status
Cluster: 4 workers, 8 cores, 67.10 GB memory


## Data

We start with the paths to the datasets we'll use:

- The full mosaic is stored as a folder of COGs. In this run we read a local copy through a VRT file rather than over HTTP, so we set the path to that file:

In [3]:
# Local
mosaic_url = "../../data/ghs_composite_s2/GHS-composite-S2.vrt"

## Connecting to the GHS Mosaic

We inspect the details of the mosaic to select the chunk:

In [10]:
! rio info $mosaic_url | python -m json.tool

{
    "blockxsize": 128,
    "blockysize": 128,
    "bounds": [
        -222823.73719089525,
        -213574.25107683009,
        996789.2497132053,
        1612237.380579703
    ],
    "colorinterp": [
        "gray",
        "undefined",
        "undefined",
        "undefined"
    ],
    "count": 4,
    "crs": "EPSG:27700",
    "descriptions": [
        null,
        null,
        null,
        null
    ],
    "driver": "VRT",
    "dtype": "uint16",
    "height": 182437,
    "indexes": [
        1,
        2,
        3,
        4
    ],
    "lnglat": [
        -2.211309842042783,
        56.18643258743896
    ],
    "mask_flags": [
        [
            "nodata"
        ],
        [
            "nodata"
        ],
        [
            "nodata"
        ],
        [
            "nodata"
        ]
    ],
    "nodata": 0.0,
    "res": [
        10.007902079383749,
        10.007902079383749
    ],
    "shape": [
        182437,
        121865
    ],
    "tiled": true,
    "transform": 

Since the mosaic is tiled in blocks of 128 by 128 pixels, we pick a chunk height that is a whole multiple of the tile size (512 rows, or four tiles) and let each chunk span the full width:

In [4]:
r = rioxarray.open_rasterio(mosaic_url,
                            chunks={"x": 121865, "y": 512}
                           )
r

| | Array | Chunk |
|---|---|---|
| Bytes | 177.86 GB | 499.16 MB |
| Shape | (4, 182437, 121865) | (4, 512, 121865) |
| Count | 358 Tasks | 357 Chunks |
| Type | uint16 | numpy.ndarray |

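In [5]:
i_y = r.coords["y"]
# NOTE: `mid` is half the y-extent, not the midpoint coordinate ((max + min) / 2)
mid = (i_y.max() - i_y.min()) / 2
# y is ordered north to south, so this keeps rows from the northern edge down to y = mid
r = r.loc[:, slice(i_y[0], mid), :]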

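With the chunking above, each chunk in `r` spans four rows of 128-pixel tiles across the full width of the mosaic. Reading ten tiles per side would instead correspond to chunks of 1280 by 1280 pixels, which is the layout shown in the last output of this document.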
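To see which rows the slice in In [5] keeps, the sketch below redoes the arithmetic with the bounds and resolution printed by `rio info`. It is an illustration only: the function `rows_north_of` and the variable names are hypothetical, and the row counts are approximate because they ignore pixel-centre offsets.

```python
# Values copied from the `rio info` output above
y_min, y_max = -213574.25107683009, 1612237.380579703
res = 10.007902079383749
width, height = 121865, 182437

half_range = (y_max - y_min) / 2  # what `mid` holds in In [5]
midpoint = (y_max + y_min) / 2    # y coordinate halfway between the two edges

def rows_north_of(y_cut):
    """Approximate count of pixel rows between the northern edge and y_cut."""
    return int((y_max - y_cut) / res)

rows_kept = rows_north_of(half_range)  # rows selected by the slice as written
rows_half = rows_north_of(midpoint)    # rows a cut at the midpoint would select

share_kept = rows_kept / height
canvas_h = 1500 * rows_kept // width   # same formula as `h` in the datashader cell

print(rows_kept, rows_half, round(share_kept, 3), canvas_h)
```

With these numbers the slice keeps roughly 38% of the rows, all at the northern end, rather than half of them. The resulting canvas height of 860 agrees with the `(860, 1500)` shape that datashader reports further down.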

## National NDVI resampling

In this section we resample the national NDVI onto a coarser grid, both to explore rasterisation in Dask and to produce an image of the distribution of NDVI at the national scale.

We can express the NDVI calculation up front; no computation takes place at this point thanks to the lazy evaluation of `xarray`/Dask:

In [6]:
nir, red = r.sel(band=4).astype(float), r.sel(band=1).astype(float)  # avoid uint16 wrap-around
ndvi = (nir - red) / (nir + red)
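The bands are stored as `uint16`, so NDVI needs to be computed in floating point: subtracting unsigned integers wraps around instead of producing a negative number. The sketch below illustrates this with plain NumPy on three made-up pixels; `ndvi_from_bands` and the toy values are hypothetical, and the last pixel mimics the mosaic's nodata value of 0.

```python
import numpy as np

# Toy values standing in for band 4 (NIR) and band 1 (red)
nir = np.array([1000, 3000, 0], dtype=np.uint16)
red = np.array([2000, 1000, 0], dtype=np.uint16)

# Unsigned subtraction wraps around: 1000 - 2000 becomes 65536 - 1000
wrapped = nir - red

def ndvi_from_bands(nir, red):
    """NDVI = (NIR - red) / (NIR + red), computed in floating point."""
    nir = nir.astype("float64")
    red = red.astype("float64")
    # 0 / 0 on nodata pixels yields NaN; silence the warning it triggers
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - red) / (nir + red)

ndvi_toy = ndvi_from_bands(nir, red)
print(wrapped, ndvi_toy)
```

After the cast, values stay within the expected range of -1 to 1, and pixels that are zero in both bands come out as NaN rather than as a spurious score.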

With `datashader`, we can resample the entire mosaic into a much smaller `DataArray` that will fit comfortably in memory and that we can then plot:

In [7]:
import datashader as ds

w = 1500
h = w * r.shape[1] // r.shape[2]
cvs = ds.Canvas(plot_width=w, plot_height=h)
agg = cvs.raster(ndvi)
agg

| | Array | Chunk |
|---|---|---|
| Bytes | 10.32 MB | 6.14 MB |
| Shape | (860, 1500) | (512, 1500) |
| Count | 1465 Tasks | 2 Chunks |
| Type | float64 | numpy.ndarray |
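Conceptually, resampling onto a much smaller canvas collapses many source pixels into each output pixel. The sketch below shows the idea with a block average in plain NumPy. It is not datashader's implementation: the aggregation is assumed to be a mean, and `block_mean` and the toy array are hypothetical.

```python
import numpy as np

def block_mean(arr, factor):
    """Downsample a 2D array by averaging non-overlapping factor x factor blocks."""
    rows, cols = arr.shape
    # Trim trailing rows and columns that do not fill a whole block
    rows_trim, cols_trim = rows - rows % factor, cols - cols % factor
    trimmed = arr[:rows_trim, :cols_trim]
    blocks = trimmed.reshape(rows_trim // factor, factor, cols_trim // factor, factor)
    # nanmean skips NaN pixels, such as nodata cells in an NDVI grid
    return np.nanmean(blocks, axis=(1, 3))

toy = np.arange(24, dtype="float64").reshape(4, 6)
small = block_mean(toy, 2)
print(small)
```

Here a 4 by 6 grid is reduced to 2 by 3, with each output cell holding the mean of a 2 by 2 block of inputs.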


The actual computation is triggered by `.compute()`. In the run recorded here, it ended with a `KilledWorker` error:

In [8]:
%%time
ndvi_agg = agg.compute()



KilledWorker: ("('concatenate-finalize-from-value-resample_2d-getitem-d8eba8f8052c3824740519b916bd2726', 0, 0)", <Worker 'tcp://127.0.0.1:44859', name: 3, memory: 0, processing: 2>)

In [148]:
ndvi

| | Array | Chunk |
|---|---|---|
| Bytes | 177.86 GB | 13.11 MB |
| Shape | (182437, 121865) | (1280, 1280) |
| Count | 82369 Tasks | 13728 Chunks |
| Type | float64 | numpy.ndarray |
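The figures in the two array summaries (full-width strips for the raw mosaic, 1280 by 1280 squares for `ndvi`) can be reproduced from the shapes alone. The following is a small sketch of that arithmetic; `chunk_stats` is a hypothetical helper, and the item sizes assume 2 bytes for `uint16` and 8 bytes for `float64`.

```python
import math

def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, bytes in one full chunk, total bytes)."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return n_chunks, chunk_bytes, total_bytes

# Raw mosaic: 4 uint16 bands, full-width strips of 512 rows
strips = chunk_stats((4, 182437, 121865), (4, 512, 121865), itemsize=2)
# NDVI: a single float64 layer in 1280 x 1280 squares
squares = chunk_stats((182437, 121865), (1280, 1280), itemsize=8)

print(strips)
print(squares)
```

Both layouts describe the same 177.86 GB of data. The strips produce 357 chunks of about 499 MB each, while the squares produce 13728 chunks of about 13 MB each, so the square layout trades a larger task graph for a much smaller memory footprint per task.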
