# Reading rasters with rasterio


# Downscaling rasters thanks to dask

In this notebook we will look at a concrete case of data aggregation, with the example of changing the resolution using dask.

Dask is a Python library that parallelizes computations and handles large quantities of data in a scalable way, using the available resources (CPU, memory, etc.). Unlike tools such as Pandas or NumPy, Dask lets you work on datasets that exceed the available RAM by splitting the data into smaller pieces (chunks) and parallelizing the computations over them.
Dask works with lazy computation: instead of executing immediately, it builds a task graph. Calculations are only executed when the final result is explicitly requested (for example, by calling .compute()).
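
As a minimal sketch of this lazy behaviour (the array size and chunk size below are arbitrary values chosen for illustration), the following builds a small task graph and only runs it when ``compute()`` is called:

```python
import dask.array as da

# A 2000 x 2000 array of ones, split into four 1000 x 1000 chunks.
x = da.ones((2000, 2000), chunks=(1000, 1000))

# Nothing is computed here: Dask only records the tasks to run.
total = (x + 1).sum()
print(type(total).__name__)  # Array: still a lazy dask array
print(x.numblocks)           # (2, 2)

# The task graph is executed only now, chunk by chunk.
result = total.compute()
print(result)                # 8000000.0
```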

Dask offers a wide range of [modules](https://docs.dask.org/en/stable/#how-to-use-dask) (dask dataframe...) that can be used to distribute calculations. In this notebook we will focus on the use of [dask-arrays](https://docs.dask.org/en/stable/array.html) to manipulate raster data.

In order to change the resolution, we will look at two possible methods. One of them will prove less efficient than the other, which will allow us to establish some general principles for using Dask well. The first uses [map_blocks](https://docs.dask.org/en/stable/generated/dask.array.map_blocks.html) and the second only [dask.array.mean](https://docs.dask.org/en/stable/generated/dask.array.mean.html).
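
To make the two approaches concrete, here is a hedged sketch on a small synthetic array. The helper ``block_mean_2x2``, the array size and the chunk sizes are assumptions made for illustration, not the notebook's actual implementation. Both variants halve the resolution by averaging non-overlapping 2 x 2 windows.

```python
import dask.array as da
import numpy as np


def block_mean_2x2(block: np.ndarray) -> np.ndarray:
    """Average non-overlapping 2 x 2 windows of one in-memory chunk."""
    rows, cols = block.shape
    return block.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))


# Synthetic single-band "raster". Chunk sizes are even, so that no
# 2 x 2 window straddles a chunk boundary.
values = np.arange(8 * 12, dtype="float32").reshape(8, 12)
image = da.from_array(values, chunks=(4, 6))

# Method 1: apply the NumPy helper to every chunk with map_blocks.
# Each output chunk is half the size of its input chunk.
out_chunks = tuple(tuple(c // 2 for c in dim) for dim in image.chunks)
with_map_blocks = da.map_blocks(
    block_mean_2x2, image, chunks=out_chunks, dtype=image.dtype
)

# Method 2: stack the four pixels of every window using strided slices
# and let dask.array.mean average them.
window_pixels = da.stack(
    [image[0::2, 0::2], image[1::2, 0::2], image[0::2, 1::2], image[1::2, 1::2]]
)
with_mean = da.mean(window_pixels, axis=0)

print(with_map_blocks.shape, with_mean.shape)  # (4, 6) (4, 6)
```

Both results stay lazy until ``compute()`` is called. Which variant runs faster depends on the chunking and on the task graph each one produces, which the dashboard helps to inspect.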

For this tutorial, we will use a Sentinel-2 acquisition stored on the local disk. The storage directory is set in the variable ``sentinel_2_dir``; users can modify this directory and the associated paths.

## Python scripts

### Import libraries

First, let's import the libraries needed for this tutorial and create our Dask [LocalCluster](https://docs.dask.org/en/stable/deploying-python.html#localcluster), which allows us to create workers and use [Dask's dashboard](https://docs.dask.org/en/latest/dashboard.html).


In [8]:
from pathlib import Path

from typing import List, Tuple, Union, Dict
import dask.array as da
import numpy as np
import rasterio
import rioxarray as rxr
from dask import delayed
from dask.distributed import Client, LocalCluster, Lock
from rasterio.transform import Affine

from utils import create_map_with_rasters

cluster = LocalCluster()
client = Client(cluster)

print("Dask Dashboard: ", client.dashboard_link)
client

Perhaps you already have a cluster running?
Hosting the HTTP server on port 36839 instead


Dask Dashboard:  http://127.0.0.1:36839/status


Client
  Connection method: Cluster object
  Cluster type: distributed.LocalCluster
  Dashboard: http://127.0.0.1:36839/status

LocalCluster
  Dashboard: http://127.0.0.1:36839/status
  Workers: 4, Total threads: 4, Total memory: 28.00 GiB
  Status: running, Using processes: True

Scheduler
  Comm: tcp://127.0.0.1:33599
  Dashboard: http://127.0.0.1:36839/status
  Workers: 4, Total threads: 4, Total memory: 28.00 GiB
  Started: Just now

Workers (1 thread and 7.00 GiB of memory each)
  Comm: tcp://127.0.0.1:35379, Dashboard: http://127.0.0.1:42093/status, Nanny: tcp://127.0.0.1:43625
    Local directory: /tmp/slurm-28574237/dask-scratch-space/worker-15rnueb2
  Comm: tcp://127.0.0.1:34907, Dashboard: http://127.0.0.1:43573/status, Nanny: tcp://127.0.0.1:34007
    Local directory: /tmp/slurm-28574237/dask-scratch-space/worker-p8x8_6ao
  Comm: tcp://127.0.0.1:44905, Dashboard: http://127.0.0.1:36541/status, Nanny: tcp://127.0.0.1:34321
    Local directory: /tmp/slurm-28574237/dask-scratch-space/worker-a5wqbi7k
  Comm: tcp://127.0.0.1:45347, Dashboard: http://127.0.0.1:46387/status, Nanny: tcp://127.0.0.1:42059
    Local directory: /tmp/slurm-28574237/dask-scratch-space/worker-sm2i26gx


Dask returns a URL where the dashboard is available (usually http://127.0.0.1:8787/status; in the run above another port was chosen because the default one was already in use). This is not a tutorial on how to use this dashboard, but we recommend keeping it open in a separate window while working through this notebook.

### Open raster thanks to rioxarray

Here we are going to open the raster data required for this tutorial, the red (B4) and near-infrared (B8) bands from a Sentinel-2 acquisition. To do this, we use rioxarray and, more specifically, the [open_rasterio](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function, which opens the images lazily (without loading the data into memory) and, when ``chunks`` is set, returns an ``xarray.DataArray`` backed by a `dask.array`.
We will use its ``chunks`` and ``lock`` arguments, which respectively set the chunk size and restrict access to the file to one thread at a time to avoid read problems. Here ``chunks`` is set to ``True`` to let Dask size the chunks automatically.


## Calculation of the Average NDVI with dask

In this example, we will use what we have learned to: 

1. Read the data from the disk and stack them.
2. Calculate the associated NDVI, which combines multi-band information into a single band.
3. Reduce the information by calculating the average NDVI within a window.
4. Write the resulting image to the disk.

First, let's read the data we need to compute the NDVI.

In [3]:
def open_raster_and_get_metadata(raster_paths: List[str], chunks: Union[int, Tuple, Dict, bool, None]):
    """
    Opens multiple raster files, extracts shared geospatial metadata, 
    and returns the concatenated data along with resolution and CRS info.

    Parameters:
    -----------
    raster_paths : List[str]
        Paths to the raster files.
    chunks : Union[int, Tuple, Dict, bool, None]
        Chunk sizes for Dask (bands, height, width).

    Returns:
    --------
    Tuple[dask.array.Array, float, float, float, float, Union[str, CRS]]
        Concatenated raster data, x and y resolution, top-left coordinates, and CRS.
    """
    results = []
    for raster_path in raster_paths:
        with rxr.open_rasterio(raster_path, chunks=chunks, lock=True) as tif:
            reprojection = tif
            transform = reprojection.rio.transform()
            crs = reprojection.rio.crs
            x_res = transform[0]
            y_res = -transform[4]
            top_left_x = transform[2]
            top_left_y = transform[5]
            results.append(reprojection)

    return da.concatenate(results), x_res, y_res, top_left_x, top_left_y, crs

# Paths to the red (B4) and near-infrared (B8) Sentinel-2 bands
sentinel_2_dir = "/work/scratch/data/romaint"
s2_b4 = f"{sentinel_2_dir}/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B4.tif"
s2_b8 = f"{sentinel_2_dir}/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B8.tif"
#reading_chunks = (-1,2200,2200)
reading_chunks = True
input_data_array, x_res, y_res, top_left_x, top_left_y, crs = open_raster_and_get_metadata([s2_b4,s2_b8], reading_chunks)

Once the data is opened, we can express the NDVI calculation as if we were working on a NumPy array. We add ``[None, :, :]`` to keep the shape as ``(bands, rows, cols)``. Then we can apply a reduction on the dask.array and call ``compute()`` on it to trigger the computation.
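
As a hedged illustration of these steps, the sketch below runs them on a small random stack. The values, the chunk sizes and the 2 x 2 window are assumptions, and ``dask.array.coarsen`` is one possible way to express the windowed average.

```python
import dask.array as da
import numpy as np

# Hypothetical stack of two bands, shape (bands, rows, cols): red first, NIR second.
# Values are strictly positive, so the denominator below is never zero.
rng = np.random.default_rng(0)
stack = rng.integers(1, 3000, size=(2, 8, 8)).astype("int16")
bands = da.from_array(stack, chunks=(1, 4, 4))

# Work in float32 so that the ratio is a true floating-point division.
red = bands[0].astype("float32")
nir = bands[1].astype("float32")

# Parenthesize the whole ratio before adding the leading band axis.
ndvi = ((nir - red) / (nir + red))[None, :, :]

# Average NDVI over non-overlapping 2 x 2 windows (axes 1 and 2 are rows and cols).
mean_ndvi = da.coarsen(np.mean, ndvi, {1: 2, 2: 2})

print(ndvi.shape, mean_ndvi.shape)  # (1, 8, 8) (1, 4, 4)
result = mean_ndvi.compute()        # the computation is triggered here
```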

In [None]:
input_data_array

In [None]:
print(input_data_array.shape)
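# Parenthesize the whole ratio so that [None, :, :] applies to the NDVI, not only to the denominator
ndvi_array = ((input_data_array[1] - input_data_array[0]) / (input_data_array[1] + input_data_array[0]))[None, :, :]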

In [None]:
%%time

mean_ndvi = ndvi_array.compute()

In [4]:
def create_raster(data: np.ndarray, output_file: Path, x_res, y_res, top_left_x, top_left_y, crs):
    transform = Affine.translation(top_left_x, top_left_y) * Affine.scale(x_res, -y_res)
    with rasterio.open(
            output_file, "w",
            driver="GTiff",
            height=data.shape[1],
            width=data.shape[2],
            count=data.shape[0],
            dtype=data.dtype,
            crs=crs,
            transform=transform
    ) as dst:
        dst.write(data)

crs="EPSG:4326"
output_file = Path("/home/tromain/Data/S2/ndvi_dask.tif")
#create_raster(ndvi_array, output_file, x_res , y_res,
#                  top_left_x, top_left_y, crs)

# Calculate NDVI with OTB in Python

In [9]:
import otbApplication as otb

sentinel_2_dir = "/work/scratch/data/romaint"
s2_b4 = f"{sentinel_2_dir}/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B4.tif"
s2_b8 = f"{sentinel_2_dir}/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B8.tif"
out_ndvi_otb_py="/work/scratch/data/romaint/img_ndvi_otb.tif"
#phr_product = "/work/scratch/data/tanguyy/public/PHR_OTB/Marmande/IMG_PHR1A_PMS_202304151100243_SEN_6967638101-1_R1C1_wnodata.tif"
phr_product = "/work/scratch/data/romaint/phr_cog.tif"
#Compute NDVI with OTB in python
app_ndvi_otb = otb.Registry.CreateApplication("BandMath")
#app_ndvi_otb.SetParameterStringList("il",[s2_b4,s2_b8])
#app_ndvi_otb.SetParameterString("exp","(im2b1-im1b1)/(im2b1+im1b1)")
app_ndvi_otb.SetParameterStringList("il",[phr_product])
app_ndvi_otb.SetParameterString("exp","(im1b4-im1b1)/(im1b4+im1b1)")
app_ndvi_otb.SetParameterString("out",out_ndvi_otb_py)
app_ndvi_otb.ExecuteAndWriteOutput()




Writing /work/scratch/data/romaint/img_ndvi_otb.tif...: 100% [**************************************************] (1m 45s)


0

# Calculate NDVI with OTB in C++
This part calls BandMath through the OTB command-line interface (`otbcli_BandMath`) in order to compare its performance with the Python SWIG bindings used above.

In [None]:
%%bash
otbcli_BandMath -il "/work/scratch/data/romaint/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B4.tif" "/work/scratch/data/romaint/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B8.tif" -exp "( im2b1 - im1b1 ) / ( im2b1 + im1b1 )" -out "/work/scratch/data/romaint/img_ndvi_otb_cpp.tif" 

# Compute SuperImpose with OTB

Superimpose resamples the image given as ``inm`` and then crops it, producing a new raster with the same resolution as the reference image given as ``inr``. This application makes heavy use of multithreading, which makes it interesting to compare with a solution based on Dask and pure Python / rasterio.

In [None]:
import otbApplication as otb

#product_dir = "/work/scratch/data/romaint"
pan_raster_url = "/work/scratch/data/tanguyy/public/PHR_OTB/IMG_PHR1B_P_001/IMG_PHR1B_P_202302281104151_SEN_6967639101-1_R1C1.JP2"
xs_raster_url = "/work/scratch/data/tanguyy/public/PHR_OTB/IMG_PHR1B_MS_004/IMG_PHR1B_MS_202302281104151_SEN_6967639101-2_R1C1.JP2"
appSI = otb.Registry.CreateApplication("Superimpose")

appSI.SetParameterString("inr", pan_raster_url)
appSI.SetParameterString("inm", xs_raster_url)
appSI.SetParameterString("out", "/work/scratch/data/romaint/SuperimposedXS_to_PAN.tif")

appSI.ExecuteAndWriteOutput()

# Performance improvement with the xarray.coarsen method?

In [None]:
%%time

import xarray
from rioxarray.merge import merge_arrays

pan_raster_url = "/home/tromain/Data/phr_pan.tif"
xs_raster_url = "/home/tromain/Data/phr_xs.tif"
origin_raster = rxr.open_rasterio(pan_raster_url,chunks=True)
#.squeeze('band', drop=True)
print(origin_raster)
#origin_raster = origin_raster.load()
pxs_raster = origin_raster.coarsen(x=4, y=4, boundary='pad').mean()
xs_raster = rxr.open_rasterio(xs_raster_url,chunks=True)
print(pxs_raster)
#print("X == X? {} Y==Y? {}".format((origin_raster.x == xs_raster.x),(origin_raster.y == xs_raster.y)))
print(all(list(origin_raster.x == xs_raster.x)))
#ppxs_raster = xarray.concat([pxs_raster,xs_raster],dim="band",coords="all")
#ppxs_raster = merge_arrays([pxs_raster,xs_raster])
crs="EPSG:4326"
output_file = Path("/home/tromain/Data/superimpose_coarsen.tif")
pxs_raster.rio.to_raster(output_file)

# SuperImpose using rioxarray reproject_match

In [None]:
%%time

import xarray
from pathlib import Path

pan_raster_url = "/work/scratch/data/tanguyy/public/PHR_OTB/IMG_PHR1B_P_001/IMG_PHR1B_P_202302281104151_SEN_6967639101-1_R1C1.JP2"
xs_raster_url = "/work/scratch/data/tanguyy/public/PHR_OTB/IMG_PHR1B_MS_004/IMG_PHR1B_MS_202302281104151_SEN_6967639101-2_R1C1.JP2"

xds = xarray.open_dataarray(pan_raster_url,chunks=True)
xds.rio.write_crs("epsg:4326", inplace=True)
xds.rio.write_nodata(0, inplace=True)
xds_match = xarray.open_dataarray(xs_raster_url,chunks=True)
print(xds.shape)
print(xds.dtype)
print(xds_match.shape)
xds_match.rio.write_crs("epsg:4326", inplace=True)
xds_match.rio.write_nodata(0, inplace=True)
xds_repr_match = xds.rio.reproject_match(xds_match)
crs="EPSG:4326"
output_file = Path("/work/scratch/data/romaint/superimpose_reproject_match.tif")
xds_repr_match.rio.to_raster(output_file)

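# Using numba vs xarray vs numpy for NDVI?
Here we compare several ways of computing the NDVI: NumPy with and without numba, xarray's ``apply_ufunc``, and rioxarray with Dask. Note that the first call to an ``@njit`` function includes its compilation time, so a single timed call overstates the steady-state cost of the numba version.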

In [23]:
%%time

from numba import jit,njit
from xarray import DataArray 
from typing import List, Tuple, Union, Dict
import rioxarray as rxr
import numpy as np
import time
import rasterio
from pathlib import Path
import xarray

@njit
def one_pixel_ndvi(p1,p2):
    return (p2-p1) / (p2+p1) 

@njit
def compute_ndvi_numba(input_data_1: np.ndarray,input_data_2: np.ndarray):
    #ndvi_array = [one_pixel_ndvi(i,j) for i in input_data_1 for j in input_data_2]
    ndvi_array = (input_data_2 - input_data_1) / (input_data_2 + input_data_1)
    return ndvi_array

def compute_ndvi_dask(input_data_1: DataArray,input_data_2: DataArray):
    ndvi_array = ((input_data_2 - input_data_1) / (input_data_2 + input_data_1))[None,:,:]
    # compute() returns a new, materialized object: return it instead of discarding it
    return ndvi_array.compute()

def compute_ndvi_std(input_data_1: np.ndarray,input_data_2: np.ndarray):
    #ndvi_array = [one_pixel_ndvi(i,j) for i in input_data_1 for j in input_data_2]
    ndvi_array = (input_data_2 - input_data_1) / (input_data_2 + input_data_1)
    return ndvi_array

sentinel_2_dir = "/work/scratch/data/romaint"
s2_b4 = f"{sentinel_2_dir}/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B4.tif"
s2_b8 = f"{sentinel_2_dir}/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1/SENTINEL2B_20240822-105857-973_L2A_T31TCJ_C_V3-1_FRE_B8.tif"
phr_product = "/work/scratch/data/tanguyy/public/PHR_OTB/Marmande/IMG_PHR1A_PMS_202304151100243_SEN_6967638101-1_R1C1_wnodata.tif" 

start = time.perf_counter()
with rasterio.open(s2_b4, 'r') as ds:
    input_data_b4 = ds.read() 

with rasterio.open(s2_b8, 'r') as ds:
    input_data_b8 = ds.read()
   
ndvi_computed = compute_ndvi_numba(input_data_b4,input_data_b8)
crs="EPSG:4326"
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_numba.tif")
#create_raster(ndvi_computed, output_file, x_res , y_res, top_left_x, top_left_y, crs)
end = time.perf_counter()
print("Elapsed with Raster IO + numba = {}s".format((end - start)))

start = time.perf_counter()
with rasterio.open(s2_b4, 'r') as ds:
    input_data_b4 = ds.read() 

with rasterio.open(s2_b8, 'r') as ds:
    input_data_b8 = ds.read()

ndvi_computed = compute_ndvi_std(input_data_b4,input_data_b8)
crs="EPSG:4326"
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_without_numba.tif")
#create_raster(ndvi_computed, output_file, x_res , y_res,top_left_x, top_left_y, crs)
end = time.perf_counter()
print("Elapsed with Raster IO + without numba = {}s".format((end - start)))
# Example with xarray
start = time.perf_counter()
input_data_b4 =  xarray.open_dataarray(s2_b4)
input_data_b8 =  xarray.open_dataarray(s2_b8)
ndvi_computed = xarray.apply_ufunc(compute_ndvi_std,input_data_b4,input_data_b8)
ndvi_computed.rio.to_raster("/work/scratch/data/romaint/output_greenit/ndvi_ufunc.tif")
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_ufunc.tif")
end = time.perf_counter()
print("Elapsed with Xarray + apply ufunc = {}s".format((end - start)))

start = time.perf_counter()
input_data_b4 = rxr.open_rasterio(s2_b4,chunks=True)
input_data_b8 = rxr.open_rasterio(s2_b8,chunks=True)
ndvi_array = (input_data_b8 - input_data_b4) / (input_data_b8 + input_data_b4)
ndvi_array.compute()
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_std.tif")
ndvi_array.rio.to_raster(output_file)
end = time.perf_counter()
print("Elapsed with RIOXarray + dask = {}s".format((end - start)))

#phr_product = "/work/scratch/data/tanguyy/public/PHR_OTB/Marmande/IMG_PHR1A_PMS_202304151100243_SEN_6967638101-1_R1C1_wnodata.tif"
phr_product = "/work/scratch/data/romaint/phr_cog.tif"
start = time.perf_counter()
reading_chunks = True
input_data_array = rxr.open_rasterio(phr_product,chunks=reading_chunks,lock=False)
print(input_data_array.shape)
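# Cast to float32 first: the bands are uint16, so the subtraction would wrap around wherever NIR < red
nir = input_data_array[3].astype("float32")
red = input_data_array[0].astype("float32")
ndvi_phr = (nir - red) / (nir + red)
# Caution: compute() gathers the whole result in the client's memory and its return value is discarded here;
# to_raster() below triggers the computation a second time
ndvi_phr.compute()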
print(ndvi_phr.shape)
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_phr.tif")
ndvi_phr.rio.to_raster(output_file,tiled=True)
end = time.perf_counter()
print("Elapsed with RIOXarray + dask + big product = {}s".format((end - start)))

Elapsed with Raster IO + numba = 1.2825823966413736s
Elapsed with Raster IO + without numba = 1.1891403198242188s
Elapsed with Xarray + apply ufunc = 2.707308219745755s
Elapsed with RIOXarray + dask = 5.868392776697874s
(4, 41663, 39844)


2024-11-14 15:29:30,639 - distributed.core - ERROR - Exception while handling op get_data
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/zict/buffer.py", line 184, in __getitem__
    return self.fast[key]
  File "/usr/local/lib/python3.10/dist-packages/zict/common.py", line 127, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/zict/lru.py", line 117, in __getitem__
    result = self.d[key]
KeyError: "('truediv-f9d8cf4534ba5caa42181c228e4602cd', 0, 2)"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/core.py", line 820, in _handle_comm
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/distributed/worker.py", line 1734, in get_data
    data = {k: self.data[k] for k in keys if k in self.data}
  File "/usr/local/lib/python3.10/dist-packages/distributed/worker.py", line 1734,

Exception: AssertionError()

2024-11-14 15:29:39,214 - distributed.worker - ERROR - 
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/zict/buffer.py", line 184, in __getitem__
    return self.fast[key]
  File "/usr/local/lib/python3.10/dist-packages/zict/common.py", line 127, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/zict/lru.py", line 117, in __getitem__
    result = self.d[key]
KeyError: "('truediv-f9d8cf4534ba5caa42181c228e4602cd', 0, 2)"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/worker.py", line 191, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/distributed/worker.py", line 1936, in handle_stimulus
    super().handle_stimulus(*stims)
  File "/usr/local/lib/python3.10/dist-packages/distributed/worker_state_machine.py", line 3659, in handle_stimulus
    in

In [None]:
%%time

from numba import jit,njit
from xarray import DataArray 
from typing import List, Tuple, Union, Dict
import rioxarray as rxr
import numpy as np
import time
import rasterio
from pathlib import Path
import xarray

phr_product = "/work/scratch/data/romaint/phr_cog.tif"
start = time.perf_counter()
reading_chunks = True
input_data_array = rxr.open_rasterio(phr_product,chunks=reading_chunks,lock=False)
print(input_data_array.shape)
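# Cast to float32 first: the bands are uint16, so the subtraction would wrap around wherever NIR < red
nir = input_data_array[3].astype("float32")
red = input_data_array[0].astype("float32")
ndvi_phr = (nir - red) / (nir + red)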
#ndvi_phr.compute()
print(ndvi_phr.shape)
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_phr.tif")
ndvi_phr.rio.to_raster(output_file,tiled=True)
end = time.perf_counter()
print("Elapsed with RIOXarray + dask + big product = {}s".format((end - start)))

(4, 41663, 39844)
(41663, 39844)




In [4]:
input_data_array

| | Array | Chunk |
|---|---|---|
| Bytes | 12.37 GiB | 128.00 MiB |
| Shape | (4, 41663, 39844) | (1, 8192, 8192) |
| Dask graph | 120 chunks in 2 graph layers | |
| Data type | uint16 numpy.ndarray | |


# Parallel write

In [5]:
from xarray import DataArray 
from typing import List, Tuple, Union, Dict
import rioxarray as rxr
import numpy as np
import time
import rasterio
from pathlib import Path
import xarray

#phr_product = "/work/scratch/data/romaint/phr_cog.tif"
phr_product = "/work/scratch/data/tanguyy/public/PHR_OTB/Marmande/IMG_PHR1A_PMS_202304151100243_SEN_6967638101-1_R1C1_wnodata.tif"
reading_chunks = (-1,8192,8192)
input_data_array = rxr.open_rasterio(phr_product,chunks=reading_chunks)
#input_data_array, x_res, y_res, top_left_x, top_left_y, crs = open_raster_and_get_metadata([phr_product], reading_chunks)

In [6]:
input_data_array

| | Array | Chunk |
|---|---|---|
| Bytes | 12.37 GiB | 512.00 MiB |
| Shape | (4, 41663, 39844) | (4, 8192, 8192) |
| Dask graph | 30 chunks in 2 graph layers | |
| Data type | uint16 numpy.ndarray | |


In [7]:
start = time.perf_counter()
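# Cast to float32 first: the bands are uint16, so the subtraction would wrap around wherever NIR < red
nir = input_data_array[3].astype("float32")
red = input_data_array[0].astype("float32")
ndvi_phr = (nir - red) / (nir + red)
# Caution: compute() gathers the whole result in the client's memory and its return value is discarded here;
# to_raster() below triggers the computation a second time
ndvi_phr.compute()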
print(ndvi_phr.shape)
output_file = Path("/work/scratch/data/romaint/output_greenit/ndvi_phr.tif")
#create_raster(ndvi_computed, output_file, x_res , y_res, top_left_x, top_left_y, crs)
ndvi_phr.rio.to_raster(output_file)
end = time.perf_counter()
print("Elapsed with RIOXarray + dask + big product with write = {}s".format((end - start)))


(41663, 39844)




Elapsed with RIOXarray + dask + big product with write = 106.9249369930476s


# Conclusions and recommendations about parallel write


# Performances multiprocessing example


# Optimizing the size of your data

In [None]:
# TODO: using COG (Cloud Optimized GeoTIFF)?

# Estimate the carbon impact of your code
Using CodeCarbon, you can get an estimate of the carbon footprint of your code.

In [6]:
from codecarbon import track_emissions


ModuleNotFoundError: No module named 'codecarbon'