<img src="https://gateway.dask.org/_static/images/dask-horizontal-white.svg"
     alt="Dask Logo"
     style="margin-right: 10px; width: 50%" />
# Distributed computing with Dask

EODC offers Dask as a service by utilising [Dask Gateway](https://gateway.dask.org/). Users can launch a Dask cluster in a shared and managed cluster environment without needing direct access to any cloud infrastructure resources such as VMs or Kubernetes clusters. The objective is to lower the barrier to entry for users who want to run large-scale data analysis on demand in a scalable environment.

A general introduction to Dask Gateway can be found in the official [Dask Gateway documentation](https://gateway.dask.org/usage.html). The following sections demonstrate how to use the Dask service at EODC.

The prerequisite is to have Dask Gateway installed in your environment:
```bash
pip install dask-gateway
```
or 
```bash
conda install -c conda-forge dask-gateway
```

It is important to note that the Python environment running the code and the environment used by Dask Gateway must match closely, in particular the Python version and the versions of core packages such as `dask` and `distributed`.
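To compare the two environments, it helps to know which versions are installed locally. The following is a minimal sketch using only the standard library. The helper name `local_versions` and the default package list are assumptions made for illustration, so adjust the list to the packages your workflow depends on.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def local_versions(packages=("dask", "distributed", "dask-gateway")):
    """Return the local Python version and the installed version of each package.

    Packages that are not installed are reported as None.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Print one line per entry so the result is easy to compare by eye.
for name, installed in local_versions().items():
    print(f"{name}: {installed}")
```

The printed versions can then be checked against the image used by the cluster. Once a client is connected, `client.get_versions(check=True)` from `distributed` can additionally be used to compare client, scheduler and workers.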

We will install some additional packages used in this demo afterwards.

## Authentication via OIDC password grant flow
Access to the EODC Dask service is granted to authenticated users only. A helper class that authenticates a user against the EODC identity management system is therefore implemented in the [EODC SDK](https://github.com/eodcgmbh/eodc-sdk).
The user's password is handed directly to the request object and is not stored.
When the access token has expired, the authenticator automatically uses the refresh token to request a new one.
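The refresh behaviour described above can be sketched as follows. This is a hypothetical illustration of the general pattern, not the class shipped in the EODC SDK: `TokenSet`, `TokenManager` and the two `fake_*` functions are invented names, and the fake functions stand in for the HTTP calls to the identity provider's token endpoint.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # absolute expiry time in seconds since the epoch


class TokenManager:
    """Hypothetical illustration of password grant plus automatic refresh."""

    def __init__(
        self,
        password_grant: Callable[[str, str], TokenSet],
        refresh_grant: Callable[[str], TokenSet],
        clock: Callable[[], float] = time.time,
    ):
        self._password_grant = password_grant
        self._refresh_grant = refresh_grant
        self._clock = clock
        self._tokens: Optional[TokenSet] = None

    def login(self, username: str, password: str) -> None:
        # The password is forwarded to the grant call and never kept on self.
        self._tokens = self._password_grant(username, password)

    def access_token(self) -> str:
        if self._tokens is None:
            raise RuntimeError("login() must be called first")
        if self._clock() >= self._tokens.expires_at:
            # Access token expired: trade the refresh token for a new token set.
            self._tokens = self._refresh_grant(self._tokens.refresh_token)
        return self._tokens.access_token


# Stand-ins for the two requests to the token endpoint (assumed, not real calls).
def fake_password_grant(username: str, password: str) -> TokenSet:
    return TokenSet("access-1", "refresh-1", expires_at=100.0)


def fake_refresh_grant(refresh_token: str) -> TokenSet:
    return TokenSet("access-2", "refresh-2", expires_at=200.0)
```

Passing the grant functions and the clock in as arguments keeps the sketch independent of any particular HTTP library and makes the expiry logic easy to exercise without a real identity provider.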

## Connect to EODC Dask

Authenticating and connecting to EODC Dask can be done with a few lines of Python code.

Run the following in order to make sure all dependencies are met.

In [27]:
from eodc.dask import EODCDaskGateway
from rich.console import Console
from rich.prompt import Prompt
console = Console()
your_username = Prompt.ask(prompt="Enter your Username")
gateway = EODCDaskGateway(username=your_username)


## Change Cluster configuration if needed

In [2]:
cluster_options = gateway.cluster_options()
cluster_options

Options<worker_cores=2,
        worker_memory=2.0,
        image='registry.eodc.eu/eodc/clusters/dedl-deployment/dedl-dask:2023.08.3'>
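The options can also be changed in code instead of through the widget. Dask Gateway option objects accept plain attribute assignment, for example `cluster_options.worker_cores = 4`. The sketch below wraps this in a small hypothetical helper, `override_options`, that rejects misspelled field names. It runs against a stand-in object here, and the values 4 and 8.0 are placeholders, since the limits accepted by the EODC deployment are not stated in this document.

```python
from types import SimpleNamespace


def override_options(options, **overrides):
    """Set fields on an options object, rejecting names it does not expose."""
    for name, value in overrides.items():
        if not hasattr(options, name):
            raise AttributeError(f"unknown cluster option: {name!r}")
        setattr(options, name, value)
    return options


# Stand-in for the object returned by gateway.cluster_options(); the field
# names mirror the output shown above.
demo_options = SimpleNamespace(worker_cores=2, worker_memory=2.0)
override_options(demo_options, worker_cores=4, worker_memory=8.0)
print(demo_options)
```

With a live gateway, the same call would be `override_options(cluster_options, worker_cores=4)`, assuming the options object raises `AttributeError` for unknown fields as the Dask Gateway client does.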


## Create a Dask Cluster

Now we are going to create a Dask cluster in order to run compute jobs.
To communicate with the cluster we have to instantiate a client as well.
By default, no worker nodes are spawned. Workers can be added manually or by enabling adaptive scaling of the cluster.

**Important: Please use the widget to add/scale the Dask workers. By default no worker is spawned, therefore no computations can be performed by the cluster.**

In [None]:
cluster = gateway.new_cluster(cluster_options)
client = cluster.get_client()
cluster


If you want to spawn workers adaptively from Python instead, use the following method call. It scales the cluster to 2 workers initially.
Depending on the load, Dask will add additional workers, up to a maximum of 5.

In [30]:
cluster.adapt(minimum=2, maximum=5)
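For a fixed number of workers instead of adaptive scaling, the cluster can be scaled manually. The sketch below combines `cluster.scale` from Dask Gateway with `client.wait_for_workers` from `distributed`, so that computations are only submitted once the workers have joined. The helper name `scale_and_wait` and the 300 second default timeout are assumptions made for illustration.

```python
def scale_and_wait(cluster, client, n_workers, timeout=300):
    """Request a fixed number of workers and block until they have joined."""
    # Ask the gateway for exactly n_workers workers.
    cluster.scale(n_workers)
    # Block until the scheduler reports that many workers, or the timeout expires.
    client.wait_for_workers(n_workers, timeout=timeout)


# Usage with the objects created above:
# scale_and_wait(cluster, client, 2)
```

Waiting explicitly avoids submitting work to a cluster that has no workers yet, which is the default state after creation.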

## List clusters if available

In [31]:
console.print(gateway.list_clusters())

We can connect to already running clusters again.

In [6]:
cluster = gateway.connect(gateway.list_clusters()[0].name)
console.print(cluster)
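Indexing `gateway.list_clusters()[0]` raises an `IndexError` when no cluster is running. A minimal sketch of a more defensive variant is shown below. The helper name `get_or_create_cluster` is hypothetical, and it only relies on the `list_clusters`, `connect` and `new_cluster` methods already used in this notebook.

```python
def get_or_create_cluster(gateway, options=None):
    """Reconnect to the first running cluster, or start a new one if none exist."""
    running = gateway.list_clusters()
    if running:
        # Reuse an existing cluster instead of paying the start-up cost again.
        return gateway.connect(running[0].name)
    return gateway.new_cluster(options)


# Usage with the objects created above:
# cluster = get_or_create_cluster(gateway, cluster_options)
```

Reusing a running cluster also avoids leaving several idle clusters behind after repeated notebook runs.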

## Display Dask Dashboard to monitor execution of computations
Copy the following link into a browser of your choice. Note that the dashboard URL uses http, not https.

In [7]:
cluster.dashboard_link

'http://dask.dev.services.eodc.eu/clusters/dask-gateway.50aa991596c6427a8668d1a52ed61fdb/status'

## Implementing the 01_local_dask notebook with EODC Dask Gateway

In [20]:
import pystac_client
from odc import stac as odc_stac
import xarray as xr
import rioxarray
import numpy as np
#import hvplot.xarray
import zipfile
from pathlib import Path
import shutil

In [21]:
chunks = {"time": 1, "latitude": 1300, "longitude": 1300}
crs = "EPSG:4326"  # Coordinate Reference System: World Geodetic System 1984 (WGS84)
res = 0.00018  # approximately 20 m expressed in degrees
time_range = "2022-10-11/2022-10-25"
minlon, maxlon = 12.3, 13.1
minlat, maxlat = 54.3, 54.6
bounding_box = [minlon, minlat, maxlon, maxlat]
eodc_catalog = pystac_client.Client.open("https://stac.eodc.eu/api/v1")
search = eodc_catalog.search(
    collections="SENTINEL1_SIG0_20M",
    bbox=bounding_box,
    datetime=time_range,
)

items_sig0 = search.item_collection()
items_sig0
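The resolution `res = 0.00018` used in the cell above is commented as 20 m expressed in degrees. The sketch below shows where such a number can come from, assuming a spherical approximation with the WGS84 equatorial radius. The function names are invented for illustration, and the latitude 54.45 is simply the centre of the bounding box.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6_378_137.0
# Length of one degree of arc on a sphere of that radius, about 111 km.
METERS_PER_DEGREE = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360


def meters_to_degrees(meters):
    """Convert a ground distance to degrees of arc (spherical approximation)."""
    return meters / METERS_PER_DEGREE


def east_west_meters(degrees, latitude_deg):
    """Approximate east-west ground distance of a longitude step at a latitude."""
    return degrees * METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))


res = round(meters_to_degrees(20), 5)
print(res)
print(east_west_meters(res, 54.45))
```

Because meridians converge towards the poles, the same step in degrees covers a shorter east-west distance at the latitude of the study area than it does north-south, so pixels defined in degrees are not square on the ground.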

def extract_orbit_names(items):
    # Label each item with the first letter of its orbit state (upper case)
    # followed by its relative orbit number
    return np.array([
        item.properties["sat:orbit_state"][0].upper()
        + str(item.properties["sat:relative_orbit"])
        for item in items
    ])


# See https://github.com/TUW-GEO/dask-flood-mapper.git
def post_process_eodc_cube(dc: xr.Dataset, items, bands):
    # Accept a single band name as well as a tuple of band names
    if not isinstance(bands, tuple):
        bands = tuple([bands])
    for i in bands:
        dc[i] = post_process_eodc_cube_(dc[i], items, i)
    return dc

def post_process_eodc_cube_(dc: xr.DataArray, items, band):
    # Scale factor and nodata value are read from the raster metadata of the first item
    scale = items[0].assets[band].extra_fields.get('raster:bands')[0]['scale']
    nodata = items[0].assets[band].extra_fields.get('raster:bands')[0]['nodata']
    # Mask nodata pixels as NaN, then divide the stored values by the scale factor
    return dc.where(dc != nodata) / scale

In [23]:
bands = "VV"
sig0_dc = odc_stac.load(items_sig0,
                        bands=bands,
                        crs=crs,
                        chunks=chunks,
                        resolution=res,
                        bbox=bounding_box,
                        resampling="bilinear",
                        groupby=None,
                        )

In [24]:
sig0_dc_ = post_process_eodc_cube(sig0_dc, items_sig0, bands).\
    rename_vars({ "VV": "sig0"}).\
    assign_coords(orbit=("time", extract_orbit_names(items_sig0))).\
    dropna(dim="time", how="all").\
    sortby("time")

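In [39]:
# client.compute() returns a Future, which has no xarray attributes such as .time.
# persist() starts the computation on the cluster and returns a Dataset instead.
sig0_dc = sig0_dc_.persist()

In [40]:
__, indices = np.unique(sig0_dc.time, return_index=True)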

In [None]:
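from dask.distributed import wait

# Indices of the first occurrence of each timestamp, in ascending order
indices.sort()
orbit_sig0 = sig0_dc.orbit[indices].data
# Average observations that share a timestamp, ignoring NaN pixels
sig0_dc = sig0_dc.groupby("time").mean(skipna=True)
sig0_dc = sig0_dc.assign_coords(orbit=("time", orbit_sig0))
# Keep the result in cluster memory and block until it has been computed
sig0_dc = sig0_dc.persist()
wait(sig0_dc)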
sig0_dc