# Pre-Process global _daily_ SST using `hot_to_blOb` to extract binary features

## Steps:
1. Compute Normalised Detrended Anomaly (cf. `hot_to_blOb.py::compute_normalised_anomaly()`)
2. Identify Extreme Values (i.e. values above the 95th percentile)

N.B.: Exploits parallelised `Dask` operations with optimised chunking using `flox` \
N.N.B.: This example, using 40 years of daily output at 0.25° resolution, takes ~4 minutes on 128 cores (64 workers × 2 threads)
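
The two steps above can be illustrated on a single synthetic time series. This is a minimal NumPy sketch under stated assumptions, not the implementation in `hot_to_blOb.py`: the function names `detrended_anomaly` and `extreme_mask` are hypothetical, the fitted model (mean, linear trend, annual and semi-annual harmonics) is an assumed choice, and the optional division by the standard deviation is left out, matching `std_normalise=False` in the call used later.

```python
import numpy as np

def detrended_anomaly(series, t_years):
    """Remove the mean, a linear trend, and annual + semi-annual harmonics
    with an ordinary least-squares fit (assumed model, for illustration only)."""
    omega = 2.0 * np.pi  # one cycle per year when time is measured in years
    design = np.column_stack([
        np.ones_like(t_years),        # mean
        t_years,                      # linear trend
        np.sin(omega * t_years),      # annual cycle
        np.cos(omega * t_years),
        np.sin(2 * omega * t_years),  # semi-annual cycle
        np.cos(2 * omega * t_years),
    ])
    coeffs, *_ = np.linalg.lstsq(design, series, rcond=None)
    return series - design @ coeffs

def extreme_mask(anomaly, percentile=95.0):
    """Return a boolean mask of values above the given percentile, plus the threshold."""
    threshold = np.percentile(anomaly, percentile)
    return anomaly > threshold, threshold

# Synthetic daily "SST": warming trend + seasonal cycle + noise (made-up values)
rng = np.random.default_rng(0)
n_days = 10 * 365
t_years = np.arange(n_days) / 365.0
sst_series = (15.0 + 0.02 * t_years
              + 3.0 * np.sin(2 * np.pi * t_years)
              + rng.normal(0.0, 0.5, n_days))

anomaly = detrended_anomaly(sst_series, t_years)
extremes, threshold = extreme_mask(anomaly, percentile=95.0)
print(f"threshold = {threshold:.3f}, fraction flagged = {extremes.mean():.3f}")
```

In the notebook the same idea is applied to every grid cell in parallel, so the result is a boolean array with the same shape as the input.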

In [1]:
import xarray as xr
import dask
import intake
from getpass import getuser
from pathlib import Path

import spot_the_blOb.hot_to_blOb as hot
import spot_the_blOb.helper as hpc

In [2]:
# Start Dask Cluster
client = hpc.StartLocalCluster(n_workers=64, n_threads=2)

Memory per Worker: 7.87 GB
Hostname is  l40247
Forward Port = l40247:8787
Dashboard Link: localhost:8787/status


In [3]:
# Import 40 years of Daily EERIE ICON data

cat = intake.open_catalog("https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/eerie.yaml")
expid = 'eerie-control-1950'
version = 'v20231106'
model = 'icon-esm-er'
gridspec = 'gr025'

dat = cat['dkrz.disk.model-output'][model][expid][version]['ocean'][gridspec]

In [None]:
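# Load the data directly into optimal chunks

# Open lazily and select the `to` variable at the first depth level.
# This test array is only used to derive the time chunk sizes.
da_predictor = dat['2d_daily_mean'](chunks={}).to_dask()['to'].isel(depth=0).drop_vars('depth')
time_chunk = hot.rechunk_for_cohorts(da_predictor).chunks[0]  # chunk sizes along the first (time) dimension

# Re-open the data with those time chunks
sst = dat['2d_daily_mean'](chunks={'time': time_chunk}).to_dask()['to'].isel(depth=0).drop_vars('depth')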

In [None]:
# Process Data using `hot_to_blOb` helper functions:

extreme_events_ds = hot.preprocess_data(sst, std_normalise=False, threshold_percentile=95)
extreme_events_ds

|  | Array | Chunk |
|---|---|---|
| Bytes | 107.36 GiB | 198.03 MiB |
| Shape | (13879, 721, 1440) | (25, 721, 1440) |
| Dask graph | 556 chunks in 19 graph layers | |
| Data type | float64 numpy.ndarray | |

|  | Array | Chunk |
|---|---|---|
| Bytes | 0.99 MiB | 0.99 MiB |
| Shape | (721, 1440) | (721, 1440) |
| Dask graph | 1 chunks in 6 graph layers | |
| Data type | bool numpy.ndarray | |

|  | Array | Chunk |
|---|---|---|
| Bytes | 13.42 GiB | 24.75 MiB |
| Shape | (13879, 721, 1440) | (25, 721, 1440) |
| Dask graph | 556 chunks in 30 graph layers | |
| Data type | bool numpy.ndarray | |
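
The sizes reported in the output above follow directly from the array shape, the time chunk length, and the element size. The short check below reproduces them with plain arithmetic. The variable names are illustrative only, and the 8-byte and 1-byte element sizes are the standard NumPy sizes for `float64` and `bool`.

```python
import math

# Shape and chunk length taken from the output above
n_time, n_lat, n_lon = 13879, 721, 1440
time_chunk = 25
cells = n_lat * n_lon  # grid cells per time step

def gib(n_bytes):
    return n_bytes / 2**30

def mib(n_bytes):
    return n_bytes / 2**20

n_chunks = math.ceil(n_time / time_chunk)

# float64 array: 8 bytes per element
float_total_gib = gib(n_time * cells * 8)
float_chunk_mib = mib(time_chunk * cells * 8)

# boolean arrays: 1 byte per element
bool_total_gib = gib(n_time * cells * 1)
bool_chunk_mib = mib(time_chunk * cells * 1)
mask_mib = mib(cells * 1)  # 2D (lat, lon) boolean array

print(f"chunks along time: {n_chunks}")
print(f"float64: {float_total_gib:.2f} GiB total, {float_chunk_mib:.2f} MiB per chunk")
print(f"bool:    {bool_total_gib:.2f} GiB total, {bool_chunk_mib:.2f} MiB per chunk")
print(f"2D mask: {mask_mib:.2f} MiB")
```

Each `float64` chunk of about 198 MiB is small relative to the 7.87 GB available per worker, which leaves room for several chunks and intermediate results in memory at once.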


In [None]:
# Save data to `zarr` for more efficient parallel I/O

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary.zarr'
extreme_events_ds.to_zarr(file_name, mode='w')