# Global Daily SST Analysis: Identifying Marine Extremes with `MarEx-Detect`

### `MarEx-Detect` Processing Pipeline:

1. **Anomaly Generation**
   - Removes polynomial trends (user-configurable orders)
   - Eliminates seasonal cycle via annual and semi-annual harmonics
   - Optionally standardises by day-of-year temporal variability

2. **Extreme Event Identification**
   - Computes adaptive local thresholds using a percentile-based approach
   - Creates boolean masks identifying extreme events
   - Uses histogram-based approximation for efficiency on large datasets

3. **Results Assembly**
   - Attaches spatial metadata (connectivity, cell areas) if provided
   - Optimises chunking for subsequent analyses
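
As a rough illustration of step 1, the sketch below fits a polynomial trend plus annual and semi-annual harmonics to a single synthetic time series by least squares and subtracts the fit. It is a minimal NumPy example, not the `MarEx-Detect` implementation: the function name `remove_trend_and_season`, the 365.25-day period, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def remove_trend_and_season(t_days, y, poly_order=1, period=365.25):
    """Fit a polynomial trend plus annual and semi-annual harmonics by
    least squares and return the residual anomaly (hypothetical helper)."""
    # Centred time in years keeps the polynomial columns well-conditioned
    t = (t_days - t_days.mean()) / period
    columns = [t**k for k in range(poly_order + 1)]   # 1, t, t^2, ...
    for harmonic in (1, 2):                           # annual, semi-annual
        columns.append(np.sin(2.0 * np.pi * harmonic * t))
        columns.append(np.cos(2.0 * np.pi * harmonic * t))
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coeffs

# Synthetic series: linear warming + seasonal cycle + white noise
rng = np.random.default_rng(0)
t_days = np.arange(20 * 365, dtype=float)
noise = rng.normal(0.0, 0.3, t_days.size)
y = 15.0 + 1e-4 * t_days + 3.0 * np.sin(2.0 * np.pi * t_days / 365.25) + noise

anomaly = remove_trend_and_season(t_days, y)
```

After the fit is removed, `anomaly` should contain little more than the injected noise. The optional standardisation would additionally divide by a day-of-year estimate of the standard deviation.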

The pipeline leverages **dask** for distributed parallel computation and **flox** for optimised groupby operations, enabling efficient processing of large datasets. \
A 40-year global daily analysis at 0.25° resolution completes in ~4 minutes on 128 cores.
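
The histogram-based approximation mentioned in step 2 can be illustrated as follows: rather than sorting every sample, values are binned once and the percentile is read off the cumulative histogram. This is a sketch under stated assumptions, not the library's code; the helper `histogram_percentile`, the bin range, and the bin width are all hypothetical.

```python
import numpy as np

def histogram_percentile(values, q, bin_edges):
    """Approximate the q-th percentile from a fixed-bin histogram
    (hypothetical helper; the accuracy is limited by the bin width)."""
    counts, edges = np.histogram(values, bins=bin_edges)
    cdf = np.cumsum(counts) / counts.sum()
    target = q / 100.0
    idx = np.searchsorted(cdf, target)        # first bin whose cumulative fraction reaches the target
    cdf_lo = cdf[idx - 1] if idx > 0 else 0.0
    # Linear interpolation inside the selected bin
    frac = (target - cdf_lo) / (cdf[idx] - cdf_lo)
    return edges[idx] + frac * (edges[idx + 1] - edges[idx])

rng = np.random.default_rng(1)
anomalies = rng.normal(0.0, 1.0, 100_000)
edges = np.linspace(-6.0, 6.0, 241)           # 0.05-wide bins (assumed range and width)

approx = histogram_percentile(anomalies, 95, edges)
exact = np.percentile(anomalies, 95)
```

Histogram counts can be accumulated chunk by chunk and summed, which is one reason this kind of approximation tends to suit chunked, parallel computation better than an exact sort.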

In [None]:
import xarray as xr
import numpy as np
import dask
import intake
from getpass import getuser
from pathlib import Path

import marEx
import marEx.helper as hpc

In [None]:
# Lustre Scratch Directory
scratch_dir = Path('/scratch') / getuser()[0] / getuser()

In [None]:
# Start Dask Cluster
client = hpc.start_local_cluster(n_workers=32, threads_per_worker=1,
                                 scratch_dir=scratch_dir / 'clients')  # Temporary scratch directory for Dask to use

In [None]:
# # Start Distributed Dask Cluster
# client = hpc.start_distributed_cluster(n_workers=256, workers_per_node=32, runtime=29, 
#                                        scratch_dir=scratch_dir / 'clients', account='bk1377')    # Specify temporary scratch directory for dask to use

In [None]:
# Import 40 years of Daily ICON data (ref. EERIE project)

cat = intake.open_catalog("https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/eerie.yaml")
expid = 'eerie-control-1950'
version = 'v20240618'
model = 'icon-esm-er'
gridspec = 'gr025'

dat = cat['dkrz.disk.model-output'][model][expid][version]['ocean'][gridspec]

In [None]:
# Load Data
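sst = (
    dat['2d_daily_mean'](chunks={}).to_dask()
    .to                                              # Select the `to` variable from the dataset
    .isel(depth=0).drop_vars('depth')                # Keep the first depth level only
    .sel(time=slice('1991-01-01', '2030-12-31'))     # 40 years of daily data
    .isel(lat=slice(100, 600), lon=slice(100, 600))  # Spatial subset
)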

In [None]:
# Process Data using `MarEx-Detect` helper functions:

extremes_ds = marEx.preprocess_data(sst, 
                                method_anomaly = 'shifting_baseline', # Anomalies relative to a rolling climatology of the previous `window_year_baseline` years; more "correct", but shortens the time series by `window_year_baseline` years
                                method_extreme = 'hobday_extreme',    # Local day-of-year specific thresholds with windowing
                                threshold_percentile = 95,            # Use the 95th percentile as the extremes threshold
                                window_year_baseline = 15, 
                                smooth_days_baseline = 21,            # Defines the rolling climatology window (15 years) and smoothing window (21 days) for determining the anomalies
                                window_days_hobday = 11,              # Defines the window (11 days) of compiled samples collected for the extremes detection
                                dimensions = {'time':'time',
                                              'xdim':'lon',
                                              'ydim':'lat'},  # Define the dimensions of the data -- if 'ydim' exists, then MarEx-Detect knows this is a gridded dataset
                                dask_chunks = {'time': 25})   # Dask chunks for *output* data
extremes_ds
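
To make the roles of `window_year_baseline`, `window_days_hobday`, and `threshold_percentile` more concrete, the sketch below applies the same two ideas to synthetic data for a single grid point, arranged as `(n_years, 365)`. It is a simplified illustration and not the `marEx` implementation: both helper functions are hypothetical, the 21-day smoothing is omitted, and a 365-day calendar is assumed.

```python
import numpy as np

def shifting_baseline_anomaly(data, window_years):
    """Anomaly relative to the mean of the previous `window_years` years for
    the same day of year. The first `window_years` years have no complete
    baseline, so they are dropped (hypothetical helper)."""
    n_years, n_doy = data.shape
    out = np.empty((n_years - window_years, n_doy))
    for i in range(window_years, n_years):
        baseline = data[i - window_years:i].mean(axis=0)
        out[i - window_years] = data[i] - baseline
    return out

def doy_window_threshold(anom, window_days, percentile):
    """Day-of-year threshold: pool the samples from all years that fall in a
    centred window of `window_days` days, then take the percentile.
    Simplification: the window wraps around within the same year."""
    half = window_days // 2
    n_doy = anom.shape[1]
    thresholds = np.empty(n_doy)
    for d in range(n_doy):
        idx = np.arange(d - half, d + half + 1) % n_doy
        thresholds[d] = np.percentile(anom[:, idx], percentile)
    return thresholds

# Synthetic data: 40 years with a slow warming trend plus noise
rng = np.random.default_rng(2)
sst_point = 15.0 + 0.02 * np.arange(40)[:, None] + rng.normal(0.0, 0.5, (40, 365))

anom = shifting_baseline_anomaly(sst_point, window_years=15)   # 25 years remain
thresholds = doy_window_threshold(anom, window_days=11, percentile=95)
extreme = anom > thresholds                                    # boolean mask of extreme days
```

With a 15-year baseline, the 40 input years yield 25 years of anomalies, which is the shortening of the time series noted in the comment on `method_anomaly`. Roughly 5% of days should exceed a 95th-percentile threshold.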

In [None]:
# Save Extremes Data to `zarr` for more efficient parallel I/O

file_name = scratch_dir / 'mhws' / 'extremes_binary_gridded.zarr'
extremes_ds.to_zarr(file_name, mode='w')