# Global Daily Event Analysis: Marine Heatwave ID & Tracking using `MarEx`

### `MarEx` Processing Pipeline for Gridded Datasets:

1. **Morphological Pre-Processing**
    - Performs binary morphological closing using `dask_image.ndmorph` to fill small spatial holes up to `R_fill` cells in radius
    - Performs binary opening to remove isolated small features of order `R_fill` cells in size
    - Fills gaps in time to maintain event continuity across interruptions of up to `T_fill` time steps
    - Filters out the smallest objects, i.e. those whose area falls below the `area_filter_quartile` quantile of all object areas

2. **Blob Identification**
    - Labels spatially connected components using the connected-component labelling in `dask_image.ndmeasure`
    - Computes blob properties (area, centroid, boundaries)

3. **Temporal Tracking**
    - Identifies blob overlaps between consecutive time frames
    - Connects objects across time, applying the following criteria for splitting, merging, & persistence:
        - Connected objects must overlap by at least a fraction `overlap_threshold` of the smaller object's area
        - Merging objects retain their original IDs; the merged child area is partitioned among the parents by assigning each cell to the parent of its _nearest-neighbour_ cell (or, alternatively, by centroid distance)

4. **Graph Reduction & Finalisation**
    - Constructs the complete temporal graph of object evolution through time
    - Resolves object connectivity graph using `scipy.sparse.csgraph.connected_components`
    - Creates globally unique IDs for each tracked extreme event
    - Maps objects into efficient ID-time space for convenient analysis
    - Computes comprehensive statistics about the lifecycle of each event

The pipeline leverages **dask** for distributed parallel computation, enabling efficient processing of large datasets. \
A 40-year global daily analysis at 0.25° resolution on 32 cores takes:
- Basic (i.e. Scannell et al., with no merge/split criteria or tracking): ~5 minutes
- Full split/merge thresholding & merge tracking: ~40 minutes

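The spatial part of step 1 (closing, then opening) can be sketched on a single 2-D time slice with `scipy.ndimage`, whose interface `dask_image.ndmorph` largely mirrors. This is a minimal illustration, not the MarEx implementation: the helper names `disk` and `close_then_open` are hypothetical, and the sketch ignores longitude periodicity and the land mask.

```python
import numpy as np
from scipy import ndimage


def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element with the given radius in grid cells."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2


def close_then_open(binary: np.ndarray, r_fill: int) -> np.ndarray:
    """Fill holes of radius up to ~r_fill (closing), then remove specks of similar size (opening)."""
    structure = disk(r_fill)
    closed = ndimage.binary_closing(binary, structure=structure)
    return ndimage.binary_opening(closed, structure=structure)


# Toy field: a 20x20 event with a one-cell hole, plus an isolated one-cell speck
field = np.zeros((40, 40), dtype=bool)
field[10:30, 10:30] = True
field[20, 20] = False
field[5, 35] = True

cleaned = close_then_open(field, r_fill=2)
print(cleaned[20, 20], cleaned[5, 35])
```

Closing fills the hole and opening removes the speck; opening also rounds the sharp corners of the retained object. `scipy.ndimage` treats cells outside the array as background, so features touching the array edge can be eroded by the closing step; a global implementation would need periodic padding in longitude.
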
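The temporal gap filling in step 1 (`T_fill`) can be illustrated per grid cell: a run of non-extreme steps is filled when it is bracketed by extreme steps on both sides and lasts at most `T_fill` steps. This NumPy sketch assumes that definition; `fill_time_gaps` is a hypothetical helper, and MarEx may implement the step differently (e.g. as a morphological closing along the time axis).

```python
import numpy as np


def fill_time_gaps(x: np.ndarray, t_fill: int) -> np.ndarray:
    """Fill False runs of length <= t_fill along axis 0 that have True on both sides."""
    x = np.asarray(x, dtype=bool)
    n_time = x.shape[0]
    # Time index, broadcastable against any trailing spatial dimensions
    idx = np.arange(n_time).reshape((n_time,) + (1,) * (x.ndim - 1))
    # Most recent True at or before t (-1 if none yet)
    last = np.maximum.accumulate(np.where(x, idx, -1), axis=0)
    # Next True at or after t (n_time if none remains)
    nxt = np.minimum.accumulate(np.where(x, idx, n_time)[::-1], axis=0)[::-1]
    gap_length = nxt - last - 1
    fill = ~x & (last >= 0) & (nxt < n_time) & (gap_length <= t_fill)
    return x | fill


series = np.array([1, 0, 0, 1, 0, 0, 0, 1], dtype=bool)
print(fill_time_gaps(series, t_fill=2).astype(int))
```

With `t_fill=2` the two-step gap is bridged while the three-step gap is not, giving `[1 1 1 1 0 0 0 1]`. Leading or trailing gaps are never filled, because they are not bracketed by an event on both sides.
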
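The final pre-processing filter can be sketched by labelling objects, measuring their sizes, and keeping only those at or above the chosen quantile of the size distribution. Assumptions: `filter_small_objects` is a hypothetical helper, size is measured as a raw cell count (MarEx may instead weight by physical cell area), and the threshold is computed for a single 2-D field rather than over the whole record.

```python
import numpy as np
from scipy import ndimage


def filter_small_objects(binary: np.ndarray, quantile: float) -> np.ndarray:
    """Drop connected objects whose cell count lies below the given quantile of all object sizes."""
    labels, n_objects = ndimage.label(binary)
    if n_objects == 0:
        return np.zeros_like(binary, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]  # cells per object; label 0 is background
    threshold = np.quantile(sizes, quantile)
    keep = np.concatenate(([False], sizes >= threshold))  # lookup table indexed by label
    return keep[labels]


field = np.zeros((20, 20), dtype=bool)
field[0, 0] = True           # 1 cell
field[0:2, 5:7] = True       # 4 cells
field[5:8, 0:3] = True       # 9 cells
field[10:14, 10:14] = True   # 16 cells

kept = filter_small_objects(field, quantile=0.5)
print(int(kept.sum()))
```

The 1- and 4-cell objects fall below the median size (6.5 cells) and are removed, leaving 25 cells. Despite its name, `area_filter_quartile` behaves as a quantile here: 0.5 removes the smaller half of the objects.
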
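The overlap criterion in step 3 can be made concrete for one pair of consecutive label fields: two objects are linked when their shared cells make up at least `overlap_threshold` of the smaller object's area. In this sketch `overlap_links` is a hypothetical helper, areas are raw cell counts rather than physical areas, and 1-D arrays stand in for gridded fields.

```python
import numpy as np


def overlap_links(labels_t: np.ndarray, labels_t1: np.ndarray, overlap_threshold: float):
    """Return (id_t, id_t1, n_shared) for object pairs that pass the overlap criterion."""
    both = (labels_t > 0) & (labels_t1 > 0)
    # Count shared cells for every (object at t, object at t+1) pair
    pairs, shared = np.unique(
        np.stack([labels_t[both], labels_t1[both]], axis=1), axis=0, return_counts=True
    )
    area_t = np.bincount(labels_t.ravel())    # cells per label at t
    area_t1 = np.bincount(labels_t1.ravel())  # cells per label at t+1
    links = []
    for (a, b), n in zip(pairs, shared):
        if n / min(area_t[a], area_t1[b]) >= overlap_threshold:
            links.append((int(a), int(b), int(n)))
    return links


# 1-D stand-ins for two consecutive label fields (0 = background)
labels_t = np.array([1] * 10 + [2] * 10 + [0] * 10)
labels_t1 = np.array([0] * 5 + [1] * 10 + [0] * 15)
print(overlap_links(labels_t, labels_t1, 0.5))
print(overlap_links(labels_t, labels_t1, 0.6))
```

The object at t+1 overlaps each of the two earlier objects by exactly half of the smaller area, so both links survive at a threshold of 0.5 (a merge) and neither survives at 0.6.
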
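The nearest-neighbour partitioning of merged objects in step 3 can be sketched with a Euclidean distance transform: each cell of the merged child takes the ID of the closest cell that belonged to one of its parents at the previous time step. This assumes `scipy.ndimage.distance_transform_edt` as the nearest-cell search and measures distance in index space; `nn_partition` is a hypothetical helper, not the MarEx routine.

```python
import numpy as np
from scipy import ndimage


def nn_partition(child_mask: np.ndarray, parent_labels: np.ndarray) -> np.ndarray:
    """Assign each child cell the label of its nearest parent cell (0 outside the child)."""
    if not (parent_labels > 0).any():
        return np.zeros_like(parent_labels)
    # The EDT measures distance to the nearest zero, so make the parent cells the zeros;
    # return_indices gives the coordinates of that nearest parent cell for every cell.
    _, nearest = ndimage.distance_transform_edt(parent_labels == 0, return_indices=True)
    nearest_parent = parent_labels[tuple(nearest)]
    return np.where(child_mask, nearest_parent, 0)


# Two parents at time t: ID 1 on the left, ID 2 on the right
parents = np.zeros((3, 10), dtype=int)
parents[:, 0:3] = 1
parents[:, 7:10] = 2
# The merged child at t+1 covers the whole strip
child = np.ones((3, 10), dtype=bool)

print(nn_partition(child, parents)[0])
```

The strip is split at the midpoint between the two parents, giving `[1 1 1 1 1 2 2 2 2 2]` in each row. On a lat/lon grid, index-space distance is itself anisotropic at high latitudes, which a production implementation may need to account for.
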
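Step 4's graph reduction can be sketched by treating each (time, label) object as a node and each accepted overlap link as an edge. `scipy.sparse.csgraph.connected_components` then yields one component per tracked event. This is a simplified sketch: `global_ids` is a hypothetical helper, and it assumes that every edge passed in should share an ID, leaving out how MarEx keeps merging objects on separate IDs.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def global_ids(nodes, edges):
    """Map (time, label) nodes to globally unique event IDs via connected components."""
    index = {node: i for i, node in enumerate(nodes)}
    rows = np.asarray([index[u] for u, _ in edges], dtype=np.intp)
    cols = np.asarray([index[v] for _, v in edges], dtype=np.intp)
    n = len(nodes)
    adjacency = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    n_events, component = connected_components(adjacency, directed=False)
    # Shift to 1-based IDs so that 0 can remain the background value
    return n_events, {node: int(component[i]) + 1 for node, i in index.items()}


nodes = [(0, 1), (0, 2), (1, 1), (2, 1), (2, 2)]
edges = [((0, 1), (1, 1)), ((1, 1), (2, 1))]  # object 1 persists from t=0 to t=2
n_events, ids = global_ids(nodes, edges)
print(n_events, ids)
```

Object 1 keeps a single ID across all three time steps, while the two unlinked objects each become their own event, for three events in total.
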
In [None]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import marEx
import marEx.helper as hpc

In [None]:
# Start Dask Cluster
client = hpc.start_local_cluster(n_workers=32, n_threads=1)

In [None]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extremes_binary_gridded.zarr'
chunk_size = {'time': 25, 'lat': -1, 'lon': -1}  # Adjust the time chunk to fit system memory; lat/lon must remain a single (global) chunk
ds = xr.open_zarr(str(file_name), chunks=chunk_size)

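Because each chunk spans the whole globe, its memory footprint is set by the length of the time chunk. A quick estimate, assuming a regular 0.25° grid of 720 × 1440 cells (check `ds.sizes` for the actual dimensions) and treating the dtypes below as assumptions about what the pipeline holds in memory; `chunk_nbytes` is a hypothetical helper:

```python
import numpy as np


def chunk_nbytes(n_time: int, n_lat: int, n_lon: int, dtype) -> int:
    """Bytes in one dense chunk of shape (n_time, n_lat, n_lon)."""
    return n_time * n_lat * n_lon * np.dtype(dtype).itemsize


# Assumed 0.25° global grid; replace with ds.sizes['lat'] and ds.sizes['lon']
N_LAT, N_LON = 720, 1440
for dtype in ("bool", "int32", "int64"):
    print(f"{dtype:>5}: {chunk_nbytes(25, N_LAT, N_LON, dtype) / 1e6:.1f} MB per chunk")
```

A 25-step chunk is about 26 MB as booleans, but about 104 MB (int32) or 207 MB (int64) if held as integer labels. Morphological and labelling steps create intermediate arrays, so peak usage per worker will likely be a multiple of these figures; reduce the time chunk if workers run short of memory.
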
In [None]:
# Run ID, Tracking, & Merging

tracker = marEx.tracker(ds.extreme_events,
                        ds.mask.where((ds.lat < 85) &
                                      (ds.lat > -90), other=False),  # Modify mask: anisotropy of the lat/lon grid near the poles biases the ID & tracking
                        area_filter_quartile = 0.5,                  # Remove the smallest 50% of the identified coherent extreme areas
                        R_fill = 8,                                  # Fill small holes with radius < 8 _cells_
                        T_fill = 2,                                  # Allow gaps of up to 2 days and still continue tracking the event with the same ID
                        allow_merging = True,                        # Allow extreme events to split/merge. Keeps track of merge events & unique IDs.
                        overlap_threshold = 0.5,                     # Overlap threshold for merging events. If overlap < threshold, events keep independent IDs.
                        nn_partitioning = True,                      # Use the new nearest-neighbour (NN) method to partition merged child areas. If False, reverts to the older method of Di Sun et al. (2023).
                        verbosity=1)                                 # Choose verbosity level (0=None, 1=Basic, 2=Advanced/Timing)

extreme_events_ds = tracker.run(return_merges=True)
extreme_events_ds

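The mask comment in the tracking cell refers to the shrinking zonal width of lat/lon cells towards the poles. On a regular 0.25° grid (assumed here, following the resolution quoted above), a cell's meridional extent is fixed while its zonal extent scales with cos(latitude), so a structuring element of `R_fill` cells covers very different physical distances in the two directions. A quick check with a spherical Earth of radius 6371 km; `cell_dims_km` is a hypothetical helper:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0        # mean radius, spherical-Earth approximation
DLAT = DLON = np.deg2rad(0.25)  # assumed 0.25° grid spacing, in radians


def cell_dims_km(lat_deg: float) -> tuple[float, float]:
    """(zonal, meridional) extent in km of one grid cell centred at lat_deg."""
    meridional = EARTH_RADIUS_KM * DLAT
    zonal = EARTH_RADIUS_KM * np.cos(np.deg2rad(lat_deg)) * DLON
    return float(zonal), float(meridional)


R_FILL = 8  # cells, as in the tracker call
for lat in (0, 45, 70, 85):
    zonal, meridional = cell_dims_km(lat)
    print(
        f"lat {lat:2d}°: R_fill spans {R_FILL * zonal:6.1f} km zonally vs "
        f"{R_FILL * meridional:6.1f} km meridionally (aspect {meridional / zonal:4.1f})"
    )
```

At 85° the same 8-cell radius spans roughly 19 km east-west but about 222 km north-south (aspect ≈ 11.5), which motivates masking out latitudes north of 85°.
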
In [None]:
# Save IDed/Tracked/Merged Events to `zarr` for more efficient parallel I/O

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_merged_gridded.zarr'
extreme_events_ds.to_zarr(file_name, mode='w')

### Run Basic Tracking for Comparison
N.B.: This is the current standard method used in the literature, which involves _No_ merging/splitting and _No_ independent event tracking.

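The basic configuration's behaviour, where all touching events adopt the same ID forever, can be illustrated by labelling connected components in (time, lat, lon) space at once. This is a minimal sketch assuming the basic mode behaves like 3-D labelling with `scipy.ndimage.label`; that equivalence is an assumption, not confirmed by the tracker's documentation. Two blobs that only touch at a later step end up sharing one ID at every earlier step too:

```python
import numpy as np
from scipy import ndimage

# Toy (time, lat, lon) cube with a single latitude row
cube = np.zeros((3, 1, 7), dtype=bool)
cube[0, 0, [0, 1]] = True   # blob A at t=0
cube[0, 0, [5, 6]] = True   # blob B at t=0, separate from A
cube[1, 0, [1, 2]] = True   # A drifts east
cube[1, 0, [4, 5]] = True   # B drifts west
cube[2, 0, 1:6] = True      # A and B touch at t=2

# Default 3-D structure: face connectivity, so cells at the same location in
# consecutive time steps are connected as well as spatial neighbours
labels, n_events = ndimage.label(cube)
print(n_events, labels[0, 0])
```

Only one event is found, and at t=0 the two still-separate blobs already carry the same ID (`[1 1 0 0 0 1 1]`). With `allow_merging = True` and an overlap threshold, A and B would instead keep their own IDs, with the merge recorded separately.
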
In [None]:
# Run Basic Tracking

tracker = marEx.tracker(ds.extreme_events,
                        ds.mask.where((ds.lat < 85) &
                                      (ds.lat > -90), other=False),  # Modify mask: anisotropy of the lat/lon grid near the poles biases the ID & tracking
                        area_filter_quartile = 0.5,                  # Remove the smallest 50% of the identified coherent extreme areas
                        R_fill = 8,                                  # Fill small holes with radius < 8 _cells_
                        T_fill = 0,                                  # No temporal gap filling
                        allow_merging = False,                       # Do not allow extreme events to split/merge. All touching events adopt the same ID forever.
                        verbosity=1)

extreme_events_basic_ds = tracker.run()
extreme_events_basic_ds

In [None]:
# Save IDed Events to `zarr` for more efficient parallel I/O

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_basic_gridded.zarr'
extreme_events_basic_ds.to_zarr(file_name, mode='w')