# Global Daily Event Analysis: Marine Heatwave ID & Tracking using `MarEx`

### `MarEx` Processing Pipeline for Gridded Datasets:

1. **Morphological Pre-Processing**
    - Performs binary morphological closing using `dask_image.ndmorph` to fill small spatial holes up to `R_fill` cells in radius 
    - Executes binary opening to remove isolated small features of order `R_fill`
    - Fills gaps in time to maintain event continuity for interruptions up to `T_fill` time steps
    - Filters out the smallest objects, i.e. those whose area falls below the `area_filter_quartile` quantile of the object-area distribution

2. **Blob Identification**
    - Labels spatially connected components using an efficient connected-component algorithm in `dask_image.ndmeasure`
    - Computes blob properties (area, centroid, boundaries)

3. **Temporal Tracking**
    - Identifies blob overlaps between consecutive time frames
    - Connects objects across time, applying the following criteria for splitting, merging, & persistence:
        - Connected objects must overlap by at least fraction `overlap_threshold` of the smaller area
        - Merged objects retain their original ID, but partition the child area based on the parent of the _nearest-neighbour_ cell (or centroid distance)

4. **Graph Reduction & Finalisation**
    - Constructs the complete temporal graph of object evolution through time
    - Resolves object connectivity graph using `scipy.sparse.csgraph.connected_components`
    - Creates globally unique IDs for each tracked extreme event
    - Maps objects into efficient ID-time space for convenient analysis
    - Computes comprehensive statistics about the lifecycle of each event
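
Steps 1 and 2 can be illustrated on a single 2D time slice. The sketch below is not `MarEx`'s implementation: it swaps in `scipy.ndimage` for `dask_image`, works on one in-memory array, and skips temporal gap filling, area filtering, and the periodicity of the longitude axis. The helper names `disk`, `preprocess`, and `identify` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element with the given radius in cells."""
    y, x = np.ogrid[-radius : radius + 1, -radius : radius + 1]
    return x**2 + y**2 <= radius**2


def preprocess(binary_map: np.ndarray, r_fill: int) -> np.ndarray:
    """Step 1 (spatial part only): close small holes, then open away small features."""
    selem = disk(r_fill)
    closed = ndimage.binary_closing(binary_map, structure=selem)
    return ndimage.binary_opening(closed, structure=selem)


def identify(binary_map: np.ndarray):
    """Step 2: label connected components and compute basic object properties."""
    labels, n_objects = ndimage.label(binary_map)  # 4-connectivity by default in 2D
    index = np.arange(1, n_objects + 1)
    weights = binary_map.astype(float)
    areas = ndimage.sum_labels(weights, labels, index=index)  # area in cells
    centroids = ndimage.center_of_mass(weights, labels, index=index)  # (row, col)
    bounding_boxes = ndimage.find_objects(labels)  # one tuple of slices per object
    return labels, areas, centroids, bounding_boxes


# Toy field: a 10 x 10 object with a one-cell hole, plus an isolated single cell.
field = np.zeros((20, 20), dtype=bool)
field[5:15, 5:15] = True
field[10, 10] = False
field[1, 1] = True

cleaned = preprocess(field, r_fill=1)
labels, areas, centroids, bounding_boxes = identify(cleaned)
print(areas, centroids, bounding_boxes)
```

In this toy case the closing fills the hole and the opening removes the isolated cell. The opening also trims the four corner cells of the square, which is the expected smoothing effect of a disk-like structuring element.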

The pipeline leverages **dask** for distributed parallel computation, enabling efficient processing of large datasets.

A 40-year global daily (OSTIA) analysis at 0.25° resolution on 32 cores takes:
- Basic tracking (i.e. Scannell et al., which involves no merge/split criteria or tracking):  ~5 minutes
- Full split/merge thresholding & merge tracking:  ~40 minutes
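
The overlap criterion of step 3 and the graph reduction of step 4 can be sketched with plain NumPy and SciPy. This is an illustration under simplifying assumptions rather than `MarEx`'s code: the helper `overlap_fraction`, the toy masks, and the edge list are all hypothetical, and the connected-components step shown here corresponds to the simple case in which every linked object ends up sharing one event ID.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def overlap_fraction(obj_a: np.ndarray, obj_b: np.ndarray) -> float:
    """Number of shared cells divided by the area of the smaller object."""
    smaller = min(obj_a.sum(), obj_b.sum())
    if smaller == 0:
        return 0.0
    return float(np.logical_and(obj_a, obj_b).sum() / smaller)


# Step 3: two object masks on consecutive time steps.
obj_t0 = np.zeros((8, 8), dtype=bool)
obj_t0[0:4, 0:4] = True  # 16 cells
obj_t1 = np.zeros((8, 8), dtype=bool)
obj_t1[2:4, 2:6] = True  # 8 cells, 4 of which overlap obj_t0

overlap_threshold = 0.5
frac = overlap_fraction(obj_t0, obj_t1)
linked = frac >= overlap_threshold  # link the two objects only if the overlap is large enough

# Step 4: each row of `edges` links two per-frame object labels that passed the test.
n_objects = 6
edges = np.array([[0, 1], [1, 2], [3, 4]])
adjacency = coo_matrix(
    (np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
    shape=(n_objects, n_objects),
)
# Treat links as undirected: objects in the same component share an event ID.
n_events, event_id = connected_components(adjacency, directed=False)
print(frac, linked, n_events, event_id)
```

Object 5 has no links, so it forms an event of its own.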

In [1]:
from getpass import getuser
from pathlib import Path

import dask
import xarray as xr

import marEx
import marEx.helper as hpc

In [2]:
# Lustre Scratch Directory
scratch_dir = Path("/scratch") / getuser()[0] / getuser()

In [None]:
# Start Dask Cluster
client = hpc.start_local_cluster(
    n_workers=32, threads_per_worker=1, scratch_dir=scratch_dir / "clients"
)  # Specify temporary scratch directory for dask to use

Perhaps you already have a cluster running?
Hosting the HTTP server on port 45389 instead


Hostname: l40094
Forward Port: l40094:45389
Dashboard Link: localhost:45389/status


In [4]:
# Choose optimal chunk size & load data
#   N.B.: This is crucial for dask (not only for performance, but also to make the problem tractable)
#         The operations are ultimately global in space, and so require the spatial dimensions to be contiguous/unchunked (-1).
#         The chunk size in time can be adjusted depending on available system memory.

chunk_size = {"time": 25, "lat": -1, "lon": -1}
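
As a rough guide to choosing the time chunk, the memory footprint of one chunk can be estimated by hand. The sketch below assumes the 720 × 1440 grid, 9282 daily time steps, and `int32` output reported by the tracker later in this notebook. Peak memory per worker will be several times this figure, since intermediate arrays are held at the same time.

```python
import numpy as np

chunk_size = {"time": 25, "lat": -1, "lon": -1}

# Assumed grid and record length (taken from the tracker output in this notebook).
n_time, n_lat, n_lon = 9282, 720, 1440
bytes_per_cell = np.dtype("int32").itemsize

# Spatial dimensions are unchunked (-1), so one chunk spans the whole globe.
chunk_bytes = chunk_size["time"] * n_lat * n_lon * bytes_per_cell
chunk_mib = chunk_bytes / 2**20

# Ceiling division: the last chunk in time may be shorter than the rest.
n_chunks = -(-n_time // chunk_size["time"])

print(f"{chunk_mib:.2f} MiB per chunk, {n_chunks} chunks")
```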

In [5]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = scratch_dir / "mhws" / "extremes_binary_gridded_shifting_hobday.zarr"
ds = xr.open_zarr(str(file_name), chunks=chunk_size)

In [6]:
# Run ID, Tracking, & Merging

tracker = marEx.tracker(
    ds.extreme_events,
    ds.mask.where(
        (ds.lat < 85) & (ds.lat > -90), other=False
    ),  # Modify Mask: Anisotropy of the lat/lon grid near the poles biases the ID & Tracking
    area_filter_quartile=0.5,  # Remove the smallest 50% of the identified coherent extreme areas
    R_fill=8,  # Fill small holes with radius < 8 _cells_
    T_fill=2,  # Allow gaps of 2 days and still continue the event tracking with the same ID
    allow_merging=True,  # Allow extreme events to split/merge. Keeps track of merge events & unique IDs.
    overlap_threshold=0.5,  # Overlap threshold for merging events. If overlap < threshold, events keep independent IDs.
    nn_partitioning=True,  # Use new NN method to partition merged children areas. If False, reverts to old method of Di Sun et al. 2023.
    verbose=True,  # Enable detailed logging
)

extreme_events_ds, merges_ds = tracker.run(return_merges=True)
extreme_events_ds



Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.603239018193744
   Total Object Area IDed (cells): 542287274.0
   Number of Initial Pre-Filtered Objects: 250096
   Number of Final Filtered Objects: 125109
   Area Cutoff Threshold (cells): 714
   Accepted Area Fraction: 0.9250359966957292
   Total Events Tracked: 13614
   Total Merging Events Recorded: 26739
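
The statistics above show the effect of `area_filter_quartile=0.5`: roughly half of the objects are removed (250096 to 125109), yet about 92.5% of the identified area is kept. A minimal sketch of such a quantile-based area filter follows. The function `area_filter` is hypothetical, and the strict inequality at the cutoff is an assumption rather than `MarEx`'s documented behaviour.

```python
import numpy as np


def area_filter(areas, quantile):
    """Keep objects whose area exceeds the given quantile of all object areas."""
    areas = np.asarray(areas, dtype=float)
    cutoff = np.quantile(areas, quantile)
    keep = areas > cutoff  # assumption: objects exactly at the cutoff are dropped
    accepted_area_fraction = areas[keep].sum() / areas.sum()
    return cutoff, keep, accepted_area_fraction


# Toy object areas (in cells): many small objects, a few large ones.
areas = [1, 2, 3, 4, 100, 200]
cutoff, keep, accepted = area_filter(areas, quantile=0.5)
print(cutoff, keep, accepted)
```

Because object areas are strongly skewed, dropping half of the objects by count removes only a small share of the total area.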


Dask-backed arrays in the returned dataset, in display order:

| Shape | Chunk shape | Data type | Bytes (array) | Bytes (chunk) | Dask graph |
|---|---|---|---|---|---|
| (9282, 720, 1440) | (25, 720, 1440) | int32 | 35.85 GiB | 98.88 MiB | 372 chunks in 3 graph layers |
| (9282, 13614) | (25, 13614) | int32 | 482.04 MiB | 1.30 MiB | 372 chunks in 2 graph layers |
| (9282, 13614) | (25, 13614) | float32 | 482.04 MiB | 1.30 MiB | 372 chunks in 2 graph layers |
| (2, 9282, 13614) | (2, 25, 13614) | float64 | 1.88 GiB | 5.19 MiB | 372 chunks in 2 graph layers |
| (9282, 13614) | (25, 13614) | bool | 120.51 MiB | 332.37 kiB | 372 chunks in 2 graph layers |
| (13614,) | (13614,) | datetime64[ns] | 106.36 kiB | 106.36 kiB | 1 chunk in 3 graph layers |
| (13614,) | (13614,) | datetime64[ns] | 106.36 kiB | 106.36 kiB | 1 chunk in 3 graph layers |
| (9282, 13614, 11) | (25, 13614, 11) | int32 | 5.18 GiB | 14.28 MiB | 372 chunks in 3 graph layers |


In [None]:
merges_ds
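
With `nn_partitioning=True`, the area of a merged child object is divided among its parents according to the nearest parent cell. One way to sketch that idea is with a Euclidean distance transform. This is an assumption-laden illustration and not `MarEx`'s implementation: the function `partition_child` is hypothetical, distances are measured in grid-index space rather than on the sphere, and ties between equidistant parents are resolved arbitrarily.

```python
import numpy as np
from scipy import ndimage


def partition_child(parent_labels: np.ndarray, child_mask: np.ndarray) -> np.ndarray:
    """Give each child cell the ID of its nearest parent cell (0 outside the child)."""
    # distance_transform_edt measures the distance to the nearest zero of its input,
    # so the input is zero exactly where a parent cell exists.
    _, nearest = ndimage.distance_transform_edt(
        parent_labels == 0, return_indices=True
    )
    # `nearest` holds, for every cell, the indices of the closest parent cell.
    nearest_parent_id = parent_labels[tuple(nearest)]
    return np.where(child_mask, nearest_parent_id, 0)


# Two parents (IDs 1 and 2) at the previous time step, separated by a gap.
parent_labels = np.array([[1, 1, 0, 0, 0, 0, 2, 2]])
# At the next time step a single merged child covers the whole row.
child_mask = np.ones_like(parent_labels, dtype=bool)

partition = partition_child(parent_labels, child_mask)
print(partition)
```

Each parent keeps its original ID and inherits the part of the child that lies closest to it.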

In [8]:
# Save IDed/Tracked/Merged Events to `zarr` for more efficient parallel I/O

file_name = scratch_dir / "mhws" / "extreme_events_merged_gridded_shifting.zarr"
extreme_events_ds.to_zarr(file_name, mode="w")

<xarray.backends.zarr.ZarrStore at 0x154f06ace0e0>

In [9]:
# Save Merges Dataset to netcdf
file_name = scratch_dir / "mhws" / "extreme_events_merged_gridded_shifting_merges.nc"
merges_ds.to_netcdf(file_name, mode="w")

### Run Basic Tracking for Comparison
N.B.: This is the current standard method used in the literature, which involves _No_ temporal gap filling, _No_ merging/splitting and _No_ independent event tracking.

In [10]:
# Run Basic Tracking

tracker = marEx.tracker(
    ds.extreme_events,
    ds.mask.where(
        (ds.lat < 85) & (ds.lat > -90), other=False
    ),  # Modify Mask: Anisotropy of the lat/lon grid near the poles biases the ID & Tracking
    area_filter_quartile=0.5,  # Remove the smallest 50% of the identified coherent extreme areas
    R_fill=8,  # Fill small holes with radius < 8 _cells_
    T_fill=0,  # No temporal hole filling
    allow_merging=False,  # Do not allow extreme events to split/merge. All touching events adopt the same ID forever (after _and_ before (!)).
    verbose=True,  # Enable detailed logging
)

extreme_events_basic_ds = tracker.run()
extreme_events_basic_ds

   Binary Hobday to Processed Area Fraction: 0.6589432631500658
   Total Object Area IDed (cells): 497220876.0
   Number of Initial Pre-Filtered Objects: 266526
   Number of Final Filtered Objects: 133343
   Area Cutoff Threshold (cells): 637
   Accepted Area Fraction: 0.9235701278157918
   Total Events Tracked: 15511


Dask-backed array in the returned dataset:

| Shape | Chunk shape | Data type | Bytes (array) | Bytes (chunk) | Dask graph |
|---|---|---|---|---|---|
| (9282, 720, 1440) | (25, 720, 1440) | int32 | 35.85 GiB | 98.88 MiB | 372 chunks in 3 graph layers |


In [11]:
# Save IDed Events to `zarr` for more efficient parallel I/O

file_name = scratch_dir / "mhws" / "extreme_events_basic_gridded_shifting.zarr"
extreme_events_basic_ds.to_zarr(file_name, mode="w")

<xarray.backends.zarr.ZarrStore at 0x154f07ca05e0>