# Global Daily Event Analysis: Marine Heatwave ID & Tracking using `MarEx`

### `MarEx` Processing Pipeline for Gridded Datasets:

1. **Morphological Pre-Processing**
    - Performs binary morphological closing using `dask_image.ndmorph` to fill small spatial holes up to `R_fill` cells in radius
    - Performs binary opening to remove isolated small features with a size of order `R_fill` cells
    - Fills gaps in time to maintain event continuity across interruptions of up to `T_fill` time steps
    - Filters out the smallest objects, i.e. those whose area falls below the `area_filter_quartile` quantile of all object areas

2. **Blob Identification**
    - Labels spatially connected components using the efficient connected-component labelling in `dask_image.ndmeasure`
    - Computes blob properties (area, centroid, boundaries)

3. **Temporal Tracking**
    - Identifies blob overlaps between consecutive time frames
    - Connects objects across time, applying the following criteria for splitting, merging, and persistence:
        - Connected objects must overlap by at least a fraction `overlap_threshold` of the smaller object's area
        - When objects merge, the parents retain their original IDs and the merged (child) area is partitioned among them: each child cell goes to the parent of its _nearest-neighbour_ cell (or, alternatively, is assigned by centroid distance)

4. **Graph Reduction & Finalisation**
    - Constructs the complete temporal graph of object evolution through time
    - Resolves the object connectivity graph using `scipy.sparse.csgraph.connected_components`
    - Creates globally unique IDs for each tracked extreme event
    - Maps objects into a compact ID-time space for convenient analysis
    - Computes comprehensive statistics about the lifecycle of each event
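Steps 1 and 2 can be illustrated on a single 2-D frame. This is a simplified, in-memory sketch and not the MarEx implementation. It assumes `scipy.ndimage` in place of `dask_image`, which provides dask-array counterparts of several of these functions. It also assumes a disk-shaped structuring element of radius `R_fill` and scipy's default 4-connectivity for labelling. It ignores the periodic longitude boundary of a global grid and leaves out the area filter. `disk` and `preprocess_and_label` are hypothetical helpers.

```python
import numpy as np
from scipy import ndimage


def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element with the given radius in grid cells."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2


def preprocess_and_label(frame: np.ndarray, r_fill: int):
    """Hypothetical helper: close, open, and label one 2-D binary frame."""
    se = disk(r_fill)
    # Closing (dilation, then erosion) fills holes and gaps narrower than the disk
    closed = ndimage.binary_closing(frame, structure=se)
    # Opening (erosion, then dilation) removes features smaller than the disk
    opened = ndimage.binary_opening(closed, structure=se)
    # Label connected components (scipy's default 2-D connectivity: 4-neighbours);
    # the periodic longitude boundary of a global grid is ignored here
    labels, n = ndimage.label(opened)
    # Per-object area (cell count) and centroid for labels 1..n
    index = np.arange(1, n + 1)
    areas = ndimage.sum_labels(opened, labels, index)
    centroids = ndimage.center_of_mass(opened, labels, index)
    return labels, n, areas, centroids
```

Consider a 10 × 10 block containing a one-cell hole, plus an isolated single cell elsewhere. With `r_fill=2`, this comes out as a single, hole-free object: the closing fills the hole, and the opening removes the isolated cell.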

The pipeline uses **dask** for distributed parallel computation, enabling efficient processing of large datasets. \
A 40-year global daily analysis at 0.25° resolution on 32 cores takes:
- Basic (i.e. Scannell et al., with no merge/split criteria or tracking): ~5 minutes
- Full split/merge thresholding & merge tracking: ~40 minutes
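The overlap criterion of step 3 and the graph reduction of step 4 can be sketched for a short stack of already-labelled frames. This is an illustrative sketch, not MarEx's code, and it makes three simplifications:

- Each (frame, label) pair becomes a node in an undirected graph.
- Objects in consecutive frames are linked when their overlap is at least `overlap_threshold` times the smaller object's area.
- Events are resolved with `scipy.sparse.csgraph.connected_components`.

It deliberately omits the split/merge bookkeeping and the nearest-neighbour partitioning, so all objects in a connected group share one event ID. `overlap_edges` and `track` are hypothetical helpers.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def overlap_edges(lab0, lab1, overlap_threshold):
    """Hypothetical helper: (id0, id1) pairs of objects in consecutive frames whose
    overlap is at least `overlap_threshold` times the area of the smaller object."""
    area0 = np.bincount(lab0.ravel())  # area0[k] = number of cells with label k
    area1 = np.bincount(lab1.ravel())
    both = (lab0 > 0) & (lab1 > 0)
    # Encode each (id0, id1) pair as one integer so np.unique can count overlaps
    base = int(lab1.max()) + 1
    keys, n_overlap = np.unique(lab0[both].astype(np.int64) * base + lab1[both],
                                return_counts=True)
    edges = []
    for key, n in zip(keys, n_overlap):
        a, b = divmod(int(key), base)
        if n >= overlap_threshold * min(area0[a], area1[b]):
            edges.append((a, b))
    return edges


def track(frames, overlap_threshold=0.5):
    """Hypothetical helper: reduce per-frame labels (1..n_t) to global event IDs.
    Node offsets[t] + label - 1 represents object `label` in frame t."""
    offsets = np.concatenate([[0], np.cumsum([int(f.max()) for f in frames])])
    rows, cols = [], []
    for t in range(len(frames) - 1):
        for a, b in overlap_edges(frames[t], frames[t + 1], overlap_threshold):
            rows.append(offsets[t] + a - 1)
            cols.append(offsets[t + 1] + b - 1)
    n_nodes = int(offsets[-1])
    rows = np.asarray(rows, dtype=np.int64)
    cols = np.asarray(cols, dtype=np.int64)
    graph = coo_matrix((np.ones(rows.size), (rows, cols)), shape=(n_nodes, n_nodes))
    # Each connected component of the overlap graph is one tracked event
    n_events, event_of_node = connected_components(graph.tocsr(), directed=False)
    return n_events, event_of_node, offsets
```

Raising `overlap_threshold` removes edges, so objects that only barely overlap become independent events.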

```python
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import marEx
import marEx.helper as hpc
```

```python
# Lustre scratch directory
scratch_dir = Path('/scratch') / getuser()[0] / getuser() / 'mhws'
```

```python
# Start Dask cluster
client = hpc.start_local_cluster(n_workers=32, threads_per_worker=1,
                                 scratch_dir=scratch_dir / 'clients')  # Temporary scratch directory for dask to use
```

```
Dask Scratch: '/scratch/b/b382615/clients/tmp1w9o5vd5'
Memory per Worker: 15.74 GB
Hostname: l40038
Forward Port: l40038:8787
Dashboard Link: localhost:8787/status
```


```python
# Choose an optimal chunk size & load data
#   N.B.: This is crucial for dask, not only for performance but also to make the problem tractable.
#         The operations are ultimately global in space, so the spatial dimensions must be contiguous (unchunked).
#         The chunk size in time can be adjusted to the available system memory.

chunk_size = {'time': 25, 'lat': -1, 'lon': -1}
```
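The memory trade-off described in the comment above can be checked with some back-of-the-envelope arithmetic. The sketch assumes an `int32` field with the shape reported in the tracker output further down (8036 × 721 × 1440); `chunk_footprint` is a hypothetical helper. With 25 time steps per chunk, the footprint matches the dask summary and is small relative to the 15.74 GB of memory per worker reported by the cluster.

```python
import numpy as np


def chunk_footprint(shape, chunk_time, dtype=np.int32):
    """Hypothetical helper: bytes per chunk, number of chunks, and total bytes for a
    (time, lat, lon) array chunked only along time (lat/lon left whole, i.e. -1)."""
    n_time, n_lat, n_lon = shape
    itemsize = np.dtype(dtype).itemsize          # 4 bytes for int32
    chunk_bytes = chunk_time * n_lat * n_lon * itemsize
    n_chunks = -(-n_time // chunk_time)          # ceiling division: last chunk may be partial
    total_bytes = n_time * n_lat * n_lon * itemsize
    return chunk_bytes, n_chunks, total_bytes


chunk_bytes, n_chunks, total_bytes = chunk_footprint((8036, 721, 1440), chunk_time=25)
print(f"{chunk_bytes / 2**20:.2f} MiB per chunk, {n_chunks} chunks, "
      f"{total_bytes / 2**30:.2f} GiB in total")
# → 99.01 MiB per chunk, 322 chunks, 31.08 GiB in total
```

Doubling `chunk_time` doubles the per-chunk footprint and roughly halves the number of dask tasks per layer.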

```python
# Load pre-processed data (cf. `01_preprocess_extremes.ipynb`)

file_name = scratch_dir / 'extremes_binary_gridded.zarr'
ds = xr.open_zarr(str(file_name), chunks=chunk_size)
```

```python
# Run ID, tracking, & merging

tracker = marEx.tracker(ds.extreme_events,
                       ds.mask.where((ds.lat < 85) &
                                     (ds.lat > -90), other=False),  # Modify mask: the anisotropy of the lat/lon grid near the poles biases the ID & tracking
                       area_filter_quartile = 0.5,                  # Remove the smallest 50% of the identified coherent extreme areas
                       R_fill = 8,                                  # Fill small holes with radius < 8 _cells_
                       T_fill = 2,                                  # Allow gaps of up to 2 days while still tracking the event with the same ID
                       allow_merging = True,                        # Allow extreme events to split/merge; keeps track of merge events & unique IDs
                       overlap_threshold = 0.5,                     # Overlap threshold for merging events; if overlap < threshold, events keep independent IDs
                       nn_partitioning = True,                      # Use the new NN method to partition merged child areas; if False, revert to the older method of Di Sun et al. 2023
                       verbosity=1)                                 # Verbosity level (0 = none, 1 = basic, 2 = advanced/timing)

extreme_events_ds, merges_ds = tracker.run(return_merges=True)
extreme_events_ds
```

```
Finished filling spatial holes
Finished filling spatio-temporal holes
Finished filtering small objects
Finished object identification
Finished calculating object properties
Finished finding overlapping objects
Processing splitting and merging in chunk 0 of 322
Processing splitting and merging in chunk 10 of 322
Processing splitting and merging in chunk 20 of 322
Processing splitting and merging in chunk 30 of 322
Processing splitting and merging in chunk 40 of 322
Processing splitting and merging in chunk 50 of 322
Processing splitting and merging in chunk 60 of 322
Processing splitting and merging in chunk 70 of 322
Processing splitting and merging in chunk 80 of 322
Processing splitting and merging in chunk 90 of 322
Processing splitting and merging in chunk 100 of 322
Processing splitting and merging in chunk 110 of 322
Processing splitting and merging in chunk 120 of 322
Processing splitting and merging in chunk 130 of 322
Processing splitting and merging in chunk 140 of 322
Processing splitting and merging in chunk 150 of 322
Processing splitting and merging in chunk 160 of 322
Processing splitting and merging in chunk 170 of 322
Processing splitting and merging in chunk 180 of 322
...
```

Dask arrays in `extreme_events_ds`:

| Shape | Chunk shape | Size (per chunk) | Dtype | Dask graph |
|---|---|---|---|---|
| (8036, 721, 1440) | (25, 721, 1440) | 31.08 GiB (99.01 MiB) | int32 | 322 chunks in 3 graph layers |
| (8036, 9705) | (25, 9705) | 297.51 MiB (0.93 MiB) | int32 | 322 chunks in 1 graph layer |
| (8036, 9705) | (25, 9705) | 297.51 MiB (0.93 MiB) | float32 | 322 chunks in 1 graph layer |
| (2, 8036, 9705) | (2, 25, 9705) | 1.16 GiB (3.70 MiB) | float64 | 322 chunks in 1 graph layer |
| (8036, 9705) | (25, 9705) | 74.38 MiB (236.94 kiB) | bool | 322 chunks in 1 graph layer |
| (9705,) | (9705,) | 75.82 kiB (75.82 kiB) | datetime64[ns] | 1 chunk in 2 graph layers |
| (9705,) | (9705,) | 75.82 kiB (75.82 kiB) | datetime64[ns] | 1 chunk in 2 graph layers |
| (8036, 9705, 14) | (25, 9705, 14) | 4.07 GiB (12.96 MiB) | int32 | 322 chunks in 2 graph layers |
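The mask modification in the tracker call drops latitudes at and north of 85°N. The reason is that, on a regular lat/lon grid, a cell's zonal width shrinks with the cosine of latitude. A fixed radius in _cells_ (e.g. `R_fill = 8`) therefore covers very different physical distances at different latitudes. The rough illustration below assumes a spherical Earth of radius 6371 km and the 0.25° grid used here; `cell_dims_km` is a hypothetical helper. At 85°, a cell is roughly 11 times narrower zonally than at the equator.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius 6371 km


def cell_dims_km(lat_deg, res_deg=0.25):
    """Hypothetical helper: zonal and meridional extent (km) of a res_deg x res_deg cell."""
    dy = np.deg2rad(res_deg) * EARTH_RADIUS_KM   # meridional extent: same at every latitude
    dx = dy * np.cos(np.deg2rad(lat_deg))        # zonal extent: shrinks with cos(latitude)
    return dx, dy


for lat in (0, 60, 85):
    dx, dy = cell_dims_km(lat)
    print(f"lat {lat:2d} deg: {dx:5.1f} km x {dy:.1f} km (aspect {dy / dx:4.1f}); "
          f"8 cells span {8 * dx:5.1f} km zonally")
```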


```python
merges_ds
```

```python
# Save IDed/tracked/merged events to `zarr` for more efficient parallel I/O

file_name = scratch_dir / 'extreme_events_merged_gridded.zarr'
extreme_events_ds.to_zarr(file_name, mode='w')
```


### Run Basic Tracking for Comparison
N.B.: This is the current standard method used in the literature, which involves _No_ temporal gap filling, _No_ merging/splitting and _No_ independent event tracking.
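For contrast, the temporal gap filling used in the full run (`T_fill = 2`) can be sketched for the time series of a single grid cell. Any run of at most `T_fill` non-extreme steps that is bounded by extreme steps on both sides is filled. This per-cell, run-length version is an assumption for illustration; MarEx's own implementation may differ (for example, it may use morphological closing in time). `fill_time_gaps` is a hypothetical helper.

```python
import numpy as np


def fill_time_gaps(series, t_fill):
    """Hypothetical helper: fill runs of False of length <= t_fill that are
    bounded by True on both sides (a per-cell 1-D simplification)."""
    out = np.asarray(series, dtype=bool).copy()
    true_idx = np.flatnonzero(out)           # time steps flagged as extreme
    gaps = np.diff(true_idx) - 1             # non-extreme steps between consecutive extremes
    for start, gap in zip(true_idx[:-1], gaps):
        if 0 < gap <= t_fill:
            out[start + 1:start + 1 + gap] = True
    return out
```

With `t_fill=0`, as in the basic run, the series is returned unchanged.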

```python
# Run basic tracking

tracker = marEx.tracker(ds.extreme_events,
                       ds.mask.where((ds.lat < 85) &
                                     (ds.lat > -90), other=False),  # Modify mask: the anisotropy of the lat/lon grid near the poles biases the ID & tracking
                       area_filter_quartile = 0.5,                  # Remove the smallest 50% of the identified coherent extreme areas
                       R_fill = 8,                                  # Fill small holes with radius < 8 _cells_
                       T_fill = 0,                                  # No temporal hole filling
                       allow_merging = False,                       # Do not allow extreme events to split/merge: all touching events adopt the same ID forever (after _and_ before (!))
                       verbosity=1)

extreme_events_basic_ds = tracker.run()
extreme_events_basic_ds
```

```
Finished filling spatial holes
Finished filling spatio-temporal holes
Finished filtering small objects
Finished tracking all extreme events!

Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.8525119484330325
   Total Object Area IDed (cells): 361690641.0
   Number of Initial Pre-Filtered Objects: 210966
   Number of Final Filtered Objects: 105545
   Area Cutoff Threshold (cells): 651
   Accepted Area Fraction: 0.9064810029187347
   Total Events Tracked: 10190
```




Dask array in `extreme_events_basic_ds`:

| Shape | Chunk shape | Size (per chunk) | Dtype | Dask graph |
|---|---|---|---|---|
| (8036, 721, 1440) | (25, 721, 1440) | 31.08 GiB (99.01 MiB) | int32 | 322 chunks in 3 graph layers |
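The `area_filter_quartile` step can be sketched as a quantile cut on the object areas. The statistics printed above show why the cut costs little extreme area: roughly half of the 210966 pre-filtered objects are removed, yet about 91% of the area is retained, because most of the area belongs to the larger objects. The sketch makes two assumptions: objects at or above the cutoff are kept, and "accepted area fraction" means retained area over total area. `area_quantile_filter` is a hypothetical helper, not the MarEx API.

```python
import numpy as np


def area_quantile_filter(areas, quantile=0.5):
    """Hypothetical helper: keep objects whose area is at or above the given
    quantile of all object areas (tie handling is an assumption)."""
    areas = np.asarray(areas, dtype=float)
    cutoff = np.quantile(areas, quantile)           # e.g. the median area for quantile=0.5
    keep = areas >= cutoff
    accepted_area_fraction = areas[keep].sum() / areas.sum()
    return cutoff, keep, accepted_area_fraction
```

For object areas `[1, 2, 3, 4, 100]`, the median cutoff of 3 keeps three of the five objects but retains 107/110 of the total area.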


```python
# Save IDed events to `zarr` for more efficient parallel I/O

file_name = scratch_dir / 'extreme_events_basic_gridded.zarr'
extreme_events_basic_ds.to_zarr(file_name, mode='w')
```
