# Regional European Daily Event Analysis: Marine Heatwave ID & Tracking using `MarEx` Regional Mode

### `MarEx` Processing Pipeline for Regional European Datasets

This example demonstrates **regional tracking** capabilities for high-resolution (0.05°) European marine extremes using `marEx.regional_tracker()`. Regional mode is specifically designed for focused geographical studies where coordinate system auto-detection may fail.

---

1. **Morphological Pre-Processing**
    - Performs binary morphological closing using `dask_image.ndmorph` to fill small spatial holes up to `R_fill` cells in radius
    - Executes binary opening to remove small isolated features with a size of order `R_fill` cells
    - Fills gaps in time to maintain event continuity for interruptions up to `T_fill` time steps
    - Filters out objects smaller than `area_filter_absolute` cells

2. **Blob Identification**
    - Labels spatially connected components using an efficient connected-component algorithm in `dask_image.ndmeasure`
    - Computes blob properties (area, centroid, boundaries)

3. **Temporal Tracking**
    - Identifies blob overlaps between consecutive time frames
    - Connects objects across time, applying the following criteria for splitting, merging, & persistence:
        - Connected objects must overlap by at least fraction `overlap_threshold` of the smaller area
        - Merged objects retain their original IDs; the child area is partitioned among the parents by assigning each cell to the parent of its _nearest-neighbour_ cell (or, alternatively, by centroid distance)

4. **Graph Reduction & Finalisation**
    - Constructs the complete temporal graph of object evolution through time
    - Resolves object connectivity graph using `scipy.sparse.csgraph.connected_components`
    - Creates globally unique IDs for each tracked extreme event
    - Maps objects into efficient ID-time space for convenient analysis
    - Computes comprehensive statistics about the lifecycle of each event
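
Steps 1 and 2 above can be illustrated on a toy grid. The following is a minimal single-machine sketch that uses `scipy.ndimage` as a stand-in for `dask_image`; it is not the `marEx` implementation. The helper names `disk` and `preprocess_and_label`, the disk-shaped structuring element, and the toy field are all assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean disk-shaped structuring element (assumed shape) with radius in cells."""
    y, x = np.ogrid[-radius : radius + 1, -radius : radius + 1]
    return x**2 + y**2 <= radius**2


def preprocess_and_label(binary, r_fill, min_area):
    """Close, open, label, then drop objects smaller than min_area cells."""
    element = disk(r_fill)
    closed = ndimage.binary_closing(binary, structure=element)  # fill small holes
    opened = ndimage.binary_opening(closed, structure=element)  # remove small features
    labels, n_objects = ndimage.label(opened)  # connected components
    areas = np.bincount(labels.ravel(), minlength=n_objects + 1)  # index 0 is background
    keep = [i for i in range(1, n_objects + 1) if areas[i] >= min_area]
    return np.where(np.isin(labels, keep), labels, 0)


# Toy binary field (hypothetical data, kept away from the array edges)
field = np.zeros((20, 20), dtype=bool)
field[4:12, 4:12] = True  # large object
field[7, 7] = False  # one-cell hole, filled by the closing
field[2, 16] = True  # isolated cell, removed by the opening
field[15:18, 14:17] = True  # small 3x3 object, removed by the area filter

labelled = preprocess_and_label(field, r_fill=1, min_area=10)
n_kept = len(np.unique(labelled)) - 1  # number of surviving objects
print(n_kept, int((labelled > 0).sum()))
```

Only the large object survives here, with its hole filled and its corners rounded off by the opening. Objects touching the array edge would need extra care, because `scipy.ndimage` treats everything outside the array as empty by default.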

The pipeline leverages **dask** for distributed parallel computation, enabling efficient processing of large datasets. \
A 40-year regional European daily (OSTIA) analysis at full 0.05° resolution on 32 cores takes approximately:
- Basic tracking (i.e. Scannell et al., which involves no merge/split criteria or independent tracking): ~3 minutes
- Full split/merge thresholding & merge tracking: ~15 minutes

In [1]:
from getpass import getuser
from pathlib import Path

import dask
import xarray as xr

import marEx
import marEx.helper as hpc

In [2]:
# Lustre Scratch Directory
scratch_dir = Path("/scratch") / getuser()[0] / getuser()

In [3]:
# Start Dask Cluster
client = hpc.start_local_cluster(
    n_workers=32, threads_per_worker=1, scratch_dir=scratch_dir / "clients"
)  # Specify temporary scratch directory for dask to use

Hostname: l40193
Forward Port: l40193:8787
Dashboard Link: localhost:8787/status


In [4]:
# Choose optimal chunk size & load data
#   N.B.: This is crucial for dask (not only for performance, but also to make the problem tractable).
#         The operations are eventually global in space, and so require the spatial dimensions to be contiguous/unchunked.
#         We can adjust the chunk size in time depending on available system memory.

chunk_size = {"time": 25, "lat": -1, "lon": -1}
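
To get a feel for what this chunking implies for memory, the arithmetic can be sketched as below. The grid shape (9282, 800, 1300) is taken from the dataset summary shown later in this notebook; the 1-byte item size for the binary input is an assumption, and `chunk_nbytes` is a hypothetical helper. A chunk size of `-1` tells dask to keep that dimension in a single chunk.

```python
from math import ceil, prod


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed in-memory size of one chunk, in bytes."""
    return prod(chunk_shape) * itemsize


n_time, n_lat, n_lon = 9282, 800, 1300  # grid shape from the dataset summary
time_chunk = 25  # "time": 25; lat/lon are -1, i.e. unchunked

# Number of chunks along time (the last chunk is partial)
n_chunks = ceil(n_time / time_chunk)

# Per-chunk memory for a 1-byte binary input (assumed dtype) and for int32 labels
bool_chunk_mib = chunk_nbytes((time_chunk, n_lat, n_lon), 1) / 2**20
int32_chunk_mib = chunk_nbytes((time_chunk, n_lat, n_lon), 4) / 2**20

# A single int32 time step, for comparison with the per-chunk size in the summary
int32_step_mib = chunk_nbytes((1, n_lat, n_lon), 4) / 2**20

print(n_chunks, round(bool_chunk_mib, 1), round(int32_chunk_mib, 1), round(int32_step_mib, 2))
```

Each worker holds several such chunks plus intermediates at once, so the time chunk should be chosen with a comfortable margin relative to the memory available per worker.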

In [5]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = scratch_dir / "mhws" / "extremes_binary_regional_shifting_hobday.zarr"
ds = xr.open_zarr(str(file_name), chunks=chunk_size)

In [6]:
# Run Regional ID, Tracking, & Merging using marEx.regional_tracker()

# Key differences from global tracker:
# - Uses marEx.regional_tracker() instead of marEx.tracker()
# - Requires explicit coordinate_units specification
# - Optimised for regional coordinate systems

regional_tracker = marEx.regional_tracker(
    ds.extreme_events,
    ds.mask,
    coordinate_units="degrees",  # Explicit specification required for regional mode
    grid_resolution=0.05,  # Grid resolution in degrees, used to calculate the object areas on the globe
    area_filter_absolute=600,  # Remove objects smaller than 600 cells
    R_fill=16,  # Fill small holes with radius < 16 _cells_
    T_fill=4,  # Allow gaps of 4 days and still continue the event tracking with the same ID
    allow_merging=True,  # Allow extreme events to split/merge. Keeps track of merge events & unique IDs.
    overlap_threshold=0.25,  # Overlap threshold for merging events. If overlap > threshold, events merge, are partitioned, and are independently tracked
    nn_partitioning=True,  # Use new NN method to partition merged children areas. If False, reverts to old method of Di Sun et al. 2023.
    verbose=True,
)

extreme_events_ds, merges_ds = regional_tracker.run(return_merges=True)
extreme_events_ds

Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.7047647269550515
   Total Object Area IDed (cells): 331612330.0
   Number of Initial Pre-Filtered Objects: 40405
   Number of Final Filtered Objects: 34597
   Area Cutoff Threshold (cells): 600
   Accepted Area Fraction: 0.997835095576814
   Total Events Tracked: 4564
   Total Merging Events Recorded: 7360


Dask array summaries for the variables of `extreme_events_ds`, in display order:

| Array size | Chunk size | Shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|---|
| 35.96 GiB | 3.97 MiB | (9282, 800, 1300) | (1, 800, 1300) | 9282 chunks in 4 graph layers | int32 |
| 161.60 MiB | 17.83 kiB | (9282, 4564) | (1, 4564) | 9282 chunks in 2 graph layers | int32 |
| 161.60 MiB | 17.83 kiB | (9282, 4564) | (1, 4564) | 9282 chunks in 3 graph layers | float32 |
| 323.20 MiB | 17.83 kiB | (2, 9282, 4564) | (1, 1, 4564) | 18564 chunks in 7 graph layers | float32 |
| 40.40 MiB | 4.46 kiB | (9282, 4564) | (1, 4564) | 9282 chunks in 2 graph layers | bool |
| 35.66 kiB | 35.66 kiB | (4564,) | (4564,) | 1 chunk in 2 graph layers | datetime64[ns] |
| 35.66 kiB | 35.66 kiB | (4564,) | (4564,) | 1 chunk in 2 graph layers | datetime64[ns] |
| 2.37 GiB | 267.42 kiB | (9282, 4564, 15) | (1, 4564, 15) | 9282 chunks in 3 graph layers | int32 |


In [7]:
merges_ds
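
Each recorded merge arises from objects in consecutive time frames that pass the overlap test controlled by `overlap_threshold`. A minimal sketch of that test on toy masks follows; `overlap_fraction` is a hypothetical helper rather than part of the `marEx` API, and treating the threshold as inclusive ("at least") is an assumption based on the pipeline description.

```python
import numpy as np


def overlap_fraction(obj_a, obj_b):
    """Shared area divided by the area of the smaller of the two objects."""
    shared = np.logical_and(obj_a, obj_b).sum()
    smaller = min(obj_a.sum(), obj_b.sum())
    return shared / smaller if smaller else 0.0


overlap_threshold = 0.25  # same value as passed to the tracker above

# Object at time t (16 cells)
prev = np.zeros((8, 8), dtype=bool)
prev[1:5, 1:5] = True

# Two candidate objects at time t+1 (9 cells each), toy data
strong = np.zeros((8, 8), dtype=bool)
strong[3:6, 3:6] = True  # shares 4 cells with prev
weak = np.zeros((8, 8), dtype=bool)
weak[4:7, 4:7] = True  # shares 1 cell with prev

strong_linked = overlap_fraction(prev, strong) >= overlap_threshold
weak_linked = overlap_fraction(prev, weak) >= overlap_threshold
print(strong_linked, weak_linked)
```

Normalising by the smaller area means a small object that is almost entirely swallowed by a large one still counts as strongly overlapping, while a glancing contact between two large objects does not.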

In [8]:
# Save IDed/Tracked/Merged Events to `zarr` for more efficient parallel I/O

file_name = scratch_dir / "mhws" / "extreme_events_merged_regional_shifting.zarr"
extreme_events_ds.to_zarr(file_name, mode="w")

<xarray.backends.zarr.ZarrStore at 0x7ff77b41d510>

In [9]:
# Save Merges Dataset to netcdf
file_name = scratch_dir / "mhws" / "extreme_events_merged_regional_shifting_merges.nc"
merges_ds.to_netcdf(file_name, mode="w")

### Run Basic Tracking for Comparison
N.B.: This is the current standard method used in the literature, which involves _No_ temporal gap filling, _No_ merging/splitting and _No_ independent event tracking.
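
Without merge/split handling, every object that is linked to another by any overlap ends up in the same event. This can be pictured as taking connected components of the object overlap graph. The sketch below uses `scipy.sparse.csgraph.connected_components` on a toy graph; the helper `group_objects`, the node numbering, and the edge list are hypothetical and only illustrate the idea, not the internals of `marEx`.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def group_objects(n_objects, overlap_pairs):
    """Assign one event label per object from pairwise overlaps."""
    pairs = np.asarray(overlap_pairs, dtype=int).reshape(-1, 2)
    data = np.ones(len(pairs), dtype=np.int8)
    graph = coo_matrix((data, (pairs[:, 0], pairs[:, 1])), shape=(n_objects, n_objects))
    # directed=False: an edge in either direction connects two objects
    return connected_components(graph, directed=False)


# Toy overlap graph: each node is one labelled object in one time frame.
#   0 -> 1 -> 2   one object persisting over three frames
#   3 -> 2        a second object that merges into it
#   4 -> 5        an unrelated object persisting over two frames
#   6             an object with no overlaps at all
overlaps = [(0, 1), (1, 2), (3, 2), (4, 5)]
n_events, event_of_object = group_objects(7, overlaps)
print(n_events, event_of_object)
```

Objects 0 to 3 share a single event label even though object 3 only touches the others at the final frame, which is the "same ID forever, after and before" behaviour of the basic method.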

In [10]:
# Run Basic Regional Tracking

basic_regional_tracker = marEx.regional_tracker(
    ds.extreme_events,
    ds.mask,
    coordinate_units="degrees",  # Explicit specification required for regional mode
    area_filter_absolute=600,  # Remove objects smaller than 600 cells
    R_fill=16,  # Fill small holes with radius < 16 _cells_
    T_fill=0,  # No temporal hole filling
    allow_merging=False,  # Do not allow extreme events to split/merge. All touching events adopt the same ID forever (after _and_ before (!)).
)

extreme_events_basic_ds = basic_regional_tracker.run()
extreme_events_basic_ds

Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.8164409999213844
   Total Object Area IDed (cells): 286139612.0
   Number of Initial Pre-Filtered Objects: 41337
   Number of Final Filtered Objects: 34157
   Area Cutoff Threshold (cells): 600
   Accepted Area Fraction: 0.9982279559392148
   Total Events Tracked: 5513


Dask array summary for `extreme_events_basic_ds`:

| Array size | Chunk size | Shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|---|
| 35.96 GiB | 3.97 MiB | (9282, 800, 1300) | (1, 800, 1300) | 9282 chunks in 4 graph layers | int32 |


In [11]:
# Save IDed Events to `zarr` for more efficient parallel I/O

file_name = scratch_dir / "mhws" / "extreme_events_basic_regional_shifting.zarr"
extreme_events_basic_ds.to_zarr(file_name, mode="w")

<xarray.backends.zarr.ZarrStore at 0x7ff6f2ab1fc0>