# Identify & Track Marine Heatwaves on an _Unstructured Grid_ using `spot_the_blOb`

## Processing Steps:
1. Fill spatial holes in the binary data, using `dask_image.ndmorph` -- up to `R_fill` cells in radius.
2. Fill gaps in time -- permitting up to `T_fill` missing time slices, while keeping the same blob ID.
3. Filter out small objects -- those with an area below the `area_filter_quartile` quantile of the object size distribution.
4. Identify objects in the binary data, using `dask_image.ndmeasure`.
5. Connect objects across time, applying the following criteria for splitting, merging, and persistence:
    - Connected Blobs must overlap by at least fraction `overlap_threshold` of the smaller blob.
    - Merging Blobs retain their original IDs; the child blob is partitioned by assigning each cell to the parent of its _nearest-neighbour_ cell.
6. Cluster and reduce the final object ID graph using `scipy.sparse.csgraph.connected_components`.
7. Map the tracked objects into ID-time space for convenient analysis.
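
To make steps 5 and 6 concrete, here is a minimal sketch on toy data. It is not the package's implementation: the overlap is measured in cell counts (the package may well weight by cell area), and the dictionaries `blobs_t0` / `blobs_t1` and the helper `overlap_fraction` are hypothetical names introduced for illustration only. Only `scipy.sparse.csgraph.connected_components` is taken from the steps above.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def overlap_fraction(cells_a, cells_b):
    """Shared cells as a fraction of the smaller blob (cell counts, an assumption)."""
    a, b = set(cells_a), set(cells_b)
    return len(a & b) / min(len(a), len(b))


# Hypothetical toy blobs: {blob ID: cell indices} at two consecutive time steps
blobs_t0 = {1: [0, 1, 2, 3], 2: [10, 11]}
blobs_t1 = {3: [2, 3, 4, 5], 4: [11, 12, 13, 14, 15], 5: [20, 21]}
overlap_threshold = 0.5

# Step 5 (simplified): link blobs whose overlap reaches the threshold
links = [
    (i, j)
    for i, cells_i in blobs_t0.items()
    for j, cells_j in blobs_t1.items()
    if overlap_fraction(cells_i, cells_j) >= overlap_threshold
]

# Step 6: treat the links as edges of a graph and cluster connected IDs
ids = sorted(set(blobs_t0) | set(blobs_t1))
index = {blob_id: k for k, blob_id in enumerate(ids)}
rows = [index[i] for i, _ in links]
cols = [index[j] for _, j in links]
graph = coo_matrix(
    (np.ones(len(links), dtype=np.int8), (rows, cols)),
    shape=(len(ids), len(ids)),
)
n_tracks, labels = connected_components(graph, directed=False)

# Every original blob ID now maps to one tracked-object label
track_of = {blob_id: int(labels[index[blob_id]]) for blob_id in ids}
print(links, n_tracks, track_of)
```

In this toy case blobs 1 and 3 end up in one track, blobs 2 and 4 in another, and blob 5 stays on its own. Splitting and merging logic (the nearest-neighbour partitioning) is deliberately left out of the sketch.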

N.B.: Exploits parallelised `dask` operations, with chunking optimised using `flox` for memory efficiency and speed. \
N.N.B.: This example, using 40 years of _daily_ outputs at 5 km resolution on an Unstructured Grid (15 million cells), takes the following on 32 cores:
- Full Split/Merge Thresholding & Merge Tracking:  ~40 minutes

In [1]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import spot_the_blOb as blob
import spot_the_blOb.helper as hpc

In [None]:
# Start Dask Cluster
#  N.B.: Need ~ 8 GB per worker (for 5km data // 15 million points)
client = hpc.StartLocalCluster(n_workers=50, n_threads=1)


Memory per Worker: 10.07 GB
Hostname is  l40350
Forward Port = l40350:8787
Dashboard Link: localhost:8787/status



In [3]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary_unstruct.zarr'
chunk_size = {'time': 4, 'ncells': -1}
ds = xr.open_zarr(str(file_name), chunks={}).isel(time=slice(0, 64)).chunk(chunk_size)

In [4]:
# Tracking Parameters

drop_area_quartile = 0.8  # Remove the smallest 80% of the identified blobs
hole_filling_radius = 32  # Fill small holes with radius < 32 elements, i.e. ~100 km
time_gap_fill = 2         # Allow gaps of 2 days and still continue the blob tracking with the same ID
allow_merging = True      # Allow blobs to split/merge. Keeps track of merge events & unique IDs.
overlap_threshold = 0.5   # Overlap threshold for merging blobs. If overlap < threshold, blobs keep independent IDs.
nn_partitioning = True    # Use new NN method to partition merged children blobs. If False, reverts to old method of Di Sun et al. 2023.
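
As a rough illustration of what `time_gap_fill` and `drop_area_quartile` control, the sketch below applies both ideas to toy NumPy arrays. The helpers `fill_time_gaps` and `filter_small` are hypothetical and are not part of `spot_the_blOb`; in particular, computing the area cutoff with `np.quantile` is an assumption about how the quantile is defined.

```python
import numpy as np


def fill_time_gaps(active, max_gap):
    """Fill runs of False no longer than max_gap that lie between two True values."""
    filled = active.copy()
    on = np.flatnonzero(active)
    for left, right in zip(on[:-1], on[1:]):
        gap = right - left - 1
        if 0 < gap <= max_gap:
            filled[left + 1:right] = True
    return filled


def filter_small(areas, quantile):
    """Return the area cutoff and a mask of the blobs that survive it."""
    cutoff = np.quantile(areas, quantile)
    return cutoff, areas >= cutoff


# One cell's binary time series: a 2-step gap and a 3-step gap
series = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0], dtype=bool)
filled = fill_time_gaps(series, max_gap=2)  # only the 2-step gap is filled

# Toy blob areas with one dominant blob
areas = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
cutoff, keep = filter_small(areas, quantile=0.8)
accepted_area_fraction = areas[keep].sum() / areas.sum()
print(filled.astype(int), cutoff, accepted_area_fraction)
```

The toy areas show why dropping the smallest 80% of blobs by count can still retain most of the total area: the size distribution is dominated by a few large objects.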

In [5]:
# SpOt & Track the Blobs & Merger Events

tracker = blob.Spotter(ds.extreme_events, ds.mask, R_fill=hole_filling_radius, T_fill=time_gap_fill, area_filter_quartile=drop_area_quartile,
                       allow_merging=allow_merging, overlap_threshold=overlap_threshold, nn_partitioning=nn_partitioning, 
                       temp_dir='/scratch/b/b382615/mhws/TEMP/', # Temporary Scratch Directory for Dask
                       xdim='ncells',                 # Need to tell spot_the_blOb the new Unstructured dimension
                       unstructured_grid=True,        # Use Unstructured Grid
                       neighbours=ds.neighbours,      # Connectivity array for the Unstructured Grid Cells
                       cell_areas=ds.cell_areas,      # Cell areas for each Unstructured Grid Cell
                       verbosity=1)                   # Choose Verbosity Level (0=None, 1=Basic, 2=Advanced/Timing)

blobs = tracker.run(return_merges=False)

blobs

Finished Constructing the Sparse Dilation Matrix.
Finished Filling Spatial Holes
Finished Filling Spatio-temporal Holes.
Finished Filtering Small Blobs.
Finished Blob Identification.
Finished Making Blobs Globally Unique.
Finished Calculating Blob Properties.
Finished Finding Overlapping Blobs.
Processing Parallel Iteration 1 with 11 Merging Blobs...
Processing Parallel Iteration 2 with 9 Merging Blobs...
Processing Parallel Iteration 3 with 2 Merging Blobs...
Finished Splitting and Merging Blobs.
Finished Clustering and Renaming Blobs.
Finished Tracking All Blobs ! 


Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.7691209876330218
   Total Object Area IDed (cells): 59759044
   Number of Initial Pre-Filtered Blobs: 3405
   Area Cutoff Threshold (cells): 19152
   Accepted Area Fraction: 0.7985329049105939
   Total Blobs Tracked: 84
   Total Merging Events Recorded: 38
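
The log above mentions a sparse dilation matrix. One way such a matrix could be used to fill holes on an unstructured grid is sketched below; this is an assumption about the general technique (morphological closing via repeated sparse matrix products), not the package's actual code. The layout of `neighbours` as an `(ncells, k)` array of 0-based indices with `-1` for a missing neighbour is hypothetical, as are all function names.

```python
import numpy as np
from scipy.sparse import coo_matrix, identity


def dilation_matrix(neighbours):
    """Build (adjacency + identity) from an (ncells, k) neighbour table.

    Assumed layout: 0-based neighbour indices, -1 where a neighbour is missing.
    """
    ncells, k = neighbours.shape
    rows = np.repeat(np.arange(ncells), k)
    cols = neighbours.ravel()
    valid = cols >= 0
    adj = coo_matrix(
        (np.ones(int(valid.sum()), dtype=np.int32), (rows[valid], cols[valid])),
        shape=(ncells, ncells),
    )
    return (adj + identity(ncells, dtype=np.int32, format="csr")).tocsr()


def dilate(mask, D, radius):
    """Grow the mask by one ring of neighbours per iteration."""
    out = mask.copy()
    for _ in range(radius):
        out = (D @ out.astype(np.int32)) > 0
    return out


def erode(mask, D, radius):
    """Erosion expressed as the complement of dilating the complement."""
    return ~dilate(~mask, D, radius)


def fill_holes(mask, D, radius):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, D, radius), D, radius)


# Toy grid: a ring of 8 cells, each with a left and a right neighbour
ncells = 8
cell = np.arange(ncells)
neighbours = np.stack([(cell - 1) % ncells, (cell + 1) % ncells], axis=1)
D = dilation_matrix(neighbours)

mask = np.array([1, 1, 0, 1, 1, 0, 0, 0], dtype=bool)
closed = fill_holes(mask, D, radius=1)
print(closed.astype(int))
```

With `radius=1` the single-cell hole is filled, while the three-cell gap is left open.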


Array summaries from the `blobs` repr:

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (14886338,) | (14886338,) | 113.57 MiB | 113.57 MiB | 1 chunks in 1 graph layer | float64 numpy.ndarray |
| (14886338,) | (14886338,) | 113.57 MiB | 113.57 MiB | 1 chunks in 1 graph layer | float64 numpy.ndarray |
| (64, 14886338) | (1, 14886338) | 3.55 GiB | 56.79 MiB | 64 chunks in 4 graph layers | int32 numpy.ndarray |
| (64, 85) | (1, 85) | 21.25 kiB | 340 B | 64 chunks in 2 graph layers | int32 numpy.ndarray |
| (64, 85) | (1, 85) | 21.25 kiB | 340 B | 64 chunks in 2 graph layers | float32 numpy.ndarray |
| (2, 64, 85) | (2, 1, 85) | 42.50 kiB | 680 B | 64 chunks in 2 graph layers | float32 numpy.ndarray |
| (64, 85) | (1, 85) | 5.31 kiB | 85 B | 64 chunks in 2 graph layers | bool numpy.ndarray |
| (85,) | (85,) | 680 B | 680 B | 1 chunks in 1 graph layer | datetime64[ns] numpy.ndarray |
| (85,) | (85,) | 680 B | 680 B | 1 chunks in 1 graph layer | datetime64[ns] numpy.ndarray |
| (64, 85, 3) | (1, 85, 3) | 63.75 kiB | 1.00 kiB | 64 chunks in 2 graph layers | int32 numpy.ndarray |

In [6]:
# Save Tracked Blobs to `zarr` for more efficient parallel I/O
file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'MHWs_tracked_unstruct_5.zarr'
blobs.to_zarr(file_name, mode='w')

<xarray.backends.zarr.ZarrStore at 0x154f503d3760>
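
The output above contains both a field of shape `(time, ncells)` and several small `(time, ID)` arrays. As a hedged sketch of how a field of blob IDs can be reduced to ID-time space, the example below sums cell areas per ID and time step with `np.bincount`. The function name, the convention that `0` marks background, and the toy arrays are all assumptions made for illustration; they are not taken from `spot_the_blOb`.

```python
import numpy as np


def area_by_id_and_time(id_field, cell_areas, n_ids):
    """Sum cell areas per blob ID at each time step.

    id_field:   (time, ncells) integer array, 0 = background, 1..n_ids = blobs
    cell_areas: (ncells,) array of cell areas
    Returns a (time, n_ids) array; entry [t, k] is the area of blob k + 1 at time t.
    """
    n_time = id_field.shape[0]
    out = np.zeros((n_time, n_ids), dtype=float)
    for t in range(n_time):
        # bincount sums the weights falling into each ID bin; bin 0 is background
        totals = np.bincount(id_field[t], weights=cell_areas, minlength=n_ids + 1)
        out[t] = totals[1:n_ids + 1]
    return out


# Toy ID field: 3 time steps, 5 cells, 2 blobs
id_field = np.array([
    [0, 1, 1, 0, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 2, 0],
])
cell_areas = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

area = area_by_id_and_time(id_field, cell_areas, n_ids=2)
present = area > 0  # boolean presence of each blob at each time step
print(area)
print(present)
```

Working in ID-time space keeps per-object analysis (lifetimes, area growth) on arrays of a few kilobytes instead of the multi-gigabyte field.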