# Identify & Track Marine Heatwaves on _Unstructured Grid_ using `spot_the_blOb`

## Processing Steps:
1. Fill spatial holes in the binary data, using `dask_image.ndmorph` -- up to `R_fill` cells in radius.
2. Fill gaps in time -- permitting up to `T_fill` missing time slices, while keeping the same blob ID.
3. Filter out small objects -- those whose area falls below the `area_filter_quartile` quantile of the object size distribution.
4. Identify objects in the binary data, using `dask_image.ndmeasure`.
5. Connect objects across time, applying the following criteria for splitting, merging, and persistence:
    - Connected blobs must overlap by at least a fraction `overlap_threshold` of the smaller blob's area.
    - Merging blobs retain their original IDs; the child blob is partitioned among them according to the parent of each cell's _nearest-neighbour_ cell.
6. Cluster and reduce the final object ID graph using `scipy.sparse.csgraph.connected_components`.
7. Map the tracked objects into ID-time space for convenient analysis.
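Steps 5 and 6 can be sketched with NumPy and SciPy. Each blob at one time step is linked to any blob at the next step that overlaps it by at least `overlap_threshold` of the smaller blob's area, and the resulting ID graph is then clustered with `scipy.sparse.csgraph.connected_components`. This is a minimal sketch, not `spot_the_blOb`'s implementation. It assumes blob IDs are already globally unique across time steps (0 = background) and stored as 1-D cell arrays, as on the unstructured grid. The helpers `overlap_edges` and `cluster_ids` are hypothetical names, and the split/merge partitioning of step 5 is omitted.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def overlap_edges(labels_t0, labels_t1, overlap_threshold=0.5):
    """Return (id_t0, id_t1) pairs whose overlap covers at least
    `overlap_threshold` of the smaller blob's area (0 = background)."""
    both = (labels_t0 > 0) & (labels_t1 > 0)
    if not both.any():
        return np.empty((0, 2), dtype=np.int64)
    # Count shared cells for every (id_t0, id_t1) combination
    pairs, counts = np.unique(
        np.stack([labels_t0[both], labels_t1[both]]), axis=1, return_counts=True
    )
    # Blob areas in cells (array index = blob ID)
    area_t0 = np.bincount(labels_t0)
    area_t1 = np.bincount(labels_t1)
    smaller = np.minimum(area_t0[pairs[0]], area_t1[pairs[1]])
    keep = counts >= overlap_threshold * smaller
    return pairs[:, keep].T


def cluster_ids(edges, n_ids):
    """Assign every blob ID 0..n_ids to a connected component of the overlap graph."""
    graph = coo_matrix(
        (np.ones(len(edges), dtype=np.int8), (edges[:, 0], edges[:, 1])),
        shape=(n_ids + 1, n_ids + 1),
    )
    _, component = connected_components(graph, directed=False)
    return component


# Two consecutive time steps on a toy 1-D grid: IDs 1-2 at t0, IDs 3-4 at t1
t0 = np.array([1, 1, 1, 0, 0, 2, 2, 2, 0, 0])
t1 = np.array([3, 3, 0, 0, 0, 0, 0, 4, 4, 4])

edges = overlap_edges(t0, t1, overlap_threshold=0.5)
component = cluster_ids(edges, n_ids=4)
# (1, 3) is kept; (2, 4) shares only 1 of 3 cells and is dropped
print(edges)
print(component)
```

Note that `connected_components` joins any chain of overlaps into a single object. Without the merge handling of step 5, merging blobs would therefore end up sharing one ID.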

N.B.: Exploits parallelised `dask` operations with optimised chunking using `flox` for memory efficiency and speed \
N.N.B.: This example, using 40 years of _daily_ outputs at 5 km resolution on an Unstructured Grid (15 million cells) and 32 cores, takes:
- Full Split/Merge Thresholding & Merge Tracking: ~40 minutes

In [1]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import spot_the_blOb as blob
import spot_the_blOb.helper as hpc

In [2]:
# Start Dask Cluster
client = hpc.StartLocalCluster(n_workers=32, n_threads=2)

Memory per Worker: 7.86 GB
Hostname is  l10049
Forward Port = l10049:8787
Dashboard Link: localhost:8787/status



In [3]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary_unstruct.zarr'
chunk_size = {'time': 4, 'ncells': -1}  # 4 time steps per chunk; all cells in a single chunk
ds = xr.open_zarr(str(file_name), chunks={}).isel(time=slice(0, 32)).chunk(chunk_size)  # first 32 days only, for this demo
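The chunking `{'time': 4, 'ncells': -1}` means every chunk holds 4 time steps and, because `-1` requests the full dimension, every cell of the grid. Here is a quick check of chunk memory, assuming a 4-byte integer field such as the tracked blob-ID array in the output further down (shape `(32, 14886338)`, int32). The helper `chunk_nbytes` is hypothetical.

```python
def chunk_nbytes(n_time, n_cells, itemsize):
    """Bytes held by a dense (n_time, n_cells) block of fixed-size elements."""
    return n_time * n_cells * itemsize


N_CELLS = 14_886_338  # grid cells, as reported in this notebook's output
INT32_BYTES = 4       # e.g. a (time, ncells) field of integer blob IDs

per_chunk = chunk_nbytes(4, N_CELLS, INT32_BYTES)  # chunk_size = {'time': 4, 'ncells': -1}
full = chunk_nbytes(32, N_CELLS, INT32_BYTES)      # the 32 time steps selected above

print(f"{per_chunk / 2**20:.2f} MiB per chunk; {full / 2**30:.2f} GiB in total")
# 227.15 MiB per chunk; 1.77 GiB in total
```

At roughly 227 MiB per chunk, several chunks fit within the 7.86 GB reported per worker. A larger time chunk would mean fewer, but larger, tasks.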

In [4]:
# Tracking Parameters

drop_area_quartile = 0.8  # Remove the smallest 80% of the identified blobs
hole_filling_radius = 32  # Fill small holes with radius < 32 elements, i.e. ~100 km
time_gap_fill = 2         # Allow gaps of 2 days and still continue the blob tracking with the same ID
allow_merging = True      # Allow blobs to split/merge. Keeps track of merge events & unique IDs.
overlap_threshold = 0.5   # Overlap threshold for merging blobs. If overlap < threshold, blobs keep independent IDs.
nn_partitioning = True    # Use new NN method to partition merged children blobs. If False, reverts to old method of Di Sun et al. 2023.
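The effect of `drop_area_quartile` (step 3) can be illustrated with a quantile cut on blob areas. This is a minimal sketch with two assumptions: the cutoff is the plain NumPy quantile of the per-blob areas, and blobs exactly at the cutoff are dropped. `spot_the_blOb` may differ on both points, and `filter_small_blobs` is a hypothetical helper. The sketch also computes the fraction of total area retained, analogous to the "Accepted Area Fraction" in the tracking statistics below.

```python
import numpy as np


def filter_small_blobs(blob_areas, area_filter_quartile):
    """Keep blobs whose area exceeds the `area_filter_quartile` quantile of all blob areas.

    Returns the keep-mask, the area cutoff, and the fraction of total area retained.
    """
    areas = np.asarray(blob_areas, dtype=float)
    cutoff = np.quantile(areas, area_filter_quartile)  # linear interpolation (NumPy default)
    keep = areas > cutoff
    accepted_fraction = areas[keep].sum() / areas.sum()
    return keep, cutoff, accepted_fraction


# Toy size distribution (in cells): many small blobs and one very large one
areas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
keep, cutoff, accepted = filter_small_blobs(areas, 0.8)
print(keep.sum(), cutoff, round(accepted, 3))
```

The statistics below show the same pattern: dropping 80% of the blobs still keeps about 80% of the area. In the toy example, the 2 surviving blobs out of 10 hold about 75% of the total area.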

In [5]:
# SpOt & Track the Blobs & Merger Events

tracker = blob.Spotter(ds.extreme_events, ds.mask, R_fill=hole_filling_radius, T_fill=time_gap_fill, area_filter_quartile=drop_area_quartile,
                       allow_merging=allow_merging, overlap_threshold=overlap_threshold, nn_partitioning=nn_partitioning,
                       xdim='ncells',                 # Need to tell spot_the_blOb the new Unstructured dimension
                       unstructured_grid=True,        # Use Unstructured Grid
                       neighbours=ds.neighbours,      # Connectivity array for the Unstructured Grid Cells
                       cell_areas=ds.cell_areas,      # Cell areas for each Unstructured Grid Cell
                       debug=0,                       # Choose Debugging Level (max=2)
                       verbosity=3)                   # Choose Verbosity Level (0=None, 1=Basic, 2=Timing)

blobs = tracker.run(return_merges=False)

blobs

Finished Constructing the Sparse Dilation Matrix.
Finished Filling Spatial Holes
Finished Filling Spatio-temporal Holes.
Finished Filtering Small Blobs.
Finished Blob Identification.
Finished Making Blobs Globally Unique.
Finished Calculating Blob Properties.
Finished Finding Overlapping Blobs.
Processing Parallel Iteration 1 with 6 Merging Blobs...




  Finished Batch Processing Step.
  Finished Consolidation Step 1: Temporary ID Mapping
  Finished Consolidation Step 2: Data Field Update.
  Finished Consolidation Step 3: Merge List Dictionary Consolidation.
Processing Parallel Iteration 2 with 4 Merging Blobs...




  Finished Batch Processing Step.
  Finished Consolidation Step 1: Temporary ID Mapping
  Finished Consolidation Step 2: Data Field Update.
  Finished Consolidation Step 3: Merge List Dictionary Consolidation.
Finished Splitting and Merging Blobs.
Finished Clustering and Renaming Blobs.
Finished Tracking All Blobs ! 


Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.7365573740794455
   Total Object Area IDed (cells): 32497142
   Number of Initial Pre-Filtered Blobs: 1653
   Area Cutoff Threshold (cells): 21838
   Accepted Area Fraction: 0.8044298480155578
   Total Blobs Tracked: 34
   Total Merging Events Recorded: 21


Dask arrays backing the returned `blobs` dataset:

| Shape | Chunk shape | Chunks | Size (per chunk) | Data type |
|---|---|---|---|---|
| (14886338,) | (14886338,) | 1 | 113.57 MiB (113.57 MiB) | float64 |
| (14886338,) | (14886338,) | 1 | 113.57 MiB (113.57 MiB) | float64 |
| (32, 14886338) | (4, 14886338) | 8 | 1.77 GiB (227.15 MiB) | int32 |
| (32, 34) | (4, 34) | 8 | 4.25 kiB (544 B) | int32 |
| (32, 34) | (4, 34) | 8 | 4.25 kiB (544 B) | float32 |
| (2, 32, 34) | (2, 4, 34) | 8 | 8.50 kiB (1.06 kiB) | float32 |
| (32, 34) | (4, 34) | 8 | 1.06 kiB (136 B) | bool |
| (34,) | (34,) | 1 | 272 B (272 B) | datetime64[ns] |
| (34,) | (34,) | 1 | 272 B (272 B) | datetime64[ns] |
| (32, 34, 10) | (4, 34, 10) | 8 | 42.50 kiB (5.31 kiB) | int32 |
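On the unstructured grid, the run log above reports building a sparse dilation matrix from the `neighbours` connectivity before filling spatial holes (step 1). What follows is a minimal sketch of that idea, not `spot_the_blOb`'s code. A sparse matrix with ones at each cell and its neighbours turns one dilation step into a sparse matrix-vector product. Hole filling then becomes a morphological closing: dilate `R_fill` times, then erode `R_fill` times. The sketch assumes `neighbours` is an `(ncells, k)` zero-based index array padded with `-1`; the real `ds.neighbours` layout may differ. Land/mask boundaries are ignored, and all function names are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity


def dilation_matrix(neighbours):
    """Sparse (ncells, ncells) matrix with ones at each cell and its neighbours.

    Assumes `neighbours` is an (ncells, k) array of zero-based cell indices,
    padded with -1 where a cell has fewer than k neighbours.
    """
    ncells, k = neighbours.shape
    rows = np.repeat(np.arange(ncells), k)
    cols = neighbours.ravel()
    valid = cols >= 0  # drop padding entries
    adjacency = csr_matrix(
        (np.ones(valid.sum(), dtype=np.int32), (rows[valid], cols[valid])),
        shape=(ncells, ncells),
    )
    return (adjacency + identity(ncells, dtype=np.int32, format="csr")).tocsr()


def dilate(mask, M, radius):
    """Grow a boolean cell mask by `radius` neighbour hops."""
    out = mask.astype(np.int32)
    for _ in range(radius):
        out = (M @ out > 0).astype(np.int32)
    return out.astype(bool)


def fill_holes(mask, M, radius):
    """Morphological closing: dilate, then erode (erosion = dilation of the complement)."""
    dilated = dilate(mask, M, radius)
    return ~dilate(~dilated, M, radius)


# Toy unstructured "grid": 12 cells on a ring, 2 real neighbours + 1 padding slot each
n = 12
idx = np.arange(n)
neighbours = np.stack([(idx - 1) % n, (idx + 1) % n, np.full(n, -1)], axis=1)

mask = np.zeros(n, dtype=bool)
mask[[0, 1, 2, 4, 5, 6]] = True  # one blob with a 1-cell hole at cell 3

M = dilation_matrix(neighbours)
filled = fill_holes(mask, M, radius=1)
print(np.flatnonzero(filled))
```

Each step only touches each cell's direct neighbours, so the cost grows with the number of non-zeros in the matrix times `R_fill`, regardless of the grid's geometry.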


In [7]:
blobs = blobs.compute() 

In [8]:
# Save Tracked Blobs to `zarr` for more efficient parallel I/O

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'MHWs_tracked_unstruct.zarr'
blobs.to_zarr(file_name, mode='w')

<xarray.backends.zarr.ZarrStore at 0x7ff6dc55dbc0>