# Identify & Track Marine Heatwaves on an _Unstructured Grid_ using `spot_the_blOb`

## Processing Steps:
1. Fill spatial holes in the binary data, using `dask_image.ndmorph` -- up to `R_fill` cells in radius.
2. Fill gaps in time -- permitting up to `T_fill` missing time slices, while keeping the same blob ID.
3. Filter out small objects -- those whose area falls in the bottom `area_filter_quartile` fraction of the object size distribution.
4. Identify objects in the binary data, using `dask_image.ndmeasure`.
5. Connect objects across time, applying the following criteria for splitting, merging, and persistence:
    - Connected Blobs must overlap by at least fraction `overlap_threshold` of the smaller blob.
    - Merged Blobs retain their original IDs; the child blob is partitioned according to the parent of each cell's _nearest-neighbour_ cell.
6. Cluster and reduce the final object ID graph using `scipy.sparse.csgraph.connected_components`.
7. Map the tracked objects into ID-time space for convenient analysis.
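
As an illustration of step 6, the sketch below reduces a small graph of ID links with `scipy.sparse.csgraph.connected_components`. The IDs and links are made up for the example, and this is not necessarily how `spot_the_blOb` builds its graph internally: it only shows how pairwise links between per-timestep IDs collapse into a single label per tracked event.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical example: 6 per-timestep blob IDs (0..5) and the pairs that
# were judged to belong to the same event (e.g. by the overlap criterion).
n_ids = 6
links = np.array([[0, 1], [1, 2], [3, 4]])

# Sparse adjacency matrix with one entry per link. Storing each link once is
# enough because the graph is searched as undirected below.
adjacency = coo_matrix(
    (np.ones(len(links)), (links[:, 0], links[:, 1])),
    shape=(n_ids, n_ids),
)

# Every connected component becomes one tracked event.
n_events, event_of_id = connected_components(adjacency, directed=False)

print(n_events)      # number of distinct events
print(event_of_id)   # event label assigned to each original ID
```

Here IDs 0, 1 and 2 end up in one event, IDs 3 and 4 in a second, and the unlinked ID 5 forms its own event.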

N.B.: Exploits parallelised `dask` operations, with optimised chunking using `flox`, for memory efficiency and speed. \
N.N.B.: This example, using 40 years of _daily_ outputs at 5km resolution on an Unstructured Grid (15 million cells) with 32 cores, takes:
- Full Split/Merge Thresholding & Merge Tracking:  ~40 minutes

In [1]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import spot_the_blOb as blob
import spot_the_blOb.helper as hpc

In [2]:
# Start Dask Cluster
#  N.B.: Need ~ 8 GB per worker (for 5km data // 15 million points)
client = hpc.StartLocalCluster(n_workers=64, n_threads=1)

Memory per Worker: 7.87 GB


Perhaps you already have a cluster running?
Hosting the HTTP server on port 35139 instead


Hostname is  l40287
Forward Port = l40287:35139
Dashboard Link: localhost:35139/status


In [3]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary_unstruct.zarr'
chunk_size = {'time': 4, 'ncells': -1}
ds = xr.open_zarr(str(file_name), chunks={}).isel(time=slice(0, 256)).chunk(chunk_size)

In [4]:
# Tracking Parameters

drop_area_quartile = 0.8  # Remove the smallest 80% of the identified blobs
hole_filling_radius = 32  # Fill small holes with radius < 32 elements, i.e. ~100 km
time_gap_fill = 4         # Allow gaps of 4 days and still continue the blob tracking with the same ID
allow_merging = True      # Allow blobs to split/merge. Keeps track of merge events & unique IDs.
overlap_threshold = 0.5   # Overlap threshold for merging blobs. If overlap < threshold, blobs keep independent IDs.
nn_partitioning = True    # Use new NN method to partition merged children blobs. If False, reverts to old method of Di Sun et al. 2023.
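
To make `drop_area_quartile` and `overlap_threshold` more concrete, here is a minimal NumPy sketch of the two criteria. The arrays, the helper `overlap_fraction`, and the choice of `>=` at the thresholds are assumptions made for illustration; the library's own implementation may differ in these details.

```python
import numpy as np

# Hypothetical cell areas (km^2) for 8 cells of an unstructured grid.
cell_areas = np.array([25.0, 25.0, 30.0, 20.0, 25.0, 25.0, 30.0, 20.0])


def overlap_fraction(cells_a, cells_b, cell_areas):
    """Shared area divided by the area of the smaller of the two blobs."""
    shared = np.intersect1d(cells_a, cells_b)
    area_a = cell_areas[cells_a].sum()
    area_b = cell_areas[cells_b].sum()
    return cell_areas[shared].sum() / min(area_a, area_b)


# Overlap criterion: two blobs at consecutive times, given as cell indices.
blob_prev = np.array([0, 1, 2, 3])
blob_next = np.array([2, 3, 4])
frac = overlap_fraction(blob_prev, blob_next, cell_areas)
linked = frac >= 0.5   # compare against overlap_threshold

# Area filter: keep only blobs at or above the 0.8 quantile of blob area.
blob_areas = np.array([10.0, 20.0, 30.0, 40.0, 1000.0])
threshold = np.quantile(blob_areas, 0.8)
keep = blob_areas >= threshold

print(frac, linked)
print(threshold, keep)
```

Normalising by the smaller blob means that a small blob fully contained in a large one counts as fully overlapping, regardless of how large the other blob is.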

In [None]:
# SpOt & Track the Blobs & Merger Events

tracker = blob.Spotter(ds.extreme_events, ds.mask, R_fill=hole_filling_radius, T_fill=time_gap_fill, area_filter_quartile=drop_area_quartile,
                       allow_merging=allow_merging, overlap_threshold=overlap_threshold, nn_partitioning=nn_partitioning, 
                       xdim='ncells',                 # Need to tell spot_the_blOb the new Unstructured dimension
                       unstructured_grid=True,        # Use Unstructured Grid
                       neighbours=ds.neighbours,      # Connectivity array for the Unstructured Grid Cells
                       cell_areas=ds.cell_areas,      # Cell areas for each Unstructured Grid Cell
                       verbosity=1)                   # Choose Verbosity Level (0=None, 1=Basic, 2=Advanced/Timing)

blobs = tracker.run(return_merges=False)

blobs

Finished Constructing the Sparse Dilation Matrix.
Finished Filling Spatial Holes
Finished Filling Spatio-temporal Holes.
Finished Filtering Small Blobs.
Finished Blob Identification.
Finished Making Blobs Globally Unique.
Finished Calculating Blob Properties.
Finished Finding Overlapping Blobs.
Processing Parallel Iteration 1 with 60 Merging Blobs...


Processing Parallel Iteration 2 with 44 Merging Blobs...
Processing Parallel Iteration 3 with 14 Merging Blobs...
Processing Parallel Iteration 4 with 7 Merging Blobs...
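
The merging iterations logged above are where merged blobs get partitioned. With `nn_partitioning=True`, each cell of a child blob takes the ID of the parent that owns its nearest-neighbour cell (step 5). The sketch below shows that idea with a brute-force distance search. The coordinates, the IDs, the helper `partition_child`, and the use of planar Euclidean distance are all assumptions for illustration, not the library's actual implementation.

```python
import numpy as np


def partition_child(child_xy, parent_xy, parent_ids):
    """Give each child cell the ID of its nearest parent cell."""
    # Pairwise squared distances, shape (n_child_cells, n_parent_cells).
    diff = child_xy[:, None, :] - parent_xy[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Index of the closest parent cell for every child cell.
    nearest = dist2.argmin(axis=1)
    return parent_ids[nearest]


# Hypothetical parent cells at time t: two blobs with IDs 7 and 12.
parent_xy = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
parent_ids = np.array([7, 7, 12, 12])

# Hypothetical merged child blob at time t+1, spanning both parents.
child_xy = np.array([[0.5, 1.0], [2.0, 1.0], [8.0, 1.0], [9.5, 1.0]])

child_ids = partition_child(child_xy, parent_xy, parent_ids)
print(child_ids)
```

On a global ocean grid a great-circle distance and a tree-based neighbour search would be the more natural choice; the brute-force version is used here only to keep the example short.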


In [None]:
blobs = blobs.compute() 

In [None]:
# Save Tracked Blobs to `zarr` for more efficient parallel I/O

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'MHWs_tracked_unstruct.zarr'
blobs.to_zarr(file_name, mode='w')