# Identify & Track Marine Heatwaves on _Unstructured Grid_ using `spot_the_blOb`

## Processing Steps:
1. Fill spatial holes in the binary data, using `dask_image.ndmorph` -- up to `R_fill` cells in radius.
2. Fill gaps in time -- permitting up to `T_fill` missing time slices, while keeping the same blob ID.
3. Filter out small objects -- remove objects whose area falls below the `area_filter_quartile` quantile of the object size distribution.
4. Identify objects in the binary data, using `dask_image.ndmeasure`.
5. Connect objects across time, applying the following criteria for splitting, merging, and persistence:
    - Connected Blobs must overlap by at least a fraction `overlap_threshold` of the smaller blob's area.
    - Merged Blobs retain their original IDs; the child blob is partitioned by assigning each of its cells to the parent of its _nearest-neighbour_ cell.
6. Cluster and reduce the final object ID graph using `scipy.sparse.csgraph.connected_components`.
7. Map the tracked objects into ID-time space for convenient analysis.
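Steps 5 and 6 can be illustrated with a minimal sketch, assuming the overlapping blob pairs between consecutive time steps, and the number of cells they share, have already been counted. The helper `link_blobs` and its inputs are hypothetical, not `spot_the_blOb`'s API: it applies the `overlap_threshold` criterion relative to the smaller blob, then clusters the surviving links with `scipy.sparse.csgraph.connected_components`. It deliberately leaves out the split/merge bookkeeping (`allow_merging`, `nn_partitioning`), so every accepted overlap simply places both blobs in the same track.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def link_blobs(pairs, shared_cells, blob_areas, overlap_threshold=0.5):
    """Cluster per-time-step blob labels into tracks (hypothetical helper).

    pairs        : (n, 2) ints; each row (a, b) is a blob at time t and a blob at
                   time t+1 that share cells. Labels are unique across all times.
    shared_cells : (n,) number of cells shared by each pair.
    blob_areas   : (n_labels,) area (in cells) of every blob label.
    """
    pairs = np.asarray(pairs, dtype=np.int64).reshape(-1, 2)
    shared = np.asarray(shared_cells, dtype=float)
    areas = np.asarray(blob_areas, dtype=float)

    # Overlap is measured as a fraction of the *smaller* blob of each pair
    smaller = np.minimum(areas[pairs[:, 0]], areas[pairs[:, 1]])
    accepted = pairs[shared / smaller >= overlap_threshold]

    # One undirected edge per accepted overlap; unlinked blobs remain isolated nodes
    n = areas.size
    graph = coo_matrix(
        (np.ones(len(accepted)), (accepted[:, 0], accepted[:, 1])), shape=(n, n)
    )
    n_tracks, track_of_label = connected_components(graph, directed=False)
    return n_tracks, track_of_label


# Toy example: labels 0-2 exist at time t, labels 3-4 at time t+1
areas = [10, 8, 4, 20, 5]
pairs = [(0, 3), (1, 3), (2, 4)]
shared = [6, 2, 3]  # 6/10 and 3/4 pass the 0.5 threshold; 2/8 does not
n_tracks, track_of_label = link_blobs(pairs, shared, areas, overlap_threshold=0.5)
print(n_tracks, track_of_label)
```

Label 1 touches label 3 but only by a quarter of its own area, so it starts a separate track. Working on a sparse graph keeps the clustering cost proportional to the number of overlaps rather than the number of grid cells.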

N.B.: Exploits parallelised `dask` operations with optimised chunking using `flox` for memory efficiency and speed \
N.N.B.: Running this example on 40 years of _daily_ outputs at 5 km resolution on an Unstructured Grid (15 million cells) with 32 cores takes:
- Full Split/Merge Thresholding & Merge Tracking:  ~40 minutes

In [1]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import spot_the_blOb as blob
import spot_the_blOb.helper as hpc

In [2]:
# Start Dask Cluster
client = hpc.StartLocalCluster(n_workers=32, n_threads=2)

Memory per Worker: 7.86 GB
Hostname is  l10072
Forward Port = l10072:8787
Dashboard Link: localhost:8787/status


In [3]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary_unstruct.zarr'
chunk_size = {'time': 4, 'ncells': -1}
# Subset to the first 128 days and rechunk: 4 time steps per chunk, all cells in a single chunk
ds = xr.open_zarr(str(file_name), chunks={}).isel(time=slice(0,128)).chunk(chunk_size)
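The choice `chunk_size = {'time': 4, 'ncells': -1}` keeps whole spatial fields in memory while splitting along time. A quick back-of-the-envelope check of the resulting chunk size, assuming an `int32` field (the dtype reported further down for the tracked `(time, ncells)` output; smaller dtypes scale down proportionally):

```python
import numpy as np

n_cells = 14_886_338   # size of the `ncells` dimension (from the array shapes shown for `blobs`)
n_time = 128           # isel(time=slice(0, 128))
chunk_time = 4         # chunk_size = {'time': 4, 'ncells': -1}

itemsize = np.dtype(np.int32).itemsize   # 4 bytes per value of an int32 field
n_chunks = -(-n_time // chunk_time)      # ceiling division -> number of time chunks
chunk_mib = chunk_time * n_cells * itemsize / 2**20
total_gib = n_time * n_cells * itemsize / 2**30

print(f"{n_chunks} chunks of {chunk_mib:.2f} MiB, {total_gib:.2f} GiB in total")
# 32 chunks of 227.15 MiB, 7.10 GiB in total
```

Each chunk is only a few percent of the ~7.86 GB available per worker, which leaves headroom for the intermediate arrays created during filling, labelling, and tracking.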

In [4]:
# Tracking Parameters

drop_area_quartile = 0.8  # Remove the smallest 80% of the identified blobs
hole_filling_radius = 32  # Fill small holes with radius < 32 elements, i.e. ~100 km
time_gap_fill = 2         # Allow gaps of 2 days and still continue the blob tracking with the same ID
allow_merging = True      # Allow blobs to split/merge. Keeps track of merge events & unique IDs.
overlap_threshold = 0.5   # Overlap threshold for merging blobs. If overlap < threshold, blobs keep independent IDs.
nn_partitioning = True    # Use the new nearest-neighbour (NN) method to partition merged child blobs. If False, reverts to the older method of Di Sun et al. (2023).
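Despite its name, `drop_area_quartile = 0.8` acts as a quantile: objects at or below the 80th percentile of the object-area distribution are removed. A minimal NumPy sketch of such a filter follows; `area_quantile_filter` is a hypothetical helper, and the exact quantile interpolation and comparison used by `spot_the_blOb` are assumptions here (NumPy's default linear interpolation and a strict `>`).

```python
import numpy as np


def area_quantile_filter(labels, cell_areas, drop_area_quartile=0.8):
    """Drop objects at or below the `drop_area_quartile` quantile of object area.

    labels     : int array of object IDs per cell, 0 = background, IDs 1..N (contiguous).
    cell_areas : area of each cell, broadcastable to `labels` (use ones to count cells).
    """
    weights = np.broadcast_to(cell_areas, labels.shape).ravel()
    # Total area of every object ID; bin 0 is the background and is discarded
    object_area = np.bincount(labels.ravel(), weights=weights)[1:]
    cutoff = np.quantile(object_area, drop_area_quartile)
    keep_ids = np.flatnonzero(object_area > cutoff) + 1
    # Cells belonging to rejected objects are reset to background
    filtered = np.where(np.isin(labels, keep_ids), labels, 0)
    return filtered, cutoff


# Ten objects occupying 1, 2, ..., 10 cells on a 1-D toy grid
labels = np.repeat(np.arange(1, 11), np.arange(1, 11))
filtered, cutoff = area_quantile_filter(labels, np.ones(labels.size), 0.8)
print(cutoff, np.unique(filtered[filtered > 0]))
```

Only the two largest objects (9 and 10 cells) survive, i.e. the smallest 80% of objects are dropped, while their share of the total area is much larger than 20%.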

In [5]:
# SpOt & Track the Blobs & Merger Events

tracker = blob.Spotter(ds.extreme_events, ds.mask,
                       R_fill=hole_filling_radius,
                       T_fill=time_gap_fill,
                       area_filter_quartile=drop_area_quartile,
                       allow_merging=allow_merging,
                       overlap_threshold=overlap_threshold,
                       nn_partitioning=nn_partitioning,
                       xdim='ncells',                 # Need to tell spot_the_blOb the new Unstructured dimension
                       unstructured_grid=True,        # Use Unstructured Grid
                       neighbours=ds.neighbours,      # Connectivity array for the Unstructured Grid Cells
                       cell_areas=ds.cell_areas,      # Cell areas for each Unstructured Grid Cell
                       debug=0)                       # Choose Debugging Level (max=2)

blobs = tracker.run(return_merges=False)

blobs

Finished Constructing the Sparse Dilation Matrix.
Finished Filling Spatio-temporal Holes.
Finished Filtering Small Blobs.
Finished Blob Identification.
Finished Calculating Blob Properties.




Finished Finding Overlapping Blobs.
Processing Parallel Iteration 1 with 27 Merging Blobs...
Processing Parallel Iteration 2 with 41 Merging Blobs...
Processing Parallel Iteration 3 with 24 Merging Blobs...
Processing Parallel Iteration 4 with 12 Merging Blobs...
Processing Parallel Iteration 5 with 11 Merging Blobs...
Processing Parallel Iteration 6 with 7 Merging Blobs...
Processing Parallel Iteration 7 with 3 Merging Blobs...




Finished Splitting and Merging Blobs.
Finished Clustering and Renaming Blobs.
Finished Tracking All Blobs ! 


Tracking Statistics:
   Binary Hobday to Processed Area Fraction: 0.7908875002768747
   Total Object Area IDed (cells): 114869236
   Number of Initial Pre-Filtered Blobs: 7137
   Area Cutoff Threshold (cells): 19067
   Accepted Area Fraction: 0.785646228203346
   Total Blobs Tracked: 114
   Total Merging Events Recorded: 198
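The first log line shows that, on the unstructured grid, hole filling starts from a sparse dilation matrix rather than a regular `ndmorph` stencil. Below is a minimal sketch of how binary dilation and closing can be written with a sparse cell-adjacency matrix. It assumes a `neighbours` array of shape `(ncells, k)` with `-1` for missing neighbours; the real layout of `ds.neighbours` and the way `spot_the_blOb` builds its matrix are not shown here, and all helper names are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity


def step_matrix_from_neighbours(neighbours):
    """Sparse (ncells x ncells) matrix linking each cell to itself and its neighbours.

    neighbours : (ncells, k) int array of neighbour indices; -1 marks a missing neighbour.
    """
    ncells, k = neighbours.shape
    rows = np.repeat(np.arange(ncells), k)
    cols = neighbours.ravel()
    valid = cols >= 0
    adjacency = csr_matrix(
        (np.ones(valid.sum()), (rows[valid], cols[valid])), shape=(ncells, ncells)
    )
    # Adding the identity keeps already-set cells set after each product
    return (adjacency + identity(ncells, format="csr")).tocsr()


def dilate(field, step, radius):
    """Grow a boolean field by `radius` neighbour steps (one sparse mat-vec per step)."""
    out = field.astype(float)
    for _ in range(radius):
        out = (step @ out > 0).astype(float)
    return out > 0


def fill_holes(field, step, radius):
    """Morphological closing: dilate, then erode (erosion = NOT dilate(NOT field))."""
    grown = dilate(field, step, radius)
    return ~dilate(~grown, step, radius)


# Toy 'grid': a chain of 7 cells, each linked to its left and right neighbour
neighbours = np.array([[i - 1, i + 1 if i < 6 else -1] for i in range(7)])
step = step_matrix_from_neighbours(neighbours)
field = np.array([1, 1, 0, 1, 1, 0, 0], dtype=bool)
print(fill_holes(field, step, radius=1).astype(int))  # [1 1 1 1 1 0 0]
```

The one-cell gap is closed while the empty cells at the end of the chain stay empty. With `R_fill = 32` the loop performs 32 sparse products per direction; precomputing a power of the step matrix is an alternative that trades memory for fewer passes.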


Arrays in the returned `blobs` dataset (all lazily evaluated with `dask`):

| Shape | Chunk shape | Chunks | Data type | Total size | Chunk size |
|---|---|---|---|---|---|
| (14886338,) | (14886338,) | 1 | float64 | 113.57 MiB | 113.57 MiB |
| (14886338,) | (14886338,) | 1 | float64 | 113.57 MiB | 113.57 MiB |
| (128, 14886338) | (4, 14886338) | 32 | int32 | 7.10 GiB | 227.15 MiB |
| (128, 114) | (13, 114) | 16 | float32 | 57.00 kiB | 5.79 kiB |
| (2, 128, 114) | (1, 13, 114) | 32 | float32 | 114.00 kiB | 5.79 kiB |
| (128, 114, 4) | (4, 114, 4) | 32 | int32 | 228.00 kiB | 7.12 kiB |
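Several of the arrays above have a `(128, 114)` time-by-ID layout, matching the 114 tracked blobs: this is the ID-time space of step 7. A minimal NumPy sketch of how a per-ID, per-time quantity such as area can be derived from a `(time, ncells)` ID field with `np.bincount`; `id_time_area` is a hypothetical helper, and `spot_the_blOb`'s own (dask-parallel) reduction is not shown here.

```python
import numpy as np


def id_time_area(id_field, cell_areas, n_ids):
    """Area of every tracked ID at every time step.

    id_field   : (time, ncells) int array, 0 = no object, tracked IDs 1..n_ids.
    cell_areas : (ncells,) area of each cell.
    Returns a (time, n_ids) array; column j holds the area time series of ID j + 1.
    """
    n_time = id_field.shape[0]
    area = np.zeros((n_time, n_ids))
    for t in range(n_time):
        counts = np.bincount(id_field[t], weights=cell_areas, minlength=n_ids + 1)
        area[t] = counts[1 : n_ids + 1]  # drop the background bin 0
    return area


id_field = np.array([[1, 1, 0, 2],
                     [1, 0, 2, 2],
                     [0, 0, 2, 0]])
cell_areas = np.array([1.0, 2.0, 3.0, 4.0])
area = id_time_area(id_field, cell_areas, n_ids=2)
lifetime = (area > 0).sum(axis=0)  # number of time steps each ID is present
print(area)
print(lifetime)  # [2 3]
```

Once in this compact form (a few hundred kiB instead of several GiB), summary statistics such as duration, peak area, or onset time reduce to simple operations along the time axis.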


In [15]:
# blobs = blobs.compute() 

In [16]:
# Save Tracked Blobs to `zarr` for more efficient parallel I/O

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'MHWs_tracked_unstruct.zarr'
blobs.to_zarr(file_name, mode='w')

<xarray.backends.zarr.ZarrStore at 0x7ff6f078aec0>