# Identify & Track Marine Heatwaves using `spot_the_blOb`

## Processing Steps:
1. Fill holes in the binary data using `dask_image.ndmorph`, up to `R_fill` cells in radius.
2. Filter out small objects, i.e. those whose area falls in the bottom `area_filter_quartile` of the object size distribution.
3. Identify objects in the binary data using `dask_image.ndmeasure`.
4. Manually connect objects across time, applying the following criteria for splitting, merging, and persistence:
    - Connected blobs must overlap by at least a fraction `overlap_threshold` of the smaller blob.
    - Merged blobs retain their original IDs, and the child blob is partitioned by assigning each of its cells to the parent of the _nearest-neighbour_ cell.
5. Cluster and reduce the final object ID graph using `scipy.sparse.csgraph.connected_components`.
6. Map the tracked objects into ID-time space for convenient analysis.
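
Steps 1 to 3 can be sketched on a single in-memory 2D time step. This is a minimal illustration only: it swaps in `scipy.ndimage` for `dask_image`, assumes that morphological closing with a disk-shaped element is an acceptable stand-in for the hole filling, and uses hypothetical helper names (`disk`, `fill_filter_label`) that are not part of `spot_the_blOb`.

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean disk-shaped structuring element with the given radius in cells."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2


def fill_filter_label(binary, radius, area_quantile):
    """Close holes, drop small objects, and label the rest (assumes radius >= 1)."""
    # Step 1: morphological closing; pad first so the domain edge is not eroded.
    padded = np.pad(binary, radius)
    closed = ndimage.binary_closing(padded, structure=disk(radius))
    closed = closed[radius:-radius, radius:-radius]

    # Provisional labels are needed to measure each object's area.
    labels, n_objects = ndimage.label(closed)
    if n_objects == 0:
        return labels, 0
    areas = np.bincount(labels.ravel(), minlength=n_objects + 1)[1:]

    # Step 2: keep objects at or above the chosen quantile of the areas.
    keep_ids = np.flatnonzero(areas >= np.quantile(areas, area_quantile)) + 1
    kept = np.isin(labels, keep_ids)

    # Step 3: relabel so that the surviving IDs are consecutive.
    return ndimage.label(kept)


# Toy field: a 5 x 5 object with a one-cell hole, a 2 x 2 object, and a single cell.
binary = np.zeros((14, 14), dtype=bool)
binary[1:6, 1:6] = True
binary[3, 3] = False
binary[9:11, 2:4] = True
binary[10, 10] = True

labelled, n_kept = fill_filter_label(binary, radius=1, area_quantile=0.5)
```

With `area_quantile=0.5` the threshold is the median object area, so the single-cell object is dropped while the two larger objects survive and the hole in the large object is filled.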

N.B.: Exploits parallelised `dask` operations, with chunking optimised using `flox`, for memory efficiency and speed. \
N.N.B.: This example, which uses 40 years of _daily_ outputs at 0.25° resolution on 32 cores, takes:
- Standard (i.e. Scannell et al., which involves no merge/split criteria or tracking): ~2 minutes
- Full Split/Merge Thresholding & Merger Tracking: ~1.5 hours

In [1]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import spot_the_blOb as blob
import spot_the_blOb.helper as hpc

In [2]:
# Start Dask Cluster
client = hpc.StartLocalCluster(n_workers=32, n_threads=2)

Memory per Worker: 15.74 GB
Hostname is  l40225
Forward Port = l40225:8787
Dashboard Link: localhost:8787/status


In [3]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary.zarr'
chunk_size = {'time': 25, 'lat': -1, 'lon': -1}
ds = xr.open_zarr(str(file_name), chunks=chunk_size)

In [4]:
# Extract Binary Features and Modify Mask

extreme_bin = ds.extreme_events  # Optionally subset in time, e.g. .isel(time=slice(0, 2000))
mask = ds.mask.where((ds.lat < 85) & (ds.lat > -90), other=False)  # Exclude cells at or north of 85°N and at the South Pole

In [5]:
# Tracking Parameters

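drop_area_quartile = 0.5  # Drop objects in the bottom 50% of the area distribution (passed as area_filter_quartile).
filling_radius = 8       # Fill holes up to 8 cells in radius (passed as R_fill).
allow_merging = True     # Allow blobs to split/merge. Keeps track of merge events & unique IDs.
overlap_threshold = 0.5  # Overlap threshold for merging blobs. If overlap < threshold, blobs keep independent IDs.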
nn_partitioning = True   # Use new NN method to partition merged children blobs. If False, reverts to old method of Di Sun et al. 2023...
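
To make `overlap_threshold` and `nn_partitioning` concrete, here is a minimal sketch of the two criteria on a toy grid. It is not the package's implementation: the helper names are hypothetical, distances are measured in grid-index space (ignoring spherical geometry and longitude periodicity), and `scipy.spatial.cKDTree` is used only as a convenient nearest-neighbour search.

```python
import numpy as np
from scipy.spatial import cKDTree


def overlap_fraction(blob_a, blob_b):
    """Shared area divided by the area of the smaller blob (boolean masks)."""
    shared = np.logical_and(blob_a, blob_b).sum()
    smaller = min(blob_a.sum(), blob_b.sum())
    return shared / smaller if smaller > 0 else 0.0


def partition_by_nearest_parent(parent_labels, child_mask):
    """Give each child cell the ID of the parent that owns the nearest parent cell."""
    parent_cells = np.argwhere(parent_labels > 0)
    parent_ids = parent_labels[parent_labels > 0]  # same row-major order as argwhere
    _, nearest = cKDTree(parent_cells).query(np.argwhere(child_mask))
    partitioned = np.zeros_like(parent_labels)
    partitioned[child_mask] = parent_ids[nearest]
    return partitioned


# Two parent blobs at time t and one merged child blob at time t + 1.
parents = np.zeros((3, 9), dtype=int)
parents[1, 0:4] = 1
parents[1, 6:9] = 2
child = np.zeros((3, 9), dtype=bool)
child[1, 1:8] = True

overlap_threshold = 0.5
fractions = {pid: overlap_fraction(parents == pid, child) for pid in (1, 2)}
is_linked = {pid: frac >= overlap_threshold for pid, frac in fractions.items()}

# Both parents pass the threshold, so the child is split between them.
partitioned = partition_by_nearest_parent(parents, child)
```

Here parent 1 overlaps the child by 3 of its 4 cells and parent 2 by 2 of its 3 cells, so both links pass the 0.5 threshold and the child row is divided where the nearest parent changes.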

In [None]:
# Spot the Blobs

tracker = blob.Spotter(extreme_bin, mask, R_fill=filling_radius, area_filter_quartile=drop_area_quartile, 
                       allow_merging=allow_merging, overlap_threshold=overlap_threshold, nn_partitioning=nn_partitioning)
blobs = tracker.run()
blobs

Finished filling holes.
Finished filtering small blobs.
Finished blob identification.
Finished calculating blob properties.
Finished finding overlapping blobs.


Processing splitting and merging in chunk 0 of 556
Processing splitting and merging in chunk 25 of 556
Missing newly created child_ids {151981} because parents have split/morphed in the meantime...
Processing splitting and merging in chunk 50 of 556
Missing newly created child_ids {153469} because parents have split/morphed in the meantime...
Processing splitting and merging in chunk 75 of 556
Missing newly created child_ids {154794} because parents have split/morphed in the meantime...
Processing splitting and merging in chunk 100 of 556
Processing splitting and merging in chunk 125 of 556
Missing newly created child_ids {157565} because parents have split/morphed in the meantime...
Processing splitting and merging in chunk 150 of 556
Processing splitting and merging in chunk 175 of 556
Missing newly created child_ids {160172} because parents have split/morphed in the meantime...
Processing splitting and merging in chunk 200 of 556
Processing splitting and merging in chunk 225 of 556
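
Once the split/merge pass has finished, step 5 of the processing list reduces the graph of linked object IDs to connected clusters. The sketch below shows that idea with `scipy.sparse.csgraph.connected_components` on a hypothetical list of linked ID pairs; the pair list, the ID count, and the variable names are illustrative assumptions, not output of `spot_the_blOb`.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical links between per-time-step object IDs that were found to overlap.
id_pairs = np.array([[0, 1], [1, 2], [3, 4]])
n_ids = 6  # ID 5 is never linked, so it forms a cluster of its own

# Sparse adjacency matrix with one entry per link.
rows, cols = id_pairs.T
adjacency = coo_matrix((np.ones(len(id_pairs)), (rows, cols)), shape=(n_ids, n_ids))

# Treat links as undirected: IDs in the same component share one cluster label.
n_events, event_of_id = connected_components(adjacency, directed=False)
```

IDs 0, 1, and 2 end up in one cluster, IDs 3 and 4 in a second, and ID 5 in a third, so `event_of_id` can serve as a lookup table from original IDs to clustered IDs.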


In [None]:
# Save Tracked Blobs

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'MHWs_tracked.nc'
blobs.to_netcdf(file_name, mode='w')