# Identify & Track Marine Heatwaves using `spot_the_blOb`

## Processing Steps:
1. Fill holes in the binary data, using `dask_image.ndmorph` -- up to `R_fill` cells in radius.
2. Filter out small objects, i.e. those whose area falls below the `area_filter_quartile` quantile of the object-area distribution.
3. Identify objects in the binary data, using `dask_image.ndmeasure`.
4. Manually connect objects across time, applying Sun et al. 2023 criteria:
    - Connected blobs must overlap by at least `overlap_threshold=50%` of the area of the smaller blob.
    - Merged blobs retain their original IDs; the merged blob is split among its parents based on parent centroid locality.
5. Cluster and reduce the final object ID graph using `scipy.sparse.csgraph.connected_components`.
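
The overlap criterion in step 4 can be illustrated with a minimal NumPy sketch. It assumes that area is measured by counting grid cells, and that two blobs at consecutive time steps are linked when their shared area is at least `overlap_threshold` times the area of the smaller blob. The helper `overlap_fraction` and the toy masks are hypothetical and are not part of `spot_the_blOb`.

```python
import numpy as np


def overlap_fraction(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Shared area divided by the area of the smaller of two boolean masks."""
    smaller = min(int(mask_a.sum()), int(mask_b.sum()))
    if smaller == 0:
        return 0.0
    shared = int(np.logical_and(mask_a, mask_b).sum())
    return shared / smaller


# Toy blobs at two consecutive time steps on a 6x6 grid (hypothetical data)
blob_t0 = np.zeros((6, 6), dtype=bool)
blob_t1 = np.zeros((6, 6), dtype=bool)
blob_t0[1:5, 1:5] = True  # 16 cells
blob_t1[3:5, 3:6] = True  # 6 cells, 4 of them shared with blob_t0

overlap_threshold = 0.5
frac = overlap_fraction(blob_t0, blob_t1)
connected = frac >= overlap_threshold  # link the two blobs across time?
print(f"overlap fraction = {frac:.3f}, connected = {connected}")
```

Normalising by the smaller blob means that a small blob lying mostly inside a large one still counts as connected, even though the shared area is a small fraction of the large blob.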

N.B.: Exploits parallelised `Dask` operations with optimised chunking (using `flox`) for memory efficiency and speed. \
N.N.B.: This example, using 40 years of daily outputs at 0.25° resolution, takes ~6 minutes on 128 total cores.

In [1]:
import xarray as xr
import dask
from getpass import getuser
from pathlib import Path

import spot_the_blOb as blob
import spot_the_blOb.helper as hpc

In [2]:
# Start Dask Cluster
client = hpc.StartLocalCluster(n_workers=32, n_threads=2)

Memory per Worker: 7.86 GB
Hostname is  l20134
Forward Port = l20134:8787
Dashboard Link: localhost:8787/status


In [3]:
# Load Pre-processed Data (cf. `01_preprocess_extremes.ipynb`)

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'extreme_events_binary.zarr'
chunk_size = {'time': 25, 'lat': -1, 'lon': -1}
ds = xr.open_zarr(str(file_name), chunks=chunk_size)

In [4]:
# Extract Binary Features and Modify Mask

extreme_bin = ds.extreme_events.isel(time=slice(0, 100))
mask = ds.mask.where((ds.lat<85) & (ds.lat>-90), other=False)

In [5]:
# Tracking Parameters

drop_area_quartile = 0.5
filling_radius = 8
allow_merging = True
centroid_partitioning = False


In [6]:
# Spot the Blobs

tracker = blob.Spotter(extreme_bin, mask, R_fill=filling_radius, area_filter_quartile=drop_area_quartile, 
                       allow_merging=allow_merging, centroid_partitioning=centroid_partitioning)
blobs = tracker.run()

blobs

  unique_ids_by_time = xr.apply_ufunc(
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.

Total Object Area: 7086628
Number of Initial Blobs: 1958
Area Cutoff Threshold: 936.5
Rejected Area Fraction: 0.0633728763524768
Total Blobs Tracked: 90
Total Merging Events: 161
Multi-Parent Merging Events: 41


| | Array | Chunk |
|---|---|---|
| Bytes | 396.06 MiB | 99.01 MiB |
| Shape | (100, 721, 1440) | (25, 721, 1440) |
| Dask graph | 4 chunks in 9 graph layers | 4 chunks in 9 graph layers |
| Data type | int32 numpy.ndarray | int32 numpy.ndarray |



In [None]:
data_bin_filled = tracker.fill_holes()
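
The internals of `fill_holes` are not shown here. As a rough illustration of what filling holes "up to `R_fill` cells in radius" can look like, the sketch below assumes the operation resembles a morphological closing with a disk-shaped structuring element. It uses `scipy.ndimage` as a single-machine stand-in for `dask_image.ndmorph`; the `disk` helper and the toy field are hypothetical.

```python
import numpy as np
from scipy import ndimage


def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element with the given radius in cells."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2


# Toy binary field: a 7x7 blob with a one-cell hole in the middle
field = np.zeros((11, 11), dtype=bool)
field[2:9, 2:9] = True
field[5, 5] = False

R_fill = 1  # illustrative radius only, not the notebook's value of 8
# Closing = dilation followed by erosion with the same structuring element
filled = ndimage.binary_closing(field, structure=disk(R_fill))

print("cells before:", int(field.sum()), "cells after:", int(filled.sum()))
```

In this sketch, holes wider than the structuring element survive the closing, which is why the radius acts as an upper bound on the size of the gaps that get filled.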

In [None]:
data_bin_filtered, area_threshold, blob_areas, N_blobs_unfiltered = tracker.filter_small_blobs(data_bin_filled)
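
The area filter can be sketched on a single 2D field as follows. This is an assumption-laden illustration rather than the code behind `filter_small_blobs`: it labels objects with `scipy.ndimage.label`, counts cells per object as its area, and drops every object whose area is below the chosen quantile of all object areas. All variable names and the toy field are hypothetical.

```python
import numpy as np
from scipy import ndimage

# Toy binary field with four separate objects (hypothetical data)
field = np.zeros((8, 8), dtype=bool)
field[0:4, 0:4] = True  # 16 cells
field[0, 6] = True      # 1 cell
field[5:7, 5:8] = True  # 6 cells
field[6, 0:2] = True    # 2 cells

# Label connected objects (default: 4-connectivity in 2D)
labels, n_blobs = ndimage.label(field)

# Area of each object in cells; index 0 of bincount is the background
areas = np.bincount(labels.ravel())[1:]

area_filter_quartile = 0.5
threshold = np.quantile(areas, area_filter_quartile)

# Keep only objects at or above the area threshold
keep_labels = np.flatnonzero(areas >= threshold) + 1
filtered = np.isin(labels, keep_labels)

rejected_fraction = areas[areas < threshold].sum() / areas.sum()
print(f"threshold = {threshold}, rejected area fraction = {rejected_fraction:.2f}")
```

The two printed quantities correspond in spirit to the "Area Cutoff Threshold" and "Rejected Area Fraction" lines in the tracker output above: half of the objects are discarded, but they account for only a small share of the total area.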

In [None]:
data_bin = data_bin_filtered

In [None]:
# Label blobs independently at each time step (no connectivity across time)
blob_id_field, _ = tracker.identify_blobs(data_bin, time_connectivity=False)

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Calculate Properties of each Blob
blob_props = tracker.calculate_blob_properties(blob_id_field, properties=['area', 'centroid'])


In [None]:

# Compile List of Overlapping Blob ID Pairs Across Time
overlap_blobs_list = tracker.find_overlapping_blobs(blob_id_field)  # List of overlapping blob pairs


In [None]:
# Resolve merging and splitting across time from the blob properties and overlap pairs
split_merged_blob_id_field_unique, merged_blobs_props, split_merged_blobs_list, merge_events = tracker.split_and_merge_blobs(blob_id_field, blob_props, overlap_blobs_list)


In [None]:
# Cluster Blobs List to Determine Globally Unique IDs & Update Blob ID Field
split_merged_blobs_ds = tracker.cluster_rename_blobs_and_props(split_merged_blob_id_field_unique, merged_blobs_props, split_merged_blobs_list)
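
How a list of overlapping ID pairs can be reduced to globally unique track IDs is sketched below with `scipy.sparse.csgraph.connected_components`. The pair list, the number of IDs, and the toy ID field are hypothetical, and the real `cluster_rename_blobs_and_props` may differ in its details.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical per-time-step blob IDs 0..5 and the pairs that overlap in time
pairs = np.array([[0, 1], [1, 2], [3, 4]])
n_ids = 6

# Sparse adjacency matrix with one entry per overlapping pair
weights = np.ones(len(pairs), dtype=np.int8)
graph = coo_matrix((weights, (pairs[:, 0], pairs[:, 1])), shape=(n_ids, n_ids))

# Treat the graph as undirected: each connected component is one tracked event
n_tracks, track_of = connected_components(graph, directed=False)

# Relabel a toy ID field by indexing the lookup table with the old IDs
old_id_field = np.array([[0, 1, 5], [3, 4, 2]])
new_id_field = track_of[old_id_field]

print("number of tracks:", n_tracks)
print(new_id_field)
```

IDs 0, 1 and 2 collapse into one track because they are chained by overlaps, IDs 3 and 4 form a second track, and ID 5 (no overlaps) remains a track of its own.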

In [None]:
split_merged_blobs_ds = xr.merge([split_merged_blobs_ds, merge_events])

# Add summary attributes 
split_merged_blobs_ds.attrs['total_merges'] = len(merge_events.merge_ID)
split_merged_blobs_ds.attrs['multi_parent_merges'] = (merge_events.merge_n_parents > 2).sum().item()

# Count Number of Blobs (This may have increased due to splitting)
N_blobs = split_merged_blobs_ds.ID_field.max().compute().data

In [None]:
split_merged_blobs_ds

In [None]:
blobs.attrs

In [None]:
# Save Tracked Blobs

file_name = Path('/scratch') / getuser()[0] / getuser() / 'mhws' / 'MHWs_tracked.nc'
blobs.to_netcdf(file_name, mode='w')