# Distance-based metric for cluster boundary detection

This notebook implements a set of metrics that evaluate how precisely clustering detects boundaries, compared with manually drawn areas of expected morphotopes. The boundary of each target morphotope is discretised into points, and for each point the notebook measures the distance to the nearest boundary between clusters. It then reports summary statistics of these distances for each morphotope.
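To make the metric concrete, here is a minimal pure-Python sketch on hypothetical geometry (not the notebook's data). A square morphotope is discretised every 10 units, and its points are compared with a cluster boundary that coincides with it except on one side. The helper names `point_segment_distance`, `ring_segments` and `discretise_ring` are illustrative only; the notebook itself relies on geopandas and its spatial index.

```python
import math
import statistics


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # project p onto the segment and clamp the projection to its end points
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def ring_segments(ring):
    """Consecutive vertex pairs of a closed ring, including the closing edge."""
    return list(zip(ring, ring[1:] + ring[:1]))


def discretise_ring(ring, step):
    """Points along a closed ring, spaced no more than `step` apart."""
    points = []
    for (ax, ay), (bx, by) in ring_segments(ring):
        n = max(1, math.ceil(math.dist((ax, ay), (bx, by)) / step))
        points += [(ax + (bx - ax) * i / n, ay + (by - ay) * i / n) for i in range(n)]
    return points


# hypothetical data: a 100 x 100 morphotope and a cluster boundary that matches
# it except for the left side, which is drawn 10 units further to the left
morphotope = [(0, 0), (100, 0), (100, 100), (0, 100)]
cluster_boundary = [(-10, 0), (100, 0), (100, 100), (-10, 100)]

segments = ring_segments(cluster_boundary)
distances = [
    min(point_segment_distance(p, a, b) for a, b in segments)
    for p in discretise_ring(morphotope, step=10)
]
print(len(distances), round(statistics.mean(distances), 2), round(max(distances), 2))
# 40 2.25 10.0
```

Only the nine points inside the misplaced side have non-zero distances, so the mean (2.25) reflects how much of the boundary is off, while the maximum (10) reflects by how much.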

In [37]:
import geopandas as gpd
import shapely
import pandas as pd
import matplotlib.pyplot as plt

Load the tessellation and the cluster labels, and link them together via their shared index.

In [None]:
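tessellation = gpd.read_parquet("/data/uscuni-ulce/processed_data/tessellations/tessellation_69300.parquet")
clusters = pd.read_csv("/data/uscuni-ulce/processed_data/clusters/clusters_69300.csv", index_col=0)

# keep only cells with a non-negative index (negative indices presumably mark cells without a building);
# .copy() avoids pandas' SettingWithCopyWarning when the column is added below
tessellation = tessellation[tessellation.index > -1].copy()

# cluster labels are matched to cells by index
tessellation["cluster"] = clusters.loc[:, "0"]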
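The linking step relies on pandas index alignment: assigning a Series to a column matches rows by index label, not by position. A small sketch with hypothetical frames (`cells`, `labels`) shows the behaviour, including what happens when a cell has no label.

```python
import pandas as pd

# hypothetical cells; index -1 stands for a cell that should be dropped
cells = pd.DataFrame({"area": [5.0, 10.0, 20.0, 30.0]}, index=[-1, 0, 1, 2])
# hypothetical labels stored in column "0", in a different order and missing cell 1
labels = pd.DataFrame({"0": [8, 7]}, index=[2, 0])

cells = cells[cells.index > -1].copy()
cells["cluster"] = labels.loc[:, "0"]  # matched on index, not on row position
print(cells["cluster"].tolist())  # [7.0, nan, 8.0]
```

Unlabelled cells end up as NaN. As far as we know, `dissolve` drops NaN groups by default (its `dropna` argument), so such cells would not appear in any cluster polygon.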

Load the manually drawn expected morphotopes and reproject them to the CRS of the tessellation.

In [6]:
morphotopes = gpd.read_file('morphotopes.geojson').to_crs(tessellation.crs)

Dissolve the tessellation cells by cluster label to obtain one (multi)polygon per cluster.

In [7]:
%%time
cluster_boundaries = tessellation.dissolve("cluster")

CPU times: user 22.9 s, sys: 43.8 ms, total: 23 s
Wall time: 23 s
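`dissolve` groups rows by the given column and unions the geometries within each group. A toy sketch with hypothetical unit cells shows that adjacent cells of one cluster merge into a single polygon, so the edges between cells of the same cluster no longer appear in its boundary.

```python
import geopandas as gpd
from shapely.geometry import box

# hypothetical tessellation: four unit cells in a row, labelled with two clusters
cells = gpd.GeoDataFrame(
    {"cluster": [0, 0, 1, 1]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)
dissolved = cells.dissolve("cluster")  # one row per cluster, indexed by label
print(dissolved.index.tolist(), dissolved.geometry.area.tolist())  # [0, 1] [2.0, 2.0]
```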


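Floating-point imprecision can leave tiny gaps between neighbouring cells of the same cluster, which would show up as spurious internal boundaries. A tiny buffer (1e-6 CRS units) closes these gaps before the boundaries are extracted.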

In [8]:
boundaries = cluster_boundaries.buffer(1e-6).boundary
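The effect of the buffer can be seen on a hypothetical pair of cells separated by a gap far below any meaningful precision. Without the buffer the merged shape stays in two parts and the gap contributes extra boundary lines. This is a minimal shapely 2.x sketch in which `shapely.union_all` stands in for the dissolve step.

```python
import shapely
from shapely.geometry import box

# two cells of one cluster separated by a hypothetical 1e-9 gap (a floating-point artifact)
left = box(0, 0, 1, 1)
right = box(1 + 1e-9, 0, 2, 1)

merged = shapely.union_all([left, right])
print(merged.geom_type, merged.boundary.geom_type)  # MultiPolygon MultiLineString

# a buffer larger than half the gap closes it, leaving only the outer ring
closed = merged.buffer(1e-6)
print(closed.geom_type, closed.boundary.geom_type)  # Polygon LineString
```

The buffer also moves the outer boundary by 1e-6 units, which is negligible next to the 10 m discretisation used below.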

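Discretise the morphotope boundaries into points: `segmentize(10)` inserts vertices so that no segment is longer than 10 CRS units (metres, assuming a projected CRS), and every vertex becomes a point.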

In [9]:
coords = morphotopes.segmentize(10).get_coordinates(index_parts=True)
# one point per vertex; the index is (morphotope, vertex number)
morphotopes_points = coords.set_geometry(gpd.points_from_xy(coords.x, coords.y), crs=morphotopes.crs)
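`segmentize` only adds vertices: every original vertex is kept and each edge longer than the threshold is split into equal parts. This is a small shapely 2.x sketch on a hypothetical 20 x 20 square; to our understanding, `GeoSeries.segmentize` and `GeoSeries.get_coordinates` wrap these shapely functions.

```python
import shapely
from shapely.geometry import box

square = box(0, 0, 20, 20)  # hypothetical morphotope, 20 units per side
dense = shapely.segmentize(square, max_segment_length=10)

coords = shapely.get_coordinates(dense)
print(len(shapely.get_coordinates(square)), len(coords))  # 5 9
# the first and last coordinates coincide because the ring is closed
print((coords[0] == coords[-1]).all())  # True
```

Because each ring is closed, its starting vertex appears twice among the points and is counted twice in the summary statistics. With 10 m spacing this should have little effect.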

For each point, get the distance to the nearest cluster boundary, searching within 500 m.

In [16]:
%%time
_, dist = boundaries.sindex.nearest(morphotopes_points.geometry, return_distance=True, max_distance=500, return_all=False)
morphotopes_points["distance"] = dist

CPU times: user 10.5 s, sys: 0 ns, total: 10.5 s
Wall time: 10.5 s
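With `max_distance`, inputs that have no tree geometry within range are left out of the result rather than given a placeholder, so the returned distances can be shorter than the input. The sketch below uses shapely 2.x `STRtree.query_nearest` directly on hypothetical geometries. To our understanding, geopandas' `sindex.nearest` wraps it, with `return_all` corresponding to `all_matches`.

```python
import numpy as np
import shapely
from shapely.geometry import LineString, Point

# hypothetical cluster boundaries: two vertical lines 1000 units apart
boundaries = [LineString([(0, 0), (0, 100)]), LineString([(1000, 0), (1000, 100)])]
# hypothetical boundary points; the last one is farther than 500 from both lines
points = [Point(10, 50), Point(990, 50), Point(500, 700)]

tree = shapely.STRtree(boundaries)
idx, dist = tree.query_nearest(points, max_distance=500, return_distance=True, all_matches=False)

# idx[0] holds positions of the points that found a boundary; fill the rest with NaN
distance = np.full(len(points), np.nan)
distance[idx[0]] = dist
print(distance)  # [10. 10. nan]
```

Filling by position keeps each distance attached to its point. Out-of-range points then show up as NaN and are excluded from `count` and the other statistics reported by `describe`.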


Summarise the distances per morphotope (count, mean, standard deviation, minimum, quartiles and maximum) as indicators of boundary detection precision.

In [27]:
indicators = morphotopes_points.groupby(level=0)["distance"].describe().set_geometry(morphotopes.geometry)
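`describe` on a grouped Series returns one row per group with the count, mean, standard deviation, minimum, quartiles and maximum, ignoring missing values. This sketch uses hypothetical distances with the same two-level index as `morphotopes_points`.

```python
import numpy as np
import pandas as pd

# hypothetical distances indexed by (morphotope, point), as after get_coordinates(index_parts=True)
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)])
distance = pd.Series([0.0, 5.0, 10.0, 2.0, np.nan], index=index, name="distance")

summary = distance.groupby(level=0).describe()
print(summary.columns.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc[0, "mean"], summary.loc[1, "count"])  # 5.0 1.0
```

A group with a single remaining value yields a NaN standard deviation, so morphotopes with very few points in range will have no `std`.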

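Explore the result, first on an interactive map (cluster polygons, morphotopes coloured by mean distance, cluster boundaries in red) and then as static maps of the mean and standard deviation.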

In [None]:
m = cluster_boundaries.explore(prefer_canvas=True, opacity=.5, tiles="cartodb positron")
indicators.explore("mean", m=m)
boundaries.explore(m=m, color="red")

In [None]:
f, axs = plt.subplots(1, 2, figsize=(40, 20))
indicators.plot("mean", ax=axs[0], legend=True)
indicators.plot("std", ax=axs[1], legend=True)

The whole procedure wrapped in a single function:

In [None]:
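def boundary_distance_metric(tessellation, clusters, morphotopes, segmentation_distance=10, max_distance=500):
    """Summarise distances from discretised morphotope boundaries to the nearest cluster boundary."""
    # keep cells with a non-negative index and attach cluster labels (matched by index)
    tessellation = tessellation[tessellation.index > -1].copy()
    tessellation["cluster"] = clusters.loc[:, "0"]
    # one polygon per cluster; a tiny buffer avoids floating-point artifacts
    cluster_boundaries = tessellation.dissolve("cluster")
    boundaries = cluster_boundaries.buffer(1e-6).boundary
    # discretise the morphotope boundaries into points
    coords = morphotopes.segmentize(segmentation_distance).get_coordinates(index_parts=True)
    morphotopes_points = coords.set_geometry(gpd.points_from_xy(coords.x, coords.y), crs=morphotopes.crs)
    # nearest boundary per point; points farther than max_distance are left as NaN
    idx, dist = boundaries.sindex.nearest(morphotopes_points.geometry, return_distance=True, max_distance=max_distance, return_all=False)
    morphotopes_points["distance"] = pd.Series(dist, index=morphotopes_points.index[idx[0]])
    return morphotopes_points.groupby(level=0)["distance"].describe().set_geometry(morphotopes.geometry)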