# Clustering of nodes

This notebook explores clustering nodes based on their immediate spatial context. The idea is that this could identify subtypes of issues that need to be solved. Whether it will actually work, no one knows. We also have no idea which variables should be measured.

In [40]:
import geopandas as gpd
import momepy as mm
from bokeh.io import output_notebook
from bokeh.plotting import show
from clustergram import Clustergram
from sklearn.preprocessing import scale

output_notebook()

In [3]:
streets = gpd.read_parquet("../data/1133/original/1133.parquet")
streets.shape

(86101, 5)

Remove nodes of degree 2 (so-called false nodes).
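A degree-2 node (a "false node") is an endpoint shared by exactly two segments, so it splits what is topologically one street into two pieces; `mm.remove_false_nodes` merges such pieces. The detection step can be sketched in plain Python by counting how many segment endpoints fall on each coordinate. The `false_nodes` helper below is hypothetical, not momepy's implementation, and it assumes that touching segments share exactly identical endpoint coordinates.

```python
from collections import Counter


def false_nodes(lines):
    """Return endpoints shared by exactly two segments, i.e. nodes of degree 2.

    `lines` is a list of coordinate sequences, e.g. `list(geom.coords)` for each
    shapely LineString in a GeoDataFrame (hypothetical input format).
    """
    degree = Counter()
    for coords in lines:
        # Only the first and last vertex of each segment is a network node.
        degree[tuple(coords[0])] += 1
        degree[tuple(coords[-1])] += 1
    return {node for node, d in degree.items() if d == 2}


lines = [
    [(0, 0), (1, 0)],
    [(1, 0), (2, 0)],  # (1, 0) joins only two segments: a false node
    [(2, 0), (3, 0)],
    [(2, 0), (2, 1)],  # (2, 0) joins three segments: a real junction
]
print(false_nodes(lines))  # {(1, 0)}
```

Merging the two segments that meet at each such node is what reduces the number of rows in the cell below.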

In [4]:
cleaned = mm.remove_false_nodes(streets)


In [7]:
cleaned.shape

(46952, 5)

That helped: the number of street segments dropped from 86,101 to 46,952. Now we will characterise only the nodes that are interesting in some way.

1. Use the networkx-based characterisation of nodes based on their ego graphs, as currently available in momepy. This takes a while (30 minutes on my machine).
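`mm.subgraph` takes each node, extracts its ego graph (the part of the network within `radius` topological steps of it, here 3) and computes a set of characters on it. A minimal sketch of the idea for a single character, the gamma connectivity index gamma = e / (3 * (v - 2)) with e edges and v nodes, assuming a simple undirected graph stored as an adjacency dict. `ego_nodes` and `gamma` are hypothetical helpers, not momepy's code, which works on networkx graphs and computes many more characters.

```python
import math
from collections import deque


def ego_nodes(adj, centre, radius):
    """Nodes reachable from `centre` within `radius` topological steps (BFS)."""
    seen = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if seen[node] == radius:
            continue  # do not expand beyond the radius
        for nb in adj[node]:
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return set(seen)


def gamma(adj, nodes):
    """Gamma index e / (3 * (v - 2)) of the subgraph induced by `nodes`.

    Undefined for fewer than three nodes; this sketch returns NaN there.
    """
    v = len(nodes)
    if v <= 2:
        return math.nan
    # Each undirected edge appears in the adjacency lists of both its ends.
    e = sum(1 for n in nodes for nb in adj[n] if nb in nodes) // 2
    return e / (3 * (v - 2))


# A square 0-1-2-3 with a dead end 3-4.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0, 4], 4: [3]}
print(ego_nodes(adj, 0, radius=1))              # {0, 1, 3}
print(gamma(adj, ego_nodes(adj, 0, radius=3)))  # 0.5555555555555556
print(gamma(adj, ego_nodes(adj, 4, radius=1)))  # nan
```

Because gamma is undefined for very small ego graphs, such nodes are one plausible source of the NaN values in the `gamma` column that get filled with the mean further down.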

In [18]:
nx_graph = mm.gdf_to_nx(cleaned, approach="primal", length="length")
nx_graph = mm.node_degree(nx_graph, name="degree")
nx_graph = mm.subgraph(nx_graph, radius=3, length="length")


Convert the nodes to a GeoDataFrame.

In [19]:
nodes = mm.nx_to_gdf(nx_graph, lines=False)

Now for some clustering.

Fill in missing `gamma` values with the column mean, then scale the data and fit a clustergram using K-Means.
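`scale` standardises every column to zero mean and unit variance, so that characters measured on very different ranges contribute comparably to the Euclidean distances K-Means works with. A small check on made-up numbers, assuming only NumPy and scikit-learn, shows that it matches the manual z-score computation:

```python
import numpy as np
from sklearn.preprocessing import scale

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# Standardise each column: subtract the column mean and divide by the column
# standard deviation (population std, ddof=0, which is what scale() uses).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(manual, scale(X)))  # True
```

Columns with zero variance are only centred by `scale`, not divided by their zero standard deviation.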

In [30]:
nodes["gamma"] = nodes.gamma.fillna(nodes.gamma.mean())

In [32]:
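# keep only the measured characters: drop the geometry and, if present, the nodeID identifier column
data = scale(nodes.drop(columns=["geometry", "nodeID"], errors="ignore"))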
cgram = Clustergram(range(1, 10), n_init=100, verbose=True)
cgram.fit(data)

K=1 skipped. Mean computed from data directly.
K=2 fitted in 0.654 seconds.
K=3 fitted in 0.558 seconds.
K=4 fitted in 0.915 seconds.
K=5 fitted in 1.353 seconds.
K=6 fitted in 1.621 seconds.
K=7 fitted in 2.090 seconds.
K=8 fitted in 2.075 seconds.
K=9 fitted in 2.343 seconds.


Clustergram(k_range=range(1, 10), backend='sklearn', method='kmeans', kwargs={'n_init': 100})
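A clustergram fits K-Means for each k in the range and shows how observations move between clusters as k grows. Below is a rough sketch of the underlying computation, following the original clustergram idea of placing every cluster centre on a single axis (here the first principal component of the data). The `clustergram` package's own computation and plotting options may differ in details, and the toy data are made up:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data: three well separated blobs in 4 dimensions.
data = np.vstack([rng.normal(loc, 0.3, size=(50, 4)) for loc in (0.0, 3.0, 6.0)])

# First principal component, used to place every cluster centre on one axis.
pca = PCA(n_components=1).fit(data)

positions = {}  # k -> position of each observation's cluster centre on PC1
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    centre_pc1 = pca.transform(km.cluster_centers_)[:, 0]
    positions[k] = centre_pc1[km.labels_]
```

Drawing `positions[k]` against `k`, with a line from each observation's position at k to its position at k + 1, gives the familiar clustergram picture.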

Explore the clustergram.

In [33]:
fig = cgram.bokeh()
show(fig)
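Reading the number of clusters off a clustergram is a judgement call, so a numeric score can serve as a second opinion. A sketch using scikit-learn's `silhouette_score` on made-up data (not on the nodes above), where higher is better:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy data: three well separated blobs in 4 dimensions.
data = np.vstack([rng.normal(loc, 0.3, size=(50, 4)) for loc in (0.0, 3.0, 6.0)])

scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # 3 for these three blobs
```

The full silhouette computation is quadratic in the number of observations; on tens of thousands of nodes, passing `sample_size` (with a fixed `random_state`) to `silhouette_score` keeps it tractable.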

The clustergram suggests 4 or 7 clusters. Let's see what the 7-cluster solution looks like on a map.

In [None]:
nodes["cluster"] = cgram.labels_[7]
m = cleaned.explore(tiles="openstreetmap hot", color="black", prefer_canvas=True)
nodes.explore("cluster", categorical=True, m=m)

Absolutely nothing of value came from this :D. Saving the data in case you want to see the result yourself.

In [45]:
# cgram.labels_ has integer column names (one per k); store them as string columns
nodes[[str(c) for c in cgram.labels_.columns]] = cgram.labels_
nodes.to_file("../data/1133/node_clusters.gpkg")
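`cgram.labels_` stores one column of labels per k, with integer column names, which is presumably why the cell above converts them to strings before writing: file writers generally expect string field names. An equivalent, slightly more explicit pattern with pandas, on a made-up stand-in for `labels_`:

```python
import pandas as pd

# Stand-in for cgram.labels_: one column of cluster labels per k, integer names.
labels = pd.DataFrame({1: [0, 0, 0], 2: [0, 1, 1], 3: [0, 1, 2]})
nodes = pd.DataFrame({"degree": [3, 4, 3]})

# Convert the integer column names (e.g. 3 -> "3") and attach the labels.
nodes = nodes.join(labels.rename(columns=str))
print(list(nodes.columns))  # ['degree', '1', '2', '3']
```

`join` aligns rows on the index, so this assumes both frames share the same index, as they do when the labels were fitted on the rows of `nodes` in order.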