# 1. Introduction

Treves Li

UC Berkeley

2025-12-09

This notebook builds a 3D digital twin of the hillslope and colluvium at the Big C by clustering interpolated EM results.

# 2. Setup and Imports

In [6]:
# Standard libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from pathlib import Path

# Set root path
import sys
root_path = Path.cwd().parents[0]
sys.path.append(str(root_path / "code"))

# Modelling libraries
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# GIS helpers
# from common_utils.gis_helpers import em_inv2paraview, extract_elev_from_dem, filter_pred_intersect, format_csv2pointcloud
# from common_utils.emagpy_helpers import plot_inv_em_xsection
# from viz_helpers import voxelise_csv, plot_3d_voxel, plot_curtain_slice

# Plotting defaults
plt.rcParams['font.family'] = 'Helvetica'
plt.rcParams['xtick.direction'] = 'in'
plt.rcParams['ytick.direction'] = 'in'
plt.rcParams['xtick.top'] = True
plt.rcParams['ytick.right'] = True

In [4]:
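# Load with pandas so numeric columns (x, y, z, resistivity) are parsed as numbers; gpd.read_file on a plain CSV typically returns strings
df = pd.read_csv(root_path / "data" / "processed" / "20251123_em_processed" / "1437_processed_inv_doi_kriged_0pt5_cropped.csv")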

# 3. Cluster Points to Classify Colluvium

For detecting colluvium, resistivity is the most important feature, possibly followed by `z`. If we also cluster on the raw `x`/`y` coordinates, the clustering may be biased by spatial spread rather than by subsurface properties.

We can still include `x` and `y` if we standardise the features first. Without scaling, Euclidean distance is dominated by the **largest-scale feature**; after scaling to zero mean and unit variance, every feature contributes on a comparable footing.
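
To make the scale argument concrete, here is a minimal sketch on synthetic points (hypothetical value ranges, not the survey data). The helper `share_of_sq_distance` is an illustrative name introduced here; it reports how much of the total squared pairwise Euclidean distance each column accounts for, before and after `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey table (assumed ranges, for illustration only).
# Columns: x, y, z, resistivity
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 1000, n),   # x spans hundreds of units
    rng.uniform(0, 1000, n),   # y spans hundreds of units
    rng.uniform(0, 10, n),     # z spans only a few units
    rng.uniform(10, 100, n),   # resistivity
])

def share_of_sq_distance(X):
    """Fraction of the total squared pairwise distance contributed by each column."""
    diffs = X[:, None, :] - X[None, :, :]          # shape (n, n, n_features)
    per_feature = (diffs ** 2).sum(axis=(0, 1))    # sum over all point pairs
    return per_feature / per_feature.sum()

raw_share = share_of_sq_distance(X)
scaled_share = share_of_sq_distance(StandardScaler().fit_transform(X))

print("raw   :", np.round(raw_share, 3))
print("scaled:", np.round(scaled_share, 3))
```

On the raw values, `x` and `y` account for almost all of the distance. After standardisation each column has unit variance, so the four shares are equal.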

In [7]:
scaler = StandardScaler()
features = df[['x', 'y', 'z', 'resistivity']].copy()
features_scaled = scaler.fit_transform(features)
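
If resistivity should count for more than position, one option (an assumption here, not something the notebook currently does) is to multiply each standardised column by a weight before clustering. The sketch below uses a synthetic table and hypothetical weights; both would need to be replaced and justified for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df (synthetic values, not the survey data)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "x": rng.uniform(0, 500, 100),
    "y": rng.uniform(0, 500, 100),
    "z": rng.uniform(0, 10, 100),
    "resistivity": rng.uniform(10, 100, 100),
})

# Hypothetical weights: > 1 emphasises a feature, < 1 de-emphasises it
weights = {"x": 0.5, "y": 0.5, "z": 1.0, "resistivity": 2.0}

cols = list(weights)
scaled = StandardScaler().fit_transform(demo[cols])
w = np.array([weights[c] for c in cols])

# After this step each column has a standard deviation equal to its weight,
# so heavily weighted features contribute more to Euclidean distances.
weighted = scaled * w
```

Passing `weighted` instead of the plain scaled array to a distance-based clusterer would then let resistivity differences separate points more strongly than horizontal offsets.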

Clustering algorithms:

* **K-Means**: not ideal, since it favours compact, roughly spherical clusters and colluvium may be elongated along the slope
* **Gaussian mixture model (GMM)**: may struggle with irregular shapes
* **DBSCAN**: density-based, good for irregular shapes, no need to predefine number of clusters, but needs careful tuning of parameters
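
The trade-offs listed above can be illustrated on a toy problem. This sketch builds two long, thin, parallel bands (synthetic 2D points, not the survey data) and scores each algorithm against the known band membership with the adjusted Rand index. The band geometry, `eps`, and `min_samples` are assumptions chosen for the toy data; the features are deliberately left unscaled so the elongation is preserved.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Two elongated, parallel bands: long in x, thin in y (synthetic data)
rng = np.random.default_rng(0)
n = 300
x = np.linspace(0.0, 10.0, n)
band_a = np.column_stack([x, rng.normal(0.0, 0.05, n)])
band_b = np.column_stack([x, rng.normal(1.0, 0.05, n)])
X = np.vstack([band_a, band_b])
truth = np.repeat([0, 1], n)  # true band membership

# K-Means and the Gaussian mixture need the number of clusters up front
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters from density (noise is labelled -1)
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

scores = {
    "kmeans": adjusted_rand_score(truth, km_labels),
    "gmm": adjusted_rand_score(truth, gmm_labels),
    "dbscan": adjusted_rand_score(truth, db_labels),
}
for name, score in scores.items():
    print(f"{name:>6}: ARI = {score:.2f}")
```

Here K-Means tends to cut both bands in half along their length instead of separating them, while DBSCAN follows each band. How the Gaussian mixture behaves depends on its covariance settings, so its score is printed without any claim about the outcome.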

In [None]:
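# Fit DBSCAN on the standardised features.
# eps is the neighbourhood radius in scaled (unitless) feature space;
# min_samples is the neighbourhood size required for a core point.
db = DBSCAN(eps=0.5, min_samples=10)
labels = db.fit_predict(features_scaled)
# DBSCAN labels noise points as -1; clusters are numbered from 0
df['cluster'] = labels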

# TODO: hyperparameter tuning (eps, min_samples)
# TODO: how to get exactly 3 clusters (DBSCAN has no n_clusters parameter)
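
As a possible starting point for the two TODOs above, the sketch below shows a common heuristic for choosing `eps` (the sorted k-nearest-neighbour distance curve) and a simple scan over `eps` values that reports how many clusters DBSCAN finds. The helper names `k_distance_curve` and `scan_eps`, the synthetic three-blob data, and the candidate `eps` values are all assumptions for illustration; on the real data the scan would run on `features_scaled`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(X, k):
    """Sorted distance from each point to its k-th nearest neighbour (excluding itself)."""
    # n_neighbors=k + 1 because each query point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    return np.sort(dist[:, -1])

def scan_eps(X, eps_values, min_samples):
    """Return (eps, n_clusters, noise_fraction) for each candidate eps."""
    rows = []
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        noise_frac = float(np.mean(labels == -1))
        rows.append((eps, n_clusters, noise_frac))
    return rows

# Synthetic stand-in for features_scaled: three well-separated blobs in 4D
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0, 0], [5, 0, 0, 0], [0, 5, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(0.0, 0.2, size=(150, 4)) for c in centres])

# A bend ("knee") in this curve is a commonly used candidate for eps
curve = k_distance_curve(X, k=10)

rows = scan_eps(X, eps_values=[0.05, 1.0, 10.0], min_samples=10)
for eps, n_clusters, noise_frac in rows:
    print(f"eps={eps:<5} clusters={n_clusters} noise={noise_frac:.2f}")
```

On this toy data a very small `eps` marks every point as noise, a very large one merges everything into a single cluster, and an intermediate value recovers the three blobs. If exactly three classes are required regardless of density, an algorithm that takes the cluster count directly (for example `KMeans(n_clusters=3)` or `GaussianMixture(n_components=3)`) may be the simpler route, subject to the shape caveats noted earlier.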