# GU003 – StarDist Single Patch Cell Feature Extraction

## Purpose
This notebook computes **cell-level geometric and shape features** from a **single image patch**, using **StarDist-generated GeoJSON** as input.

Each polygon in the GeoJSON corresponds to one segmented cell.  
The output is a **cell-by-feature table (CSV / Excel-ready)**, where:
- rows = individual cell instances
- columns = geometric, shape, and spatial features

This notebook implements **Step 4** of the GU003 StarDist pipeline (see the pipeline context below).

---

## Pipeline Context (GU003)
1. Patch extraction from WSI  
2. StarDist cell segmentation  
3. Export cell instances as GeoJSON  
4. **(This notebook)** GeoJSON → cell-level feature extraction → CSV

---

## Input
- GeoJSON file generated from:
  `GU003_StarDist_single_patch_geojson.ipynb`
- Each GeoJSON feature represents one cell polygon
- Coordinates are assumed to be in **patch pixel coordinates**, with each vertex stored as `[x, y]` (column, row)
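
For orientation, here is a minimal sketch of the input structure this notebook expects. The property keys (`instance_id`, `area_px`, `centroid`) match those printed in Step 3; the coordinate and property values, and the helper `ring_xy`, are invented for illustration.

```python
import json

# Hypothetical one-cell FeatureCollection; all values are made up.
example_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                # One exterior ring of [x, y] pairs in patch pixel coordinates;
                # the first and last vertex coincide (closed ring).
                "coordinates": [[[10, 20], [14, 20], [14, 24], [10, 24], [10, 20]]],
            },
            "properties": {
                "instance_id": 1,
                "area_px": 16,
                "centroid": {"x": 12.0, "y": 22.0},
            },
        }
    ],
}


def ring_xy(feature):
    """Return the exterior ring of a Polygon feature as separate x and y lists."""
    ring = feature["geometry"]["coordinates"][0]
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return xs, ys


# Round-trip through JSON text to mimic reading the file from disk.
loaded = json.loads(json.dumps(example_geojson))
xs, ys = ring_xy(loaded["features"][0])
```

Keeping x and y separate matters later: the mask step passes y values as rows and x values as columns.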

---

## Output
- A CSV file where:
  - each row corresponds to one cell (cell_001, cell_002, …)
  - each column corresponds to a predefined feature
- The CSV structure is designed to be directly compatible with Excel and downstream analysis.

---

## Notes
- This notebook operates on **a single patch** for validation and QC.
- Multi-patch / WSI-scale processing will be implemented in a separate notebook.


Step 1: Define task scope and feature table schema


In [2]:
# ================================
# Step 1: Define feature table schema
# ================================

# This step defines the column structure of the cell-level feature table.
# No files are read and no computation is performed.

FEATURE_COLUMNS = [
    "Cell_ID",
    "area",
    "perimeter",
    "centroid_x",
    "centroid_y",
    "bbox_min_row",
    "bbox_min_col",
    "bbox_max_row",
    "bbox_max_col",
    "equivalent_diameter",
    "solidity",
    "extent",
    "eccentricity",
    "orientation",
    "major_axis_length",
    "minor_axis_length",
    "inertia_tensor_eigval1",
    "inertia_tensor_eigval2",
    "perimeter_area_ratio",
    "distance_to_center",
    "distance_to_nn",
]

print("Number of features:", len(FEATURE_COLUMNS))
FEATURE_COLUMNS


Number of features: 21


['Cell_ID',
 'area',
 'perimeter',
 'centroid_x',
 'centroid_y',
 'bbox_min_row',
 'bbox_min_col',
 'bbox_max_row',
 'bbox_max_col',
 'equivalent_diameter',
 'solidity',
 'extent',
 'eccentricity',
 'orientation',
 'major_axis_length',
 'minor_axis_length',
 'inertia_tensor_eigval1',
 'inertia_tensor_eigval2',
 'perimeter_area_ratio',
 'distance_to_center',
 'distance_to_nn']
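
Several columns in this schema are simple functions of others. As a rough sketch, the relationships below follow scikit-image's definitions of `equivalent_diameter` and `extent`; the helper names are illustrative and not part of the notebook. The numbers used are the values reported for `cell_001` in the Step 6 output.

```python
import math


def equivalent_diameter(area):
    """Diameter of the circle whose area equals `area`."""
    return math.sqrt(4.0 * area / math.pi)


def extent(area, bbox):
    """Region area divided by bounding-box area.

    bbox = (min_row, min_col, max_row, max_col), with max values exclusive.
    """
    min_row, min_col, max_row, max_col = bbox
    return area / ((max_row - min_row) * (max_col - min_col))


def perimeter_area_ratio(perimeter, area):
    """Perimeter divided by area, as computed in Step 6."""
    return perimeter / area


# Values for cell_001 as reported in the Step 6 output table.
area, perimeter, bbox = 422.0, 76.183766, (202, 3, 224, 28)

diam = equivalent_diameter(area)
ext = extent(area, bbox)
ratio = perimeter_area_ratio(perimeter, area)
```

These reproduce the tabulated `equivalent_diameter`, `extent`, and `perimeter_area_ratio` for that cell to the displayed precision, which is a cheap check that the columns mean what their names suggest.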

Step 2: Imports and path setup


In [3]:
# ================================
# Step 2: Imports and Path Setup
# ================================

# ---- Core libraries ----
import os
import json
import numpy as np
import pandas as pd

# ---- Geometry / image processing ----
from shapely.geometry import shape
from skimage.draw import polygon
from skimage.measure import regionprops

# ---- Google Drive (Colab) ----
from google.colab import drive
drive.mount('/content/drive')

# ---- Project base directory ----
BASE_DIR = "/content/drive/MyDrive/wj165/GU_Projects/GU003_StarDist"

# ---- GeoJSON input directory ----
GEOJSON_DIR = os.path.join(BASE_DIR, "outputs")

# ---- GeoJSON files (confirmed inputs) ----
GEOJSON_POLYGON_ONLY = os.path.join(
    GEOJSON_DIR, "stardist_patch_instances.geojson"
)

GEOJSON_WITH_CENTROID = os.path.join(
    GEOJSON_DIR, "stardist_patch_instances_with_centroid.geojson"
)

# ---- Sanity check ----
print("Base dir exists:", os.path.exists(BASE_DIR))
print("Polygon-only GeoJSON exists:", os.path.exists(GEOJSON_POLYGON_ONLY))
print("With-centroid GeoJSON exists:", os.path.exists(GEOJSON_WITH_CENTROID))


Mounted at /content/drive
Base dir exists: True
Polygon-only GeoJSON exists: True
With-centroid GeoJSON exists: True
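
The path setup above only works inside Colab. A minimal sketch of a fallback for local runs is shown below; the environment variable `GU003_BASE_DIR` and the helper `resolve_base_dir` are hypothetical, and `drive.mount(...)` would still be needed when actually running in Colab.

```python
import os
import sys

COLAB_BASE_DIR = "/content/drive/MyDrive/wj165/GU_Projects/GU003_StarDist"


def resolve_base_dir(environ=os.environ):
    """Pick the project directory without assuming Colab is available."""
    # GU003_BASE_DIR is a hypothetical override, not an existing project setting.
    override = environ.get("GU003_BASE_DIR")
    if override:
        return override
    # Inside a Colab kernel the google.colab package is already loaded.
    if "google.colab" in sys.modules:
        return COLAB_BASE_DIR
    # Otherwise fall back to the current working directory.
    return os.getcwd()


BASE_DIR = resolve_base_dir()
GEOJSON_DIR = os.path.join(BASE_DIR, "outputs")
```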


Step 3: Load GeoJSON and inspect structure


In [4]:
# ================================
# Step 3: Load GeoJSON and Inspect Structure
# ================================

# Load GeoJSON (use with-centroid version by default)
with open(GEOJSON_WITH_CENTROID, "r") as f:
    geojson_data = json.load(f)

# Extract features
features = geojson_data.get("features", [])

print("Number of cell instances:", len(features))

# Inspect the first cell
first_cell = features[0]

print("\nGeometry type:", first_cell["geometry"]["type"])
print("Number of polygon vertices:",
      len(first_cell["geometry"]["coordinates"][0]))

print("\nProperty keys:")
print(list(first_cell.get("properties", {}).keys()))


Number of cell instances: 32

Geometry type: Polygon
Number of polygon vertices: 97

Property keys:
['instance_id', 'area_px', 'centroid']
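
Step 3 inspects only the first feature. Before rasterizing, it may be worth checking every feature against the assumptions the later steps rely on (Polygon geometry, closed exterior ring, no holes). The sketch below is one way to do that; `validate_features` is an illustrative helper, not part of the notebook.

```python
def validate_features(features):
    """Return a list of (index, problem) pairs; an empty list means all checks passed."""
    problems = []
    for i, feat in enumerate(features):
        geom = feat.get("geometry") or {}
        if geom.get("type") != "Polygon":
            problems.append((i, f"unsupported geometry type: {geom.get('type')}"))
            continue
        rings = geom.get("coordinates") or []
        # A GeoJSON linear ring needs at least 4 positions (first == last).
        if not rings or len(rings[0]) < 4:
            problems.append((i, "exterior ring has fewer than 4 vertices"))
            continue
        # Only coordinates[0] is rasterized later, so holes would be ignored.
        if len(rings) > 1:
            problems.append((i, "polygon has interior rings (holes)"))
        if rings[0][0] != rings[0][-1]:
            problems.append((i, "exterior ring is not closed"))
    return problems


# Small hand-made examples: one valid polygon, one open ring, one point.
good = {"geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}}
open_ring = {"geometry": {"type": "Polygon",
                          "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1]]]}}
point = {"geometry": {"type": "Point", "coordinates": [0, 0]}}

problems = validate_features([good, open_ring, point])
```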


Step 4: Convert one cell polygon to binary mask


In [5]:
# ================================
# Step 4: Convert One Cell Polygon to Binary Mask
# ================================

# Select the first cell polygon
cell0 = features[0]
coords = cell0["geometry"]["coordinates"][0]

# Patch size in pixels; must match the dimensions of the patch that was segmented
PATCH_H = 1024
PATCH_W = 1024

# Convert polygon to binary mask
rr, cc = polygon(
    [p[1] for p in coords],  # y coordinates
    [p[0] for p in coords],  # x coordinates
    shape=(PATCH_H, PATCH_W)
)

mask = np.zeros((PATCH_H, PATCH_W), dtype=np.uint8)
mask[rr, cc] = 1

print("Mask shape:", mask.shape)
print("Mask area (pixel count):", mask.sum())


Mask shape: (1024, 1024)
Mask area (pixel count): 422
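
The pixel count above can be cross-checked against the polygon's continuous area. A minimal sketch using the shoelace formula follows, assuming a closed ring of 2D `[x, y]` vertices; `shoelace_area` is an illustrative helper. The two numbers would be expected to be close but not identical, because rasterization counts whole pixels.

```python
def shoelace_area(ring):
    """Area enclosed by a closed ring of [x, y] vertices (shoelace formula)."""
    total = 0.0
    # Pair each vertex with its successor; the ring's last vertex repeats the first.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    # abs() makes the result independent of vertex winding order.
    return abs(total) / 2.0


# Hand-made 4 x 3 rectangle for illustration.
rect = [[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]
rect_area = shoelace_area(rect)
```

In the notebook, `shoelace_area(coords)` could be printed next to `mask.sum()` for the selected cell.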


Step 5: Compute regionprops for one cell


In [6]:
# ================================
# Step 5: Compute Regionprops for One Cell
# ================================

# Compute region properties (regionprops was already imported in Step 2)
props = regionprops(mask)[0]

print("Regionprops area:", props.area)
print("Regionprops perimeter:", props.perimeter)
print("Regionprops centroid (row, col):", props.centroid)

# Compare with GeoJSON properties (if available)
geo_props = cell0.get("properties", {})

print("\nGeoJSON area_px:", geo_props.get("area_px"))
print("GeoJSON centroid:", geo_props.get("centroid"))


Regionprops area: 422.0
Regionprops perimeter: 76.18376618407356
Regionprops centroid (row, col): (np.float64(211.99289099526067), np.float64(15.210900473933648))

GeoJSON area_px: 422
GeoJSON centroid: {'x': 15.257731958762887, 'y': 212.08762886597938}
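
The two centroids above differ by about 0.05 px in x and 0.09 px in y. The GeoJSON x value is consistent with a plain average over the 97 ring vertices (15.2577... × 97 ≈ 1480), whereas `regionprops` averages pixel positions inside the mask. That reading is an inference from the numbers, since the upstream notebook is not shown here. The sketch below illustrates how a vertex mean can drift from the area-weighted centroid when vertices are unevenly spaced; both helpers are illustrative.

```python
def vertex_mean(ring):
    """Plain average of ring vertices (includes the repeated closing vertex)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def area_centroid(ring):
    """Area-weighted centroid of a closed, non-degenerate polygon ring."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; its sign cancels in the division below
    return cx / (6.0 * a), cy / (6.0 * a)


# 4 x 4 square with one extra vertex on the top edge.
ring = [[0, 0], [4, 0], [4, 4], [2, 4], [0, 4], [0, 0]]
vm = vertex_mean(ring)
ac = area_centroid(ring)
```

Here the area centroid stays at the geometric center of the square, while the vertex mean is pulled toward the side with more vertices. A sub-pixel gap between the two definitions is therefore not by itself a sign of a coordinate mix-up.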


Step 6: Extract regionprops features for all cells


In [7]:
# ================================
# Step 6: Extract Regionprops Features for All Cells
# ================================

rows = []

PATCH_H = 1024
PATCH_W = 1024

for i, cell in enumerate(features):
    coords = cell["geometry"]["coordinates"][0]

    # Polygon -> mask
    rr, cc = polygon(
        [p[1] for p in coords],  # y
        [p[0] for p in coords],  # x
        shape=(PATCH_H, PATCH_W)
    )

    mask = np.zeros((PATCH_H, PATCH_W), dtype=np.uint8)
    mask[rr, cc] = 1

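    regions = regionprops(mask)
    if not regions:
        # Polygon rasterized to zero pixels (e.g. it lies outside the patch); skip it
        continue
    props = regions[0]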

    row = {
        "Cell_ID": f"cell_{i+1:03d}",
        "area": props.area,
        "perimeter": props.perimeter,
        # regionprops reports the centroid as (row, col), i.e. (y, x)
        "centroid_x": props.centroid[1],
        "centroid_y": props.centroid[0],
        # bbox is (min_row, min_col, max_row, max_col); max values are exclusive
        "bbox_min_row": props.bbox[0],
        "bbox_min_col": props.bbox[1],
        "bbox_max_row": props.bbox[2],
        "bbox_max_col": props.bbox[3],
        "equivalent_diameter": props.equivalent_diameter,
        "solidity": props.solidity,
        "extent": props.extent,
        "eccentricity": props.eccentricity,
        # orientation is in radians, measured from the row axis
        "orientation": props.orientation,
        "major_axis_length": props.major_axis_length,
        "minor_axis_length": props.minor_axis_length,
        "inertia_tensor_eigval1": props.inertia_tensor_eigvals[0],
        "inertia_tensor_eigval2": props.inertia_tensor_eigvals[1],
        "perimeter_area_ratio": props.perimeter / props.area,
        # Spatial features are placeholders; they are not computed in this notebook
        "distance_to_center": None,
        "distance_to_nn": None,
    }

    rows.append(row)

df_cells = pd.DataFrame(rows)
df_cells.head()


| | Cell_ID | area | perimeter | centroid_x | centroid_y | bbox_min_row | bbox_min_col | bbox_max_row | bbox_max_col | equivalent_diameter | ... | extent | eccentricity | orientation | major_axis_length | minor_axis_length | inertia_tensor_eigval1 | inertia_tensor_eigval2 | perimeter_area_ratio | distance_to_center | distance_to_nn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cell_001 | 422.0 | 76.183766 | 15.2109 | 211.992891 | 202 | 3 | 224 | 28 | 23.179885 | ... | 0.767273 | 0.662128 | 1.108917 | 26.834025 | 20.109178 | 45.004056 | 25.273689 | 0.18053 | None | None |
| 1 | cell_002 | 405.0 | 74.526912 | 234.012346 | 172.283951 | 160 | 224 | 185 | 245 | 22.708193 | ... | 0.771429 | 0.6856 | 0.48121 | 26.632816 | 19.388124 | 44.331682 | 23.493711 | 0.184017 | None | None |
| 2 | cell_003 | 404.0 | 72.769553 | 196.059406 | 230.868812 | 220 | 186 | 244 | 208 | 22.680141 | ... | 0.765152 | 0.35344 | 0.157545 | 23.466685 | 21.952078 | 34.417833 | 30.118358 | 0.180123 | None | None |
| 3 | cell_004 | 394.0 | 73.112698 | 174.248731 | 228.34264 | 216 | 165 | 241 | 185 | 22.397687 | ... | 0.788 | 0.707837 | 0.423139 | 26.680787 | 18.84665 | 44.491526 | 22.199763 | 0.185565 | None | None |
| 4 | cell_005 | 351.0 | 67.112698 | 49.444444 | 186.438746 | 177 | 39 | 197 | 61 | 21.140177 | ... | 0.797727 | 0.381662 | -1.209084 | 21.990079 | 20.32547 | 30.222725 | 25.820294 | 0.191204 | None | None |
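
The `distance_to_center` and `distance_to_nn` columns are left empty in Step 6. One possible way to fill them is sketched below, under two assumptions that the notebook does not state: "center" means the patch center `(PATCH_W / 2, PATCH_H / 2)`, and "nn" means the Euclidean distance to the nearest other cell centroid. The function name `distance_features` is hypothetical.

```python
import numpy as np


def distance_features(centroids_xy, patch_w=1024, patch_h=1024):
    """Return (distance to patch center, distance to nearest neighbor) per cell.

    centroids_xy: array-like of shape (n_cells, 2) holding (x, y) centroids.
    Assumes 'center' is the patch center and 'nn' is centroid-to-centroid.
    """
    pts = np.asarray(centroids_xy, dtype=float)
    center = np.array([patch_w / 2.0, patch_h / 2.0])
    dist_center = np.linalg.norm(pts - center, axis=1)

    # Full pairwise distance matrix; fine for a few dozen cells per patch.
    diff = pts[:, None, :] - pts[None, :, :]
    pairwise = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each cell's zero distance to itself.
    np.fill_diagonal(pairwise, np.inf)
    dist_nn = pairwise.min(axis=1)
    return dist_center, dist_nn


# Three made-up centroids for illustration.
dist_center, dist_nn = distance_features([(512, 512), (515, 516), (0, 0)])
```

With the Step 6 table, `df_cells[["centroid_x", "centroid_y"]].to_numpy()` could supply the input, and the two returned arrays could be assigned to the corresponding columns. A patch containing a single cell yields an infinite nearest-neighbor distance, which may need special handling before export.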


Step 7: Save cell-level feature table to CSV


In [8]:
# ================================
# Step 7: Save Feature Table to CSV
# ================================

# Define output path
OUTPUT_CSV = os.path.join(
    BASE_DIR, "outputs", "cell_features_single_patch.csv"
)

# Save DataFrame
df_cells.to_csv(OUTPUT_CSV, index=False)

print("Saved CSV to:", OUTPUT_CSV)
print("CSV shape:", df_cells.shape)

# Optional: download to local machine (Colab)
from google.colab import files
files.download(OUTPUT_CSV)


Saved CSV to: /content/drive/MyDrive/wj165/GU_Projects/GU003_StarDist/outputs/cell_features_single_patch.csv
CSV shape: (32, 21)
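
A quick read-back check can confirm that the saved file has the expected header and one row per cell. The sketch below uses the standard-library `csv` module and an in-memory stand-in for the file so that it runs without Drive access; `check_feature_csv` is a hypothetical helper.

```python
import csv
import io


def check_feature_csv(fileobj, expected_columns, expected_rows):
    """Return True if the header matches exactly and the row count is as expected."""
    reader = csv.reader(fileobj)
    header = next(reader, [])
    n_rows = sum(1 for _ in reader)
    return header == list(expected_columns) and n_rows == expected_rows


# Tiny stand-in for the real file (three columns, two cells).
sample_text = (
    "Cell_ID,area,perimeter\n"
    "cell_001,422.0,76.18\n"
    "cell_002,405.0,74.53\n"
)
ok = check_feature_csv(io.StringIO(sample_text),
                       ["Cell_ID", "area", "perimeter"], expected_rows=2)
```

Against the real output, the file would be opened with `open(OUTPUT_CSV, newline="")` and checked against `FEATURE_COLUMNS` and `len(features)`.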

