# SPPT Advanced Examples

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yunusserhat/sppt/blob/main/notebooks/02_advanced_examples.ipynb)

This notebook covers all modes and features of the `sppt` package:

1. **Percentage mode** (default) — spatial distribution comparison
2. **Count mode** — absolute value comparison
3. **Fixed base** — bootstrap only the test variable
4. **Custom confidence levels** — 90%, 95%, 99%
5. **Single variable** — CI bounds without comparison
6. **Multiple variable pairs** — batch analysis
7. **Export results** — Shapefile, GeoPackage, CSV
8. **Publication-quality maps**

---

**Author:** Yunus Serhat Bıçakçı  
**Based on:** R package by [Martin A. Andresen](https://github.com/martin-a-andresen/sppt.aggregated.data)

In [None]:
!pip install sppt -q

In [None]:
from sppt import sppt, load_sample_data, create_bivariate_map, create_publication_map
import pandas as pd

data = load_sample_data()
print(f"Dataset: {data.shape[0]} spatial units, {data.shape[1]} columns")
print(f"Crime variables: TFV={data['TFV'].sum():.0f}, TOV={data['TOV'].sum():.0f}, "
      f"THEFT={data['THEFT'].sum():.0f}, MISCHIEF={data['MISCHIEF'].sum():.0f}")

---
## 1. Percentage Mode (Default)

Compares the **spatial distributions** (percentages summing to 100%) of two variables. Use this when total counts differ between variables — it answers: *"Is crime distributed the same way across space?"*
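As a minimal sketch of the idea (toy numbers, not the sample data, and not the package's internal code), the snippet below shows how two variables with very different totals can share the same spatial distribution once counts are converted to percentage shares. The helper `to_percent` is hypothetical and exists only for this illustration.

```python
# Toy counts for four spatial units; these numbers are invented for illustration.
base_counts = [10, 20, 30, 40]      # total = 100
test_counts = [50, 100, 150, 200]   # total = 500, five times larger


def to_percent(counts):
    """Convert raw counts to percentage shares that sum to 100."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute shares when the total is zero")
    return [100 * c / total for c in counts]


base_pct = to_percent(base_counts)
test_pct = to_percent(test_counts)

print(base_pct)  # [10.0, 20.0, 30.0, 40.0]
print(test_pct)  # [10.0, 20.0, 30.0, 40.0]
```

In count terms every unit differs by a factor of five, yet the percentage shares are identical. That is the situation in which percentage mode and count mode can be expected to disagree.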

In [None]:
result_pct = sppt(
    data=data,
    group_col="DAUID",
    count_col=["TFV", "TOV"],
    B=200,
    check_overlap=True,
    use_percentages=True,    # default
    create_maps=False,
    seed=171717,
)

print(f"\nS-Index:        {result_pct.s_index:.4f}")
print(f"Robust S-Index: {result_pct.robust_s_index:.4f}")
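The sketch below illustrates how an overlap check can be turned into an index. It rests on several assumptions that this notebook does not confirm: that the S-Index is the share of units whose two intervals overlap, that the robust variant drops units with no events in either variable, and that codes of -1, 0, and 1 mark the direction of a difference. The function `classify` and the interval bounds are hypothetical, and the emptiness test uses the bounds as a stand-in for the raw counts.

```python
def classify(base_l, base_u, test_l, test_u):
    """Return 0 if the intervals overlap, 1 if test lies above base, -1 if below."""
    if test_l > base_u:
        return 1
    if test_u < base_l:
        return -1
    return 0


# Hypothetical interval bounds (percent) for five spatial units:
# (base_lower, base_upper, test_lower, test_upper)
units = [
    (2.0, 4.0, 3.0, 5.0),   # intervals overlap
    (1.0, 2.0, 3.0, 4.0),   # test above base
    (5.0, 6.0, 1.0, 2.0),   # test below base
    (0.0, 0.0, 0.0, 0.0),   # no events in either variable
    (1.0, 3.0, 2.0, 2.5),   # intervals overlap
]

codes = [classify(*u) for u in units]
s_index = sum(c == 0 for c in codes) / len(codes)

# Robust variant (assumed rule): ignore units that are empty in both variables.
non_empty = [c for c, u in zip(codes, units) if any(v > 0 for v in u)]
robust_s_index = sum(c == 0 for c in non_empty) / len(non_empty)

print(codes)            # [0, 1, -1, 0, 0]
print(s_index)          # 0.6
print(robust_s_index)   # 0.5
```

Under these assumptions the empty unit counts as "similar" in the plain index but is excluded from the robust one, which is why the robust value is lower here.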

---
## 2. Count Mode

Compares **absolute counts** rather than percentages. Use this mode when the totals are similar or when absolute differences matter.

In [None]:
result_cnt = sppt(
    data=data,
    group_col="DAUID",
    count_col=["TFV", "TOV"],
    B=200,
    check_overlap=True,
    use_percentages=False,   # absolute counts
    create_maps=False,
    seed=171717,
)

print(f"\nS-Index (counts):        {result_cnt.s_index:.4f}")
print(f"Robust S-Index (counts): {result_cnt.robust_s_index:.4f}")

In [None]:
# Compare: percentage vs count mode
comparison = pd.DataFrame({
    "Mode": ["Percentages", "Counts"],
    "S-Index": [result_pct.s_index, result_cnt.s_index],
    "Robust S-Index": [result_pct.robust_s_index, result_cnt.robust_s_index],
})
comparison

---
## 3. Fixed Base Variable

When `fix_base=True`, the first (base) variable is **not bootstrapped** — its confidence interval collapses to a single point (the exact percentage or count). Only the second (test) variable is randomized.

**Use case:** Comparing known official statistics (census) against uncertain estimates.
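When the base interval collapses to a single point, an interval-overlap comparison reduces to asking whether that point falls inside the test interval. The sketch below shows this reduction with invented numbers. The function name and the -1/0/1 coding are assumptions made for illustration, not the package's documented behavior.

```python
def classify_fixed_base(base_value, test_l, test_u):
    """Compare a fixed base value with a bootstrapped test interval."""
    if base_value < test_l:
        return 1    # test interval lies entirely above the base value
    if base_value > test_u:
        return -1   # test interval lies entirely below the base value
    return 0        # base value falls inside the test interval


print(classify_fixed_base(2.5, 2.0, 3.0))  # 0
print(classify_fixed_base(2.5, 3.0, 4.0))  # 1
print(classify_fixed_base(2.5, 1.0, 2.0))  # -1
```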

In [None]:
result_fixed = sppt(
    data=data,
    group_col="DAUID",
    count_col=["TFV", "TOV"],
    B=200,
    check_overlap=True,
    fix_base=True,           # don't bootstrap TFV
    create_maps=False,
    seed=171717,
)

# Verify: base variable has L == U (point estimate)
print("Base variable (TFV) has identical L and U:")
print(f"  TFV_L == TFV_U: {(result_fixed.data['TFV_L'] == result_fixed.data['TFV_U']).all()}")
print(f"\nS-Index (fixed base):        {result_fixed.s_index:.4f}")
print(f"Robust S-Index (fixed base): {result_fixed.robust_s_index:.4f}")

---
## 4. Custom Confidence Levels

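A higher confidence level (99%) produces wider intervals and is more conservative: more intervals overlap, so fewer areas are flagged as significantly different. A lower confidence level (90%) produces narrower intervals and is more sensitive.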
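The relationship between confidence level and interval width can be seen in a generic percentile bootstrap. The sketch below is not the resampling scheme used by `sppt` (which this notebook does not describe). It resamples an invented list of counts, bootstraps the mean, and reads percentile intervals off the same sorted replicates. The helper `percentile_interval` and its simple index rule are assumptions for illustration.

```python
import math
import random

rng = random.Random(171717)

# Invented sample of 30 per-unit counts; not taken from the sample data.
sample = [rng.randint(0, 20) for _ in range(30)]

# B bootstrap replicates of the sample mean (resampling with replacement).
B = 200
replicates = sorted(
    sum(rng.choices(sample, k=len(sample))) / len(sample) for _ in range(B)
)


def percentile_interval(sorted_reps, conf_level):
    """Percentile interval read from sorted replicates (simple index rule)."""
    alpha = 1 - conf_level
    lo = math.floor((alpha / 2) * (len(sorted_reps) - 1))
    hi = math.ceil((1 - alpha / 2) * (len(sorted_reps) - 1))
    return sorted_reps[lo], sorted_reps[hi]


widths = {}
for cl in (0.90, 0.95, 0.99):
    lower, upper = percentile_interval(replicates, cl)
    widths[cl] = upper - lower
    print(f"{cl:.0%}: [{lower:.2f}, {upper:.2f}]  width={upper - lower:.2f}")
```

Because all three intervals are cut from the same replicates, each higher level contains the one before it, so the width can only stay the same or grow.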

In [None]:
levels = [0.90, 0.95, 0.99]

# Run the same comparison at each confidence level
results = {}
for cl in levels:
    results[cl] = sppt(
        data=data, group_col="DAUID", count_col=["TFV", "TOV"],
        B=200, check_overlap=True, create_maps=False,
        conf_level=cl, seed=171717,
    )

# Build the table from `levels` so labels and values cannot drift apart
conf_comparison = pd.DataFrame({
    "Confidence Level": [f"{cl:.0%}" for cl in levels],
    "S-Index": [results[cl].s_index for cl in levels],
    "Robust S-Index": [results[cl].robust_s_index for cl in levels],
})
conf_comparison

---
## 5. Single Variable Analysis

You can also bootstrap a single variable to get confidence intervals without comparison:

In [None]:
result_single = sppt(
    data=data,
    group_col="DAUID",
    count_col="TFV",          # single variable
    B=200,
    create_maps=False,
    seed=42,
)

# Show CI bounds for first 5 DAs
result_single.data[["DAUID", "TFV", "TFV_L", "TFV_U"]].head()

---
## 6. Multiple Variable Pairs

Compare several pairs of crime types systematically:

In [None]:
pairs = [
    ("TFV", "TOV"),
    ("TFV", "THEFT"),
    ("TFV", "MISCHIEF"),
    ("TOV", "THEFT"),
    ("TOV", "MISCHIEF"),
    ("THEFT", "MISCHIEF"),
]

pair_results = []
for base, test in pairs:
    r = sppt(
        data=data, group_col="DAUID", count_col=[base, test],
        B=200, check_overlap=True, create_maps=False, seed=171717,
    )
    pair_results.append({
        "Base": base, "Test": test,
        "S-Index": round(r.s_index, 4),
        "Robust S-Index": round(r.robust_s_index, 4),
    })

pd.DataFrame(pair_results)
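The six pairs listed above are every unordered pair of the four crime variables, so the list can also be generated rather than typed out. A minimal sketch using the standard library:

```python
from itertools import combinations

variables = ["TFV", "TOV", "THEFT", "MISCHIEF"]

# All unordered pairs, in the same order as the hand-written list.
pairs = list(combinations(variables, 2))
print(pairs)
```

This keeps the pair list in sync if more variables are added. The number of runs grows as n(n-1)/2, so the loop gets slower quickly with more variables.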

---
## 7. Export Results

Save results to disk in various formats:

In [None]:
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Export as CSV (drops geometry)
    result_csv = sppt(
        data=data, group_col="DAUID", count_col=["TFV", "TOV"],
        B=200, check_overlap=True, create_maps=False, seed=171717,
        export_results=True, export_format="csv",
        export_results_dir=tmpdir,
    )

    # List the exported files with their sizes
    for f in os.listdir(tmpdir):
        size = os.path.getsize(os.path.join(tmpdir, f))
        print(f"  {f}  ({size:,} bytes)")

Supported formats:

| Format | Extension | Geometry | Best for |
|--------|-----------|----------|----------|
| `"shp"` | `.shp` | ✅ | GIS software (ArcGIS, QGIS) |
| `"gpkg"` | `.gpkg` | ✅ | Modern GIS, large datasets |
| `"csv"` | `.csv` | ❌ | Excel, statistical software |
| `"txt"` | `.txt` | ❌ | Plain text |
| `"pickle"` | `.pkl` | ✅ | Python workflows |
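This notebook does not state the name of the exported file, so one way to read an export back is to search the export directory by extension. The sketch below uses only the standard library and a stand-in CSV written by the snippet itself. The file name `example_export.csv` and its columns are hypothetical, not what `sppt` actually writes.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in for a file written by the export step; name and columns are hypothetical.
    stand_in = Path(tmpdir) / "example_export.csv"
    stand_in.write_text("DAUID,TFV,TOV\n1001,3,5\n1002,0,2\n", encoding="utf-8")

    # Discover CSV files without assuming a particular file name.
    csv_files = sorted(Path(tmpdir).glob("*.csv"))

    # Read the first match while the temporary directory still exists.
    with csv_files[0].open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))

print([p.name for p in csv_files])  # ['example_export.csv']
print(rows[0])
```

The `csv` module returns every value as a string, so numeric columns need converting before analysis.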

---
## 8. Publication-Quality Maps

In [None]:
# Standard map (gray / white / black) — matches R default
create_bivariate_map(
    result_pct.data,
    count_col=["TFV", "TOV"],
)

In [None]:
# Publication map (blue / white / red) — matches R ggplot2 version
create_publication_map(
    result_pct.data,
    count_col=["TFV", "TOV"],
)

### Custom Map with Matplotlib

You can also build fully custom maps using the result GeoDataFrame directly:

In [None]:
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap
from matplotlib.patches import Patch

fig, axes = plt.subplots(1, 3, figsize=(20, 6))

configs = [
    ("Percentages (default)", result_pct),
    ("Counts mode", result_cnt),
    ("Fixed base", result_fixed),
]

cmap = ListedColormap(["#CCCCCC", "white", "black"])
norm = BoundaryNorm([-1.5, -0.5, 0.5, 1.5], cmap.N)

for ax, (title, res) in zip(axes, configs):
    res.data.plot(
        column="SIndex_Bivariate", cmap=cmap, norm=norm,
        edgecolor="#4D4D4D", linewidth=0.2, ax=ax, legend=False,
    )
    ax.set_title(f"{title}\nS={res.s_index:.3f}, RS={res.robust_s_index:.3f}",
                 fontsize=11)
    ax.set_axis_off()

# Shared legend
legend_elements = [
    Patch(facecolor="#CCCCCC", edgecolor="black", label="TFV > TOV"),
    Patch(facecolor="white", edgecolor="black", label="No significant change"),
    Patch(facecolor="black", edgecolor="black", label="TOV > TFV"),
]
fig.legend(handles=legend_elements, loc="lower center", ncol=3, fontsize=11,
           frameon=False, bbox_to_anchor=(0.5, -0.02))

plt.suptitle("SPPT Comparison: Three Modes", fontsize=14, fontweight="bold", y=1.02)
plt.tight_layout()
plt.show()

---
## Summary

| Feature | Parameter | Default |
|---------|-----------|--------|
| Compare distributions | `use_percentages=True` | ✅ |
| Compare absolute counts | `use_percentages=False` | |
| Bootstrap both variables | `fix_base=False` | ✅ |
| Fix base, bootstrap test only | `fix_base=True` | |
| 95% confidence | `conf_level=0.95` | ✅ |
| Compute S-Index | `check_overlap=True` | |
| Auto-generate map | `create_maps=True` | ✅ |
| Export to file | `export_results=True` | |

For more information, see the [GitHub repository](https://github.com/yunusserhat/sppt) and the original [R package](https://github.com/martin-a-andresen/sppt.aggregated.data).