# SPPT Quickstart — Spatial Point Pattern Test for Aggregated Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yunusserhat/sppt-python/blob/main/notebooks/01_quickstart.ipynb)

This notebook demonstrates the core functionality of the **`sppt`** Python package — a faithful reimplementation of the R package [`sppt.aggregated.data`](https://github.com/martin-a-andresen/sppt.aggregated.data) by [Martin A. Andresen](https://github.com/martin-a-andresen).

**What is SPPT?**

The Spatial Point Pattern Test compares two spatial distributions of count data (e.g., two crime types across census tracts) using bootstrap resampling. It produces:
- **Confidence intervals** for each spatial unit
- **S-Index**: proportion of units with overlapping intervals (0 = complete change, 1 = no change)
- **SIndex_Bivariate**: per-unit direction of change (−1, 0, +1)
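
To make these outputs concrete, here is a minimal sketch of the idea using only the Python standard library. It is a simplified illustration, not the `sppt` package's implementation: the resampling scheme (redrawing events weighted by observed counts), the percentile rule, and the helper names `bootstrap_intervals` and `overlap_flags` are all assumptions made for this example, and the counts are toy values.

```python
import random


def bootstrap_intervals(counts, B=200, conf_level=0.95, seed=0):
    """Percentile intervals for each unit's share of the total, in percent.

    Illustrative only: events are redrawn with replacement, weighted by the
    observed counts. The sppt package may resample differently.
    """
    rng = random.Random(seed)
    n_units, total = len(counts), sum(counts)
    units = range(n_units)
    draws = [[] for _ in units]
    for _ in range(B):
        sample = rng.choices(units, weights=counts, k=total)
        tally = [0] * n_units
        for u in sample:
            tally[u] += 1
        for u in units:
            draws[u].append(100.0 * tally[u] / total)
    alpha = (1.0 - conf_level) / 2.0
    lo_idx = int(alpha * (B - 1))
    hi_idx = int((1.0 - alpha) * (B - 1))
    intervals = []
    for u in units:
        ordered = sorted(draws[u])
        intervals.append((ordered[lo_idx], ordered[hi_idx]))
    return intervals


def overlap_flags(base_iv, test_iv):
    """True where the base and test intervals of a unit overlap."""
    return [b_lo <= t_hi and t_lo <= b_hi
            for (b_lo, b_hi), (t_lo, t_hi) in zip(base_iv, test_iv)]


# Toy counts for five units (hypothetical, not the Vancouver data).
base = [40, 30, 20, 10, 0]
test = [10, 30, 20, 40, 0]

base_iv = bootstrap_intervals(base, seed=1)
test_iv = bootstrap_intervals(test, seed=2)
flags = overlap_flags(base_iv, test_iv)

# S-Index: share of units whose intervals overlap.
s_index = sum(flags) / len(flags)
print(flags, s_index)
```

In this toy example the first and fourth units have very different shares in the two distributions, so their intervals do not overlap, while the zero-count unit trivially overlaps.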

---

**Author:** Yunus Serhat Bıçakçı  
**Based on:** R package by Martin A. Andresen

## 1. Installation

Install `sppt` from PyPI (takes ~30 seconds on Colab):

In [None]:
!pip install sppt -q

## 2. Load Sample Data

The package ships with the **Vancouver Dissemination Areas Crime 2021** dataset — 1,019 census polygons with crime counts.

| Column | Description |
|--------|-------------|
| `DAUID` | Dissemination Area unique ID |
| `TFV` | Total Family Violence |
| `TOV` | Total Other Violence |
| `THEFT` | Theft counts |
| `MISCHIEF` | Mischief counts |
| `geometry` | Polygon boundaries |

In [None]:
from sppt import load_sample_data

data = load_sample_data()
print(f"Shape: {data.shape}")
print(f"CRS:   {data.crs}")
data.head()

## 3. Run the SPPT

Compare the spatial distribution of **Total Family Violence (TFV)** vs **Total Other Violence (TOV)** across Vancouver's Dissemination Areas.

This mirrors the R example exactly:
```r
result <- sppt(data, group_col="DAUID", count_col=c("TFV", "TOV"),
               B=200, check_overlap=TRUE, seed=171717)
```

In [None]:
from sppt import sppt

result = sppt(
    data=data,
    group_col="DAUID",
    count_col=["TFV", "TOV"],   # [base, test]
    B=200,
    conf_level=0.95,
    check_overlap=True,
    create_maps=False,            # we'll make our own below
    seed=171717,
    use_percentages=True,
    fix_base=False,
)

## 4. Inspect the Results

The result object contains both the augmented data and the S-Index metrics:

In [None]:
print(f"S-Index:        {result.s_index:.4f}")
print(f"Robust S-Index: {result.robust_s_index:.4f}")
print(f"Fix base:       {result.fix_base}")
print(f"Percentages:    {result.use_percentages}")

In [None]:
# View the key output columns
cols = ["DAUID", "TFV", "TOV", "TFV_L", "TFV_U", "TOV_L", "TOV_U",
        "intervals_overlap", "SIndex_Bivariate"]
result.data[cols].head(10)

In [None]:
# Distribution of bivariate S-Index values
print("SIndex_Bivariate value counts:")
print(result.data["SIndex_Bivariate"].value_counts().sort_index())
print(f"\nOverlap rate: {result.data['intervals_overlap'].mean():.1%}")

## 5. Map the Results

### Standard Map (gray / white / black)

Matches the R package's default `plot()` output:
- **Gray**: TFV > TOV (base greater)
- **White**: No significant difference
- **Black**: TOV > TFV (test greater)
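
The colour scheme above can be read as a lookup on the per-unit direction code. The sketch below shows one way to derive that code from two intervals and map it to a colour. The sign convention (−1 for base greater, +1 for test greater) and the names `classify_unit` and `COLORS` are assumptions for illustration; check the package documentation for the convention `sppt` actually uses.

```python
def classify_unit(base_iv, test_iv):
    """Return -1, 0, or +1 from two (lower, upper) intervals.

    Assumed convention: +1 if the test interval lies entirely above the base
    interval, -1 if entirely below, 0 if the intervals overlap.
    """
    b_lo, b_hi = base_iv
    t_lo, t_hi = test_iv
    if t_lo > b_hi:
        return 1
    if b_lo > t_hi:
        return -1
    return 0


# Colours of the standard map, keyed by the assumed direction code.
COLORS = {-1: "gray", 0: "white", 1: "black"}

examples = [
    ((1.0, 2.0), (3.0, 4.0)),  # test above base
    ((3.0, 4.0), (1.0, 2.0)),  # base above test
    ((1.0, 3.0), (2.0, 4.0)),  # overlapping intervals
]
for base_iv, test_iv in examples:
    code = classify_unit(base_iv, test_iv)
    print(code, COLORS[code])
```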

In [None]:
from sppt import create_bivariate_map

create_bivariate_map(
    result.data,
    count_col=["TFV", "TOV"],
    export_maps=False,
)

### Publication-Quality Map (blue / white / red)

Mirrors the ggplot2 map from the R development script:

In [None]:
from sppt import create_publication_map

create_publication_map(
    result.data,
    count_col=["TFV", "TOV"],
)

## 6. Interpretation

| Metric | Value | Meaning |
|--------|-------|---------|
| S-Index | ≈ 0.74 | About 74% of spatial units have overlapping confidence intervals, so the spatial patterns of TFV and TOV are **largely similar** |
| Robust S-Index | ≈ 0.73 | Slightly lower once zero-count areas are excluded, which suggests that some zero-count DAs inflate the overlap |

The map reveals **where** the patterns diverge — black areas have significantly more TOV relative to TFV, gray areas the opposite.
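
To see why zero-count areas can inflate the overlap, the sketch below computes both indices by hand on a small toy table. It assumes the robust variant drops units where both counts are zero; that rule, and the toy rows, are assumptions for illustration rather than the package's documented definition.

```python
# Toy rows: (TFV, TOV, intervals_overlap). Hypothetical values.
rows = [
    (5, 3, True),
    (0, 0, True),
    (10, 2, False),
    (0, 0, True),
    (4, 4, True),
    (1, 9, False),
    (7, 6, True),
    (0, 0, True),
]

# S-Index: share of all units with overlapping intervals.
s_all = sum(overlap for _, _, overlap in rows) / len(rows)

# Robust variant (assumed rule): ignore units where both counts are zero.
nonzero = [(b, t, o) for b, t, o in rows if not (b == 0 and t == 0)]
s_robust = sum(o for _, _, o in nonzero) / len(nonzero)

print(f"S-Index:        {s_all:.2f}")     # 0.75
print(f"Robust S-Index: {s_robust:.2f}")  # 0.60
```

The three zero-count units overlap trivially, so removing them lowers the index from 0.75 to 0.60 in this toy case.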

---

**Next steps:** See the [Advanced Examples notebook](02_advanced_examples.ipynb) for fixed-base mode, count mode, custom confidence levels, and export options.