# Getting Started with fastcpd: Part 2 - Detecting Change Points

This tutorial shows how to use fastcpd to detect change points in time series data.

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from fastcpd import fastcpd
from fastcpd.datasets import make_mean_change, make_glm_change, make_garch_change

%matplotlib inline

## Basic Usage

The `fastcpd()` function detects change points. It takes:
1. Your data
2. The model `family`
3. Optionally, a penalty `beta` (a criterion name such as `"MBIC"` or `"BIC"`, or a number)

## 1. Mean Changes

The simplest case: detecting when the mean shifts.

In [None]:
# Generate data with strong mean changes for clear detection
# Request 3 change points with large mean shifts (5 standard deviations)
data_dict = make_mean_change(
    n_samples=300, 
    n_changepoints=3,
    mean_deltas=[5.0],  # 5 std deviation shifts - very clear signal
    seed=42
)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect change points
result = fastcpd(data, family="mean", beta="MBIC")

print("True change points:", true_cps)
print("Detected change points:", result.cp_set)
print(f"SNR: {data_dict['metadata']['snr_db']:.1f} dB (higher is easier to detect)")

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data, linewidth=0.8, label='Data')
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6, label='True' if cp == true_cps[0] else '')
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2, alpha=0.8, label='Detected' if cp == result.cp_set[0] else '')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Mean Change Detection')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
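
To build intuition for what the `"mean"` family looks for, here is a minimal sketch that finds a single mean shift by brute force: it tries every split point and keeps the one that minimizes the total squared deviation of each segment from its own mean. It uses only the standard library and is not how fastcpd is implemented; `sse` and `best_single_split` are hypothetical helper names introduced for illustration.

```python
import random

def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_single_split(series, min_size=2):
    """Return the index t that minimizes sse(series[:t]) + sse(series[t:])."""
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda t: sse(series[:t]) + sse(series[t:]))

rng = random.Random(0)
# 100 points around 0, then 100 points around 5 (a 5-standard-deviation shift)
toy = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
split = best_single_split(toy)
print("Estimated change point:", split)
```

With a shift this large, the estimate should land at or very near index 100. Handling several change points at once requires a penalty on the number of segments, which is the role of `beta`.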

## 2. Variance Changes

Detecting when volatility/variance changes.

In [None]:
from fastcpd.datasets import make_variance_change

# Generate data with LARGE variance changes for clear detection
# Use variance ratios of [1.0, 4.0, 0.5] for strong contrast
data_dict = make_variance_change(
    n_samples=300, 
    n_changepoints=2,
    variance_ratios=[1.0, 4.0, 0.5],  # Strong variance changes
    seed=42
)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="variance", beta="MBIC")

print("True:", true_cps)
print("Detected:", result.cp_set)
print(f"Variance ratios: {data_dict['metadata']['variance_ratios']}")

## 3. Regression (with Covariates)

Use this family when you have predictors (features) and want to detect changes in the regression coefficients.

In [None]:
from fastcpd.datasets import make_regression_change

# Generate data
data_dict = make_regression_change(n_samples=300, n_changepoints=2, n_features=3, seed=42)
data = data_dict['data']  # Shape: (n, features+1) - first column is response
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="lm", beta="BIC")

print("True:", true_cps)
print("Detected:", result.cp_set)

# Visualize response variable
plt.figure(figsize=(12, 4))
plt.plot(data[:, 0], linewidth=0.8)
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6)
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2)
plt.xlabel('Time')
plt.ylabel('Response')
plt.title('Regression: Coefficient Changes')
plt.grid(True, alpha=0.3)
plt.show()
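
A coefficient change does not have to shift the mean of the response, so it can be hard to see in a plot of the response alone. The sketch below illustrates the idea with one predictor and a no-intercept least-squares fit: the best split is the one that minimizes the combined residual sum of squares of the two segments. This is a standard-library illustration under simplifying assumptions, not fastcpd's algorithm; `rss` and `best_coefficient_split` are hypothetical names.

```python
import random

def rss(rows):
    """Residual sum of squares of a no-intercept fit y = b * x on (y, x) rows."""
    sxx = sum(x * x for _, x in rows)
    sxy = sum(x * y for y, x in rows)
    b = sxy / sxx  # closed-form least-squares slope
    return sum((y - b * x) ** 2 for y, x in rows)

def best_coefficient_split(rows, min_size=5):
    """Return the index t that minimizes rss(rows[:t]) + rss(rows[t:])."""
    candidates = range(min_size, len(rows) - min_size + 1)
    return min(candidates, key=lambda t: rss(rows[:t]) + rss(rows[t:]))

rng = random.Random(1)
rows = []
for t in range(200):
    x = rng.gauss(0.0, 1.0)
    slope = 2.0 if t < 120 else -2.0  # the coefficient flips sign at t = 120
    # Response first, predictor second, mirroring the layout of `data` above
    rows.append((slope * x + rng.gauss(0.0, 0.5), x))

split = best_coefficient_split(rows)
print("Estimated coefficient change:", split)
```

Here the response has mean zero on both sides of the change, yet the split is recoverable because the relationship between response and predictor changes.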

## 4. GLM: Binomial (Binary/Count Data)

For logistic regression or binomial data.

In [None]:
# Generate binary data with strong coefficient changes
# coef_changes='sign_flip' gives a clear coefficient pattern
data_dict = make_glm_change(
    n_samples=300, 
    n_changepoints=2, 
    n_features=3, 
    family='binomial',
    coef_changes='sign_flip',  # Clear coefficient pattern
    seed=42
)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect: pass a small numeric penalty (beta=5.0) instead of a named criterion
result = fastcpd(data, family="binomial", beta=5.0)

print("True:", true_cps)
print("Detected:", result.cp_set)
print(f"AUC per segment: {[f'{a:.2f}' if a is not None else 'N/A' for a in data_dict['metadata']['separation_per_segment']]}")

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data[:, 0], 'o', markersize=2, alpha=0.6)
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6)
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2)
plt.xlabel('Time')
plt.ylabel('Binary Response')
plt.title('Binomial GLM Detection')
plt.grid(True, alpha=0.3)
plt.show()

## 5. Controlling Data Generation

You can control signal strength to make changes easier or harder to detect.

**Mean changes:**
```python
# Weak signal (hard to detect)
data_dict = make_mean_change(n_samples=300, n_changepoints=2, mean_deltas=[1.0])

# Strong signal (easy to detect)
data_dict = make_mean_change(n_samples=300, n_changepoints=2, mean_deltas=[5.0])
```

**Variance changes:**
```python
# Subtle changes
data_dict = make_variance_change(n_samples=300, variance_ratios=[1.0, 1.5, 2.0])

# Strong changes  
data_dict = make_variance_change(n_samples=300, variance_ratios=[1.0, 4.0, 0.5])
```

**GLM (Binomial):**
```python
# 'sign_flip' creates clear coefficient pattern (easier to detect)
data_dict = make_glm_change(n_samples=300, family='binomial', coef_changes='sign_flip')
```

**Tip:** Use stronger signals when learning. Once familiar, try weaker signals to test robustness.

## 6. Key Parameters

### Penalty (`beta`)

The penalty `beta` controls how many change points are reported: a larger penalty yields fewer change points. Pass a criterion name (`"MBIC"` or `"BIC"`) or a number.

In [None]:
# Generate data with STRONG signal
data_dict = make_mean_change(n_samples=300, n_changepoints=3, mean_deltas=[5.0], seed=42)
data = data_dict['data']

# Try different beta values
for beta in ["MBIC", "BIC", 5.0, 20.0]:
    result = fastcpd(data, family="mean", beta=beta)
    beta_str = str(beta) if isinstance(beta, (int, float)) else beta
    print(f"Beta={beta_str:>6s}: {len(result.cp_set)} change points detected at {result.cp_set}")
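
To see why the penalty changes the number of detected change points, the following sketch implements plain optimal partitioning for mean changes: it minimizes the total squared-error cost of all segments plus `beta` for every change point. This is the exact dynamic program that PELT accelerates by pruning, written here with the standard library only. It is an illustration rather than fastcpd's code, and its penalty scale is not the same as fastcpd's `"BIC"` or `"MBIC"`; `penalized_change_points` is a hypothetical name.

```python
import random

def penalized_change_points(series, beta):
    """Minimize (sum of segment squared errors) + beta * (number of change points)."""
    n = len(series)
    prefix, prefix_sq = [0.0], [0.0]
    for v in series:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        """Squared-error cost of series[i:j], computed from prefix sums."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    # best[j]: optimal objective for series[:j]; starting at -beta means the
    # first segment carries no penalty
    best = [-beta] + [float("inf")] * n
    last = [0] * (n + 1)  # last[j]: start index of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j) + beta
            if candidate < best[j]:
                best[j], last[j] = candidate, i

    # Walk back through the stored segment starts to recover the change points
    cps, j = [], n
    while last[j] > 0:
        j = last[j]
        cps.append(j)
    return sorted(cps)

rng = random.Random(0)
means = [0.0] * 50 + [5.0] * 50 + [0.0] * 50  # true change points at 50 and 100
toy = [rng.gauss(m, 1.0) for m in means]
counts = {beta: len(penalized_change_points(toy, beta)) for beta in (1.0, 25.0, 1000.0)}
print(counts)
```

A very small penalty tends to split the noise into many spurious segments, a moderate one recovers the two real changes, and a very large one reports none.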

### Vanilla Percentage (Speed vs Accuracy)

`vanilla_percentage` controls the trade-off between speed and accuracy:
- `vanilla_percentage=0.0` - Pure SeGD (faster, approximate)
- `vanilla_percentage=1.0` - Pure PELT (slower, exact)
- `vanilla_percentage=0.5` - Hybrid (balanced)

In [None]:
# For small datasets (n < 500): use vanilla_percentage=1.0 for highest accuracy
result = fastcpd(data, family="mean", beta="MBIC", vanilla_percentage=1.0)

# For larger datasets: use vanilla_percentage=0.5 for speed
# result = fastcpd(data, family="mean", beta="MBIC", vanilla_percentage=0.5)
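
The rule of thumb in the comments above can be wrapped in a small helper. This is a sketch only: `choose_vanilla_percentage` is a hypothetical function, not part of fastcpd, and the threshold of 500 is simply the figure quoted in this tutorial.

```python
def choose_vanilla_percentage(n_samples, small_threshold=500):
    """Hypothetical helper encoding this tutorial's rule of thumb."""
    if n_samples < small_threshold:
        return 1.0  # pure PELT: slower, exact
    return 0.5      # hybrid: balanced speed and accuracy

for n in (300, 5000):
    print(n, choose_vanilla_percentage(n))
```

The returned value would then be passed as `vanilla_percentage=choose_vanilla_percentage(len(data))` in the `fastcpd()` call.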

## Understanding Results

The result object stores the detected change points in `cp_set`. Use `dir(result)` to list its other attributes:

In [None]:
result = fastcpd(data, family="mean", beta="MBIC")

print("Detected change points:", result.cp_set)
print("Number of segments:", len(result.cp_set) + 1)
print("\nAll attributes:")
print(dir(result))
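
Since `k` change points define `k + 1` segments, it is often convenient to turn `cp_set` into explicit segment boundaries. The sketch below does this in plain Python. It assumes each change point is the first index of a new segment; fastcpd's own indexing convention may differ by one, so check it against your results. `segments_from_change_points` and `segment_means` are hypothetical helpers.

```python
def segments_from_change_points(n_samples, cp_set):
    """Return (start, end) index pairs, end exclusive, one per segment."""
    # Assumption: a change point is the first index of the new segment
    inner = sorted(int(cp) for cp in cp_set if 0 < cp < n_samples)
    bounds = [0] + inner + [n_samples]
    return list(zip(bounds[:-1], bounds[1:]))

def segment_means(series, cp_set):
    """Mean of each segment of a 1-D sequence."""
    return [sum(series[a:b]) / (b - a)
            for a, b in segments_from_change_points(len(series), cp_set)]

segs = segments_from_change_points(300, [100, 200, 250])
print(segs)  # [(0, 100), (100, 200), (200, 250), (250, 300)]
print(segment_means([0.0, 0.0, 2.0, 2.0], [2]))  # [0.0, 2.0]
```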

## Summary

**Core model families covered in this tutorial:**
- `"mean"` - Mean changes
- `"variance"` - Variance changes
- `"meanvariance"` - Both mean and variance (not demonstrated above)
- `"lm"` - Linear regression
- `"binomial"` - Logistic/binomial GLM

**Additional families available** (see advanced examples):
- `"lasso"` - LASSO regression
- `"poisson"` - Poisson GLM  
- `"garch"` - GARCH volatility
- `"arma"` - ARMA time series

**Basic syntax:**
```python
result = fastcpd(data, family="mean", beta="MBIC")
change_points = result.cp_set
```

**Best practice**: Use automatic penalty selection (`beta="MBIC"` or `beta="BIC"`) for reliable results.

---

**Next**: Part 3 - Evaluating and Visualizing Results