# Getting Started with fastcpd: Part 2 - Detecting Change Points

This tutorial shows how to use fastcpd to detect change points in time series data.

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from fastcpd import fastcpd
from fastcpd.datasets import make_mean_change, make_glm_change, make_garch_change

%matplotlib inline

## Basic Usage

The `fastcpd()` function detects change points. Its main arguments are:
1. The data
2. The model `family`
3. The penalty parameter `beta` (optional)

## 1. Mean Changes

The simplest case: detecting when the mean of the series shifts.

In [None]:
# Generate data
data_dict = make_mean_change(n_samples=300, n_changepoints=3, seed=42)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect change points
result = fastcpd(data, family="mean", beta="MBIC")

print("True change points:", true_cps)
print("Detected change points:", result.cp_set)

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data, linewidth=0.8, label='Data')
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6, label='True' if cp == true_cps[0] else '')
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2, alpha=0.8, label='Detected' if cp == result.cp_set[0] else '')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Mean Change Detection')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
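
For intuition about what `family="mean"` optimizes, the sketch below assumes the usual Gaussian mean-change cost, which is the within-segment sum of squared deviations from the segment mean. fastcpd's exact scaling may differ. The sketch locates a single change point by brute force. `sse` and `best_single_split` are hypothetical helpers, not part of fastcpd.

```python
def sse(xs):
    """Sum of squared deviations from the mean: the Gaussian mean-change cost."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_single_split(xs, min_size=2):
    """Index k minimizing sse(xs[:k]) + sse(xs[k:]); k starts the second segment.

    Brute force over all admissible k (quadratic time), for illustration only.
    """
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(candidates, key=lambda k: sse(xs[:k]) + sse(xs[k:]))


toy = [0.0] * 60 + [3.0] * 40  # mean jumps from 0 to 3 at index 60
print(best_single_split(toy))  # → 60
```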

## 2. Variance Changes

Detecting when the variance (volatility) of the series changes.

In [None]:
from fastcpd.datasets import make_variance_change

# Generate data
data_dict = make_variance_change(n_samples=300, n_changepoints=2, seed=42)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="variance", beta="MBIC")

print("True:", true_cps)
print("Detected:", result.cp_set)
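
Here is a comparable sketch for a variance change. It assumes a Gaussian cost with the mean estimated per segment, so a segment of length m contributes m·log(σ̂²), up to constants. The toy series keeps its mean at zero and only widens its spread, so a cost that tracked only the mean would not flag the change. `var_cost` and `best_variance_split` are hypothetical helpers, not fastcpd functions.

```python
import math


def var_cost(xs):
    """Gaussian variance-change cost: len * log(sample variance), up to constants."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    return n * math.log(v)


def best_variance_split(xs, min_size=5):
    """Brute-force single variance change; returns the start index of segment 2."""
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(candidates, key=lambda k: var_cost(xs[:k]) + var_cost(xs[k:]))


# Alternating +-1 for 100 points, then alternating +-5: same mean, larger spread.
toy = [(-1) ** i * (1.0 if i < 100 else 5.0) for i in range(200)]
print(best_variance_split(toy))  # → 100
```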

## 3. Regression (with Covariates)

Use this when the data include predictors (features) and you want to detect changes in the regression coefficients.

In [None]:
from fastcpd.datasets import make_regression_change

# Generate data
data_dict = make_regression_change(n_samples=300, n_changepoints=2, n_features=3, seed=42)
data = data_dict['data']  # Shape: (n_samples, n_features + 1); column 0 is the response
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="lm", beta="BIC")

print("True:", true_cps)
print("Detected:", result.cp_set)

# Visualize response variable
plt.figure(figsize=(12, 4))
plt.plot(data[:, 0], linewidth=0.8)
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6)
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2)
plt.xlabel('Time')
plt.ylabel('Response')
plt.title('Regression: Coefficient Changes')
plt.grid(True, alpha=0.3)
plt.show()
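
With `family="lm"`, a natural segment cost is the residual sum of squares of a least-squares fit within the segment. This is assumed here, so check the fastcpd documentation for its exact cost. The sketch uses a single covariate and brute force to locate one slope change. `ols_rss` and `best_regression_split` are hypothetical helpers.

```python
def ols_rss(xs, ys):
    """Residual sum of squares of the least-squares fit y ~ a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def best_regression_split(xs, ys, min_size=5):
    """Brute-force single coefficient change; returns the start index of segment 2."""
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(candidates,
               key=lambda k: ols_rss(xs[:k], ys[:k]) + ols_rss(xs[k:], ys[k:]))


# The slope switches from +2 to -1 at index 50; x cycles through 0..9.
x = [float(i % 10) for i in range(100)]
y = [2.0 * xi if i < 50 else 3.0 - xi for i, xi in enumerate(x)]
print(best_regression_split(x, y))  # → 50
```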

## 4. GLM: Binomial (Binary Data)

For binary responses, modeled with logistic regression.

In [None]:
# Generate binary data
data_dict = make_glm_change(n_samples=300, n_changepoints=2, n_features=3, family='binomial', seed=42)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="binomial", beta="MBIC")

print("True:", true_cps)
print("Detected:", result.cp_set)

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data[:, 0], 'o', markersize=2, alpha=0.6)
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6)
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2)
plt.xlabel('Time')
plt.ylabel('Binary Response')
plt.title('Binomial GLM Detection')
plt.grid(True, alpha=0.3)
plt.show()
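
This sketch assumes that the binomial family scores each segment by the negative log-likelihood of a logistic regression. Under that assumption, each observation costs log(1 + e^η) - yη, where η = xᵀθ is the linear predictor. The helper only evaluates that cost for given coefficients and does not fit them (for example, by Newton's method). `logistic_nll` is a hypothetical name.

```python
import math


def logistic_nll(coef, X, y):
    """Negative log-likelihood of a logistic regression (no fitting).

    coef: coefficients; include a column of 1s in X for an intercept.
    X: feature rows; y: 0/1 responses.
    """
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(c * xj for c, xj in zip(coef, row))  # linear predictor
        # log(1 + exp(eta)), computed without overflow for large |eta|
        if eta > 0:
            log1pexp = eta + math.log1p(math.exp(-eta))
        else:
            log1pexp = math.log1p(math.exp(eta))
        total += log1pexp - yi * eta
    return total


X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]  # first column = intercept
y = [0, 0, 1, 1]
print(logistic_nll([0.0, 0.0], X, y))  # 4 * log(2), about 2.77
print(logistic_nll([0.0, 3.0], X, y))  # much smaller: these coefficients fit well
```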

## 5. GLM: Poisson (Count Data)

For count data, such as the number of events per time interval.

In [None]:
# Generate count data
data_dict = make_glm_change(n_samples=300, n_changepoints=2, n_features=3, family='poisson', seed=42)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="poisson", beta="MBIC")

print("True:", true_cps)
print("Detected:", result.cp_set)
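
Similarly, assume a Poisson GLM with a log link. Then the expected count is μ = e^η and each observation costs μ - yη + log(y!). `poisson_nll` is a hypothetical helper that evaluates this cost for fixed coefficients. For an intercept-only model, the cost is minimized at η = log(mean of y).

```python
import math


def poisson_nll(coef, X, y):
    """Negative log-likelihood of a Poisson regression with a log link."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(c * xj for c, xj in zip(coef, row))  # log of the expected count
        # -log p(y) = mu - y * log(mu) + log(y!), with mu = exp(eta)
        total += math.exp(eta) - yi * eta + math.lgamma(yi + 1)
    return total


X = [[1.0], [1.0], [1.0]]  # intercept-only model
y = [0, 1, 2]              # mean count is 1, so the best intercept is log(1) = 0
print(poisson_nll([0.0], X, y))  # 3 + log(2), about 3.69
```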

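## 6. ARMA Time Series

For series whose autoregressive and moving-average dynamics change. Here `order=[1, 1]` specifies an ARMA(1, 1) model (AR order p = 1, MA order q = 1).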

In [None]:
from fastcpd.datasets import make_arma_change

# Generate ARMA data
data_dict = make_arma_change(n_samples=300, n_changepoints=2, seed=42)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="arma", beta="MBIC", order=[1, 1])

print("True:", true_cps)
print("Detected:", result.cp_set)
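
To show what a change in ARMA dynamics means, the sketch below simulates an ARMA(1, 1) process x_t = φx_{t-1} + e_t + θe_{t-1} whose (φ, θ) switch at given change points. It is an illustration with a hypothetical function name. It does not show how `make_arma_change` generates its data.

```python
import bisect
import random


def simulate_piecewise_arma11(noise, params, change_points):
    """Simulate x[t] = phi * x[t-1] + e[t] + theta * e[t-1] with x[-1] = e[-1] = 0.

    params[j] = (phi, theta) applies from change_points[j-1] (inclusive) onward;
    the state (previous x and e) carries over across each boundary.
    """
    if len(params) != len(change_points) + 1:
        raise ValueError("need exactly one (phi, theta) pair per segment")
    x, prev_x, prev_e = [], 0.0, 0.0
    for t, e in enumerate(noise):
        # The number of change points <= t selects the active segment.
        phi, theta = params[bisect.bisect_right(change_points, t)]
        cur = phi * prev_x + e + theta * prev_e
        x.append(cur)
        prev_x, prev_e = cur, e
    return x


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
series = simulate_piecewise_arma11(noise, [(0.6, 0.3), (-0.5, 0.2), (0.8, -0.4)], [100, 200])
print(len(series))  # → 300
```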

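## 7. GARCH (Volatility Models)

For return series whose volatility dynamics change. Here `order=[1, 1]` specifies a GARCH(1, 1) model.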

In [None]:
# Generate GARCH data
data_dict = make_garch_change(n_samples=300, n_changepoints=2, seed=42)
data = data_dict['data']
true_cps = data_dict['changepoints']

# Detect
result = fastcpd(data, family="garch", beta="MBIC", order=[1, 1])

print("True:", true_cps)
print("Detected:", result.cp_set)

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data, linewidth=0.6)
for cp in true_cps:
    plt.axvline(cp, color='green', linestyle='--', alpha=0.6, label='True' if cp == true_cps[0] else '')
for cp in result.cp_set:
    plt.axvline(cp, color='red', linestyle=':', linewidth=2, label='Detected' if cp == result.cp_set[0] else '')
plt.xlabel('Time')
plt.ylabel('Returns')
plt.title('GARCH Volatility Change Detection')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
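
Here is a similar sketch for GARCH(1, 1). Returns follow r_t = σ_t z_t with σ²_t = ω + a1·r²_{t-1} + b1·σ²_{t-1}, and (ω, a1, b1) switch at the change points. The simulation starts from the first segment's unconditional variance ω / (1 - a1 - b1), which requires a1 + b1 < 1. The GARCH coefficients are named `a1` and `b1` to avoid confusion with fastcpd's penalty `beta`. The function name is hypothetical and unrelated to `make_garch_change`.

```python
import bisect
import math
import random


def simulate_piecewise_garch11(shocks, params, change_points):
    """Simulate GARCH(1,1) returns whose (omega, a1, b1) switch at change points.

    r[t] = sqrt(s2[t]) * z[t],  s2[t] = omega + a1 * r[t-1]**2 + b1 * s2[t-1].
    params[j] applies from change_points[j-1] (inclusive) onward.
    """
    if len(params) != len(change_points) + 1:
        raise ValueError("need exactly one parameter triple per segment")
    if any(a1 + b1 >= 1.0 for _, a1, b1 in params):
        raise ValueError("each segment needs a1 + b1 < 1")
    omega, a1, b1 = params[0]
    s2 = omega / (1.0 - a1 - b1)  # unconditional variance of the first segment
    returns = []
    for t, z in enumerate(shocks):
        omega, a1, b1 = params[bisect.bisect_right(change_points, t)]
        if returns:  # update the conditional variance from the previous return
            s2 = omega + a1 * returns[-1] ** 2 + b1 * s2
        returns.append(math.sqrt(s2) * z)
    return returns


rng = random.Random(7)
shocks = [rng.gauss(0.0, 1.0) for _ in range(300)]
returns = simulate_piecewise_garch11(
    shocks, [(0.1, 0.1, 0.8), (0.5, 0.2, 0.7), (0.1, 0.05, 0.9)], [100, 200])
print(len(returns))  # → 300
```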

## Tuning Parameters

### Beta (Penalty Parameter)

Controls how many change points are detected:
- **Higher beta** → Fewer change points (more conservative)
- **Lower beta** → More change points (more sensitive)

Options:
- `"MBIC"` - Modified BIC (recommended)
- `"BIC"` - Standard BIC
- `"MDL"` - Minimum Description Length
- Numeric value (e.g., `10.0`)

In [None]:
# Generate data
data_dict = make_mean_change(n_samples=300, n_changepoints=3, seed=42)
data = data_dict['data']

# Try different beta values
for beta in ["MBIC", "BIC", 5.0, 20.0]:
    result = fastcpd(data, family="mean", beta=beta)
    print(f"Beta={beta!s:>6}: {len(result.cp_set)} change points detected at {result.cp_set}")
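
The penalty enters an objective of the form (sum of segment costs) + β × (number of change points). The minimal sketch below evaluates that objective for a few candidate segmentations. It assumes the squared-error mean cost and treats change points as segment start indices. With a small β, the two true change points pay for themselves. With a large β, the single-segment fit is cheapest. `penalized_cost` is a hypothetical helper.

```python
def sse(xs):
    """Sum of squared deviations from the segment mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def penalized_cost(xs, cps, beta):
    """Total segment cost plus beta per change point (cps = segment start indices)."""
    bounds = [0] + list(cps) + [len(xs)]
    segment_costs = sum(sse(xs[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))
    return segment_costs + beta * len(cps)


toy = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
candidates = [[], [20], [40], [20, 40]]
for beta in [1.0, 200.0]:
    best = min(candidates, key=lambda cps: penalized_cost(toy, cps, beta))
    print(beta, best)
# 1.0 [20, 40]
# 200.0 []
```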

### Vanilla Percentage (Speed vs Accuracy)

Controls the trade-off between speed and accuracy:
- `vanilla_percentage=0.0` - Pure SeGD (faster, approximate)
- `vanilla_percentage=1.0` - Pure PELT (slower, exact)
- `vanilla_percentage=0.5` - Hybrid (balanced)

In [None]:
# For small datasets (n < 500): use vanilla_percentage=1.0 for highest accuracy
result = fastcpd(data, family="mean", beta="MBIC", vanilla_percentage=1.0)

# For larger datasets, a smaller value such as 0.5 trades some accuracy for speed
# result = fastcpd(data, family="mean", beta="MBIC", vanilla_percentage=0.5)
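
The list above describes pure PELT as exact. The minimal sketch below shows why, assuming the squared-error mean cost. PELT runs the optimal-partitioning recursion F(t) = min_s [F(s) + C(s, t) + β] but drops any candidate s with F(s) + C(s, t) > F(t). Such a candidate can never become optimal later, because splitting a segment never increases this cost. Setting `prune=False` gives plain optimal partitioning, so both paths should return the same change points. This is the textbook PELT rule, not fastcpd's code, and it does not model SeGD or `vanilla_percentage`.

```python
import random


def penalized_segmentation(xs, beta, prune=True):
    """Minimize sum of segment SSE costs + beta * (number of change points).

    prune=True applies the PELT pruning rule; prune=False is plain optimal
    partitioning. Returns change points as start indices of segments 2, 3, ...
    """
    n = len(xs)
    s1, s2 = [0.0], [0.0]  # prefix sums give each segment cost in O(1)
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        """Sum of squared deviations from the mean of xs[a:b]."""
        total = s1[b] - s1[a]
        return s2[b] - s2[a] - total * total / (b - a)

    F = [-beta] + [0.0] * n  # F[0] = -beta so the first segment is not penalized
    last = [0] * (n + 1)     # last[t]: start of the final segment in the best fit of xs[:t]
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(F[s] + cost(s, t) + beta, s) for s in candidates]
        F[t], last[t] = min(scored)
        if prune:
            # PELT: keep s only if it could still be optimal for a later end point.
            candidates = [s for value, s in scored if value - beta <= F[t]]
        candidates.append(t)

    cps, t = [], last[n]
    while t > 0:  # backtrack through the segment starts
        cps.append(t)
        t = last[t]
    return sorted(cps)


rng = random.Random(0)
xs = [rng.gauss(mu, 1.0) for mu in [0.0] * 100 + [4.0] * 100 + [-2.0] * 100]
print(penalized_segmentation(xs, beta=20.0))
print(penalized_segmentation(xs, beta=20.0, prune=False))  # same change points
```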

## Understanding Results

The returned object stores the detected change points in `result.cp_set`; `dir(result)` lists its other attributes.

In [None]:
result = fastcpd(data, family="mean", beta="MBIC")

print("Detected change points:", result.cp_set)
print("Number of segments:", len(result.cp_set) + 1)
print("\nPublic attributes:")
print([name for name in dir(result) if not name.startswith("_")])
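
To turn change points into segments, a small helper can pair consecutive boundaries. This sketch assumes that each entry of `cp_set` is the index where a new segment starts (half-open segments). Verify that convention for your fastcpd version before relying on it. `segments_from_cps` and `segment_means` are hypothetical helpers.

```python
def segments_from_cps(n, cps):
    """Half-open (start, end) pairs covering range(n), split at the change points."""
    bounds = [0] + sorted(int(c) for c in cps) + [n]
    return list(zip(bounds[:-1], bounds[1:]))


def segment_means(xs, cps):
    """Mean of each segment of xs."""
    return [sum(xs[a:b]) / (b - a) for a, b in segments_from_cps(len(xs), cps)]


xs = [0.0, 0.0, 0.0, 6.0, 6.0, 6.0, 6.0, 1.0, 1.0, 1.0]
print(segments_from_cps(len(xs), [3, 7]))  # → [(0, 3), (3, 7), (7, 10)]
print(segment_means(xs, [3, 7]))           # → [0.0, 6.0, 1.0]
```

Under the same assumption, `segment_means(data, result.cp_set)` would return one mean per segment, giving `len(result.cp_set) + 1` values.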

## Summary

**Model families available:**
- `"mean"` - Mean changes
- `"variance"` - Variance changes
- `"meanvariance"` - Both mean and variance
- `"lm"` - Linear regression
- `"lasso"` - LASSO regression
- `"binomial"` - Logistic/binomial GLM
- `"poisson"` - Poisson GLM
- `"arma"` - ARMA time series
- `"garch"` - GARCH volatility

**Basic syntax:**
```python
result = fastcpd(data, family="mean", beta="MBIC")
change_points = result.cp_set
```

---

**Next**: Part 3 - Evaluating and Visualizing Results