# Conformal Prediction in APDTFlow v0.2.0

## Rigorous Uncertainty Quantification! 📊

This notebook demonstrates **conformal prediction** for calibrated prediction intervals with **finite-sample coverage guarantees**.

### What is Conformal Prediction?

A distribution-free method that provides:
- **Guaranteed coverage** (e.g., at least 95% of true values fall inside the interval, on average over repeated use)
- **No distributional assumptions** beyond exchangeability of calibration and test data
- **Finite-sample guarantees** (not just asymptotic!)
- **Adaptive variants** for non-stationary data
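
Formally, here is the standard split conformal result, written for absolute-residual scores. This is textbook notation, not taken from APDTFlow's documentation. $\hat{f}$ is the trained model, $(x_i, y_i)$ are the $n$ calibration pairs, and $s_{(k)}$ is the $k$-th smallest score (take $\hat{q} = \infty$ when $k > n$):

$$
s_i = \lvert y_i - \hat{f}(x_i) \rvert,\quad i = 1,\dots,n,
\qquad
\hat{q} = s_{(k)},\quad k = \lceil (n+1)(1-\alpha) \rceil,
$$

$$
\hat{C}(x) = \big[\hat{f}(x) - \hat{q},\ \hat{f}(x) + \hat{q}\big],
\qquad
\mathbb{P}\big(Y_{n+1} \in \hat{C}(X_{n+1})\big) \ge 1 - \alpha .
$$

The probability is taken over both the calibration and test draws, and it holds when the $n+1$ points are exchangeable. With $\alpha = 0.05$ this gives the 95% coverage mentioned above. Time series generally violate exchangeability, which is why adaptive variants exist.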

### Why It Matters

- Traditional probabilistic forecasts can be **miscalibrated**
- Conformal prediction gives **guaranteed** marginal coverage when its assumptions hold
- Critical for **decision-making** in business/healthcare/finance
- An **active research area** in 2025 time series forecasting (see references below)

### Research References

- arXiv:2509.02844 - Conformal Prediction for Time Series
- arXiv:2503.21251 - Dual-Splitting for Multi-Step
- ICLR 2025 - Kernel-based Optimally Weighted CP

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

from apdtflow import APDTFlowForecaster
from apdtflow.conformal import (
    SplitConformalPredictor,
    AdaptiveConformalPredictor,
    plot_conformal_intervals
)

print("✓ APDTFlow v0.2.0 Conformal Prediction loaded!")

## Step 1: Generate Synthetic Time Series

Create 500 days of data with a linear trend, a 30-day seasonal cycle, and Gaussian noise.

In [None]:
# Set seed for reproducibility
np.random.seed(42)

# Generate 500 days of data
dates = pd.date_range(start='2023-01-01', periods=500, freq='D')
t = np.arange(len(dates))

# Components
trend = 0.05 * t
seasonality = 10 * np.sin(2 * np.pi * t / 30)  # 30-day cycle
noise = np.random.normal(0, 2, len(t))

values = 100 + trend + seasonality + noise

df = pd.DataFrame({
    'date': dates,
    'value': values
})

print(f"Dataset: {len(df)} days")
print(f"Components: Trend + 30-day seasonality + Gaussian noise")
df.head()

In [None]:
# Visualize data
plt.figure(figsize=(14, 5))
plt.plot(df['date'], df['value'])
plt.title('Synthetic Time Series', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

## Step 2: Split Conformal Prediction

The simplest conformal method, with an exact finite-sample guarantee when calibration and test points are exchangeable.

**How it works:**
1. Split data: Train (60%) / Calibration (20%) / Test (20%)
2. Train the model on the training set
3. Compute nonconformity scores (e.g., absolute residuals |y − ŷ|) on the calibration set
4. Use the ⌈(n+1)(1−α)⌉-th smallest score as the quantile that sets the width of the prediction intervals
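
The four steps above can be sketched in plain NumPy, independent of APDTFlow. The helper `split_conformal_interval` is hypothetical and not part of the library. It assumes absolute residuals as the nonconformity score, and `SplitConformalPredictor` may use a different score. Predictions come in as arrays, so any model can be plugged in.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.05):
    """Minimal split conformal sketch with absolute-residual scores.

    Returns (lower, upper) arrays around test_pred. If the calibration set
    is too small for the requested alpha, the interval is infinite.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_true = np.asarray(cal_true, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    # Step 3: nonconformity scores on the calibration set
    scores = np.abs(cal_true - cal_pred)
    n = len(scores)

    # Step 4: finite-sample corrected rank k = ceil((n + 1)(1 - alpha))
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        q_hat = np.inf
    else:
        q_hat = np.sort(scores)[k - 1]  # k-th smallest score

    return test_pred - q_hat, test_pred + q_hat


# Usage: a stand-in "model" that always predicts 0 for standard normal data
rng = np.random.default_rng(0)
y_cal = rng.normal(size=500)
y_test = rng.normal(size=5000)
pred_cal = np.zeros_like(y_cal)
pred_test = np.zeros_like(y_test)

lower, upper = split_conformal_interval(pred_cal, y_cal, pred_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")  # close to the 0.90 target
```

The model is never refit here, and that is the whole appeal of the split approach. Only one sort of the calibration scores is needed per α.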

In [None]:
# Split data
n = len(df)
train_end = int(0.6 * n)
cal_end = int(0.8 * n)

train_data = df.iloc[:train_end]
cal_data = df.iloc[train_end:cal_end]
test_data = df.iloc[cal_end:]

print(f"Data split:")
print(f"  Training:    {len(train_data)} samples")
print(f"  Calibration: {len(cal_data)} samples")
print(f"  Test:        {len(test_data)} samples")

In [None]:
# Train base forecaster
model = APDTFlowForecaster(
    forecast_horizon=1,
    history_length=30,
    num_epochs=30,
    verbose=False
)

model.fit(train_data, target_col='value', date_col='date')
print("✓ Base model trained")

In [None]:
# Create prediction function for conformal predictor
def predict_fn(X):
    """Wrapper to predict on arbitrary inputs."""
    # X is a 2D array (n_samples, history_length)
    preds = []
    for i in range(len(X)):
        # Temporarily set last sequence
        old_seq = model.last_sequence_
        model.last_sequence_ = X[i]
        pred = model.predict()
        preds.append(pred[0])
        model.last_sequence_ = old_seq
    return np.array(preds)

# Prepare calibration data
cal_values = cal_data['value'].values
X_cal = np.array([cal_values[i-30:i] for i in range(30, len(cal_values))])
y_cal = cal_values[30:]

print(f"Calibration set: {len(X_cal)} samples")
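
The same windowing list comprehension is repeated for every split in this notebook. The helper below, `make_windows`, is hypothetical and not part of APDTFlow. It builds identical `(X, y)` arrays with NumPy's `sliding_window_view` (available since NumPy 1.20). It returns a writable copy, on the assumption that the model might modify `last_sequence_` in place.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(values, history_length=30):
    """Build (X, y) pairs for one-step-ahead forecasting.

    Each row of X holds `history_length` consecutive values, and y holds
    the value that immediately follows that window.
    """
    values = np.asarray(values, dtype=float)
    # sliding_window_view yields len(values) - history_length + 1 windows;
    # drop the last one because no target value follows it.
    X = sliding_window_view(values, history_length)[:-1].copy()
    y = values[history_length:]
    return X, y
```

Applied to `cal_values` with `history_length=30`, it should return the same `X_cal` and `y_cal` as the cell above.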

In [None]:
# Create and calibrate conformal predictor
conformal = SplitConformalPredictor(
    predict_fn=predict_fn,
    alpha=0.05  # 95% coverage
)

conformal.calibrate(X_cal, y_cal)
print("\n✓ Conformal predictor calibrated!")

### Evaluate Coverage on Test Set
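
Empirical coverage on a short test set will not match the target exactly. With the 70 test windows in this notebook, a binomial standard error gives a rough sense of how far it can drift. The helper `coverage_report` is hypothetical and not an APDTFlow function. It treats coverage indicators as independent Bernoulli draws, which autocorrelated series only approximately satisfy.

```python
import numpy as np


def coverage_report(y_true, lower, upper, alpha=0.05):
    """Summarize interval quality on a held-out set.

    The standard error assumes independent Bernoulli(1 - alpha) coverage
    indicators; for autocorrelated series treat it as a rough guide only.
    """
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    covered = (y_true >= lower) & (y_true <= upper)
    n = covered.size
    target = 1 - alpha
    return {
        "coverage": covered.mean(),
        "target": target,
        "mean_width": np.mean(upper - lower),
        "binomial_se": np.sqrt(target * (1 - target) / n),
    }

# In this notebook: coverage_report(y_test, lower, upper, alpha=conformal.alpha)
```

At a 95% target with 70 points, the standard error is about 2.6 percentage points. An empirical coverage of roughly 90% to 100% is therefore consistent with the guarantee.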

In [None]:
# Prepare test data
test_values = test_data['value'].values
X_test = np.array([test_values[i-30:i] for i in range(30, len(test_values))])
y_test = test_values[30:]

# Get conformal predictions
lower, pred, upper = conformal.predict(X_test)

# Calculate empirical coverage
covered = (y_test >= lower) & (y_test <= upper)
empirical_coverage = np.mean(covered)

print(f"Empirical Coverage: {empirical_coverage:.1%}")
print(f"Target Coverage:    {1-conformal.alpha:.1%}")
print(f"\nAverage Interval Width: {np.mean(upper - lower):.2f}")
print(f"Quantile Value: {conformal.quantile:.2f}")

In [None]:
# Visualize conformal intervals
plot_conformal_intervals(
    y_true=y_test[:50],
    y_pred=pred[:50],
    lower=lower[:50],
    upper=upper[:50],
    title=f"Split Conformal Prediction (Coverage: {empirical_coverage:.1%})",
    figsize=(14, 6)
)

print(f"Notice: ~{empirical_coverage:.0%} of true values fall within the intervals!")

## Step 3: Adaptive Conformal Prediction

For **non-stationary** data where the distribution changes over time.

**How it works:**
- Starts from a split conformal calibration
- Adapts the interval width online based on recent coverage errors
- Aims to keep the long-run average coverage near the target even when the data distribution shifts
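
One well-known way to implement this is the Adaptive Conformal Inference (ACI) update from the NeurIPS 2021 paper in the references: $\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)$, where $\mathrm{err}_t = 1$ if the interval missed $y_t$. The `ACISketch` class below is a hypothetical standalone illustration of that rule. It is not APDTFlow's `AdaptiveConformalPredictor`, whose internals this notebook does not show. It uses the same `gamma=0.05` as the cells that follow.

```python
import numpy as np


class ACISketch:
    """Adaptive Conformal Inference (Gibbs & Candes, NeurIPS 2021), minimal form.

    Illustrative only: APDTFlow's AdaptiveConformalPredictor may differ.
    """

    def __init__(self, cal_scores, alpha=0.05, gamma=0.05):
        self.scores = np.sort(np.asarray(cal_scores, dtype=float))
        self.alpha = alpha    # target miscoverage rate
        self.alpha_t = alpha  # running miscoverage level, adapted online
        self.gamma = gamma    # step size (learning rate)

    def radius(self):
        """Interval half-width: conformal quantile at level 1 - alpha_t."""
        n = len(self.scores)
        k = int(np.ceil((n + 1) * (1 - self.alpha_t)))
        if k > n:
            return np.inf  # alpha_t too small for n scores: infinite interval
        if k < 1:
            return 0.0     # alpha_t >= 1: zero-width interval (stand-in for empty set)
        return self.scores[k - 1]

    def predict_and_update(self, pred, y):
        """Return (lower, upper) for this step, then update alpha_t using y."""
        r = self.radius()
        lower, upper = pred - r, pred + r
        err = 0.0 if lower <= y <= upper else 1.0  # 1 means the interval missed
        # ACI step: a miss lowers alpha_t (wider next interval), a hit raises it
        self.alpha_t += self.gamma * (self.alpha - err)
        return lower, upper


# Simulation: noise std grows from 1 to 3; the "model" always predicts 0
rng = np.random.default_rng(1)
cal_scores = np.abs(rng.normal(0, 1, size=200))  # calibrated while std was 1
y_stream = rng.normal(0, np.linspace(1, 3, 2000))

aci = ACISketch(cal_scores, alpha=0.05, gamma=0.05)
aci_hits = []
for y in y_stream:
    lo, hi = aci.predict_and_update(0.0, y)
    aci_hits.append(lo <= y <= hi)

fixed_radius = ACISketch(cal_scores, alpha=0.05).radius()  # split conformal, never updated
print(f"Fixed-width coverage: {np.mean(np.abs(y_stream) <= fixed_radius):.3f}")
print(f"ACI coverage:         {np.mean(aci_hits):.3f}")
```

The ACI guarantee concerns the long-run average, not any single step. Over $T$ steps, the average miscoverage stays within $(\max(\alpha_1, 1-\alpha_1) + \gamma)/(\gamma T)$ of $\alpha$. Here that bound is 0.01, so ACI coverage lands between 0.94 and 0.96, while the fixed-width interval undercovers as the noise grows.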

In [None]:
# Create non-stationary data (increasing variance over time)
np.random.seed(42)
dates_ns = pd.date_range(start='2023-01-01', periods=500, freq='D')
t_ns = np.arange(len(dates_ns))

# Variance increases over time!
trend_ns = 0.05 * t_ns
seasonality_ns = 10 * np.sin(2 * np.pi * t_ns / 30)
noise_std = 2 + 0.01 * t_ns  # Increasing noise!
noise_ns = np.random.normal(0, noise_std)

values_ns = 100 + trend_ns + seasonality_ns + noise_ns

df_ns = pd.DataFrame({
    'date': dates_ns,
    'value': values_ns
})

print("Non-stationary dataset created")
print("Variance increases over time!")

In [None]:
# Visualize increasing variance
plt.figure(figsize=(14, 5))
plt.plot(df_ns['date'], df_ns['value'])
plt.title('Non-Stationary Time Series (Increasing Variance)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print("Notice how the fluctuations get larger over time!")

In [None]:
# Split data
train_ns = df_ns.iloc[:300]
cal_ns = df_ns.iloc[300:400]
test_ns = df_ns.iloc[400:]

# Train model
model_ns = APDTFlowForecaster(
    forecast_horizon=1,
    history_length=30,
    num_epochs=30,
    verbose=False
)

model_ns.fit(train_ns, target_col='value', date_col='date')
print("✓ Model trained on non-stationary data")

In [None]:
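# Create prediction function (same wrapper as in Step 2, using model_ns)
def predict_fn_ns(X):
    """Predict one step ahead for each row of X (n_samples, history_length)."""
    preds = []
    for window in X:
        old_seq = model_ns.last_sequence_
        model_ns.last_sequence_ = window
        try:
            preds.append(model_ns.predict()[0])
        finally:
            # Restore the model's original state even if predict() fails
            model_ns.last_sequence_ = old_seq
    return np.array(preds)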

# Prepare calibration data
cal_values_ns = cal_ns['value'].values
X_cal_ns = np.array([cal_values_ns[i-30:i] for i in range(30, len(cal_values_ns))])
y_cal_ns = cal_values_ns[30:]

print(f"Calibration: {len(X_cal_ns)} samples")

In [None]:
# Create ADAPTIVE conformal predictor
adaptive_conformal = AdaptiveConformalPredictor(
    predict_fn=predict_fn_ns,
    alpha=0.05,
    gamma=0.05  # Learning rate for adaptation
)

adaptive_conformal.calibrate(X_cal_ns, y_cal_ns)
print("\n✓ Adaptive conformal predictor calibrated!")

In [None]:
# Online prediction with adaptation
test_values_ns = test_ns['value'].values
predictions_adaptive = []
lower_adaptive = []
upper_adaptive = []

for i in range(30, len(test_values_ns)):
    X_t = test_values_ns[i-30:i].reshape(1, -1)
    y_t = test_values_ns[i]
    
    # Predict and update
    lower_t, pred_t, upper_t = adaptive_conformal.predict_and_update(X_t, y_t.reshape(1))
    
    predictions_adaptive.append(pred_t[0])
    lower_adaptive.append(lower_t[0])
    upper_adaptive.append(upper_t[0])

y_test_ns = test_values_ns[30:]
predictions_adaptive = np.array(predictions_adaptive)
lower_adaptive = np.array(lower_adaptive)
upper_adaptive = np.array(upper_adaptive)

# Calculate coverage
covered_adaptive = (y_test_ns >= lower_adaptive) & (y_test_ns <= upper_adaptive)
coverage_adaptive = np.mean(covered_adaptive)

print(f"Adaptive Conformal Coverage: {coverage_adaptive:.1%}")
print(f"Target Coverage:             {1-adaptive_conformal.alpha:.1%}")

In [None]:
# Visualize adaptive intervals
plot_conformal_intervals(
    y_true=y_test_ns,
    y_pred=predictions_adaptive,
    lower=lower_adaptive,
    upper=upper_adaptive,
    title=f"Adaptive Conformal Prediction (Coverage: {coverage_adaptive:.1%})",
    figsize=(14, 6)
)

print("Notice how the interval width adapts to increasing variance!")

In [None]:
# Get adaptation statistics
stats = adaptive_conformal.get_adaptation_stats()

print("\nAdaptation Statistics:")
print(f"  Updates performed:    {stats['num_updates']}")
print(f"  Initial quantile:     {stats['initial_quantile']:.2f}")
print(f"  Current quantile:     {stats['current_quantile']:.2f}")
print(f"  Quantile change:      {stats['quantile_change']:.2f}")
print(f"  Recent coverage:      {stats['recent_coverage']:.1%}")
print(f"  Target coverage:      {stats['target_coverage']:.1%}")

## Step 4: Comparison - Split vs Adaptive

Let's compare both methods on the non-stationary data.
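
A single coverage number can hide when the misses happen. The rolling view below shows whether undercoverage concentrates late in the test period, where the variance is highest. `rolling_coverage` is a hypothetical helper, not part of APDTFlow.

```python
import numpy as np


def rolling_coverage(y_true, lower, upper, window=20):
    """Fraction of the last `window` points that fell inside their interval."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    covered = ((y_true >= lower) & (y_true <= upper)).astype(float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where a full window is available
    return np.convolve(covered, kernel, mode="valid")

# In this notebook, after the next cell:
#   rolling_coverage(y_test_ns, lower_split, upper_split)
#   rolling_coverage(y_test_ns, lower_adaptive, upper_adaptive)
```

With 70 test points and `window=20`, each call returns 51 values that can be plotted against the time step.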

In [None]:
# Also run split conformal on non-stationary data
split_conformal_ns = SplitConformalPredictor(
    predict_fn=predict_fn_ns,
    alpha=0.05
)

split_conformal_ns.calibrate(X_cal_ns, y_cal_ns)

# Prepare test data
X_test_ns = np.array([test_values_ns[i-30:i] for i in range(30, len(test_values_ns))])

# Predict
lower_split, pred_split, upper_split = split_conformal_ns.predict(X_test_ns)

# Coverage
covered_split = (y_test_ns >= lower_split) & (y_test_ns <= upper_split)
coverage_split = np.mean(covered_split)

print(f"Split Conformal Coverage:    {coverage_split:.1%}")
print(f"Adaptive Conformal Coverage: {coverage_adaptive:.1%}")
print(f"Target Coverage:             {1-split_conformal_ns.alpha:.1%}")

In [None]:
# Compare interval widths over time
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))

x = np.arange(len(y_test_ns))

# Split conformal
ax1.plot(x, y_test_ns, 'ko-', label='Actual', alpha=0.6, markersize=4)
ax1.fill_between(x, lower_split, upper_split, alpha=0.3, color='blue', label='Split Conformal')
ax1.set_title(f'Split Conformal (Coverage: {coverage_split:.1%})', fontsize=12, fontweight='bold')
ax1.set_ylabel('Value')
ax1.legend()
ax1.grid(alpha=0.3)

# Adaptive conformal
ax2.plot(x, y_test_ns, 'ko-', label='Actual', alpha=0.6, markersize=4)
ax2.fill_between(x, lower_adaptive, upper_adaptive, alpha=0.3, color='red', label='Adaptive Conformal')
ax2.set_title(f'Adaptive Conformal (Coverage: {coverage_adaptive:.1%})', fontsize=12, fontweight='bold')
ax2.set_xlabel('Time Step')
ax2.set_ylabel('Value')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKey Observation:")
print("  Split: Fixed interval width (may undercover later)")
print("  Adaptive: Intervals widen as variance increases (long-run coverage stays near target)")

## Step 5: Integration with APDTFlowForecaster

Use conformal prediction directly in the high-level API.
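
The model below uses `forecast_horizon=7`. Errors usually grow with the forecast step, and one common approach calibrates a separate conformal radius per step. This notebook does not show how APDTFlow handles multi-step calibration internally, so `per_horizon_radii` is a hypothetical sketch of that approach only.

```python
import numpy as np


def per_horizon_radii(cal_pred, cal_true, alpha=0.05):
    """One split conformal radius per forecast step.

    cal_pred, cal_true: arrays of shape (n_windows, horizon).
    Returns an array of shape (horizon,); interval for step h is pred[h] +/- radii[h].
    """
    scores = np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float))
    n = scores.shape[0]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.full(scores.shape[1], np.inf)
    # Sort each column (forecast step) independently and take its k-th smallest score
    return np.sort(scores, axis=0)[k - 1]
```

Each step then has marginal 1 − α coverage on its own. Covering all 7 steps jointly would need a stricter per-step level, such as α/7 (a Bonferroni correction).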

In [None]:
# Create model with conformal prediction enabled
model_conformal = APDTFlowForecaster(
    forecast_horizon=7,
    history_length=30,
    num_epochs=30,
    use_conformal=True,
    conformal_method='split',  # or 'adaptive'
    calibration_split=0.2,
    verbose=False
)

# Train (automatically splits for calibration)
model_conformal.fit(train_data, target_col='value', date_col='date')

print("✓ Model with conformal prediction trained")
print("  20% of training data used for calibration")

In [None]:
# Get conformal prediction intervals
lower_api, pred_api, upper_api = model_conformal.predict(
    alpha=0.05,
    return_intervals='conformal'
)

print(f"Predictions with 95% conformal intervals:")
print(f"  Lower: {lower_api}")
print(f"  Pred:  {pred_api}")
print(f"  Upper: {upper_api}")

## Key Takeaways

### 1. Why Conformal Prediction?

**Traditional Uncertainty:**
- Based on model assumptions (e.g., Gaussian)
- Can be miscalibrated
- No coverage guarantee if those assumptions are wrong

**Conformal Prediction:**
- Distribution-free (only exchangeability is assumed)
- Finite-sample coverage guarantees
- Theoretically rigorous

### 2. Split vs Adaptive

**Split Conformal:**
- Simple and reliable
- Best for stationary data
- Fixed interval width

**Adaptive Conformal:**
- For non-stationary data
- Intervals adapt to changing variance
- Keeps long-run average coverage near the target under distribution shift

### 3. When to Use

Use conformal prediction when:
- **Decision-making** requires guaranteed coverage
- **Risk management** (finance, healthcare)
- **Safety-critical** applications
- **Regulatory requirements** for uncertainty quantification

### 4. API Usage

```python
# Method 1: Direct usage
from apdtflow.conformal import SplitConformalPredictor

conformal = SplitConformalPredictor(predict_fn=predict_fn, alpha=0.05)
conformal.calibrate(X_cal, y_cal)
lower, pred, upper = conformal.predict(X_test)

# Method 2: Integrated with forecaster
from apdtflow import APDTFlowForecaster

model = APDTFlowForecaster(
    forecast_horizon=7,
    history_length=30,
    use_conformal=True,
    conformal_method='split',  # or 'adaptive'
    calibration_split=0.2
)

model.fit(df, target_col='value', date_col='date')
lower, pred, upper = model.predict(alpha=0.05, return_intervals='conformal')
```

### 5. Research References

- **arXiv:2509.02844**: Conformal Prediction for Time Series with Change Points
- **ICLR 2025**: Kernel-based Optimally Weighted Conformal Prediction
- **arXiv:2503.21251**: Dual-Splitting for Multi-Step Forecasting
- **NeurIPS 2021**: Adaptive Conformal Inference Under Distribution Shift

## Next Steps

1. Try with your own data!
2. Experiment with different alpha values (0.10, 0.05, 0.01 for 90%, 95%, 99% coverage)
3. Combine with exogenous variables (see `exogenous_variables.ipynb`)
4. Compare coverage across different models
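
Smaller alpha values need more calibration data. With n scores, the conformal rank ⌈(n+1)(1−α)⌉ must not exceed n, or the interval becomes infinite. The hypothetical helper `conformal_rank` below checks this for the 70 calibration windows used in Step 2. Whether `SplitConformalPredictor` returns infinite bounds or clips them in that case is not shown in this notebook.

```python
import math


def conformal_rank(n_cal, alpha):
    """Rank k of the calibration score used as the interval radius.

    Returns None when k > n_cal, meaning the interval must be infinite.
    """
    k = math.ceil((n_cal + 1) * (1 - alpha))
    return k if k <= n_cal else None


n_cal = 70  # Step 2: 100 calibration rows minus 30 rows of history
for alpha in (0.10, 0.05, 0.01):
    k = conformal_rank(n_cal, alpha)
    if k is None:
        print(f"alpha={alpha:.2f}: {n_cal} scores are too few -> infinite interval")
    else:
        print(f"alpha={alpha:.2f}: radius = score ranked {k} of {n_cal}")
```

In this setting, 99% intervals need a larger calibration split. At least 99 calibration windows are required for α = 0.01.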

📚 **Documentation**: https://github.com/yotambraun/APDTFlow