# AutoTSForecast Tutorial

**Quick Install:** `pip install autotsforecast`

**📚 Documentation:**
- **[API Reference](../API_REFERENCE.md)**: Complete parameter documentation
- **[Quick Start](../QUICKSTART.md)**: 5-minute getting started guide  
- **[README](../README.md)**: Package overview and features

## What This Tutorial Covers

**Core Features:**
1. **AutoForecaster** - Automatic model selection with cross-validation
2. **Flexible Covariates** - Use external features (temp, promo, etc.)
3. **Hierarchical Reconciliation** - Enforce consistency across series
4. **Interpretability** - SHAP and DriverAnalyzer for feature importance
5. **Backtesting Module** - Standalone CV evaluation with holdout support

**Dataset:**
- 3 time series: `total`, `region_a`, `region_b` (where `total = region_a + region_b`)
- 2 covariates: `temp`, `promo`
- 226 training points, 14 test points
- Demonstrates both with and without covariates

In [None]:
import copy

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shap

from autotsforecast import AutoForecaster
from autotsforecast.backtesting.validator import BacktestValidator
from autotsforecast.hierarchical.reconciliation import HierarchicalReconciler
from autotsforecast.interpretability.drivers import DriverAnalyzer
from autotsforecast.models.base import LinearForecaster, MovingAverageForecaster, VARForecaster
from autotsforecast.models.external import (
    ARIMAForecaster,
    ETSForecaster,
    LSTMForecaster,
    ProphetForecaster,
    RandomForestForecaster,
    XGBoostForecaster,
)

np.random.seed(42)

In [None]:
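# Synthetic dataset: 240 daily points, the last 14 held out as the test set.
horizon = 14
n = 240
idx = pd.date_range("2023-01-01", periods=n, freq="D")
time_step = np.arange(n)

# Covariates: weekly-seasonal temperature and a sparse binary promo flag.
temp = 20 + 8 * np.sin(2 * np.pi * time_step / 7) + np.random.normal(0, 0.8, n)
promo = (np.random.rand(n) < 0.12).astype(int)
# Promos are deliberately more frequent in the test window.
promo[-horizon:] = (np.random.rand(horizon) < 0.45).astype(int)
if promo[-horizon:].sum() == 0:
    promo[-1] = 1  # guarantee at least one promo in the test window
X = pd.DataFrame({"temp": temp, "promo": promo}, index=idx)

# Make regions harder (noisy) but total smoother: a large shared shock
# enters regions with opposite signs and cancels in the total.
shared = np.random.normal(0, 4.0, n)
eps_a = np.random.normal(0, 0.8, n)
eps_b = np.random.normal(0, 0.8, n)

region_a = 40 + 0.10 * time_step + 50.0 * X["promo"].values + 1.6 * X["temp"].values + shared + eps_a
region_b = 25 + 7.0 * np.sin(2 * np.pi * time_step / 30) + 1.8 * X["temp"].values - shared + eps_b
total = region_a + region_b
y = pd.DataFrame({"region_a": region_a, "region_b": region_b, "total": total}, index=idx)

# Chronological split: no shuffling, the final `horizon` rows are the test set.
y_train, y_test = y.iloc[:-horizon], y.iloc[-horizon:]
X_train, X_test = X.iloc[:-horizon], X.iloc[-horizon:]


def rmse(yt, yp):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(yt) - np.asarray(yp)) ** 2)))


def mape(yt, yp):
    """Mean absolute percentage error (%); 1e-9 guards against division by zero."""
    yt, yp = np.asarray(yt), np.asarray(yp)
    return float(np.mean(np.abs((yt - yp) / (np.abs(yt) + 1e-9))) * 100)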

print("📊 Data Overview:")
print(f"   Training: {y_train.shape[0]} time steps × {y_train.shape[1]} series")
print(f"   Test: {y_test.shape[0]} time steps × {y_test.shape[1]} series")
print(f"   Covariates: {X_train.shape[1]} features (temp, promo)")
print(f"   Horizon: {horizon} days ahead")

(y_train.shape,y_test.shape,X_train.shape,X_test.shape,int(X_test["promo"].sum()))

In [None]:
# Plot the time series data
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Plot each series
axes[0, 0].plot(y.index, y['region_a'], label='region_a', color='steelblue')
axes[0, 0].axvline(y_train.index[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test split')
axes[0, 0].set_title('Region A', fontsize=11, fontweight='bold')
axes[0, 0].set_ylabel('Value')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

axes[0, 1].plot(y.index, y['region_b'], label='region_b', color='darkorange')
axes[0, 1].axvline(y_train.index[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test split')
axes[0, 1].set_title('Region B', fontsize=11, fontweight='bold')
axes[0, 1].set_ylabel('Value')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

axes[1, 0].plot(y.index, y['total'], label='total', color='green')
axes[1, 0].axvline(y_train.index[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test split')
axes[1, 0].set_title('Total (region_a + region_b)', fontsize=11, fontweight='bold')
axes[1, 0].set_xlabel('Date')
axes[1, 0].set_ylabel('Value')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Plot covariates
ax2 = axes[1, 1]
ax2.plot(X.index, X['temp'], label='temp', color='purple', alpha=0.7)
ax2.axvline(X_train.index[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test split')
ax2.set_ylabel('Temperature', color='purple')
ax2.tick_params(axis='y', labelcolor='purple')
ax2.set_title('Covariates (Exogenous Variables)', fontsize=11, fontweight='bold')
ax2.set_xlabel('Date')
ax2.grid(alpha=0.3)

ax2_right = ax2.twinx()
ax2_right.scatter(X.index, X['promo'], label='promo', color='red', alpha=0.5, s=30)
ax2_right.set_ylabel('Promo (0/1)', color='red')
ax2_right.tick_params(axis='y', labelcolor='red')
ax2_right.set_ylim(-0.1, 1.1)

lines1, labels1 = ax2.get_legend_handles_labels()
lines2, labels2 = ax2_right.get_legend_handles_labels()
ax2.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

plt.tight_layout()
plt.show()

print(f"Data shape: {y.shape[0]} total points")
print(f"  • Training: {y_train.shape[0]} points")
print(f"  • Testing: {y_test.shape[0]} points")
print(f"  • Horizon: {horizon}")
print(f"\nNote: Promo events are more frequent in the test period (by design)")


## 1) AutoForecaster: Automatic Model Selection

**How It Works:**
- **Training**: Uses your historical data
- **Model Selection**: Runs time-respecting CV on training data only (NEVER uses test data!)
- **Forecasting**: Generates future predictions

**Cross-Validation Setup:**
With 226 training points, `n_splits=3`, and `test_size=14`:
- **Fold 1**: Train [0:212] → Validate [212:226] (most recent data)
- **Fold 2**: Train [0:198] → Validate [198:212]
- **Fold 3**: Train [0:184] → Validate [184:198]

Each fold respects time ordering - no data leakage! Test data [226:240] is completely isolated for final evaluation.
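
The fold boundaries listed above follow from simple index arithmetic. The sketch below uses a hypothetical helper (`expanding_window_folds` is not part of autotsforecast) to reproduce them, assuming validation windows are taken backward from the end of the training data and every fold trains from index 0.

```python
def expanding_window_folds(n_train, n_splits, test_size):
    """Return (train_range, val_range) pairs, most recent fold first.

    Each range is a half-open (start, stop) tuple, matching Python slicing.
    Hypothetical helper for illustration; not the library's implementation.
    """
    folds = []
    for k in range(n_splits):
        val_stop = n_train - k * test_size
        val_start = val_stop - test_size
        if val_start <= 0:
            raise ValueError("not enough training points for the requested folds")
        # Training always starts at 0 and ends where validation begins.
        folds.append(((0, val_start), (val_start, val_stop)))
    return folds


folds = expanding_window_folds(n_train=226, n_splits=3, test_size=14)
for i, (train, val) in enumerate(folds, start=1):
    print(f"Fold {i}: train [{train[0]}:{train[1]}] -> validate [{val[0]}:{val[1]}]")
```

No validation window extends past index 226, so the test rows [226:240] are never touched during model selection.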

In [None]:
# Quick data summary
print("="*80)
print("Dataset Summary")
print("="*80)
print(f"\nTraining points: {len(y_train)}")
print(f"Test points: {len(y_test)}")
print(f"Forecast horizon: {horizon}")
print(f"Number of series: {y_train.shape[1]}")
print(f"Number of covariates: {X_train.shape[1]}")
print("\n" + "="*80)

In [None]:
# Visualize how CV splits work (no data leakage - respects time order)
import matplotlib.patches as mpatches

fig, ax = plt.subplots(figsize=(12, 4))

n_train = len(y_train)
n_test = len(y_test)
total_for_cv = n_train

# Show CV folds (validation windows work backwards from end of training data)
cv_splits = 3  # Number of CV folds
for fold in range(cv_splits):
    # Validation windows work backwards from end of training data
    # Fold 0: validate on [n_train - 3*horizon : n_train - 2*horizon]
    # Fold 1: validate on [n_train - 2*horizon : n_train - 1*horizon]
    # Fold 2: validate on [n_train - 1*horizon : n_train]
    val_end = total_for_cv - (cv_splits - fold - 1) * horizon
    val_start = val_end - horizon
    train_end = val_start
    
    y_offset = fold * 0.3
    
    # Train portion (expanding window - grows each fold)
    ax.barh(y_offset, train_end, left=0, height=0.2, color='steelblue', alpha=0.7)
    
    # Validation portion (within training data)
    ax.barh(y_offset, horizon, left=val_start, height=0.2, color='orange', alpha=0.7)
    
    ax.text(-5, y_offset, f'Fold {fold+1}', va='center', ha='right', fontsize=9)

# Show final holdout test
y_offset = cv_splits * 0.3 + 0.2
ax.barh(y_offset, n_train, left=0, height=0.2, color='steelblue', alpha=0.9)
ax.barh(y_offset, n_test, left=n_train, height=0.2, color='red', alpha=0.7)
ax.text(-5, y_offset, 'Final', va='center', ha='right', fontsize=9, fontweight='bold')

ax.set_xlim(-10, n_train + n_test + 10)
ax.set_ylim(-0.2, 1.2)
ax.set_xlabel('Time (index)', fontsize=11)
ax.set_yticks([])
ax.set_title('Time Series CV: Expanding Window (No Data Leakage)', fontsize=12, fontweight='bold')

# Legend
train_patch = mpatches.Patch(color='steelblue', alpha=0.7, label='Train')
val_patch = mpatches.Patch(color='orange', alpha=0.7, label='Validation')
test_patch = mpatches.Patch(color='red', alpha=0.7, label='Holdout Test')
ax.legend(handles=[train_patch, val_patch, test_patch], loc='upper left')

plt.tight_layout()
plt.show()

print("\n📋 Cross-Validation Setup:")
print(f"   • Training points: {n_train}, Test points: {n_test}")
print(f"   • CV folds: {cv_splits} (validation windows within training data only)")
print(f"   • Fold 1: train [0:{n_train-3*horizon:3d}], validate [{n_train-3*horizon:3d}:{n_train-2*horizon:3d}]")
print(f"   • Fold 2: train [0:{n_train-2*horizon:3d}], validate [{n_train-2*horizon:3d}:{n_train-horizon:3d}]")
print(f"   • Fold 3: train [0:{n_train-horizon:3d}], validate [{n_train-horizon:3d}:{n_train:3d}]")
print(f"\n🔒 No data leakage: Test data [{n_train}:{n_train+n_test}] never used for model selection")

In [None]:
# Define candidate models
cv_splits = 3
candidates = [
    VARForecaster(horizon=horizon, lags=7),
    ETSForecaster(horizon=horizon, seasonal_periods=7, trend=None, seasonal="add"),
    ARIMAForecaster(horizon=horizon, order=(1,1,1), seasonal_order=(1,0,1,7)),
    MovingAverageForecaster(horizon=horizon, window=7),
    LinearForecaster(horizon=horizon),
    RandomForestForecaster(horizon=horizon, n_lags=14, n_estimators=400, random_state=0),
    XGBoostForecaster(horizon=horizon, n_lags=14, n_estimators=400, random_state=0, max_depth=6, learning_rate=0.05),
    ProphetForecaster(horizon=horizon),
    LSTMForecaster(horizon=horizon, n_lags=21, hidden_size=32, num_layers=1, dropout=0.0, epochs=5, batch_size=64, learning_rate=0.01, random_state=0),
]

print(f"Candidate models: {len(candidates)}")
print(f"CV configuration: {cv_splits} folds, test_size={horizon}")

# Train AutoForecaster WITH covariates
auto = AutoForecaster(
    candidate_models=candidates,
    metric="rmse",
    n_splits=cv_splits,
    test_size=horizon,
    window_type="expanding",
    verbose=False,
    per_series_models=True,
    n_jobs=-1
)

print("\nTraining WITH covariates...")
auto.fit(y_train, X_train)

print("\nSelected models:")
for series in y_train.columns:
    print(f"  • {series}: {type(auto.best_models_[series]).__name__}")

yhat_auto = auto.forecast(X_test)
auto_rmse = {c: rmse(y_test[c], yhat_auto[c]) for c in y_test.columns}
auto_mape = {c: mape(y_test[c], yhat_auto[c]) for c in y_test.columns}

# Train AutoForecaster WITHOUT covariates
auto_noX = AutoForecaster(
    candidate_models=candidates,
    metric="rmse",
    n_splits=cv_splits,
    test_size=horizon,
    window_type="expanding",
    verbose=False,
    per_series_models=True,
    n_jobs=-1
)

print("\nTraining WITHOUT covariates...")
auto_noX.fit(y_train, None)

print("\nSelected models:")
for series in y_train.columns:
    print(f"  • {series}: {type(auto_noX.best_models_[series]).__name__}")

yhat_auto_noX = auto_noX.forecast(None)
auto_noX_rmse = {c: rmse(y_test[c], yhat_auto_noX[c]) for c in y_test.columns}
auto_noX_mape = {c: mape(y_test[c], yhat_auto_noX[c]) for c in y_test.columns}

# Display results
print("\n" + "="*80)
print("RESULTS")
print("="*80)

print("\nAutoForecaster WITH covariates:")
for series in y_test.columns:
    print(f"  {series:10s}: RMSE={auto_rmse[series]:6.2f}, MAPE={auto_mape[series]:5.2f}%")

print("\nAutoForecaster WITHOUT covariates:")
for series in y_test.columns:
    print(f"  {series:10s}: RMSE={auto_noX_rmse[series]:6.2f}, MAPE={auto_noX_mape[series]:5.2f}%")

# Collect individual model results for comparison
rows = []
for level in y_test.columns:
    rows.append({"model": "AutoForecaster (with covariates)", "level": level, "rmse": float(auto_rmse[level]), "mape": float(auto_mape[level])})
for level in y_test.columns:
    rows.append({"model": "AutoForecaster (no covariates)", "level": level, "rmse": float(auto_noX_rmse[level]), "mape": float(auto_noX_mape[level])})

# Compare with individual models
var_model = VARForecaster(horizon=horizon, lags=7)
try:
    var_model.fit(y_train, X_train)
    yhat_var = var_model.predict(X_test)
    for col in y_train.columns:
        rows.append({
            "model": "VARForecaster (multivariate)",
            "level": col,
            "rmse": float(rmse(y_test[col], yhat_var[col])),
            "mape": float(mape(y_test[col], yhat_var[col])),
        })
except Exception as e:
    print(f"\nVAR model failed: {e}")

# Test individual models on each series
for proto in candidates:
    if isinstance(proto, VARForecaster):
        continue  # Already handled above
        
    for col in y_train.columns:
        m = copy.deepcopy(proto)
        try:
            X_tr = X_train if getattr(m, "supports_covariates", False) else None
            X_te = X_test if getattr(m, "supports_covariates", False) else None
            m.fit(y_train[[col]], X_tr)
            yhat = m.predict(X_te)
            rows.append({
                "model": f"{m.__class__.__name__}",
                "level": col,
                "rmse": float(rmse(y_test[col], yhat[col])),
                "mape": float(mape(y_test[col], yhat[col])),
            })
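        except Exception as e:
            # Report the failure instead of silently dropping the model
            print(f"{type(m).__name__} failed on {col}: {e}")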

results = pd.DataFrame(rows)
comparison_df = results.groupby(["model", "level"])[["rmse", "mape"]].mean().reset_index()
comparison_df = comparison_df.sort_values(["level", "rmse"])

print("\n" + "="*80)
print("Model Comparison by Series")
print("="*80)
display(comparison_df)

## Complete Workflow Summary

**Simple 4-Step Process:**

```python
# Step 1: Define candidate models
candidates = [VARForecaster(...), LinearForecaster(...), XGBoostForecaster(...), ...]

# Step 2: Create AutoForecaster
auto = AutoForecaster(
    candidate_models=candidates,
    per_series_models=True,  # Each series gets its own best model
    n_jobs=-1                # Use all CPU cores
)

# Step 3: Fit (with optional covariates)
auto.fit(y_train, X_train)  # X_train optional (can be None or dict)

# Step 4: Forecast
forecasts = auto.forecast(X_test)  # X_test optional (must match training)
```

**Key Features:**

- ✅ **Automatic Model Selection**: CV selects best model per series
- ✅ **Flexible Covariates**: Pass DataFrame (all series) or dict (per-series)
- ✅ **Parallel Processing**: `n_jobs=-1` uses all CPU cores
- ✅ **Time-Respecting CV**: No data leakage - training never sees test data
- ✅ **Transparent**: Inspect selected models via `auto.best_models_[series]`
- ✅ **Model-Agnostic Tools**: DriverAnalyzer and SHAP work with any selected model

**Optional: Different covariates per series**
```python
X_dict = {
    'region_a': X_df,      # Use covariates for region_a
    'region_b': None,      # No covariates for region_b
    'total': X_df          # Use covariates for total
}
auto.fit(y_train, X_dict)
```
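
One way to read the two accepted input forms is sketched below with a hypothetical helper. `resolve_covariates` is not a library function, and treating a missing dict key as "no covariates" is an assumption made for illustration only.

```python
def resolve_covariates(X, series_names):
    """Map each series name to the covariates it should use (or None).

    Hypothetical helper: illustrates the DataFrame-vs-dict convention,
    not autotsforecast's internal logic.
    """
    if isinstance(X, dict):
        unknown = set(X) - set(series_names)
        if unknown:
            raise KeyError(f"covariates given for unknown series: {sorted(unknown)}")
        # Assumption: a series missing from the dict gets no covariates.
        return {name: X.get(name) for name in series_names}
    # A single object (or None) is shared by every series.
    return {name: X for name in series_names}


series = ["region_a", "region_b", "total"]
shared = resolve_covariates("X_df", series)  # the string stands in for a DataFrame
per_series = resolve_covariates({"region_a": "X_df", "total": "X_df"}, series)
print(per_series)
```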

## 2) Hierarchical Reconciliation

**Purpose**: Enforce consistency across hierarchical series (e.g., `total = region_a + region_b`)

**Methods Available:**
- `'bottom_up'`: Aggregate from bottom level
- `'top_down'`: Disaggregate from top level
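- `'mint_ols'`: MinT optimal reconciliation, a balanced approach (the code cell below calls `reconcile(method="ols")`; see the API reference for the exact method strings accepted)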

Sometimes base forecasts are already coherent. To demonstrate reconciliation, we inject a small inconsistency if needed.
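
To make the idea concrete, here is a minimal sketch of OLS reconciliation for this tutorial's two-child hierarchy. It assumes the OLS variant is the standard unweighted projection of the base forecasts onto the set of coherent forecasts; for `total = region_a + region_b` that projection reduces to spreading the gap equally over the three series. The function name is hypothetical and the code is not taken from the library.

```python
def reconcile_ols_two_children(a, b, total):
    """OLS-reconcile one time step for the hierarchy total = a + b.

    Closed form of the unweighted projection for this specific hierarchy:
    each child absorbs +gap/3 and the total absorbs -gap/3.
    """
    gap = total - (a + b)
    return a + gap / 3.0, b + gap / 3.0, total - gap / 3.0


a_hat, b_hat, total_hat = 60.0, 45.0, 108.0  # incoherent: 60 + 45 != 108
a_rec, b_rec, total_rec = reconcile_ols_two_children(a_hat, b_hat, total_hat)
print(a_rec, b_rec, total_rec)

# Bottom-up, for contrast, keeps the children and overwrites the total.
total_bottom_up = a_hat + b_hat
print(total_bottom_up)
```

Unlike bottom-up, the projection adjusts all three series, which is why the `changed` check in the code cell can report `True` for every level.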

In [None]:
candidates_rec=[
    VARForecaster(horizon=horizon,lags=7),
    ETSForecaster(horizon=horizon,seasonal_periods=7,trend=None,seasonal="add"),
    ARIMAForecaster(horizon=horizon,order=(1,1,1),seasonal_order=(1,0,1,7)),
    MovingAverageForecaster(horizon=horizon,window=7),
    LinearForecaster(horizon=horizon),
    RandomForestForecaster(horizon=horizon,n_lags=14,n_estimators=400,random_state=0),
    XGBoostForecaster(horizon=horizon,n_lags=14,n_estimators=400,random_state=0,max_depth=6,learning_rate=0.05),
    ProphetForecaster(horizon=horizon),
    LSTMForecaster(horizon=horizon,n_lags=21,hidden_size=32,num_layers=1,dropout=0.0,epochs=5,batch_size=64,learning_rate=0.01,random_state=0),
]

auto_rec=AutoForecaster(candidate_models=candidates_rec,metric="rmse",n_splits=cv_splits,test_size=horizon,window_type="expanding",verbose=False,per_series_models=True,n_jobs=-1)

auto_rec.fit(y_train,X_train)

yhat_raw=auto_rec.forecast(X_test)

# If the base forecasts are already coherent, reconciliation will not change anything.
# To demonstrate reconciliation behavior, inject a tiny incoherency in that case.
yhat_base=yhat_raw.copy()
gap=yhat_base["total"]-(yhat_base["region_a"]+yhat_base["region_b"])
if float(np.max(np.abs(gap.values)))<1e-8:
    yhat_base["total"]=yhat_base["total"]*1.03

tree={"total":["region_a","region_b"]}

recon=HierarchicalReconciler(yhat_base,tree).reconcile(method="ols")
yhat_recon=recon.reconciled_forecasts



changed=pd.Series({c:bool(not np.allclose(yhat_base[c].values,yhat_recon[c].values)) for c in ["region_a","region_b","total"]})
print("Reconciliation changed forecasts (base -> reconciled):")
print(changed)

rows=[]
for level in ["region_a","region_b","total"]:
    rows.append({"model":"base","level":level,"rmse":rmse(y_test[level],yhat_base[level]),"mape":mape(y_test[level],yhat_base[level])})
    rows.append({"model":"reconciled_ols","level":level,"rmse":rmse(y_test[level],yhat_recon[level]),"mape":mape(y_test[level],yhat_recon[level])})

pd.DataFrame(rows).sort_values(["level","model"]).reset_index(drop=True)

## 3) Interpretability: DriverAnalyzer & SHAP

**DriverAnalyzer**: Analyzes external covariates (like temp, promo) to understand their impact on forecasts

**How It Works Internally:**
1. **Model-Agnostic Design**: Automatically detects model type and uses appropriate method
   - Linear models → Uses coefficient values to measure feature impact
   - Tree models (XGBoost, RandomForest) → Uses built-in feature importance
   - Statistical models (ETS, ARIMA) → Gracefully handles (these use patterns, not features)

2. **Focus on External Covariates**: Shows how temp, promo, etc. drive predictions
   - Filters out internal features (lags) to focus on business-relevant drivers
   - Displays feature importance scores for each covariate

3. **Simple Usage** - Works automatically with AutoForecaster's selected models:
```python
# Get the selected model for any series
model = auto.best_models_['region_a']

# DriverAnalyzer detects model type automatically
analyzer = DriverAnalyzer(model=model, feature_names=['temp', 'promo'])
importance = analyzer.calculate_feature_importance(X_train, y_train[['region_a']])

# Returns importance scores showing which covariates matter most
```

**SHAP Analysis** - For tree models, provides deeper insights:
- Shows feature contribution to each prediction
- Identifies which lags and covariates are most important
- Visual explanations of model behavior

The DriverAnalyzer and SHAP code cell comes after the backtesting section; it focuses on **external covariates only** (temp and promo).

## 4) Standalone Backtesting Module

**BacktestValidator**: Independent CV tool that works with ANY forecasting model

**Key Differences from AutoForecaster:**
- **AutoForecaster**: Uses backtesting internally for model selection (you don't see CV details)
- **BacktestValidator**: Standalone tool for transparent evaluation and comparison

**Primary Use Case: Holdout Period for Future-Looking Performance**

The most important feature of BacktestValidator is the **holdout_period** parameter, which reserves the last N data points as a completely separate test set for production evaluation.

**Recommended Workflow:**
```python
validator = BacktestValidator(
    model=model,
    n_splits=3,
    test_size=14,
    holdout_period=14  # ⭐ Reserve last 14 points as final holdout test
)

# Returns both CV metrics (for model tuning) and holdout metrics (for production estimate)
cv_metrics, holdout_metrics = validator.run_with_holdout(y_full, X_full)
```

**Why Use Holdout?**
- ✅ **Production Evaluation**: Simulates real deployment on future unseen data
- ✅ **Unbiased Estimate**: Holdout data never used during model training or CV
- ✅ **Standard ML Practice**: Matches train/validation/test split convention

**How It Works:**
- With `holdout_period=14` on 240 points:
  - **CV runs on**: [0:226] (training portion)
  - **Holdout evaluates on**: [226:240] (completely separate future period)
  - Within CV portion, folds work backward to maximize training data:
    - Fold 1: Train [0:212] → Validate [212:226]
    - Fold 2: Train [0:198] → Validate [198:212]
    - Fold 3: Train [0:184] → Validate [184:198]

**Use holdout_period to assess real-world future performance!**
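
The split itself is plain slicing. The sketch below is illustrative only: `holdout_split`, the toy series, and the naive last-value forecast are assumptions, not part of BacktestValidator. It separates the last 14 of 240 points and scores a forecast on them.

```python
import math


def holdout_split(values, holdout_period):
    """Split a sequence into (cv_portion, holdout), holdout taken from the end."""
    if not 0 < holdout_period < len(values):
        raise ValueError("holdout_period must be between 1 and len(values) - 1")
    return values[:-holdout_period], values[-holdout_period:]


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [100.0 + 0.5 * t for t in range(240)]  # toy linear trend, 240 points
cv_part, holdout = holdout_split(series, holdout_period=14)

# Naive forecast: repeat the last value of the CV portion across the holdout.
naive = [cv_part[-1]] * len(holdout)
holdout_rmse = rmse(holdout, naive)
holdout_mae = mae(holdout, naive)
print(len(cv_part), len(holdout))
print(round(holdout_rmse, 3), round(holdout_mae, 3))
```

Cross-validation would then run only on `cv_part`, so the holdout score stays independent of any tuning decisions.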

In [None]:
print("\n## Standalone Backtesting Examples\n")

# ============================================================================
# PRIMARY USE CASE: Holdout Period for Future-Looking Performance Evaluation
# ============================================================================

print("="*80)
print("PRIMARY EXAMPLE: Holdout Period for Production Evaluation")
print("="*80)

# Combine train and test data to simulate having historical data
y_full = pd.concat([y_train, y_test])
X_full = pd.concat([X_train, X_test])

print(f"\nFull dataset: {len(y_full)} points")
print(f"  - Training/CV portion: [0:226]")
print(f"  - Holdout test period: [226:240] (FUTURE-LOOKING)")
print("\n⭐ This simulates deploying a model and testing on truly unseen future data\n")

rf_model = RandomForestForecaster(horizon=horizon, n_lags=14, n_estimators=200, random_state=42)
validator = BacktestValidator(
    model=rf_model,
    n_splits=3,
    test_size=horizon,
    window_type='expanding',
    holdout_period=horizon  # ⭐ Reserve last 14 points as holdout for production evaluation
)

# Run both CV (for model tuning) and holdout (for production estimate)
cv_metrics, holdout_metrics = validator.run_with_holdout(y_full[['total']], X_full)

print(f"Cross-Validation Results on Training Data [0:226]:")
print(f"   (Used for model tuning and hyperparameter selection)")
print(f"   RMSE={cv_metrics['rmse']:.2f}, MAE={cv_metrics['mae']:.2f}, MAPE={cv_metrics['mape']:.2f}%")

print(f"\n⭐ HOLDOUT TEST PERFORMANCE [226:240] (FUTURE-LOOKING):")
print(f"   (This is your production performance estimate!)")
print(f"   RMSE={holdout_metrics['rmse']:.2f}, MAE={holdout_metrics['mae']:.2f}, MAPE={holdout_metrics['mape']:.2f}%")

print("\nCV fold details (within training portion [0:226]):")
display(validator.get_fold_results()[['fold', 'train_start', 'train_end', 'test_start', 'test_end', 'rmse']])

print("\n" + "="*80)
print("Key Insights:")
print("  ✅ Holdout [226:240] is completely separate from training")
print("  ✅ Provides unbiased estimate of future production performance")
print("  ✅ CV is for tuning, holdout is for final evaluation")
print("  ✅ This matches standard ML workflow: train/validation/test split")
print("="*80)

# ============================================================================
# SECONDARY EXAMPLES: CV-only and Model Comparison
# ============================================================================

print("\n" + "="*80)
print("SECONDARY EXAMPLE 1: CV-only backtesting (no holdout)")
print("="*80)
print("(Use this when you don't have extra data for holdout)")

rf_model2 = RandomForestForecaster(horizon=horizon, n_lags=14, n_estimators=200, random_state=42)
validator2 = BacktestValidator(
    model=rf_model2,
    n_splits=3,
    test_size=horizon,
    window_type='expanding'
)

cv_metrics2 = validator2.run(y_train[['total']], X_train)
print(f"CV Results (avg across {validator2.n_splits} folds):")
print(f"   RMSE={cv_metrics2['rmse']:.2f}, MAE={cv_metrics2['mae']:.2f}, MAPE={cv_metrics2['mape']:.2f}%")

print("\n" + "="*80)
print("SECONDARY EXAMPLE 2: Model Comparison with CV")
print("="*80)

xgb_model = XGBoostForecaster(horizon=horizon, n_lags=14, n_estimators=200, 
                               max_depth=5, learning_rate=0.05, random_state=42)
xgb_validator = BacktestValidator(xgb_model, n_splits=3, test_size=horizon, window_type='expanding')
xgb_cv = xgb_validator.run(y_train[['region_a']], X_train)

rf_model3 = RandomForestForecaster(horizon=horizon, n_lags=14, n_estimators=200, random_state=42)
rf_validator = BacktestValidator(rf_model3, n_splits=3, test_size=horizon, window_type='expanding')
rf_cv = rf_validator.run(y_train[['region_a']], X_train)

print("\nModel Comparison (CV on training data):")
comparison = pd.DataFrame({
    'Model': ['XGBoostForecaster', 'RandomForestForecaster'],
    'CV_RMSE': [xgb_cv['rmse'], rf_cv['rmse']],
    'CV_MAE': [xgb_cv['mae'], rf_cv['mae']],
    'CV_MAPE': [xgb_cv['mape'], rf_cv['mape']]
})
display(comparison)

In [None]:
print("="*80)
print("DriverAnalyzer: Focus on External Covariates (temp, promo)")
print("="*80)
print("\n📊 This analysis shows how EXTERNAL covariates (temp, promo) impact forecasts")
print("   Internal features like lags are excluded to focus on business drivers\n")

# Demonstrate DriverAnalyzer with AutoForecaster's selected models
for series in ["region_a", "region_b", "total"]:
    selected_model = auto.best_models_[series]
    model_type = type(selected_model).__name__
    
    print(f"\n{'-'*80}")
    print(f"Series: {series}")
    print(f"Selected Model: {model_type}")
    print(f"{'-'*80}")
    
    # DriverAnalyzer automatically detects model type and applies appropriate method
    try:
        # For linear models: use coefficients
        if isinstance(selected_model, LinearForecaster):
            # LinearForecaster needs special handling - check if it has covariates
            if hasattr(selected_model, 'feature_names') and selected_model.feature_names:
                da = DriverAnalyzer(model=selected_model, feature_names=selected_model.feature_names)
                importance = da.calculate_feature_importance(X_train, y_train[[series]], method="coefficients")
                print(f"\n✅ Covariate Importance (via coefficients):")
                print(f"   How temp and promo impact {series} predictions:")
                display(importance)
                print("\n   📈 Interpretation:")
                print(f"      • Larger absolute values = stronger impact on forecasts")
                print(f"      • Positive = increases forecast when covariate increases")
                print(f"      • Negative = decreases forecast when covariate increases")
            else:
                print(f"\n⚠️  {model_type} was fitted without covariates")
                print(f"   (Cannot analyze covariate importance)")
        
        # For tree models: could use feature importance or SHAP
        elif hasattr(selected_model, 'models') and hasattr(selected_model, 'n_lags'):
            print(f"\n✅ {model_type} is a tree-based model with covariate support")
            print(f"   (Use SHAP analysis below for detailed covariate importance)")
        
        # For other models with predict method
        elif hasattr(selected_model, 'predict'):
            print(f"\n✅ {model_type} supports predictions")
            print(f"   (DriverAnalyzer could run permutation importance to measure covariate impact)")
        
        else:
            print(f"\n⚠️  {model_type} is a statistical model without explicit covariate features")
            print(f"   (These models use historical patterns and seasonality)")
    
    except Exception as e:
        print(f"\n⚠️  Error analyzing {model_type}: {str(e)}")

print("\n" + "="*80)
print("How DriverAnalyzer Works Internally:")
print("="*80)
print("  1️⃣  Detects model type (Linear, Tree, Statistical, etc.)")
print("  2️⃣  Selects appropriate method:")
print("      • Linear → Coefficient-based importance")
print("      • Tree → Built-in feature_importances_")
print("      • Any model → Permutation importance (fallback)")
print("  3️⃣  Focuses on EXTERNAL covariates (business drivers)")
print("  4️⃣  Returns interpretable importance scores")
print("\n  ✅ Users just pass auto.best_models_[series] - DriverAnalyzer handles the rest!")
print("="*80)

# Standalone example: LinearForecaster with explicit covariate analysis
print("\n" + "="*80)
print("Example: Standalone LinearForecaster - Covariate Impact on region_a")
print("="*80)

lin1 = LinearForecaster(horizon=1)
lin1.fit(y_train[["region_a"]], X_train)
da_standalone = DriverAnalyzer(model=lin1, feature_names=['temp', 'promo'])
coef_imp = da_standalone.calculate_feature_importance(X_train, y_train[["region_a"]], method="coefficients")

print("\nCovariate Coefficient Importance (region_a):")
display(coef_imp)
print("\n📊 Interpretation:")
print("   • These coefficients show the linear relationship between covariates and forecasts")
print("   • 1 unit increase in 'temp' changes forecast by the 'temp' coefficient value")
print("   • 1 unit increase in 'promo' changes forecast by the 'promo' coefficient value")

# SHAP demonstration - shows covariate importance in tree models
print("\n" + "="*80)
print("SHAP Analysis: Covariate Importance in XGBoost Model")
print("="*80)
print("\n📊 This analysis uses SHAP to explain how temp and promo affect predictions")
print("   We fit an XGBoost model on region_a (which has strong covariate effects)")
print("   region_a = 40 + 0.10*time + 50.0*promo + 1.6*temp + noise\n")

xgb1 = XGBoostForecaster(horizon=1, n_lags=21, n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42)
xgb1.fit(y_train[["region_a"]], X_train)

# Create lag features + covariates (properly aligned)
# For predicting at time t, use: lags from t-1, t-2, ..., t-21 and covariates from time t
lag_df = pd.concat([y_train[["region_a"]].shift(l).rename(columns={"region_a": f"region_a_lag{l}"}) for l in range(1, xgb1.n_lags+1)], axis=1)
# Don't shift covariates - use them at current time to predict current target
X_train_h = pd.concat([lag_df, X_train], axis=1).dropna()

print(f"XGBoost features: 21 lags + 2 covariates (temp, promo) = 23 total features")
print(f"Feature matrix shape: {X_train_h.shape}")
print(f"Sample of covariate values - temp range: [{X_train_h['temp'].min():.2f}, {X_train_h['temp'].max():.2f}], promo sum: {X_train_h['promo'].sum()}")

# Use sample for SHAP computation
X_sample = X_train_h.sample(min(100, len(X_train_h)), random_state=42)

model = xgb1.models[0][0]
explainer = shap.Explainer(model, X_sample)
shap_values = explainer(X_sample, check_additivity=False)

# Get SHAP values as array
shap_array = shap_values.values
feature_names = list(X_sample.columns)

# Calculate mean absolute SHAP values for ranking
mean_abs_shap = np.abs(shap_array).mean(axis=0)
importance_df = pd.DataFrame({
    'feature': feature_names,
    'mean_abs_shap': mean_abs_shap
}).sort_values('mean_abs_shap', ascending=False)

print("\n" + "="*80)
print("🎯 Covariate Feature Importance (Excluding Lags):")
print("="*80)

# Filter to show ONLY external covariates
covariate_importance = importance_df[importance_df['feature'].isin(['temp', 'promo'])].copy()
covariate_importance = covariate_importance.sort_values('mean_abs_shap', ascending=False)

print("\n📊 External Covariate Importance:")
# Display with more decimal places to see precise values
covariate_importance_display = covariate_importance.copy()
covariate_importance_display['mean_abs_shap'] = covariate_importance_display['mean_abs_shap'].apply(lambda x: f"{x:.6f}")
display(covariate_importance_display)

# Calculate relative importance
if len(covariate_importance) == 2:
    temp_imp = covariate_importance[covariate_importance['feature'] == 'temp']['mean_abs_shap'].iloc[0]
    promo_imp = covariate_importance[covariate_importance['feature'] == 'promo']['mean_abs_shap'].iloc[0]
    total_cov_imp = temp_imp + promo_imp
    print(f"\n📈 Relative Covariate Impact:")
    print(f"   • temp:  {temp_imp/total_cov_imp*100:.1f}% of total covariate effect")
    print(f"   • promo: {promo_imp/total_cov_imp*100:.1f}% of total covariate effect")

# Filter SHAP values to ONLY show covariates (exclude lag features)
covariate_indices = [i for i, feat in enumerate(feature_names) if feat in ['temp', 'promo']]
X_sample_covariates = X_sample.iloc[:, covariate_indices]
shap_values_covariates = shap_values[:, covariate_indices]

# Visual: SHAP summary plot for COVARIATES ONLY
plt.figure(figsize=(10, 5))
shap.summary_plot(shap_values_covariates, X_sample_covariates, show=False)
plt.title("SHAP Summary - External Covariates ONLY (temp, promo)\n(Lag features excluded)", 
          fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

# Visual: Bar chart showing covariate importance ONLY
plt.figure(figsize=(10, 5))
shap.summary_plot(shap_values_covariates, X_sample_covariates, plot_type="bar", show=False)
plt.title("SHAP Feature Importance - Covariates ONLY\n(Focus on business drivers: temp and promo)", 
          fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n" + "="*80)
print("Key Takeaways:")
print("="*80)
print("  ✅ DriverAnalyzer focuses on EXTERNAL covariates (temp, promo)")
print("  ✅ Automatically selects appropriate importance method per model type")
print("  ✅ SHAP provides detailed feature-level explanations for tree models")
print("  ✅ Both tools help understand which business drivers impact forecasts")
print("  ✅ Lag features are internal mechanics - covariates are actionable insights")
print("="*80)

## 5) Quick Parameter Reference

**For complete documentation, see [API_REFERENCE.md](../API_REFERENCE.md)**

### AutoForecaster Parameters

| Parameter | Common Values | Description |
|-----------|--------------|-------------|
| `candidate_models` | List of models | Models to compare |
| `metric` | `'rmse'`, `'mae'`, `'mape'` | Selection metric |
| `n_splits` | `2-5` | Number of CV folds |
| `test_size` | `horizon` | Validation window size |
| `per_series_models` | `True`, `False` | Per-series selection |
| `n_jobs` | `-1` (all cores) | Parallel processing |

### Common Model Parameters

**VARForecaster:**
- `lags`: 1-21 (past time steps)
- `trend`: `'c'`, `'ct'`, `'n'`

**RandomForest/XGBoost:**
- `n_lags`: 7-30 (lag features)
- `n_estimators`: 100-500 (trees)
- `max_depth`: 5-10 (tree depth)
- `learning_rate`: 0.01-0.3 (XGBoost)

**ETSForecaster:**
- `seasonal_periods`: 7 (weekly), 12 (monthly)
- `trend`/`seasonal`: None, `'add'`, `'mul'`

**ARIMAForecaster:**
- `order`: (p, d, q) - typically p,q: 1-5, d: 0-2
- `seasonal_order`: (P, D, Q, s)

**LSTMForecaster:**
- `n_lags`: 14-60 (sequence length)
- `hidden_size`: 16-128
- `epochs`: 5-50

### Hierarchical Reconciliation

| Method | Description |
|--------|-------------|
| `'bottom_up'` | Aggregate from bottom |
| `'top_down'` | Disaggregate from top |
| `'mint_ols'` | MinT optimal (balanced) |
| `'mint_shrink'` | MinT with shrinkage |

### DriverAnalyzer Methods

- `'coefficients'`: Linear model coefficients
- `'permutation'`: Permutation importance
- `'shap'`: SHAP values (tree models)
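
For intuition about the `'permutation'` option, the sketch below shows the general technique: shuffle one feature column at a time and measure how much the prediction error grows. It is a generic, standard-library illustration on toy data shaped like `region_a`. The function, the stand-in model, and the data are assumptions, not DriverAnalyzer's implementation.

```python
import random


def permutation_importance(predict, rows, targets, n_features, n_repeats=20, seed=0):
    """Mean increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the target depends strongly on promo and more weakly on temp.
data_rng = random.Random(42)
rows = [(data_rng.uniform(12, 28), float(data_rng.random() < 0.3)) for _ in range(200)]
targets = [40 + 1.6 * temp + 50.0 * promo for temp, promo in rows]


def model(row):
    """Stand-in for a fitted model's predict method."""
    temp, promo = row
    return 40 + 1.6 * temp + 50.0 * promo


temp_imp, promo_imp = permutation_importance(model, rows, targets, n_features=2)
print(f"temp: {temp_imp:.1f}, promo: {promo_imp:.1f}")
```

Because the method only needs a `predict` callable, it applies to any model type, which is why it suits the fallback role described earlier.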

---

**📚 Complete documentation: [API_REFERENCE.md](../API_REFERENCE.md)**