# Foundation Models for Time Series Forecasting

This notebook demonstrates zero-shot forecasting with pretrained foundation models.

**Models:**
- **TimesFM 2.5** (Google): Decoder-only transformer, 200M parameters
- **Chronos 2** (Amazon): Encoder-only transformer, 120M parameters
- **Exponential Smoothing**: Traditional baseline for comparison

**Datasets:**
1. **Air Passengers**: Monthly data with clear seasonal patterns (1949-1960)
2. **Energy Load**: Hourly electricity demand with complex multi-scale seasonality

Both foundation models produce probabilistic forecasts, so every prediction comes with an uncertainty estimate. We compare their zero-shot performance against the traditional baseline.

For architecture details and training methodology, see the [Foundation Models User Guide](../docs/userguide/foundation_models.md).

## 1. Installation

Install Darts with foundation model support:

In [None]:
# Uncomment to install:
# !pip install "darts[timesfm,chronos]"
# or with uv:
# !uv pip install "darts[timesfm,chronos]"
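
Before moving on, it can help to confirm that the packages imported in the next section are present. The snippet below is a small optional check, not part of the original notebook; the package list is taken from the imports in Section 2, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Top-level packages imported in Section 2 of this notebook.
REQUIRED = ["numpy", "pandas", "matplotlib", "torch", "darts"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that each package can be located; it does not verify versions or that model weights can be downloaded.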

## 2. Imports and Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch

from darts import TimeSeries
from darts.datasets import AirPassengersDataset, EnergyDataset
from darts.models import TimesFMModel, ChronosModel, ExponentialSmoothing
from darts.metrics import mape

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Configuration
SPLIT_RATIO = 0.8
NUM_SAMPLES = 100

# Define consistent color palette for all plots (using matplotlib default colors)
TIMESFM_COLOR = 'C3'  # Red (distinct from ground truth black)
CHRONOS_COLOR = 'C1'   # Orange
EXP_COLOR = 'C2'       # Green

In [None]:
def plot_median_comparison(val, forecasts, mapes, model_names, colors, dataset_name, ylabel):
    """Plot median forecast comparison across all models."""
    fig, ax = plt.subplots(figsize=(14, 5))
    
    # Extract medians
    medians = [fc.quantile(0.5) for fc in forecasts]
    
    # Plot ground truth
    val.plot(ax=ax, label="ground truth", color='black', linewidth=2.5)
    
    # Plot each model's median
    for median, mape_val, name, color in zip(medians, mapes, model_names, colors):
        median.plot(ax=ax, label=f"{name} ({mape_val:.2f}% MAPE)", 
                   color=color, linewidth=2)
    
    ax.set_title(f"{dataset_name}: Median Forecast Comparison")
    ax.set_ylabel(ylabel)
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    return medians  # Return for later use in residual analysis


def plot_probabilistic_forecasts(val, forecasts, model_names, colors, dataset_name, ylabel):
    """Plot probabilistic forecasts with confidence intervals."""
    fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharex=True, sharey=True)
    
    for i, (fc, name, color) in enumerate(zip(forecasts, model_names, colors)):
        val.plot(ax=axes[i], label="ground truth", color='black', linewidth=2.5)
        fc.plot(ax=axes[i], label=name, color=color)
        
        # Set title based on model type
        if i == 0:
            axes[i].set_title(f"{name}: Probabilistic Forecast (Quantile Head)")
            axes[i].set_ylabel(ylabel)  # Only leftmost gets ylabel
        elif i == 1:
            axes[i].set_title(f"{name}: Probabilistic Forecast (Sampling)")
        else:
            axes[i].set_title(f"{name}: Probabilistic Forecast")
        
        axes[i].legend()
        axes[i].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()


def plot_standardized_errors(val, medians, model_names, colors, dataset_name):
    """Plot standardized forecast errors with statistics table."""
    # Calculate standardized errors
    std_errors_list = []
    for median in medians:
        errors = (val - median).values().flatten()
        std_errors = errors / errors.std()
        std_errors_list.append(std_errors)
    
    # Create boxplot
    fig, ax = plt.subplots(figsize=(10, 6))
    bp = ax.boxplot(std_errors_list, tick_labels=model_names, patch_artist=True,
                    showmeans=True, meanline=True,
                    medianprops=dict(color='black', linewidth=2),
                    meanprops=dict(color='red', linewidth=2, linestyle='--'))
    
    # Color the boxes
    for patch, color in zip(bp['boxes'], colors):
        patch.set_facecolor(color)
        patch.set_alpha(0.6)
    
    ax.axhline(0, color='black', linestyle='--', linewidth=1, alpha=0.5)
    ax.set_title(f'{dataset_name}: Standardized Forecast Errors')
    ax.set_ylabel('Standardized Error (σ)')
    ax.grid(True, alpha=0.3, axis='y')
    plt.tight_layout()
    plt.show()
    
    # Calculate and display statistics
    quantiles = [0.05, 0.15, 0.25, 0.5, 0.75, 0.85, 0.95]
    stats_data = []
    
    for errors, label in zip(std_errors_list, model_names):
        stats = {
            'Model': label,
            'Mean': f'{errors.mean():.3f}',
            'Std': f'{errors.std():.3f}',
        }
        for q in quantiles:
            stats[f'Q{int(q*100)}'] = f'{np.percentile(errors, q*100):.3f}'
        stats_data.append(stats)
    
    stats_df = pd.DataFrame(stats_data)
    print(f"\n{dataset_name} - Standardized Error Distribution Statistics:")
    print("="*100)
    print(stats_df.to_string(index=False))
    print("="*100)

## 4. Dataset 1: Air Passengers (Monthly Seasonal Patterns)

The plotting helpers defined above are reused for both datasets.

### Load and Visualize

In [None]:
# Load dataset
air_series = AirPassengersDataset().load()
air_train, air_val = air_series.split_before(SPLIT_RATIO)

print(f"Total length: {len(air_series)}")
print(f"Training: {len(air_train)} points")
print(f"Validation: {len(air_val)} points")

# Visualize with train/test split
fig, ax = plt.subplots(figsize=(12, 6))
air_series.plot(ax=ax, label="Historical", color='black', linewidth=2)

# Add train/test split line
split_time = air_train.end_time()
ax.axvline(split_time, color='red', linestyle='--', linewidth=2,
           label='Train/Test Split', alpha=0.7)

ax.set_title("Air Passengers Dataset: Train/Test Split (80/20)")
ax.set_ylabel("Passengers (thousands)")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Generate Forecasts

In [None]:
# TimesFM 2.5
print("Generating TimesFM 2.5 forecast...")
timesfm_model = TimesFMModel(
    context_length=512,
    device='auto'
)
air_timesfm_forecast = timesfm_model.predict(n=len(air_val), series=air_train, num_samples=NUM_SAMPLES)
air_timesfm_mape = mape(air_val, air_timesfm_forecast)
print(f"TimesFM 2.5 MAPE: {air_timesfm_mape:.2f}%")

In [None]:
# Chronos 2
print("Generating Chronos 2 forecast...")
chronos_model = ChronosModel(context_length=512)
air_chronos_forecast = chronos_model.predict(n=len(air_val), series=air_train, num_samples=NUM_SAMPLES)
air_chronos_mape = mape(air_val, air_chronos_forecast)
print(f"Chronos 2 MAPE: {air_chronos_mape:.2f}%")

# Model display names for plots
timesfm_name = "TimesFM 2.5 200M"
chronos_name = "Chronos 2 Base"

In [None]:
# Exponential Smoothing (Probabilistic)
print("Generating Exponential Smoothing forecast...")
exp_model = ExponentialSmoothing()
exp_model.fit(air_train)
air_exp_forecast = exp_model.predict(n=len(air_val), num_samples=NUM_SAMPLES)
air_exp_mape = mape(air_val, air_exp_forecast)
print(f"Exponential Smoothing MAPE: {air_exp_mape:.2f}%")

### Compare Forecasts

We compare the models in two ways:
1. **Median Comparison**: Quick comparison of point forecasts across all models
2. **Probabilistic Forecasts**: Per-model uncertainty shown as a shaded prediction interval around the median (Darts' `plot()` shades the 5%-95% quantile range by default)
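
To make the two views concrete, here is a minimal standard-library sketch of how a sample-based forecast reduces to a median point forecast, a MAPE score, and a central interval. The sample values are made up, and `mape_percent` and `central_interval` are hypothetical helpers for illustration, not the Darts implementations.

```python
from statistics import median, quantiles


def mape_percent(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def central_interval(samples, coverage):
    """Approximate central interval from samples via percentile cut points."""
    cuts = quantiles(samples, n=100, method="inclusive")  # cut points at 1%..99%
    lower_pct = round((1 - coverage) / 2 * 100)
    upper_pct = 100 - lower_pct
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


# Made-up sample draws: one list of samples per forecast step.
samples_per_step = [
    [98, 100, 101, 102, 104],
    [108, 110, 112, 113, 117],
]
actual = [100, 110]

# The median of the samples at each step is the point forecast.
point_forecast = [median(step) for step in samples_per_step]
print("Median forecast:", point_forecast)
print("MAPE (%):", mape_percent(actual, point_forecast))
print("90% interval, step 1:", central_interval(samples_per_step[0], coverage=0.90))
```

With only a handful of samples the interval is coarse; more samples give smoother quantile estimates.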

In [None]:
# Median Comparison - All Models
air_medians = plot_median_comparison(
    air_val,
    [air_timesfm_forecast, air_chronos_forecast, air_exp_forecast],
    [air_timesfm_mape, air_chronos_mape, air_exp_mape],
    [timesfm_name, chronos_name, 'Exp. Smoothing'],
    [TIMESFM_COLOR, CHRONOS_COLOR, EXP_COLOR],
    "Air Passengers",
    "Passengers (thousands)"
)

In [None]:
# Probabilistic Forecasts with Confidence Intervals
plot_probabilistic_forecasts(
    air_val,
    [air_timesfm_forecast, air_chronos_forecast, air_exp_forecast],
    [timesfm_name, chronos_name, 'Exp. Smoothing'],
    [TIMESFM_COLOR, CHRONOS_COLOR, EXP_COLOR],
    "Air Passengers",
    "Passengers (thousands)"
)

### Performance Summary

In [None]:
# Create performance table
air_results = pd.DataFrame({
    'Model': [timesfm_name, chronos_name, 'Exponential Smoothing'],
    'MAPE (%)': [air_timesfm_mape, air_chronos_mape, air_exp_mape]
})
air_results = air_results.sort_values('MAPE (%)')
print("\nAir Passengers Performance:")
print(air_results.to_string(index=False))

In [None]:
# Standardized Forecast Errors (Normalized Residuals)
plot_standardized_errors(
    air_val,
    air_medians,
    [timesfm_name, chronos_name, 'Exp. Smoothing'],
    [TIMESFM_COLOR, CHRONOS_COLOR, EXP_COLOR],
    "Air Passengers"
)

### Residual Analysis

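Understanding forecast errors helps identify model biases and reliability. The boxplot above shows errors (actual minus median forecast) divided by each model's own error standard deviation:
- **Centered at zero:** Good - no systematic bias
- **Symmetric distribution:** Good - balanced over/under-prediction
- **Low variance:** Good - consistent accuracy. Each model's standardized errors have unit standard deviation by construction, so compare spread across models with MAPE or the unscaled errors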
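
These checks can also be expressed numerically. The sketch below, using made-up values and a hypothetical `residual_summary` helper, computes the mean and median error (bias), the standard deviation (spread), and the skewness (asymmetry) of the errors.

```python
from statistics import mean, median, pstdev


def residual_summary(actual, predicted):
    """Summarize forecast errors, defined as actual minus predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mu = mean(errors)
    sigma = pstdev(errors)  # population std, matching NumPy's default
    if sigma > 0:
        skewness = sum((e - mu) ** 3 for e in errors) / (len(errors) * sigma ** 3)
    else:
        skewness = 0.0  # all errors identical: no asymmetry to measure
    return {"mean": mu, "median": median(errors), "std": sigma, "skewness": skewness}


actual = [10.0, 12.0, 14.0, 16.0]
biased = [9.0, 11.0, 13.0, 15.0]      # always one unit too low
balanced = [11.0, 11.0, 15.0, 15.0]   # alternates between over and under

print("Biased forecast:  ", residual_summary(actual, biased))
print("Balanced forecast:", residual_summary(actual, balanced))
```

A mean error far from zero points to systematic over- or under-prediction, while a skewness far from zero points to an asymmetric error distribution.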

## 5. Dataset 2: Energy Load (Complex Hourly Patterns)

Energy load data contains multi-scale seasonality (daily and weekly patterns), which tests how models handle overlapping temporal structure.
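
As an illustration of what multi-scale seasonality looks like numerically, the sketch below builds a synthetic hourly signal with a daily and a weekly cycle and measures its autocorrelation at a few lags. The signal is made up and is not the Energy dataset; `autocorrelation` is a hypothetical helper.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    mu = sum(values) / n
    variance_sum = sum((v - mu) ** 2 for v in values)
    covariance_sum = sum(
        (values[t] - mu) * (values[t + lag] - mu) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Synthetic load: four weeks of hourly values with daily and weekly cycles.
hours = range(24 * 7 * 4)
load = [
    math.sin(2 * math.pi * h / 24) + 0.5 * math.sin(2 * math.pi * h / 168)
    for h in hours
]

# Half a day, one day, and one week.
for lag in (12, 24, 168):
    print(f"lag {lag:>3} h: autocorrelation = {autocorrelation(load, lag):.2f}")
```

Strong positive autocorrelation at lags 24 and 168, and negative autocorrelation at lag 12, are the signatures of the daily and weekly cycles a model has to capture.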

### Load and Visualize

In [None]:
# Load dataset (single component, subset for tutorial speed)
energy = EnergyDataset().load()
energy_series = energy['total load actual'][-1000:]  # Use last 1000 points
energy_train, energy_val = energy_series.split_before(SPLIT_RATIO)

print(f"Total length: {len(energy_series)}")
print(f"Training: {len(energy_train)} points")
print(f"Validation: {len(energy_val)} points")

# Visualize with train/test split
fig, ax = plt.subplots(figsize=(12, 6))
energy_series.plot(ax=ax, label="Historical", color='black', linewidth=2)

# Add train/test split line
split_time = energy_train.end_time()
ax.axvline(split_time, color='red', linestyle='--', linewidth=2,
           label='Train/Test Split', alpha=0.7)

ax.set_title("Energy Load Dataset: Train/Test Split (80/20)")
ax.set_ylabel("Load (MW)")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Generate Forecasts

In [None]:
# TimesFM 2.5
print("Generating TimesFM 2.5 forecast...")
energy_timesfm_forecast = timesfm_model.predict(n=len(energy_val), series=energy_train, num_samples=NUM_SAMPLES)
energy_timesfm_mape = mape(energy_val, energy_timesfm_forecast)
print(f"TimesFM 2.5 MAPE: {energy_timesfm_mape:.2f}%")

In [None]:
# Chronos 2
print("Generating Chronos 2 forecast...")
energy_chronos_forecast = chronos_model.predict(n=len(energy_val), series=energy_train, num_samples=NUM_SAMPLES)
energy_chronos_mape = mape(energy_val, energy_chronos_forecast)
print(f"Chronos 2 MAPE: {energy_chronos_mape:.2f}%")

### Chronos 2 with Time Covariates

Chronos 2 is designed for **multivariate forecasting with covariates**, and its paper reports the largest improvements on tasks with exogenous features.

Energy load has strong temporal patterns:
- **Daily cycles**: Business hours vs night
- **Weekly patterns**: Weekday vs weekend  
- **Seasonal variation**: Heating (winter) vs cooling (summer)

Let's test whether adding time-based covariates (hour, day-of-week, month) unlocks Chronos 2's full potential.
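
Calendar features like these are periodic, so a common choice is to encode each one as a sine/cosine pair. The short sketch below shows why: on the raw scale, hour 23 and hour 0 look far apart, while in the encoded space they are as close as any two adjacent hours. The helper names are hypothetical.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two values after cyclic encoding."""
    return math.dist(cyclic_encode(a, period), cyclic_encode(b, period))


print("Raw gap between hour 23 and hour 0:", abs(23 - 0))
print("Encoded distance 23 -> 0:", round(encoded_distance(23, 0, 24), 4))
print("Encoded distance 0 -> 1: ", round(encoded_distance(0, 1, 24), 4))
print("Encoded distance 0 -> 12:", round(encoded_distance(0, 12, 24), 4))
```

Two columns per feature are needed because a single sine (or cosine) maps two different hours to the same value.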

In [None]:
def create_time_covariates(series: TimeSeries) -> TimeSeries:
    """
    Create time-based covariates from series timestamps.
    
    Features:
    - hour_sin, hour_cos: Cyclic encoding of hour (0-23)
    - dow_sin, dow_cos: Cyclic encoding of day of week (0-6)
    - month_sin, month_cos: Cyclic encoding of month (1-12)
    
    Cyclic encoding ensures continuity (hour 23 → 0, Dec → Jan).
    """
    df = series.to_dataframe()
    index = df.index
    
    # Hour of day (0-23) - Daily cycles
    hour = index.hour
    df['hour_sin'] = np.sin(2 * np.pi * hour / 24)
    df['hour_cos'] = np.cos(2 * np.pi * hour / 24)
    
    # Day of week (0=Monday, 6=Sunday) - Weekly patterns
    dow = index.dayofweek
    df['dow_sin'] = np.sin(2 * np.pi * dow / 7)
    df['dow_cos'] = np.cos(2 * np.pi * dow / 7)
    
    # Month (1-12) - Seasonal variation
    month = index.month
    df['month_sin'] = np.sin(2 * np.pi * month / 12)
    df['month_cos'] = np.cos(2 * np.pi * month / 12)
    
    # Keep only covariate columns
    df = df.drop(columns=df.columns[0])
    
    return TimeSeries.from_dataframe(df)

# Create covariates for full series (train + validation)
energy_covariates = create_time_covariates(energy_series)

# Split to match train/val split
energy_cov_train = energy_covariates[:len(energy_train)]     # Past covariates
energy_cov_future = energy_covariates[len(energy_train):]    # Future covariates

print(f"Created covariates: {energy_covariates.components.tolist()}")
print(f"Covariate train: {len(energy_cov_train)} points")
print(f"Covariate future: {len(energy_cov_future)} points")

In [None]:
# Chronos 2 WITH time covariates
print("Generating Chronos 2 forecast WITH time covariates...")
energy_chronos_cov_forecast = chronos_model.predict(
    n=len(energy_val),
    series=energy_train,
    past_covariates=energy_cov_train,      # Historical temporal features
    future_covariates=energy_cov_future,   # Known future temporal features  
    num_samples=NUM_SAMPLES
)
energy_chronos_cov_mape = mape(energy_val, energy_chronos_cov_forecast)
print(f"Chronos 2 (with covariates): {energy_chronos_cov_mape:.2f}% MAPE")

# Calculate improvement
improvement = energy_chronos_mape - energy_chronos_cov_mape
improvement_pct = (improvement / energy_chronos_mape) * 100
print(f"\nImprovement from covariates: {improvement:.2f} percentage points of MAPE ({improvement_pct:.1f}% relative reduction)")

In [None]:
# Exponential Smoothing (Probabilistic)
print("Generating Exponential Smoothing forecast...")
exp_model_energy = ExponentialSmoothing()
exp_model_energy.fit(energy_train)
energy_exp_forecast = exp_model_energy.predict(n=len(energy_val), num_samples=NUM_SAMPLES)
energy_exp_mape = mape(energy_val, energy_exp_forecast)
print(f"Exponential Smoothing MAPE: {energy_exp_mape:.2f}%")

### Compare Forecasts

We compare the models in two ways:
1. **Median Comparison**: Quick comparison of point forecasts across all models
2. **Probabilistic Forecasts**: Per-model uncertainty shown as a shaded prediction interval around the median (Darts' `plot()` shades the 5%-95% quantile range by default)

In [None]:
# Median Comparison - All Models
energy_medians = plot_median_comparison(
    energy_val,
    [energy_timesfm_forecast, energy_chronos_forecast, energy_exp_forecast],
    [energy_timesfm_mape, energy_chronos_mape, energy_exp_mape],
    [timesfm_name, chronos_name, 'Exp. Smoothing'],
    [TIMESFM_COLOR, CHRONOS_COLOR, EXP_COLOR],
    "Energy Load",
    "Load (MW)"
)

In [None]:
# Probabilistic Forecasts with Confidence Intervals
plot_probabilistic_forecasts(
    energy_val,
    [energy_timesfm_forecast, energy_chronos_forecast, energy_exp_forecast],
    [timesfm_name, chronos_name, 'Exp. Smoothing'],
    [TIMESFM_COLOR, CHRONOS_COLOR, EXP_COLOR],
    "Energy Load",
    "Load (MW)"
)

In [None]:
# Standardized Forecast Errors (Normalized Residuals)
plot_standardized_errors(
    energy_val,
    energy_medians,
    [timesfm_name, chronos_name, 'Exp. Smoothing'],
    [TIMESFM_COLOR, CHRONOS_COLOR, EXP_COLOR],
    "Energy Load"
)

### Residual Analysis

Understanding standardized forecast errors helps identify model biases and reliability:

**Boxplot interpretation:**
- **Median near zero**: No systematic bias
- **Symmetric distribution**: Balanced over/under-prediction
- **Tight IQR (box)**: Consistent accuracy
- **Few outliers**: Robust to unusual patterns
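
The quantities behind a boxplot are easy to compute directly. The sketch below uses made-up errors and a hypothetical `boxplot_stats` helper; it assumes the conventional 1.5 x IQR whisker rule, which is also Matplotlib's default.

```python
from statistics import quantiles


def boxplot_stats(errors, whisker=1.5):
    """Median, IQR, and outliers under the 1.5 x IQR whisker convention."""
    q1, q2, q3 = quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = [e for e in errors if e < lower_fence or e > upper_fence]
    return {"median": q2, "iqr": iqr, "outliers": outliers}


# Made-up standardized errors with one unusually large miss.
errors = [-1.0, -0.5, 0.0, 0.5, 1.0, 4.0]
print(boxplot_stats(errors))
```

A median away from zero indicates bias, a wide IQR indicates inconsistent errors, and points beyond the fences are the outliers drawn as individual markers.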

**Comparison across models:**
If TimesFM's box is much tighter than Chronos 2's, one possible explanation is exposure to similar grid load patterns during pretraining. This notebook cannot confirm that, since such data may not be listed in the documented training sets.

### Performance Summary

In [None]:
# Create performance table
energy_results = pd.DataFrame({
    'Model': [
        timesfm_name,
        f'{chronos_name} (with covariates)',
        chronos_name,
        'Exponential Smoothing'
    ],
    'MAPE (%)': [
        energy_timesfm_mape,
        energy_chronos_cov_mape,
        energy_chronos_mape,
        energy_exp_mape
    ]
})
energy_results = energy_results.sort_values('MAPE (%)')
print("\nEnergy Load Performance:")
print(energy_results.to_string(index=False))

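# Relative change in MAPE from adding covariates (positive = improvement)
improvement_pct = ((energy_chronos_mape - energy_chronos_cov_mape) / energy_chronos_mape) * 100
direction = "improved" if improvement_pct > 0 else "worsened"
print(f"\nChronos 2 MAPE {direction} by {abs(improvement_pct):.1f}% (relative) with time covariates.")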

## 6. Performance Summary Across All Datasets

Comparing model performance across different data patterns:

In [None]:
# Create comprehensive comparison table
summary = pd.DataFrame({
    'Dataset': ['Air Passengers', 'Energy Load'],
    f'{timesfm_name} MAPE (%)': [air_timesfm_mape, energy_timesfm_mape],
    f'{chronos_name} MAPE (%)': [air_chronos_mape, energy_chronos_mape],
    'Exp. Smoothing MAPE (%)': [air_exp_mape, energy_exp_mape]
})

print("\n" + "="*70)
print("COMPREHENSIVE PERFORMANCE COMPARISON")
print("="*70)
print(summary.to_string(index=False))
print("="*70)

# Calculate average performance
avg_timesfm = summary[f'{timesfm_name} MAPE (%)'].mean()
avg_chronos = summary[f'{chronos_name} MAPE (%)'].mean()
avg_exp = summary['Exp. Smoothing MAPE (%)'].mean()

print("\nAverage MAPE Across Datasets:")
print(f"  {timesfm_name}:          {avg_timesfm:.2f}%")
print(f"  {chronos_name}:            {avg_chronos:.2f}%")
print(f"  Exp. Smoothing:       {avg_exp:.2f}%")
print("="*70)

## 7. Key Takeaways

**What We Learned:**

1. **Zero-Shot Power**: Foundation models work immediately without training
   - TimesFM 2.5 and Chronos 2 deliver competitive accuracy out-of-the-box
   - No hyperparameter tuning required
   - Ideal for rapid prototyping and cold-start scenarios

2. **Model Specializations**:
   - **TimesFM 2.5**: Fast, optimized for univariate forecasting with native quantile head
   - **Chronos 2**: Benefits from time covariates on complex temporal patterns
   - **Traditional models**: Remain valuable for explainability and simple patterns

3. **Probabilistic Forecasting**: Both foundation models provide uncertainty quantification
   - TimesFM 2.5: mean plus 9 quantiles (0.1-0.9) via native quantile head
   - Chronos 2: 21 quantiles predicted directly by the model

4. **Practical Trade-offs**:
   - Foundation models require more memory (120M-200M parameters for the models used here)
   - Inference time varies by model and hardware; this notebook does not benchmark it
   - Complex patterns often favor foundation models over traditional approaches
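
The memory trade-off can be estimated with simple arithmetic: parameter count times bytes per parameter. The sketch below assumes 32-bit floats by default and counts only the weights, so actual usage (activations, framework overhead, other precisions) will differ. `weight_memory_mb` is a hypothetical helper.

```python
def weight_memory_mb(num_parameters, bytes_per_parameter=4):
    """Approximate memory for model weights in megabytes (1 MB = 10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


# Parameter counts as listed at the top of this notebook.
for name, params in [("TimesFM 2.5", 200_000_000), ("Chronos 2", 120_000_000)]:
    fp32 = weight_memory_mb(params)
    fp16 = weight_memory_mb(params, bytes_per_parameter=2)
    print(f"{name}: ~{fp32:.0f} MB in float32, ~{fp16:.0f} MB in float16")
```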

### Resources

- **TimesFM Paper**: [Das et al., ICML 2024](https://arxiv.org/abs/2310.10688)
- **Chronos Paper** (original Chronos): [Ansari et al., 2024](https://arxiv.org/abs/2403.07815)
- **Darts Documentation**: [User Guide](https://unit8co.github.io/darts/)
- **Foundation Models Guide**: [Architecture Details](../docs/userguide/foundation_models.md)

**Next Steps**: Experiment with your own data, try different context lengths, and explore ensemble approaches.
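
One simple starting point for ensembling is an equal-weight average of the models' median forecasts. The sketch below uses made-up numbers and a hypothetical `average_forecasts` helper; it is not a Darts API, and a weighted or learned combination may work better in practice.

```python
def average_forecasts(forecasts):
    """Equal-weight average of several point forecasts, step by step."""
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("All forecasts must cover the same horizon.")
    return [sum(step) / len(step) for step in zip(*forecasts)]


# Made-up median forecasts from two models over a 3-step horizon.
model_a = [100.0, 110.0, 120.0]
model_b = [104.0, 106.0, 124.0]

ensemble = average_forecasts([model_a, model_b])
print("Ensemble forecast:", ensemble)
```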