# Multivariate Forecasting with Foundation Models

Using Chronos-2 for Economic and Financial Forecasts

## Research Questions

1. Do multivariate (MV) methods produce better predictions than univariate (UV) ones when foundation models (FMs; here, transformer-based models) are used for both?

2. Is MV forecasting accuracy better for stocks or for interest rates?

3. Is MV forecasting better when both stocks and interest rates are forecast together?

4. Can we build a large-scale "world" forecasting model?

---

**Implementation per README specifications**

## Setup and Imports

In [None]:
import pandas as pd
import numpy as np
import torch
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from IPython.display import display  # also available implicitly inside Jupyter

from data_loader import DataLoader
from chronos_experiment_runner import ChronosExperimentRunner
from experiment_config import ExperimentConfig, TEST_CONFIG, DEFAULT_CONFIG
from metrics_calculator import MetricsCalculator

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## Step 1: Download Data

Per the README: create a script that downloads the data so it can be re-run and updated.

**Data series:**
1. Stock data: Magnificent-7 (K=7)
2. Interest rates: FRED Constant Maturity rates (K=10)
3. Both combined (K=17)
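Stock prices and FRED yields follow slightly different calendars: the U.S. bond market typically closes on Columbus Day and Veterans Day while the stock exchanges stay open, so FRED has no value for those dates. Below is a minimal sketch of one way to build the combined K=17 panel, assuming an inner join on the dates both sources report. `download_combined` may instead forward-fill or use an outer join; `align_on_common_dates` is a hypothetical helper and the values are toy numbers, not market data.

```python
from datetime import date

def align_on_common_dates(*panels):
    """Inner-join several {date: {series: value}} panels on shared dates.

    Hypothetical helper: keeps only dates present in every panel, so the
    combined panel has no gaps (at the cost of dropping some observations).
    """
    common = set(panels[0])
    for panel in panels[1:]:
        common &= set(panel)
    combined = {}
    for d in sorted(common):
        row = {}
        for panel in panels:
            row.update(panel[d])  # merge this source's series for date d
        combined[d] = row
    return combined

# Toy values for illustration only (not real prices or yields).
stocks = {
    date(2024, 10, 11): {"AAPL": 1.0, "NVDA": 2.0},
    date(2024, 10, 14): {"AAPL": 1.1, "NVDA": 2.1},  # Columbus Day: stocks trade
    date(2024, 10, 15): {"AAPL": 1.2, "NVDA": 2.2},
}
rates = {
    date(2024, 10, 11): {"DGS10": 4.1},
    date(2024, 10, 15): {"DGS10": 4.0},  # no rates row on Columbus Day
}

combined = align_on_common_dates(stocks, rates)
print(sorted(combined))
print(combined[date(2024, 10, 15)])
```

An inner join keeps the panel gap-free but drops a handful of days per year; forward-filling rates onto stock trading days is a common alternative.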

In [None]:
# Initialize data loader
loader = DataLoader(data_dir="data")

# Download all datasets
print("Downloading data from 2000-01-01 to present...\n")

stocks_df = loader.download_stocks(start_date="2000-01-01")
rates_df = loader.download_interest_rates(start_date="2000-01-01")
combined_df = loader.download_combined(start_date="2000-01-01")

print("\n" + "="*60)
print("DATA SUMMARY")
print("="*60)
print(f"Stocks: {stocks_df.shape[0]} rows, {stocks_df.shape[1]-1} series (K=7)")
print(f"Interest Rates: {rates_df.shape[0]} rows, {rates_df.shape[1]-1} series (K=10)")
print(f"Combined: {combined_df.shape[0]} rows, {combined_df.shape[1]-1} series (K=17)")

In [None]:
# Preview data
print("\nStocks (Magnificent-7):")
display(stocks_df.head())

print("\nInterest Rates (FRED):")
display(rates_df.head())

print("\nCombined Dataset:")
display(combined_df.head())

## Step 2: Single Test Experiment

Per README Section 5:
- Set n=252 (1 year of trading days)
- Set m=21 (1 month forecast)
- Choose date t = 03/31/2025

Test with a single stock (NVDA) to verify that the pipeline runs end to end.
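Each experiment uses the n observations up to and including the forecast origin t as context and scores the next m observations. Below is a minimal sketch of that split, assuming observations are indexed by trading day and that the origin itself belongs to the history. `split_window` is a hypothetical helper, not part of `ChronosExperimentRunner`. Origins without n points of history or m points of future are skipped, which matters early in the sample (META, for example, only listed in 2012).

```python
from bisect import bisect_right
from datetime import date, timedelta

def split_window(dates, values, origin, n, m):
    """Split a sorted series at `origin` into (history, future), or return None.

    history: the last n values dated on or before the origin (model context)
    future:  the next m values dated after the origin (forecast targets)
    """
    cut = bisect_right(dates, origin)  # number of observations dated <= origin
    if cut < n or len(dates) - cut < m:
        return None  # not enough history or not enough future to score
    return values[cut - n:cut], values[cut:cut + m]

# Toy daily series with values 0..29 (illustration only, not market data).
start = date(2025, 3, 1)
dates = [start + timedelta(days=i) for i in range(30)]
values = list(range(30))

hist, fut = split_window(dates, values, origin=date(2025, 3, 20), n=5, m=3)
print(hist, fut)
print(split_window(dates, values, origin=date(2025, 3, 2), n=5, m=3))  # too little history
```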

In [None]:
# Initialize experiment runner with test configuration
test_runner = ChronosExperimentRunner(config=TEST_CONFIG)

# Prepare data for single test
test_date = datetime(2025, 3, 31)
n = 252  # 1 year history
m = 21   # 1 month forecast

print("\nTest Experiment Configuration:")
print(f"  Target date: {test_date.strftime('%Y-%m-%d')}")
print(f"  History length (n): {n} days")
print(f"  Forecast horizon (m): {m} days")
print("  Test series: NVDA")

In [None]:
# Run single test experiment
stocks_df['item_id'] = 'stocks'

result = test_runner.run_single_experiment(
    df=stocks_df,
    target_date=test_date,
    n=n,
    m=m,
    series_name='NVDA',
    dataset_type='stocks'
)

if result:
    print("\n" + "="*60)
    print("TEST EXPERIMENT RESULTS")
    print("="*60)
    print("\nUnivariate (UV) Metrics:")
    print(f"  RMSE: {result['uv_rmse']:.4f}")
    print(f"  MAPE: {result['uv_mape']:.2f}%")
    print("\nMultivariate (MV) Metrics:")
    print(f"  RMSE: {result['mv_rmse']:.4f}")
    print(f"  MAPE: {result['mv_mape']:.2f}%")
    print("\nComparison:")
    print(f"  RMSE Improvement: {result['rmse_improvement_pct']:.2f}%")
    print(f"  MAPE Improvement: {result['mape_improvement_pct']:.2f}%")
    print(f"  MV Better (MAPE): {result['mv_better_mape']}")
else:
    print("Test experiment failed - check data availability")
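
The README's exact formulas are not reproduced in this notebook. Below is a minimal sketch assuming the standard definitions, RMSE = sqrt(mean((y - ŷ)²)) and MAPE = 100 · mean(|y - ŷ| / |y|), with "improvement" taken as the relative reduction of the MV error versus the UV error (positive means MV is better). `MetricsCalculator` may define these differently, for instance with the opposite sign convention; the function names here are hypothetical.

```python
import math

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, pred)) / len(actual)

def improvement_pct(uv_err, mv_err):
    """Relative error reduction of MV over UV (assumed sign: positive = MV better)."""
    return 100 * (uv_err - mv_err) / uv_err

# Toy forecasts: UV misses by 10 each step, MV by 5.
actual = [100, 100, 100, 100]
uv = [110, 90, 110, 90]
mv = [105, 95, 105, 95]

print(rmse(actual, uv), rmse(actual, mv))
print(mape(actual, uv), mape(actual, mv))
print(improvement_pct(rmse(actual, uv), rmse(actual, mv)))
```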

In [None]:
# Visualize test results
if result:
    fig, ax = plt.subplots(figsize=(14, 6))
    
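    # Approximate the m forecast dates with pandas business days (Mon-Fri). This ignores
    # exchange holidays, so labels can drift by a day; dates taken from the data would be exact.
    timestamps = pd.date_range(start=test_date, periods=m+1, freq='B')[1:]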
    
    ax.plot(timestamps, result['actual_values'], 'ko-', label='Actual', linewidth=2, markersize=6)
    ax.plot(timestamps, result['uv_predictions'], 'b^--', label='UV Forecast', linewidth=2, markersize=6)
    ax.plot(timestamps, result['mv_predictions'], 'rs--', label='MV Forecast', linewidth=2, markersize=6)
    
    ax.set_xlabel('Date', fontsize=12)
    ax.set_ylabel('NVDA Stock Price ($)', fontsize=12)
    ax.set_title(f'Test Experiment: NVDA Forecast (n={n}, m={m})', fontsize=14, fontweight='bold')
    ax.legend(fontsize=11)
    ax.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    print("✓ Test experiment completed successfully!")
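
The plot above labels forecast steps with `freq='B'` (Monday to Friday), which ignores exchange holidays. For t = 03/31/2025 and m = 21, the 21 weekdays after t run through April 29, but NYSE was closed on Good Friday (April 18, 2025), so the 21st trading day is April 30 and labels after April 17 are shifted by one day. Below is a minimal sketch of counting trading days instead, assuming a caller-supplied holiday set; only Good Friday is listed here, and in practice the dates are best read from the downloaded data itself. `next_trading_days` is a hypothetical helper.

```python
from datetime import date, timedelta

def next_trading_days(origin, m, holidays=frozenset()):
    """Return the m weekdays after `origin` that are not in `holidays`."""
    days, d = [], origin
    while len(days) < m:
        d += timedelta(days=1)
        if d.weekday() < 5 and d not in holidays:  # Mon=0 ... Fri=4
            days.append(d)
    return days

t = date(2025, 3, 31)
weekdays_only = next_trading_days(t, 21)
with_holiday = next_trading_days(t, 21, holidays={date(2025, 4, 18)})  # Good Friday
print(weekdays_only[-1], with_holiday[-1])
```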

## Step 3: Experiment 1 - Magnificent-7 Stocks (K=7)

Per README:
- Vary n: α = {0.5, 1, 2, 3}, where n = α × 252
- Vary m: {21, 63} (1 month, 3 months)
- Time period: 01/01/2000 to 09/30/2025, with forecast origins rolled forward monthly
- Forecast all 7 stocks
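
The sweep is a small Cartesian grid of (n, m) pairs evaluated at each forecast origin. Below is a minimal sketch, assuming n is rounded to whole trading days and that "monthly rolling" means one origin at each calendar month-end. The helper `month_end_origins` is hypothetical, and `ExperimentConfig` may generate origins differently (for example, on the last trading day of each month).

```python
import calendar
from datetime import date
from itertools import product

TRADING_DAYS_PER_YEAR = 252
alphas = [0.5, 1, 2, 3]
horizons = [21, 63]  # ~1 month and ~3 months of trading days

# n = alpha * 252, rounded to a whole number of trading days.
grid = [(round(a * TRADING_DAYS_PER_YEAR), m) for a, m in product(alphas, horizons)]

def month_end_origins(start, end, step_months=1):
    """Calendar month-ends between start and end (inclusive), every step_months."""
    origins = []
    year, month = start.year, start.month
    while True:
        d = date(year, month, calendar.monthrange(year, month)[1])
        if d > end:
            break
        if d >= start:
            origins.append(d)
        month += step_months
        # Roll overflow months (13, 14, ...) into the following year.
        year, month = year + (month - 1) // 12, (month - 1) % 12 + 1
    return origins

origins = month_end_origins(date(2000, 1, 1), date(2025, 9, 30))
print(grid)
print(len(origins), origins[0], origins[-1])
```

With a 2000 start, the largest context (α = 3, n = 756) cannot be filled until roughly three years of history exist, so early origins drop out for that setting.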

In [None]:
# Initialize runner with full configuration
runner = ChronosExperimentRunner(config=DEFAULT_CONFIG)

# Prepare stocks data
stocks_df['item_id'] = 'stocks'
stock_series = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'NVDA']

print("\nExperiment 1: Magnificent-7 Stocks")
print("="*60)
print(f"Series: {stock_series}")
print(f"Alpha values: {DEFAULT_CONFIG.alpha_values}")
print(f"Forecast horizons: {DEFAULT_CONFIG.forecast_horizons}")
print(f"Time period: {DEFAULT_CONFIG.start_date} to {DEFAULT_CONFIG.end_date}")
print(f"Rolling step: {DEFAULT_CONFIG.step_months} month(s)")

In [None]:
# Run experiments for stocks
# WARNING: the full sweep (DEFAULT_CONFIG, all 7 stocks) takes a long time,
# so this cell runs a small test subset first:
test_stocks = ['AAPL', 'NVDA']  # Just 2 stocks
test_config = ExperimentConfig(
    alpha_values=[1.0],  # Just n=252
    forecast_horizons=[21],  # Just 1 month
    start_date="2024-01-01",  # Recent data only
    end_date="2024-12-31",
    step_months=3  # Quarterly instead of monthly
)

test_runner_stocks = ChronosExperimentRunner(config=test_config)
stocks_results = test_runner_stocks.run_rolling_experiments(
    df=stocks_df,
    dataset_type='stocks',
    series_names=test_stocks
)

In [None]:
# Save stocks results
if stocks_results:
    json_path, csv_path = test_runner_stocks.save_results(stocks_results, output_dir="results/stocks")
    print(f"\n✓ Saved {len(stocks_results)} stock experiment results")

In [None]:
# Analyze stocks results
if stocks_results:
    df_stocks = pd.DataFrame(stocks_results)
    
    print("\n" + "="*60)
    print("STOCKS EXPERIMENT SUMMARY")
    print("="*60)
    
    # Overall MV win rate
    mv_win_rate = (df_stocks['mv_better_mape'].sum() / len(df_stocks)) * 100
    print(f"\nMV Win Rate (MAPE): {mv_win_rate:.1f}%")
    
    # Average improvements
    avg_mape_improvement = df_stocks['mape_improvement_pct'].mean()
    avg_rmse_improvement = df_stocks['rmse_improvement_pct'].mean()
    print(f"Average MAPE Improvement: {avg_mape_improvement:.2f}%")
    print(f"Average RMSE Improvement: {avg_rmse_improvement:.2f}%")
    
    # By series
    print("\nBy Series:")
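    series_summary = df_stocks.groupby('series').agg({
        'mv_better_mape': 'mean',  # fraction of windows where MV wins
        'mape_improvement_pct': 'mean',
        'uv_mape': 'mean',
        'mv_mape': 'mean'
    })
    series_summary['mv_better_mape'] *= 100  # fraction -> percent, matching the overall win rate
    series_summary = series_summary.round(2)
    series_summary.columns = ['MV Win Rate %', 'Avg MAPE Improvement %', 'Avg UV MAPE', 'Avg MV MAPE']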
    display(series_summary)

## Step 4: Experiment 2 - Interest Rates (K=10)

The same parameter sweep, applied to the 10 FRED constant-maturity Treasury yields.

In [None]:
# Prepare rates data
rates_df['item_id'] = 'rates'
rate_series = ['DGS3MO', 'DGS6MO', 'DGS1', 'DGS2', 'DGS3', 'DGS5', 'DGS7', 'DGS10', 'DGS20', 'DGS30']

# For testing, use subset
test_rates = ['DGS1', 'DGS10', 'DGS30']

test_runner_rates = ChronosExperimentRunner(config=test_config)
rates_results = test_runner_rates.run_rolling_experiments(
    df=rates_df,
    dataset_type='rates',
    series_names=test_rates
)

In [None]:
# Save rates results
if rates_results:
    json_path, csv_path = test_runner_rates.save_results(rates_results, output_dir="results/rates")
    print(f"\n✓ Saved {len(rates_results)} rate experiment results")

In [None]:
# Analyze rates results
if rates_results:
    df_rates = pd.DataFrame(rates_results)
    
    print("\n" + "="*60)
    print("INTEREST RATES EXPERIMENT SUMMARY")
    print("="*60)
    
    mv_win_rate = (df_rates['mv_better_mape'].sum() / len(df_rates)) * 100
    print(f"\nMV Win Rate (MAPE): {mv_win_rate:.1f}%")
    
    avg_mape_improvement = df_rates['mape_improvement_pct'].mean()
    print(f"Average MAPE Improvement: {avg_mape_improvement:.2f}%")
    
    print("\nBy Maturity:")
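    series_summary = df_rates.groupby('series').agg({
        'mv_better_mape': 'mean',  # fraction of windows where MV wins
        'mape_improvement_pct': 'mean'
    })
    series_summary['mv_better_mape'] *= 100  # fraction -> percent
    series_summary = series_summary.round(2)
    series_summary.columns = ['MV Win Rate %', 'Avg MAPE Improvement %']
    display(series_summary)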
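
When comparing MAPE across stocks and rates (research question 2), note that MAPE divides by the actual value. Short-maturity Treasury yields sat near zero roughly from 2009 to 2015 and again in 2020 to 2021, so a small absolute miss can produce a very large percentage error, and averages can be dominated by a few such windows. Below is a minimal illustration with toy yields; the absolute-error and median summaries are one possible complement and are not part of the README's metric set.

```python
from statistics import mean, median

def ape(actual, pred):
    """Absolute percentage error for one point, in percent."""
    return 100 * abs(actual - pred) / abs(actual)

# Toy (actual, forecast) yields in percent: the same 0.05-point miss each time.
pairs = [(4.50, 4.55), (4.20, 4.25), (0.05, 0.10), (0.02, 0.07)]

apes = [ape(a, p) for a, p in pairs]
abs_errors = [abs(a - p) for a, p in pairs]
print([round(x, 1) for x in apes])
print(round(mean(apes), 1), round(median(apes), 1), round(mean(abs_errors), 3))
```

Identical absolute misses yield percentage errors of about 1% at a 4.5% yield but 100% to 250% near zero, so rate MAPEs are not directly comparable with stock MAPEs.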

## Step 5: Experiment 3 - Combined Dataset (K=17)

Forecast stocks and interest rates together as one K=17 multivariate panel.

In [None]:
# Prepare combined data
combined_df['item_id'] = 'combined'

# Test with subset from each category
test_combined = ['AAPL', 'NVDA', 'DGS10', 'DGS30']

test_runner_combined = ChronosExperimentRunner(config=test_config)
combined_results = test_runner_combined.run_rolling_experiments(
    df=combined_df,
    dataset_type='combined',
    series_names=test_combined
)

In [None]:
# Save combined results
if combined_results:
    json_path, csv_path = test_runner_combined.save_results(combined_results, output_dir="results/combined")
    print(f"\n✓ Saved {len(combined_results)} combined experiment results")

## Step 6: Comparative Analysis

Answer the research questions.

In [None]:
# Compare stocks vs rates vs combined
if stocks_results and rates_results and combined_results:
    print("\n" + "="*60)
    print("COMPARATIVE ANALYSIS: STOCKS vs RATES vs COMBINED")
    print("="*60)
    
    df_stocks = pd.DataFrame(stocks_results)
    df_rates = pd.DataFrame(rates_results)
    df_combined = pd.DataFrame(combined_results)
    
    comparison = pd.DataFrame({
        'Dataset': ['Stocks (K=7)', 'Rates (K=10)', 'Combined (K=17)'],
        'MV Win Rate %': [
            (df_stocks['mv_better_mape'].sum() / len(df_stocks)) * 100,
            (df_rates['mv_better_mape'].sum() / len(df_rates)) * 100,
            (df_combined['mv_better_mape'].sum() / len(df_combined)) * 100
        ],
        'Avg MAPE Improvement %': [
            df_stocks['mape_improvement_pct'].mean(),
            df_rates['mape_improvement_pct'].mean(),
            df_combined['mape_improvement_pct'].mean()
        ],
        'Avg UV MAPE': [
            df_stocks['uv_mape'].mean(),
            df_rates['uv_mape'].mean(),
            df_combined['uv_mape'].mean()
        ],
        'Avg MV MAPE': [
            df_stocks['mv_mape'].mean(),
            df_rates['mv_mape'].mean(),
            df_combined['mv_mape'].mean()
        ]
    })
    
    display(comparison.round(2))
    
    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # MV Win Rate
    axes[0].bar(comparison['Dataset'], comparison['MV Win Rate %'], color=['#1f77b4', '#ff7f0e', '#2ca02c'])
    axes[0].set_ylabel('MV Win Rate (%)', fontsize=12)
    axes[0].set_title('MV Win Rate by Dataset', fontsize=13, fontweight='bold')
    axes[0].axhline(y=50, color='r', linestyle='--', alpha=0.5, label='50% baseline')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # MAPE Improvement
    axes[1].bar(comparison['Dataset'], comparison['Avg MAPE Improvement %'], color=['#1f77b4', '#ff7f0e', '#2ca02c'])
    axes[1].set_ylabel('Avg MAPE Improvement (%)', fontsize=12)
    axes[1].set_title('Average MAPE Improvement by Dataset', fontsize=13, fontweight='bold')
    axes[1].axhline(y=0, color='r', linestyle='--', alpha=0.5, label='No improvement')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
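
A win rate above the 50% line is only informative if it is unlikely under chance. Below is a minimal sketch of an exact two-sided sign (binomial) test on the number of MV wins, using only the standard library. Rolling windows overlap and the series are correlated, so the experiments are not independent and the resulting p-value should be read as optimistic. `sign_test_p_value` is a hypothetical helper.

```python
from math import comb

def sign_test_p_value(wins, trials):
    """Exact two-sided binomial test of H0: P(MV wins) = 0.5.

    Sums the probabilities of all outcomes at least as unlikely as `wins`.
    """
    probs = [comb(trials, k) / 2 ** trials for k in range(trials + 1)]
    observed = probs[wins]
    # Small relative tolerance so the mirror-image tail (equal probability) is included.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-12)))

print(round(sign_test_p_value(8, 10), 4))  # 8 MV wins out of 10 experiments
print(round(sign_test_p_value(5, 10), 4))  # exactly 50%
```

To apply it to a dataset, pass the number of MV wins and the number of experiments, for example `sign_test_p_value(int(df_stocks['mv_better_mape'].sum()), len(df_stocks))`.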

## Step 7: Research Question Answers

Answer each question from the experimental results above.

### 1. Do MV methods produce better predictions than UV with foundation models?

[Fill in based on results]

### 2. Is MV forecasting accuracy better for stocks versus interest rates?

[Fill in based on results]

### 3. Is MV forecasting better when both stocks and interest rates are forecast together?

[Fill in based on results]

### 4. Can we build a large-scale "world" forecasting model?

[Fill in based on results]

## Step 8: Time Period Analysis (Pre vs Post 2023)

Per the README: check whether forecast errors increase after 2023 (a potential training-data cutoff for Chronos-2).

In [None]:
# Analyze performance before and after 2023
if stocks_results:
    df_stocks = pd.DataFrame(stocks_results)
    df_stocks['target_date'] = pd.to_datetime(df_stocks['target_date'])
    df_stocks['year'] = df_stocks['target_date'].dt.year
    
    pre_2023 = df_stocks[df_stocks['year'] < 2023]
    post_2023 = df_stocks[df_stocks['year'] >= 2023]
    
    if len(pre_2023) > 0 and len(post_2023) > 0:
        print("\n" + "="*60)
        print("TIME PERIOD ANALYSIS: Pre-2023 vs Post-2023")
        print("="*60)
        
        time_comparison = pd.DataFrame({
            'Period': ['Pre-2023', 'Post-2023'],
            'Avg UV MAPE': [pre_2023['uv_mape'].mean(), post_2023['uv_mape'].mean()],
            'Avg MV MAPE': [pre_2023['mv_mape'].mean(), post_2023['mv_mape'].mean()],
            'MV Win Rate %': [
                (pre_2023['mv_better_mape'].sum() / len(pre_2023)) * 100,
                (post_2023['mv_better_mape'].sum() / len(post_2023)) * 100
            ]
        })
        
        display(time_comparison.round(2))
        
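        print("\nInterpretation:")
        print("Chronos-2 may have seen pre-2023 data during training. If errors rise markedly")
        print("post-2023, pre-2023 results may be inflated by data leakage (regime changes can")
        print("also raise post-2023 errors, so treat this as a signal, not proof).")
    else:
        print("Need results both before and after 2023; the test_config above covers only 2024.")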

## Notes

**Implementation follows README specifications:**

✓ Data download script for stocks and rates  
✓ RMSE and MAPE metrics per README formulas  
✓ Parameter sweep: α={0.5,1,2,3}, m={21,63}  
✓ Rolling monthly forecasts from 2000-2025  
✓ Single test experiment (n=252, m=21, t=03/31/2025)  
✓ Three datasets: Stocks (K=7), Rates (K=10), Combined (K=17)  
✓ Store all x and y data plus error metrics  
✓ UV vs MV comparison for all experiments  

**For full experiments:**
- Replace the test subsets with the full series lists (`stock_series`, `rate_series`, and all 17 combined series)
- Use `DEFAULT_CONFIG` (e.g. the `runner` created in Step 3) instead of `test_config`
- Expect a long runtime (hours to days, depending on hardware)