# Theta Method Forecasting

The **Theta method**, introduced by Assimakopoulos & Nikolopoulos (2000), decomposes a time series into **trend + curvature** components, combining a linear extrapolation with a smoothed component. Despite its simplicity, it won the M3 forecasting competition and remains a robust baseline.

## Key Intuition

The Theta method works by:
1. **Decomposing** the series into "theta-lines" that amplify or dampen curvature
2. **Forecasting** each theta-line separately (linear trend for θ=0, SES for θ=2)
3. **Combining** the forecasts to leverage both trend stability and adaptive smoothing

---

## 1. Mathematical Foundation

### Theta Decomposition

The core idea is to transform the original series $y_t$ into **theta-lines** $Z_\theta(t)$ that modify the curvature:

$$Z_\theta(t) = \theta \cdot y_t + (1-\theta) \cdot L_t$$

where:
- $y_t$ is the original series
- $L_t$ is the **linear trend** fitted via OLS: $L_t = a + bt$
- $\theta$ controls curvature amplification
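A small worked example with hypothetical numbers: suppose that at some time $t$ the fitted trend is $L_t = 120$ and the series sits above it at $y_t = 130$. Then

$$Z_0(t) = 120, \qquad Z_1(t) = 130, \qquad Z_2(t) = 2(130) - 120 = 140,$$

so the θ=2 line doubles the deviation from the trend (from 10 to 20), while the θ=0 line removes it entirely.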

### Second Differences Property

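The key property is that a theta-line scales the second differences (the local curvature) of the series by $\theta$. The second difference of a linear trend is zero ($\nabla^2 L_t = 0$), so it follows directly from the definition that: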

$$\nabla^2 Z_\theta(t) = \theta \cdot \nabla^2 y_t$$

where $\nabla^2 y_t = y_t - 2y_{t-1} + y_{t-2}$ (second difference).

**Interpretation:**
- When $\theta = 0$: $Z_0(t) = L_t$ (pure linear trend, no curvature)
- When $\theta = 1$: $Z_1(t) = y_t$ (original series, unchanged)
- When $\theta = 2$: $Z_2(t) = 2y_t - L_t$ (doubled curvature, amplified seasonality)
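Here is a quick numerical check of the second-difference property and the special cases above, run on a synthetic series. The data and the helper names `theta_line` and `second_diff` are illustrative. `np.polyfit(..., deg=1)` computes the same OLS trend as the formula for $L_t$, and `np.diff(x, n=2)` computes $\nabla^2$:

```python
import numpy as np


def theta_line(y: np.ndarray, theta: float) -> tuple[np.ndarray, np.ndarray]:
    """Return (Z_theta, L) with Z_theta = theta * y + (1 - theta) * L and L the OLS trend."""
    t = np.arange(1, len(y) + 1)
    slope, intercept = np.polyfit(t, y, deg=1)  # coefficients, highest degree first
    L = intercept + slope * t
    return theta * y + (1 - theta) * L, L


def second_diff(x: np.ndarray) -> np.ndarray:
    """Second differences x_t - 2 x_{t-1} + x_{t-2}."""
    return np.diff(x, n=2)


# Illustrative synthetic series: linear trend + 12-period cycle + noise
rng = np.random.default_rng(0)
t = np.arange(1, 61)
y = 100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)

for theta in (0.0, 1.0, 2.0):
    Z, L = theta_line(y, theta)
    # Second differences of Z_theta equal theta times those of y (up to rounding)
    print(theta, np.allclose(second_diff(Z), theta * second_diff(y)))
```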

### Theta Lines Visualization Concept

| Theta Value | Line Name | Curvature | Use Case |
|-------------|-----------|-----------|----------|
| $\theta = 0$ | Trend Line | None (linear) | Long-term direction |
| $\theta = 1$ | Original | Normal | Baseline |
| $\theta = 2$ | Amplified | 2× original | Captures short-term patterns |

### Standard Theta Method (θ=0 and θ=2)

The classic Theta method uses only two lines:

1. **θ=0 line (trend):** Forecast using linear extrapolation
   $$\hat{Z}_0(t+h) = a + b(t+h)$$

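2. **θ=2 line (amplified):** Forecast using **Simple Exponential Smoothing (SES)**, which gives a flat forecast at the final smoothed level $\ell_t$:
   $$\hat{Z}_2(t+h) = \ell_t, \qquad \ell_t = \alpha Z_2(t) + (1-\alpha)\,\ell_{t-1}$$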
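The SES recursion behind $\hat{Z}_2$ fits in a few lines. This is a minimal sketch that assumes the level is initialized at the first observation; other initializations, such as estimating $\ell_0$, are also common. The names `ses_final_level` and `ses_flat_forecast` are illustrative:

```python
import numpy as np


def ses_final_level(z: np.ndarray, alpha: float) -> float:
    """Final SES level: l_1 = z_1, then l_t = alpha * z_t + (1 - alpha) * l_{t-1}."""
    level = float(z[0])
    for value in z[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def ses_flat_forecast(z: np.ndarray, alpha: float, h: int) -> np.ndarray:
    """SES forecasts are flat: every horizon 1..h receives the final level."""
    return np.full(h, ses_final_level(z, alpha))


z2 = np.array([10.0, 12.0, 11.0, 13.0])  # toy values standing in for a Z_2 line
print(ses_flat_forecast(z2, alpha=0.5, h=3))  # [12. 12. 12.]
```

Unrolling the recursion shows that $\ell_t$ is an exponentially weighted average of past values. The observation $j$ steps back gets weight $\alpha(1-\alpha)^j$.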

### Combination Forecast

The final forecast is the **simple average** of both theta-line forecasts:

$$\hat{y}_{t+h} = \frac{1}{2}\left[\hat{Z}_0(t+h) + \hat{Z}_2(t+h)\right]$$

This combination leverages:
- **Stability** from the linear trend (θ=0)
- **Adaptability** from the smoothed amplified series (θ=2)
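Hyndman & Billah (2003) showed that this combination is equivalent to SES with a drift of half the trend slope, $b/2$. The sketch below checks that equivalence numerically. It assumes a fixed α and an SES level initialized at the first observation, the convention used by the NumPy implementation in this notebook. The adjustment term `lag` is our own derivation for that initialization, not a formula quoted from the paper. `theta_two_ways` is a hypothetical helper:

```python
import numpy as np


def ses_level(x: np.ndarray, alpha: float) -> float:
    """Final SES level, initialized at the first observation."""
    level = float(x[0])
    for value in x[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def theta_two_ways(y: np.ndarray, h: int, alpha: float):
    n = len(y)
    t = np.arange(1, n + 1)
    b, a = np.polyfit(t, y, 1)  # OLS slope b and intercept a
    z2 = 2 * y - (a + b * t)    # theta=2 line
    steps = np.arange(1, h + 1)

    # (1) Theta method: average of the theta=0 extrapolation and the SES forecast of Z_2
    theta_fc = 0.5 * (a + b * (n + steps)) + 0.5 * ses_level(z2, alpha)

    # (2) SES on y plus drift b/2; `lag` is the amount by which SES trails a straight
    #     line after n steps when started exactly on it (expressed in units of b)
    lag = (1 - alpha) / alpha * (1 - (1 - alpha) ** (n - 1))
    drift_fc = ses_level(y, alpha) + 0.5 * b * (steps + lag)
    return theta_fc, drift_fc


rng = np.random.default_rng(1)
y = 50 + 1.5 * np.arange(1, 49) + rng.normal(0, 4, 48)
theta_fc, drift_fc = theta_two_ways(y, h=6, alpha=0.3)
print(np.allclose(theta_fc, drift_fc))  # True
```

A side effect of this equivalence is that the combined forecast is a straight line with slope $b/2$.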

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sktime.datasets import load_airline

# Reproducibility
np.random.seed(42)

y = load_airline()
y.name = "Passengers"



---

## 2. Visualizing Theta Decomposition

Let's build intuition by visualizing how theta-lines transform the original series.

In [None]:
# Helper function to compute linear trend
def fit_linear_trend(y: np.ndarray) -> tuple:
    """
    Fit OLS linear trend: L_t = a + b*t
    
    Parameters:
    -----------
    y : np.ndarray
        Time series values
        
    Returns:
    --------
    tuple: (intercept, slope)
    """
    n = len(y)
    t = np.arange(1, n + 1)
    
    # OLS formulas
    t_mean = t.mean()
    y_mean = y.mean()
    
    slope = np.sum((t - t_mean) * (y - y_mean)) / np.sum((t - t_mean) ** 2)
    intercept = y_mean - slope * t_mean
    
    return intercept, slope

def compute_theta_line(y: np.ndarray, theta: float) -> tuple:
    """
    Compute theta-line: Z_theta = theta * y + (1 - theta) * L
    
    Parameters:
    -----------
    y : np.ndarray
        Original time series
    theta : float
        Theta parameter (0 = trend, 1 = original, 2 = amplified)
        
    Returns:
    --------
    tuple: (Z_theta, L), the theta-transformed series and the fitted linear trend
    """
    n = len(y)
    t = np.arange(1, n + 1)
    
    # Fit linear trend
    intercept, slope = fit_linear_trend(y)
    L = intercept + slope * t  # Linear trend values
    
    # Compute theta line
    Z_theta = theta * y + (1 - theta) * L
    
    return Z_theta, L

# Compute theta lines for visualization
y_values = y.values
Z_0, L = compute_theta_line(y_values, theta=0)  # Pure trend
Z_1, _ = compute_theta_line(y_values, theta=1)  # Original (should equal y)
Z_2, _ = compute_theta_line(y_values, theta=2)  # Amplified curvature

intercept, slope = fit_linear_trend(y_values)
print(f"Linear Trend: L_t = {intercept:.2f} + {slope:.2f} * t")
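As a cross-check of the closed-form OLS formulas in `fit_linear_trend`, NumPy's `np.polyfit` with `deg=1` should return the same slope and intercept. Here is a standalone sketch on synthetic data; the series and the helper name `ols_trend` are illustrative:

```python
import numpy as np


def ols_trend(y: np.ndarray) -> tuple[float, float]:
    """Closed-form OLS for L_t = a + b*t with t = 1..n (same formulas as fit_linear_trend)."""
    t = np.arange(1, len(y) + 1)
    b = np.sum((t - t.mean()) * (y - y.mean())) / np.sum((t - t.mean()) ** 2)
    a = y.mean() - b * t.mean()  # a is the trend value at t = 0
    return a, b


rng = np.random.default_rng(2)
y = 120 + 2.5 * np.arange(1, 145) + rng.normal(0, 20, 144)

a, b = ols_trend(y)
b_np, a_np = np.polyfit(np.arange(1, len(y) + 1), y, deg=1)  # returns [slope, intercept]
print(np.isclose(a, a_np), np.isclose(b, b_np))  # True True
```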

### Original Series with Theta-Lines Overlay

This visualization shows how different theta values transform the series:
- **θ=0 (blue dashed):** Pure linear trend; removes all curvature
- **θ=1 (black):** The original time series
- **θ=2 (red):** Amplified curvature; exaggerates seasonal patterns

In [None]:
# Plot theta lines overlay
fig = go.Figure()

timestamps = y.index.to_timestamp()

fig.add_trace(go.Scatter(
    x=timestamps, y=Z_0, 
    name="θ=0 (Linear Trend)", 
    line=dict(dash="dash", color="blue", width=2)
))

fig.add_trace(go.Scatter(
    x=timestamps, y=y_values, 
    name="θ=1 (Original Series)", 
    line=dict(color="black", width=2)
))

fig.add_trace(go.Scatter(
    x=timestamps, y=Z_2, 
    name="θ=2 (Amplified Curvature)", 
    line=dict(color="red", width=1.5)
))

fig.update_layout(
    title="Theta-Lines: How θ Modifies Curvature",
    xaxis_title="Date",
    yaxis_title="Value",
    legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01),
    hovermode="x unified"
)
fig

### Decomposition: Trend vs Curvature

The theta method essentially decomposes the series into:
- **Trend Component ($L_t$):** The linear backbone
- **Curvature Component ($y_t - L_t$):** The deviation from linearity (seasonality + noise)
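Rearranging the theta-line definition makes its link to this decomposition explicit: $Z_\theta(t) = L_t + \theta\,(y_t - L_t)$. In other words, a theta-line is the trend plus θ times the curvature component. Here is a quick check on a synthetic series (illustrative data, with `np.polyfit` standing in for the notebook's `fit_linear_trend`):

```python
import numpy as np

# Illustrative synthetic series (not the airline data)
rng = np.random.default_rng(3)
t = np.arange(1, 97)
y = 200 + 3 * t + 25 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

slope, intercept = np.polyfit(t, y, deg=1)
L = intercept + slope * t
curvature = y - L  # deviation from the linear trend

# Rearranging Z_theta = theta*y + (1-theta)*L gives Z_theta = L + theta*(y - L)
for theta in (0.0, 1.0, 2.0):
    Z = theta * y + (1 - theta) * L
    print(theta, np.allclose(Z, L + theta * curvature))

# With an intercept in the OLS fit, the curvature component averages to zero
print(np.isclose(curvature.mean(), 0.0))
```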

In [None]:
# Decomposition visualization
from plotly.subplots import make_subplots

curvature = y_values - L  # Deviation from linear trend

fig = make_subplots(
    rows=3, cols=1, 
    shared_xaxes=True,
    subplot_titles=("Original Series (y_t)", "Linear Trend (L_t)", "Curvature Component (y_t - L_t)"),
    vertical_spacing=0.08
)

fig.add_trace(
    go.Scatter(x=timestamps, y=y_values, name="Original", line=dict(color="black")),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=timestamps, y=L, name="Trend", line=dict(color="blue", dash="dash")),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(x=timestamps, y=curvature, name="Curvature", line=dict(color="green")),
    row=3, col=1
)
fig.add_hline(y=0, line_dash="dot", line_color="gray", row=3, col=1)

fig.update_layout(
    height=600,
    title_text="Theta Decomposition: Separating Trend from Curvature",
    showlegend=False
)
fig

---

## 3. Low-Level NumPy Implementation

Now let's implement the complete Theta method from scratch using only NumPy. The implementation writes out each step: fit the trend, build the θ=0 and θ=2 lines, forecast each line, and average the forecasts. This shows exactly how the algorithm works.

In [None]:
def simple_exponential_smoothing(y: np.ndarray, alpha: float = None) -> tuple:
    """
    Simple Exponential Smoothing (SES), with alpha chosen by grid search if not given.
    
    Parameters:
    -----------
    y : np.ndarray
        Time series values
    alpha : float, optional
        Smoothing parameter (0 < alpha < 1). If None, it is chosen by a grid
        search that minimizes the in-sample one-step-ahead MSE.
        
    Returns:
    --------
    tuple: (smoothed_levels, alpha_used, final_level)
    """
    if alpha is None:
        # Grid search for alpha in [0.01, 0.99] (minimize one-step-ahead MSE)
        best_alpha, best_mse = 0.1, np.inf
        for a in np.linspace(0.01, 0.99, 99):
            level = y[0]
            mse = 0
            for i in range(1, len(y)):
                mse += (y[i] - level) ** 2  # error of the one-step-ahead forecast
                level = a * y[i] + (1 - a) * level
            mse /= (len(y) - 1)
            if mse < best_mse:
                best_mse = mse
                best_alpha = a
        alpha = best_alpha
    
    # Run SES with the chosen alpha; the level starts at the first observation
    n = len(y)
    fitted = np.zeros(n)
    level = y[0]
    fitted[0] = level
    
    for i in range(1, n):
        level = alpha * y[i] + (1 - alpha) * level
        fitted[i] = level
    
    return fitted, alpha, level


def ses_forecast(y: np.ndarray, h: int, alpha: float = None) -> tuple:
    """
    Generate h-step-ahead forecasts using SES.
    
    Returns:
    --------
    tuple: (forecasts, alpha_used)
    """
    fitted, alpha_used, final_level = simple_exponential_smoothing(y, alpha)
    forecasts = np.full(h, final_level)  # SES produces flat forecasts
    return forecasts, alpha_used


def theta_forecast(y: np.ndarray, h: int, alpha: float = None) -> dict:
    """
    Complete Theta method implementation.
    
    Combines:
    - θ=0 line: Linear trend extrapolation
    - θ=2 line: SES forecasts
    """
    n = len(y)
    
    # Step 1: Fit linear trend
    intercept, slope = fit_linear_trend(y)
    
    # Step 2: Compute theta lines
    t = np.arange(1, n + 1)
    L = intercept + slope * t
    Z_0 = L  # θ=0 line (pure trend)
    Z_2 = 2 * y - L  # θ=2 line (amplified curvature)
    
    # Step 3: Forecast θ=0 line (linear extrapolation)
    t_future = np.arange(n + 1, n + h + 1)
    forecast_Z0 = intercept + slope * t_future
    
    # Step 4: Forecast θ=2 line using SES
    forecast_Z2, alpha_used = ses_forecast(Z_2, h, alpha)
    
    # Step 5: Combine forecasts (simple average)
    combined_forecast = 0.5 * (forecast_Z0 + forecast_Z2)
    
    return {
        'forecast': combined_forecast,
        'forecast_Z0': forecast_Z0,
        'forecast_Z2': forecast_Z2,
        'Z_0': Z_0,
        'Z_2': Z_2,
        'trend': L,
        'intercept': intercept,
        'slope': slope,
        'alpha': alpha_used
    }

# Split data for forecasting: hold out the last 24 months
from sktime.forecasting.model_selection import temporal_train_test_split
y_train, y_test = temporal_train_test_split(y, test_size=24)
train_timestamps = y_train.index.to_timestamp()
test_timestamps = y_test.index.to_timestamp()

# Test our implementation on the training data
h = len(y_test)
result = theta_forecast(y_train.values, h=h)
print("Theta Method Parameters:")
print(f"  Linear Trend: {result['intercept']:.2f} + {result['slope']:.2f} * t")
print(f"  SES Alpha: {result['alpha']:.4f}")
print(f"  Forecast horizon: {h} periods")

### Individual Theta-Line Forecasts

Let's visualize how each theta-line is forecast separately:
- **θ=0 forecast:** Simple linear extrapolation (extends the trend)
- **θ=2 forecast:** SES applied to the amplified series (flat forecast from the final smoothed level)

In [None]:
# Visualize individual theta-line forecasts
fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    subplot_titles=("θ=0 Line: Trend Extrapolation", "θ=2 Line: SES Forecast"),
    vertical_spacing=0.1
)

# θ=0 line and forecast
fig.add_trace(
    go.Scatter(x=train_timestamps, y=result['Z_0'], name="θ=0 (fitted)", 
               line=dict(color="blue")),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=test_timestamps, y=result['forecast_Z0'], name="θ=0 (forecast)",
               line=dict(color="blue", dash="dash")),
    row=1, col=1
)

# θ=2 line and forecast
fig.add_trace(
    go.Scatter(x=train_timestamps, y=result['Z_2'], name="θ=2 (fitted)",
               line=dict(color="red")),
    row=2, col=1
)
fig.add_trace(
    go.Scatter(x=test_timestamps, y=result['forecast_Z2'], name="θ=2 (forecast)",
               line=dict(color="red", dash="dash")),
    row=2, col=1
)

fig.update_layout(
    height=500,
    title_text="Individual Theta-Line Forecasts",
    showlegend=True
)
fig

### Combined Forecast

The final Theta forecast, `result['forecast']`, is the simple average of the two line forecasts above. The θ=0 forecast is a straight line with slope $b$ and the θ=2 forecast is flat, so the combined forecast is a straight line with slope $b/2$. It is evaluated against the test set in Section 4.

---

## 4. Using sktime's ThetaForecaster

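Now let's compare our NumPy implementation with sktime's `ThetaForecaster`. With `sp=12` and the default `deseasonalize=True`, sktime seasonally adjusts the training series multiplicatively before fitting, then reseasonalizes the forecasts. Our NumPy version works on the raw series, so the two forecasts should not be expected to match.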

In [None]:
from sktime.forecasting.theta import ThetaForecaster
from sktime.forecasting.base import ForecastingHorizon

model = ThetaForecaster(sp=12)

fh = ForecastingHorizon(y_test.index, is_relative=False)

model.fit(y_train)
pred = model.predict(fh)

print("sktime ThetaForecaster fitted successfully")
print(f"Forecast horizon: {len(pred)} periods")
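The seasonal adjustment is the main structural difference between the two forecasts. The following NumPy sketch of classical ratio-to-moving-average seasonal indices shows what multiplicative deseasonalizing means. It is illustrative only: the function names are hypothetical, it is not sktime's code, and it assumes an even `sp`:

```python
import numpy as np


def multiplicative_seasonal_indices(y: np.ndarray, sp: int) -> np.ndarray:
    """Classical ratio-to-moving-average seasonal indices (sketch; assumes even sp)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = sp // 2
    # Centered 2 x sp moving average: weight 1/(2*sp) on both ends, 1/sp inside
    weights = np.r_[0.5, np.ones(sp - 1), 0.5] / sp
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    ratios = y / trend  # NaN where the moving average is undefined
    # Average the ratios for each position in the cycle, then normalize to mean 1
    indices = np.array([np.nanmean(ratios[k::sp]) for k in range(sp)])
    return indices / indices.mean()


def deseasonalize(y: np.ndarray, sp: int) -> tuple[np.ndarray, np.ndarray]:
    """Divide each observation by the seasonal index for its position in the cycle."""
    indices = multiplicative_seasonal_indices(y, sp)
    return np.asarray(y, dtype=float) / np.resize(indices, len(y)), indices


# Illustrative series: trend times a known 12-month pattern that averages to 1
pattern = 1 + 0.2 * np.sin(2 * np.pi * np.arange(12) / 12)
t = np.arange(1, 97)
y = (100 + 2 * t) * np.resize(pattern, t.size)

adjusted, indices = deseasonalize(y, sp=12)
print(np.round(indices, 2))
```

With this approach, forecasts made on `adjusted` would be multiplied by the index for their position in the cycle to put the seasonality back.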

### sktime Forecast Visualization

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=y_train.index.to_timestamp(), y=y_train, name="Train"))
fig.add_trace(go.Scatter(x=y_test.index.to_timestamp(), y=y_test, name="Test"))
fig.add_trace(go.Scatter(x=pred.index.to_timestamp(), y=pred, name="Forecast"))
fig.update_layout(title="Theta forecast vs actual")
fig

In [None]:
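# Compare our implementation vs sktime on the same test period
forecast = result['forecast']  # combined NumPy Theta forecast computed earlier
actual = y_test.values

def error_metrics(actual, predicted):
    """Return MAE, RMSE and MAPE (%) for two aligned arrays."""
    errors = actual - predicted
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    mape = np.mean(np.abs(errors / actual)) * 100
    return mae, rmse, mape

mae, rmse, mape = error_metrics(actual, forecast)
sktime_mae, sktime_rmse, sktime_mape = error_metrics(actual, pred.values)

# Comparison table
comparison_data = {
    'Implementation': ['NumPy (from scratch)', 'sktime ThetaForecaster'],
    'MAE': [mae, sktime_mae],
    'RMSE': [rmse, sktime_rmse],
    'MAPE (%)': [mape, sktime_mape]
}
comparison_df = pd.DataFrame(comparison_data)
print("Implementation Comparison:")
print(comparison_df.to_string(index=False))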

# Plot both forecasts
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=train_timestamps, y=y_train.values,
    name="Training Data", line=dict(color="black")
))

fig.add_trace(go.Scatter(
    x=test_timestamps, y=y_test.values,
    name="Actual (Test)", line=dict(color="gray", width=2)
))

fig.add_trace(go.Scatter(
    x=test_timestamps, y=forecast,
    name="NumPy Implementation", line=dict(color="green", dash="solid")
))

fig.add_trace(go.Scatter(
    x=test_timestamps, y=pred.values,
    name="sktime ThetaForecaster", line=dict(color="purple", dash="dash")
))

fig.update_layout(
    title="NumPy vs sktime Theta Forecasts",
    xaxis_title="Date",
    yaxis_title="Passengers",
    legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01)
)
fig
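Neither forecast above comes with uncertainty estimates. One way to attach approximate bands is sketched here under stated assumptions. It treats the Theta forecast as SES with drift (following Hyndman & Billah, 2003) and borrows the SES forecast-variance formula $\sigma^2[1 + \alpha^2(h-1)]$, with $\sigma$ estimated from one-step-ahead training residuals. The function name `approx_theta_bands` and the inputs are hypothetical. The bands also ignore uncertainty in the estimated trend and α, so they are likely too narrow:

```python
import numpy as np


def approx_theta_bands(point_forecast, residuals, alpha, z=1.96):
    """Approximate prediction bands for a Theta point forecast (hedged sketch).

    Assumes the forecast behaves like SES with drift, so the h-step variance is
    sigma^2 * (1 + alpha^2 * (h - 1)) (the SES / ETS(A,N,N) formula), with sigma
    estimated from one-step-ahead training residuals. z = 1.96 targets ~95%
    coverage if errors are roughly normal.
    """
    point_forecast = np.asarray(point_forecast, dtype=float)
    sigma = np.std(residuals, ddof=1)
    h = np.arange(1, point_forecast.size + 1)
    half_width = z * sigma * np.sqrt(1 + alpha ** 2 * (h - 1))
    return point_forecast - half_width, point_forecast + half_width


# Illustrative inputs only (synthetic residuals and a straight-line forecast)
rng = np.random.default_rng(4)
residuals = rng.normal(0, 10, 120)
point = np.linspace(400, 420, 12)
lower, upper = approx_theta_bands(point, residuals, alpha=0.4)
print(np.round(upper - lower, 1))  # widths grow with the horizon
```

In this notebook, one could pass `result['forecast']` as `point_forecast` and `result['alpha']` as `alpha`. Which residuals to use (for example, one-step-ahead errors on the training set) is a modelling choice.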

---

## 5. When to Use the Theta Method

### ✅ Strengths
- **Robust baseline:** Won the M3 competition despite its simplicity
- **Minimal tuning:** The only parameter is α for SES, which is estimated automatically
- **Fast computation:** Linear time in the series length
- **Handles seasonal data with seasonal adjustment:** Seasonal series are deseasonalized before the method is applied and reseasonalized afterwards (as `ThetaForecaster` does with `sp`); the θ=0 and θ=2 forecasts themselves are not seasonal

### ⚠️ Limitations
- **Assumes linear trend:** May underperform with exponential or otherwise nonlinear trends
- **Limited flexibility:** Only uses θ=0 and θ=2 (Generalized Theta extends this)
- **No exogenous variables:** Purely univariate

### 🎯 Best Use Cases
1. **Forecasting competitions:** Strong baseline that's hard to beat
2. **Seasonal univariate series:** Particularly monthly/quarterly data, after seasonal adjustment
3. **Quick, reliable forecasts:** When interpretability matters
4. **Benchmarking:** Compare against more complex models

### 📚 References
- Assimakopoulos, V. & Nikolopoulos, K. (2000). "The theta model: a decomposition approach to forecasting." *International Journal of Forecasting*, 16(4), 521-530.
- Hyndman, R.J. & Billah, B. (2003). "Unmasking the Theta method." *International Journal of Forecasting*, 19(2), 287-290.