# Seasonal Naive Forecasting

A naive forecast predicts future values by applying a fixed rule to past observations, with no parameters to estimate. It is the **most important baseline** in time series forecasting: if your model does not beat it, stop and rethink.

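**Naive (last):** $\hat{y}_{t+h} = y_t$

**Seasonal naive:** $\hat{y}_{t+h} = y_{t+h-s}$ for $h \le s$, where $s$ is the seasonal period (written $m$ in the Mathematical Foundation section, which also covers $h > s$)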
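
To make the two rules concrete, here is a minimal sketch on a made-up quarterly series. The values in `y_toy` and the period `s = 4` are illustrative assumptions, not the airline data loaded below.

```python
import numpy as np

# Hypothetical quarterly series: two full years, seasonal period s = 4
y_toy = np.array([10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0])
s = 4
h = 3  # forecast 3 steps ahead (h <= s, so a single look-back cycle suffices)

# Naive (last): repeat the final observation for every step
naive_fc = np.repeat(y_toy[-1], h)

# Seasonal naive: copy the values observed s steps before each target time
seasonal_fc = y_toy[-s:][:h]

print(naive_fc)     # [42. 42. 42.]
print(seasonal_fc)  # [12. 22. 32.]
```

The naive forecast is flat, while the seasonal naive forecast reproduces the shape of the last observed cycle.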


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sktime.datasets import load_airline

# Reproducibility
np.random.seed(42)

y = load_airline()
y.name = "Passengers"



## Train/Test split and forecasting horizon


In [None]:
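from sktime.forecasting.base import ForecastingHorizon
# Recent sktime releases also expose temporal_train_test_split from sktime.split
from sktime.forecasting.model_selection import temporal_train_test_split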

y_train, y_test = temporal_train_test_split(y, test_size=24)
fh = ForecastingHorizon(y_test.index, is_relative=False)



## Fit the model


In [None]:
from sktime.forecasting.naive import NaiveForecaster

# strategy="last" with sp=12 repeats the value from 12 months earlier (seasonal naive)
model = NaiveForecaster(strategy="last", sp=12)
model.fit(y_train)
pred = model.predict(fh)



## Visualize forecast


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=y_train.index.to_timestamp(), y=y_train, name="Train"))
fig.add_trace(go.Scatter(x=y_test.index.to_timestamp(), y=y_test, name="Test"))
fig.add_trace(go.Scatter(x=pred.index.to_timestamp(), y=pred, name="Forecast"))
fig.update_layout(title="Seasonal naive forecast vs actual")
fig

## Evaluate


In [None]:
from sktime.performance_metrics.forecasting import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred, square_root=True)
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")


## When to use

- **Naive** is surprisingly strong for short horizons when the series is persistent.
- **Seasonal naive** is a must‑beat baseline for seasonal series.

These baselines are also excellent debugging tools: if your model scores *worse* than naive on held-out data, you likely have leakage or mis-specification.
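
One way to run that check is to divide the model's error by the baseline's error. The sketch below uses a hypothetical helper, `relative_mae`, and made-up numbers. It is not part of sktime and is only meant to illustrate the comparison.

```python
import numpy as np


def relative_mae(y_true, y_model, y_baseline) -> float:
    """MAE of the model divided by MAE of the baseline (hypothetical helper).

    Values below 1 mean the model beats the baseline; values above 1 mean it does not.
    """
    y_true = np.asarray(y_true, dtype=float)
    mae_model = np.mean(np.abs(y_true - np.asarray(y_model, dtype=float)))
    mae_base = np.mean(np.abs(y_true - np.asarray(y_baseline, dtype=float)))
    return float(mae_model / mae_base)


# Made-up numbers for illustration only
actual = [100, 110, 120]
model_fc = [98, 111, 123]      # absolute errors 2, 1, 3 -> MAE 2
baseline_fc = [96, 114, 116]   # absolute errors 4, 4, 4 -> MAE 4

print(relative_mae(actual, model_fc, baseline_fc))  # 0.5
```

A ratio above 1 on the test set is the signal to go looking for leakage or a mis-specified model.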


## Mathematical Foundation

### Seasonal Naive Forecast Formula

The seasonal naive method repeats the value from the same season in the previous cycle:

$$\hat{y}_{T+h} = y_{T+h-m(k+1)}$$

where:
- $T$ = last observed time point
- $h$ = forecast horizon (steps ahead)
- $m$ = seasonal period (e.g., 12 for monthly data with yearly seasonality)
- $k = \lfloor(h-1)/m\rfloor$ = number of complete seasonal cycles in the forecast period before time $T+h$
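
As a worked example of the index arithmetic, the sketch below evaluates $k$ and the source time index for a few horizons. The helper name `source_index` and the values `T = 120`, `m = 12` are assumptions chosen for illustration.

```python
def source_index(T: int, h: int, m: int) -> int:
    """1-based time index of the observation copied for horizon h (hypothetical helper)."""
    k = (h - 1) // m           # complete seasonal cycles before step h
    return T + h - m * (k + 1)


# Hypothetical setup: 120 monthly observations, yearly seasonality
for h in (1, 12, 13, 24, 25):
    k = (h - 1) // 12
    print(f"h={h:>2}  k={k}  copies y_{source_index(120, h, 12)}")
# h= 1  k=0  copies y_109
# h=12  k=0  copies y_120
# h=13  k=1  copies y_109
# h=24  k=1  copies y_120
# h=25  k=2  copies y_109
```

Every horizon maps back into the last observed cycle, times 109 to 120, no matter how far ahead the forecast goes.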

### Alternative Formulation (Multiple Seasons)

The same mapping can be written with a modulo, which avoids computing $k$ explicitly:

$$\hat{y}_{T+h} = y_{T + ((h-1) \bmod m) + 1 - m}$$

This directly maps each forecast step to its corresponding observation in the last observed seasonal cycle.
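
A quick numerical check that the floor-based and modulo-based index formulas agree, sketched with assumed values `m = 12` and `T = 120`:

```python
import numpy as np

m, T = 12, 120                # hypothetical period and series length
h = np.arange(1, 4 * m + 1)   # horizons covering four full cycles

idx_floor = T + h - m * ((h - 1) // m + 1)   # floor-based formulation
idx_mod = T + ((h - 1) % m) + 1 - m          # modulo formulation

assert np.array_equal(idx_floor, idx_mod)

# Every forecast draws from the last observed cycle, t = T-m+1 .. T
print(idx_mod.min(), idx_mod.max())  # 109 120
```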

### Residual Variance Estimation

The variance of seasonal residuals is estimated as:

$$\hat{\sigma}^2 = \frac{1}{T-m}\sum_{t=m+1}^{T}(y_t - y_{t-m})^2$$

This is the mean squared difference between each observation and the one a full season earlier, i.e. the in-sample one-step error of the seasonal naive forecast.
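
The estimator can be checked by hand on a tiny series. The values below are made up so that the seasonal differences are easy to verify, and `m = 4` is an assumed period.

```python
import numpy as np

# Hypothetical series with period m = 4 (two full cycles)
y_toy = np.array([10.0, 20.0, 30.0, 40.0, 12.0, 23.0, 31.0, 44.0])
m = 4

# Seasonal differences y_t - y_{t-m} for t = m+1 .. T
seasonal_diffs = y_toy[m:] - y_toy[:-m]

# Mean of squared differences over T - m terms, then square root
sigma_hat = np.sqrt(np.sum(seasonal_diffs ** 2) / (len(y_toy) - m))

print(seasonal_diffs)       # [2. 3. 1. 4.]
print(round(sigma_hat, 3))  # 2.739
```

Here the squared differences sum to 30 over 4 terms, so $\hat{\sigma}^2 = 7.5$.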

### Prediction Intervals

Assuming uncorrelated, normally distributed residuals with constant variance, the $(1-\alpha)$ prediction interval is:

$$\hat{y}_{T+h} \pm z_{\alpha/2}\hat{\sigma}\sqrt{k+1}$$

where:
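- $z_{\alpha/2}$ = upper $\alpha/2$ critical value of the standard normal distribution (about 1.96 for $\alpha = 0.05$)
- $\sqrt{k+1}$ = scaling factor: the forecast standard deviation is constant within a seasonal cycle and grows with each additional complete cycle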
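
The step-wise growth of the interval can be seen in a short sketch. The value `sigma_hat = 30.0` is an assumed residual standard deviation, and the standard library's `NormalDist` is used for the critical value only to keep the example dependency-light.

```python
from statistics import NormalDist

import numpy as np

m = 12
sigma_hat = 30.0   # hypothetical residual standard deviation
alpha = 0.05

horizons = np.array([1, 12, 13, 24, 25])
k = (horizons - 1) // m                   # complete cycles before each step
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a 95% interval
half_width = z * sigma_hat * np.sqrt(k + 1)

for step, w in zip(horizons, half_width):
    print(f"h={int(step):>2}: +/- {w:.1f}")
```

The half-width is identical for `h = 1` and `h = 12`, then jumps by a factor of $\sqrt{2}$ at `h = 13` and to $\sqrt{3}$ times the first-cycle width at `h = 25`.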

## Low-Level NumPy Implementation

Building seasonal naive from scratch to understand the mechanics.

In [None]:
def seasonal_naive(y: np.ndarray, h: int, period: int) -> np.ndarray:
    """
    Generate seasonal naive forecasts.
    
    Parameters
    ----------
    y : np.ndarray
        Historical time series values
    h : int
        Forecast horizon (number of steps ahead)
    period : int
        Seasonal period (e.g., 12 for monthly data with yearly seasonality)
    
    Returns
    -------
    np.ndarray
        Forecasted values of length h
    """
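    T = len(y)
    if T < period:
        # Without a full cycle of history, negative indices could wrap silently
        raise ValueError("y must contain at least one full seasonal period")
    forecasts = np.zeros(h)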
    
    for step in range(1, h + 1):
        # k = number of complete seasonal cycles
        k = (step - 1) // period
        # Index into historical data
        lag_index = T + step - period * (k + 1) - 1
        forecasts[step - 1] = y[lag_index]
    
    return forecasts


def compute_seasonal_residuals(y: np.ndarray, period: int) -> tuple[np.ndarray, float]:
    """
    Compute seasonal residuals and their standard deviation.
    
    Parameters
    ----------
    y : np.ndarray
        Historical time series values
    period : int
        Seasonal period
    
    Returns
    -------
    residuals : np.ndarray
        Seasonal differences (y_t - y_{t-period})
    sigma : float
        Estimated standard deviation of residuals
    """
    # Residuals: difference from same season in previous cycle
    residuals = y[period:] - y[:-period]
    
    # Variance estimate (using T-m degrees of freedom)
    sigma_sq = np.sum(residuals ** 2) / (len(y) - period)
    sigma = np.sqrt(sigma_sq)
    
    return residuals, sigma


def seasonal_naive_with_intervals(
    y: np.ndarray, 
    h: int, 
    period: int, 
    alpha: float = 0.05
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Seasonal naive forecast with prediction intervals.
    
    Parameters
    ----------
    y : np.ndarray
        Historical time series values
    h : int
        Forecast horizon
    period : int
        Seasonal period
    alpha : float
        Significance level (default 0.05 for 95% intervals)
    
    Returns
    -------
    forecasts : np.ndarray
        Point forecasts
    lower : np.ndarray
        Lower prediction interval bounds
    upper : np.ndarray
        Upper prediction interval bounds
    """
    from scipy import stats
    
    # Point forecasts
    forecasts = seasonal_naive(y, h, period)
    
    # Estimate residual standard deviation
    _, sigma = compute_seasonal_residuals(y, period)
    
    # Critical value for (1-alpha) confidence
    z = stats.norm.ppf(1 - alpha / 2)
    
    # Prediction interval width grows with each complete cycle
    intervals = np.zeros(h)
    for step in range(1, h + 1):
        k = (step - 1) // period
        intervals[step - 1] = z * sigma * np.sqrt(k + 1)
    
    lower = forecasts - intervals
    upper = forecasts + intervals
    
    return forecasts, lower, upper


# Test the implementations
y_values = y_train.values
h_test = len(y_test)
period = 12

# Basic seasonal naive
snaive_forecast = seasonal_naive(y_values, h_test, period)
print(f"Forecast shape: {snaive_forecast.shape}")
print(f"First 6 forecasts: {snaive_forecast[:6]}")

# With prediction intervals
forecast, lower, upper = seasonal_naive_with_intervals(y_values, h_test, period, alpha=0.05)
print(f"\nWith 95% prediction intervals:")
print(f"Forecast[0]: {forecast[0]:.1f} [{lower[0]:.1f}, {upper[0]:.1f}]")
print(f"Forecast[12]: {forecast[12]:.1f} [{lower[12]:.1f}, {upper[12]:.1f}]")

## Plotly Visualizations

### Seasonal Pattern Extraction

In [None]:
# Seasonal Pattern Extraction Visualization
# Reshape data to show seasonal patterns across years

n_complete_years = len(y_train) // period
y_seasonal = y_train.values[-(n_complete_years * period):].reshape(n_complete_years, period)

fig = go.Figure()

# Plot each year's seasonal pattern
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

for i in range(n_complete_years):
    year = y_train.index[-(n_complete_years - i) * period].year
    fig.add_trace(go.Scatter(
        x=months,
        y=y_seasonal[i],
        mode='lines+markers',
        name=f'{year}',
        opacity=0.6
    ))

# Highlight the last year (used for seasonal naive)
fig.add_trace(go.Scatter(
    x=months,
    y=y_seasonal[-1],
    mode='lines+markers',
    name='Last Year (Forecast Basis)',
    line=dict(width=4, color='red'),
    marker=dict(size=10)
))

fig.update_layout(
    title='Seasonal Pattern Extraction: Each Year Overlaid',
    xaxis_title='Month',
    yaxis_title='Passengers',
    hovermode='x unified'
)
fig

### Multi-Step Forecast with Prediction Intervals

In [None]:
# Multi-Step Forecast with Prediction Intervals
forecast, lower, upper = seasonal_naive_with_intervals(y_values, h_test, period, alpha=0.05)

# Convert forecast index to timestamps
forecast_dates = y_test.index.to_timestamp()
train_dates = y_train.index.to_timestamp()

fig = go.Figure()

# Training data
fig.add_trace(go.Scatter(
    x=train_dates,
    y=y_train.values,
    mode='lines',
    name='Training Data',
    line=dict(color='blue')
))

# Actual test values
fig.add_trace(go.Scatter(
    x=forecast_dates,
    y=y_test.values,
    mode='lines+markers',
    name='Actual',
    line=dict(color='green', dash='dot'),
    marker=dict(size=6)
))

# Prediction intervals (fill between)
fig.add_trace(go.Scatter(
    x=list(forecast_dates) + list(forecast_dates[::-1]),
    y=list(upper) + list(lower[::-1]),
    fill='toself',
    fillcolor='rgba(255, 165, 0, 0.2)',
    line=dict(color='rgba(255,255,255,0)'),
    name='95% Prediction Interval',
    showlegend=True
))

# Point forecast
fig.add_trace(go.Scatter(
    x=forecast_dates,
    y=forecast,
    mode='lines+markers',
    name='Seasonal Naive Forecast',
    line=dict(color='orange', width=2),
    marker=dict(size=6)
))

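# Add vertical line at train/test split.
# The label is added separately: in some Plotly versions, add_vline raises a
# TypeError when annotation_text is combined with a datetime x value.
split_date = train_dates[-1]
fig.add_vline(x=split_date, line_dash="dash", line_color="gray")
fig.add_annotation(x=split_date, y=1, yref="paper", text="Train/Test Split",
                   showarrow=False, xanchor="left", yanchor="bottom")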

fig.update_layout(
    title='Seasonal Naive Forecast with 95% Prediction Intervals',
    xaxis_title='Date',
    yaxis_title='Passengers',
    hovermode='x unified',
    legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01)
)
fig

### Residual Seasonality Check

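If seasonal naive is appropriate, the residuals should show no remaining seasonal pattern and no strong autocorrelation. A residual mean well away from zero signals a trend that seasonal naive ignores.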

In [None]:
# Residual Seasonality Check
residuals, sigma = compute_seasonal_residuals(y_values, period)

# Create residual time index (starts at period+1)
residual_dates = y_train.index[period:].to_timestamp()

from plotly.subplots import make_subplots

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Seasonal Residuals Over Time',
        'Residual Distribution',
        'Residuals by Month (Seasonality Check)',
        'Residual ACF'
    )
)

# 1. Residuals over time
fig.add_trace(
    go.Scatter(x=residual_dates, y=residuals, mode='lines', name='Residuals',
               line=dict(color='steelblue')),
    row=1, col=1
)
fig.add_hline(y=0, line_dash="dash", line_color="red", row=1, col=1)

# 2. Distribution of residuals
fig.add_trace(
    go.Histogram(x=residuals, nbinsx=20, name='Distribution',
                 marker_color='steelblue', showlegend=False),
    row=1, col=2
)

# 3. Box plot by month (check for remaining seasonality)
# Month number (1-12) of each residual, taken directly from the PeriodIndex
months_residual = y_train.index[period:].month.to_numpy()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

for m in range(1, 13):
    month_residuals = residuals[months_residual == m]
    fig.add_trace(
        go.Box(y=month_residuals, name=month_names[m-1], showlegend=False,
               marker_color='steelblue'),
        row=2, col=1
    )

# 4. ACF of residuals (Pearson correlation of lagged pairs,
#    a simple approximation to the standard sample ACF)
max_lag = min(24, len(residuals) - 1)
acf_values = np.array([np.corrcoef(residuals[:-lag], residuals[lag:])[0, 1] 
                       for lag in range(1, max_lag + 1)])
lags = np.arange(1, max_lag + 1)

# Confidence bounds (approximate 95% CI)
ci = 1.96 / np.sqrt(len(residuals))

fig.add_trace(
    go.Bar(x=lags, y=acf_values, name='ACF', marker_color='steelblue', showlegend=False),
    row=2, col=2
)
fig.add_hline(y=ci, line_dash="dash", line_color="red", row=2, col=2)
fig.add_hline(y=-ci, line_dash="dash", line_color="red", row=2, col=2)
fig.add_hline(y=0, line_color="black", row=2, col=2)

fig.update_layout(
    title=f'Residual Diagnostics (σ = {sigma:.2f})',
    height=600,
    showlegend=False
)

fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Residual Value", row=1, col=2)
fig.update_xaxes(title_text="Month", row=2, col=1)
fig.update_xaxes(title_text="Lag", row=2, col=2)

fig.update_yaxes(title_text="Residual", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=2)
fig.update_yaxes(title_text="Residual", row=2, col=1)
fig.update_yaxes(title_text="Correlation", row=2, col=2)

fig.show()

print("Residual Statistics:")
print(f"  Mean: {np.mean(residuals):.2f} (near 0 only if no trend remains)")
print(f"  Std Dev (σ): {sigma:.2f}")
print(f"  Min: {np.min(residuals):.2f}, Max: {np.max(residuals):.2f}")