# Time Series — Core Overview

Time series data are **ordered observations** collected over time. The ordering is not an incidental detail; it is the structure, and shuffling the observations destroys the information we want to model. This notebook builds the foundations you need for forecasting, anomaly detection, and time‑series modeling in general.

We will cover: structure & indexing, stationarity, trend/seasonality, autocorrelation, train/test splitting, baselines, and rolling backtests.


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sktime.datasets import load_airline

# Reproducibility
np.random.seed(42)

y = load_airline()
y.name = "Passengers"



## 1) What makes time series special?

A time series \(y_t\) is indexed by time \(t\). Unlike i.i.d. data, consecutive observations are **dependent**. This dependence appears as:
- **Trend**: long‑term direction
- **Seasonality**: periodic patterns
- **Autocorrelation**: values correlate with lagged versions of themselves

The core modeling question is: **how does the future depend on the past?**
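To make the dependence concrete, here is a minimal sketch that compares the lag‑1 correlation of an ordered series with the same values shuffled. The AR(1) process, its coefficient of 0.8, and the helper `lag1_corr` are illustrative assumptions, not part of the airline data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1) process: y_t = 0.8 * y_{t-1} + noise_t
n = 2000
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]


def lag1_corr(x):
    """Pearson correlation between x_t and x_{t-1}."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


# Same values, but the temporal order is destroyed
shuffled = rng.permutation(ar1)

corr_ordered = lag1_corr(ar1)
corr_shuffled = lag1_corr(shuffled)
print(f"lag-1 correlation, ordered:  {corr_ordered:.2f}")
print(f"lag-1 correlation, shuffled: {corr_shuffled:.2f}")
```

The ordered series should show a strong lag‑1 correlation, while the shuffled copy should show almost none, even though both contain exactly the same values.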


In [None]:
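# Plotly cannot serialize pandas Period objects, so convert the PeriodIndex to timestamps first
fig = px.line(y.to_timestamp(), title="Airline passengers over time")
fig.update_layout(xaxis_title="Time", yaxis_title="Passengers")
fig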

## 2) Indexing & frequency

Time series are meaningful only with a well‑defined index. The index encodes frequency (monthly, daily, hourly), which drives seasonality and modeling decisions.

Common index types:
- `DatetimeIndex`: timestamps
- `PeriodIndex`: fixed frequency periods (often preferred for forecasts)

If frequency is missing or irregular, you must **resample** or **impute**.
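As a rough sketch of what resampling and imputing can look like in pandas, the example below puts an irregular daily series onto a regular grid and fills the gaps. The readings and dates are made up for illustration, and linear interpolation is only one of several reasonable imputation choices.

```python
import pandas as pd

# Hypothetical daily readings with two days missing (Jan 3 and Jan 6)
dates = pd.to_datetime(
    ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05", "2024-01-07"]
)
raw = pd.Series([10.0, 12.0, 16.0, 18.0, 22.0], index=dates)

# With gaps in the index, pandas cannot infer a regular frequency
print(pd.infer_freq(raw.index))

# Put the series on a regular daily grid; missing days become NaN
regular = raw.asfreq("D")

# Impute the gaps by linear interpolation (one choice among many)
filled = regular.interpolate(method="linear")

# Optionally switch from timestamps to a PeriodIndex
as_periods = filled.to_period("D")
print(as_periods)
```

Whether interpolation, forward fill, or a model-based imputation is appropriate depends on why the observations are missing.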


In [None]:
# Inspect index type and frequency
print(type(y.index), y.index.freq)


## 3) Trend, seasonality, and noise (STL decomposition)

A classic decomposition view:

\[y_t = T_t + S_t + e_t\] (additive)

or

\[y_t = T_t \times S_t \times e_t\] (multiplicative)

STL (Seasonal‑Trend decomposition using Loess) separates these components.
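The two forms are linked by a log transform: taking logs of a multiplicative series gives an additive one, which is why additive tools are often applied to \(\log y_t\). The sketch below checks this on synthetic components; the trend, seasonal factor, and noise are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.arange(48)                                   # 4 years of monthly steps
trend = 100.0 + 2.0 * t                             # T_t
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)   # S_t, a multiplicative factor
noise = np.exp(rng.normal(scale=0.01, size=t.size)) # e_t, strictly positive

y_mult = trend * seasonal * noise

# log(y) = log(T) + log(S) + log(e): additive on the log scale
log_sum = np.log(trend) + np.log(seasonal) + np.log(noise)
print(np.allclose(np.log(y_mult), log_sum))

# In a multiplicative series the seasonal swing grows with the level
swing_first = y_mult[:12].max() - y_mult[:12].min()
swing_last = y_mult[-12:].max() - y_mult[-12:].min()
print(f"seasonal swing, first year: {swing_first:.1f}")
print(f"seasonal swing, last year:  {swing_last:.1f}")
```

A seasonal swing that widens as the level rises is a common visual cue that the multiplicative form, or a log transform, fits better than the additive one.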


In [None]:
from statsmodels.tsa.seasonal import STL

stl = STL(y, period=12).fit()
components = pd.DataFrame({
    "trend": stl.trend,
    "seasonal": stl.seasonal,
    "residual": stl.resid,
})

fig = go.Figure()
for col in components.columns:
    fig.add_trace(go.Scatter(x=components.index.to_timestamp(), y=components[col], name=col))
fig.update_layout(title="STL components (trend / seasonal / residual)")
fig

## 4) Stationarity

Many classical models assume **stationarity**: the statistical properties (mean, variance, autocorrelation) do not change over time. ARMA requires a stationary input directly; ARIMA reaches it by differencing the series first.

Tests:
- **ADF** (Augmented Dickey‑Fuller): null = non‑stationary
- **KPSS**: null = stationary
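Because the two tests have opposite null hypotheses, they are usually read together. The helper below is a hypothetical sketch (the name `interpret_stationarity` and the 0.05 threshold are assumptions) that encodes one common way of combining the two p-values.

```python
def interpret_stationarity(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into a rough verdict.

    ADF null:  the series has a unit root (non-stationary).
    KPSS null: the series is stationary.
    """
    adf_rejects = adf_p < alpha    # evidence for stationarity
    kpss_rejects = kpss_p < alpha  # evidence against stationarity

    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary"
    if adf_rejects and kpss_rejects:
        return "tests disagree: possibly difference-stationary"
    return "tests disagree: possibly trend-stationary"


# Illustrative p-values, not results from any real series
print(interpret_stationarity(adf_p=0.99, kpss_p=0.01))
print(interpret_stationarity(adf_p=0.01, kpss_p=0.10))
```

The verdicts are heuristics rather than proofs; when the tests disagree, plotting the series and its differences is usually more informative than either p-value.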

If the series is non‑stationary, we **difference** it:

\[
\nabla y_t = y_t - y_{t-1}
\]

Seasonal differencing:

\[
\nabla_s y_t = y_t - y_{t-s}
\]
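A small sketch of both operators on a synthetic series, assuming a linear trend with slope 0.5 and a fixed 12-step seasonal pattern (both invented for illustration). In pandas, `diff(1)` computes the first difference and `diff(12)` the seasonal difference with \(s = 12\).

```python
import numpy as np
import pandas as pd

t = np.arange(60)
# Hypothetical monthly series: linear trend plus a repeating 12-step pattern
pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3], dtype=float)
y_demo = pd.Series(10.0 + 0.5 * t + np.tile(pattern, 5))

# First difference: y_t - y_{t-1}; removes the trend, keeps the seasonal pattern
first_diff = y_demo.diff(1).dropna()

# Seasonal difference: y_t - y_{t-12}; removes the seasonal pattern,
# leaving the trend's contribution over 12 steps (12 * 0.5 = 6)
seasonal_diff = y_demo.diff(12).dropna()

# Applying both leaves nothing but zeros for this noise-free example
both = seasonal_diff.diff(1).dropna()

print(first_diff.head())
print(seasonal_diff.unique())
print(both.abs().max())
```

Each differencing step costs observations: the first difference drops one value and the seasonal difference drops \(s\) values from the start of the series.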


In [None]:
from statsmodels.tsa.stattools import adfuller, kpss

adf_stat, adf_p, *_ = adfuller(y)
kpss_stat, kpss_p, *_ = kpss(y, nlags="auto")
print(f"ADF p-value: {adf_p:.4f}")
print(f"KPSS p-value: {kpss_p:.4f}")


## 5) Autocorrelation (ACF) and Partial Autocorrelation (PACF)

Autocorrelation at lag \(k\):

\[\rho_k = \frac{\mathrm{Cov}(y_t, y_{t-k})}{\mathrm{Var}(y_t)}\]

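- **ACF** shows the total correlation between \(y_t\) and \(y_{t-k}\), including effects carried through the intermediate lags
- **PACF** isolates the *direct* correlation at lag \(k\), after removing the influence of lags \(1, \dots, k-1\)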
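The ACF formula can be computed directly with NumPy. The function below is a minimal sketch of the usual sample estimator (the name `sample_acf` is hypothetical); it divides the lag-\(k\) sum of products by the lag-0 sum, so the value at lag 0 is always 1.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[k:] * centered[: len(x) - k]) / denom
        for k in range(max_lag + 1)
    ])


# Hypothetical noise-free series with a 12-step cycle
t = np.arange(120)
wave = np.sin(2 * np.pi * t / 12)

acf = sample_acf(wave, max_lag=24)
print(np.round(acf[[0, 6, 12]], 2))
```

For a series with a 12-step cycle, the ACF is strongly positive at lag 12 and strongly negative at lag 6, which is the pattern to look for when checking for seasonality.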


In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(y, ax=axes[0], lags=36)
plot_pacf(y, ax=axes[1], lags=36, method="ywm")
axes[0].set_title("ACF")
axes[1].set_title("PACF")
plt.tight_layout()


## 6) Forecasting horizons and splits

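Time series splits must be **temporal**: every test observation must come after all training observations, otherwise the evaluation leaks information from the future. We define a **forecasting horizon** (fh): the steps ahead we want to predict.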

Common strategies:
- **Single split**: train/test once
- **Rolling (sliding)** window: fixed train length
- **Expanding** window: grows over time
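The difference between the sliding and expanding strategies is easiest to see from the index ranges they produce. The generators below are a plain NumPy sketch with hypothetical names and parameters, not the sktime splitter classes.

```python
import numpy as np


def sliding_windows(n_obs, window_length, horizon, step_length=1):
    """Yield (train_idx, test_idx) with a fixed-length training window."""
    start = 0
    while start + window_length + horizon <= n_obs:
        train_end = start + window_length
        yield np.arange(start, train_end), np.arange(train_end, train_end + horizon)
        start += step_length


def expanding_windows(n_obs, initial_window, horizon, step_length=1):
    """Yield (train_idx, test_idx) with a training window that grows from index 0."""
    end = initial_window
    while end + horizon <= n_obs:
        yield np.arange(0, end), np.arange(end, end + horizon)
        end += step_length


sliding = list(sliding_windows(n_obs=10, window_length=4, horizon=2, step_length=2))
expanding = list(expanding_windows(n_obs=10, initial_window=4, horizon=2, step_length=2))

for name, folds in [("sliding", sliding), ("expanding", expanding)]:
    for train, test in folds:
        print(f"{name:9s} train={train.tolist()} test={test.tolist()}")
```

In both cases the test indices always follow the training indices; only the start of the training window differs.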


In [None]:
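from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split

# Hold out the last 24 months as the test set
y_train, y_test = temporal_train_test_split(y, test_size=24)
# Absolute horizon: forecast exactly the time points in the test index
fh = ForecastingHorizon(y_test.index, is_relative=False)
print(y_train.shape, y_test.shape)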



## 7) Baselines & metrics

Baselines anchor your expectations. If your fancy model doesn't beat a **naive** forecast, you should not deploy it.
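For seasonal data there are two natural naive baselines: repeat the last value, or repeat the last full season. The NumPy sketch below shows both; the function names, the toy series, and the seasonal period of 4 are assumptions made for illustration.

```python
import numpy as np


def naive_last(history, horizon):
    """Forecast every future step with the last observed value."""
    return np.full(horizon, history[-1], dtype=float)


def seasonal_naive(history, horizon, sp):
    """Forecast each future step with the value observed one season (sp steps) earlier."""
    last_season = np.asarray(history[-sp:], dtype=float)
    return np.array([last_season[h % sp] for h in range(horizon)])


# Hypothetical series that repeats every 4 steps
history = np.tile([10.0, 20.0, 30.0, 40.0], 3)
actual_next = np.array([10.0, 20.0, 30.0, 40.0])

pred_last = naive_last(history, horizon=4)
pred_seasonal = seasonal_naive(history, horizon=4, sp=4)

mae_last = np.mean(np.abs(actual_next - pred_last))
mae_seasonal = np.mean(np.abs(actual_next - pred_seasonal))
print(f"MAE last-value: {mae_last:.1f}, MAE seasonal naive: {mae_seasonal:.1f}")
```

On strongly seasonal data the seasonal naive forecast is usually the harder baseline to beat, so it is worth reporting alongside the last-value forecast.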

Common metrics:
- MAE, RMSE, MAPE, sMAPE

**Note:** MAPE can explode when values approach zero.
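The sketch below writes the four metrics out in NumPy and shows the MAPE problem on a toy example. The sMAPE variant used here is one common definition among several, and the numbers are invented for illustration.

```python
import numpy as np


def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    # Divides by the actual value, so it is unstable when y_true is near zero
    return np.mean(np.abs((y_true - y_pred) / y_true))


def smape(y_true, y_pred):
    # One common symmetric variant, bounded between 0 and 2
    return np.mean(2 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))


# The third actual value is close to zero
actual = np.array([100.0, 200.0, 0.01])
forecast = np.array([110.0, 190.0, 1.0])

print(f"MAE:   {mae(actual, forecast):.3f}")
print(f"RMSE:  {rmse(actual, forecast):.3f}")
print(f"MAPE:  {mape(actual, forecast):.3f}")
print(f"sMAPE: {smape(actual, forecast):.3f}")
```

The absolute error on the third point is under 1, yet it dominates MAPE because the denominator is 0.01; MAE and RMSE are unaffected by this.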


In [None]:
from sktime.performance_metrics.forecasting import mean_absolute_error, mean_squared_error

# Example: naive baseline
from sktime.forecasting.naive import NaiveForecaster

model = NaiveForecaster(strategy="last")
model.fit(y_train)

pred = model.predict(fh)
mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred, square_root=True)
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")



## 8) Backtesting (rolling evaluation)

Backtesting simulates how your model would have performed in the past by repeatedly training and forecasting on historical windows. This is essential for trustworthy evaluation.
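Stripped of library machinery, a backtest is a loop: slice a training window, forecast the next few steps, score them, and move forward. The function below is a minimal NumPy sketch of that loop using a last-value forecast; the function name, the toy series, and the window settings are assumptions.

```python
import numpy as np


def backtest_naive(series, window_length, horizon, step_length):
    """Rolling evaluation of a last-value forecast; returns one MAE per fold."""
    series = np.asarray(series, dtype=float)
    fold_mae = []
    start = 0
    while start + window_length + horizon <= len(series):
        train = series[start : start + window_length]
        test = series[start + window_length : start + window_length + horizon]
        forecast = np.full(horizon, train[-1])  # "fit" and predict on this window only
        fold_mae.append(np.mean(np.abs(test - forecast)))
        start += step_length
    return np.array(fold_mae)


# Hypothetical series with a pure linear trend: y_t = t
demo = np.arange(20, dtype=float)
scores = backtest_naive(demo, window_length=8, horizon=3, step_length=4)
print(scores, scores.mean())
```

Reporting the per-fold scores as well as their mean shows whether performance is stable over time or driven by a single window.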


In [None]:
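from sktime.forecasting.model_selection import SlidingWindowSplitter
from sktime.forecasting.model_evaluation import evaluate
from sktime.performance_metrics.forecasting import MeanAbsoluteError

# 60-month training window, moved forward 12 months per fold
cv = SlidingWindowSplitter(fh=[1, 2, 3, 6, 12], window_length=60, step_length=12)
results = evaluate(
    model,
    y,
    cv=cv,
    strategy="refit",  # refit the forecaster from scratch on every window
    scoring=MeanAbsoluteError(),  # evaluate expects a metric object, not a bare function
)
results.head()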


## Summary

You now have the conceptual toolkit for time‑series forecasting: decomposition, stationarity, autocorrelation, proper evaluation, and baseline discipline. The next notebooks implement core models and show how these ideas translate to practice.
