# Time Series Regression

## Purpose
- Explain a target series using exogenous signals and lags.
- Combine time-indexed features with historical values.
- Support causal and predictive analysis.

## Key questions this section answers
- Which external drivers influence the series?
- How many lags and transforms are needed?
- How do we validate without leakage?

## Topics
- Lagged features and distributed lags
- Exogenous variables and regression baselines
- Regularization for multicollinearity
- Evaluation on rolling windows
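
The lag and regularization topics above can be sketched together. The example below is a minimal illustration, not a reference implementation: it assumes a synthetic autocorrelated driver `x`, an arbitrary `max_lag = 6`, an arbitrary penalty `alpha = 10.0`, and a hypothetical helper `ridge_fit` that solves the closed-form ridge system with NumPy. Adjacent lags of a smooth driver are strongly correlated, which is the multicollinearity the penalty is meant to stabilize.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y responds to the
# current and two previous values of a smooth, autocorrelated driver x.
n = 300
x = pd.Series(rng.normal(0, 1, n)).rolling(5, min_periods=1).mean()
y = 1.0 * x + 0.6 * x.shift(1) + 0.3 * x.shift(2) + rng.normal(0, 0.2, n)

# Distributed-lag design matrix: columns x_t, x_{t-1}, ..., x_{t-max_lag}.
max_lag = 6
lags = pd.concat({f"x_lag_{k}": x.shift(k) for k in range(max_lag + 1)}, axis=1)
data = pd.concat([y.rename("y"), lags], axis=1).dropna()

X = data.drop(columns="y").to_numpy()
target = data["y"].to_numpy()

# Center features and target so the intercept is not penalized.
X_c = X - X.mean(axis=0)
y_c = target - target.mean()


def ridge_fit(X, y, alpha):
    """Hypothetical helper: solve (X'X + alpha * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


beta_ols = ridge_fit(X_c, y_c, alpha=0.0)     # alpha = 0 reduces to OLS
beta_ridge = ridge_fit(X_c, y_c, alpha=10.0)  # penalized coefficients

coef_table = pd.DataFrame(
    {"ols": beta_ols, "ridge": beta_ridge}, index=data.columns[1:]
)
print(coef_table.round(3))
```

Comparing the two columns shows how the penalty shrinks the lag coefficients. In practice `alpha` would be chosen by time-ordered validation rather than fixed by hand.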

## References
- statsmodels (SARIMAX), scikit-learn regression, sktime regression


In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

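# Synthetic data: linear trend + exogenous effect + Gaussian noise.
rng = np.random.default_rng(9)

n = 180
t = np.arange(n)
exog = rng.normal(0, 1, n)
y = 0.05 * t + 1.5 * exog + rng.normal(0, 0.5, n)

series = pd.Series(y)

# Feature table: the exogenous signal plus the previous value of y.
df = pd.DataFrame({"y": series, "exog": exog})
df["lag_1"] = df["y"].shift(1)

# shift(1) leaves a NaN in the first row; drop it before fitting.
df = df.dropna()

# Ordinary least squares on [intercept, exog, lag_1].
# There is no explicit trend term, so lag_1 carries the trend.
# The predictions below are in-sample fitted values, not forecasts.
X = np.column_stack([np.ones(len(df)), df["exog"].values, df["lag_1"].values])
coef, *_ = np.linalg.lstsq(X, df["y"].values, rcond=None)
pred = X @ coef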

fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=df["y"], name="actual"))
fig.add_trace(go.Scatter(x=df.index, y=pred, name="regression fit"))
fig.update_layout(
    title="Time series regression with exogenous feature + lag",
    xaxis_title="t",
    yaxis_title="y",
)
fig.show()
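
The fit plotted above is in-sample, so it does not answer the leakage question on its own. One way to evaluate on rolling windows is an expanding-window, one-step-ahead loop: at each origin the model is refit on rows strictly before that origin and scored on that single row. The sketch below rebuilds the same synthetic data so it runs on its own; `min_train = 60` is an arbitrary choice, and the naive lag-1 baseline is added only as a reference point.

```python
import numpy as np
import pandas as pd

# Same synthetic setup as the cell above, rebuilt so this block is standalone.
rng = np.random.default_rng(9)
n = 180
t = np.arange(n)
exog = rng.normal(0, 1, n)
y = 0.05 * t + 1.5 * exog + rng.normal(0, 0.5, n)

df = pd.DataFrame({"y": y, "exog": exog})
df["lag_1"] = df["y"].shift(1)
df = df.dropna()

X = np.column_stack([np.ones(len(df)), df["exog"].to_numpy(), df["lag_1"].to_numpy()])
target = df["y"].to_numpy()

# Expanding window: fit on rows [0, origin), predict row `origin` only.
min_train = 60  # arbitrary minimum training size (assumption)
preds, actuals = [], []
for origin in range(min_train, len(df)):
    coef, *_ = np.linalg.lstsq(X[:origin], target[:origin], rcond=None)
    preds.append(X[origin] @ coef)
    actuals.append(target[origin])

preds = np.array(preds)
actuals = np.array(actuals)
rmse_model = np.sqrt(np.mean((actuals - preds) ** 2))

# Naive baseline: predict y_t with y_{t-1}, which is the lag_1 column.
rmse_naive = np.sqrt(np.mean((actuals - X[min_train:, 2]) ** 2))

print(f"rolling one-step RMSE, regression:  {rmse_model:.3f}")
print(f"rolling one-step RMSE, naive lag-1: {rmse_naive:.3f}")
```

This assumes `exog` at time t is already known when the forecast for t is made. If it is not, the exogenous column would need to be lagged or forecast as well, otherwise the evaluation still leaks future information.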


## Takeaway
Building lagged and exogenous features turns a time series into a tabular regression problem that is easy to baseline, provided each row uses only information available at its own time step.

