# AutoARIMA

ARIMA models explain a series using **autoregression**, **differencing**, and **moving average** components.

ARIMA(p, d, q):
- **p**: number of autoregressive lags
- **d**: number of differences to achieve stationarity
- **q**: number of moving‑average lags

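A stationary ARMA(p, q) model, with constant \(c\), autoregressive coefficients \(\phi_i\), moving‑average coefficients \(\theta_j\), and white‑noise errors \(\epsilon_t\), is: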

\[y_t = c + \sum_{i=1}^p \phi_i y_{t-i} + \epsilon_t + \sum_{j=1}^q \theta_j \epsilon_{t-j}\]
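
To make the recursion above concrete, here is a minimal simulation sketch in NumPy. The function name `simulate_arma`, its parameters, and the burn-in length are illustrative assumptions, not part of sktime or statsmodels.

```python
import numpy as np

def simulate_arma(phi, theta, n, c=0.0, sigma=1.0, burn_in=200, seed=0):
    """Simulate y_t = c + sum_i phi_i*y_{t-i} + eps_t + sum_j theta_j*eps_{t-j}.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p, q = len(phi), len(theta)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, size=total)  # white-noise errors
    y = np.zeros(total)
    for t in range(total):
        # Autoregressive part: weighted sum of past values (skipping indices before the start)
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # Moving-average part: weighted sum of past errors
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = c + ar + eps[t] + ma
    # Drop the burn-in so the arbitrary zero start has little influence
    return y[burn_in:]

# ARMA(1, 1) with phi_1 = 0.6 and theta_1 = 0.3
y_sim = simulate_arma(phi=[0.6], theta=[0.3], n=500, c=1.0)
print(len(y_sim), y_sim.mean())
```

For a stationary AR(1) part, the long-run mean is \(c / (1 - \phi_1)\), so the sample mean of `y_sim` should sit near 2.5, up to sampling noise.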


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sktime.datasets import load_airline

# Reproducibility
np.random.seed(42)

y = load_airline()
y.name = "Passengers"



## Make the series more stationary (log + difference)


In [None]:
y_log = np.log(y)
y_diff = y_log.diff().dropna()

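# Plotly cannot serialize a PeriodIndex, so convert it to timestamps before plotting
fig = px.line(y_diff.to_timestamp(), title="Log-differenced series")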
fig.show()
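
As a small sanity check on why log + difference helps, the sketch below uses two hypothetical noise-free series. A series growing at a constant rate has constant log differences, and a quadratic trend becomes constant after two differences, which is what `d = 2` would mean.

```python
import numpy as np

t = np.arange(48)

# Hypothetical series growing 2% per step (no noise, for illustration)
growth_series = 100.0 * 1.02 ** t
log_diff = np.diff(np.log(growth_series))
# Every log difference equals log(1.02): the exponential trend is gone
print(log_diff[:3], np.log(1.02))

# Hypothetical quadratic trend: one difference leaves a linear trend,
# a second difference leaves a constant
quad_series = t.astype(float) ** 2
second_diff = np.diff(quad_series, n=2)
print(second_diff[:3])
```

Real data adds noise and seasonality on top, so the differenced airline series will fluctuate rather than be constant, but the trend-removal mechanism is the same.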


## ACF/PACF to guide p and q


In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(y_diff, ax=axes[0], lags=36)
plot_pacf(y_diff, ax=axes[1], lags=36, method="ywm")
plt.tight_layout()
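
To show what the ACF plot is computing, here is a minimal NumPy sketch of the sample autocorrelation with the usual approximate 95% white-noise band of ±1.96/√n. The helper name `sample_acf` is an assumption for illustration; `plot_acf` does this (and more) internally. As a textbook heuristic, an ACF that cuts off after lag q suggests an MA(q) term, and a PACF that cuts off after lag p suggests an AR(p) term.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()          # center the series
    n = len(x)
    denom = np.dot(x, x)      # lag-0 sum of squares
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(0)
white_noise = rng.normal(size=200)

acf_vals = sample_acf(white_noise, nlags=20)
bound = 1.96 / np.sqrt(len(white_noise))  # approximate 95% band under white noise

# Lags (excluding lag 0) whose autocorrelation falls outside the band
outside = np.flatnonzero(np.abs(acf_vals[1:]) > bound) + 1
print("lags outside the band:", outside)
```

For pure white noise, roughly 1 in 20 lags is expected to fall outside the band by chance, so an isolated spike is not strong evidence of structure.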


## Fit the model with sktime


In [None]:
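from sktime.forecasting.arima import AutoARIMA  # requires the pmdarima package
from sktime.forecasting.base import ForecastingHorizon
# Newer sktime releases expose this function under sktime.split
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import mean_absolute_error

# sp=12: monthly data with a yearly seasonal period; stepwise=True uses the faster stepwise search
model = AutoARIMA(sp=12, suppress_warnings=True, stepwise=True)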


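# Hold out the last 24 months as the test set
y_train, y_test = temporal_train_test_split(y, test_size=24)
# Absolute horizon: forecast exactly the periods in the test index
fh = ForecastingHorizon(y_test.index, is_relative=False)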

model.fit(y_train)
pred = model.predict(fh)

mae = mean_absolute_error(y_test, pred)
print(f"MAE: {mae:.3f}")
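
An MAE is easier to judge next to a simple baseline. The sketch below is a seasonal naive forecast (repeat the last observed season) written in plain NumPy; the helper names and the toy numbers are assumptions for illustration and do not come from the airline data.

```python
import numpy as np

def seasonal_naive_forecast(train, horizon, sp):
    """Repeat the last `sp` observations to cover `horizon` steps (hypothetical helper)."""
    train = np.asarray(train, dtype=float)
    last_season = train[-sp:]
    reps = int(np.ceil(horizon / sp))
    return np.tile(last_season, reps)[:horizon]

def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))

# Toy example with a seasonal period of 3 (made-up numbers)
toy_train = [1, 2, 3, 4, 5, 6]
toy_test = [5, 5, 7, 5, 6]
baseline = seasonal_naive_forecast(toy_train, horizon=len(toy_test), sp=3)
print(baseline, mae(toy_test, baseline))
```

In the notebook itself, the same comparison could be made by passing `y_train.to_numpy()` with `sp=12` and a horizon of 24, or with sktime's `NaiveForecaster(strategy="last", sp=12)`. If AutoARIMA does not beat this baseline, the extra complexity is hard to justify.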



## Forecast plot


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=y_train.index.to_timestamp(), y=y_train, name="Train"))
fig.add_trace(go.Scatter(x=y_test.index.to_timestamp(), y=y_test, name="Test"))
fig.add_trace(go.Scatter(x=pred.index.to_timestamp(), y=pred, name="Forecast"))
fig.update_layout(title="ARIMA forecast vs actual")
fig.show()


## Diagnostics

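Check the residuals for remaining autocorrelation and for departures from normality. A well‑specified ARIMA model leaves residuals that behave like **white noise**: zero mean, constant variance, and no autocorrelation. The cell below inspects the out‑of‑sample forecast errors on the test set, which is a coarser check than examining the in‑sample residuals.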


In [None]:
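# Out-of-sample forecast errors on the 24-month test set (not in-sample residuals)
resid = y_test - pred
# Pass plain values so Plotly does not have to handle the PeriodIndex
fig = px.histogram(x=resid.to_numpy(), nbins=30, title="Forecast error distribution")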
fig.show()
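
A histogram says little about autocorrelation. One common check is the Ljung-Box statistic, sketched here in NumPy under stated assumptions: the helper name `ljung_box_q` is hypothetical, the data are simulated, and the critical value is the 95% point of a chi-square distribution with 10 degrees of freedom. When the input is a set of residuals from a fitted ARMA(p, q), the degrees of freedom are usually reduced by p + q.

```python
import numpy as np

def ljung_box_q(resid, nlags):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n-k) (hypothetical helper)."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    total = 0.0
    for k in range(1, nlags + 1):
        r_k = np.dot(x[: n - k], x[k:]) / denom  # sample autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

CHI2_95_DF10 = 18.307  # 95% critical value, chi-square with 10 degrees of freedom

rng = np.random.default_rng(0)
noise = rng.normal(size=500)

# Simulated AR(1) series with strong autocorrelation, for contrast
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]

q_noise = ljung_box_q(noise, nlags=10)
q_ar1 = ljung_box_q(ar1, nlags=10)
print(f"white noise Q: {q_noise:.1f}, AR(1) Q: {q_ar1:.1f}, critical: {CHI2_95_DF10}")
```

A Q value above the critical value indicates autocorrelation that the model has not captured. statsmodels provides a tested implementation as `acorr_ljungbox` in `statsmodels.stats.diagnostic`, which also returns p-values.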


## When to use

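- Strong for univariate series with moderate, regular seasonality (a single seasonal period, set via `sp`)
- The ARMA part assumes stationarity, so trends must be removed by differencing (the `d` term), often after a log transform when the variance grows with the level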
- AutoARIMA helps automate order selection, but diagnostics still matter
