# ARIMA (AutoRegressive Integrated Moving Average)

ARIMA models explain a time series using **autoregression**, **differencing**, and **moving average** components. This notebook covers the mathematical theory, low-level NumPy implementations, and practical usage with sktime.

## ARIMA(p, d, q) Parameters

| Parameter | Description |
|-----------|-------------|
| **p** | Number of autoregressive (AR) lags |
| **d** | Number of differences to achieve stationarity |
| **q** | Number of moving-average (MA) lags |

---

## Mathematical Foundation

### 1. Autoregressive Process AR(p)

An AR(p) process models the current value as a linear combination of its past values:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t$$

Or in summation notation:

$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \epsilon_t$$

where:
- $c$ is a constant (intercept)
- $\phi_i$ are the AR coefficients
- $\epsilon_t \sim WN(0, \sigma^2)$ is white noise

**Characteristic Polynomial (Stationarity Condition):**

$$\Phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$$

The process is **stationary** if all roots of $\Phi(z) = 0$ lie outside the unit circle ($|z| > 1$).
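A quick way to see the root condition at work is to compute the roots of $\Phi(z)$ for two example coefficient vectors. The sketch below uses arbitrarily chosen values. It also builds the AR companion matrix, whose eigenvalues are the reciprocals of the roots of $\Phi(z)$, so "roots outside the unit circle" and "eigenvalues inside the unit circle" are the same condition. The function name `ar_roots_and_companion` is illustrative.

```python
import numpy as np


def ar_roots_and_companion(phi):
    """Return the roots of Phi(z) and the eigenvalues of the AR(p) companion matrix."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    # Phi(z) = 1 - phi_1 z - ... - phi_p z^p; np.roots expects the highest power first
    roots = np.roots(np.concatenate(([1.0], -phi))[::-1])
    # Companion matrix: coefficients in the first row, ones on the sub-diagonal
    companion = np.zeros((p, p))
    companion[0, :] = phi
    companion[1:, :-1] = np.eye(p - 1)
    eigvals = np.linalg.eigvals(companion)
    return roots, eigvals


for phi_ex in ([0.5, 0.3], [0.5, 0.6]):  # example coefficients, not fitted values
    roots_ex, eig_ex = ar_roots_and_companion(phi_ex)
    print(
        f"phi={phi_ex}: min |root| = {np.abs(roots_ex).min():.4f}, "
        f"max |eigenvalue| = {np.abs(eig_ex).max():.4f}, "
        f"stationary = {bool(np.all(np.abs(roots_ex) > 1))}"
    )
```

With $\phi = (0.5, 0.3)$ both roots lie outside the unit circle. With $\phi = (0.5, 0.6)$ one root moves inside (about 0.94), so that process is not stationary.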

---

### 2. Moving Average Process MA(q)

An MA(q) process models the current value as a linear combination of current and past error terms:

$$y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}$$

Or in summation notation:

$$y_t = c + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}$$

where:
- $\theta_j$ are the MA coefficients
- $\epsilon_t \sim WN(0, \sigma^2)$

**Characteristic Polynomial (Invertibility Condition):**

$$\Theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q$$

The process is **invertible** if all roots of $\Theta(z) = 0$ lie outside the unit circle.
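Two properties of the MA(q) definition can be checked directly. First, a single shock affects exactly $q + 1$ observations and then vanishes. Second, invertibility is a root condition on $\Theta(z)$. The sketch below treats the MA recursion as a causal convolution of the shocks with $(1, \theta_1, \ldots, \theta_q)$, assuming shocks before $t = 0$ are zero. The helper names are hypothetical.

```python
import numpy as np


def ma_from_shocks(eps, theta, c=0.0):
    """y_t = c + eps_t + sum_j theta_j * eps_{t-j}, with shocks before t=0 set to zero."""
    eps = np.asarray(eps, dtype=float)
    kernel = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    # Full convolution truncated to the observed length = causal MA filter
    return c + np.convolve(eps, kernel)[: len(eps)]


def is_invertible(theta):
    """True if all roots of Theta(z) = 1 + theta_1 z + ... + theta_q z^q lie outside |z| = 1."""
    coeffs = np.concatenate(([1.0], np.asarray(theta, dtype=float)))[::-1]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1))


# A unit shock at t=0 passed through an MA(2) with example coefficients (0.5, 0.2)
impulse = np.zeros(6)
impulse[0] = 1.0
print(ma_from_shocks(impulse, theta=[0.5, 0.2]))

print(is_invertible([0.5]), is_invertible([2.0]))  # True False
```

The impulse response is $1, 0.5, 0.2$ followed by zeros: the shock persists for exactly $q = 2$ extra periods. An MA(1) with $\theta_1 = 0.5$ has its root at $z = -2$ (invertible), while $\theta_1 = 2$ puts it at $z = -0.5$ (not invertible).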

---

### 3. ARMA(p, q) Process

Combining AR and MA components:

$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}$$

Using the **backshift operator** $B$ where $B^k y_t = y_{t-k}$:

$$\Phi(B) y_t = c + \Theta(B) \epsilon_t$$
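When the AR part is stationary, dividing both sides by $\Phi(B)$ gives the MA(∞) form $y_t = \mu + \sum_{j \ge 0} \psi_j \epsilon_{t-j}$ with $\mu = c / (1 - \sum_i \phi_i)$. Matching coefficients of $\Phi(B)\Psi(B) = \Theta(B)$ yields the recursion $\psi_0 = 1$, $\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$, with $\theta_j = 0$ for $j > q$. The minimal sketch below implements that recursion. The helper name is hypothetical and the coefficients are example values.

```python
import numpy as np


def arma_psi_weights(phi, theta, n_weights):
    """First n_weights coefficients psi_j of Theta(B) / Phi(B)."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        theta_j = theta[j - 1] if j <= len(theta) else 0.0  # theta_j = 0 beyond lag q
        ar_part = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        psi[j] = theta_j + ar_part
    return psi


# ARMA(1, 1) with example values phi_1 = 0.5, theta_1 = 0.4
print(arma_psi_weights(phi=[0.5], theta=[0.4], n_weights=6))
```

Here a shock's effect is $\psi_1 = 0.9$ one step later and then halves every step ($\psi_j = 0.9 \cdot 0.5^{j-1}$). The AR part produces geometric decay rather than the hard cutoff of a pure MA.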

---

### 4. Differencing for ARIMA

Non-stationary series can often be made stationary through differencing:

**First Difference:**
$$\nabla y_t = y_t - y_{t-1} = (1 - B) y_t$$

**d-th Order Difference:**
$$\nabla^d y_t = (1 - B)^d y_t$$

For $d = 2$:
$$\nabla^2 y_t = \nabla(\nabla y_t) = y_t - 2y_{t-1} + y_{t-2}$$

The **ARIMA(p, d, q)** model applies ARMA(p, q) to the differenced series:

$$\Phi(B)(1 - B)^d y_t = c + \Theta(B) \epsilon_t$$
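Multiplying lag polynomials amounts to convolving their coefficient arrays, which makes the effect of $(1-B)^d$ easy to inspect. The example below uses an illustrative $\phi_1 = 0.5$ and expands $\Phi(B)(1-B)$ for an ARIMA(1, 1, 0). The result is an AR(2) polynomial in the levels with one root exactly at $z = 1$ (a unit root), which is why the undifferenced series fails the stationarity condition. The example also checks the $d = 2$ expansion above.

```python
import numpy as np

# Lag-polynomial coefficients in ascending powers of B: [b0, b1, b2, ...]
phi_poly = np.array([1.0, -0.5])   # Phi(B) = 1 - 0.5B (example AR(1) part)
diff_poly = np.array([1.0, -1.0])  # (1 - B), one difference

# Polynomial multiplication = convolution of coefficient arrays
combined = np.convolve(phi_poly, diff_poly)
print("Phi(B)(1 - B) coefficients:", combined)   # 1 - 1.5B + 0.5B^2
print("roots:", np.roots(combined[::-1]))         # z = 2 and the unit root z = 1

# (1 - B)^2 gives the weights of y_t - 2y_{t-1} + y_{t-2}
print("(1 - B)^2 coefficients:", np.convolve(diff_poly, diff_poly))
y_sq = np.array([1.0, 4.0, 9.0, 16.0, 25.0])     # y_t = t^2
print("second differences:", np.diff(y_sq, n=2))  # constant for a quadratic
```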

---

## Low-Level NumPy Implementation

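Below we build a simplified ARIMA from scratch to see how each component works. It covers differencing and the AR part only ($q = 0$). MA terms depend on unobserved past errors, so they are usually estimated iteratively, e.g. by maximum likelihood.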

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sktime.datasets import load_airline

# Reproducibility
np.random.seed(42)

# Load data
y = load_airline()
y.name = "Passengers"

# Convert to numpy for our implementations
y_values = y.values.astype(float)

### Differencing Functions

The differencing operator transforms a non-stationary series into a stationary one. We implement both forward differencing and its inverse for reconstructing forecasts.

### AR Model Fitting via Ordinary Least Squares

For an AR(p) model with an intercept, we can estimate the coefficients by **ordinary least squares**, regressing $y_t$ on its own lags:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t$$

In matrix form: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, with $\boldsymbol{\beta} = (c, \phi_1, \ldots, \phi_p)^T$.

The OLS solution is:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

where each row of the design matrix $\mathbf{X}$ is $(1, y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ for $t = p+1, \ldots, n$.
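To sanity-check the estimator, the sketch below simulates an AR(2) with known, arbitrarily chosen parameters ($c = 1$, $\phi = (0.5, 0.3)$), builds the design matrix, and solves the least-squares problem with `np.linalg.lstsq`, which avoids forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly. With 20,000 observations the estimates should land close to the true values. The ratio $\hat{c} / (1 - \hat\phi_1 - \hat\phi_2)$ should be near the process mean $c / (1 - \phi_1 - \phi_2) = 5$.

```python
import numpy as np

rng = np.random.default_rng(0)
phi_true, c_true, n_sim, p_sim = np.array([0.5, 0.3]), 1.0, 20_000, 2

# Simulate y_t = c + phi_1 y_{t-1} + phi_2 y_{t-2} + eps_t, discarding a burn-in
burn = 500
y_sim = np.zeros(n_sim + burn)
eps = rng.standard_normal(n_sim + burn)
for t in range(2, n_sim + burn):
    y_sim[t] = c_true + phi_true[0] * y_sim[t - 1] + phi_true[1] * y_sim[t - 2] + eps[t]
y_sim = y_sim[burn:]

# Design matrix rows [1, y_{t-1}, y_{t-2}] and targets y_t
X_sim = np.column_stack(
    [np.ones(n_sim - p_sim)] + [y_sim[p_sim - i : n_sim - i] for i in range(1, p_sim + 1)]
)
target = y_sim[p_sim:]

# Least-squares fit of beta = (c, phi_1, phi_2)
beta, *_ = np.linalg.lstsq(X_sim, target, rcond=None)
c_hat, phi_hat = beta[0], beta[1:]
print("c_hat:", c_hat, "phi_hat:", phi_hat)
print("implied mean c/(1 - sum phi):", c_hat / (1 - phi_hat.sum()))
```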

### Stationarity Check via Characteristic Polynomial

An AR(p) process is stationary if all roots of the characteristic polynomial lie **outside** the unit circle.

$$\Phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$$

We check this condition for our fitted model:

### AR Prediction (Multi-Step Forecasting)

For multi-step forecasting, we recursively apply the AR model:

$$\hat{y}_{t+h} = c + \phi_1 \hat{y}_{t+h-1} + \phi_2 \hat{y}_{t+h-2} + \cdots + \phi_p \hat{y}_{t+h-p}$$

Note: $\hat{y}_s = y_s$ for observed periods $s \le t$. For $h > 1$, lags that fall after the forecast origin $t$ are not observed, so their forecasts are fed back in.
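For a stationary AR(p), these recursive forecasts converge to the unconditional mean $\mu = c / (1 - \sum_i \phi_i)$ as $h$ grows. The standalone sketch below uses the same recursion as `ar_predict` later in the notebook, with example coefficients. It also shows what this implies when the AR model is fit to a once-differenced series: the level forecasts eventually increase by roughly $\mu$ per step, a linear trend.

```python
import numpy as np


def recursive_ar_forecast(history, phi, c, h):
    """h-step AR(p) forecasts; each forecast is fed back in as a lagged value."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    buffer = list(np.asarray(history, dtype=float)[-p:])
    forecasts = []
    for _ in range(h):
        lags = np.array(buffer[-p:][::-1])  # [y_{t-1}, ..., y_{t-p}]
        y_hat = c + phi @ lags
        forecasts.append(y_hat)
        buffer.append(y_hat)
    return np.array(forecasts)


phi_ex, c_ex = [0.5, 0.3], 1.0  # example values, mean = 1 / 0.2 = 5
fc = recursive_ar_forecast([4.0, 6.0], phi_ex, c_ex, h=200)
print("first forecasts:", fc[:3])
print("h=200 forecast:", fc[-1], "| mean:", c_ex / (1 - sum(phi_ex)))

# If fc were forecasts of first differences, integrating them from a last level
# of 100 gives increments that settle at the mean difference (a linear trend)
level_fc = 100.0 + np.cumsum(fc)
print("late increments:", np.diff(level_fc)[-3:])
```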

### Complete ARIMA Forecast Function

Putting it all together: difference → fit AR → forecast → inverse difference

---

## Visualizations with Plotly

### ACF and PACF for Order Selection

**ACF (Autocorrelation Function)** shows correlation of series with its lagged values.  
**PACF (Partial Autocorrelation Function)** shows direct correlation after removing intermediate effects.

| Pattern | Suggests |
|---------|----------|
| ACF tails off, PACF cuts off at lag p | AR(p) |
| ACF cuts off at lag q, PACF tails off | MA(q) |
| Both tail off | ARMA(p, q) |
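The table's patterns can be checked on theoretical autocorrelations, free of sampling noise. An AR(1) with $\phi_1 = 0.6$ (an example value) has $\rho_k = 0.6^k$, an ACF that tails off geometrically. Running the Durbin-Levinson recursion on it (the recursion `compute_pacf` uses below) gives a PACF equal to 0.6 at lag 1 and zero afterwards. The function name is illustrative.

```python
import numpy as np


def durbin_levinson_pacf(rho):
    """PACF from an autocorrelation sequence rho[0..K], with rho[0] = 1."""
    rho = np.asarray(rho, dtype=float)
    K = len(rho) - 1
    pacf = np.zeros(K + 1)
    pacf[0] = 1.0
    prev = np.zeros(0)  # coefficients phi_{k-1, 1..k-1}
    for k in range(1, K + 1):
        num = rho[k] - prev @ rho[k - 1:0:-1]
        den = 1.0 - prev @ rho[1:k]
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        prev = np.concatenate([prev - phi_kk * prev[::-1], [phi_kk]])
        pacf[k] = phi_kk
    return pacf


phi1 = 0.6
rho_ar1 = phi1 ** np.arange(8)  # theoretical AR(1) ACF: tails off geometrically
print("ACF: ", np.round(rho_ar1, 3))
print("PACF:", np.round(durbin_levinson_pacf(rho_ar1), 3))  # cuts off after lag 1
```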

### Fitted Values vs Actual

### Residual Diagnostics (Plotly)

A well-specified ARIMA model should produce residuals that are:
1. **Uncorrelated** (white noise) - check via ACF of residuals
2. **Normally distributed** - check via histogram/Q-Q plot
3. **Homoscedastic** - check via residual plot over time
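For the first check, the `ljung_box_stat` helper in the diagnostics cell reports only the Q statistic. Under the null of white-noise residuals from a fitted ARMA(p, q), Q over $m$ lags is approximately $\chi^2$-distributed with $m - p - q$ degrees of freedom, so a p-value follows directly. The sketch below uses `scipy.stats.chi2` (an added dependency here) and is checked on simulated white noise and on a strongly alternating series. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(residuals, max_lag, n_params=0):
    """Ljung-Box Q and its chi-square p-value with df = max_lag - n_params (p + q)."""
    r = np.asarray(residuals, dtype=float)
    n = len(r)
    rc = r - r.mean()
    denom = np.sum(rc ** 2)
    lags = np.arange(1, max_lag + 1)
    acf = np.array([np.sum(rc[k:] * rc[:-k]) / denom for k in lags])
    q_stat = n * (n + 2) * np.sum(acf ** 2 / (n - lags))
    df = max_lag - n_params
    if df <= 0:
        raise ValueError("max_lag must exceed the number of ARMA parameters")
    return q_stat, chi2.sf(q_stat, df)


rng = np.random.default_rng(1)
white_noise = rng.standard_normal(300)
alternating = np.tile([1.0, -1.0], 150) + 0.1 * rng.standard_normal(300)
print("white noise:", ljung_box(white_noise, max_lag=10, n_params=2))
print("alternating:", ljung_box(alternating, max_lag=10, n_params=2))
```

A tiny p-value (as for the alternating series) rejects the white-noise hypothesis and signals remaining structure the model has not captured.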

In [None]:
# Compute residuals
residuals = y_values[fitted_start_idx:fitted_start_idx + len(fitted_original)] - fitted_original
standardized_resid = (residuals - np.mean(residuals)) / np.std(residuals)

# Compute ACF of residuals
resid_acf = compute_acf(residuals, max_lag=20)

# Create diagnostic plots
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        "Residuals Over Time",
        "Histogram of Residuals",
        "ACF of Residuals",
        "Q-Q Plot"
    )
)

# 1. Residuals over time
fig.add_trace(
    go.Scatter(
        x=np.arange(len(residuals)),
        y=residuals,
        mode="lines+markers",
        marker=dict(size=4),
        line=dict(color="steelblue"),
        showlegend=False
    ),
    row=1, col=1
)
fig.add_hline(y=0, line_dash="dash", line_color="red", row=1, col=1)

# 2. Histogram
fig.add_trace(
    go.Histogram(
        x=residuals,
        nbinsx=25,
        marker_color="steelblue",
        showlegend=False
    ),
    row=1, col=2
)

# 3. ACF of residuals
conf_bound = 1.96 / np.sqrt(len(residuals))
fig.add_trace(
    go.Bar(
        x=np.arange(len(resid_acf)),
        y=resid_acf,
        marker_color="darkorange",
        showlegend=False
    ),
    row=2, col=1
)
fig.add_hline(y=conf_bound, line_dash="dash", line_color="red", row=2, col=1)
fig.add_hline(y=-conf_bound, line_dash="dash", line_color="red", row=2, col=1)

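# 4. Q-Q plot: sorted standardized residuals vs. standard normal quantiles
from scipy.stats import norm

sorted_resid = np.sort(standardized_resid)
n = len(sorted_resid)
# Exact normal quantiles at plotting positions (i + 0.5) / n, instead of percentiles of random draws
theoretical_quantiles = norm.ppf((np.arange(n) + 0.5) / n)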
fig.add_trace(
    go.Scatter(
        x=theoretical_quantiles,
        y=sorted_resid,
        mode="markers",
        marker=dict(color="steelblue", size=5),
        showlegend=False
    ),
    row=2, col=2
)
# Add reference line
qq_min, qq_max = theoretical_quantiles.min(), theoretical_quantiles.max()
fig.add_trace(
    go.Scatter(
        x=[qq_min, qq_max],
        y=[qq_min, qq_max],
        mode="lines",
        line=dict(color="red", dash="dash"),
        showlegend=False
    ),
    row=2, col=2
)

fig.update_layout(
    title="ARIMA(2,1,0) Residual Diagnostics",
    height=600,
    showlegend=False
)
fig.update_xaxes(title_text="Time", row=1, col=1)
fig.update_yaxes(title_text="Residual", row=1, col=1)
fig.update_xaxes(title_text="Residual Value", row=1, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=2)
fig.update_xaxes(title_text="Lag", row=2, col=1)
fig.update_yaxes(title_text="ACF", row=2, col=1)
fig.update_xaxes(title_text="Theoretical Quantiles", row=2, col=2)
fig.update_yaxes(title_text="Sample Quantiles", row=2, col=2)

fig.show()

# Ljung-Box test statistic (simplified)
def ljung_box_stat(residuals: np.ndarray, max_lag: int) -> float:
    """Compute Ljung-Box Q statistic for autocorrelation test."""
    n = len(residuals)
    acf = compute_acf(residuals, max_lag)
    Q = n * (n + 2) * np.sum([acf[k]**2 / (n - k) for k in range(1, max_lag + 1)])
    return Q

Q_stat = ljung_box_stat(residuals, 10)
print(f"\nLjung-Box Q statistic (lag 10): {Q_stat:.2f}")
print("Large Q values suggest residual autocorrelation (model misspecification)")

In [None]:
# Compute fitted values on original scale using our NumPy implementation
y_diff = difference(y_values, d=1)
phi, c, _ = fit_ar_ols(y_diff, p=2)
fitted_diff = ar_fitted_values(y_diff, phi, c)

# Convert fitted values back to original scale
# For d=1: y_t = y_{t-1} + diff_t
# Fitted values start at index p=2 in differenced series
# Which corresponds to index p+d=3 in original series
fitted_original = y_values[2:-1] + fitted_diff  # Add previous actual to get fitted

# Integer time index (0..n-1) for the x-axis
time_index = np.arange(len(y_values))

fig = go.Figure()

# Actual series
fig.add_trace(go.Scatter(
    x=time_index,
    y=y_values,
    name="Actual",
    mode="lines",
    line=dict(color="steelblue", width=2)
))

# Fitted values (starts from index 3 due to d=1 and p=2)
fitted_start_idx = 3
fig.add_trace(go.Scatter(
    x=time_index[fitted_start_idx:fitted_start_idx + len(fitted_original)],
    y=fitted_original,
    name="Fitted (NumPy)",
    mode="lines",
    line=dict(color="crimson", width=2, dash="dash")
))

fig.update_layout(
    title="ARIMA(2,1,0) Fitted Values vs Actual - NumPy Implementation",
    xaxis_title="Time Index",
    yaxis_title="Passengers",
    legend=dict(x=0.02, y=0.98),
    height=450
)

fig.show()

# Calculate in-sample metrics
mse = np.mean((y_values[fitted_start_idx:fitted_start_idx + len(fitted_original)] - fitted_original) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_values[fitted_start_idx:fitted_start_idx + len(fitted_original)] - fitted_original))
print(f"In-sample metrics (NumPy implementation):")
print(f"  RMSE: {rmse:.2f}")
print(f"  MAE:  {mae:.2f}")

In [None]:
def compute_acf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """Compute autocorrelation function up to max_lag."""
    n = len(y)
    y_centered = y - np.mean(y)
    var = np.var(y)
    
    acf_values = np.zeros(max_lag + 1)
    for k in range(max_lag + 1):
        if k == 0:
            acf_values[k] = 1.0
        else:
            acf_values[k] = np.sum(y_centered[k:] * y_centered[:-k]) / (n * var)
    return acf_values


def compute_pacf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """
    Compute partial autocorrelation using Durbin-Levinson recursion.
    PACF(k) is the correlation between y_t and y_{t-k} after removing 
    the effect of intermediate lags.
    """
    acf = compute_acf(y, max_lag)
    pacf_values = np.zeros(max_lag + 1)
    pacf_values[0] = 1.0
    
    if max_lag == 0:
        return pacf_values
    
    # Durbin-Levinson algorithm
    phi = np.zeros((max_lag + 1, max_lag + 1))
    phi[1, 1] = acf[1]
    pacf_values[1] = acf[1]
    
    for k in range(2, max_lag + 1):
        # Compute phi[k,k]
        num = acf[k] - np.sum(phi[k-1, 1:k] * acf[k-1:0:-1])
        den = 1 - np.sum(phi[k-1, 1:k] * acf[1:k])
        
        if abs(den) < 1e-10:
            phi[k, k] = 0
        else:
            phi[k, k] = num / den
        
        # Update other coefficients
        for j in range(1, k):
            phi[k, j] = phi[k-1, j] - phi[k, k] * phi[k-1, k-j]
        
        pacf_values[k] = phi[k, k]
    
    return pacf_values


# Compute ACF/PACF for the differenced series
y_diff = difference(y_values, d=1)
max_lag = 24
acf_vals = compute_acf(y_diff, max_lag)
pacf_vals = compute_pacf(y_diff, max_lag)

# Confidence bounds (approximate 95% CI)
n = len(y_diff)
conf_bound = 1.96 / np.sqrt(n)

# Create ACF/PACF plots with Plotly
fig = make_subplots(rows=1, cols=2, subplot_titles=("ACF (Autocorrelation)", "PACF (Partial Autocorrelation)"))

lags = np.arange(max_lag + 1)

# ACF plot
fig.add_trace(
    go.Bar(x=lags, y=acf_vals, name="ACF", marker_color="steelblue", showlegend=False),
    row=1, col=1
)
fig.add_hline(y=conf_bound, line_dash="dash", line_color="red", row=1, col=1)
fig.add_hline(y=-conf_bound, line_dash="dash", line_color="red", row=1, col=1)
fig.add_hline(y=0, line_color="black", row=1, col=1)

# PACF plot
fig.add_trace(
    go.Bar(x=lags, y=pacf_vals, name="PACF", marker_color="darkorange", showlegend=False),
    row=1, col=2
)
fig.add_hline(y=conf_bound, line_dash="dash", line_color="red", row=1, col=2)
fig.add_hline(y=-conf_bound, line_dash="dash", line_color="red", row=1, col=2)
fig.add_hline(y=0, line_color="black", row=1, col=2)

fig.update_layout(
    title="ACF and PACF of Differenced Airline Series",
    height=400,
    showlegend=False
)
fig.update_xaxes(title_text="Lag", row=1, col=1)
fig.update_xaxes(title_text="Lag", row=1, col=2)
fig.update_yaxes(title_text="Correlation", row=1, col=1)

fig.show()

print("Interpretation:")
print("- Spikes at lags 12 and 24 point to yearly seasonality (monthly data)")
print("- A PACF that cuts off after a few lags while the ACF decays points to an AR component")
print("- Red dashed lines = 95% confidence bounds")

In [None]:
def arima_forecast(y: np.ndarray, p: int, d: int, q: int, h: int) -> dict:
    """
    Full ARIMA(p, d, q) forecasting pipeline.
    
    Note: This simplified implementation only handles the AR component (q=0).
    For full MA estimation, iterative methods like MLE are typically used.
    
    Pipeline:
    1. Apply d-order differencing
    2. Fit AR(p) model via OLS
    3. Generate h-step forecasts on differenced series
    4. Inverse difference to get forecasts on original scale
    
    Parameters
    ----------
    y : np.ndarray
        Original time series
    p : int
        AR order
    d : int
        Differencing order
    q : int
        MA order (not implemented, must be 0)
    h : int
        Forecast horizon
        
    Returns
    -------
    dict with keys:
        'forecast': h-step ahead forecasts in original scale
        'forecast_diff': forecasts on differenced scale
        'phi': AR coefficients
        'c': intercept
        'sigma2': noise variance
        'roots': characteristic polynomial roots
        'is_stationary': stationarity indicator
    """
    if q != 0:
        print("Warning: MA component not implemented. Using AR(p) only.")
    
    # Step 1: Difference the series
    y_diff = difference(y, d) if d > 0 else y.copy()
    
    # Step 2: Fit AR(p) model
    phi, c, sigma2 = fit_ar_ols(y_diff, p)
    
    # Step 3: Check stationarity
    roots, is_stationary = check_stationarity(phi)
    
    # Step 4: Forecast on differenced scale
    forecast_diff = ar_predict(y_diff, phi, c, h)
    
    # Step 5: Inverse difference to original scale
    if d > 0:
        forecast = inverse_difference(forecast_diff, y, d)
    else:
        forecast = forecast_diff
    
    return {
        'forecast': forecast,
        'forecast_diff': forecast_diff,
        'phi': phi,
        'c': c,
        'sigma2': sigma2,
        'roots': roots,
        'is_stationary': is_stationary
    }


# Apply our ARIMA implementation
result = arima_forecast(y_values, p=2, d=1, q=0, h=24)

print("=" * 50)
print("ARIMA(2, 1, 0) Results - NumPy Implementation")
print("=" * 50)
print(f"\nModel Parameters:")
print(f"  c (intercept):  {result['c']:.4f}")
print(f"  φ₁:             {result['phi'][0]:.4f}")
print(f"  φ₂:             {result['phi'][1]:.4f}")
print(f"  σ²:             {result['sigma2']:.4f}")
print(f"\nStationarity: {'✓ Yes' if result['is_stationary'] else '✗ No'}")
print(f"\nForecasts (first 6 of {len(result['forecast'])}):")
for i, f in enumerate(result['forecast'][:6]):
    print(f"  t+{i+1}: {f:.1f}")

In [None]:
def ar_predict(y: np.ndarray, phi: np.ndarray, c: float, h: int) -> np.ndarray:
    """
    Generate h-step ahead forecasts using AR model.
    
    Recursive forecasting:
    ŷ_{t+h} = c + φ₁ŷ_{t+h-1} + φ₂ŷ_{t+h-2} + ... + φₚŷ_{t+h-p}
    
    Parameters
    ----------
    y : np.ndarray
        Historical time series (at least p observations)
    phi : np.ndarray
        AR coefficients [φ₁, φ₂, ..., φₚ]
    c : float
        Intercept term
    h : int
        Forecast horizon
        
    Returns
    -------
    forecast : np.ndarray
        h-step ahead forecasts
    """
    p = len(phi)
    
    # Initialize with last p observations
    history = list(y[-p:])
    forecasts = []
    
    for _ in range(h):
        # y_hat = c + φ₁*y_{t-1} + φ₂*y_{t-2} + ...
        y_hat = c + np.dot(phi, history[-p:][::-1])
        forecasts.append(y_hat)
        history.append(y_hat)
    
    return np.array(forecasts)


def ar_fitted_values(y: np.ndarray, phi: np.ndarray, c: float) -> np.ndarray:
    """
    Compute in-sample fitted values for AR model.
    
    Parameters
    ----------
    y : np.ndarray
        Time series data
    phi : np.ndarray
        AR coefficients
    c : float
        Intercept
        
    Returns
    -------
    fitted : np.ndarray
        Fitted values (length = len(y) - p)
    """
    p = len(phi)
    n = len(y)
    fitted = np.zeros(n - p)
    
    for t in range(p, n):
        # Get lagged values [y_{t-1}, y_{t-2}, ..., y_{t-p}]
        lagged = y[t-p:t][::-1]
        fitted[t - p] = c + np.dot(phi, lagged)
    
    return fitted


# Generate forecasts on differenced series
h = 24  # Forecast horizon
forecast_diff = ar_predict(y_diff, phi, c, h)
print(f"AR(2) forecasts on differenced series (next {h} periods):")
print(f"  First 5: {forecast_diff[:5].round(2)}")

# Compute fitted values
fitted_diff = ar_fitted_values(y_diff, phi, c)
print(f"\nIn-sample fitted values: {len(fitted_diff)} observations")

In [None]:
def check_stationarity(phi: np.ndarray) -> tuple[np.ndarray, bool]:
    """
    Check if AR process is stationary by examining characteristic polynomial roots.
    
    Characteristic polynomial: Φ(z) = 1 - φ₁z - φ₂z² - ... - φₚzᵖ
    Process is stationary if all roots have |z| > 1
    
    Parameters
    ----------
    phi : np.ndarray
        AR coefficients [φ₁, φ₂, ..., φₚ]
        
    Returns
    -------
    roots : np.ndarray
        Roots of the characteristic polynomial
    is_stationary : bool
        True if all roots lie outside unit circle
    """
    # Polynomial coefficients: [1, -φ₁, -φ₂, ..., -φₚ]
    # np.roots expects highest degree first, so we reverse
    poly_coeffs = np.concatenate([[1], -phi])[::-1]
    roots = np.roots(poly_coeffs)
    
    is_stationary = np.all(np.abs(roots) > 1)
    
    return roots, is_stationary


# Check stationarity of our fitted AR(2) model
roots, is_stationary = check_stationarity(phi)

print("Characteristic Polynomial Analysis:")
print(f"  Φ(z) = 1 - ({phi[0]:.4f})z - ({phi[1]:.4f})z²")
print(f"\nRoots of Φ(z) = 0:")
for i, root in enumerate(roots):
    print(f"  z_{i+1} = {root:.4f}, |z_{i+1}| = {np.abs(root):.4f}")
print(f"\nStationarity: {'✓ STATIONARY' if is_stationary else '✗ NON-STATIONARY'}")
print("  (All roots must have |z| > 1 for stationarity)")

In [None]:
def fit_ar_ols(y: np.ndarray, p: int) -> tuple[np.ndarray, float, float]:
    """
    Fit an AR(p) model using Ordinary Least Squares.
    
    Model: y_t = c + φ₁y_{t-1} + φ₂y_{t-2} + ... + φₚy_{t-p} + ε_t
    
    OLS solution: φ̂ = (X'X)⁻¹X'y
    
    Parameters
    ----------
    y : np.ndarray
        Time series data
    p : int
        AR order (number of lags)
        
    Returns
    -------
    phi : np.ndarray
        AR coefficients [φ₁, φ₂, ..., φₚ]
    c : float
        Intercept term
    sigma2 : float
        Estimated noise variance
    """
    n = len(y)
    
    # Build design matrix X with lagged values
    # Each row: [1, y_{t-1}, y_{t-2}, ..., y_{t-p}]
    X = np.zeros((n - p, p + 1))
    X[:, 0] = 1  # Intercept column
    
    for i in range(p):
        X[:, i + 1] = y[p - 1 - i:n - 1 - i]
    
    # Target vector: [y_p, y_{p+1}, ..., y_{n-1}]
    y_target = y[p:]
    
    # OLS solution: β̂ = (X'X)⁻¹X'y
    XtX = X.T @ X
    Xty = X.T @ y_target
    beta = np.linalg.solve(XtX, Xty)  # More stable than inv(X'X) @ X'y
    
    c = beta[0]        # Intercept
    phi = beta[1:]     # AR coefficients
    
    # Estimate residual variance
    y_fitted = X @ beta
    residuals = y_target - y_fitted
    sigma2 = np.var(residuals, ddof=p + 1)  # Adjust for estimated parameters
    
    return phi, c, sigma2


# Fit AR(2) model on differenced airline data
y_diff = difference(y_values, d=1)
phi, c, sigma2 = fit_ar_ols(y_diff, p=2)

print("AR(2) Model on Differenced Series:")
print(f"  Intercept (c):    {c:.4f}")
print(f"  AR coefficients:  φ₁ = {phi[0]:.4f}, φ₂ = {phi[1]:.4f}")
print(f"  Noise variance:   σ² = {sigma2:.4f}")
print(f"\nModel equation:")
print(f"  ∇y_t = {c:.3f} + {phi[0]:.3f}·∇y_{{t-1}} + {phi[1]:.3f}·∇y_{{t-2}} + ε_t")

In [None]:
def difference(y: np.ndarray, d: int = 1) -> np.ndarray:
    """
    Apply d-th order differencing to a time series.
    
    Mathematical definition:
    ∇¹y_t = y_t - y_{t-1}
    ∇ᵈy_t = (1 - B)^d y_t
    
    Parameters
    ----------
    y : np.ndarray
        Original time series
    d : int
        Order of differencing
        
    Returns
    -------
    np.ndarray
        Differenced series (length = len(y) - d)
    """
    y_diff = y.copy()
    for _ in range(d):
        y_diff = np.diff(y_diff)
    return y_diff


def inverse_difference(y_diff: np.ndarray, y_orig: np.ndarray, d: int = 1) -> np.ndarray:
    """
    Reverse d-th order differencing for values that continue y_orig.
    
    For d=1: y_t = y_{t-1} + ∇y_t, so forecasts are y_n + cumsum(∇ŷ).
    For d>1 the integration is repeated d times, each time anchored on the
    last observed value of the correspondingly differenced original series.
    
    Parameters
    ----------
    y_diff : np.ndarray
        Differenced forecast values (continuing after the end of y_orig)
    y_orig : np.ndarray
        Original series (its last d observations supply the anchors)
    d : int
        Order of differencing used
        
    Returns
    -------
    np.ndarray
        Reconstructed values in the original scale
    """
    y_reconstructed = np.asarray(y_diff, dtype=float)
    
    # Integrate from the highest differencing level down to the original scale
    for level in range(d - 1, -1, -1):
        anchor = difference(y_orig, level)[-1]  # last observed value at this level
        y_reconstructed = anchor + np.cumsum(y_reconstructed)
    
    return y_reconstructed


# Demonstrate differencing
print("Original series (first 10):", y_values[:10])
y_diff1 = difference(y_values, d=1)
print("First difference (first 10):", y_diff1[:10])
y_diff2 = difference(y_values, d=2)
print("Second difference (first 10):", y_diff2[:10])

# Verify inverse differencing
y_forecast_diff = np.array([5, 10, 15])  # Example differenced forecast
y_reconstructed = inverse_difference(y_forecast_diff, y_values, d=1)
print(f"\nInverse differencing demo:")
print(f"Last original value: {y_values[-1]}")
print(f"Differenced forecast: {y_forecast_diff}")
print(f"Reconstructed: {y_reconstructed}")

---

## Practical Application with sktime

Now that we understand the theory and the NumPy implementation, let's use sktime's `ARIMA` forecaster, which wraps pmdarima and estimates both AR and MA parameters by maximum likelihood.

### Make the series more stationary (log + difference)
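Taking logs turns multiplicative growth (and seasonal swings that scale with the level) into additive changes, and a log difference $\log y_t - \log y_{t-1}$ is approximately the period-over-period growth rate. Forecasts made on this scale are mapped back by cumulating and exponentiating from the last observed level. A small sketch on the first four airline values:

```python
import numpy as np

levels = np.array([112.0, 118.0, 132.0, 129.0])  # first four airline observations
log_diff = np.diff(np.log(levels))

# Each log difference is close to the simple growth rate y_t / y_{t-1} - 1
print("log differences:", log_diff)
print("growth rates:   ", levels[1:] / levels[:-1] - 1)

# Inverse transform: cumulative sum on the log scale, exponentiated from an anchor
rebuilt = levels[0] * np.exp(np.cumsum(log_diff))
print("rebuilt levels: ", rebuilt)
```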

In [None]:
y_log = np.log(y)
y_diff = y_log.diff().dropna()

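# Convert the PeriodIndex to timestamps for Plotly, as the later plotting cells do
fig = px.line(x=y_diff.index.to_timestamp(), y=y_diff.values, title="Log-differenced series")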
fig

### ACF/PACF to guide p and q

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(y_diff, ax=axes[0], lags=36)
plot_pacf(y_diff, ax=axes[1], lags=36, method="ywm")
plt.tight_layout()


### Fit the model with sktime

In [None]:
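from sktime.forecasting.arima import ARIMA  # interface to pmdarima's ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.performance_metrics.forecasting import mean_absolute_error

try:  # recent sktime versions
    from sktime.split import temporal_train_test_split
except ImportError:  # older sktime versions
    from sktime.forecasting.model_selection import temporal_train_test_split

model = ARIMA(order=(1, 1, 1))
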
y_train, y_test = temporal_train_test_split(y, test_size=24)
fh = ForecastingHorizon(y_test.index, is_relative=False)

model.fit(y_train)
pred = model.predict(fh)

mae = mean_absolute_error(y_test, pred)
print(f"MAE: {mae:.3f}")



### Forecast plot

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=y_train.index.to_timestamp(), y=y_train, name="Train"))
fig.add_trace(go.Scatter(x=y_test.index.to_timestamp(), y=y_test, name="Test"))
fig.add_trace(go.Scatter(x=pred.index.to_timestamp(), y=pred, name="Forecast"))
fig.update_layout(title="ARIMA forecast vs actual")
fig

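### Diagnostics

Check residuals for autocorrelation and non-normality. A well-specified ARIMA model leaves **white noise** in-sample residuals. The cell below plots the 24 out-of-sample forecast errors (`y_test - pred`) rather than model residuals. Multi-step errors from a single forecast origin are autocorrelated by construction, so they say more about forecast bias than about model adequacy.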


In [None]:
resid = y_test - pred
fig = px.histogram(x=resid.values, nbins=10, title="Forecast error distribution (test set)")
fig

In [None]:
# Generate forecasts using our NumPy implementation on the same train/test split
y_train_np = y_values[:len(y_train)]
y_test_np = y_values[len(y_train):]

# Our ARIMA(2,1,0) forecast
numpy_result = arima_forecast(y_train_np, p=2, d=1, q=0, h=len(y_test))
numpy_forecast = numpy_result['forecast']

# Compare with sktime forecast
fig = go.Figure()

# Training data
train_dates = y_train.index.to_timestamp() if hasattr(y_train.index, 'to_timestamp') else np.arange(len(y_train))
test_dates = y_test.index.to_timestamp() if hasattr(y_test.index, 'to_timestamp') else np.arange(len(y_train), len(y_train) + len(y_test))

fig.add_trace(go.Scatter(
    x=train_dates,
    y=y_train.values,
    name="Training Data",
    line=dict(color="steelblue", width=2)
))

# Actual test data
fig.add_trace(go.Scatter(
    x=test_dates,
    y=y_test.values,
    name="Actual (Test)",
    line=dict(color="gray", width=2)
))

# sktime forecast
fig.add_trace(go.Scatter(
    x=test_dates,
    y=pred.values,
    name="sktime ARIMA(1,1,1)",
    line=dict(color="crimson", width=2, dash="dash")
))

# NumPy forecast
fig.add_trace(go.Scatter(
    x=test_dates,
    y=numpy_forecast,
    name="NumPy ARIMA(2,1,0)",
    line=dict(color="green", width=2, dash="dot")
))

fig.update_layout(
    title="Forecast Comparison: NumPy Implementation vs sktime",
    xaxis_title="Date",
    yaxis_title="Passengers",
    legend=dict(x=0.02, y=0.98),
    height=500
)

fig.show()

# Compare MAE
numpy_mae = np.mean(np.abs(y_test_np - numpy_forecast))
sktime_mae = mean_absolute_error(y_test, pred)

print("Forecast Comparison:")
print(f"  sktime ARIMA(1,1,1) MAE:  {sktime_mae:.2f}")
print(f"  NumPy ARIMA(2,1,0) MAE:   {numpy_mae:.2f}")
print("\nNote: sktime uses MLE fitting and includes MA component,")
print("      while our NumPy version uses OLS with AR only.")

### NumPy vs sktime Comparison

Let's compare our from-scratch implementation with sktime's optimized ARIMA.

---

## Summary & When to Use ARIMA

### Key Takeaways

| Component | Formula | Purpose |
|-----------|---------|---------|
| **AR(p)** | $y_t = c + \sum_{i=1}^p \phi_i y_{t-i} + \epsilon_t$ | Capture autocorrelation |
| **I(d)** | $\nabla^d y_t = (1-B)^d y_t$ | Achieve stationarity |
| **MA(q)** | $y_t = c + \epsilon_t + \sum_{j=1}^q \theta_j \epsilon_{t-j}$ | Model shock persistence |

### When to Use ARIMA

✅ **Good for:**
- Univariate time series forecasting
- Series with trend (use differencing)
- Seasonal series, via the seasonal extension SARIMA
- When interpretability is important

⚠️ **Limitations:**
- Assumes the series is stationary after differencing
- Linear relationships only
- Univariate (exogenous regressors require ARIMAX)
- Manual order selection can be tricky

### Order Selection Tips

1. **Visual inspection**: Plot ACF/PACF of differenced series
2. **Automated**: Use information criteria (AIC, BIC), e.g. via `pmdarima.auto_arima` or sktime's `AutoARIMA` wrapper
3. **Validation**: Always check residual diagnostics
4. **Parsimony**: Prefer simpler models when performance is similar
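The sketch below shows information-criterion order selection for the AR-only case. It assumes Gaussian errors so that, up to additive constants, $\text{AIC} = n \log \hat\sigma^2 + 2k$ and $\text{BIC} = n \log \hat\sigma^2 + k \log n$, where $\hat\sigma^2$ is the maximum-likelihood (divide-by-$n$) residual variance and $k$ the number of estimated coefficients. Every candidate is fit on the same rows so the criteria are comparable. It runs on a simulated AR(2) with example parameters. Tools such as `auto_arima` also search over $d$ and $q$ and fit each candidate by maximum likelihood. The helper name `ar_ic` is hypothetical.

```python
import numpy as np


def ar_ic(y, max_p):
    """Gaussian AIC/BIC for AR(1..max_p) fitted by OLS on a common sample."""
    y = np.asarray(y, dtype=float)
    n_eff = len(y) - max_p  # same rows for every p, so the criteria are comparable
    target = y[max_p:]
    results = {}
    for p in range(1, max_p + 1):
        X = np.column_stack(
            [np.ones(n_eff)] + [y[max_p - i : len(y) - i] for i in range(1, p + 1)]
        )
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        sigma2_ml = np.mean((target - X @ beta) ** 2)
        k = p + 1  # AR coefficients + intercept
        aic = n_eff * np.log(sigma2_ml) + 2 * k
        bic = n_eff * np.log(sigma2_ml) + k * np.log(n_eff)
        results[p] = (aic, bic)
    return results


# Simulated AR(2) with example coefficients phi = (0.5, 0.3)
rng = np.random.default_rng(42)
y_ar2 = np.zeros(5000)
shocks = rng.standard_normal(5000)
for t in range(2, 5000):
    y_ar2[t] = 0.5 * y_ar2[t - 1] + 0.3 * y_ar2[t - 2] + shocks[t]

ics = ar_ic(y_ar2, max_p=6)
best_aic = min(ics, key=lambda p: ics[p][0])
best_bic = min(ics, key=lambda p: ics[p][1])
print("AIC picks p =", best_aic, "| BIC picks p =", best_bic)
```

BIC's heavier penalty makes it lean toward the smaller model, in line with the parsimony tip above.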