# ARIMA

The Autoregressive Integrated Moving Average (ARIMA) model combines three concepts: autoregression (AR), "integrated" (I), and moving average (MA). We look at each of these briefly before bringing them together to form ARIMA.

# Autoregressive Models

Autoregression is a time series forecasting method in which the next value is forecast as a linear combination of previous values (lags) of the same series. The "auto" comes from the fact that we regress the series against itself. An autoregressive model uses a finite number, $p$, of lags; $p$ is known as the *order*, and the model is denoted $AR(p)$. An autoregressive model can thus be expressed as
\begin{equation}
	y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t,
\end{equation}
where $y_t$ is the value of the time series at time $t$, $c$ is a constant, $\phi_i$ are the lag coefficients, $\epsilon_t$ is a white noise process, and $p$ is the number of lags included in the model.
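
To make the recursion concrete, here is a minimal sketch that simulates an $AR(p)$ process directly from the equation above. The function name `simulate_ar`, the Gaussian noise, the zero initial values, and the burn-in length are illustrative assumptions, not part of any library.

```python
import numpy as np

def simulate_ar(phi, c=0.0, n=500, sigma=1.0, burn_in=200, seed=0):
    """Simulate y_t = c + sum_i phi_i * y_{t-i} + eps_t (hypothetical helper).

    Assumes Gaussian white noise and zero initial values; the first
    `burn_in` samples are discarded to reduce the effect of that start.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, size=total)
    y = np.zeros(total + p)  # the first p entries are the zero initial values
    for t in range(p, total + p):
        # y[t - p:t][::-1] is (y_{t-1}, ..., y_{t-p}), matching (phi_1, ..., phi_p)
        y[t] = c + phi @ y[t - p:t][::-1] + eps[t - p]
    return y[p + burn_in:]

# An AR(2) example with c = 1, phi_1 = 0.6, phi_2 = -0.3
y = simulate_ar([0.6, -0.3], c=1.0, n=1000)
print(len(y), y.mean())
```

For a stationary choice of coefficients, the sample mean should be close to $c / (1 - \phi_1 - \dots - \phi_p)$, which is $1/0.7$ in this example.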

By considering certain special cases for the coefficients $\phi_i$, we can verify that the autoregressive model reproduces some standard time series processes:
- If $\phi_i = 0\ \forall\ i$, then $y_t$ is a white-noise process (shifted by the constant $c$).
- If $\phi_1 = 1$, $\phi_i = 0\ \forall\ i \neq 1$ and $c=0$, then $y_t$ is a random walk.
- If $\phi_1 = 1$, $\phi_i = 0\ \forall\ i \neq 1$  and $c\neq0$, then $y_t$ is a random walk with drift.
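
The special cases listed above can be checked numerically. The sketch below iterates the $AR(1)$ recursion with a hypothetical helper `ar1`, assuming $y_0 = 0$ and Gaussian noise, and compares the result with the closed forms: the noise itself, its cumulative sum, and the cumulative sum plus a linear drift.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(size=300)

def ar1(phi1, c, eps):
    """Iterate y_t = c + phi1 * y_{t-1} + eps_t, starting from y_0 = 0."""
    y = np.empty(len(eps))
    prev = 0.0
    for t, e in enumerate(eps):
        prev = c + phi1 * prev + e
        y[t] = prev
    return y

white_noise = ar1(0.0, 0.0, eps)   # all phi_i = 0, c = 0
random_walk = ar1(1.0, 0.0, eps)   # phi_1 = 1, c = 0
drift_walk = ar1(1.0, 0.5, eps)    # phi_1 = 1, c = 0.5

steps = np.arange(1, len(eps) + 1)
print(np.allclose(white_noise, eps))
print(np.allclose(random_walk, np.cumsum(eps)))
print(np.allclose(drift_walk, np.cumsum(eps) + 0.5 * steps))
```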

### Weak Sense Stationarity Condition

Some constraints must be imposed on the values of $\phi_i$ for the model to be weak-sense stationary (and causal). For example, no $AR(1)$ process with $\lvert \phi_1 \rvert \geq 1$ satisfies this. The precise requirement is that every complex root $z_k$ of the polynomial $1-\sum_{i=1}^p\phi_i z^i$ must satisfy $\lvert z_k \rvert > 1$, that is, all roots must lie outside the unit circle.
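
The root condition is easy to check numerically. The following sketch uses `numpy.roots` on the polynomial $1-\sum_{i=1}^p\phi_i z^i$; the function name `is_causal_stationary` is a hypothetical helper, and coefficients that sit exactly on the boundary may be affected by floating-point error.

```python
import numpy as np

def is_causal_stationary(phi):
    """True if every root of 1 - phi_1 z - ... - phi_p z^p has modulus > 1."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients ordered from the highest power down to
    # the constant term: [-phi_p, ..., -phi_1, 1].
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_causal_stationary([0.5]))        # AR(1) with |phi_1| < 1
print(is_causal_stationary([1.0]))        # random walk
print(is_causal_stationary([0.6, -0.3]))  # AR(2) with complex roots
print(is_causal_stationary([0.5, 0.6]))   # AR(2) with phi_1 + phi_2 > 1
```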

For $AR(1)$ and $AR(2)$ processes, this requirement can be written down fairly succinctly.
- For an $AR(1)$ process, this requirement corresponds to
\begin{equation}
\lvert \phi_1 \rvert < 1.
\end{equation}
- For an $AR(2)$ process, the requirement is equivalent to
\begin{equation}
\phi_1 + \phi_2 < 1,\ \phi_2 - \phi_1 < 1, \qquad \textrm{and} \qquad \lvert\phi_2\rvert < 1.
\end{equation}

See p. 89 of *Time Series Analysis and Its Applications: With R Examples* (3rd ed.) by Robert H. Shumway and David S. Stoffer for details.
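
As a sanity check, the $AR(2)$ inequalities can be compared with the general root condition on randomly drawn coefficients. This is only a sketch: the helper names and the sampling ranges are assumptions chosen for illustration, and points lying extremely close to the boundary could in principle disagree because of rounding.

```python
import numpy as np

def ar2_triangle(phi1, phi2):
    """The three AR(2) inequalities."""
    return bool((phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1))

def ar2_roots_outside(phi1, phi2):
    """Root condition for 1 - phi1 z - phi2 z^2."""
    roots = np.roots([-phi2, -phi1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))

rng = np.random.default_rng(0)
# Draw (phi1, phi2) uniformly from a box that contains the stationarity region.
samples = rng.uniform([-2.5, -1.5], [2.5, 1.5], size=(2000, 2))
disagreements = sum(
    ar2_triangle(a, b) != ar2_roots_outside(a, b) for a, b in samples
)
print("disagreements:", disagreements)
```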

### The Requirements for Autoregression

Autoregressive models assume that the underlying time series is stationary. The statistical properties of a non-stationary time series, such as its mean and variance, change over time. Applying an AR model to non-stationary data can therefore lead to inaccurate predictions.
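
To illustrate what "changing statistical properties" means, the sketch below simulates many independent paths of a random walk and of a stationary $AR(1)$ process with $\phi_1 = 0.5$, then compares the variance across paths at two time points. The number of paths, the series length, and the Gaussian noise are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, n_steps = 2000, 400
eps = rng.normal(size=(n_paths, n_steps))

# Random walk (phi_1 = 1): each path is the cumulative sum of its noise.
random_walks = np.cumsum(eps, axis=1)

# Stationary AR(1) with phi_1 = 0.5, built with the same noise.
ar1_paths = np.zeros((n_paths, n_steps))
ar1_paths[:, 0] = eps[:, 0]
for t in range(1, n_steps):
    ar1_paths[:, t] = 0.5 * ar1_paths[:, t - 1] + eps[:, t]

# Variance across paths at the 100th and 400th time steps.
variances = {}
for name, paths in [("random walk", random_walks), ("AR(1)", ar1_paths)]:
    variances[name] = (paths[:, 99].var(), paths[:, 399].var())
    early, late = variances[name]
    print(f"{name}: variance at step 100 = {early:.2f}, at step 400 = {late:.2f}")
```

The variance of the random walk grows roughly in proportion to the number of steps, whereas the variance of the stationary process settles near $1/(1-\phi_1^2)$.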

### Order Selection

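Since the partial autocorrelation of an $AR(p)$ process is zero at lags larger than $p$, an appropriate order $p$ is the largest lag at which the sample partial autocorrelation is still significantly different from zero; beyond that lag, the partial autocorrelations should all be approximately zero.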
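
One way to estimate the partial autocorrelation at lag $k$ is to fit an $AR(k)$ model by least squares and take its last coefficient. The sketch below does this with NumPy for a simulated $AR(2)$ series; `pacf_ols` is a hypothetical helper, and the $\pm 1.96/\sqrt{n}$ band is the usual large-sample approximation for judging whether a value is distinguishable from zero.

```python
import numpy as np

def pacf_ols(y, max_lag):
    """Sample PACF: the lag-k value is the last coefficient of an OLS-fitted AR(k)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    out = []
    for k in range(1, max_lag + 1):
        # Design matrix columns: intercept, y_{t-1}, ..., y_{t-k}
        lags = [y[k - i:n - i] for i in range(1, k + 1)]
        X = np.column_stack([np.ones(n - k)] + lags)
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Simulate an AR(2) series with phi_1 = 0.6, phi_2 = -0.3 (200 burn-in steps).
rng = np.random.default_rng(7)
n = 2000
eps = rng.normal(size=n + 200)
y = np.zeros(n + 200)
for t in range(2, n + 200):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]
y = y[200:]

pacf = pacf_ols(y, max_lag=6)
band = 1.96 / np.sqrt(len(y))
print("PACF:", np.round(pacf, 3))
print("approximate 95% band:", round(band, 3))
print("lags outside the band:", np.flatnonzero(np.abs(pacf) > band) + 1)
```

For this series, the values at lags 1 and 2 should stand clearly outside the band, while higher lags should mostly fall inside it; an occasional higher lag may still cross the band by chance.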

### Calculation of the AR Parameters

Several methods are used to estimate the coefficients; among the most common are ordinary least squares (OLS) and the Yule-Walker equations. In particular, the `statsmodels` [AutoReg model uses OLS](https://www.statsmodels.org/dev/examples/notebooks/generated/autoregressions.html).
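
Both estimators can be sketched in a few lines of NumPy. The functions `fit_ar_ols` and `fit_ar_yule_walker` below are hypothetical helpers written for illustration; they are not the `statsmodels` implementation. The Yule-Walker version assumes the biased sample autocovariance (dividing by $n$).

```python
import numpy as np

def fit_ar_ols(y, p):
    """Regress y_t on an intercept and its first p lags; returns (c, phi)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lags = [y[p - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

def fit_ar_yule_walker(y, p):
    """Solve the Yule-Walker equations R phi = gamma[1:]; returns (c, phi)."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    n = len(z)
    # Biased sample autocovariances gamma_0, ..., gamma_p
    gamma = np.array([z[:n - k] @ z[k:] / n for k in range(p + 1)])
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gamma[1:])
    c = y.mean() * (1.0 - phi.sum())
    return c, phi

# Simulated AR(2) data with c = 1, phi_1 = 0.6, phi_2 = -0.3 (200 burn-in steps).
rng = np.random.default_rng(11)
n = 5000
eps = rng.normal(size=n + 200)
y = np.zeros(n + 200)
for t in range(2, n + 200):
    y[t] = 1.0 + 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]
y = y[200:]

c_ols, phi_ols = fit_ar_ols(y, p=2)
c_yw, phi_yw = fit_ar_yule_walker(y, p=2)
print("OLS:        ", round(c_ols, 3), np.round(phi_ols, 3))
print("Yule-Walker:", round(c_yw, 3), np.round(phi_yw, 3))
```

On a long stationary series, the two estimates should be close to each other and to the true coefficients; they can differ more noticeably on short series.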

# Order of Integration

In statistics, the [order of integration](https://en.wikipedia.org/wiki/Order_of_integration), denoted $I(d)$, of a time series is a summary statistic that reports the minimum number of differences required to obtain a stationary time series. Differencing removes trends from a time series and tends to make the mean constant. Differencing can be applied several times, but often the series is sufficiently stationary after a single differencing step.
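
Differencing is a one-liner with `numpy.diff`. As a small sketch, a random walk (the cumulative sum of white noise) is integrated of order 1, and the cumulative sum of a random walk is integrated of order 2; differencing once or twice, respectively, recovers the original noise. The Gaussian noise and series length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
eps = rng.normal(size=500)

i1 = np.cumsum(eps)   # random walk: needs one difference
i2 = np.cumsum(i1)    # cumulative sum of a random walk: needs two differences

d1 = np.diff(i1, n=1)  # y_t - y_{t-1}
d2 = np.diff(i2, n=2)  # difference applied twice

# Each difference shortens the series by one observation.
print(len(i1), len(d1), len(d2))
print(np.allclose(d1, eps[1:]))
print(np.allclose(d2, eps[2:]))
```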



# Moving Average Models
