<a href="https://colab.research.google.com/github/djdunc/casa0018/blob/main/Week6/CASA0018_6_1_ARIMA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ARIMA

One of the most common methods used in time series forecasting is the ARIMA model, which stands for AutoRegressive Integrated Moving Average. 
ARIMA is a forecasting method for univariate time series data: it is fitted to past observations in order to better understand the series or to predict future points in it.

Plain ARIMA has no seasonal terms, so it has limited support for seasonal data.
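To make the "autoregressive" part of the name concrete, here is a minimal sketch that is not part of the original notebook. It simulates a synthetic AR(1) series, in which each value depends on the previous one, and recovers the coefficient by least squares. The names `true_phi` and `phi_hat` and the value 0.8 are assumptions chosen for illustration.

```python
import numpy as np

# Simulate an AR(1) process: y[t] = true_phi * y[t-1] + noise.
rng = np.random.default_rng(0)
true_phi = 0.8  # illustrative value, not taken from the airline data
n = 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + rng.normal()

# Estimate the coefficient by regressing y[t] on y[t-1] (no intercept).
prev, curr = y[:-1], y[1:]
phi_hat = float(prev @ curr / (prev @ prev))

# One-step-ahead forecast from the last observed value.
next_value = phi_hat * y[-1]
print(f"estimated phi = {phi_hat:.3f}, one-step forecast = {next_value:.3f}")
```

A full ARIMA fit also handles differencing and moving average terms; this sketch covers only the autoregressive idea.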

# Set Up Python Imports

In [None]:
import pandas as pd
import numpy as np

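from pandas import DataFrame
# Recent statsmodels releases no longer ship statsmodels.tsa.arima_model;
# ARIMA is imported from statsmodels.tsa.arima.model instead.
from statsmodels.tsa.arima.model import ARIMA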
from statsmodels.tsa.statespace.sarimax import SARIMAX
from matplotlib import pyplot as plt
from pandas.plotting import autocorrelation_plot

from matplotlib.pyplot import figure

from google.colab import files
from pandas import read_csv


--------------------------------------------------------------------------------
# Air Passenger Data Set

The Box and Jenkins Airline Passengers dataset records the monthly totals of US international airline passengers, in thousands. There are 144 monthly observations, from January 1949 to December 1960. 

The data set and book it comes from are classics in time series analysis.

Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) Time Series Analysis, Forecasting and Control. Third Edition. Holden-Day. Series G.

Available from:
https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv
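As a small sketch of how a file in this layout can be read with dates parsed directly into the index, the block below uses three in-memory rows instead of the real file. The variable names `sample_csv` and `sample` are assumptions for illustration, and the two-column `Month,Passengers` layout is assumed to match the CSV.

```python
import io
import pandas as pd

# A few rows in the same two-column layout as airline-passengers.csv (illustrative excerpt).
sample_csv = io.StringIO(
    "Month,Passengers\n"
    "1949-01,112\n"
    "1949-02,118\n"
    "1949-03,132\n"
)

# index_col=0 makes Month the index; parse_dates=True converts it to timestamps.
sample = pd.read_csv(sample_csv, header=0, index_col=0, parse_dates=True)

# Declare the month-start frequency so time series tools know the spacing.
sample = sample.asfreq("MS")
print(sample)
```

`pd.read_csv` also accepts a URL, so the address above could be passed in place of `sample_csv` when network access is available.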

# Create Data Frame

In [None]:
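# Upload airline-passengers.csv from your machine (Colab only).
uploaded = files.upload()

# Read the CSV; the first column (Month) becomes the index.
series = read_csv('airline-passengers.csv', header=0, index_col=0)
series.plot()
plt.show()

# Month-start date index covering the 144 observations.
date_rng = pd.date_range('1949-01', periods=144, freq='MS')

# Build a dataframe indexed by date. Pass the raw values: passing the Series itself
# would align on its string index and fill the column with NaN.
df = pd.DataFrame(data=series['Passengers'].to_numpy(), index=date_rng, columns=['Passengers'])

# Plot the autocorrelation of the series against lag.
autocorrelation_plot(df)
plt.show()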
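The autocorrelation plot shows how strongly the series correlates with lagged copies of itself. As a hedged illustration on synthetic data (not the airline series), the sketch below computes the autocorrelation at a few lags for a series with a 12-step cycle; the name `seasonal` and the noise level are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic series with a 12-step cycle plus a little noise (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(144)
seasonal = pd.Series(10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size))

# A lag of one full cycle should correlate strongly and positively,
# while a lag of half a cycle should correlate strongly and negatively.
acf = {lag: seasonal.autocorr(lag=lag) for lag in (1, 6, 12)}
for lag, value in acf.items():
    print(f"lag {lag:2d}: {value:+.2f}")
```

Peaks at multiples of 12 in the plot of the passenger data would point to the same kind of yearly cycle.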


# Create ARIMA Model

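The model order (p, d, q) sets the number of autoregressive lags (p), the number of times the series is differenced (d) and the order of the moving average (q). Generally, start with low values of p, d and q and use trial and error. 

However, we know the data has a trend, so use d = 1 here.
The data is also seasonal, so a p term > 1 will help, but more on this later.
We can leave q = 0.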
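To see why d = 1 suits trending data, here is a minimal sketch on a synthetic straight-line series (the slope and intercept are assumptions for illustration). One round of differencing turns the trend into a constant, which is what the model then works with.

```python
import numpy as np

# A straight-line trend with slope 3 (illustrative values).
t = np.arange(10)
trend = 100.0 + 3.0 * t

# d = 1 corresponds to first differencing: y[t] - y[t-1].
diff1 = np.diff(trend, n=1)
print(diff1)  # every element equals the slope
```

Differencing shortens the series by one observation for each round applied.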

In [None]:
arima_model = ARIMA(df, order=(1, 1, 0) )
arima_model_fit = arima_model.fit()
print(arima_model_fit.summary())

# ARIMA Forecast Air Passengers

In [None]:
forecast_length=24
forecast_rng = pd.date_range('1961-01-01', periods=forecast_length, freq='MS')

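forecast = arima_model_fit.forecast(steps=forecast_length)
# Older statsmodels releases return a (forecast, stderr, conf_int) tuple;
# newer ones return the forecast values directly.
forecast_values = forecast[0] if isinstance(forecast, tuple) else forecast

plt.figure(figsize=(12,8))
plt.plot(df.index, df['Passengers'])
plt.plot(forecast_rng, forecast_values)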
plt.show()

# Create SARIMA Model

Seasonal ARIMA (SARIMA) extends ARIMA to cope with seasonal data. It takes four additional parameters (P, D, Q, M):

> SARIMA(p, d, q)(P, D, Q, M)

where P, D and Q are the seasonal counterparts of p, d and q, and M is the number of time steps in a single seasonal period: 12 for this monthly data set.
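The seasonal differencing term D works like d, but compares each value with the one a full period earlier. The following sketch uses a synthetic series with a linear trend and a 12-step cycle (both assumptions for illustration) to show that one seasonal difference with M = 12 removes the repeating pattern.

```python
import numpy as np

M = 12  # time steps per seasonal period
t = np.arange(48)

# Linear trend (slope 2) plus a cycle that repeats every M steps.
y = 2.0 * t + 10.0 * np.sin(2 * np.pi * t / M)

# D = 1 corresponds to one seasonal difference: y[t] - y[t-M].
seasonal_diff = y[M:] - y[:-M]
print(seasonal_diff[:5])  # the cycle cancels, leaving slope * M
```

What remains is the constant 2 * 12 = 24, so the seasonal difference has removed both the cycle and the linear trend in this idealised case.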

In [None]:
sarima_model = SARIMAX(df, order=(1, 1, 0), seasonal_order=(1,1,0,12) )
sarima_model_fit = sarima_model.fit()
print(sarima_model_fit.summary())

# SARIMA Forecast Air Passengers

In [None]:
forecast = sarima_model_fit.forecast(steps=forecast_length)

plt.figure(figsize=(12,8))
plt.plot(df.index, df['Passengers'])
plt.plot(forecast_rng, forecast)
plt.show()
