# **Time Series**: Forecasting with Seasonal ARIMA

Source:  [https://github.com/d-insight/code-bank.git](https://github.com/d-insight/code-bank.git)  
License: [MIT License](https://opensource.org/licenses/MIT). See open source [license](LICENSE) in the Code Bank repository. 

-------------

## Overview

This illustration shows how to use an Autoregressive Integrated Moving Average (ARIMA) model and its seasonal extension, Seasonal ARIMA (SARIMA), to forecast a univariate time series with a seasonal component. It uses monthly airline passenger data from 1949 to 1960.

Source: https://github.com/advaitsave/Introduction-to-Time-Series-forecasting-Python/blob/master/Time%20Series%20in%20Python.ipynb

-------------

## **Part 0**: Setup

### Import packages

In [None]:
# Import all packages 
import warnings
warnings.filterwarnings("ignore")

import pandas            as pd
import numpy             as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [16, 8]
plt.rcParams['lines.linewidth'] = 4
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Time series packages/classes
from statsmodels.tsa.statespace.sarimax import SARIMAX
from pmdarima.arima                     import auto_arima
from math                               import sqrt
import itertools

# Other imports
from sklearn.metrics import mean_squared_error, mean_absolute_error,\
                            median_absolute_error, mean_squared_log_error,\
                            r2_score


### Helper functions

In [None]:
def evaluate_forecast(y_true, y_pred):
    """ 
    Evaluate a time series forecast with R2, MAE, median absolute error, MSE, MSLE, and RMSE
    
    Args:
        y_true:list    ground truth data
        y_pred:list    predictions
        
    Return:
        DataFrame with all evaluation metrics
    """
    
    results = pd.DataFrame({'r2_score':r2_score(y_true, y_pred),
                           }, index=[0])
    results['mean_absolute_error'] = mean_absolute_error(y_true, y_pred)
    results['median_absolute_error'] = median_absolute_error(y_true, y_pred)
    results['mse'] = mean_squared_error(y_true, y_pred)
    results['msle'] = mean_squared_log_error(y_true, y_pred)
    results['rmse'] = np.sqrt(results['mse'])
    
    return results
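
As a sanity check on what these metrics measure, here is a minimal sketch that computes a few of them by hand in plain Python. The helper name `basic_metrics` and the toy numbers are assumptions made for illustration; they are not part of the original notebook or of scikit-learn.

```python
import math

def basic_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R2 by hand (hypothetical helper, illustration only)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R2 compares the squared error of the forecast with that of predicting the mean
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Toy values, not taken from the airline data
toy_true = [100, 200, 300]
toy_pred = [110, 190, 330]
toy_metrics = basic_metrics(toy_true, toy_pred)
print(toy_metrics)
```

MAE and RMSE are expressed in the units of the series (here, passengers), while R2 is unitless, which is worth keeping in mind when reading the comparison table at the end.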

## **Part 1**: Load data

In [None]:
df = pd.read_csv('data/international-airline-passengers.csv',header=None)
df.columns = ['year','passengers']

# set year-month as the index
df['year'] = pd.to_datetime(df['year'], format='%Y-%m')
df.index = pd.DatetimeIndex(df.year.values, freq = 'MS')
df.drop(['year'], axis=1, inplace=True)

print(df.shape)

In [None]:
df.plot()
plt.grid()
plt.show()

In [None]:
# divide into train (first 75%) and validation (last 25%) sets, preserving time order
train = df[:int(0.75*(len(df)))]
valid = df[int(0.75*(len(df))):]

print('Train length: {}'.format(len(train)))
print('Valid length: {}'.format(len(valid)))
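
The split above is chronological rather than random, so the model is always validated on observations that come after the training period. The sketch below illustrates the same idea on placeholder values; the helper name `chronological_split` is hypothetical, and the length of 144 assumes 12 years of monthly observations.

```python
def chronological_split(values, train_fraction=0.75):
    """Split a sequence into a leading (train) and trailing (validation) part, without shuffling."""
    cut = int(train_fraction * len(values))
    return values[:cut], values[cut:]

# 144 placeholder observations standing in for 12 years of monthly data (assumption)
observations = list(range(144))
train_part, valid_part = chronological_split(observations)
print(len(train_part), len(valid_part))  # 108 36
```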

## **Part 2**: Fit SARIMA model

SARIMA adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

Trend Elements:

There are three trend elements that require configuration. They are the same as the ARIMA model, specifically:

- p: Trend autoregression order.
- d: Trend difference order.
- q: Trend moving average order.

Seasonal Elements:

There are four seasonal elements that are not part of ARIMA that must be configured; they are:

- P: Seasonal autoregressive order.
- D: Seasonal difference order.
- Q: Seasonal moving average order.
- m: The number of time steps for a single seasonal period. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.
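
To make the roles of d, D, and m more concrete, the following minimal sketch applies a first difference (d = 1) and then a seasonal difference (D = 1 with m = 12) to a synthetic series. The series and the `difference` helper are made up for illustration; they are not part of the notebook, and SARIMAX handles the differencing itself.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Synthetic monthly series: linear trend plus a pattern that repeats every 12 steps
seasonal_pattern = [0, 3, 8, 5, 2, 9, 14, 13, 6, 1, -4, -2]
series = [2 * t + seasonal_pattern[t % 12] for t in range(48)]

trend_removed = difference(series, lag=1)               # d = 1 removes the linear trend
fully_differenced = difference(trend_removed, lag=12)   # D = 1 with m = 12 removes the yearly pattern
print(fully_differenced)
```

On this idealized series both differences together leave only zeros. On real data a residual structure remains, which is what the AR and MA terms are meant to capture.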

In [None]:
# SARIMA example

# suppress convergence warnings
warnings.filterwarnings("ignore")

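# fit model (note: the seasonal period is set to 6 here; a period of 12 would correspond to a yearly cycle in monthly data)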
model = SARIMAX(train, order=(3, 1, 3), seasonal_order=(1, 1, 1, 6))
model_fit = model.fit(disp=False)

# set prediction range
start_index = valid.index.min()
end_index = valid.index.max()

# predictions
predictions_SARIMAX = model_fit.predict(start=start_index, end=end_index)

In [None]:
# report performance
mse = mean_squared_error(df[start_index:end_index], predictions_SARIMAX)
rmse = sqrt(mse)
print('RMSE: {}, MSE: {}'.format(rmse,mse))

In [None]:
plt.plot(df, label='Observed')
plt.plot(predictions_SARIMAX, label='Prediction')
plt.legend()
plt.grid()
plt.title('RMSE: %.4f'% rmse)
plt.show()

In [None]:
# evaluate predictions
evaluate_forecast(df[start_index:end_index], predictions_SARIMAX)

## **Part 3**: Fit auto ARIMA model

Auto ARIMA (`auto_arima` from pmdarima) searches over candidate ARIMA and SARIMA orders for a univariate time series and returns the best-fitting model according to an information criterion (AIC by default). For details see: https://alkaline-ml.com/pmdarima/tips_and_tricks.html

In [None]:
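# search over candidate models; auto_arima returns the selected model already fitted to train
model = auto_arima(train, trace=True, error_action='ignore', suppress_warnings=True, seasonal=True, m=6, stepwise=True)
# refit on the training data (redundant after auto_arima, kept to display the fitted model)
model.fit(train)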

In [None]:
# predict validation set 
predictions_AUTO = model.predict(n_periods = len(valid))
predictions_AUTO = pd.DataFrame(predictions_AUTO, index = valid.index, columns=['Prediction'])

In [None]:
# report performance
mse = mean_squared_error(df[start_index:end_index], predictions_AUTO)
rmse = sqrt(mse)
print('RMSE: {}, MSE: {}'.format(rmse, mse))

In [None]:
# plot the predictions for validation set
plt.plot(df.passengers, label='Observed')
plt.plot(predictions_AUTO, label='Prediction')
plt.legend()
plt.grid()
plt.title('RMSE: %.4f'% rmse)
plt.show()

In [None]:
# evaluate predictions
evaluate_forecast(df[start_index:end_index], predictions_AUTO)

## **Part 4**: Tune SARIMA model

Finally, we can tune the non-seasonal (p, d, q) and seasonal (P, D, Q) hyperparameters with a grid search, keep the model with the lowest AIC, and compare performances.
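
Before running the search, it helps to see how large the grid is and how an AIC-based choice works. The sketch below is a rough illustration under stated assumptions: the `aic` helper uses the textbook definition 2k - 2 ln(L), and the log-likelihoods and parameter counts in `toy_fits` are made up, not results from the airline data.

```python
import itertools

# Same candidate values as range(1, 3): each order can be 1 or 2
orders = range(1, 3)
pdq_grid = list(itertools.product(orders, orders, orders))
seasonal_grid = [(P, D, Q, 12) for P, D, Q in pdq_grid]
n_candidates = len(pdq_grid) * len(seasonal_grid)
print(n_candidates)  # 64

def aic(log_likelihood, n_params):
    """Akaike information criterion, 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Made-up fits, purely to illustrate the selection rule
toy_fits = {
    "small model": {"log_likelihood": -350.0, "n_params": 3},
    "large model": {"log_likelihood": -349.0, "n_params": 7},
}
toy_aic = {name: aic(**fit) for name, fit in toy_fits.items()}
best = min(toy_aic, key=toy_aic.get)
print(toy_aic, best)
```

In this toy case the larger model fits slightly better but is penalized for its extra parameters, so the smaller model is selected. Note that AIC is computed on the training data, so the model with the lowest AIC is not guaranteed to have the lowest validation error.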

In [None]:
p = d = q = range(1, 3)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))

In [None]:
# suppress convergence warnings
warnings.filterwarnings("ignore")

# test different hyperparameter combinations
min_aic = float('inf')
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = SARIMAX(train,
                          order=param,
                          seasonal_order=param_seasonal,
                          enforce_stationarity=False,
                          enforce_invertibility=False)
            
            results = mod.fit(disp=False)
            print('SARIMA{} x {} - AIC: {}'.format(param, param_seasonal, results.aic))
            
            #Check for best model with lowest AIC
            if results.aic < min_aic:
                min_aic = results.aic
                min_aic_model = results
        except Exception:
            # skip parameter combinations that fail to fit
            continue

In [None]:
# inspect model details of the best-fitting model 
min_aic_model.summary()


In [None]:
# Predictions (with confidence interval)
predictions_TUNED = min_aic_model.get_prediction(start=start_index, end=end_index, dynamic=False)

In [None]:
# report performance
mse = mean_squared_error(df[start_index:end_index], predictions_TUNED.predicted_mean)
rmse = sqrt(mse)
print('RMSE: {}, MSE: {}'.format(rmse, mse))

In [None]:
# plot predictions 
pred_ci = predictions_TUNED.conf_int()
ax = df['1949':].plot(label='observed')
predictions_TUNED.predicted_mean.plot(ax=ax, label='Forecast', alpha=.7, figsize=(14, 7))
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Passengers')
plt.legend()
plt.grid()
plt.title('RMSE: %.4f'% rmse)
plt.show()

In [None]:
# evaluate predictions
evaluate_forecast(df[start_index:end_index], predictions_TUNED.predicted_mean)

## **Part 5**: Compare model performances

In [None]:
df_performances = evaluate_forecast(df[start_index:end_index], predictions_SARIMAX)
df_performances = pd.concat([df_performances, evaluate_forecast(df[start_index:end_index], predictions_AUTO)], axis=0)
df_performances = pd.concat([df_performances, evaluate_forecast(df[start_index:end_index], predictions_TUNED.predicted_mean)], axis=0)
df_performances.index = ['SARIMAX', 'AUTO-ARIMA', 'TUNED-ARIMA']

df_performances