## Choosing the best-fitting model for time series forecasting on air quality data

We compare four ARIMA variants, each given by its (p, d, q) order:
   - First-order autoregressive (1, 0, 0)
   - Damped-trend linear exponential smoothing (1, 1, 2)
   - Differenced first-order autoregressive (1, 1, 0)
   - Seasonal ARIMA (0, 1, 1) x (0, 1, 1, 12), where the second tuple is the seasonal (P, D, Q, s) order
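The middle number d of each (p, d, q) order is how many times the series is differenced before the ARMA part is fitted, and the seasonal order (0, 1, 1, 12) adds one more difference at lag 12 (in statsmodels it is passed separately as `seasonal_order`). The minimal NumPy sketch below shows those two transformations on a made-up series; the helper name `difference` and the toy data are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def difference(y, lag=1):
    """Return y[t] - y[t - lag] for every t where y[t - lag] exists."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]


# Toy series: a linear trend plus a repeating 12-step pattern (illustrative only)
t = np.arange(48)
y = 2.0 * t + np.tile(np.arange(12), 4)

# d = 1, as in the (1, 1, 2) and (1, 1, 0) models: removes the linear trend,
# but the repeating 12-step pattern is still present
dy = difference(y, lag=1)

# d = 1 plus D = 1 with s = 12, as in (0, 1, 1) x (0, 1, 1, 12):
# the seasonal difference removes the repeating pattern as well
sdy = difference(dy, lag=12)

print(len(y), len(dy), len(sdy))   # 48 47 35
print(np.allclose(sdy, 0.0))       # True
```

Each difference shortens the series by its lag, which is why models with larger d or D have fewer usable observations at the start of the sample.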
   
We also compare four machine learning regressors:
   - Support Vector Regressor
   - XGBoost Regressor
   - Decision Tree Regressor
   - Random Forest Regressor

**The purpose of this experiment is to find the model that best fits the air quality (PM2.5) forecasting task.**

In [None]:
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from math import sqrt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.model_selection import train_test_split
import invutility as inv  # project-local helpers: metrics, build_matrix, forecastML

from sklearn.svm import SVR
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

### Load the dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

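df = pd.read_csv('data/delhi.csv', parse_dates=['date'])
df.set_index('date', inplace=True)
try:
    # Only succeeds if the index is a regular sequence of calendar days
    df.index.freq = 'D'
except ValueError:
    print('Cannot set daily frequency: the date index is not a regular daily sequence')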
df.head()
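Setting `df.index.freq = 'D'` only works when the index already contains every calendar day; if a date is missing, pandas raises a `ValueError` and the frequency stays unset. One possible alternative, sketched below on a made-up frame with a one-day gap, is to reindex with `DataFrame.asfreq('D')` and then fill the inserted rows. Filling by time-based interpolation is an assumption of this sketch, not something the original notebook does.

```python
import pandas as pd

# Made-up daily PM2.5 readings with 2021-01-03 missing (illustrative only)
df = pd.DataFrame(
    {"PM25": [120.0, 135.0, 150.0, 160.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05"]),
)
df.index.name = "date"

# asfreq('D') inserts the missing day as a NaN row and sets the index frequency
df = df.asfreq("D")

# Fill the gap by linear interpolation in time (an assumed choice)
df["PM25"] = df["PM25"].interpolate(method="time")

print(df.index.freqstr)                               # D
print(df.loc[pd.Timestamp("2021-01-03"), "PM25"])     # 142.5
```

Interpolation invents values for the missing days, so for real air quality data it is worth checking how many rows `asfreq` added before trusting the filled series.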

### ARIMA Forecast

In [None]:
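# (p, d, q) order for each model; SARIMA also needs a seasonal (P, D, Q, s) order
models = {'first_order_autoregressive': {'order': (1, 0, 0)},
          'damped_trend_lin_exp_smoothing': {'order': (1, 1, 2)},
          'differenced_first_order_autoregressive': {'order': (1, 1, 0)},
          'sarima': {'order': (0, 1, 1), 'seasonal_order': (0, 1, 1, 12)}}

In [None]:
name = 'first_order_autoregressive'  # Try another key from the models dictionary
model = ARIMA(df.PM25, freq='D', **models[name])
model_fit = model.fit()
# predict() without arguments returns in-sample one-step-ahead predictions
pred = model_fit.predict()
inv.metrics(df.PM25, pred)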
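`inv.metrics` is defined in the project's `invutility` module, which is not shown here. Given the `mean_absolute_error`, `mean_squared_error` and `sqrt` imports at the top of the notebook, a hypothetical stand-in that reports MAE and RMSE might look like the sketch below; the name `metrics_sketch` and the exact set of metrics are assumptions.

```python
from math import sqrt

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def metrics_sketch(actual, predicted):
    """Hypothetical stand-in for inv.metrics: print and return MAE and RMSE."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mae = mean_absolute_error(actual, predicted)
    rmse = sqrt(mean_squared_error(actual, predicted))
    print(f"MAE:  {mae:.3f}")
    print(f"RMSE: {rmse:.3f}")
    return mae, rmse


# Toy example: prediction errors of 1, -2 and 2
mae, rmse = metrics_sketch([10, 20, 30], [11, 18, 32])
# MAE = 5/3 (about 1.667), RMSE = sqrt(9/3) (about 1.732)
```

RMSE squares the errors before averaging, so it penalizes occasional large misses (for example, pollution spikes) more heavily than MAE does.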

### Machine Learning Forecast

In [None]:
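# Build the supervised input/target matrix with a 20-step look-back window (invutility helper)
x, y = inv.build_matrix(df.PM25.values, look_back=20)
# shuffle=False keeps time order: the last 20% of the series becomes the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)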
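`inv.build_matrix` also lives in `invutility`. Judging by its `look_back` argument, it presumably turns the series into a supervised-learning table where each row holds the previous `look_back` values and the target is the next value. The function below is a hypothetical NumPy version of that idea, not the project's actual code, followed by a chronological 80/20 split that follows the same idea as `train_test_split(..., shuffle=False)`.

```python
import numpy as np


def build_matrix_sketch(series, look_back):
    """Hypothetical sliding-window transform:
    row i is series[i:i + look_back], target i is series[i + look_back]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - look_back
    x = np.array([series[i:i + look_back] for i in range(n_rows)])
    y = series[look_back:]
    return x, y


series = np.arange(10, dtype=float)   # 0, 1, ..., 9
x, y = build_matrix_sketch(series, look_back=3)
print(x[0], y[0])          # [0. 1. 2.] 3.0
print(x.shape, y.shape)    # (7, 3) (7,)

# Chronological 80/20 split: the most recent rows form the test set
split = int(len(x) * 0.8)
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]
```

Keeping the split chronological matters here: with shuffling, rows from the future would end up in the training set and the test scores would be optimistic.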

In [None]:
models = {'svr':SVR(kernel = 'rbf'), 'xgb':XGBRegressor(), 'dtr':DecisionTreeRegressor(),
         'rfr':RandomForestRegressor()}

**Note:** Change the model key below (`'svr'`, `'xgb'`, `'dtr'` or `'rfr'`) to see the effect of each model.

In [None]:
model = models['svr']  # Try another key from the models dictionary
original, predictions = inv.forecastML(model, x_train, y_train, x_test, y_test)
inv.metrics(original, predictions)
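Instead of changing the key by hand, every regressor can be scored in one pass. The sketch below assumes that `inv.forecastML` simply fits the model on the training rows and predicts the test rows (the helper's real code is not shown), and it uses a synthetic seasonal series in place of `data/delhi.csv`. `XGBRegressor()` can be added to the dictionary the same way when `xgboost` is installed.

```python
from math import sqrt

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for df.PM25: a yearly-like cycle plus noise (illustrative only)
rng = np.random.default_rng(0)
t = np.arange(400)
series = 150 + 50 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 5, t.size)

# Window of the previous 20 values -> next value (assumed build_matrix behaviour)
look_back = 20
x = np.array([series[i:i + look_back] for i in range(len(series) - look_back)])
y = series[look_back:]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)

models = {
    "svr": SVR(kernel="rbf"),
    "dtr": DecisionTreeRegressor(random_state=0),
    "rfr": RandomForestRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)            # assumed to mirror inv.forecastML
    predictions = model.predict(x_test)
    scores[name] = {
        "MAE": mean_absolute_error(y_test, predictions),
        "RMSE": sqrt(mean_squared_error(y_test, predictions)),
    }

# Lower RMSE is better: list models from best to worst
for name, s in sorted(scores.items(), key=lambda kv: kv[1]["RMSE"]):
    print(f"{name}: MAE={s['MAE']:.2f}  RMSE={s['RMSE']:.2f}")
```

`SVR` with its default `C=1.0` is sensitive to the scale of the inputs and target, so its ranking in a comparison like this may change once the data are standardized.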

Complete.