# A Demo on Backtesting M3 with Various Models

This notebook aims to
1. provide a simple demo of how to backtest models with the functions Orbit provides.
2. add transparency on how the accuracy metrics in https://arxiv.org/abs/2004.08492 are derived.

Because of differences in package versions and random seeds, the final numbers may differ slightly. This notebook should also be available in Colab.

In [1]:
!pip install orbit-ml==1.0.13
!pip install fbprophet==0.7.1



In [2]:
import numpy as np
import tqdm
import pandas as pd
import statsmodels.api as sm
import inspect
import random
from fbprophet import Prophet
from statsmodels.tsa.statespace.sarimax import SARIMAX

import orbit
from orbit.models import DLT
from orbit.utils.dataset import load_m3monthly
from orbit.diagnostics.backtest import BackTester
from orbit.diagnostics.metrics import smape

In [3]:
seed=2021
n_sample=10
random.seed(seed)

We can load the M3 dataset from the Orbit repository. For demo purposes, I set `n_sample` to `10`. Feel free to adjust it, or set it to `0` to run the entire dataset.

In [4]:
data = load_m3monthly()
unique_keys = data['key'].unique().tolist()
if n_sample > 0:
    # draw a reproducible random sample of n_sample series for the demo
    sample_keys = random.sample(unique_keys, n_sample)
    data = data[data['key'].isin(sample_keys)].reset_index(drop=True)
else:
    sample_keys = unique_keys
print(sample_keys)

['N2229', 'N2691', 'N2516', 'N1968', 'N1908', 'N2702', 'N1472', 'N2310', 'N2372', 'N2578']


In [5]:
data.columns

Index(['key', 'value', 'date'], dtype='object')

We need to provide some metadata, such as the date column and the response column.

In [6]:
key_col='key'
response_col='value'
date_col='date'
seasonality=12

We also provide some settings that mimic the M3 competition criteria (see https://forecasters.org/resources/time-series-data/m3-competition/).

In [7]:
backtest_args = {
    'min_train_len': 1, # not useful; a placeholder
    'incremental_len': 18,  # not useful; a placeholder
    'forecast_len': 18,
    'n_splits': 1,
    'window_type': "expanding",
}
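To make the setting above concrete, here is a minimal sketch of the split it is meant to produce. It assumes that with `n_splits=1` and `forecast_len=18` the last 18 observations of each series are held out and everything before them is used for training. The helper `last_window_split` is hypothetical and only illustrates the idea; it is not how `BackTester` is implemented internally.

```python
def last_window_split(values, forecast_len):
    """Hold out the final `forecast_len` points; train on everything before them."""
    if not 0 < forecast_len < len(values):
        raise ValueError("forecast_len must be positive and shorter than the series")
    return values[:-forecast_len], values[-forecast_len:]


# Hypothetical monthly series with 60 observations (values are just positions).
series = list(range(60))
train, test = last_window_split(series, forecast_len=18)
print(len(train), len(test))
```

Under this reading, `min_train_len` and `incremental_len` play no role for a single split, which matches their description as placeholders.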

We are using `DLT` here. To use a multiplicative form, we need a natural log transformation of the response (the code uses `log1p`), so we need a wrapper for `DLT`. We also need to build wrappers for `prophet` and `sarima` so that all three models expose the same `fit`/`predict` signature.
Note that Prophet comes with its own multiplicative form.
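As a rough illustration of why the log transformation gives a multiplicative form, the sketch below uses only the standard library. The helper names `to_log_scale` and `from_log_scale` are hypothetical; they mirror the `log1p` / `expm1` pair and the clipping at zero used in the `DLT` wrapper, but they are not part of Orbit.

```python
import math


def to_log_scale(values):
    """Map the response to the log scale; log1p keeps zeros finite."""
    return [math.log1p(v) for v in values]


def from_log_scale(values):
    """Invert log1p and clip at zero so predictions stay non-negative."""
    return [max(math.expm1(v), 0.0) for v in values]


observed = [0.0, 10.0, 250.0]
restored = from_log_scale(to_log_scale(observed))

# Adding a constant on the log scale multiplies (1 + y) on the original scale.
shifted = from_log_scale([z + math.log(1.1) for z in to_log_scale(observed)])
print(restored)
print(shifted)
```

An additive effect learned on the transformed response therefore acts as a percentage change on the original response, which is the behavior a multiplicative model is after.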

In [12]:
class DLTMAPWrapper(object):
    def __init__(self, response_col, date_col, **kwargs):
        # keep extra keyword arguments as attributes
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.response_col = response_col
        self.date_col = date_col
        self.model = DLT(
            response_col=response_col,
            date_col=date_col,
            estimator='stan-map',
            **kwargs)

    def fit(self, df):
        df = df.copy()
        df[[self.response_col]] = df[[self.response_col]].apply(np.log1p)
        self.model.fit(df)

    def predict(self, df):
        df = df.copy()
        pred_df = self.model.predict(df)
        pred_df['prediction'] = np.clip(np.expm1(pred_df['prediction']).values, 0, None)
        return pred_df

In [9]:
class SARIMAXWrapper(object):
    def __init__(self, response_col, date_col, **kwargs):
        # keep extra keyword arguments as attributes; fit() forwards the matching ones
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.response_col = response_col
        self.date_col = date_col
        self.model = None
        self.df = None

    def fit(self, df):

        df_copy = df.copy()
        infer_freq = pd.infer_freq(df_copy[self.date_col])
        df_copy = df_copy.set_index(self.date_col)
        df_copy = df_copy.asfreq(infer_freq)
        endog = df_copy[self.response_col]
        sig = inspect.signature(SARIMAX)
        all_params = dict()
        for key in sig.parameters.keys():
            if hasattr(self, key):
                all_params[key] = getattr(self, key)
        self.df = df_copy
        self.model = SARIMAX(endog=endog, **all_params).fit(disp=False)

    def predict(self, df, **kwargs):
        df_copy = df.copy()
        infer_freq = pd.infer_freq(df_copy[self.date_col])
        df_copy = df_copy.set_index(self.date_col)
        df_copy = df_copy.asfreq(infer_freq)

        pred_array = np.array(self.model.predict(start=df_copy.index[0],
                                                 end=df_copy.index[-1],
                                                 **kwargs))

        out = pd.DataFrame({
            self.date_col: df[self.date_col],
            'prediction': pred_array
        })
        return out
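The `fit` method of the SARIMAX wrapper forwards only those stored attributes whose names appear in the target constructor's signature. The sketch below shows that filtering pattern in isolation. `filter_kwargs` and `toy_model` are hypothetical stand-ins, not statsmodels functions.

```python
import inspect


def filter_kwargs(target, candidates):
    """Keep only the entries of `candidates` that `target` declares as parameters."""
    accepted = inspect.signature(target).parameters
    return {name: value for name, value in candidates.items() if name in accepted}


def toy_model(order=(1, 0, 0), trend=None):
    """Stand-in for a model constructor with a fixed set of keyword arguments."""
    return {"order": order, "trend": trend}


stored = {"order": (2, 1, 0), "seasonality": 12, "seed": 2021}
usable = filter_kwargs(toy_model, stored)
model = toy_model(**usable)
print(usable)
```

One consequence of this pattern is that a keyword such as `seasonality` is forwarded only if the target constructor declares a parameter with exactly that name; otherwise it is dropped without a warning.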

In [10]:
class ProphetWrapper(object):
    def __init__(self, response_col, date_col, **kwargs):
        # keep extra keyword arguments as attributes; fit() rebuilds Prophet from them
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.response_col = response_col
        self.date_col = date_col
        self.model = Prophet(**kwargs)

    def fit(self, df):
        sig = inspect.signature(Prophet)
        all_params = dict()
        for key in sig.parameters.keys():
            if hasattr(self, key):
                all_params[key] = getattr(self, key)
        object_type = type(self.model)
        self.model = object_type(**all_params)

        train_df = df.copy()
        train_df = train_df.rename(columns={self.date_col: "ds", self.response_col: "y"})
        self.model.fit(train_df)

    def predict(self, df):
        df = df.copy()
        df = df.rename(columns={self.date_col: "ds"})
        pred_df = self.model.predict(df)
        pred_df = pred_df.rename(columns={'yhat': 'prediction', 'ds': self.date_col})
        pred_df = pred_df[[self.date_col, 'prediction']]
        return pred_df

Declare the model objects and run the backtest. The scores are shown at the end.

In [13]:
dlt = DLTMAPWrapper(
    response_col=response_col,
    date_col=date_col,
    seasonality=seasonality,
    seed=seed,
)

sarima = SARIMAXWrapper(
    response_col=response_col,
    date_col=date_col,
    seasonality=seasonality,
    seed=seed,
)

prophet = ProphetWrapper(
    response_col=response_col,
    date_col=date_col,
)

In [14]:
all_scores = []
models = {'dlt': dlt, 'sarima': sarima, 'prophet': prophet}

for key in tqdm.tqdm(sample_keys):
    # one series at a time; every model is backtested on the same split
    df = data[data[key_col] == key]
    for model_name, model in models.items():
        bt = BackTester(
            model=model,
            df=df,
            **backtest_args,
        )
        bt.fit_predict()
        scores_df = bt.score(metrics=[smape])
        scores_df[key_col] = key
        scores_df['model'] = model_name
        all_scores.append(scores_df)

all_scores = pd.concat(all_scores, axis=0, ignore_index=True)

  warn('Non-stationary starting autoregressive parameters'
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

In [15]:
all_scores.groupby('model')['metric_values'].mean().reset_index()

|   | model | metric_values |
|---|---|---|
| 0 | dlt | 0.056457 |
| 1 | prophet | 0.111645 |
| 2 | sarima | 0.097403 |
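For readers who want to see how a score of this kind can be computed by hand, here is a minimal sketch of one common sMAPE definition. The function name `smape_sketch` and the toy numbers are assumptions made for illustration. Scaling conventions vary between sources (some omit the factor of 2 or report percentages), so this may not match `orbit.diagnostics.metrics.smape` exactly.

```python
def smape_sketch(actual, predicted):
    """Mean of 2 * |a - p| / (|a| + |p|), skipping pairs where both values are zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if abs(a) + abs(p) > 0]
    if not pairs:
        raise ValueError("sMAPE is undefined when every pair is (0, 0)")
    return sum(2 * abs(a - p) / (abs(a) + abs(p)) for a, p in pairs) / len(pairs)


# Toy holdout values and forecasts, not taken from the M3 data.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
score = smape_sketch(actual, predicted)
print(round(score, 6))
```

The table above then averages one such score per series within each model.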
