## Remove Time-Series Model

While experimenting with different time-series models to improve that part of the ensemble's performance, I realised how complicated and hacky my work-arounds for including an ARIMA model in the ensemble were. I couldn't get `darts` to work, for example, because it doesn't accept data without a consistent frequency. I could lie about the dates and pretend that all matches are exactly a week apart, but that's not great. The ARIMA model also contributed almost nothing to the final ensemble: its predictions had a feature importance of about zero for the meta-estimator.
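
To illustrate the frequency problem, here is a minimal sketch using plain `pandas` (not `darts` itself). The dates are the first five Adelaide matches of 1991 from the data set, with times dropped; the variable names are mine and purely illustrative.

```python
import pandas as pd

# First five Adelaide match dates from 1991 (times dropped for brevity).
match_dates = pd.DatetimeIndex(
    ["1991-03-22", "1991-03-31", "1991-04-07", "1991-04-13", "1991-04-21"]
)

# Gaps between consecutive matches, in days.
gaps = match_dates.to_series().diff().dropna().dt.days.tolist()
print(gaps)  # [9, 7, 6, 8]

# The gaps vary, so pandas cannot infer a single frequency.
print(pd.infer_freq(match_dates))  # None

# The "lie about the dates" work-around: pretend matches are exactly a week apart.
fake_dates = pd.date_range(start=match_dates[0], periods=len(match_dates), freq="7D")
print(pd.infer_freq(fake_dates))
```

The fake index has a regular frequency, but it no longer reflects when the matches were actually played, which is why the work-around feels wrong.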

I've decided that maintaining the extra dependencies and code for a time-series model isn't worth the trouble, but it's worth checking the impact of removing it just to be sure.
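
The near-zero feature importance noted above is the kind of thing that can be read off a stacked model's meta-estimator. The sketch below uses scikit-learn's `StackingRegressor` rather than augury's `StackingEstimator`, whose internals aren't shown in this notebook. The toy data, the sub-models, and all names are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

# Toy data standing in for the real match data (assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)

stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        # Hypothetical stand-in for a sub-model that adds no signal.
        ("uninformative", DummyRegressor(strategy="mean")),
    ],
    # A tree-based meta-estimator exposes feature_importances_.
    final_estimator=RandomForestRegressor(n_estimators=100, random_state=42),
    cv=5,
)
stack.fit(X, y)

# Each meta-feature is one sub-model's out-of-fold prediction,
# in the same order as the estimators list.
importances = dict(
    zip(
        [name for name, _ in stack.estimators],
        stack.final_estimator_.feature_importances_,
    )
)
print(importances)
```

A sub-model whose predictions carry no extra signal ends up with an importance close to zero, which is a hint that it can be dropped, though scoring the ensemble without it is the more direct check.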

## Code Setup

In [1]:
%load_ext autoreload

In [11]:
%autoreload 2

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.base import clone

from augury.ml_estimators import StackingEstimator
from augury.ml_data import MLData
from augury.settings import CV_YEAR_RANGE, SEED
from augury.model_tracking import score_model

np.random.seed(SEED)

In [4]:
data = MLData(train_year_range=(max(CV_YEAR_RANGE),))
data.data

2021-02-13 08:45:49,296 - kedro.io.data_catalog - INFO - Loading data from `full_data` (JSONDataSet)...


Unnamed: 0,Unnamed: 1,Unnamed: 2,team,oppo_team,round_type,venue,prev_match_oppo_team,oppo_prev_match_oppo_team,date,team_goals,team_behinds,score,...,oppo_rolling_prev_match_time_on_ground_skew,oppo_rolling_prev_match_time_on_ground_std,oppo_last_year_brownlow_votes_sum,oppo_last_year_brownlow_votes_max,oppo_last_year_brownlow_votes_min,oppo_last_year_brownlow_votes_skew,oppo_last_year_brownlow_votes_std,oppo_cum_matches_played,oppo_rolling_prev_match_goals_plus_rolling_prev_match_behinds,oppo_rolling_prev_match_goals_divided_by_rolling_prev_match_goals_plus_rolling_prev_match_behinds
Adelaide,1991,1,Adelaide,Hawthorn,Regular,Football Park,0,Melbourne,1991-03-22 03:56:00+00:00,24,11,155,...,0.0,0.0,72,15,0,1.565197,4.070433,80,1,0
Adelaide,1991,2,Adelaide,Carlton,Regular,Football Park,Hawthorn,Fitzroy,1991-03-31 03:56:00+00:00,12,9,81,...,0.0,0.0,51,16,0,2.449132,3.913203,60,1,0
Adelaide,1991,3,Adelaide,Sydney,Regular,S.C.G.,Carlton,Hawthorn,1991-04-07 03:05:00+00:00,19,18,132,...,0.0,0.0,33,7,0,1.403576,2.433862,92,1,0
Adelaide,1991,4,Adelaide,Essendon,Regular,Windy Hill,Sydney,North Melbourne,1991-04-13 03:30:00+00:00,6,11,47,...,0.0,0.0,71,13,0,1.262708,4.524495,69,1,0
Adelaide,1991,5,Adelaide,West Coast,Regular,Subiaco,Essendon,North Melbourne,1991-04-21 05:27:00+00:00,9,11,65,...,0.0,0.0,48,9,0,0.913203,3.218368,48,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Western Bulldogs,2021,19,Western Bulldogs,Adelaide,Regular,Eureka Stadium,Gold Coast,West Coast,2021-07-24 02:20:00+00:00,0,0,0,...,0.0,0.0,0,0,0,0.000000,0.000000,0,0,0
Western Bulldogs,2021,20,Western Bulldogs,Melbourne,Regular,M.C.G.,Adelaide,Gold Coast,2021-07-31 02:20:00+00:00,0,0,0,...,0.0,0.0,0,0,0,0.000000,0.000000,0,0,0
Western Bulldogs,2021,21,Western Bulldogs,Essendon,Regular,Docklands,Melbourne,Sydney,2021-08-07 02:20:00+00:00,0,0,0,...,0.0,0.0,0,0,0,0.000000,0.000000,0,0,0
Western Bulldogs,2021,22,Western Bulldogs,Hawthorn,Regular,York Park,Essendon,Collingwood,2021-08-14 02:11:00+00:00,0,0,0,...,0.0,0.0,0,0,0,0.000000,0.000000,0,0,0


## Check baseline model performance

The default `StackingEstimator` still includes the ARIMA model, so it serves as the baseline.

In [5]:
stacking_estimator = StackingEstimator()

### Stacking model with ARIMA model

In [6]:
stacking_estimator_scores = score_model(stacking_estimator, data, n_jobs=-1)

stacking_estimator_scores

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


{'fit_time': array([77.07063055, 79.15697074, 81.10464835, 83.06620646, 66.62547207]),
 'score_time': array([1.14102602, 1.16394162, 1.02003813, 1.04491282, 0.95373678]),
 'test_neg_mean_absolute_error': array([-30.28812232, -29.43355381, -28.41689136, -26.57813729,
        -27.75947225]),
 'test_match_accuracy': array([0.73300971, 0.72463768, 0.67149758, 0.71014493, 0.64251208])}

In [7]:
print('Mean accuracy:', stacking_estimator_scores['test_match_accuracy'].mean())
print('Mean MAE:', abs(stacking_estimator_scores['test_neg_mean_absolute_error'].mean()))

Mean accuracy: 0.6963603958538529
Mean MAE: 28.495235406352286


### Stacking model without ARIMA model

In [29]:
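# Copy the baseline estimator and drop its last sub-model, which is the ARIMA regressor.
stacking_no_arima = clone(stacking_estimator)
stacking_no_arima.pipeline.regressors = stacking_no_arima.pipeline.regressors[:-1]
# List the remaining sub-models to confirm that ARIMA is gone.
[regressor.steps[-1][0] for regressor in stacking_no_arima.pipeline.regressors]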

['extratreesregressor', 'eloregressor']

In [30]:
no_arima_scores = score_model(stacking_no_arima, data, n_jobs=-1)

no_arima_scores

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.1min finished


{'fit_time': array([61.81500745, 63.84841537, 66.91292977, 67.77418733, 54.83943653]),
 'score_time': array([0.58800077, 0.83151627, 0.53033352, 0.46322918, 0.47495866]),
 'test_neg_mean_absolute_error': array([-30.29299786, -29.44361589, -28.39776664, -26.59934979,
        -27.73168043]),
 'test_match_accuracy': array([0.74271845, 0.72463768, 0.67149758, 0.71497585, 0.63285024])}

In [32]:
print('Mean accuracy:', no_arima_scores['test_match_accuracy'].mean())
print('Mean MAE:', abs(no_arima_scores['test_neg_mean_absolute_error'].mean()))

Mean accuracy: 0.6973359598517893
Mean MAE: 28.49308211978655


In [34]:
# Both differences are (without ARIMA) - (with ARIMA).
print(
    'Mean accuracy change:',
    no_arima_scores['test_match_accuracy'].mean() - stacking_estimator_scores['test_match_accuracy'].mean()
)
print(
    'Mean MAE change:',
    abs(no_arima_scores['test_neg_mean_absolute_error'].mean()) - abs(stacking_estimator_scores['test_neg_mean_absolute_error'].mean())
)

Mean accuracy change: 0.0009755639979364128
Mean MAE change: -0.002153286565736323


## Conclusion

Removing ARIMA from the ensemble barely changed the performance metrics, and the tiny change that did occur was a slight improvement: mean accuracy rose by about 0.001 and mean MAE fell by about 0.002.
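
One rough way to put the size of that change in context is to compare it with how much MAE varies from fold to fold. The sketch below uses the per-fold MAE values printed by the two `score_model` runs above (with signs flipped); the variable names are mine. It is only a sanity check, not a significance test.

```python
import numpy as np

# Per-fold MAE copied from the two score_model outputs (signs flipped).
mae_with_arima = np.array(
    [30.28812232, 29.43355381, 28.41689136, 26.57813729, 27.75947225]
)
mae_without_arima = np.array(
    [30.29299786, 29.44361589, 28.39776664, 26.59934979, 27.73168043]
)

# Paired per-fold difference: negative means lower (better) MAE without ARIMA.
fold_diffs = mae_without_arima - mae_with_arima
mean_diff = fold_diffs.mean()

# Fold-to-fold spread of the baseline, as a rough yardstick for noise.
baseline_spread = mae_with_arima.std(ddof=1)

print(f"Mean MAE change: {mean_diff:.4f}")
print(f"Baseline MAE std across folds: {baseline_spread:.4f}")
print(
    "Folds where removing ARIMA lowered MAE:",
    int((fold_diffs < 0).sum()),
    "of",
    len(fold_diffs),
)
```

The mean change is orders of magnitude smaller than the spread between folds, and its sign differs from fold to fold, so the safest reading is that removing ARIMA makes no meaningful difference to performance.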