# *Exponential Smoothing*

In [7]:
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import plotly.graph_objects as go
from sklearn.metrics import mean_absolute_percentage_error
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

In [8]:
wks_df = pd.read_pickle('data/LCL_unstack_wks.pkl')

In [9]:
wks_df.index

DatetimeIndex(['2012-01-01', '2012-01-08', '2012-01-15', '2012-01-22',
               '2012-01-29', '2012-02-05', '2012-02-12', '2012-02-19',
               '2012-02-26', '2012-03-04',
               ...
               '2013-12-29', '2014-01-05', '2014-01-12', '2014-01-19',
               '2014-01-26', '2014-02-02', '2014-02-09', '2014-02-16',
               '2014-02-23', '2014-03-02'],
              dtype='datetime64[ns]', name='DateTime', length=114, freq='W-SUN')

Exponential smoothing is a time series method that forecasts from a weighted average of past observations, with weights that decay exponentially the further an observation lies from the point being predicted. Simple exponential smoothing is defined by:

$$ \ell_0 = y_0 $$
$$ \hat y_{t+h|t} = \ell_t = \alpha y_t + (1-\alpha)\ell_{t -1}$$
$$t>0, \ \ 0<\alpha<1$$

Where $\hat y_{t+h|t}$ is the forecast made at time $t$ for $h$ steps ahead, $\ell_t$ is the smoothed level, $y_t$ is the observed data point, and $\alpha$ is the smoothing factor.

Writing out $\ell_{t-1}$ and substituting it into the expression for $\ell_t$, we can see the exponential nature of this formula.

$$  \ell_{t-1} = \alpha y_{t-1} + (1-\alpha)\ell_{t-2} \\
\ell_t = \alpha y_t + \alpha(1-\alpha)y_{t -1} +(1-\alpha)^2\ell_{t-2}$$

We can see that as the number of terms increases, the weighting decays geometrically by a factor of $(1-\alpha)$ for each step further back in time.
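
The recursion and the geometric weights can be checked numerically. The following is a minimal sketch in plain Python; the function names and the four observations are made up for illustration and are not part of the notebook.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the level l_t for every t, starting from l_0 = y_0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    levels = [y[0]]
    for obs in y[1:]:
        # l_t = alpha * y_t + (1 - alpha) * l_{t-1}
        levels.append(alpha * obs + (1 - alpha) * levels[-1])
    return levels


def observation_weights(n_terms, alpha):
    """Weights placed on y_t, y_{t-1}, ...: alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(n_terms)]


y = [10.0, 12.0, 11.0, 13.0]  # made-up observations
alpha = 0.5

levels = simple_exponential_smoothing(y, alpha)
weights = observation_weights(3, alpha)

# Unrolling the recursion: weighted recent observations plus the
# remaining weight (1 - alpha)^3 on the initial level l_0.
unrolled = sum(w * obs for w, obs in zip(weights, reversed(y[1:])))
unrolled += (1 - alpha) ** 3 * levels[0]

print(levels)
print(weights)
print(unrolled)
```

The unrolled sum reproduces the last level from the recursion, which is the substitution argument above carried through to $\ell_0$.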

The Holt-Winters method, otherwise known as triple exponential smoothing, is a modification of simple exponential smoothing. As the name suggests, two additional formulae are considered, one accounting for a trend in the data, and another for the seasonality.

For a model with multiplicative seasonality (shown here with an additive trend):

$$ \ell_t = \alpha \frac {y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})\\
b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} \\
s_t = \gamma \frac {y_t}{\ell_{t-1}+b_{t-1}} + (1-\gamma)s_{t-m}$$

Where $b$ is the trend component, $s$ is the seasonal component, $\beta$ and $\gamma$ are their respective smoothing factors, and $m$ is the seasonal period.

The multiplicative forecast equation combines the smoothing equations:

$$\hat y_{t+h|t} = (\ell_t + hb_t)\,s_{t-m+1+(h-1) \bmod m}$$
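
To make the recursions concrete, here is a rough hand-rolled sketch of one update step and the forecast for multiplicative seasonality with an additive trend. It assumes the standard seasonal update, which includes the $(1-\gamma)s_{t-m}$ term, and a forecast that reuses the most recent $m$ seasonal indices. The function names, starting state, and observation are hypothetical, and this is not how `statsmodels` implements or initialises the model.

```python
def holt_winters_step(y_t, level, trend, season_lag, alpha, beta, gamma):
    """One update of level, trend and seasonal index.

    season_lag is s_{t-m}, the seasonal index from one period earlier.
    """
    prev = level + trend  # l_{t-1} + b_{t-1}
    new_level = alpha * y_t / season_lag + (1 - alpha) * prev
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_season = gamma * y_t / prev + (1 - gamma) * season_lag
    return new_level, new_trend, new_season


def holt_winters_forecast(level, trend, seasons, h):
    """h-step-ahead forecast.

    seasons holds the last m seasonal indices, oldest first, so
    seasons[(h - 1) % m] is s_{t-m+1+(h-1) mod m}.
    """
    m = len(seasons)
    return (level + h * trend) * seasons[(h - 1) % m]


# Made-up state with a seasonal period of m = 4.
seasons = [0.8, 1.0, 1.2, 1.0]  # s_{t-4}, ..., s_{t-1}
level, trend = 100.0, 2.0

new_level, new_trend, new_season = holt_winters_step(
    y_t=82.0, level=level, trend=trend, season_lag=seasons[0],
    alpha=0.5, beta=0.5, gamma=0.5,
)
seasons = seasons[1:] + [new_season]  # now s_{t-3}, ..., s_t

one_step = holt_winters_forecast(new_level, new_trend, seasons, h=1)
five_step = holt_winters_forecast(new_level, new_trend, seasons, h=5)
print(new_level, new_trend, new_season, one_step, five_step)
```

Forecasts for $h=1$ and $h=5$ use the same seasonal index because they fall on the same position in the seasonal cycle; only the trend contribution differs.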





---
## *Standard Tariff*

A cross-validated evaluation would give a better understanding of how the model performs on other data and thus improve robustness, but this was not possible because the `ExponentialSmoothing` API requires at least two full seasonal cycles of training data. Cross-validation reduces the amount of data available to each fold, and feeding the smaller splits into the model results in an error.

We will use only a single train-test split and evaluate the MAPEs. This may not be an optimal comparison: the forecast could fit especially well or poorly by chance, particularly given the extremely small test split that is required for the model to have enough training data.
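
The arithmetic behind this constraint can be sketched as follows. The helper below is a hypothetical hand-rolled expanding-window split, assumed to mirror how `TimeSeriesSplit` sizes its folds with five splits; the observation count and seasonal period are the ones used in this notebook.

```python
n_obs = 114                       # weekly observations in wks_df
seasonal_periods = 52
min_train = 2 * seasonal_periods  # two full seasonal cycles


def expanding_window_train_sizes(n_samples, n_splits):
    """Training-set size of each fold in an expanding-window split
    with equal-sized test folds taken from the end of the series."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    return [first_test_start + i * test_size for i in range(n_splits)]


train_sizes = expanding_window_train_sizes(n_obs, n_splits=5)
too_short = [size for size in train_sizes if size < min_train]

# Largest test set that still leaves two full cycles for training.
max_test = n_obs - min_train
test_fraction = round(max_test / n_obs, 2)

print(train_sizes, len(too_short), max_test, test_fraction)
```

Under these assumptions every fold falls short of the 104 training weeks the model needs, and a single split can hold out only about 9% of the data.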

In [17]:
Std_test = wks_df['Std'].tail(round(0.09*wks_df["Std"].count()))
Std_train = wks_df['Std'].loc[wks_df.index < Std_test.index[0]]

In [35]:
ES_Std = ExponentialSmoothing(endog = Std_train, seasonal = 'mul', trend = 'mul', seasonal_periods=52).fit()

Std_pred_train = ES_Std.predict(Std_train.index[0], end=Std_train.index[-1])
Std_pred_test = ES_Std.predict(Std_test.index[0], end=Std_test.index[-1])

print(f'\n\n===============================\
    \nTrain MAPE: {round(mean_absolute_percentage_error(Std_train, Std_pred_train)*100, 2)}%\
    \nTest MAPE: {round(mean_absolute_percentage_error(Std_test, Std_pred_test)*100, 2)}%\
    \n===============================\
    \n\n{ES_Std.summary()}')



Train MAPE: 2.75%    
Test MAPE: 6.6%    

                       ExponentialSmoothing Model Results                       
Dep. Variable:                      Std   No. Observations:                  104
Model:             ExponentialSmoothing   SSE                              0.031
Optimized:                         True   AIC                           -732.294
Trend:                   Multiplicative   BIC                           -584.208
Seasonal:                Multiplicative   AICC                          -580.205
Seasonal Periods:                    52   Date:                 Sat, 05 Nov 2022
Box-Cox:                          False   Time:                         15:34:42
Box-Cox Coeff.:                    None                                         
                          coeff                 code              optimized      
---------------------------------------------------------------------------------
smoothing_level               1.0000000                alpha  

In [36]:
fig = go.Figure()
# go.Line is deprecated; go.Scatter with mode='lines' draws the same traces
fig.add_trace(go.Scatter(x=Std_train.index, y=Std_train, mode='lines', name="Train"))
fig.add_trace(go.Scatter(x=Std_test.index, y=Std_test, mode='lines', name="Test"))
fig.add_trace(go.Scatter(x=Std_test.index, y=Std_pred_test, mode='lines', name="Test Predictions"))
fig.add_trace(go.Scatter(x=Std_train.index, y=Std_pred_train, mode='lines', name="Train Predictions"))
fig.update_xaxes(title_text = 'Date-Time')
fig.update_yaxes(title_text = 'Power, kW')
fig.update_layout(title = 'Exponential Smoothing Performance on Standard Tariff Data')
fig.show()




The train and test MAPEs are comparable to those of the SARIMA model for the standard tariff. From the plot, however, the exponential smoothing model seems to do a better job of predicting periods of larger variance, i.e. the winter months. We do not see the same anomalous spike in February as we did with the SARIMA model.

Like the SARIMA model, exponential smoothing overpredicts the test data. This again could be attributed to the limited number of seasons captured in the dataset.
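
MAPE measures the size of the errors but not their direction, so overprediction is only visible from the plot. One way to put a number on it is a signed mean percentage error alongside the MAPE. The sketch below uses made-up values, not the notebook's data, and the helper names are hypothetical; the MAPE here assumes strictly positive actual values.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def mean_percentage_error(actual, predicted):
    """Signed error as a fraction; positive means overprediction on average."""
    errors = [(p - a) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [10.0, 20.0, 40.0]     # made-up observations
predicted = [11.0, 22.0, 38.0]  # made-up forecasts

abs_error = mape(actual, predicted)
signed_error = mean_percentage_error(actual, predicted)
print(abs_error, signed_error)
```

A signed error close to the MAPE in magnitude would indicate that the forecast misses mostly in one direction.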

---

## *Variable Tariff*

In [30]:
ToU_test = wks_df['ToU'].tail(round(0.09*wks_df["ToU"].count()))
ToU_train = wks_df['ToU'].loc[wks_df.index < ToU_test.index[0]]

In [31]:
ES_ToU = ExponentialSmoothing(endog = ToU_train, seasonal = 'mul', trend = 'mul', seasonal_periods=52).fit()

ToU_pred_train = ES_ToU.predict(ToU_train.index[0], end=ToU_train.index[-1])
ToU_pred_test = ES_ToU.predict(ToU_test.index[0], end=ToU_test.index[-1])

print(f'\n\n===============================\
    \nTrain MAPE: {round(mean_absolute_percentage_error(ToU_train, ToU_pred_train)*100, 2)}%\
    \nTest MAPE: {round(mean_absolute_percentage_error(ToU_test, ToU_pred_test)*100, 2)}%\
    \n===============================\
    \n\n{ES_ToU.summary()}')



Train MAPE: 2.56%    
Test MAPE: 3.46%    

                       ExponentialSmoothing Model Results                       
Dep. Variable:                      ToU   No. Observations:                  104
Model:             ExponentialSmoothing   SSE                              0.024
Optimized:                         True   AIC                           -758.329
Trend:                   Multiplicative   BIC                           -610.244
Seasonal:                Multiplicative   AICC                          -606.241
Seasonal Periods:                    52   Date:                 Sat, 05 Nov 2022
Box-Cox:                          False   Time:                         15:19:14
Box-Cox Coeff.:                    None                                         
                          coeff                 code              optimized      
---------------------------------------------------------------------------------
smoothing_level               1.0000000                alpha 

In [37]:
fig = go.Figure()
# go.Line is deprecated; go.Scatter with mode='lines' draws the same traces
fig.add_trace(go.Scatter(x=ToU_train.index, y=ToU_train, mode='lines', name="Train"))
fig.add_trace(go.Scatter(x=ToU_test.index, y=ToU_test, mode='lines', name="Test"))
fig.add_trace(go.Scatter(x=ToU_test.index, y=ToU_pred_test, mode='lines', name="Test Predictions"))
fig.add_trace(go.Scatter(x=ToU_train.index, y=ToU_pred_train, mode='lines', name="Train Predictions"))
fig.update_xaxes(title_text = 'Date-Time')
fig.update_yaxes(title_text = 'Power, kW')
fig.update_layout(title = 'Exponential Smoothing Performance on Variable Tariff Data')
fig.show()

For the variable tariff, the train and test MAPEs improve on both the standard tariff model and the equivalent SARIMA model. The improvement is largest on the test set, where the MAPE falls from 6.6% for the standard tariff to 3.46%. This could be attributed to the slightly lower variance in the variable tariff data compared with the standard tariff data. However, without more seasonal cycles and a cross-validation, it is difficult to tell whether this is the case or whether the improvement is down to chance given the small test set.