# Recursive Forecasting with machine learning

[Forecasting with Machine Learning - Course](https://www.trainindata.com/p/forecasting-with-machine-learning)

In this notebook, we carry out recursive forecasting to predict multiple steps into the future by using a Lasso regression.

In [51]:
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.metrics import root_mean_squared_error

from skforecast.recursive import ForecasterRecursive

## Load data

We will use the electricity demand dataset found [here](https://github.com/tidyverts/tsibbledata/tree/master/data-raw/vic_elec/VIC2015).

**Citation:**

Godahewa, Rakshitha, Bergmeir, Christoph, Webb, Geoff, Hyndman, Rob, & Montero-Manso, Pablo. (2021). Australian Electricity Demand Dataset (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4659727

**Description of data:**

A description of the data can be found [here](https://rdrr.io/cran/tsibbledata/man/vic_elec.html). The data contains electricity demand in Victoria, Australia, at 30-minute intervals over a period of 12 years, from 2002 to early 2015. It also includes the temperature in Melbourne at 30-minute intervals and the public holiday dates.

In [None]:
# Electricity demand.
url = "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv"
df = pd.read_csv(url)

df.drop(columns=["Industrial"], inplace=True)

# Convert the integer Date to an actual date with datetime type
df["date"] = df["Date"].apply(
    lambda x: pd.Timestamp("1899-12-30") + pd.Timedelta(x, unit="days")
)

# Create a timestamp from the integer Period representing 30 minute intervals
df["date_time"] = df["date"] + \
    pd.to_timedelta((df["Period"] - 1) * 30, unit="m")

df.dropna(inplace=True)

# Rename columns
df = df[["date_time", "OperationalLessIndustrial"]]

df.columns = ["date_time", "demand"]

# Resample to hourly
df = (
    df.set_index("date_time")
    .resample("h")
    .agg({"demand": "sum"})
)

df.head()
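
The date handling above relies on two conventions that are easy to miss: the code treats `Date` as a serial day count starting from 1899-12-30, and `Period` as the number of the half-hour slot within the day, starting at 1. The sketch below applies the same arithmetic to a tiny made-up frame (the values are illustrative, not taken from the dataset). It uses a vectorized `pd.to_timedelta` instead of `apply`, which is an alternative to the cell above, not a description of what it does.

```python
import pandas as pd

# Toy frame: a single day (serial 42005) with its 48 half-hour periods.
toy = pd.DataFrame({
    "Date": [42005] * 48,
    "Period": range(1, 49),
    "demand": [1.0] * 48,
})

# Serial day count -> calendar date, counted from the 1899-12-30 origin.
toy["date"] = pd.Timestamp("1899-12-30") + pd.to_timedelta(toy["Date"], unit="D")

# Period 1 starts at 00:00, period 2 at 00:30, ..., period 48 at 23:30.
toy["date_time"] = toy["date"] + pd.to_timedelta((toy["Period"] - 1) * 30, unit="min")

# Summing the two half-hour readings in each hour gives the hourly demand.
hourly = toy.set_index("date_time")[["demand"]].resample("h").sum()

print(toy["date_time"].iloc[[0, -1]].tolist())
print(hourly.shape)
```

Because every toy reading is 1.0, each hourly value should be 2.0, which is a quick way to check that the resampling sums two half-hour slots per hour.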

In [None]:
df.tail()

In [54]:
# Split into train and test

# We leave 2015 in the test set

end_train = '2014-12-31 23:59:59'
X_train = df.loc[:end_train]
X_test  = df.loc[end_train:]
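
Slicing both sets with the same `end_train` label could look like it duplicates a row. It does not here, because `23:59:59` never appears in an hourly index, so the label falls strictly between two timestamps. A minimal sketch on a toy hourly frame (names and values are made up for illustration):

```python
import pandas as pd

# Toy hourly data spanning the 2014/2015 boundary.
idx = pd.date_range("2014-12-31 20:00", periods=8, freq="h")
toy = pd.DataFrame({"demand": range(8)}, index=idx)

end_train = "2014-12-31 23:59:59"
train = toy.loc[:end_train]   # everything up to 2014-12-31 23:00
test = toy.loc[end_train:]    # everything from 2015-01-01 00:00

print(train.index.max(), test.index.min())
```

If the split label did coincide with a timestamp in the index, that row would land in both sets, since label-based slices in pandas include both endpoints.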

## Plot time series

In [None]:
# plot the time series

fig, ax = plt.subplots(figsize=(7, 3))
X_train.plot(ax=ax, label='train')
X_test.plot(ax=ax, label='test')
ax.set_title('Hourly energy consumption.')
ax.legend(["train", "test"])
plt.show()

There are too many time points to see much detail.

Let's plot fewer of them.

In [None]:
# Let's zoom in to see a bit more detail:

fig, ax = plt.subplots(figsize=(7, 3))
X_train.tail(2000).plot(ax=ax)
X_test.plot(ax=ax)
ax.set_title('Hourly energy consumption.')
ax.legend(["train", "test"])
plt.show()

In [None]:
# And zooming in a bit more:

fig, ax = plt.subplots(figsize=(7, 3))
X_train.tail(500).plot(ax=ax)
X_test.head(500).plot(ax=ax)
ax.set_title('Hourly energy consumption.')
ax.legend(["train", "test"])
plt.show()

## Create and train forecaster

In [58]:
# Lasso regression model

lasso = Lasso(random_state=9)

In [None]:
forecaster = ForecasterRecursive(
    regressor=lasso,            # the machine learning model
    lags=[1, 24, 7*24],         # the lag features to create
    forecaster_id="recursive"
)

# fit the forecaster
forecaster.fit(y=X_train["demand"])

# print
forecaster

The forecaster stores a lot of important information. For example:

- `window size` tells us the number of past data points that we need to create all the features for the forecast.

- Note that the window size coincides with the largest lag (here, 7 * 24 = 168 hours).

It also stores the time span over which the Lasso was trained (`Training range`), which is important if we store the model for future use.

Out of the box, this trained forecaster can forecast starting from the time point right after the end of the training range.

To forecast from any later starting point, we have to pass the forecaster the **historical data** needed to create the lags for that forecast. We will see this in action later in this notebook.
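
To see why the window size equals the largest lag, it helps to build lag features by hand. The sketch below uses `shift` on a toy series with hypothetical lags `[1, 2, 4]`. It illustrates the idea only and is not how skforecast builds its training matrix internally.

```python
import pandas as pd

# Toy hourly series with values 0, 1, ..., 9.
y = pd.Series(
    range(10),
    index=pd.date_range("2015-01-01", periods=10, freq="h"),
    name="demand",
    dtype=float,
)

lags = [1, 2, 4]  # illustrative lags, much shorter than [1, 24, 7*24]

# Each lag column holds the value observed `lag` hours earlier.
features = pd.concat({f"lag_{lag}": y.shift(lag) for lag in lags}, axis=1)

# Rows without a full set of lags cannot be used for training.
train_matrix = features.dropna()
window_size = max(lags)

print(train_matrix.head())
print(f"rows lost: {len(y) - len(train_matrix)}, window size: {window_size}")
```

The first `max(lags)` rows are dropped because the longest lag has no value there. For the same reason, a forecast needs at least that many past observations.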

## Predict the next 24 hours after the training set

In [None]:
# Forecast the next 24 hours (starting on 
# last fit date: 2014-12-31 23:00:00 + 1 hr)

predictions = forecaster.predict(steps=24)

predictions.head()

Note that the first step in the horizon is 1 hr after the last point in the training set.
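
Beyond the first step, some lags point at hours that have not been observed yet, so a recursive forecaster fills them with its own earlier predictions. The loop below is a minimal sketch of that idea with a hypothetical one-step model and helper (`toy_model`, `recursive_forecast`). It is not skforecast's implementation.

```python
import numpy as np


def recursive_forecast(predict_one, last_window, lags, steps):
    """Forecast `steps` values, feeding each prediction back in as history."""
    history = list(last_window)  # ordered from oldest to newest
    predictions = []
    for _ in range(steps):
        # history[-lag] is the value `lag` steps before the one being predicted.
        features = np.array([history[-lag] for lag in lags])
        y_hat = float(predict_one(features))
        predictions.append(y_hat)
        history.append(y_hat)  # the prediction becomes a lag for later steps
    return predictions


def toy_model(x):
    # Hypothetical one-step model: average of lag 1 and lag 3.
    return 0.5 * x[0] + 0.5 * x[1]


toy_predictions = recursive_forecast(
    toy_model, last_window=[1.0, 2.0, 3.0], lags=[1, 3], steps=3
)
print(toy_predictions)
```

From the second step onwards, lag 1 is already a predicted value, which is why errors can accumulate over long horizons.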

In [None]:
# Plot the forecast vs the actual

fig, ax = plt.subplots(figsize=(6, 3))
X_train["demand"].tail(100).plot(ax=ax, label='train')
X_test["demand"].head(24).plot(ax=ax, label='test')
predictions.plot(ax=ax, label='predictions')
plt.title("Lasso forecasting")
plt.ylabel('Energy demand per hour')
ax.legend(bbox_to_anchor=(1.3, 1.0));

Let's now calculate the error over those 24 hours, that is, a summary of the differences between each forecast and the corresponding actual value.

In [None]:
# Prediction error

# Select the actual values by the timestamps of the predictions
error_mse = mean_squared_error(
    y_true=X_test.loc[predictions.index, "demand"],
    y_pred=predictions,
)

print(f"Test error (mse): {error_mse}")

In [None]:
# Prediction error

# Select the actual values by the timestamps of the predictions
error_rmse = root_mean_squared_error(
    y_true=X_test.loc[predictions.index, "demand"],
    y_pred=predictions,
)

print(f"Test error (rmse): {error_rmse}")
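
The two metrics are closely related: the MSE averages the squared errors, and the RMSE is its square root, which brings the error back to the units of the series. A small hand computation with made-up numbers (not values from this dataset):

```python
import math

import numpy as np

# Made-up actuals and forecasts, for illustration only.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred             # [1, 0, -2]
mse_by_hand = np.mean(errors ** 2)   # (1 + 0 + 4) / 3
rmse_by_hand = math.sqrt(mse_by_hand)

print(f"mse: {mse_by_hand:.4f}, rmse: {rmse_by_hand:.4f}")
```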

## Predict any time point in the future

Say we want to predict energy demand further into the future with the model we just trained.

First, we need to gather the past data necessary to create the lags. Then, we pass that data to the forecaster.

In [None]:
# the amount of data in the past that we need 
# to create the features for the Lasso

forecaster.window_size

In [None]:
# Say we want to predict energy demand for the 1st of February 2015

forecast_start = '2015-02-01 00:00:00'

# We need the energy demand for the 168 hours before that point
past_data_available = X_test.loc[:'2015-01-31 23:59:59'].tail(168)

past_data_available.head()

In [None]:
past_data_available.tail()
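
Taking `.tail(168)` works, but the same window can also be selected explicitly from the forecast start with time offsets, which makes the intent easier to read. A sketch on a toy hourly series, where `window_size` stands in for `forecaster.window_size` and all names are illustrative:

```python
import pandas as pd

# Toy hourly series covering January and February 2015.
idx = pd.date_range("2015-01-01", "2015-02-28 23:00", freq="h")
toy = pd.Series(range(len(idx)), index=idx, name="demand", dtype=float)

forecast_start = pd.Timestamp("2015-02-01 00:00:00")
window_size = 168  # assumed to match the forecaster's window size

# The window covers the `window_size` hours that end one hour before the start.
window_start = forecast_start - pd.Timedelta(hours=window_size)
window_end = forecast_start - pd.Timedelta(hours=1)
last_window = toy.loc[window_start:window_end]

print(last_window.index[0], last_window.index[-1], len(last_window))
```

Both endpoints are included in a label-based slice, so the window has exactly `window_size` rows only if the series has no missing hours.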

In [None]:
# Forecast the next 24 hours starting on 1 Feb 2015

predictions = forecaster.predict(
    steps=24,
    last_window=past_data_available["demand"],  # the 168 hours up to 1 Feb 2015
)

predictions.head()

In [None]:
# Plot the forecast vs the actual

fig, ax = plt.subplots(figsize=(6, 3))
X_test.loc[forecast_start:, "demand"].head(24).plot(ax=ax, label='test')
predictions.plot(ax=ax, label='predictions')
plt.title("Lasso forecasting")
plt.ylabel('Energy demand per hour')
ax.legend(bbox_to_anchor=(1.3, 1.0));

In [None]:
# Prediction error

error_mse = mean_squared_error(
    y_true=X_test.loc['2015-01-31 23:59:59':, "demand"].head(24),
    y_pred=predictions,
)

print(f"Test error (mse): {error_mse}")

In [None]:
# Prediction error

error_rmse = root_mean_squared_error(
    y_true=X_test.loc['2015-01-31 23:59:59':, "demand"].head(24),
    y_pred=predictions,
)

print(f"Test error (rmse): {error_rmse}")

That's it! We have now trained a Lasso regression that we can use to forecast the next 24 hours from any point in time, provided we have the energy demand for the 168 hours before that point.