# Exercise 2: Multiple dependent time series

[Forecasting with Machine Learning](https://www.trainindata.com/p/forecasting-with-machine-learning)

This notebook contains an exercise on forecasting multiple dependent time series. The solutions shown are only one way of answering the questions.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Data preparation

The dataset contains quarterly overnight trips (in thousands) across Australia from 1998 Q1 to 2016 Q4. The number of trips is split by `State`, `Region`, and `Purpose`.

**In this exercise we are going to forecast the total number of trips for each State (there are 8 states, so we will have 8 time series). We shall treat this as a multivariate forecasting problem.**

Source: A new tidy data structure to support exploration and modeling of temporal data, Journal of Computational and Graphical Statistics, 29:3, 466-478, doi:10.1080/10618600.2019.1695624.

Shape of the dataset: (24320, 5)

In [None]:
from skforecast.datasets import fetch_dataset

# Load the data
data = fetch_dataset(name="australia_tourism", raw=True)
data.head()

Pre-process the data by performing the following:
1) Convert the `date_time` column to datetime type.
2) Create a dataframe with one column per `State`, giving the total number of `Trips` for each date.
3) Ensure the index is `date_time` and has quarterly start (`QS`) frequency.
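
The three steps above can be sketched on a small toy frame. This is only one possible approach: the column names (`date_time`, `State`, `Region`, `Purpose`, `Trips`) are assumed to match the real dataset, and the toy values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the raw data: invented values, assumed column names.
quarters = ["1998-01-01", "1998-04-01", "1998-07-01", "1998-10-01"]
regions = [("Victoria", "Melbourne"), ("Victoria", "Geelong"), ("Tasmania", "Hobart")]
toy = pd.DataFrame(
    [
        {"date_time": q, "State": s, "Region": r, "Purpose": p, "Trips": 10.0}
        for q in quarters
        for s, r in regions
        for p in ("Business", "Holiday")
    ]
)

# 1) Convert the date column to datetime.
toy["date_time"] = pd.to_datetime(toy["date_time"])

# 2) One column per State: sum Trips over Region and Purpose for each date.
wide = toy.pivot_table(index="date_time", columns="State", values="Trips", aggfunc="sum")
wide.columns.name = None

# 3) Set a quarterly start frequency; any missing quarter would show up as NaN.
wide = wide.asfreq("QS")
print(wide)
```

Using `asfreq` rather than `resample(...).sum()` is a deliberate choice in this sketch: a quarter that is absent from the data becomes `NaN` instead of `0`, so it is caught by the missing value check.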


Check for missing values.
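
A minimal sketch of the check, using an invented two-column frame with one deliberately missing value:

```python
import numpy as np
import pandas as pd

# Invented example: one gap in the "Victoria" column.
index = pd.date_range("1998-01-01", periods=4, freq="QS")
demo = pd.DataFrame(
    {"Victoria": [1.0, 2.0, np.nan, 4.0], "Tasmania": [5.0, 6.0, 7.0, 8.0]},
    index=index,
)

# Count missing values per column, then reduce to a single yes/no answer.
missing_per_column = demo.isna().sum()
has_missing = bool(demo.isna().any().any())
print(missing_per_column)
print("Any missing values:", has_missing)
```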

Assign the name of each state to a variable `states`. We will use this later.

# Exploratory data analysis

Print the number of data points in the time series, the start time, and the end time of the time series.
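
One way to do this, shown on a placeholder frame whose index spans the same period as the dataset (the column values are invented):

```python
import numpy as np
import pandas as pd

# Placeholder series covering 1998 Q1 to 2016 Q4 at quarterly start frequency.
index = pd.date_range("1998-01-01", "2016-10-01", freq="QS")
df = pd.DataFrame({"Victoria": np.arange(len(index), dtype=float)}, index=index)

n_points = len(df)
start, end = df.index.min(), df.index.max()
print(f"Number of data points: {n_points}")
print(f"Start: {start.date()}  End: {end.date()}")
```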

Plot the time series summed over all states.

Plot all of the time series.

It appears that there is yearly seasonality for these series and they appear to be anti-correlated (i.e., some areas experience peaks whilst others experience troughs).

Create a quarter of the year feature which could help with the yearly seasonality.
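
With a `DatetimeIndex`, the quarter is available directly as an attribute. A short sketch on invented data, where the column name `quarter` is an arbitrary choice:

```python
import pandas as pd

# Invented values; only the index matters for the feature.
index = pd.date_range("2015-01-01", periods=6, freq="QS")
toy = pd.DataFrame({"Victoria": [5.0, 4.0, 3.0, 6.0, 5.5, 4.2]}, index=index)

# Quarter of the year as an integer from 1 to 4.
toy["quarter"] = toy.index.quarter
print(toy)
```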

# Forecasting

Import the class needed for direct forecasting of multiple dependent time series from `skforecast`.

Import a transformer from sklearn to scale the data.

Import a model of your choice.

Assign the names of the states to a `target_cols` variable and any exogenous features to an `exog_cols` variable.

Specify a forecast horizon and assign it to a variable `steps`. Try forecasting 8 quarters into the future.

Create a dataframe for the future values of any exogenous features.

Hint: `pd.DateOffset` and using `freq="QS"` in `pd.date_range` might be helpful.
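
A sketch of how the hint could be applied, assuming the only exogenous feature is a `quarter` column and that the history ends in 2016 Q4. The names `history` and `future_exog` are placeholders.

```python
import pandas as pd

# Placeholder history with a quarter feature, ending at 2016-10-01 (2016 Q4).
index = pd.date_range("2015-01-01", "2016-10-01", freq="QS")
history = pd.DataFrame({"quarter": index.quarter}, index=index)

steps = 8  # forecast horizon in quarters

# The first future period is one quarter (three months) after the last observation.
future_start = history.index.max() + pd.DateOffset(months=3)
future_index = pd.date_range(start=future_start, periods=steps, freq="QS")

# Calendar features are known in advance, so they can be computed for the future index.
future_exog = pd.DataFrame({"quarter": future_index.quarter}, index=future_index)
print(future_exog)
```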

Forecast each state using a for loop: `ForecasterDirectMultiVariate` predicts a single target series (its `level`) at a time, so define one forecaster per state. Experiment with the number of lags to use as features.
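
To make the idea behind a direct multivariate forecaster concrete, here is a from-scratch sketch in NumPy. It is not skforecast's implementation and does not use its API: the helper `direct_multivariate_forecast` is hypothetical, ordinary least squares stands in for the model of your choice, and scaling and exogenous features are left out. For each target state it fits one model per horizon step, using the lags of all series as predictors.

```python
import numpy as np
import pandas as pd


def direct_multivariate_forecast(series, level, lags, steps):
    """Hypothetical helper: forecast `level` with one least-squares model
    per horizon step, using the last `lags` values of every column."""
    values = series.to_numpy(dtype=float)
    target = series[level].to_numpy(dtype=float)
    n_obs = len(values)

    # Row i holds observations i .. i+lags-1 of all series, plus an intercept.
    windows = np.stack([values[i:i + lags].ravel() for i in range(n_obs - lags + 1)])
    windows = np.column_stack([np.ones(len(windows)), windows])

    predictions = []
    for h in range(1, steps + 1):
        # A window ending at time t is paired with the target at time t + h.
        X_train = windows[:-h]
        y_train = target[lags - 1 + h:]
        coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        # The most recent window produces the h-step-ahead forecast.
        predictions.append(float(windows[-1] @ coef))

    # Requires the index to carry a frequency (here quarterly start).
    future_index = pd.date_range(series.index[-1], periods=steps + 1, freq=series.index.freq)[1:]
    return pd.Series(predictions, index=future_index, name=level)


# Invented, perfectly seasonal and anti-correlated toy series.
index = pd.date_range("2007-01-01", periods=40, freq="QS")
toy = pd.DataFrame(
    {
        "Victoria": np.tile([10.0, 20.0, 30.0, 40.0], 10),
        "Tasmania": np.tile([40.0, 30.0, 20.0, 10.0], 10),
    },
    index=index,
)

# One forecaster per target state, each seeing the lags of every state.
forecasts = pd.DataFrame(
    {state: direct_multivariate_forecast(toy, state, lags=4, steps=8) for state in toy.columns}
)
print(forecasts.round(2))
```

The loop over states mirrors the exercise: each target gets its own set of models, but every model sees the recent history of all series, which is what lets it exploit the dependence between states.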

Plot the forecasts.