# COMP0188 RNN tutorial

This tutorial discusses how to develop RNNs in PyTorch, in particular RNNs for time series prediction. The following topics are covered:
* Tensor structure for sequences
* Vanilla RNN model in PyTorch
* Stacked RNNs
* Predicting n-day values

__NOTE__: The code in this tutorial is sometimes unnecessarily verbose in order to expose students to coding practices that are often helpful when developing machine learning pipelines. Where this is the case, the code is marked with "# verbose code".

To connect the environment to a GPU:
* Select 'Runtime' in the top left
* Select 'Change Runtime Type'
* Select the GPU runtime available

In [1]:
!pip install wandb

In [1]:
# Used for debugging the notebook locally. Leave as False when running in Colab!
local_testing = True
if local_testing:
    data_dir = "../../data"
else:
    try:
        from google.colab import drive
        drive.mount('/content/drive')
        data_dir = "/content/drive/MyDrive/comp0188/data"
    except ModuleNotFoundError:
        print("This notebook might be running locally!")


In [2]:
gpu = False

In [3]:
import wandb
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import os
from typing import Union, Callable, Tuple, List, Literal, Dict
from torch.autograd import Variable
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tqdm import tqdm
import random
import logging
from abc import ABCMeta, abstractmethod

In [4]:
logger = logging.getLogger("rnn_tutorial")
logger.setLevel(logging.INFO)

In [5]:
WANDB_PROJ = "rnn_tutorial"
TMP_DIR = "./tmp"

# Create the temporary directory if it does not already exist
os.makedirs(TMP_DIR, exist_ok=True)

## Predicting next day values
* For the first exercise, daily climate data will be used to develop a model that can predict the next day "meantemp".
* First of all, training and test datasets need to be manipulated such that they are in the following form:
| date | meantemp at time t | humidity at time t | wind_speed at time t | meanpressure at time t | meantemp at time t+1 |
| ---- | ---- | ---- | ---- | ---- | ---- |
| .... | .... | .... | .... | .... | .... | 


##### Train/test split
* The training dataset also needs to be split into a training set and a holdout set. When observations are not independent and identically distributed (non-iid), the data must be split in a way that prevents "data leakage". Time series data is likely non-iid in the sense that future observations most likely depend on previous ones. For example, it is reasonable to assume that the meantemp at time t+1 is dependent on the meantemp at time t.
* The dataset therefore needs to be split such that the ML model is not exposed to correlations which would not be available at test time.
* Given this, the final year of the training data is used as the holdout set
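
As a minimal sketch of the chronological split described above, the example below uses a small synthetic frame (the values and the names `demo_df`, `demo_train` and `demo_holdout` are illustrative assumptions, not part of the tutorial data). It shows that splitting on the year keeps every training date strictly before every holdout date.

```python
import pandas as pd

# Synthetic daily series standing in for the climate data (illustrative values only)
dates = pd.date_range("2015-12-30", "2016-01-03", freq="D")
demo_df = pd.DataFrame({"date": dates, "meantemp": [15.5, 15.0, 14.7, 14.0, 14.4]})

# Chronological split: everything from 2016 onwards is held out
holdout_mask = demo_df["date"].dt.year >= 2016
demo_train = demo_df[~holdout_mask]
demo_holdout = demo_df[holdout_mask]

# Every training date precedes every holdout date, so the model never
# trains on observations that come after the ones it is evaluated on
print(demo_train["date"].max(), "<", demo_holdout["date"].min())
```

A random row-wise split would instead interleave training and holdout days, which is the leakage the bullets above warn about.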

In [6]:
train_df = pd.read_csv(os.path.join(data_dir, "DailyDelhiClimateTrain.csv"))
test_df = pd.read_csv(os.path.join(data_dir, "DailyDelhiClimateTest.csv"))
date_var = "date"
train_df[date_var] = pd.to_datetime(train_df[date_var], format="%Y-%m-%d")
train_df.head()

Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure
0,2013-01-01,10.0,84.5,0.0,1015.666667
1,2013-01-02,7.4,92.0,2.98,1017.8
2,2013-01-03,7.166667,87.0,4.633333,1018.666667
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5


The double for loop in the code block below (copied directly from the next code cell) defines the train/test data structure mentioned above:
| date | meantemp at time t | humidity at time t | wind_speed at time t | meanpressure at time t | meantemp at time t+1 |
| ---- | ---- | ---- | ---- | ---- | ---- |
| .... | .... | .... | .... | .... | .... | 

```python
steps = [1,5,10]
for df in [train_df, test_df]:
    for stp in steps:
        df[[f"{col}_{stp}_step" for col in non_date_vars]] = df[non_date_vars].shift(-1*stp)
```

However, it is generalised in the following ways:
* To compute not just the meantemp at t+1 but all features at time t+1;
* To compute t+step target values, not just t+1

This generalisation is useful for performing __Exercise 6b__ and __Exercise 8__.
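
To make the effect of `shift` concrete, here is a small sketch on a toy frame (the frame `toy` and its values are assumptions made for illustration). A negative shift pulls future values up to the current row, so the last `stp` rows have no future value and are filled with NaN.

```python
import pandas as pd

# Toy series: one feature, four consecutive days
toy = pd.DataFrame({"meantemp": [10.0, 7.4, 7.2, 8.7]})

# For each step size, align the value at time t+stp with the row at time t
for stp in [1, 2]:
    toy[f"meantemp_{stp}_step"] = toy["meantemp"].shift(-1 * stp)

print(toy)
# Count the rows that have no target for each step size
print(toy.isna().sum())
```

The trailing NaN rows are worth keeping in mind when the shifted columns are later used as targets.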



In [7]:
# Dynamically select remaining columns
non_date_vars = [col for col in train_df.columns if col != date_var] # verbose code
# For the training and test set, create a new column per non-date variable for each step number in "steps"
steps = [1,5,10]
for df in [train_df, test_df]:
    for stp in steps:
        df[[f"{col}_{stp}_step" for col in non_date_vars]] = df[non_date_vars].shift(-1*stp)
train_df["__date_yrs"] = train_df["date"].dt.year
# Visualise time periods covered by the training data
print(train_df["__date_yrs"].value_counts())

# Select the time period for the holdout set
val_idx = train_df["__date_yrs"] >= 2016 # '__' at the start of a variable name does not mean anything special.
# I use it to indicate "intermediate" variables that can be easily dropped using the code below
val_df = train_df[val_idx].drop(columns=[col for col in train_df.columns if col[0:2] == "__"])
train_df = train_df[~val_idx].drop(columns=[col for col in train_df.columns if col[0:2] == "__"])
print(train_df.shape)
print(val_df.shape)
print(test_df.shape)
train_df.head()

__date_yrs
2016    366
2013    365
2014    365
2015    365
2017      1
Name: count, dtype: int64
(1095, 17)
(367, 17)
(114, 17)


Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure,meantemp_1_step,humidity_1_step,wind_speed_1_step,meanpressure_1_step,meantemp_5_step,humidity_5_step,wind_speed_5_step,meanpressure_5_step,meantemp_10_step,humidity_10_step,wind_speed_10_step,meanpressure_10_step
0,2013-01-01,10.0,84.5,0.0,1015.666667,7.4,92.0,2.98,1017.8,7.0,82.8,1.48,1018.0,15.714286,51.285714,10.571429,1016.142857
1,2013-01-02,7.4,92.0,2.98,1017.8,7.166667,87.0,4.633333,1018.666667,7.0,78.6,6.3,1020.0,14.0,74.0,13.228571,1015.571429
2,2013-01-03,7.166667,87.0,4.633333,1018.666667,8.666667,71.333333,1.233333,1017.166667,8.857143,63.714286,7.142857,1018.714286,15.833333,75.166667,4.633333,1013.333333
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667,6.0,86.833333,3.7,1016.5,14.0,51.25,12.5,1017.0,12.833333,88.166667,0.616667,1015.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5,7.0,82.8,1.48,1018.0,11.0,62.0,7.4,1015.666667,14.714286,71.857143,0.528571,1015.857143


Load the training loop functions. These are broadly the same as those used in previous tutorials, but with additional functionality that is discussed later.

In [8]:
# verbose code
# Here a series of Debug classes are defined. The reason for using this structure will be discussed in class
class DebugPass:

    def __init__(self):
        self.y_pred = []
        self.y_true = []

    def debug(
        self, 
        y_true:torch.Tensor, 
        y_pred:torch.Tensor, 
    ):
        pass
        
    def close(self):
        pass

class DebugBase(DebugPass):

    def __init__(self):
        super().__init__()

    def debug(self, y_true, y_pred):
        # .cpu() is required before .numpy() when the tensors live on the GPU
        self.y_true.append(y_true.detach().cpu().numpy())
        self.y_pred.append(y_pred.detach().cpu().numpy())

    @abstractmethod
    def close(self):
        pass

class DebugLocal(DebugBase):

    def __init__(self):
        super().__init__()

    def close(self):
        res_tbl = pd.DataFrame(
            {
                "y_true":np.concatenate(self.y_true, axis=0).squeeze().flatten(), 
                "y_pred":np.concatenate(self.y_pred, axis=0).squeeze().flatten()
            }
        )
        res_tbl.to_csv(os.path.join(TMP_DIR, "validate_debug.csv"), index=False)

class DebugWandB(DebugBase):

    def __init__(self):
        super().__init__()

    def close(self):
        res_tbl = pd.DataFrame(
            {
                "y_true":np.concatenate(self.y_true, axis=0).squeeze().flatten(), 
                "y_pred":np.concatenate(self.y_pred, axis=0).squeeze().flatten()
            }
        )
        wandb_tbl = wandb.Table(dataframe=res_tbl)
        wandb.log({"val_predictions" : wandb_tbl})

def train_single_epoch(model:nn.Module, data_loader:torch.utils.data.DataLoader, 
                       gpu:Literal[True, False], optimizer:torch.optim,
                       criterion:torch.nn.modules.loss
                      ) -> Tuple[List[torch.Tensor]]:
    model.train()
    losses = []
    preds = []
    range_gen = tqdm(
        enumerate(data_loader),
        )
    for i, (y,X) in range_gen:
        
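        # Move the batch to the GPU if requested. Wrapping tensors in
        # torch.autograd.Variable is no longer needed (deprecated since PyTorch 0.4).
        if gpu:
            X = X.cuda()
            y = y.cuda()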
        
        optimizer.zero_grad()

        # Compute output
        output = model(X)
        # Detach so the stored predictions do not keep the autograd graph alive
        preds.append(output.detach())
        train_loss = criterion(output, y)
        losses.append(train_loss.item())

        # losses.update(train_loss.data[0], g.size(0))
        # error_ratio.update(evaluation(output, target).data[0], g.size(0))

        try: 
            # compute gradient and do SGD step
            train_loss.backward()
            
            optimizer.step()
        except RuntimeError as e:
            print("Runtime error on training instance: {}".format(i))
            raise e
    return losses, preds

def validate(model:nn.Module, data_loader:torch.utils.data.DataLoader,
             gpu:Literal[True, False], criterion:torch.nn.modules.loss,
             dh:DebugPass
            ) -> Tuple[List[torch.Tensor]]:
    
    model.eval()
    losses = []
    preds = []
    with torch.no_grad():
        range_gen = tqdm(
            enumerate(data_loader),
        )
        # Your code here
        for i, (y,X) in range_gen:
        
            # Move the batch to the GPU if requested (no Variable wrapper needed)
            if gpu:
                X = X.cuda()
                y = y.cuda()

            # Compute output
            output = model(X)

            # Logs
            losses.append(criterion(output, y).item())
            preds.append(output)
            dh.debug(y_true=y, y_pred=output)
    return losses, preds


def train(model:torch.nn, train_data_loader:torch.utils.data.DataLoader,
          val_data_loader:torch.utils.data.DataLoader, 
          gpu:Literal[True, False], optimizer:torch.optim,
          criterion:torch.nn.modules.loss, epochs:int, 
          debug:bool = False, wandb_proj:str="", 
          wandb_config:Dict={}
         ) -> Tuple[List[torch.Tensor]]:

    if (len(wandb_config) == 0) or (len(wandb_proj) == 0):
        use_wandb = False
        logger.warning("WandB not in use!")
        chkpnt_dir = TMP_DIR
    else:
        use_wandb = True
        wandb.init(project=wandb_proj, config=wandb_config)
        chkpnt_dir = wandb.run.dir

    if debug:
        if use_wandb:
            dh = DebugWandB()
        else:
            dh = DebugLocal()
    else:
        dh = DebugPass()
    
    if gpu:
        model.cuda()
    
    epoch_train_loss = []
    epoch_val_loss = []
    for epoch in range(1, epochs+1):
        print("Running training epoch")
        train_loss_val, train_preds =  train_single_epoch(
            model=model, data_loader=train_data_loader, gpu=gpu, 
            optimizer=optimizer, criterion=criterion)
        mean_train_loss = np.mean(train_loss_val)
        epoch_train_loss.append(mean_train_loss)
        # Announce validation before it runs rather than after it has finished
        print("Running validation")
        val_loss_val, val_preds = validate(
            model=model, data_loader=val_data_loader, gpu=gpu, 
            criterion=criterion, dh=dh)
        
        mean_val_loss = np.mean(val_loss_val)
        epoch_val_loss.append(mean_val_loss)

        chkp_pth = os.path.join(chkpnt_dir, f"mdl_chkpnt_epoch_{epoch}.pt")
        torch.save(
            {
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }, chkp_pth)
        if use_wandb:
            wandb.log({"train_loss": mean_train_loss, "val_loss": mean_val_loss})
            wandb.save(chkp_pth)
    dh.close()
    if use_wandb: 
        wandb.finish()
    return epoch_train_loss, epoch_val_loss
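
The training loop above writes a checkpoint dictionary after every epoch. As a hedged sketch of how such a checkpoint could be read back, the example below saves and restores a stand-in model (`nn.Linear` is used purely for illustration, and the temporary directory and the `restored` name are assumptions; the dictionary keys mirror those used in `train`).

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # stand-in for any nn.Module
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

with tempfile.TemporaryDirectory() as tmp_dir:
    chkp_pth = os.path.join(tmp_dir, "mdl_chkpnt_epoch_1.pt")
    # Same dictionary layout as the checkpoint written in the training loop
    torch.save(
        {
            "epoch": 1,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        chkp_pth,
    )
    checkpoint = torch.load(chkp_pth)

# Restore the weights into a freshly constructed model of the same architecture
restored = nn.Linear(4, 1)
restored.load_state_dict(checkpoint["model_state_dict"])
print(checkpoint["epoch"])
```

Restoring the optimizer state with `optimizer.load_state_dict(checkpoint["optimizer_state_dict"])` follows the same pattern if training is to be resumed.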

## Predict 1 step mean temperature
* For the first exercise, the aim is to predict the next-day mean temperature using a historical time series of temperature, humidity, wind_speed and meanpressure values
* A core hyperparameter is the sequence length, i.e., the size of the historical time series to use for prediction.
* As it stands, the data is of the form (time t obs, time t+1 target). Therefore, if this data was converted to a tensor as is, batched and used for training, only the time t values would be used for prediction. Using the "PandasDataset" class from the first tutorial demonstrates this

In [9]:
class PandasDataset(Dataset):
    def __init__(self, X:pd.DataFrame, y:pd.Series, normalise:bool=True)->None:
        # Your code here
        self._X = torch.from_numpy(X.values).float()
        if normalise:
            self._X = self.__min_max_norm(self._X)
        self.feature_dim = X.shape[1]
        self._len = X.shape[0]
        self._y = torch.from_numpy(y.values)[:,None].float()
    
    def __len__(self)->int:
        # Your code here
        return self._len
    
    def __getitem__(self, idx:int) -> Tuple[torch.Tensor, torch.Tensor]:
        # Your code here
        return self._y[idx], self._X[idx,:]
        
    def __min_max_norm(self, in_tens:torch.Tensor) -> torch.Tensor:
        # X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
        # Your code here
        _min = in_tens.min(axis=0).values
        _max = in_tens.max(axis=0).values
        in_tens = (in_tens - _min)/(_max - _min)
        return in_tens
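
The min-max formula in `__min_max_norm` is applied column-wise. The sketch below works through it on two tiny tensors (`train_X` and `val_X` are made-up values). As a variation that is not what `PandasDataset` does, it reuses the training minimum and maximum for the second tensor, which is one common way to keep the scaling consistent across splits; under that assumption, scaled values outside [0, 1] are possible.

```python
import torch

train_X = torch.tensor([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
val_X = torch.tensor([[2.5, 40.0]])

# Column-wise statistics computed on the training tensor only
_min = train_X.min(dim=0).values
_max = train_X.max(dim=0).values

train_scaled = (train_X - _min) / (_max - _min)
# Assumption: reuse the training statistics rather than recomputing them
val_scaled = (val_X - _min) / (_max - _min)

print(train_scaled)
print(val_scaled)  # tensor([[0.2500, 1.5000]])
```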
        
        

In [10]:
trgt_col = "meantemp_1_step" # This is the t+1 target associated with the time t observations
# Dynamically select the remaining feature columns i.e., those that are: 
# 1) Not target variables (do not end with _step) and;
# 2) Are not the date variable
indp_cols = [ # verbose code
    col for col in train_df.columns if (
        (col != date_var) and (col[-5:] != "_step")
    )
]
print(trgt_col)
print(indp_cols)

meantemp_1_step
['meantemp', 'humidity', 'wind_speed', 'meanpressure']


Using the PandasDataset class, data is batched according to the time dimension. This dataset object needs to be extended such that sequences of length longer than 1 can be created.

In [11]:
# Normalise has been set to False for demo purposes
tmp_dataset = PandasDataset(X=train_df[indp_cols], y=train_df[trgt_col], normalise=False)
tmp_loader = DataLoader(tmp_dataset, shuffle=False, batch_size=2)
display(train_df[indp_cols+[trgt_col]].head(4))
loader_iter = tmp_loader.__iter__()
first_batch = next(loader_iter)
print(f"The first batch contains the first two rows of the dataset:\n {first_batch[1]}")
second_batch = next(loader_iter)
print(f"The second batch contains the second two rows of the dataset:\n {second_batch[1]}")

Unnamed: 0,meantemp,humidity,wind_speed,meanpressure,meantemp_1_step
0,10.0,84.5,0.0,1015.666667,7.4
1,7.4,92.0,2.98,1017.8,7.166667
2,7.166667,87.0,4.633333,1018.666667,8.666667
3,8.666667,71.333333,1.233333,1017.166667,6.0


The first batch contains the first two rows of the dataset:
 tensor([[  10.0000,   84.5000,    0.0000, 1015.6667],
        [   7.4000,   92.0000,    2.9800, 1017.8000]])
The second batch contains the second two rows of the dataset:
 tensor([[   7.1667,   87.0000,    4.6333, 1018.6667],
        [   8.6667,   71.3333,    1.2333, 1017.1667]])


__Exercise 1a__: 
* The __get_lookback function is designed to augment the _X and _y tensors with sequences of length "lookback".
* Where lookback is defined as 2, feature values at time points 't' and 't-1' are required to predict values at timepoint 't+1'.
* The code contains a bug where the dimensions of the _X and _y are incorrect - fix this

__Exercise 1b__: 
* Consider why the target tensor is indexed as follows ```i+lookback-1:i+lookback```

In [12]:
# Subclassing the original PandasDataset so we can inherit all of the original functionality
class PandasTsDataset(PandasDataset):
    def __init__(self, X:pd.DataFrame, y:pd.Series, lookback:int, normalise:bool=True)->None:
        # Call super so that the PandasDataset.__init__ function is called
        super().__init__(X=X, y=y, normalise=normalise)
        # By this step, the self._X and self._y attributes etc will have been created.
        if lookback > 1:
            self.__get_lookback(lookback=lookback)
        # Although the 'self._len' attribute is already set in the PandasDataset superclass,
        # it is overwritten here. Whilst we have introduced redundant computation (by setting the _len twice)
        # we have traded this off for code readability and usability!
        self._len = self._X.shape[0]
    
    def __get_lookback(self, lookback:int):
        X_vals = []
        y_vals = []
        for i in range(self._X.shape[0]-(lookback-1)):
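            # [None, :] adds a leading axis so each window has shape (1, lookback, n_features).
            # Removing it reintroduces the dimension bug from Exercise 1a.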
            # Your code here
            X_vals.append(self._X[i:i+lookback][None, :])
            y_vals.append(self._y[i+lookback-1:i+lookback])
            # y_vals.append(self._y[i:i+lookback][None, :])
            # Your code here - END
        self._y = torch.concat(y_vals, axis=0)
        self._X = torch.concat(X_vals, axis=0)
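
As an alternative sketch of the same windowing, `Tensor.unfold` can build all lookback windows without an explicit loop. This is not how `PandasTsDataset` is implemented; the tensors `X` and `y` below are toy assumptions. The sketch also checks the result against a loop equivalent to the one above.

```python
import torch

lookback = 2
X = torch.arange(12, dtype=torch.float32).reshape(4, 3)  # (time, features)
y = torch.arange(4, dtype=torch.float32)[:, None]        # (time, 1)

# unfold appends the window axis last: (n_windows, features, lookback),
# so permute to the (n_windows, lookback, features) layout used above
windows = X.unfold(0, lookback, 1).permute(0, 2, 1)
# Each target is taken from the last row of its window
targets = y[lookback - 1:]

# Reference: explicit loop over window start positions
loop_windows = torch.stack(
    [X[i:i + lookback] for i in range(X.shape[0] - lookback + 1)]
)
print(windows.shape, targets.shape)
print(torch.equal(windows, loop_windows))
```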


In [13]:
tmp_dataset = PandasTsDataset(X=train_df[indp_cols], y=train_df[trgt_col], lookback=2, normalise=False)
display(train_df.head())
print(f"First row of the data contains the first and second row of the original dataset:\n {tmp_dataset[0][1]}")
print(f"With the target defined as:\n {tmp_dataset[0][0]}")
print("\n")
print(f"Second row of the data contains the second and third row of the original dataset:\n {tmp_dataset[1][1]}")
print(f"With the target defined as:\n {tmp_dataset[1][0]}")
print("\n")
print(f"Final row of the data contains the penultimate and final row of the original dataset:\n {tmp_dataset[-1][1]}")
print(f"With the target defined as:\n {tmp_dataset[-1][0]}")
display(train_df.tail())
print("\n")
print("\n")
tmp_loader = DataLoader(tmp_dataset, shuffle=False, batch_size=2)
first_batch = next(tmp_loader.__iter__())
print(f"The first batch contains input:\n {first_batch[1]}")
print(f"With target values:\n {first_batch[0]}")

Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure,meantemp_1_step,humidity_1_step,wind_speed_1_step,meanpressure_1_step,meantemp_5_step,humidity_5_step,wind_speed_5_step,meanpressure_5_step,meantemp_10_step,humidity_10_step,wind_speed_10_step,meanpressure_10_step
0,2013-01-01,10.0,84.5,0.0,1015.666667,7.4,92.0,2.98,1017.8,7.0,82.8,1.48,1018.0,15.714286,51.285714,10.571429,1016.142857
1,2013-01-02,7.4,92.0,2.98,1017.8,7.166667,87.0,4.633333,1018.666667,7.0,78.6,6.3,1020.0,14.0,74.0,13.228571,1015.571429
2,2013-01-03,7.166667,87.0,4.633333,1018.666667,8.666667,71.333333,1.233333,1017.166667,8.857143,63.714286,7.142857,1018.714286,15.833333,75.166667,4.633333,1013.333333
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667,6.0,86.833333,3.7,1016.5,14.0,51.25,12.5,1017.0,12.833333,88.166667,0.616667,1015.166667
4,2013-01-05,6.0,86.833333,3.7,1016.5,7.0,82.8,1.48,1018.0,11.0,62.0,7.4,1015.666667,14.714286,71.857143,0.528571,1015.857143


First row of the data contains the first and second row of the original dataset:
 tensor([[  10.0000,   84.5000,    0.0000, 1015.6667],
        [   7.4000,   92.0000,    2.9800, 1017.8000]])
With the target defined as:
 tensor([7.1667])


Second row of the data contains the second and third row of the original dataset:
 tensor([[   7.4000,   92.0000,    2.9800, 1017.8000],
        [   7.1667,   87.0000,    4.6333, 1018.6667]])
With the target defined as:
 tensor([8.6667])


Final row of the data contains the penultimate and final row of the original dataset:
 tensor([[  15.5000,   71.7500,    2.1000, 1017.5000],
        [  15.0000,   71.3750,    2.0875, 1020.5000]])
With the target defined as:
 tensor([14.7143])


Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure,meantemp_1_step,humidity_1_step,wind_speed_1_step,meanpressure_1_step,meantemp_5_step,humidity_5_step,wind_speed_5_step,meanpressure_5_step,meantemp_10_step,humidity_10_step,wind_speed_10_step,meanpressure_10_step
1090,2015-12-27,15.375,63.25,7.8875,1020.625,17.125,58.125,10.8875,1020.875,14.714286,72.285714,1.057143,1021.142857,17.375,81.625,2.3125,1016.5
1091,2015-12-28,17.125,58.125,10.8875,1020.875,16.375,65.0,7.4125,1018.125,14.0,75.875,2.0875,1021.0,17.125,87.0,0.0,1018.125
1092,2015-12-29,16.375,65.0,7.4125,1018.125,15.5,71.75,2.1,1017.5,14.375,74.75,5.1125,1018.5,15.5,83.25,7.8875,1017.25
1093,2015-12-30,15.5,71.75,2.1,1017.5,15.0,71.375,2.0875,1020.5,15.75,77.125,0.0,1017.625,15.857143,65.142857,8.471429,1015.428571
1094,2015-12-31,15.0,71.375,2.0875,1020.5,14.714286,72.285714,1.057143,1021.142857,15.833333,88.833333,0.616667,1017.0,15.625,74.375,2.775,1017.5






The first batch contains input:
 tensor([[[  10.0000,   84.5000,    0.0000, 1015.6667],
         [   7.4000,   92.0000,    2.9800, 1017.8000]],

        [[   7.4000,   92.0000,    2.9800, 1017.8000],
         [   7.1667,   87.0000,    4.6333, 1018.6667]]])
With target values:
 tensor([[7.1667],
        [8.6667]])


The RNN model is now ready to be defined. To begin with, we'll try just using the nn.RNN module provided by PyTorch. Consider the picture of an RNN below (credit: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks):

![alternative text](./figures/generic_rnn_term_pred.png)

The blue blocks represent a single RNN computation. Each computation takes a set of hidden values, $a^{<t-1>}$, and an input, $x^{<t>}$, and produces a hidden state of its own, $a^{<t>}$. The final computation in the sequence produces an output, $y$.

* Mapping these to the input parameters of nn.RNN:
    * input_size: Represents the dimension of $x^{<t>}$, i.e., the feature dimension for each observation in the input sequence
    * hidden_size: Represents the dimension of the hidden layer within the RNN
    * num_layers: Represents the number of "stacked" RNNs. Note this __does not__ represent the number of RNN computations. Ignore this parameter for now; it is discussed later
    * nonlinearity: Represents the non-linear function which produces the set of hidden values $a^{<t>}$
    * batch_first: If set to True, tells the RNN to expect tensors of dimension (batch_size, sequence_size, feature_size); otherwise it expects (sequence_size, batch_size, feature_size)
    * bidirectional: If set to True, a 'bidirectional' RNN is defined. This is out of scope for the tutorial
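
To see how these parameters show up in tensor shapes, the following sketch passes a zero tensor through an `nn.RNN` (the sizes are arbitrary assumptions chosen for illustration). The module returns two tensors: the hidden states for every time step and the final hidden state of each layer. Note that the second tensor keeps the layer axis first even when `batch_first=True`.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, num_layers=2, batch_first=True)
x = torch.zeros(3, 5, 4)  # (batch_size=3, sequence_size=5, feature_size=4)

with torch.no_grad():
    all_hidden, final_hidden = rnn(x)

print(all_hidden.shape)    # torch.Size([3, 5, 8])  (batch, sequence, hidden)
print(final_hidden.shape)  # torch.Size([2, 3, 8])  (num_layers, batch, hidden)
```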

The computation described by a single RNN unit is defined by:
\begin{equation}
    a^{<t>} = \textrm{nonlinearity}(x^{<t>}W_{i,h} + b_{i,h} + a^{<t-1>}W_{h,h} + b_{h,h})
\end{equation}
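
As a hedged sketch, the recurrence above can be checked numerically against `nn.RNN` by unrolling it by hand. The sizes and the random input are assumptions. PyTorch stores the weights as `weight_ih_l0` with shape (hidden, input) and `weight_hh_l0` with shape (hidden, hidden), so they are transposed here to match the row-vector form $x^{<t>}W_{i,h}$ used in the equation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, nonlinearity="relu", batch_first=True)
x = torch.rand(2, 5, 4)  # (batch, sequence, features)

with torch.no_grad():
    all_hidden, final_hidden = rnn(x)

    # Unroll the recurrence manually, starting from a zero hidden state
    a = torch.zeros(2, 3)
    for t in range(x.shape[1]):
        a = torch.relu(
            x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + a @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )

# The manually computed final state should match the module's final hidden state
print(torch.allclose(a, final_hidden[0], atol=1e-5))
```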

__Exercise__ 2: The computations described in https://pytorch.org/docs/stable/generated/torch.nn.RNN.html and https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks are equivalent (the cheatsheet uses a single bias term, which corresponds to setting the $b_{i,h}$ bias to zero). Try to reconcile the two with pen and paper.

__Exercise__ 3:
* Using the description above, set the parameters below assuming we require:
    * Sequences of length 2 for each input
    * The dimension of $W_{h,h}$ to be 52 x 52 (i.e., a hidden size of 52)
    * ReLU activation functions
* Also decide whether the input data should be shuffled

In [14]:
# Your code here
lookback = 2
input_dim = len(indp_cols)
hidden_dim = 52
nonlinearity = "relu"
shuffle=True
# Your code here - END

num_layers = 1

In [15]:
train_dataset = PandasTsDataset(X=train_df[indp_cols], y=train_df[trgt_col], lookback=lookback)
val_dataset = PandasTsDataset(X=val_df[indp_cols], y=val_df[trgt_col], lookback=lookback)
print(len(train_dataset))
print(train_df[indp_cols].shape[0])
train_loader = DataLoader(dataset=train_dataset, shuffle=shuffle, batch_size=2)
val_loader = DataLoader(dataset=val_dataset, shuffle=shuffle, batch_size=2)

1094
1095


__Exercise__ 4:
* The code below produces a bug. Debug it and define the VanillaRNN class
* _Hints_:
    * Performing the reconciliation exercise will help with this, in particular noticing that the description in PyTorch (and therefore the computation implemented in nn.RNN) is __missing__ the computation $y = g_{2}(W_{y,a}a^{<T>} + b_{y})$, where $a^{<T>}$ is the hidden layer output from the final RNN computation
    * Also, examine the object type produced by the RNN() call. Is it what you expect? Have a look at the PyTorch documentation to understand what is being produced
    * Finally, examine the output of the RNN model and the output of the dataloader - what do you observe? (The code below will help do this)
 
```python
first_batch = next(train_loader.__iter__())
print(f"First batch shape: {first_batch[1].shape}")
print(f"First batch obs:\n{first_batch[1]}")
print(f"First batch trgt:\n{first_batch[0]}")
with torch.no_grad():
    mdl_pred = model(first_batch[1])
    print(f"All hidden: {mdl_pred[0].shape}")
    print(f"All hidden values:\n {mdl_pred[0]}")
    print(f"Final hidden: {mdl_pred[1].shape}")
    print(f"Final hidden values:\n {mdl_pred[1]}")
```

In [16]:
model = nn.RNN(
    input_size=len(indp_cols), 
    hidden_size=hidden_dim, 
    num_layers=num_layers,
    nonlinearity=nonlinearity,
    batch_first=True,
    bidirectional=False
)

epochs = 5
lr = 0.001
optimizer=torch.optim.Adam(model.parameters(), lr=lr)
criterion=nn.MSELoss()
epoch_train_loss, epoch_val_loss = train(
    model=model, train_data_loader=train_loader, val_data_loader=val_loader, gpu = gpu, 
    optimizer=optimizer, criterion=criterion, epochs=epochs, 
)

WandB not in use!


Running training epoch


0it [00:00, ?it/s]


AttributeError: 'tuple' object has no attribute 'size'

In [17]:
first_batch = next(train_loader.__iter__())
print(f"First batch shape: {first_batch[1].shape}")
print(f"First batch obs:\n{first_batch[1]}")
print(f"First batch trgt:\n{first_batch[0]}")
with torch.no_grad():
    mdl_pred = model(first_batch[1])
    print(f"All hidden: {mdl_pred[0].shape}")
    print(f"All hidden values:\n {mdl_pred[0]}")
    print(f"Final hidden: {mdl_pred[1].shape}")
    print(f"Final hidden values:\n {mdl_pred[1]}")

First batch shape: torch.Size([2, 2, 4])
First batch obs:
tensor([[[0.7336, 0.7369, 0.4275, 0.2530],
         [0.7413, 0.6674, 0.1371, 0.2490]],

        [[0.3324, 0.8655, 0.0936, 0.7312],
         [0.3248, 0.7369, 0.0551, 0.7826]]])
First batch trgt:
tensor([[31.8750],
        [17.7500]])
All hidden: torch.Size([2, 2, 52])
All hidden values:
 tensor([[[0.0000, 0.0000, 0.0000, 0.1371, 0.0000, 0.0897, 0.1841, 0.0000,
          0.1150, 0.1605, 0.0000, 0.0000, 0.0000, 0.0000, 0.2443, 0.0000,
          0.0000, 0.0000, 0.0000, 0.1416, 0.0755, 0.0000, 0.0328, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0150, 0.0000, 0.0000, 0.2725,
          0.0000, 0.2191, 0.2277, 0.0000, 0.1064, 0.2343, 0.0000, 0.0000,
          0.1385, 0.0000, 0.0121, 0.0053, 0.0000, 0.0913, 0.0000, 0.0339,
          0.1329, 0.0000, 0.1630, 0.0000],
         [0.0000, 0.0000, 0.0162, 0.1297, 0.0000, 0.0544, 0.1256, 0.0000,
          0.0698, 0.2032, 0.0000, 0.0000, 0.0000, 0.0000, 0.2145, 0.0000,
          0.0731, 0

In [22]:
class VanillaRNN(nn.Module):
    
    def __init__(self, input_dim:int,  hidden_dim:int, num_layers:int, 
                 fc_output_size:int, *args, **kwargs) -> None: 
        super().__init__()
        self._num_layers = num_layers
        self._hidden_dim = hidden_dim
        self.rnn = nn.RNN(
            input_size=input_dim,  hidden_size=hidden_dim,
            num_layers=num_layers, *args, **kwargs,
            batch_first=True
        )
        # Your code here - this should represent y^{<t>} = g_{2}(W_{y,a}a^{<t>} + b_{y})
        self.relu = nn.ReLU()
        self.fc = nn.Linear(in_features=hidden_dim, out_features=fc_output_size)
        # Your code here - END
    
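    def forward(self, x):
        batch_size = x.size(0)
        # Create the initial hidden state on the same device as the input
        hidden = self.init_hidden(batch_size).to(x.device)
        out = self.rnn(x, hidden)
        # Your code here - this should represent both equations from the stanford cheatsheet
        # out[1] has shape (num_layers, batch_size, hidden_dim); take the last layer's
        # final hidden state. Indexing (rather than squeeze()) keeps the batch
        # dimension when a batch holds a single sequence.
        return self.fc(out[1][-1])
        # Your code here - END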

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self._num_layers, batch_size, self._hidden_dim)
        return hidden
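
A small sketch of how `squeeze()` and explicit indexing differ on the final hidden state when a batch holds a single sequence, which can happen for the last batch of an epoch. The tensor `h_n` is a hypothetical stand-in for the second element returned by `nn.RNN`.

```python
import torch

# Hypothetical final hidden state for a batch containing one sequence
h_n = torch.zeros(1, 1, 52)  # (num_layers, batch_size, hidden_dim)

squeezed = h_n.squeeze()  # removes every axis of size 1, including the batch axis
indexed = h_n[-1]         # selects the last layer only

print(squeezed.shape)  # torch.Size([52])
print(indexed.shape)   # torch.Size([1, 52])
```

With the batch axis gone, a linear head would emit a prediction of shape (1,) against a target of shape (1, 1), which `nn.MSELoss` reports as a size-mismatch warning.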


In [23]:
fc_output_size = 1

In [24]:
wandb_config={
    "lr": 0.001,
    "hidden_dim": hidden_dim,
    "num_layers": 1,
    "fc_output_size": fc_output_size,
    "lookback": lookback,
    "nonlinearity":nonlinearity
}
model = VanillaRNN(
    input_dim=input_dim,  hidden_dim=wandb_config["hidden_dim"],
    num_layers=wandb_config["num_layers"], fc_output_size=wandb_config["fc_output_size"], 
    nonlinearity=nonlinearity
)
with torch.no_grad():
    mdl_pred = model(first_batch[1])
    print(f"Pred y: {mdl_pred.shape}")
    print(f"True y: {first_batch[0].shape}")

Pred y: torch.Size([2, 1])
True y: torch.Size([2, 1])


In [25]:
optimizer=torch.optim.Adam(model.parameters(), lr=wandb_config["lr"])
criterion=nn.MSELoss()
epochs = 5
epoch_train_loss, epoch_val_loss = train(
    model=model, train_data_loader=train_loader, val_data_loader=val_loader, gpu = gpu, 
    optimizer=optimizer, criterion=criterion, epochs=epochs, wandb_proj=WANDB_PROJ,
    wandb_config=wandb_config, debug=True
)

VBox(children=(Label(value='Waiting for wandb.init()...\r'), FloatProgress(value=0.011168778244043803, max=1.0…

Running training epoch


547it [00:00, 2391.37it/s]
183it [00:00, 10025.70it/s]


Running validation
Running training epoch


547it [00:00, 2528.14it/s]
183it [00:00, 10813.11it/s]


Running validation
Running training epoch


547it [00:00, 2708.87it/s]
183it [00:00, 11321.58it/s]


Running validation
Running training epoch


547it [00:00, 2681.60it/s]
183it [00:00, 10663.04it/s]


Running validation
Running training epoch


547it [00:00, 2741.01it/s]
183it [00:00, 11257.32it/s]


Running validation


0,1
train_loss,█▁▁▁▁

0,1
train_loss,4.65035
val_loss,


__Exercise__ 5:
* Notice that under "Run summary" a "nan" is returned for the validation loss. The training loop provided at the beginning of this notebook has been augmented with functionality to push the ground truth values and predicted values from the validation set to Weights & Biases. Use Weights & Biases to debug why NaNs are being produced during validation and implement the fix.
* _Hint_:
    * Examine the validation ground truth closely - try exporting it to a csv!

In [53]:
non_na_idx = ~train_df[trgt_col].isna()
train_dataset = PandasTsDataset(
    X=train_df[non_na_idx][indp_cols], 
    y=train_df[non_na_idx][trgt_col],
    lookback=lookback
)
non_na_idx = ~val_df[trgt_col].isna()
val_dataset = PandasTsDataset(
    X=val_df[non_na_idx][indp_cols], 
    y=val_df[non_na_idx][trgt_col],
    lookback=lookback
)
train_loader = DataLoader(dataset=train_dataset, shuffle=shuffle, batch_size=2)
val_loader = DataLoader(dataset=val_dataset, shuffle=shuffle, batch_size=2)
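
The sketch below illustrates, on made-up numbers, why a single missing target is enough to turn a mean loss into NaN and how masking the missing entry restores a finite value. The tensors `pred` and `target` are assumptions for illustration only.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
pred = torch.tensor([[14.0], [15.0]])
# The second target is missing, e.g. a shifted column with no "next day" value
target = torch.tensor([[14.5], [float("nan")]])

loss = criterion(pred, target)
print(loss)  # tensor(nan)

# Keep only the entries with a valid target
mask = ~torch.isnan(target)
clean_loss = criterion(pred[mask], target[mask])
print(clean_loss)  # tensor(0.2500)
```

Filtering the rows before building the dataset, as done in the cell above, avoids the need for masking inside the loss.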

In [54]:
wandb_config={
    "lr": 0.001,
    "hidden_dim": hidden_dim,
    "num_layers": 1,
    "fc_output_size": fc_output_size,
    "lookback": lookback, 
    "nonlinearity":nonlinearity
}
model = VanillaRNN(
    input_dim=len(indp_cols),  hidden_dim=wandb_config["hidden_dim"],
    num_layers=wandb_config["num_layers"], fc_output_size=wandb_config["fc_output_size"],
    nonlinearity=nonlinearity
)
optimizer=torch.optim.Adam(model.parameters(), lr=wandb_config["lr"])
criterion=nn.MSELoss()
epochs = 5
epoch_train_loss, epoch_val_loss = train(
    model=model, train_data_loader=train_loader, val_data_loader=val_loader, gpu = gpu, 
    optimizer=optimizer, criterion=criterion, epochs=epochs, wandb_proj=WANDB_PROJ,
    wandb_config=wandb_config, debug=False
)

Running training epoch


547it [00:00, 2665.11it/s]
  return F.mse_loss(input, target, reduction=self.reduction)
183it [00:00, 10083.12it/s]


Running validation
Running training epoch


547it [00:00, 2701.46it/s]
183it [00:00, 11166.59it/s]


Running validation
Running training epoch


547it [00:00, 2755.06it/s]
183it [00:00, 11404.68it/s]


Running validation
Running training epoch


547it [00:00, 2573.03it/s]
183it [00:00, 10313.86it/s]


Running validation
Running training epoch


547it [00:00, 2502.27it/s]
183it [00:00, 10748.15it/s]

Running validation








Run history:
train_loss █▁▁▁▁
val_loss █▅▂▁▂

Run summary:
train_loss 4.36788
val_loss 5.90971


## Changing the size of the lookback
A working RNN model for one-step prediction has now been defined.

__Exercise__ 6a:
* Try improving the performance of the model by changing the lookback size. Here, we are trying to define the 'amount' of historical time steps relevant for prediction.
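
As a rough illustration of what the lookback controls, the sketch below builds sliding windows from a toy array. `make_windows` is a hypothetical helper written for this example, not the `PandasTsDataset` implementation, and its window/target alignment is an assumption.

```python
import numpy as np


def make_windows(features: np.ndarray, targets: np.ndarray, lookback: int):
    """Hypothetical helper: pair each window of `lookback` consecutive rows
    with the target stored on the window's final row."""
    n_windows = len(features) - lookback + 1
    X = np.stack([features[i:i + lookback] for i in range(n_windows)])
    y = targets[lookback - 1:]
    return X, y


# Toy data (assumption): 10 days, 4 variables per day.
features = np.arange(40, dtype=np.float32).reshape(10, 4)
targets = np.arange(10, dtype=np.float32)

for lookback in [2, 5]:
    X, y = make_windows(features, targets, lookback)
    # X has shape (n_windows, lookback, n_features)
    print(lookback, X.shape, y.shape)
```

Under this construction, a longer lookback gives the model more history per sample but leaves fewer samples to train on.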

__Exercise__ 6b:
* Try experimenting with the output of the model. In machine learning, "auxiliary loss functions" are often used to improve performance. An auxiliary loss function scores the model on a related task in order to increase the amount of gradient signal passed to the model. For example, in healthcare, when developing a model to predict acute kidney injury (AKI), DeepMind assessed whether the model could predict the outcome of the lab test for AKI (https://www.nature.com/articles/s41586-019-1390-1). Here, it might be reasonable to assume that predicting the next-day values for humidity, wind speed and pressure would help in the prediction of mean temperature.
* An alternative auxiliary loss might be to keep meantemp as the prediction target but predict the intermediate days as well, i.e., defining an RNN of the form:

![alternative text](./figures/generic_rnn.png)
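
A minimal sketch of this second idea is given below, assuming `batch_first=True` tensors. `PerStepRNN` is a hypothetical class written for illustration (it is not the tutorial's `VanillaRNN`), and the random tensors stand in for real data.

```python
import torch
import torch.nn as nn


class PerStepRNN(nn.Module):
    """Hypothetical model that emits one prediction per time step."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)             # (batch, lookback, hidden_dim)
        return self.fc(out).squeeze(-1)  # (batch, lookback)


torch.manual_seed(0)
batch, lookback, n_features = 2, 5, 4
x = torch.randn(batch, lookback, n_features)
y_all_steps = torch.randn(batch, lookback)  # one target per time step

model = PerStepRNN(input_dim=n_features, hidden_dim=8)
criterion = nn.MSELoss()
preds = model(x)

# Training: every time step contributes gradient signal.
train_loss = criterion(preds, y_all_steps)
# Validation: only the final-step prediction is assessed.
val_loss = criterion(preds[:, -1], y_all_steps[:, -1])
```

Here a single linear head is shared across all time steps; a separate head per step is the alternative raised in the hints.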

* _Hint_:
    * The first auxiliary loss will require significant modifications to pretty much all of the steps above - don't worry if you're rewriting a lot of code!
    * The second auxiliary loss only requires altering the indexing in the PandasTsDataset class and the input dimensions to the fully connected head. Alternatively, would it be better to use a specific head for each intermediate output?
    * When validating, we are still only interested in the ability of the model to predict the temperature!
    * The hyperparameters previously discussed, i.e., learning rate, epochs, batch_size and the network architecture, might need to be adjusted
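
For the first auxiliary loss, one possible way to combine the main and auxiliary terms is sketched below. It assumes the model outputs four values per sample, ordered as [meantemp, humidity, wind_speed, meanpressure]. The function name `temperature_with_aux_loss` and the weight `aux_weight` are hypothetical choices for this example.

```python
import torch
import torch.nn as nn


def temperature_with_aux_loss(
    preds: torch.Tensor, targets: torch.Tensor, aux_weight: float = 0.5
):
    """Hypothetical helper. Column 0 is assumed to be meantemp; the
    remaining columns are the auxiliary next-day variables."""
    criterion = nn.MSELoss()
    main_loss = criterion(preds[:, 0], targets[:, 0])
    aux_loss = criterion(preds[:, 1:], targets[:, 1:])
    total_loss = main_loss + aux_weight * aux_loss
    # Return both: train on total_loss, validate on main_loss only.
    return total_loss, main_loss


# Toy values (assumption) chosen so the arithmetic is easy to follow.
preds = torch.zeros(2, 4)
targets = torch.tensor([[1.0, 2.0, 2.0, 2.0], [1.0, 2.0, 2.0, 2.0]])
total_loss, main_loss = temperature_with_aux_loss(preds, targets)
print(total_loss.item(), main_loss.item())
```

In practice the variables sit on very different scales (pressure versus temperature, for instance), so the auxiliary weight or some form of target normalisation would likely need tuning.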

In [55]:
for lookback in [5,10]:
    non_na_idx = ~train_df[trgt_col].isna()
    train_dataset = PandasTsDataset(
        X=train_df[non_na_idx][indp_cols], 
        y=train_df[non_na_idx][trgt_col],
        lookback=lookback
    )
    non_na_idx = ~val_df[trgt_col].isna()
    val_dataset = PandasTsDataset(
        X=val_df[non_na_idx][indp_cols], 
        y=val_df[non_na_idx][trgt_col],
        lookback=lookback
    )
    train_loader = DataLoader(dataset=train_dataset, shuffle=True, batch_size=2)
    val_loader = DataLoader(dataset=val_dataset, shuffle=True, batch_size=2)
    
    wandb_config={
        "lr": 0.001,
        "hidden_dim": hidden_dim,
        "num_layers": 1,
        "fc_output_size": 1,
        "lookback": lookback,
        "nonlinearity":nonlinearity
    }
    model = VanillaRNN(
        input_dim=len(indp_cols),  hidden_dim=wandb_config["hidden_dim"],
        num_layers=wandb_config["num_layers"], fc_output_size=wandb_config["fc_output_size"],
        nonlinearity=nonlinearity
    )
    optimizer=torch.optim.Adam(model.parameters(), lr=wandb_config["lr"])
    criterion=nn.MSELoss()
    epochs = 5
    epoch_train_loss, epoch_val_loss = train(
        model=model, train_data_loader=train_loader, val_data_loader=val_loader, gpu = gpu, 
        optimizer=optimizer, criterion=criterion, epochs=epochs, wandb_proj=WANDB_PROJ,
        wandb_config=wandb_config, debug=False
    )

Running training epoch


  return F.mse_loss(input, target, reduction=self.reduction)
546it [00:00, 2022.97it/s]
181it [00:00, 9510.77it/s]


Running validation
Running training epoch


546it [00:00, 2147.69it/s]
181it [00:00, 9948.36it/s]


Running validation
Running training epoch


546it [00:00, 2115.01it/s]
181it [00:00, 8958.28it/s]


Running validation
Running training epoch


546it [00:00, 1997.70it/s]
181it [00:00, 8640.08it/s]


Running validation
Running training epoch


546it [00:00, 2003.23it/s]
181it [00:00, 9670.33it/s]


Running validation




Run history:
train_loss █▁▁▁▁
val_loss ▂▂▁▂█

Run summary:
train_loss 6.62359
val_loss 14.65492



Running training epoch


543it [00:00, 1595.52it/s]
  return F.mse_loss(input, target, reduction=self.reduction)
179it [00:00, 7488.26it/s]


Running validation
Running training epoch


543it [00:00, 1553.07it/s]
179it [00:00, 7374.11it/s]


Running validation
Running training epoch


543it [00:00, 1499.59it/s]
179it [00:00, 7146.27it/s]


Running validation
Running training epoch


543it [00:00, 1505.51it/s]
179it [00:00, 7624.15it/s]


Running validation
Running training epoch


543it [00:00, 1648.55it/s]
179it [00:00, 7716.77it/s]

Running validation







Run history:
train_loss █▁▁▁▁
val_loss ▂█▁▄▇

Run summary:
train_loss 9.11932
val_loss 17.73588


## Stacked RNNs
The nn.RNN module also contains a 'num_layers' parameter. Setting 'num_layers' to greater than 1 creates a "stacked" RNN which is depicted below (credit: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks) 

![alternative text](./figures/rnn_stacked.png)

Stacking RNNs is similar to making MLPs deeper. One may want to stack an RNN if the raw features contain different 'levels' of time-dependent structure: each layer extracts a non-linear relationship from the input to that layer, and each layer is also defined recursively over the input sequence. For the climate example here, a stacked RNN might be required if it was hypothesised that there existed 'more complex' interactions between the four input variables than can be captured by a single non-linear layer. Furthermore, these 'more complex' interactions would themselves have to be recursive; otherwise, a deeper MLP could just be used instead to extract the time t prediction, $y_{i}$.
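
The effect of 'num_layers' on tensor shapes can be checked directly with `nn.RNN`. The sketch below uses toy dimensions and assumes `batch_first=True`, which may differ from the tutorial's own model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, lookback, n_features, hidden_dim = 2, 5, 4, 8
x = torch.randn(batch, lookback, n_features)

single = nn.RNN(n_features, hidden_dim, num_layers=1, batch_first=True)
stacked = nn.RNN(n_features, hidden_dim, num_layers=3, batch_first=True)

out_1, h_1 = single(x)
out_3, h_3 = stacked(x)

# `out` holds the top layer's hidden state at every time step,
# so its shape does not depend on the number of layers.
print(out_1.shape, out_3.shape)
# `h_n` holds the final hidden state of every layer:
# (num_layers, batch, hidden_dim).
print(h_1.shape, h_3.shape)

# The last time step of `out` is the top layer's final hidden state.
print(torch.allclose(out_3[:, -1], h_3[-1]))
```

Since only the top layer's states are returned in `out`, a fully connected head attached to `out[:, -1]` needs no change in input size when layers are added.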

__Exercise__ 7:
* Experiment with different numbers of RNN layers and MLP head layers

## Predicting n step values

__Exercise__ 8:
* Experiment with using the other target columns, e.g. meantemp_5_step. When using meantemp_5_step as the target variable, we are building a model that can predict the temperature 5 days in advance. Auxiliary losses might be useful here, as one may expect that if the model can predict the next day more accurately, it should also be able to predict the fifth day more accurately. However, again be careful not to use the 1-day predictions in your validation assessment!
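
As a reminder of what an n-step target looks like, the sketch below builds one on toy data. It assumes the column is produced by shifting `meantemp` backwards by `n_steps` rows, which may not be exactly how the tutorial's columns were generated.

```python
import pandas as pd

# Toy data (assumption): seven days of mean temperature.
df = pd.DataFrame({"meantemp": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]})

n_steps = 5
trgt_col = f"meantemp_{n_steps}_step"

# Row t now holds the temperature observed at day t + n_steps.
df[trgt_col] = df["meantemp"].shift(-n_steps)

# The final n_steps rows have no future value, so their target is NaN
# and they must be dropped before building the dataset.
non_na_idx = ~df[trgt_col].isna()
usable = df[non_na_idx]
print(usable)
```

The further ahead the target, the more rows at the end of each split are lost, and the same NaN filtering used for the one-step target applies.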