### Multivariate time series prediction using MLP


Here is a PyTorch implementation of a fully connected neural network with four fully connected layers in total. It is a vanilla neural network model for multivariate time series prediction. Two fully connected layers of size **seq_len x 100** and **100 x pred_len** act along the time axis, and two fully connected layers of size **num_features x 20** and **20 x 1** act along the features axis; **ReLU** is the activation function in both stages. The network is trained with Mean Squared Error loss (MSELoss). NSE (Nash-Sutcliffe Efficiency) and WAPE (Weighted Average Percentage Error) are used for model evaluation. The data is normalized using MinMaxScaler from scikit-learn. The data used is 3-hourly river gauge height data, divided in the ratio 7:1:2 into training, validation and testing sets respectively.

In [1]:
## Write the above description in more organized way, like
# No. of layers =  4
# Sequence length = 24, etc.

import torch
import numpy as np

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


In [2]:
# Use hourly data, use a subset to reduce the datasize

path = '../dataset/final_data.csv'

In [3]:
# For reproducibility, fix the random seed so that the parameters are initialized identically across all experiments.
torch.manual_seed(42)

<torch._C.Generator at 0x112e14990>

Here we define **RiverData**, a custom Dataset class for loading our dataset. It extends PyTorch's **Dataset** class.  
- We define the \_\_init__() function, which can load data from a file and optionally preprocess it; here it receives an already loaded DataFrame.
- Then we define the \_\_len__() function, which returns the number of samples in the dataset.
- Finally we define the \_\_getitem__() function, which returns one (feature, label) tuple for model training.
  For our time series data, the feature is the window of past values fed to the model and the label is the future values to be predicted.
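
To make the indexing concrete, here is a minimal sketch of the same windowing on a plain Python list instead of a DataFrame. The helper **make_window** and the toy **series** are illustrative names that do not appear in the notebook; the slices mirror what \_\_getitem__() does for a single column.

```python
def make_window(series, idx, seq_len, pred_len):
    """Return (past values, future values) for the window starting at idx."""
    feature = series[idx : idx + seq_len]
    label = series[idx + seq_len : idx + seq_len + pred_len]
    return feature, label

series = [10, 11, 12, 13, 14, 15, 16, 17]  # toy values, not real gauge data
seq_len, pred_len = 3, 1

# Same length rule as RiverData.__len__
num_windows = len(series) - seq_len - pred_len

first = make_window(series, 0, seq_len, pred_len)
last = make_window(series, num_windows - 1, seq_len, pred_len)
print(first)  # ([10, 11, 12], [13])
print(last)   # ([13, 14, 15], [16])
```

With this length rule the final value of the series (17 here) is never used as a label, which matches the bounds check in \_\_getitem__().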

In [4]:
# we need to list knowledge of object-oriented programming as a prerequisite
# send emails to participants to go through it.

class RiverData(torch.utils.data.Dataset):
    
    def __init__(self, df, target, datecol, seq_len, pred_len):
        self.df = df
        self.datecol = datecol
        self.target = target
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.setIndex()
        

    def setIndex(self):
        # build a new indexed frame instead of modifying the caller's slice in place
        self.df = self.df.set_index(self.datecol)
    

    def __len__(self):
        return len(self.df) - self.seq_len - self.pred_len


    def __getitem__(self, idx):
        if len(self.df) <= (idx + self.seq_len+self.pred_len):
            raise IndexError(f"Index {idx} is out of bounds for dataset of size {len(self.df)}")
        df_piece = self.df[idx:idx+self.seq_len].values
        feature = torch.tensor(df_piece, dtype=torch.float32)
        label_piece = self.df[self.target][idx + self.seq_len:  idx+self.seq_len+self.pred_len].values
        label = torch.tensor(label_piece, dtype=torch.float32)
        return (feature.T, label) 

### Normalize the data

In [5]:
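df = pd.read_csv(path)
# Keep rows dated after '2012' (string comparison, so DATE strings must sort chronologically),
# then rebuild a 0..n-1 index so the DATE column lines up with the scaled frame created below.
df = df[df['DATE'] > '2012'].reset_index(drop=True)

raw_df = df.drop('DATE', axis=1)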
scaler = MinMaxScaler()

# Apply the transformations
df_scaled = scaler.fit_transform(raw_df)

df_scaled = pd.DataFrame(df_scaled, columns=raw_df.columns)
df_scaled['DATE'] = df['DATE']
df = df_scaled
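
For reference, with its default feature range of (0, 1), MinMaxScaler rescales each column as x' = (x - min) / (max - min). The sketch below reproduces that arithmetic for a single column in plain Python. **min_max_scale** and **inverse_scale** are hypothetical helpers, the numbers are made up, and the code assumes the column is not constant.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; returns scaled values and (min, max)."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumed non-zero
    scaled = [(v - lo) / span for v in values]
    return scaled, (lo, hi)

def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

heights = [2.0, 4.0, 6.0, 10.0]  # made-up gauge heights
scaled, (lo, hi) = min_max_scale(heights)
print(scaled)                         # [0.0, 0.25, 0.5, 1.0]
print(inverse_scale(scaled, lo, hi))  # [2.0, 4.0, 6.0, 10.0]
```

The inverse step is what is needed to report predictions in the original units; with scikit-learn that is the scaler's inverse_transform() method.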

Some advanced Python syntax is used here. \
\*common_args: unpacks a Python list (or any sequence) into the positional arguments of a function call \
\*\*common_args: unpacks a Python dictionary into the keyword arguments of a function call
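
A small sketch of both forms of unpacking follows. The function **describe** is a toy written for this illustration only; the argument values echo those used in the notebook.

```python
def describe(target, datecol, seq_len=8, pred_len=1):
    """Toy function that just echoes its arguments."""
    return f"{target} by {datecol}: {seq_len} steps in, {pred_len} out"

positional = ['gauge_height', 'DATE']      # unpacked with *
keyword = {'seq_len': 13, 'pred_len': 1}   # unpacked with **

# Equivalent to describe('gauge_height', 'DATE', seq_len=13, pred_len=1)
message = describe(*positional, **keyword)
print(message)  # gauge_height by DATE: 13 steps in, 1 out
```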

In [6]:
# Ratio for train:test:validation is 7:2:1
# we can vary it depending on the dataset

train_size = int(0.7 * len(df))
test_size = int(0.2 * len(df))
val_size = len(df) - train_size - test_size

seq_len = 13
pred_len = 1
num_features = 7

common_args = ['gauge_height', 'DATE', seq_len, pred_len]
train_dataset = RiverData(df[:train_size], *common_args)
val_dataset = RiverData(df[train_size: train_size+val_size], *common_args)
test_dataset = RiverData(df[train_size+val_size : len(df)], *common_args)


In [7]:
# Important hyperparameters

BATCH_SIZE = 512 # keep as big as can be handled by GPU and memory
SHUFFLE = False # we don't shuffle the time series data
DATA_LOAD_WORKERS = 1 # it depends on amount of data you need to load
learning_rate = 1e-3 # Learning rate

In [8]:
from torch.utils.data import DataLoader

common_args = {'batch_size': BATCH_SIZE, 'shuffle': SHUFFLE}
train_loader = DataLoader(train_dataset, **common_args)
val_loader = DataLoader(val_dataset, **common_args)
test_loader = DataLoader(test_dataset, **common_args)

### Here we define our PyTorch model.

BasicMLPNetwork is the model class. It extends the **Module** class provided by PyTorch.
- We define the \_\_init__() function, which sets up the layers and defines the model parameters.
- We also define the forward() function, which defines how the forward pass is computed.

In [9]:
class BasicMLPNetwork(torch.nn.Module):
    
    def __init__(self, seq_len, pred_len):
        # call the constructor of the base class
        super().__init__()
        self.seq_len = seq_len
        self.pred_len = pred_len
        # num_features is not a constructor argument; it is read from the notebook-level variable
        self.num_features = num_features
        hidden_size_time = 100
        hidden_size_feat = 20
        # define layers for combining across time series
        self.fc1 = torch.nn.Linear(self.seq_len, hidden_size_time)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(hidden_size_time, self.pred_len)

        # define layers for combining across the features
        self.fc3 = torch.nn.Linear(self.num_features, hidden_size_feat)
        self.fc4 = torch.nn.Linear(hidden_size_feat, 1)

    def forward(self, x):

        # computation over time
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.relu(out) # has dimension 512 x 7 x 1

        # computation over features
        out = out.transpose(1,2) # dimension 512 x 1 x 7
        out = self.fc3(out) # dimension 512 x 1 x 20
        out = self.relu(out)
        out = self.fc4(out) # dimension 512 x 1 x 1

        out = out.squeeze(-1) # dimension 512 x 1
        
        return out

# Note that the gradients are stored inside the parameters of the FC layer objects
# After each optimizer step (one per batch) we need to reset these gradients

In [10]:
model = BasicMLPNetwork(seq_len, pred_len)
loss = torch.nn.MSELoss()

# there are different optimizer methods, here we're using Adam Optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

In [11]:
for gen in model.parameters():
    print(gen.shape)

torch.Size([100, 8])
torch.Size([100])
torch.Size([1, 100])
torch.Size([1])
torch.Size([20, 7])
torch.Size([20])
torch.Size([1, 20])
torch.Size([1])
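
Each torch.nn.Linear(in_features, out_features) layer holds a weight of shape (out_features, in_features) and a bias of shape (out_features,), which is why the shapes alternate as printed. As a quick check, the sketch below totals the parameter count from the printed shapes using only the standard library. The first weight is 100 x seq_len, so the 8 in the printout is the sequence length that was in effect when the cell was run.

```python
from math import prod

# Shapes copied from the printout above, in the order fc1..fc4 (weight, bias)
shapes = [(100, 8), (100,), (1, 100), (1,), (20, 7), (20,), (1, 20), (1,)]

per_tensor = [prod(s) for s in shapes]
total = sum(per_tensor)
print(per_tensor)  # [800, 100, 100, 1, 140, 20, 20, 1]
print(total)       # 1182
```

On the live model the same total can be obtained with sum(p.numel() for p in model.parameters()).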


In [12]:
for i, (f,l) in enumerate(train_loader):
    print('features shape: ', f.shape)
    print('labels shape: ', l.shape)
    break

features shape:  torch.Size([512, 7, 8])
labels shape:  torch.Size([512, 1])


In [13]:
# define metrics
import numpy as np
epsilon = np.finfo(float).eps

def wape_function(y, y_pred):
    """Weighted Average Percentage Error, in percent (0 is a perfect fit; values above 100 are possible)"""
    y = np.array(y)
    y_pred = np.array(y_pred)
    numerator = np.sum(np.abs(np.subtract(y, y_pred)))
    denominator = np.add(np.sum(np.abs(y)), epsilon)
    wape = np.divide(numerator, denominator) * 100.0
    return wape

def nse_function(y, y_pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 is no better than predicting the mean of y"""
    y = np.array(y)
    y_pred = np.array(y_pred)
    return (1-(np.sum((y_pred-y)**2)/np.sum((y-np.mean(y))**2)))


def evaluate_model(model, data_loader):
    # The following line puts the model in evaluation mode, which changes the behavior of dropout and
    # batch normalization layers if they are part of the model. Our simple model has neither, so it's not necessary. I'm still going to use it.

    model.eval()
    all_inputs = torch.empty((0, num_features, seq_len))
    all_labels = torch.empty(0, pred_len)
    for inputs, labels in data_loader:
        all_inputs = torch.vstack((all_inputs, inputs))
        all_labels = torch.vstack((all_labels, labels))
    
    with torch.no_grad():
        outputs = model(all_inputs)
        nse = nse_function(all_labels.numpy(), outputs.numpy())
        wape = wape_function(all_labels.numpy(), outputs.numpy())
        
    print(f'NSE : {nse} ', end='')
    print(f'WAPE : {wape} ')
    
    model.train()
    return nse, wape
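
As a worked example of the two metrics, the sketch below recomputes them in plain Python on four made-up points. The helpers **wape** and **nse** are simplified stand-ins for wape_function and nse_function above: they follow the same formulas but omit the epsilon guard in the WAPE denominator.

```python
def wape(y, y_pred):
    """Sum of absolute errors over sum of absolute observations, in percent."""
    abs_err = sum(abs(a - b) for a, b in zip(y, y_pred))
    return abs_err / sum(abs(a) for a in y) * 100.0

def nse(y, y_pred):
    """1 minus (squared error / squared deviation of y from its mean)."""
    mean_y = sum(y) / len(y)
    sq_err = sum((b - a) ** 2 for a, b in zip(y, y_pred))
    spread = sum((a - mean_y) ** 2 for a in y)
    return 1 - sq_err / spread

y = [1.0, 2.0, 3.0, 4.0]       # made-up observations
y_pred = [1.1, 1.9, 3.2, 3.8]  # made-up predictions

print(round(wape(y, y_pred), 4))  # 6.0
print(round(nse(y, y_pred), 4))   # 0.98
```

Here the absolute errors sum to 0.6 against observations summing to 10, giving a WAPE of 6%. The squared errors sum to 0.1 against a spread of 5 around the mean, giving an NSE of 0.98. Predicting the mean (2.5) for every point would give an NSE of exactly 0, and worse predictions give negative values.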


In [14]:
num_epochs = 30

for epoch in range(num_epochs):
    epoch_loss = []
    for batch_idx, (inputs, labels) in enumerate(train_loader):
        outputs = model(inputs)
        loss_val = loss(outputs, labels)

        # calculate gradients for back propagation
        loss_val.backward()

        # update the weights based on the gradients
        optimizer.step()

        # reset the gradients, avoid gradient accumulation
        optimizer.zero_grad()
        epoch_loss.append(loss_val.item())
    
    print(f'Epoch {epoch+1}: {sum(epoch_loss)/len(epoch_loss)} ', end='')
    nse, wape = evaluate_model(model, val_loader)
    
        



Epoch 1: 0.016206374802296957 NSE : 0.15714913606643677 WAPE : 39.11094295875523 
Epoch 2: 0.013584269159911013 NSE : 0.5186263918876648 WAPE : 27.771585992077323 
Epoch 3: 0.004240482592613017 NSE : 0.7875455766916275 WAPE : 20.210333945197505 
Epoch 4: 0.0010790594650515286 NSE : 0.9127053394913673 WAPE : 12.171565234218027 
Epoch 5: 0.0006093668847422165 NSE : 0.9477205686271191 WAPE : 8.73540323259754 
Epoch 6: 0.00044432752840916295 NSE : 0.9574275575578213 WAPE : 7.915840024643277 
Epoch 7: 0.00035985031659557055 NSE : 0.9610028378665447 WAPE : 7.763982048978512 
Epoch 8: 0.0003255458753563826 NSE : 0.9462717175483704 WAPE : 10.12784190692189 
Epoch 9: 0.0003198113809048664 NSE : 0.9152859002351761 WAPE : 13.567782155265734 
Epoch 10: 0.0003409921594252013 NSE : 0.8694832772016525 WAPE : 17.483276997324932 
Epoch 11: 0.0003368078595784808 NSE : 0.8608902543783188 WAPE : 18.257107194327162 
Epoch 12: 0.0003075111486127478 NSE : 0.8762962445616722 WAPE : 17.208524422717716 
Epoch 1
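
In the log above, validation NSE peaks at epoch 7 and then declines while the training loss stays roughly flat. One common response, which this notebook does not implement, is to keep the weights from the epoch with the best validation score. The sketch below only picks that epoch from the logged NSE values (rounded to four decimals, epochs 1 to 12, since the log is cut off after that); **best_epoch** is a hypothetical helper.

```python
# Validation NSE for epochs 1-12, rounded from the log above
val_nse = [0.1571, 0.5186, 0.7875, 0.9127, 0.9477, 0.9574,
           0.9610, 0.9463, 0.9153, 0.8695, 0.8609, 0.8763]

def best_epoch(scores):
    """Return (1-based epoch, score) of the highest score."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx + 1, scores[idx]

print(best_epoch(val_nse))  # (7, 0.961)
```

Inside a training loop this could amount to saving model.state_dict() whenever the validation NSE improves and restoring it with load_state_dict() before evaluating on the test set.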

In [15]:

evaluate_model(model, test_loader)


## Write code for plotting the observed values and predictions.

NSE : 0.8102979809045792 WAPE : 25.818637390268833 


(0.8102979809045792, 25.818637390268833)