### Multivariate time series prediction using MLP with Hyperparameter optimization

In this notebook, we use **Optuna** to tune hyperparameters. Specifically, we optimize the values of the **learning rate, weight decay and dropout** probability.

Optuna is a Python package designed specifically for hyperparameter tuning. We define a range of possible values for each hyperparameter, and Optuna trains the model with different values over a specified number of trials, looking for the combination that minimizes the validation loss.
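
As a rough illustration of what such a search does, the sketch below runs a plain random search over a log-uniform learning-rate range and a uniform dropout range. This is not Optuna's algorithm, and `toy_validation_loss` is a hypothetical stand-in for "train the model and measure the validation loss".

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def toy_validation_loss(lr, dropout):
    # Hypothetical stand-in for training a model; its minimum is at lr=1e-3, dropout=0.2.
    return (math.log10(lr) + 3.0) ** 2 + (dropout - 0.2) ** 2


def random_search(n_trials, seed=0):
    """Try n_trials random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            "lr": sample_log_uniform(rng, 1e-4, 1e-1),
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = toy_validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=50)
print(best_params, best_loss)
```

Sampling on a log scale gives each decade between 1e-4 and 1e-1 the same chance of being tried, which is why the notebook uses a log-uniform range for the learning rate and the weight decay.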


In [1]:
import torch
import numpy as np
import optuna

import pandas as pd
from sklearn.preprocessing import MinMaxScaler



In [2]:
path = '../dataset/final_data.csv'

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# For reproducibility, fix the random seeds so that the model parameters are initialized identically in every experiment.
torch.manual_seed(42)

# If you are using CUDA, also set the seed for it
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)
    torch.cuda.manual_seed_all(42)

# Set the seed for NumPy
np.random.seed(42)

Using device: cuda


Here we define **RiverData**, a custom Dataset class that loads our dataset. It extends the PyTorch Dataset class.  
- We define the \_\_init__() function, which can be used to load data from a file and, optionally, to preprocess it.
- We then define the \_\_len__() function, which returns the number of samples in the dataset.
- Finally, we define the \_\_getitem__() function, which returns one (feature, label) tuple to be used for model training.
  For our time series data, the feature is the window of past values fed to the model and the label is the future values to be predicted.

In [4]:
class RiverData(torch.utils.data.Dataset):
    
    def __init__(self, df, target, datecol, seq_len, pred_len):
        self.df = df
        self.datecol = datecol
        self.target = target
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.setIndex()
        

    def setIndex(self):
        self.df.set_index(self.datecol, inplace=True)
    

    def __len__(self):
        return len(self.df) - self.seq_len - self.pred_len


    def __getitem__(self, idx):
        if len(self.df) <= (idx + self.seq_len+self.pred_len):
            raise IndexError(f"Index {idx} is out of bounds for dataset of size {len(self.df)}")
        df_piece = self.df[idx:idx+self.seq_len].values
        feature = torch.tensor(df_piece, dtype=torch.float32)
        label_piece = self.df[self.target][idx + self.seq_len:  idx+self.seq_len+self.pred_len].values
        label = torch.tensor(label_piece, dtype=torch.float32)
        return (feature.T, label) 
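
The slicing done in `__getitem__` can be checked on a toy series. The sketch below is a plain-Python stand-in (the helper `make_windows` is hypothetical and uses lists instead of a DataFrame), but it applies the same index arithmetic as `RiverData`.

```python
def make_windows(series, seq_len, pred_len):
    """Return (past, future) pairs sliced the same way as RiverData.__getitem__."""
    n_samples = len(series) - seq_len - pred_len  # same count as RiverData.__len__
    samples = []
    for idx in range(n_samples):
        past = series[idx : idx + seq_len]
        future = series[idx + seq_len : idx + seq_len + pred_len]
        samples.append((past, future))
    return samples


series = list(range(10))
windows = make_windows(series, seq_len=3, pred_len=1)
print(len(windows))   # 6
print(windows[0])     # ([0, 1, 2], [3])
print(windows[-1])    # ([5, 6, 7], [8])
```

Note that with this length formula the last value of the series (9 here) is never used as a label.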

### Normalize the data

In [5]:
df = pd.read_csv(path)
df = df[df['DATE'] > '2012']
raw_df = df.drop('DATE', axis=1, inplace=False)
scaler = MinMaxScaler()

# Apply the transformations
df_scaled = scaler.fit_transform(raw_df)

df_scaled = pd.DataFrame(df_scaled, columns=raw_df.columns)
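# Assign by position: after the date filter, df no longer has a 0-based index,
# so assigning the Series directly would align on index labels and misplace the dates.
df_scaled['DATE'] = df['DATE'].values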
df = df_scaled

Two pieces of Python unpacking syntax are used below. \
\*common_args: unpacks a Python list into the positional arguments of a function call \
\*\*common_args: unpacks a Python dictionary into the keyword arguments of a function call
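
A minimal illustration of both forms follows. The function `describe` is hypothetical and only echoes its arguments; the values mirror the ones used in this notebook.

```python
def describe(target, datecol, seq_len, pred_len, batch_size=1, shuffle=True):
    """Echo the arguments so we can see where each unpacked value ends up."""
    return f"{target}|{datecol}|{seq_len}|{pred_len}|{batch_size}|{shuffle}"


positional = ['gauge_height', 'DATE', 13, 1]       # unpacked with *
keyword = {'batch_size': 512, 'shuffle': False}    # unpacked with **

# Equivalent to: describe('gauge_height', 'DATE', 13, 1, batch_size=512, shuffle=False)
result = describe(*positional, **keyword)
print(result)  # gauge_height|DATE|13|1|512|False
```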

In [6]:

train_size = int(0.7 * len(df))
test_size = int(0.2 * len(df))
val_size = len(df) - train_size - test_size

seq_len = 13
pred_len = 1
num_features = 7

common_args = ['gauge_height', 'DATE', seq_len, pred_len]
train_dataset = RiverData(df[:train_size], *common_args)
val_dataset = RiverData(df[train_size: train_size+val_size], *common_args)
test_dataset = RiverData(df[train_size+val_size : len(df)], *common_args)


In [7]:
# Important parameters

BATCH_SIZE = 512 # keep as big as can be handled by GPU and memory
SHUFFLE = False # we don't shuffle the time series data
DATA_LOAD_WORKERS = 1 # it depends on amount of data you need to load


In [8]:
from torch.utils.data import DataLoader

common_args = {'batch_size': BATCH_SIZE, 'shuffle': SHUFFLE}
train_loader = DataLoader(train_dataset, **common_args)
val_loader = DataLoader(val_dataset, **common_args)
test_loader = DataLoader(test_dataset, **common_args)

### Here we define our pytorch model.

BasicMLPNetwork is the model class. It extends the Module class provided by PyTorch.
- We define the \_\_init__() function, which sets up the layers and defines the model parameters.
- We also define the forward() function, which specifies how the forward pass is computed.

In [9]:
# Here we are adding dropout layers.

class BasicMLPNetwork(torch.nn.Module):
    
    def __init__(self, seq_len, pred_len, num_features, dropout):
        # call the constructor of the base class
        super().__init__()
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.num_features = num_features
        hidden_size_time = 256
        hidden_size_feat = 128
        # define layers for combining across time series
        self.fc1 = torch.nn.Linear(self.seq_len, hidden_size_time)
        self.relu = torch.nn.ReLU()
        self.dropout1 = torch.nn.Dropout(p=dropout)
        self.fc2 = torch.nn.Linear(hidden_size_time, self.pred_len)
        self.dropout2 = torch.nn.Dropout(p=dropout)

        # define layers for combining across the features
        self.fc3 = torch.nn.Linear(self.num_features, hidden_size_feat)
        self.fc4 = torch.nn.Linear(hidden_size_feat, 1)

    def forward(self, x):

        # computation over time
        out = self.fc1(x)
        out = self.relu(out)
        out = self.dropout1(out)
        out = self.fc2(out)
        out = self.relu(out) # dimension: batch x num_features x pred_len (512 x 7 x 1 here)
        out = self.dropout2(out)
        # computation over features
        out = out.transpose(1,2) # dimension: batch x pred_len x num_features (512 x 1 x 7)
        out = self.fc3(out) # dimension: batch x pred_len x hidden_size_feat (512 x 1 x 128)
        out = self.relu(out)
        out = self.fc4(out) # dimension: batch x pred_len x 1 (512 x 1 x 1)

        out = out.squeeze(-1) # dimension: batch x pred_len (512 x 1)
        
        return out

# Note that the gradients are stored inside the FC layer objects (on their parameters).
# They accumulate across backward passes, so we need to reset them after every training step.
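
With the values used in this notebook (batch size 512, 7 features, seq_len 13, pred_len 1), the tensor shapes through forward() can be traced by hand. The sketch below does that bookkeeping in plain Python, without PyTorch. `linear_shape` and `trace_shapes` are hypothetical helpers that rely only on the fact that a Linear layer changes the last dimension.

```python
def linear_shape(shape, in_features, out_features):
    """Shape after a Linear layer: only the last dimension changes."""
    if shape[-1] != in_features:
        raise ValueError(f"expected last dimension {in_features}, got {shape[-1]}")
    return shape[:-1] + (out_features,)


def trace_shapes(batch, num_features, seq_len, pred_len,
                 hidden_size_time=256, hidden_size_feat=128):
    """List (step, shape) pairs for the forward pass; ReLU and dropout keep the shape."""
    trace = []
    shape = (batch, num_features, seq_len)
    trace.append(("input", shape))
    shape = linear_shape(shape, seq_len, hidden_size_time)
    trace.append(("fc1", shape))
    shape = linear_shape(shape, hidden_size_time, pred_len)
    trace.append(("fc2", shape))
    shape = (shape[0], shape[2], shape[1])  # swap dimensions 1 and 2
    trace.append(("transpose(1, 2)", shape))
    shape = linear_shape(shape, num_features, hidden_size_feat)
    trace.append(("fc3", shape))
    shape = linear_shape(shape, hidden_size_feat, 1)
    trace.append(("fc4", shape))
    shape = shape[:-1]  # drop the trailing dimension of size 1
    trace.append(("squeeze(-1)", shape))
    return trace


trace = trace_shapes(batch=512, num_features=7, seq_len=13, pred_len=1)
for step, shape in trace:
    print(f"{step:16s} {shape}")
```

The final shape, (512, 1), matches the shape of the labels produced by the data loader, which is what MSELoss expects.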

In [10]:
loss = torch.nn.MSELoss()

In [11]:
for i, (f,l) in enumerate(train_loader):
    print('features shape: ', f.shape)
    print('labels shape: ', l.shape)
    break

features shape:  torch.Size([512, 7, 13])
labels shape:  torch.Size([512, 1])


In [12]:
# define metrics
import numpy as np
epsilon = np.finfo(float).eps

def Wape(y, y_pred):
    """Weighted Average Percentage Error, as a percentage (0 is perfect; it can exceed 100)"""
    y = np.array(y)
    y_pred = np.array(y_pred)
    numerator = np.sum(np.abs(np.subtract(y, y_pred)))
    denominator = np.add(np.sum(np.abs(y)), epsilon)
    wape = np.divide(numerator, denominator) * 100.0
    return wape

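def nse(y, y_pred):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches predicting the mean of y, negative is worse"""
    y = np.array(y)
    y_pred = np.array(y_pred)
    return (1-(np.sum((y_pred-y)**2)/np.sum((y-np.mean(y))**2)))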


def evaluate_model(model, data_loader):
    # The following line puts the model in evaluation mode. It disables dropout and makes batch normalization
    # layers use their running statistics. Our model has dropout layers, so this matters here.

    model.eval()
    all_inputs = torch.empty((0, num_features, seq_len))
    all_labels = torch.empty(0, pred_len)
    for inputs, labels in data_loader:
        all_inputs = torch.vstack((all_inputs, inputs))
        all_labels = torch.vstack((all_labels, labels))
    
    with torch.no_grad():
        all_inputs = all_inputs.to(device)
        outputs = model(all_inputs).detach().cpu()
        avg_val_loss = loss(outputs, all_labels)
        nsee = nse(all_labels.numpy(), outputs.numpy())
        wapee = Wape(all_labels.numpy(), outputs.numpy())
        
    print(f'NSE : {nsee}', end=' ')
    print(f'WAPE : {wapee}', end=' ')
    print(f'Validation Loss: {avg_val_loss}')
    model.train()
    return avg_val_loss.item()  # return a plain Python float rather than a 0-dim tensor
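
A small worked example may help when reading the NSE and WAPE values in the training logs. The sketch below re-implements both metrics in plain Python (without the epsilon guard used above) and evaluates them on a made-up series; the numbers are illustrative only.

```python
def wape(y, y_pred):
    """Sum of absolute errors divided by the sum of absolute targets, in percent."""
    numerator = sum(abs(a - b) for a, b in zip(y, y_pred))
    denominator = sum(abs(a) for a in y)
    return 100.0 * numerator / denominator


def nse(y, y_pred):
    """1 minus the ratio of squared error to the variance of y around its mean."""
    mean_y = sum(y) / len(y)
    squared_error = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    total_variation = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - squared_error / total_variation


y = [1.0, 2.0, 3.0, 4.0]
perfect = list(y)      # exact predictions
mean_only = [2.5] * 4  # always predict the mean of y
too_high = [6.0] * 4   # a constant far above every target

print(nse(y, perfect), wape(y, perfect))      # 1.0 0.0
print(nse(y, mean_only), wape(y, mean_only))  # 0.0 40.0
print(nse(y, too_high), wape(y, too_high))    # negative NSE, WAPE above 100
```

An NSE close to 0 therefore means the model is doing no better than predicting the mean of the targets.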


In [13]:
def objective(trial):
    # Here we define the search space of the hyperparameters. By default Optuna uses the TPE sampler, a form of Bayesian optimization, to propose values.
    # suggest_float replaces suggest_loguniform / suggest_uniform, which are deprecated since Optuna 3.0.
    learning_rate = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-5, 1e-2, log=True)
    dropout_p = trial.suggest_float('dropout_p', 0.0, 0.5)
    
    model = BasicMLPNetwork(seq_len, pred_len, num_features, dropout_p)
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate, weight_decay=weight_decay)
    
    num_epochs = 15
    best_val_loss = float('inf')
    patience = 2
    epochs_no_improve = 0 # initialize so the counter exists even if the first validation loss is NaN
    
    for epoch in range(num_epochs):
        model.train()
        epoch_loss = []
        for batch_idx, (inputs, labels) in enumerate(train_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            loss_val = loss(outputs, labels)
    
            # calculate gradients for back propagation
            loss_val.backward()
    
            # update the weights based on the gradients
            optimizer.step()
    
            # reset the gradients, avoid gradient accumulation
            optimizer.zero_grad()
            epoch_loss.append(loss_val.item())
    
        avg_train_loss = sum(epoch_loss)/len(epoch_loss)
        print(f'Epoch {epoch+1}: Training Loss: {avg_train_loss}', end=' ')
        avg_val_loss = evaluate_model(model, val_loader)
    
        # Check for improvement
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            epochs_no_improve = 0
            # Save the best model
            torch.save(model.state_dict(), 'best_model_trial.pth')
        else:
            epochs_no_improve += 1
            if epochs_no_improve == patience:
                print('Early stopping!')
                # Load the best model before stopping
                model.load_state_dict(torch.load('best_model_trial.pth'))
                break

        # Report intermediate objective value
        trial.report(best_val_loss, epoch)

        # Handle pruning based on the intermediate value
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return best_val_loss

study = optuna.create_study(direction='minimize')

# In practice you would typically run hundreds of trials.
study.optimize(objective, n_trials=20)

print('Number of finished trials:', len(study.trials))
print('Best trial:')
trial = study.best_trial

print('  Value (Best Validation Loss):', trial.value)
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')


[I 2024-11-19 16:24:36,665] A new study created in memory with name: no-name-c27094dc-e043-46e1-8cf0-9e6ea119370a


Epoch 1: Traning Loss: 0.01916072037986686 NSE : -0.01572859287261963 WAPE : 43.30573329081579 Validation Loss: 0.013396549969911575
Epoch 2: Traning Loss: 0.017773940703744107 NSE : -0.016893386840820312 WAPE : 43.37999627116872 Validation Loss: 0.013411909341812134
Epoch 3: Traning Loss: 0.01730379935794654 NSE : 0.04941713809967041 WAPE : 41.90851795757358 Validation Loss: 0.012537333182990551
Epoch 4: Traning Loss: 0.012876993881213486 NSE : 0.46912115812301636 WAPE : 26.004010730044726 Validation Loss: 0.0070018162950873375
Epoch 5: Traning Loss: 0.010350052168867831 NSE : 0.5256786644458771 WAPE : 23.640776635479252 Validation Loss: 0.006255871616303921
Epoch 6: Traning Loss: 0.010038072848067488 NSE : 0.5434865951538086 WAPE : 22.64961717295897 Validation Loss: 0.0060210018418729305
Epoch 7: Traning Loss: 0.009996265862074574 NSE : 0.550794780254364 WAPE : 22.102456457603743 Validation Loss: 0.00592461321502924
Epoch 8: Traning Loss: 0.00992068236900006 NSE : 0.5583174526691437 

[I 2024-11-19 16:26:11,321] Trial 0 finished with value: 0.005485587287694216 and parameters: {'lr': 0.0004182462427868436, 'weight_decay': 4.137577281552585e-05, 'dropout_p': 0.1917303324665699}. Best is trial 0 with value: 0.005485587287694216.


NSE : 0.5798713862895966 WAPE : 20.21062141987742 Validation Loss: 0.005541119258850813




Epoch 1: Traning Loss: 1.9483975153900808 NSE : -1.075688362121582 WAPE : 59.20074934453626 Validation Loss: 0.027376465499401093
Epoch 2: Traning Loss: 0.025183055634241575 NSE : -0.298305869102478 WAPE : 52.95254598555531 Validation Loss: 0.017123490571975708
Epoch 3: Traning Loss: 0.020061713079656256 NSE : -0.03237128257751465 WAPE : 44.239370239889965 Validation Loss: 0.013616050593554974
Epoch 4: Traning Loss: 0.018798368520825327 NSE : -0.011872172355651855 WAPE : 43.04839156786664 Validation Loss: 0.013345684856176376
Epoch 5: Traning Loss: 0.019059648001902494 NSE : -0.00027191638946533203 WAPE : 41.82888482744228 Validation Loss: 0.013192687183618546
Epoch 6: Traning Loss: 0.019261704693351492 NSE : -0.0056732892990112305 WAPE : 41.00424910032945 Validation Loss: 0.013263927772641182
Epoch 7: Traning Loss: 0.01940465645982584 

[I 2024-11-19 16:26:54,189] Trial 1 finished with value: 0.013192687183618546 and parameters: {'lr': 0.07743412458199854, 'weight_decay': 1.4788727760527778e-05, 'dropout_p': 0.13864611869658144}. Best is trial 0 with value: 0.005485587287694216.


NSE : -0.02276754379272461 WAPE : 40.6339296808751 Validation Loss: 0.01348938513547182
Early stopping!
Epoch 1: Traning Loss: 0.02436667114211834 NSE : -0.0013695955276489258 WAPE : 41.300613729844756 Validation Loss: 0.013207166455686092
Epoch 2: Traning Loss: 0.01986140956037253 NSE : -0.007388949394226074 WAPE : 42.70969303034743 Validation Loss: 0.01328655332326889
Epoch 3: Traning Loss: 0.020174183384687232 

[I 2024-11-19 16:27:12,493] Trial 2 finished with value: 0.013207166455686092 and parameters: {'lr': 0.04085243035135183, 'weight_decay': 5.1814023066376746e-05, 'dropout_p': 0.44841376827852025}. Best is trial 0 with value: 0.005485587287694216.


NSE : -0.01131594181060791 WAPE : 43.009027742701036 Validation Loss: 0.01333835069090128
Early stopping!
Epoch 1: Traning Loss: 0.02021454844366753 NSE : -0.33572936058044434 WAPE : 44.23828475506935 Validation Loss: 0.01761707291007042
Epoch 2: Traning Loss: 0.0149258001646552 NSE : 0.17974317073822021 WAPE : 39.98269333548321 Validation Loss: 0.010818450711667538
Epoch 3: Traning Loss: 0.012396582317907337 NSE : 0.49580204486846924 WAPE : 25.607819679916165 Validation Loss: 0.006649917922914028
Epoch 4: Traning Loss: 0.01194335750285411 NSE : 0.5093318223953247 WAPE : 24.05533910709437 Validation Loss: 0.006471472792327404
Epoch 5: Traning Loss: 0.011576157782051788 NSE : 0.5556380748748779 WAPE : 23.356867807913044 Validation Loss: 0.005860734730958939
Epoch 6: Traning Loss: 0.011657671941969318 NSE : 0.44455528259277344 WAPE : 29.799771142707872 Validation Loss: 0.0073258159682154655
Epoch 7: Traning Loss: 0.011231317609898596 

[I 2024-11-19 16:27:55,357] Trial 3 finished with value: 0.005860734730958939 and parameters: {'lr': 0.0038696869690797446, 'weight_decay': 0.00012241976861373418, 'dropout_p': 0.21898644223042274}. Best is trial 0 with value: 0.005485587287694216.


NSE : 0.48120635747909546 WAPE : 28.915960128778867 Validation Loss: 0.00684242183342576
Early stopping!
Epoch 1: Traning Loss: 0.016473092762253328 NSE : -0.0011849403381347656 WAPE : 42.03434146690564 Validation Loss: 0.013204730115830898
Epoch 2: Traning Loss: 0.017385200417644325 NSE : -4.184246063232422e-05 WAPE : 41.72054089654064 Validation Loss: 0.013189652934670448
Epoch 3: Traning Loss: 0.017307955806843637 NSE : -5.8531761169433594e-05 WAPE : 41.57488137941884 Validation Loss: 0.013189874589443207
Epoch 4: Traning Loss: 0.017264709648544957 

[I 2024-11-19 16:28:19,786] Trial 4 finished with value: 0.013189652934670448 and parameters: {'lr': 0.0003027194626768554, 'weight_decay': 0.0008800559582517779, 'dropout_p': 0.28778323694491004}. Best is trial 0 with value: 0.005485587287694216.


NSE : -0.00030875205993652344 WAPE : 41.47719865495885 Validation Loss: 0.01319317426532507
Early stopping!
Number of finished trials: 5
Best trial:
  Value (Best Validation Loss): 0.005485587287694216
  Params:
    lr: 0.0004182462427868436
    weight_decay: 4.137577281552585e-05
    dropout_p: 0.1917303324665699
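
The "Early stopping!" lines in the log come from the patience counter inside `objective`. In isolation, that rule can be sketched as follows. The helper `early_stop_epoch` is hypothetical, and the input values are the validation losses of trial 4 from the log above, rounded to seven decimals. The epoch cap and the Optuna pruning step are left out.

```python
def early_stop_epoch(val_losses, patience):
    """Return (best_loss, stop_epoch); stop_epoch is 1-based, or None if never triggered."""
    best_loss = float('inf')
    epochs_no_improve = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_loss:
            # improvement: remember it and reset the counter
            best_loss = val_loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve == patience:
                return best_loss, epoch
    return best_loss, None


trial_4_losses = [0.0132047, 0.0131897, 0.0131899, 0.0131932]
best_loss, stop_epoch = early_stop_epoch(trial_4_losses, patience=2)
print(best_loss, stop_epoch)  # 0.0131897 4
```

The best loss is reached at epoch 2, the next two epochs do not improve on it, and training stops at epoch 4, which agrees with the value reported for trial 4.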


In [21]:
# plotly is required for optuna.visualization; nbformat is also needed to display figures inside a Jupyter notebook
# We don't go into the details here, but many other plots are available.
# source: https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/005_visualization.html

import optuna.visualization as vis

# Optimization history
fig1 = vis.plot_optimization_history(study)
fig1.write_html("optimization_history_mlp.html")

In [19]:
study.trials

[FrozenTrial(number=0, state=TrialState.COMPLETE, values=[0.005485587287694216], datetime_start=datetime.datetime(2024, 11, 19, 16, 24, 36, 665900), datetime_complete=datetime.datetime(2024, 11, 19, 16, 26, 11, 321492), params={'lr': 0.0004182462427868436, 'weight_decay': 4.137577281552585e-05, 'dropout_p': 0.1917303324665699}, user_attrs={}, system_attrs={}, intermediate_values={0: 0.013396549969911575, 1: 0.013396549969911575, 2: 0.012537333182990551, 3: 0.0070018162950873375, 4: 0.006255871616303921, 5: 0.0060210018418729305, 6: 0.00592461321502924, 7: 0.005825395230203867, 8: 0.005693289451301098, 9: 0.005661779083311558, 10: 0.005582711659371853, 11: 0.005526938010007143, 12: 0.005526938010007143, 13: 0.005485587287694216, 14: 0.005485587287694216}, distributions={'lr': FloatDistribution(high=0.1, log=True, low=0.0001, step=None), 'weight_decay': FloatDistribution(high=0.01, log=True, low=1e-05, step=None), 'dropout_p': FloatDistribution(high=0.5, log=False, low=0.0, step=None)}, 