# COMP0188 tutorial

* This tutorial introduces PyTorch, training models with PyTorch, and evaluating models using Weights & Biases, all of which will be critical for the rest of the course.
* Proficiency with Python is expected, as well as familiarity with object-oriented programming in Python. For further information on PyTorch, please refer to https://pytorch.org/tutorials/beginner/basics/intro.html#learn-the-basics.
* An introductory understanding of machine learning is also expected, e.g., dataset splitting and the bias-variance trade-off.

Dataset used in this tutorial: https://www.kaggle.com/datasets/mathchi/diabetes-data-set

To connect the environment to a GPU:
* Select 'Runtime' in the top left
* Select 'Change Runtime Type'
* Select the GPU runtime available

In [None]:
!pip install wandb

In [None]:
from google.colab import drive
drive.mount('/content/drive')

As is, setting gpu=True will run the model on the connected GPU. Note that, because the model is so small, this will actually be slower than running on the CPU. See extended exercise 1.

In [None]:
gpu = False
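
The flag above can also be turned into a device object, so that the same code runs whether or not a GPU is connected. The following is a minimal sketch rather than part of the tutorial code: `pick_device` is a hypothetical helper name, and it assumes that falling back to the CPU is acceptable when no GPU is found.

```python
import torch


def pick_device(use_gpu: bool) -> torch.device:
    """Hypothetical helper: use the GPU only if requested AND available."""
    if use_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device(use_gpu=False)

# .to(device) works for both CPU and GPU devices, unlike .cuda(),
# which raises an error when no GPU is available.
x = torch.ones(2, 3).to(device)
print(x.device)
```

Tensors and models both expose `.to(device)`, so this pattern can replace separate `if gpu:` branches if you prefer.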

In [None]:
import wandb
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import os
from typing import Union, Callable, Tuple, List, Literal
from torch.autograd import Variable
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.model_selection import train_test_split
import random

In [None]:
# Load example dataset
data = pd.read_csv("/content/drive/MyDrive/comp0188/data/diabetes.csv")
print(data.shape)
y_var = "Outcome"
X_vars = [col for col in data.columns if col != y_var]
data.head()

### Tensors

* PyTorch provides 'tensors', which enable efficient linear algebra, automatic differentiation and integration with CUDA
    * tens.T returns the transpose of the matrix
    * Try pushing tens to the GPU with tens.cuda() (this requires a GPU runtime)

In [None]:
tens = torch.tensor(data.values)
print(tens)
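
As a small illustration of the tensor operations mentioned above, the sketch below uses a toy array (an assumption, not the diabetes data) to show the transpose, a matrix product and the dtype inherited from NumPy.

```python
import numpy as np
import torch

# Toy data (not the diabetes dataset): 2 observations, 3 features.
toy = torch.tensor(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))

print(toy.shape)    # shape of the original tensor
print(toy.T.shape)  # shape after transposing

# Matrix product of a (2, 3) tensor with its (3, 2) transpose gives (2, 2).
gram = toy @ toy.T
print(gram)

# Tensors built from float NumPy arrays are float64; nn layers use
# float32 by default, so a conversion with .float() is usually needed.
print(toy.dtype, toy.float().dtype)
```

The dtype point is why the PandasDataset class later in this tutorial calls `.float()` on the tensors it creates.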

### Datasets and dataloaders
PyTorch Datasets and DataLoaders provide a useful API for loading batches of data for deep learning models

##### Dataset
* The 'Dataset' represents the entire training/validation/test data. The \_\_len\_\_ and \_\_getitem\_\_ dunder methods are required for the Dataset class as they, respectively:
    * Define the number of data observations (an observation being, e.g., a single row in a dataset or a single image); and
    * Allow a single data observation to be retrieved
* The Dataset class simplifies managing large and non-standard datasets since, e.g., not all of the data needs to be loaded into RAM at once

##### DataLoader
* The 'DataLoader' handles how a given dataset should be batched. If the output of a dataset.\_\_getitem\_\_ call is a tensor (or a tuple or dictionary of equally shaped tensors), the default DataLoader batching can be used. However, if non-standard types are being used, e.g. variable-length sequences or custom objects, then defining custom batching via the collate_fn argument is useful
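
One situation where custom batching is needed is when observations have different lengths, so they cannot simply be stacked. The sketch below is an illustration under assumptions: `ToySequences` and `pad_collate` are hypothetical names, and zero-padding is just one possible batching strategy.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToySequences(Dataset):
    """Hypothetical dataset whose observations have different lengths."""

    def __init__(self) -> None:
        self._seqs = [
            torch.tensor([1.0, 2.0, 3.0]),
            torch.tensor([4.0]),
            torch.tensor([5.0, 6.0]),
        ]

    def __len__(self) -> int:
        return len(self._seqs)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return self._seqs[idx]


def pad_collate(batch):
    # `batch` is a list of whatever __getitem__ returns.
    lengths = torch.tensor([len(seq) for seq in batch])
    # Pad every sequence with zeros up to the longest one in the batch.
    padded = pad_sequence(batch, batch_first=True, padding_value=0.0)
    return padded, lengths


loader = DataLoader(ToySequences(), batch_size=3, shuffle=False,
                    collate_fn=pad_collate)
padded, lengths = next(iter(loader))
print(padded)
print(lengths)
```

Returning the original lengths alongside the padded batch lets downstream code ignore the padding if it needs to.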

The diabetes dataset used in this tutorial is small and tabular, therefore we'll use the standard DataLoader and define a custom dataset to handle input data which:
* Is a pandas dataframe;
* Has a 1-dimensional dependent variable which does not require processing;
* Has an n-dimensional feature space which requires min-max scaling
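
To make the min-max scaling step concrete, here is a small worked sketch on a toy tensor (the values are made up for illustration). Each column is scaled independently so that its minimum maps to 0 and its maximum maps to 1.

```python
import torch

# Toy feature matrix: 3 observations, 2 features on very different scales.
X = torch.tensor([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 40.0]])

# Column-wise minimum and maximum (dim=0 reduces over the rows).
col_min = X.min(dim=0).values
col_max = X.max(dim=0).values

# Broadcasting applies the per-column statistics to every row.
X_scaled = (X - col_min) / (col_max - col_min)
print(X_scaled)
```

Note that a column with a constant value has `col_max - col_min == 0`, so this formula would produce NaN values for it; such columns need separate handling.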

In [None]:
class PandasDataset(Dataset):
    def __init__(self, X:pd.DataFrame, y:pd.Series)->None:
        # Your code here
        self._X = torch.from_numpy(X.values).float()
        self._X = self.__min_max_norm(self._X)
        self.feature_dim = X.shape[1]
        self._len = X.shape[0]
        self._y = torch.from_numpy(y.values)[:,None].float()
    
    def __len__(self)->int:
        # Your code here
        return self._len
    
    def __getitem__(self, idx:int) -> Tuple[torch.Tensor, torch.Tensor]:
        # Your code here
        return self._y[idx], self._X[idx,:]
        
    def __min_max_norm(self, in_tens:torch.Tensor) -> torch.Tensor:
        # X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
        # Your code here
        _min = in_tens.min(axis=0).values
        _max = in_tens.max(axis=0).values
        in_tens = (in_tens - _min)/(_max - _min)
        return in_tens
        
        

In [None]:
random.seed(0)
np.random.seed(0)
X_train, X_test, y_train, y_test = train_test_split(data[X_vars], data[y_var], test_size=0.1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1)

In [None]:
train_data = PandasDataset(X=X_train, y=y_train)

Take note of how the \_\_getitem\_\_ dunder method enables indexing via \[\] and the \_\_len\_\_ dunder method allows len() to be called on the object

In [None]:
print(train_data[0])
print(len(train_data))

In [None]:
val_data = PandasDataset(X=X_val, y=y_val)
test_data = PandasDataset(X=X_test, y=y_test)

Given that the dataset is small, a large batch size is not required.

In [None]:
batch_size = 2
shuffle = True

train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=shuffle)
val_dataloader = DataLoader(val_data, batch_size=batch_size, shuffle=shuffle)

In [None]:
tmp_loader = DataLoader(train_data, batch_size=2, shuffle=False)

Notice how the dataloader stacks the observations into a batch by adding a new first dimension

In [None]:
print(f"First train example: {train_data[0]} \n with shape {(train_data[0][0].shape, train_data[0][1].shape)}")
print("\n")
print(f"Second train example: {train_data[1]} \n with shape {(train_data[1][0].shape, train_data[1][1].shape)}")
print("\n")
# Use the public iterator protocol rather than the private _next_data method
first_batch = next(iter(tmp_loader))
print(f"First train batch: {first_batch} \n with shape {(first_batch[0].shape, first_batch[1].shape)}")
print("\n")


### PyTorch models
* PyTorch models are developed by subclassing nn.Module. The core requirement for a PyTorch model is defining the forward method, which defines the model's forward pass. The new subclass will most likely make use of other nn.Module subclasses, some of which are:
    * nn.Linear(in_features, out_features) - this defines a single fully connected layer with a given number of input and output features
    * nn.ReLU() - this defines a ReLU non-linear activation function
* Additional functionality from PyTorch that is often used in models includes:
    * nn.ModuleList() - this is the principal way to hold an ordered collection of PyTorch Modules, using an API similar to Python's native list
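
The building blocks listed above can be tried out in isolation. The sketch below uses arbitrary layer sizes chosen for illustration (they are assumptions, not the sizes used in the exercise) to show how shapes change through the layers and how nn.ModuleList registers parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# An ordered collection of modules: 8 inputs -> 4 hidden units -> 1 output.
layers = nn.ModuleList([nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1)])

x = torch.randn(3, 8)  # a batch of 3 observations with 8 features each
for layer in layers:
    x = layer(x)       # each module is called like a function
print(x.shape)

# Parameters of every module in the ModuleList are registered:
# (8*4 weights + 4 biases) + (4*1 weights + 1 bias) = 41
n_params = sum(p.numel() for p in layers.parameters())
print(n_params)
```

Storing the same modules in a plain Python list attribute would not register their parameters with the parent nn.Module, which is why nn.ModuleList is used instead.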

Complete the BinaryClassMLP class such that, given the input dimension, a list of hidden layer sizes and a list of activations, a fully connected MLP is defined.

In [None]:
class BinaryClassMLP(nn.Module):
    
    def __init__(
        self, input_dim:int,  hidden_size:List[int],
        actvtns:List[Union[None, nn.Module]], seed=1
    ) -> None:
        
        super().__init__()
        torch.manual_seed(seed)
        assert len(actvtns) == len(hidden_size)
        self.layers = nn.ModuleList()
        # Your code here
        all_layer_size = [*hidden_size, 1]
        _layer = nn.Linear(
                    in_features=input_dim, 
                    out_features=all_layer_size[0]
        )
        self.__init_linear(_layer)
        self.layers.append(_layer)
        if len(hidden_size) > 0:
            for i in np.arange(1, len(all_layer_size)):
                self.layers.append(actvtns[i-1])
                _layer = nn.Linear(
                        in_features=all_layer_size[i-1], 
                        out_features=all_layer_size[i]
                )
                self.__init_linear(_layer)
                self.layers.append(_layer)
    
    def __init_linear(self, layer):
        nn.init.xavier_normal_(layer.weight)
        nn.init.zeros_(layer.bias)
        
    def forward(self, x:torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return torch.sigmoid(x)
        

In [None]:
mdl = BinaryClassMLP(
    input_dim=train_data.feature_dim, 
    hidden_size=[64], 
    actvtns=[nn.ReLU()]
)

Once the model has been initialised, it can be used to make predictions by calling the model like a function.

In [None]:
print(mdl(train_data[0][1]))
print(train_data[0][0])
print("\n")
print(mdl(train_data[1][1]))
print(train_data[1][0])

Without being trained the model isn't so discriminative!

### Training pipeline

The train_single_epoch function provides an exemplar function that trains an initialised model for a single epoch and returns the batch losses and predictions. Of note:
* model.train(): certain nn.Module functionality, such as dropout, behaves differently during training and evaluation, so we must tell the model that it is being trained
* optimizer.zero_grad(), train_loss.backward() and optimizer.step(): calling train_loss.backward() computes the gradients of the loss with respect to the model parameters and adds ('accumulates') them to any gradients already stored. optimizer.step() then updates the parameters using these gradients. Because gradients accumulate, optimizer.zero_grad() is called at the start of every minibatch to reset them to 0 - _gradient calculations will be covered later in the course!_
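
The accumulation behaviour described above can be seen on a single scalar parameter. This is a minimal sketch with a made-up function (loss = 3w, whose gradient with respect to w is always 3); it resets the gradient by hand, which is what optimizer.zero_grad() does for every parameter it manages.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(w * 3).backward()
first = w.grad.item()

# Second backward pass WITHOUT zeroing: the new gradient is added on top.
(w * 3).backward()
second = w.grad.item()

# Reset the stored gradient, then run backward again.
w.grad.zero_()
(w * 3).backward()
third = w.grad.item()

print(first, second, third)  # 3.0 6.0 3.0
```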

In [None]:
def train_single_epoch(model:nn.Module, data_loader:torch.utils.data.DataLoader, 
                       gpu:bool, optimizer:torch.optim.Optimizer,
                       criterion:nn.Module
                      ) -> Tuple[List[float], List[torch.Tensor]]:
    model.train()
    losses = []
    preds = []
    range_gen = tqdm(
        enumerate(data_loader),
        )
    for i, (y,X) in range_gen:
        
        # Variable wrappers are deprecated: tensors can be passed to the model directly
        if gpu:
            X = X.cuda()
            y = y.cuda()
        
        optimizer.zero_grad()

        # Compute output
        output = model(X)
        # Store the model's predictions (detached from the graph), not the targets
        preds.append(output.detach())
        train_loss = criterion(output, y)
        losses.append(train_loss.item())

        # losses.update(train_loss.data[0], g.size(0))
        # error_ratio.update(evaluation(output, target).data[0], g.size(0))

        try: 
            # compute gradient and do SGD step
            train_loss.backward()
            
            optimizer.step()
        except RuntimeError as e:
            print("Runtime error on training instance: {}".format(i))
            raise e
    return losses, preds

After each epoch, we would like to evaluate the model. Notice:
* model.eval() now tells the model we are evaluating and ensures functionality such as dropout behaves appropriately
* torch.no_grad() tells PyTorch not to track gradients since, in evaluation, we do not update the parameters!

Complete the function to calculate the epoch losses and predictions, taking inspiration from the training function above
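
Both points can be checked on small standalone modules. The sketch below is illustrative only: the dropout probability and layer sizes are arbitrary assumptions and the modules are not part of the tutorial's model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: elements are randomly zeroed and the survivors are
# scaled by 1 / (1 - p) = 2 so the expected value is unchanged.
drop.train()
train_out = drop(x)

# Evaluation mode: dropout is switched off and acts as the identity.
drop.eval()
eval_out = drop(x)

print(train_out.unique())
print(torch.equal(eval_out, x))

# Inside torch.no_grad() no computation graph is built, so the output
# cannot be used to compute gradients.
lin = nn.Linear(4, 1)
with torch.no_grad():
    out = lin(torch.ones(2, 4))
print(out.requires_grad)
```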

In [None]:
def validate(model:nn.Module, data_loader:torch.utils.data.DataLoader,
             gpu:bool, criterion:nn.Module
            ) -> Tuple[List[float], List[torch.Tensor]]:
    
    model.eval()
    losses = []
    preds = []
    with torch.no_grad():
        range_gen = tqdm(
            enumerate(data_loader),
        )
        # Your code here
        for i, (y,X) in range_gen:
        
            # Variable wrappers are deprecated: tensors can be passed to the model directly
            if gpu:
                X = X.cuda()
                y = y.cuda()

            # Compute output
            output = model(X)

            # Logs
            losses.append(criterion(output, y).item())
            preds.append(output)
    return losses, preds


* We can now use the above functions to run a single epoch's worth of training and validation. _Optimisers and learning rates will be covered in the next tutorial. However, please experiment if you wish!_
* nn.BCELoss() is used since we are performing a binary classification task. This is not the only loss function which we can use. _Again, experiment with others if you wish!_
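
For intuition about what nn.BCELoss() computes, the sketch below evaluates binary cross entropy by hand on two made-up predictions and compares it with the built-in loss. With its default settings, nn.BCELoss averages -[y log(p) + (1 - y) log(1 - p)] over the batch.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities and true labels, shaped (batch, 1)
# to match the model output used in this tutorial.
p = torch.tensor([[0.9], [0.2]])
y = torch.tensor([[1.0], [0.0]])

# Binary cross entropy written out term by term, averaged over the batch.
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

# The built-in loss with its default 'mean' reduction.
builtin = nn.BCELoss()(p, y)

print(manual.item(), builtin.item())  # both approximately 0.164
```

Confident, correct predictions (p close to the label) give a loss near 0, while confident, wrong predictions are penalised heavily.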

In [None]:
if gpu:
    mdl.cuda()
optimizer=torch.optim.Adam(mdl.parameters(), lr=0.001)
t_losses, t_preds = train_single_epoch(model=mdl, data_loader=train_dataloader, gpu = gpu, optimizer=optimizer,
                                       criterion=nn.BCELoss())
v_losses, v_preds = validate(model=mdl, data_loader=val_dataloader, gpu = gpu, criterion=nn.BCELoss())
print(np.mean(t_losses))
print(np.mean(v_losses))

The training and validation functions can be incorporated into a single training loop, as shown below

In [None]:
def train(model:nn.Module, train_data_loader:torch.utils.data.DataLoader,
          val_data_loader:torch.utils.data.DataLoader, 
          gpu:bool, optimizer:torch.optim.Optimizer,
          criterion:nn.Module, epochs:int
         ) -> Tuple[List[float], List[float]]:
    
    if gpu:
        model.cuda()
    
    epoch_train_loss = []
    epoch_val_loss = []
    
    for epoch in range(1, epochs+1):
        print("Running training epoch")
        train_loss_val, train_preds =  train_single_epoch(
            model=model, data_loader=train_data_loader, gpu=gpu, 
            optimizer=optimizer, criterion=criterion)
        mean_train_loss = np.mean(train_loss_val)
        epoch_train_loss.append(mean_train_loss)

        print("Running validation")
        val_loss_val, val_preds = validate(
            model=model, data_loader=val_data_loader, gpu=gpu, 
            criterion=criterion)
        mean_val_loss = np.mean(val_loss_val)
        epoch_val_loss.append(mean_val_loss)
            
    return epoch_train_loss, epoch_val_loss

In [None]:
epochs=10
lr = 0.01

In [None]:
mdl = BinaryClassMLP(
    input_dim=train_data.feature_dim, 
    hidden_size=[64], 
    actvtns=[nn.ReLU()]
)
print(mdl)

optimizer=torch.optim.Adam(mdl.parameters(), lr=lr)

epoch_train_loss, epoch_val_loss = train(
    model=mdl, train_data_loader=train_dataloader, val_data_loader=val_dataloader, gpu = gpu, 
    optimizer=optimizer, criterion=nn.BCELoss(), epochs=epochs
)

In [None]:
fig, axis = plt.subplots(1,1, figsize=(12,9))
axis2 = axis.twinx()
lns1 = axis.plot(range(1,epochs+1), epoch_train_loss, label="Train")
lns2 = axis2.plot(range(1,epochs+1), epoch_val_loss, label="Val", c="red")
axis.set_xlabel("Epoch")
axis.set_ylabel("Binary cross entropy loss")
lns = lns1+lns2
labs = [l.get_label() for l in lns]
axis.legend(lns, labs, loc=0)
plt.show()

### Monitoring
* A huge part of developing ML models is experimentation. Tracking these experiments is challenging, so we use tools to help! Weights & Biases is one such tool!
* The previous training loop is updated to log metrics to Weights & Biases, as well as saving the model parameters at each epoch and pushing them to Weights & Biases

Run the cells below and explore Weights & Biases to understand what is being tracked!

In [None]:
def train(model:nn.Module, train_data_loader:torch.utils.data.DataLoader,
          val_data_loader:torch.utils.data.DataLoader, 
          gpu:bool, optimizer:torch.optim.Optimizer,
          criterion:nn.Module, epochs:int
         ) -> Tuple[List[float], List[float]]:
    if gpu:
        model.cuda()
    
    epoch_train_loss = []
    epoch_val_loss = []
    for epoch in range(1, epochs+1):
        print("Running training epoch")
        train_loss_val, train_preds =  train_single_epoch(
            model=model, data_loader=train_data_loader, gpu=gpu, 
            optimizer=optimizer, criterion=criterion)
        mean_train_loss = np.mean(train_loss_val)
        epoch_train_loss.append(mean_train_loss)
        print("Running validation")
        val_loss_val, val_preds = validate(
            model=model, data_loader=val_data_loader, gpu=gpu, 
            criterion=criterion)
        mean_val_loss = np.mean(val_loss_val)
        epoch_val_loss.append(mean_val_loss)
        
        wandb.log({"train_loss": mean_train_loss, "val_loss": mean_val_loss})

        chkp_pth = os.path.join(wandb.run.dir, f"mdl_chkpnt_epoch_{epoch}.pt")
        torch.save(
            {
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }, chkp_pth)
        wandb.save(chkp_pth)
    return epoch_train_loss, epoch_val_loss
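
A checkpoint saved in this dictionary format can later be restored into a freshly initialised model. The sketch below is a minimal illustration under assumptions: it uses a tiny nn.Linear model and a temporary directory instead of the tutorial's model and the wandb run directory, and it stores only the epoch and model state.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 1)  # stand-in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mdl_chkpnt_epoch_1.pt")
    # Save a checkpoint dictionary, as in the training loop.
    torch.save({"epoch": 1, "model_state_dict": model.state_dict()}, path)
    # Load it back while the temporary file still exists.
    checkpoint = torch.load(path)

# A new model with the same architecture starts with different weights...
restored = nn.Linear(3, 1)
# ...until the saved parameters are copied into it.
restored.load_state_dict(checkpoint["model_state_dict"])

print(checkpoint["epoch"])
print(torch.equal(restored.weight, model.weight))
```

If training is to be resumed rather than just evaluated, the optimiser state saved under 'optimizer_state_dict' can be restored in the same way with the optimiser's own load_state_dict method.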

In [None]:
actvtns_lkp = {
    "relu": nn.ReLU()
}
loss_lkp = {
    "bce": nn.BCELoss()
}

In [None]:
wandb.login()

lr = 0.01
hidden_size=[64]
actvtns = ["relu"]
epochs = 10
weight_decay = 0
batch_size = 2
shuffle = True
loss = "bce"

train_data = PandasDataset(X=X_train, y=y_train)
val_data = PandasDataset(X=X_val, y=y_val)

train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=shuffle)
val_dataloader = DataLoader(val_data, batch_size=batch_size, shuffle=shuffle)

config={
    "learning_rate": lr,
    "architecture": f"MLP | {'-'.join([str(h) for h in hidden_size])} | {'-'.join(actvtns)}",
    "epochs": epochs,
    "weight_decay": weight_decay,
    "batch_size": batch_size,
    "shuffle": shuffle,
    "loss": loss
    }

wandb.init(project='diabetes_prediction', config=config)
mdl = BinaryClassMLP(
    input_dim=train_data.feature_dim, 
    hidden_size=hidden_size, 
    actvtns=[actvtns_lkp[act] for act in actvtns]
)
print(mdl)
optimizer=torch.optim.Adam(mdl.parameters(), lr=lr, weight_decay=weight_decay)

epoch_train_loss, epoch_val_loss = train(
    model=mdl, train_data_loader=train_dataloader, val_data_loader=val_dataloader, gpu = gpu, 
    optimizer=optimizer, criterion=loss_lkp[loss], epochs=epochs
)

wandb.finish()

### Extended exercise 1
* Update the Dataset class and train functions to make running the model on a GPU more efficient! _Hint: Front load the data being pushed!_

### Extended exercise 2
* Using Weights & Biases to diagnose model performance, try to develop the best-performing model
* Don't evaluate the model on the test set until you are finished with experimentation

In [None]:
test_dataloader = DataLoader(test_data, batch_size=1, shuffle=False)

In [None]:
# Pass your best-performing model here; mdl is the model trained above
losses, preds = validate(model=mdl, data_loader=test_dataloader, gpu=gpu, criterion=nn.BCELoss())
print(np.mean(losses))