<p align="center">
    <img src="https://drive.google.com/uc?id=1DvKhAzLtk-Hilu7Le73WAOz2EBR5d41G" width="400"/>
</p>

---

<p align="center">
<img src="https://pytorch.org/assets/images/pytorch-logo.png" alt="drawing" width="100"/>
</p>



<h1 style="text-align: center;"> Introduction to PyTorch for Deep Learning
  – Exercises</h1>


#### **Afternoon contents/agenda**

1. Understanding the basics:
- [But what is a convolution?](https://www.youtube.com/watch?v=KuXjwB4LzSA&ab_channel=3Blue1Brown)

- [But what is a neural network?](https://www.youtube.com/watch?v=aircAruvnKk&t=1s&ab_channel=3Blue1Brown)

- [What is backpropagation really doing?](https://www.youtube.com/watch?v=Ilg3gGewQ5U&t=2s&ab_channel=3Blue1Brown)

2. In this exercise we will work with a chest x-ray dataset from [MedMNIST](https://github.com/MedMNIST/MedMNIST) to tackle a reconstruction problem. Bio-engineering datasets often have sparse or missing information that is difficult to avoid because of poor design, unexpected failures or restrictions in acquisition times. Interpolation is a common pre-processing method to estimate missing data, but it fails when the amount of missing information is large. Here we will use a neural network to predict the missing values by learning the distribution of the dataset, as opposed to relying on localised operations.

### 2.0 Some imports and utils

In [None]:
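try:
  from google.colab import drive
  drive.mount('/content/gdrive')  # same mount point as the Google Drive paths used when saving models
except ImportError:
  pass  # not running on Colab, nothing to mount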

In [None]:
!pip install torchsummary progressbar2 livelossplot monai -q

import torch
import torch.nn as nn
import numpy as np
import random
from torch.utils.data import Dataset
from PIL import Image
from torchsummary import summary
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
from livelossplot import PlotLosses

def set_seed(seed):
    """
    Use this to set ALL the random seeds to a fixed value and take out any randomness from cuda kernels
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

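    torch.backends.cudnn.benchmark = False     ## disables the inbuilt cudnn auto-tuner, whose choice of fastest convolution algorithm can vary between runs
    torch.backends.cudnn.deterministic = True  ## only use deterministic cudnn algorithms
    torch.backends.cudnn.enabled   = True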

    return True


def set_device(device="cpu", idx=0):
    if device != "cpu":
        if torch.cuda.device_count() > idx and torch.cuda.is_available():
            print("Cuda installed! Running on GPU {} {}!".format(idx, torch.cuda.get_device_name(idx)))
            device="cuda:{}".format(idx)
        elif torch.cuda.device_count() > 0 and torch.cuda.is_available():
            print("Cuda installed but only {} GPU(s) available! Running on GPU 0 {}!".format(torch.cuda.device_count(), torch.cuda.get_device_name()))
            device="cuda:0"
        else:
            device="cpu"
            print("No GPU available! Running on CPU")
    return device

device = set_device("cuda")



### 2.1  Download and inspect the data using the commands below


In [None]:
!wget https://zenodo.org/record/6496656/files/chestmnist.npz

In [None]:
data = np.load("./chestmnist.npz")
print(data.files)

### 2.3 Create a custom dataset

* Create your own ``Dataset`` derived class that takes as initialisation arguments:
  - ``data_path``, the path to the data
  - a probability for a random mask ``p``,
  - a ``transform`` to be applied to the data,
  - and a ``split`` argument to dictate what part of the data to load (train, validation, test)

* Load the data into an argument ``self.data`` inside the initialisation

* Create a method for your class ``_get_mask``, that generates a binary mask of the size of the sample to randomly erase some data points based on the probability ``p``

* Customise the ``__getitem__`` method so that it loads a sample from ``self.data`` and returns a masked version of the sample together with the original sample (the former will be the input to our network and the latter the target)

* Don't forget to set the built-in method ``__len__`` to the correct size

* Instantiate the class for a training set and a validation set. Plot one input and output for each of these sets
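
The masking step described above can be sketched in isolation. This is only an illustration, not the intended solution: the helper name ``random_keep_mask`` is hypothetical, and it assumes that ``p`` is the probability of *keeping* a pixel (so ``1 - p`` is the probability of erasing it) and that erased pixels are set to 0.

```python
import numpy as np

def random_keep_mask(img_shape, p, rng=None):
    """Hypothetical helper: binary mask where 1 = keep the pixel, 0 = erase it."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: each pixel is kept independently with probability p.
    return (rng.random(img_shape) < p).astype(np.float32)

rng = np.random.default_rng(0)
img = rng.random((28, 28)).astype(np.float32)  # stand-in for one normalised x-ray
mask = random_keep_mask(img.shape, p=0.6, rng=rng)
masked = img * mask                            # erased pixels become 0

print(mask.mean())  # fraction of kept pixels, close to 0.6
```

Inside a ``Dataset``, the same idea would be applied per sample in ``__getitem__``, so that a new mask is drawn every time a sample is loaded.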

In [None]:
class ChestMNIST(Dataset):
    def __init__(self, data_path, split="train", p=0.5, transform=None):
      pass

    def _get_mask(self, img_shape):
      pass

    def __getitem__(self, idx):
      pass

    def __len__(self):
      pass


In [None]:
## Instantiate datasets

## Plots


### 2.4 Modify our ``simpleFFN`` model

* Add two more hidden layers to the model

* Change the size of the output to match the size of the input

* Change the activation of the model to [``Mish``](https://arxiv.org/abs/1908.08681)

* Change the activation of last layer, what should it be?

* Instantiate the model and print a summary
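
As a quick sanity check on the new activation, the sketch below compares ``nn.Mish`` (available in recent PyTorch versions) with the definition from the linked paper, $\text{mish}(x) = x \tanh(\text{softplus}(x))$. The input values are arbitrary and only serve as an illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)  # arbitrary test points
mish = nn.Mish()

# Definition from the Mish paper: x * tanh(softplus(x))
manual = x * torch.tanh(F.softplus(x))

print(torch.allclose(mish(x), manual, atol=1e-6))
```

Unlike ``Sigmoid``, Mish is unbounded above, which is worth keeping in mind when deciding on the activation of the last layer.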


In [None]:
# Modify model
class simpleFFN(nn.Module):
  def __init__(self, input_size, hidden_size_1=100, hidden_size_2=50, output_size=10):
    super(simpleFFN, self).__init__()
    self.hidden_1 = nn.Linear(input_size, hidden_size_1, bias=False)
    self.hidden_2 = nn.Linear(hidden_size_1, hidden_size_2, bias=False)
    self.output = nn.Linear(hidden_size_2, output_size, bias=False)  # use the output_size argument instead of a hard-coded 10
    self.activation = nn.Sigmoid()

  def forward(self, X):
    z1 = self.hidden_1(X.flatten(start_dim=1))
    a1 = self.activation(z1)
    z2 = self.hidden_2(a1)
    a2 = self.activation(z2)
    z3 = self.output(a2)
    a3 = self.activation(z3)
    return a3

In [None]:
# Instantiate and print summary

### 2.5 Prepare parameters and hyperparameters for training

* Set your hyperparameters:
    - seed: 42
    - mask probability: 0.6 (this is a heavily damaged imputation problem! With ``p`` = 0.6 we are only keeping about 60% of the information)
    - learning rate: 1e-2
    - weight decay: 1e-6 (applied in optimiser)
    - batch size: 128
    - number of epochs: 30


* Instantiate ``simpleFFN`` as our model with hidden sizes: 150, 50, 50, 150

* Instantiate ``Adam`` as the optimiser

* Instantiate ``MSELoss`` as a criterion

* Collect any list of transformations you think are appropriate for this problem

* Instantiate the training and validation dataset and create the dataloader for each

* Visualise an input and target batch using ``make_grid``

In [None]:
# Hyperparameters


# Training set up: model, optimiser, criterion


# Transforms, Dataset and dataloader


# Visualise a batch sample



### 2.6 Modify training and validation functions

* Make the necessary modifications to the ``train`` and ``valid`` functions from the lecture to adapt to our reconstruction problem

* Does prediction play a role in this problem?

* Is accuracy a suitable metric?
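
When adapting the functions, note how the lecture code averages the loss: each batch loss is weighted by the batch size and the total is divided by the dataset size. The sketch below, which uses random tensors as stand-ins for model outputs and targets, checks that this recovers the mean-squared error over the whole set even when the last batch is smaller.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.MSELoss()
outputs = torch.rand(10, 1, 28, 28)  # stand-in for reconstructions
targets = torch.rand(10, 1, 28, 28)  # stand-in for original images

total, n_samples = 0.0, 0
for start in range(0, 10, 4):        # batches of size 4, 4 and 2
    out_b = outputs[start:start + 4]
    tgt_b = targets[start:start + 4]
    # .item() converts the loss to a Python float so no graph is kept around
    total += criterion(out_b, tgt_b).item() * out_b.size(0)
    n_samples += out_b.size(0)

batched_mse = total / n_samples
full_mse = criterion(outputs, targets).item()
print(abs(batched_mse - full_mse) < 1e-6)
```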


In [None]:
# Modify functions
from sklearn.metrics import accuracy_score  # needed by the lecture versions below
def train(model, optimizer, criterion, data_loader):
    model.train()
    train_loss, train_accuracy = 0, 0
    for input, target in data_loader:
        input, target = input.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(input)
        loss = criterion(output, target)
        loss.backward()

        train_loss += loss.detach()*input.size(0)  # detach so the running total does not hold on to the computation graph
        pred = output.softmax(dim=1).max(dim=1)[1]
        train_accuracy += accuracy_score(target.cpu().numpy(), pred.detach().cpu().numpy())*input.size(0)

        optimizer.step()

    train_loss = train_loss / len(data_loader.dataset)
    train_accuracy = train_accuracy/len(data_loader.dataset)
    return train_loss, train_accuracy


def valid(model, criterion, data_loader):
    " Equivalent to the training function without any backpropagation or optimisation steps"
    model.eval()
    valid_loss, valid_accuracy = 0, 0
    with torch.no_grad():
        for input, target in data_loader:
            input, target = input.to(device), target.to(device)

            output = model(input)
            loss = criterion(output, target)

            valid_loss += loss*input.size(0)

            pred = output.softmax(dim=1).max(dim=1)[1]

            valid_accuracy += accuracy_score(target.cpu().numpy(), pred.detach().cpu().numpy())*input.size(0)

        valid_loss = valid_loss / len(data_loader.dataset)
        valid_accuracy = valid_accuracy/len(data_loader.dataset)
        return valid_loss, valid_accuracy


### 2.7 Train and validate the model

* Train your model

* Visualise the output of a validation sample during training

* At the end of training, plot 32 reconstructed and target samples from a validation batch

* What do you observe?

* Are the results as expected?


In [None]:
# Train model


In [None]:
# Plot recon and target from valid batch

### 2.8 Save model to disk and load

* ``PyTorch`` stores all the parameters of models and optimizers, their weights and biases, in an easy-to-read dictionary called a "state-dict".

* When we store models and optimizers, we store the state-dict.

* Together with the model definition, we can then restore the model to its state when we stored it to disk.

* Let's look at the contents of the state-dict of both our optimizer and our model:

In [None]:
# Print model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Print optimiser's state_dict
print("Optimiser's state_dict:")
for var_name in optimiser.state_dict():
    print(var_name, "\t", optimiser.state_dict()[var_name])


From Colab (and locally) we can store models to disk using ```torch.save```, passing both a model's ```state_dict()``` and a path where to store it.

In [None]:
#!mkdir '/content/gdrive/My Drive/models'  ## create the directory for storing the model in Google Drive

model_save_name = 'chestmnist_simpleFFN_model.pt'           # .pt and .pth are common file extensions for saving models in pytorch
path = F"/content/gdrive/My Drive/models/{model_save_name}" # use this to store in your Google Drive storage
path = F'./{model_save_name}'                               # use this to store locally (it will be erased once the colab session is over)
torch.save(model.state_dict(), path)

optimiser_save_name = 'chestmnist_simpleFFN_optimiser.pt'
path = F"/content/gdrive/My Drive/models/{optimiser_save_name}"
path = F"./{optimiser_save_name}"
torch.save(optimiser.state_dict(), path)  # save the optimiser's state_dict, not the model's

Finally, we can restore models from the saved ```state_dict```s and do a number of things such as:
1. Use it as a checkpoint and continue training (given we stored the optimizer as well)
2. Make predictions from our model
3. Perform inspections of our model
4. Use our model in ensembles
5. ...

A newly instantiated model is in ```.train()``` mode by default, and loading a ```state_dict``` does not change that. So be careful when using networks that behave differently at training and test time, e.g. dropout-regularized or batch-normalized networks, and call ```.eval()``` before making predictions.
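
The save and restore round trip, and the mode the restored model ends up in, can be checked on a toy network. This is a minimal sketch: ``make_net`` and the temporary file name are hypothetical and stand in for the real model and path.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_net():
    # Hypothetical toy network with dropout, for illustration only.
    return nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 4))

torch.manual_seed(0)
net = make_net()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.pt")
    torch.save(net.state_dict(), path)            # store only the parameters
    restored = make_net()                         # the model definition is still needed
    restored.load_state_dict(torch.load(path, map_location="cpu"))

mode_after_load = restored.training               # still in training mode
restored.eval()                                   # switch dropout off for inference
net.eval()

x = torch.rand(2, 4)
with torch.no_grad():
    same_output = torch.allclose(net(x), restored(x))

print(mode_after_load, restored.training, same_output)  # True False True
```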

In [None]:
model_save_name = 'chestmnist_simpleFFN_model.pt'           # .pt and .pth are common file extensions for saving models in pytorch
path = F"/content/gdrive/My Drive/models/{model_save_name}" # use this to load from your Google Drive storage
path = F'./{model_save_name}'

model = simpleFFN(input_size= 1 * 28 * 28, hidden_size_1=150, hidden_size_2=50, hidden_size_3=50, hidden_size_4=150).to(device) ## creates an instance of the model
model.load_state_dict(torch.load(path, map_location=device)) ## loads the parameters of the model in path onto the current device. state_dict is a dictionary object that maps each layer in the model to its trainable parameters (weights and biases).
model.eval()

valid_loss = valid(model, mseloss, valid_loader)
print("Avg. Valid Loss: %1.3f" % valid_loss.item())

### 2.9 Training with U-Net

* Instantiate a ``U-net`` using the snippet below.

* For now you do not need to understand what a ``U-net`` is or how it works. This will be explored later in the course.

* Print the summary of the model and have a look at what kind of layers it includes. Search for these layers in the ``PyTorch`` documentation to gain a general understanding of their operations.



In [None]:
from monai.networks.nets import UNet
set_seed(42)
model = UNet(spatial_dims=2, in_channels=1, out_channels=1, channels=(8, 8, 8), strides=(1, 1,), act="mish").to(device)
summ = summary(model, input_size=(1, 28, 28))


* Train the model with the same hyperparameters from before. Don't forget to re-initialise the optimiser with the correct model parameters

* As before, visualise some validation samples during training.

* Plot 32 reconstructed and target samples from the validation batch

* Save your final model

* What differences do you observe from training with a simple feed-forward network? Why do you think that is?
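
To see why the optimiser has to be re-initialised, the sketch below uses two small linear layers as hypothetical stand-ins for the old and new models. An optimiser built from the old model's parameters leaves the new model untouched, while a fresh one updates it. The learning rate and weight decay are the values from section 2.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
old_model = nn.Linear(4, 4)  # stand-in for the previous model
new_model = nn.Linear(4, 4)  # stand-in for the newly created model

# Optimiser still tied to the old model's parameters
stale_opt = torch.optim.Adam(old_model.parameters(), lr=1e-2, weight_decay=1e-6)

before = new_model.weight.detach().clone()
loss = new_model(torch.rand(8, 4)).pow(2).mean()
loss.backward()

stale_opt.step()  # only knows about old_model, so new_model is not updated
unchanged = torch.equal(before, new_model.weight)

# Optimiser re-initialised with the new model's parameters
fresh_opt = torch.optim.Adam(new_model.parameters(), lr=1e-2, weight_decay=1e-6)
fresh_opt.step()
changed = not torch.equal(before, new_model.weight)

print(unchanged, changed)  # True True
```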

In [None]:
# Instantiate optimiser


In [None]:
# Train model


In [None]:
# Plot recon and target from valid batch

In [None]:
# Save model