# Introduction to PyTorch and deep learning
    Thomas Moreau <thomas.moreau@inria.fr>
    Mathurin Massias <mathurin.massias@inria.fr>

In this notebook, you will get familiar with the deep learning framework `PyTorch`. A __deep learning framework__ is a library designed to build deep learning models __easily__, __quickly__ and __efficiently__. It usually provides modules (corresponding to different types of layers) to assemble models, an automatic differentiation engine that computes gradients for you, and a collection of optimization algorithms. Several deep learning frameworks are available today; the most widely used include:

 * PyTorch
 * Keras-Tensorflow

## Table of contents

[1. Defining a neural network in PyTorch](#basics)<br>
- [1.1 Tensors](#tutorials)<br>
- [1.2 Definition using Sequential](#NNseq)<br>
- [1.3 Definition using a custom class](#NNcustom)<br>
- [1.4 Definition using a custom class without relying on PyTorch modules](#NNcustom+)<br>


[2. Training a neural network in PyTorch](#NNtraining)<br>
- [2.1 Loading OpenML data](#Loaddata)<br>
- [2.2 The training loop](#TrainNNPytorch)<br>

[3. Using weights and biases](#wandb)<br>

[4. Implementing a ResNet for tabular data](#resnet)<br>

**Import the following libraries and check that PyTorch is running on your computer.**

In [4]:
from __future__ import print_function
from __future__ import division
import os
import torch
from torch import nn
import torch.nn.functional as F
import time
import matplotlib.pyplot as plt
import torch.optim as optim
import openml
from torch.utils.data import Dataset, DataLoader, TensorDataset
import numpy as np
import pandas
from sklearn.model_selection import train_test_split
import wandb

<a id='basics'></a>
# 1 - Defining a neural network in PyTorch

There are several possibilities to define a neural network architecture in PyTorch. In this part we will review three of them, from the simplest (and least flexible) to the most advanced (and most flexible).

The PyTorch documentation is clear and well written. Do not hesitate to use it extensively to better understand the various objects used in this notebook.

<a id='tutorials'></a>

## 1.1 - Tensors

PyTorch uses tensors to store model inputs, outputs and parameters. A tensor is a multi-dimensional array that resembles a NumPy array, and it is used in much the same way. The main difference is that, for tensors that require gradients, PyTorch keeps track of the operations performed to obtain a given tensor, so that it can perform back-propagation. For a review of basic tensor operations (addition, multiplication, ...), please refer to the [PyTorch Blitz tutorial](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) and notably its section on [tensors](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py).
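
A minimal sketch of this tracking on a toy tensor (the values are arbitrary): a tensor created with `requires_grad=True` records the operations applied to it, and calling `backward()` on a scalar result stores the gradient in its `.grad` attribute.

```python
import torch

# A leaf tensor whose operations PyTorch will record
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar computed from w: f(w) = sum_i w_i^2
f = (w ** 2).sum()
print(f.grad_fn)  # the last recorded operation, e.g. <SumBackward0 object at 0x...>

# Back-propagation computes df/dw = 2 * w and stores it in w.grad
f.backward()
print(w.grad)  # tensor([2., 4., 6.])

# A plain tensor (requires_grad=False by default) is not tracked
x = torch.ones(3)
print((x * 2).grad_fn)  # None
```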

<a id='NNseq'></a>
## 1.2 - Definition using Sequential

Using `Sequential` is the simplest way of defining a neural network in PyTorch. Here is an example:

In [5]:
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.Sigmoid(),
    nn.Linear(256, 10),
    nn.Softmax(dim=1)
)

The documentation for `Sequential` is [here](https://pytorch.org/docs/master/generated/torch.nn.Sequential.html#torch.nn.Sequential).
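
`Sequential` simply applies its submodules one after the other, in the order they were given. A small sketch on a toy network (sizes chosen arbitrarily, unrelated to the questions below) checks that calling the container is the same as chaining its layers by hand, and that layers can be accessed by index:

```python
import torch
from torch import nn

torch.manual_seed(0)

# A small toy network (sizes chosen arbitrarily for illustration)
toy = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 2),
)

x = torch.randn(5, 4)  # a batch of 5 samples with 4 features each

# Calling the Sequential container...
out = toy(x)

# ...is the same as applying each submodule in order
h = x
for layer in toy:
    h = layer(h)

print(out.shape)               # torch.Size([5, 2])
print(torch.allclose(out, h))  # True
print(toy[0])                  # Linear(in_features=4, out_features=3, bias=True)
```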

**Q1 - What kind of neural network is it?**

**Q2 - What is the appropriate data input size for such a network?**

**Q3 - Is it suited for a regression task, a binary classification task or a multiclass classification task? What does `dim=1` mean in the `Softmax`?**

**Q4 - Count the number of parameters of this network. Check this programmatically with `model.parameters()`.**

_Solution:_ `solutions/01_pytorch_q04.py`

<a id='NNcustom'></a>
## 1.3 - Definition using a custom class

All PyTorch modules (layers or entire architectures) are implemented as subclasses of `nn.Module` with two specific methods: `__init__` and `forward`. Below is an implementation of the same architecture as above, using a custom class with these methods.

In [6]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)
        
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x):
        x = self.hidden(x)
        x = self.sigmoid(x)
        x = self.output(x)
        x = self.softmax(x)
        
        return x
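
`nn.Module` discovers the parameters of a custom class through the attributes assigned in `__init__`: a layer stored directly as an attribute (like `self.hidden` above) is registered automatically, but layers kept in a plain Python list are not. A minimal sketch with toy layer sizes, using `nn.ModuleList` as the registered alternative (the class names are illustrative):

```python
import torch
from torch import nn


class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers stored in a Python list are NOT registered as submodules
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class RegisteredList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer, so its parameters are visible
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def n_params(module):
    # Total number of scalar parameters returned by module.parameters()
    return sum(p.numel() for p in module.parameters())


print(n_params(PlainList()))       # 0: an optimizer would have nothing to train
print(n_params(RegisteredList()))  # 4*4 + 4 + 4*2 + 2 = 30
```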

Check the source code of a few PyTorch modules to see how they implement these methods, for example the source code of the [`Linear` layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear).

**Q5 - Access the weights and biases of each layer, and print their sizes.**
You can use the PyTorch documentation for the `Linear` layer to understand how to access them.

_Solution:_ `solutions/01_pytorch_q05.py`

<a id='NNcustom+'></a>
## 1.4 - Definition using a custom class without relying on PyTorch modules

It is also possible, and rather easy, in PyTorch to define a neural network without using existing modules for the layers (such as `nn.Linear` above). This flexibility is particularly useful when one wants to experiment with new types of layers, or more generally with an architecture that cannot be written with existing modules. It is one of the reasons why PyTorch is popular among researchers. However, defining a neural network in this way requires you to create the learnable parameters yourself, initialize them, and register them so that PyTorch can track them.

**Q6 - Bonus question: Implement the same feedforward neural network as above, but this time without using the `nn.Linear` module**.

**Hints -** Use:
- `torch.empty` to create parameters
- `kaiming_uniform_` to initialize weight matrices
- `uniform_` to initialize vectors
- `torch.nn.Parameter` to make parameters learnable
- `matmul`, `sigmoid` and `softmax` for the forward pass

_Solution:_ `solutions/01_pytorch_q06.py`

_Note: this is an exercise, i.e., this network could very well be implemented with `nn.Linear`, but we implement it without `nn.Linear` for learning purposes. In practice, you should use existing layers whenever nothing prevents you from doing so._
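
The role of `torch.nn.Parameter` can be seen on a toy module unrelated to the exercise (the `Scale` module below is hypothetical): only tensors wrapped in `nn.Parameter` are returned by `parameters()`, and therefore updated by an optimizer. Initializers such as `nn.init.uniform_` modify the tensor in place.

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical toy module: multiplies each feature by a learnable factor."""

    def __init__(self, d):
        super().__init__()
        # Wrapped in nn.Parameter: registered and learnable
        self.weight = nn.Parameter(torch.empty(d))
        nn.init.uniform_(self.weight, a=0.5, b=1.5)  # in-place initialization
        # Plain tensor attribute: NOT registered as a parameter
        self.offset = torch.zeros(d)

    def forward(self, x):
        return x * self.weight + self.offset


m = Scale(3)
print([name for name, _ in m.named_parameters()])  # ['weight']

# Gradients flow to the registered parameter
m(torch.ones(2, 3)).sum().backward()
print(m.weight.grad)  # tensor([2., 2., 2.])
```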

In [7]:
class My_network(nn.Module):
    def __init__(self, d_in, d_h1, d_out):
        super().__init__()
        # Create the parameters of the network
        W_hidden = ...
        b_hidden = ...
        W_output = ...
        b_output = ...
        
        # Initialize the parameters with nn.init.kaiming_uniform_ (weight
        # matrices) and nn.init.uniform_ (bias vectors), as in the hints.
        # One could have chosen another type of initialization
        ...
        
        # Make tensors learnable parameters with torch.nn.Parameter
        ...
        
    def forward(self, x):
        """
        Parameters:
        ----------
        x: tensor, shape (batch_size, d_in)
        """
        # Compute the forward pass

        return h

Check on simulated data below that the forward pass works.

In [None]:
mu = np.zeros(784)
Sigma = np.eye(784)
X = torch.tensor(np.random.multivariate_normal(mu, Sigma, size=10), dtype=torch.float)
model = My_network(d_in=784, d_h1=256, d_out=10)
pred = model(X)
print(pred[0])

# Also check that you can compute the gradient of the model:
torch.sum(pred).backward()
assert next(model.parameters()).grad is not None

<a id='NNtraining'></a>
# 2 - Training a neural network in PyTorch

Up to now, we have learned how to define a neural network in PyTorch, but not how to train it. Let's take the covertype data set and train a feedforward neural network on it.

<a id='Loaddata'></a>
## 2.1 - Loading OpenML data

Load the covertype dataset. Each sample describes a tile of land through cartographic variables, and the goal is to predict the type of forest that covers it; in the version used here, the target has two classes.

In [None]:
task = openml.tasks.get_task(361061) 
dataset = task.get_dataset()

X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format="array", target=dataset.default_target_attribute
)

Create a train/validation/test split.

In [10]:
X_train_val, X_test, y_train_val, y_test = train_test_split(
     X, y, test_size=20000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
     X_train_val, y_train_val, test_size=20000, random_state=0)

In [11]:
print(f'Shape of X_train: {X_train.shape}')
print(f'Shape of X_val: {X_val.shape}')
print(f'Shape of X_test: {X_test.shape}')

Shape of X_train: (526602, 10)
Shape of X_val: (20000, 10)
Shape of X_test: (20000, 10)


PyTorch provides tools to make data loading efficient and readable. In particular, it provides the `torch.utils.data.DataLoader` and `torch.utils.data.Dataset` classes (see the doc [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)), which allow you to load the data in batches, optionally using multiprocessing, along with useful options such as data shuffling or applying transformations to the data.

Loading the data in batches is essential when large datasets (for example, image datasets) do not fit in memory. It also speeds up debugging and development iterations, since one does not have to wait for the whole dataset to be loaded each time the code is launched.
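
A map-style `Dataset` only needs to implement `__len__` and `__getitem__`; a `DataLoader` then takes care of batching and shuffling. A minimal sketch on synthetic NumPy arrays (the `ArrayDataset` class is hypothetical; below, the notebook uses the built-in `TensorDataset` instead):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ArrayDataset(Dataset):
    """Hypothetical map-style dataset wrapping NumPy arrays."""

    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        # Number of samples
        return len(self.X)

    def __getitem__(self, idx):
        # One (features, label) pair; a transformation could be applied here
        return self.X[idx], self.y[idx]


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 10))
y_toy = rng.integers(0, 2, size=100)

loader = DataLoader(ArrayDataset(X_toy, y_toy), batch_size=32, shuffle=True)
xb_toy, yb_toy = next(iter(loader))
print(xb_toy.shape, yb_toy.shape)  # torch.Size([32, 10]) torch.Size([32])
```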

To be able to use the `DataLoader` utility, we convert our data to `TensorDataset` objects.

In [12]:
trainset = TensorDataset(
    torch.tensor(X_train, dtype=torch.float),
    torch.tensor(y_train, dtype=torch.long)
)
valset = TensorDataset(
    torch.tensor(X_val, dtype=torch.float),
    torch.tensor(y_val, dtype=torch.long)
)
testset = TensorDataset(
    torch.tensor(X_test, dtype=torch.float),
    torch.tensor(y_test, dtype=torch.long)
)
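
A `TensorDataset` wraps tensors that share the same size along their first dimension; indexing it returns the corresponding row of each tensor as a tuple. A toy sketch with made-up values:

```python
import torch
from torch.utils.data import TensorDataset

features = torch.arange(12, dtype=torch.float).reshape(6, 2)  # 6 samples, 2 features
labels = torch.tensor([0, 1, 0, 1, 1, 0])

ds = TensorDataset(features, labels)
print(len(ds))  # 6
x0, y0 = ds[0]  # indexing returns one row of each tensor
print(x0, y0)   # tensor([0., 1.]) tensor(0)
```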

`DataLoader` can then be used:

In [13]:
train_loader = DataLoader(trainset, batch_size=64, shuffle=True)
test_loader = DataLoader(testset, batch_size=20000)
val_loader = DataLoader(valset, batch_size=20000)
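
Note that `len()` of a `DataLoader` is its number of batches, not its number of samples; with `batch_size=20000`, `val_loader` yields a single batch containing the whole validation set. A toy sketch with 130 random samples shows how the last, incomplete batch is handled:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

ds = TensorDataset(torch.randn(130, 10), torch.zeros(130, dtype=torch.long))

# 130 samples with batch_size=64 give ceil(130 / 64) = 3 batches
loader = DataLoader(ds, batch_size=64, shuffle=True)
sizes = [xb.shape[0] for xb, _ in loader]
print(len(loader), sizes)  # 3 [64, 64, 2]

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(ds, batch_size=64, shuffle=True, drop_last=True)
print(len(loader_drop))  # 2
```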

The covertype data set is now loaded and ready to be used to train a neural network.

<a id='TrainNNPytorch'></a>
## 2.2 - The training loop

**Q7 - Create a neural network of your choice with the class `nn.Sequential` that is appropriate for the covertype dataset.** Store the model into a variable called `model`.

You can use a `LogSoftmax` output layer, which produces the log-probabilities expected by the negative log-likelihood loss `nn.NLLLoss` (see the documentation [here](https://pytorch.org/docs/stable/nn.html#nllloss)).
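
A quick check on random scores (toy values, not covertype data) that `LogSoftmax` followed by `NLLLoss` computes the average of `-log p(true class)`, which is also what `nn.CrossEntropyLoss` computes directly from raw scores:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 2)            # raw scores: 4 samples, 2 classes
targets = torch.tensor([0, 1, 1, 0])  # class indices (dtype torch.long)

log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

# CrossEntropyLoss applies LogSoftmax internally, so it takes raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# NLLLoss averages -log_probs[i, targets[i]] over the batch
manual = -log_probs[torch.arange(4), targets].mean()

print(torch.allclose(nll, ce), torch.allclose(nll, manual))  # True True
```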

_Solution:_ `solutions/01_pytorch_q07.py`

In [None]:
model = ...

Now that our model is defined, we have to define a loss (doc [here](https://pytorch.org/docs/stable/nn.html#loss-functions)) and an optimizer (doc [here](https://pytorch.org/docs/stable/optim.html)).

In [None]:
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
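
With the default settings (no momentum, no weight decay), one call to `step()` of `optim.SGD` updates each parameter as `p <- p - lr * p.grad`. A toy check on a single linear layer with a mean squared error loss (names prefixed with `toy_` so as not to overwrite the `model` and `optimizer` defined above):

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
toy_layer = nn.Linear(3, 1)
toy_opt = optim.SGD(toy_layer.parameters(), lr=0.1)  # plain SGD, no momentum

x_toy, target_toy = torch.randn(8, 3), torch.randn(8, 1)

# Copy the parameters before the update
before = [p.detach().clone() for p in toy_layer.parameters()]

toy_opt.zero_grad()
loss_toy = nn.MSELoss()(toy_layer(x_toy), target_toy)
loss_toy.backward()
toy_opt.step()

# Each parameter moved by -lr times its gradient: p_new = p_old - lr * grad
for p, p_old in zip(toy_layer.parameters(), before):
    print(torch.allclose(p.detach(), p_old - 0.1 * p.grad))  # True
```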

Let's create an evaluation function that will compute the validation loss for our model.

**Q8.1 - Can you explain the purpose of `torch.no_grad()` and of `model.eval()` / `model.train()`?**



In [None]:
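def evaluate_model(model, loader=None):
    # By default, evaluate on the validation set
    if loader is None:
        loader = val_loader

    loss_sum = 0.
    n_correct = 0
    n_samples = 0
    # Compute validation loss and accuracy
    with torch.no_grad():
        model.eval()
        for xb_val, yb_val in loader:
            prob_val = model(xb_val)
            batch_size = len(yb_val)
            # criterion returns the mean loss of the batch: weight it by the batch size
            loss_sum += criterion(prob_val, yb_val).item() * batch_size

            y_pred = torch.argmax(prob_val, dim=1)
            n_correct += (y_pred == yb_val).sum().item()
            n_samples += batch_size
        model.train()

    # Average over samples (exact even if the last batch is smaller)
    val_loss = loss_sum / n_samples
    val_accuracy = n_correct / n_samples
    return val_loss, val_accuracy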

**Q8 - Train the network you have created for a small number of epochs (for example, 10).**

**Hint -** See [here](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#train-the-network) in the PyTorch Blitz tutorial for an example of how to train a network.

_Solution:_ `solutions/01_pytorch_q08.py`

In [None]:
def train(model, optimizer, criterion, n_epochs=10, verbose=True, use_wandb=False):
    # `use_wandb` is only needed for Q9 (logging the metrics to WandB)

    # Loop over epochs
    for e in range(n_epochs):

        # Perform the training steps with the optimizer and store the
        # average training loss of the epoch in `training_loss`
        ...

        val_loss, val_accuracy = evaluate_model(model)
        if verbose:
            print("Epoch number", e+1)
            print("------------")
            print(f"Training loss: {training_loss}")
            print(f"Val loss: {val_loss}")
            print(f"Val Accuracy: {val_accuracy}\n")


In [None]:
train(model, optimizer, criterion)

If you want to understand more about what happens when calling `loss.backward()` and `optimizer.step()`, the following cell prints a few intermediate quantities between these steps.

In [None]:
optimizer = optim.SGD(model.parameters(), lr=0.01)
first_weight = next(model.parameters())  # weight matrix of the first layer

for u, (xb, yb) in enumerate(train_loader):
    if u >= 2:  # only inspect the first two minibatches
        break
    print('\nMinibatch number', u, '\n')

    # Training pass
    optimizer.zero_grad()
    prob = model(xb)

    print('The output probabilities for the first 2 samples are\n', torch.exp(prob[0:2]))
    print('The labels for the first 2 samples are', yb[0:2], '\n')

    loss = criterion(prob, yb)

    print('A few coefs of the model parameters are\n', first_weight[0:3, 0:3].detach(), '\n')
    grad = first_weight.grad  # may be None after zero_grad()
    print('Before backward step, the gradient of the loss wrt these coefs is\n',
          None if grad is None else grad[0:3, 0:3], '\n')

    loss.backward()
    print('After backward step, the gradient of the loss wrt these coefs is\n',
          first_weight.grad[0:3, 0:3], '\n')
    optimizer.step()
    print('After optimizer step, these coefs of the model parameters are\n',
          first_weight[0:3, 0:3].detach(), '\n')

<a id='wandb'></a>
# 3 - Using weights and biases

Weights and Biases (WandB) is a platform that eases model development. Today we will focus on:
* Experiment tracking: it lets you track and visualize various metrics in real time (for example, the training and validation losses).
* Hyperparameter optimization: hyperparameters are key to the performance of deep learning models, but tuning them can be computationally intensive and painful. WandB provides an interface to run hyperparameter sweeps and easily visualize the effect of the hyperparameters.

WandB is free, but you will need to create an account to use it.

**Q9 - Following the instructions provided [here](https://docs.wandb.ai/quickstart), modify your code above to record and visualize in WandB the following metrics: training loss, validation loss and validation accuracy.**

_Solution:_ `solutions/01_pytorch_q09.py`

In [None]:
lr = 0.001
n_epochs = 10
weight_decay = 0
config = {
  "lr": lr,
  "n_epochs": n_epochs,
  "weight_decay": weight_decay
}

# Init a WandB run. The logs will be recorded in the MLP_covertype project as
# run `MLP_covertype1`, and associated with the hyperparameters in `config`.
wandb.init(project="MLP_covertype", name="MLP_covertype1", config=config)

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 2),
    nn.LogSoftmax(dim=1)
)
optimizer = optim.SGD(model.parameters(), lr=wandb.config.lr, weight_decay=wandb.config.weight_decay)
criterion = nn.NLLLoss()

train(model, optimizer, criterion, n_epochs=wandb.config.n_epochs, verbose=False, use_wandb=True)
wandb.finish()

**Q10 - Following the instructions provided [here](https://docs.wandb.ai/ref/python/sweep#examples), launch a sweep (hyperparameter search) and visualize the results.**

_Note:_ consider the following sweep parameters:

```
'parameters': {
    'lr': {'max': 0.1, 'min': 0.0001},
    'weight_decay': { "values": [0, 1e-5, 1e-4]}
}
```

_Solution:_ `solutions/01_pytorch_q10.py`

In [None]:
# Step 1: Define a training function that takes the hyperparameter
# values from `wandb.config`, uses them to train a model and logs the metrics
def main():
    ...

In [None]:
# Step 2: Define sweep config
sweep_configuration = ...

In [None]:
# Step 3: Initialize sweep by passing in config and start the sweep with `wandb.agent`
...

<a id='resnet'></a>
# 4 - Implementing a Resnet for tabular data

The ResNet architecture presented in [Revisiting Deep Learning Models for Tabular Data](https://arxiv.org/pdf/2106.11959.p) is one of the state-of-the-art architectures for tabular data. It is described in Section 3.2 of the paper.

**Q11 - Implement the ResNet architecture and train it on the covertype data.**

Hints:
* You may want to code the ResNetBlock first and check that its forward pass works, before using it to build the complete ResNet architecture.
* To start with, do not hesitate to code a simplified version of the network (for example without BatchNorm, Dropout, ...). Add components little by little once your simplified version works.
* We are in a classification setting: do not forget the final `LogSoftmax`, since we train with `nn.NLLLoss`!
* Choose the size of the hidden layers as you like. Note however that the ResNetBlock needs to output a vector of the same size as its input.
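
The shape constraint in the last hint comes from the skip connection: a residual block returns `x + f(x)`, which is only defined if `f(x)` has the same shape as `x`. A minimal sketch of a generic residual wrapper (the `Residual` class and the choice of `f` are illustrative assumptions, not the block described in the paper):

```python
import torch
from torch import nn


class Residual(nn.Module):
    """Generic skip connection: returns x + f(x)."""

    def __init__(self, f):
        super().__init__()
        self.f = f  # any module mapping (batch, d) -> (batch, d)

    def forward(self, x):
        return x + self.f(x)


d, d_hidden = 10, 32  # illustrative sizes
block = Residual(nn.Sequential(
    nn.Linear(d, d_hidden),  # expand to a hidden size of your choice...
    nn.ReLU(),
    nn.Linear(d_hidden, d),  # ...then project back to d so the sum is defined
))

x_toy = torch.randn(8, d)
print(block(x_toy).shape)  # torch.Size([8, 10])

# If f is the zero map, the block reduces to the identity
zero_block = Residual(nn.Linear(d, d))
nn.init.zeros_(zero_block.f.weight)
nn.init.zeros_(zero_block.f.bias)
print(torch.equal(zero_block(x_toy), x_toy))  # True
```

The identity behaviour of the last check is one reason residual blocks are easy to stack: a block that has learned nothing useful can simply pass its input through.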

_Solution:_ `solutions/01_pytorch_q11.py`

In [None]:
# Test the forward pass of the entire ResNet (the class you defined in Q11)
xb, yb = next(iter(train_loader))  # one minibatch of training data
resnet = ResNet(d=X.shape[1], d_out=2, dropout_rate=0.5, n_resnet_blocks=2)
pred = resnet(xb)

In [None]:
criterion = nn.NLLLoss()
optimizer = optim.SGD(resnet.parameters(), lr=0.01)
train(resnet, optimizer, criterion, n_epochs=3)