<a href="https://colab.research.google.com/github/terezaif/amld-pytorch-workshop/blob/master/4-Modules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch Modules

### What is this notebook about?

In this notebook, we will learn about PyTorch modules and the functionality they provide. Later on, we'll create a small multilayer perceptron to perform image classification on MNIST.

___

## Google Colab only!

In [0]:
# execute only if you're using Google Colab
!wget -q https://raw.githubusercontent.com/ahug/amld-pytorch-workshop/master/binder/requirements.txt -O requirements.txt
!pip install -qr requirements.txt

____

In [2]:
import torch
import torch.nn as nn

print("Torch version:", torch.__version__)

Torch version: 1.0.0


In [0]:
import matplotlib.pyplot as plt

In PyTorch, there are many predefined layers such as Convolution, RNN, Pooling, Linear, etc.

These layers are wrapped in **modules** that inherit from the **torch.nn.Module** base class.

When designing a custom model in PyTorch, you should follow this strategy and derive your class from **torch.nn.Module**.

## Modules

In [4]:
print(torch.nn.Module.__doc__)

Base class for all neural network modules.

    Your models should also subclass this class.

    Modules can also contain other Modules, allowing to nest them in
    a tree structure. You can assign the submodules as regular attributes::

        import torch.nn as nn
        import torch.nn.functional as F

        class Model(nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.conv1 = nn.Conv2d(1, 20, 5)
                self.conv2 = nn.Conv2d(20, 20, 5)

            def forward(self, x):
               x = F.relu(self.conv1(x))
               return F.relu(self.conv2(x))

    Submodules assigned in this way will be registered, and will have their
    parameters converted too when you call :meth:`to`, etc.
    


### Modules do a lot of "magic" under the hood.

- They register all the parameters of your model.
- They simplify saving and loading your model.
- They provide helper functions to reset, freeze, and update the gradients.
- They provide helper functions to move all parameters to a device (e.g. a GPU).
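
The points above can be tried out on a tiny model. The following is a minimal sketch, assuming a hypothetical two-layer `nn.Sequential` named `net` that is not part of the workshop code; any `nn.Module` should behave the same way.

```python
import torch
import torch.nn as nn

# A small stand-in model (hypothetical; chosen only for illustration)
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# 1) Registered parameters are discovered automatically
param_names = [name for name, _ in net.named_parameters()]
print(param_names)  # ['0.weight', '0.bias', '2.weight', '2.bias']

# 2) state_dict() collects them in a dictionary, which is what gets saved and loaded
state_keys = list(net.state_dict().keys())

# 3) Gradient helpers: run a backward pass, then reset the gradients
net(torch.rand(5, 4)).sum().backward()
has_grad_after_backward = net[0].weight.grad is not None
net.zero_grad()

# Freeze the first layer so that it is excluded from gradient computation
for p in net[0].parameters():
    p.requires_grad_(False)
trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']

# 4) Move every parameter to a device with a single call
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = net.to(device)
```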

### What is a torch.nn.Parameter?

A Parameter is a Tensor with `requires_grad` set to `True` by default. It is automatically added to the module's list of parameters when it is assigned as an attribute of a module.

Let's have a look at the documentation ([torch.nn.Parameter](https://pytorch.org/docs/stable/_modules/torch/nn/parameter.html))

In [0]:
print(torch.nn.Parameter.__doc__)

In [5]:
mod = nn.Conv1d(10, 2, 3)
print(mod.weight)

Parameter containing:
tensor([[[ 0.0924,  0.0328, -0.1005],
         [-0.0339, -0.0312, -0.0266],
         [ 0.0562, -0.1205, -0.1652],
         [-0.0178, -0.0233,  0.1687],
         [-0.1185, -0.1457,  0.1599],
         [-0.1652, -0.1771,  0.1152],
         [-0.0165,  0.1109,  0.0133],
         [ 0.1574, -0.1611,  0.0971],
         [ 0.0605,  0.1632, -0.0926],
         [-0.0165,  0.1370,  0.0801]],

        [[ 0.1369,  0.1705,  0.0673],
         [ 0.1659, -0.0706, -0.0932],
         [ 0.1545, -0.1621, -0.0136],
         [ 0.1234, -0.0541, -0.0975],
         [ 0.0446, -0.0070,  0.1597],
         [ 0.1148,  0.0827, -0.1733],
         [ 0.0719, -0.0418,  0.0889],
         [-0.1656,  0.1242,  0.1478],
         [-0.1533, -0.0688,  0.0158],
         [-0.1719, -0.0966, -0.0085]]], requires_grad=True)
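
To see the difference between a Parameter and a plain tensor, here is a minimal sketch. The class `ScaleAndShift` and its attribute names are hypothetical and only serve as an illustration: the attribute wrapped in `nn.Parameter` is registered, while the plain tensor is not.

```python
import torch
import torch.nn as nn


class ScaleAndShift(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.shift = torch.zeros(3)               # plain tensor: NOT registered

    def forward(self, x):
        return x * self.scale + self.shift


layer = ScaleAndShift()
registered = [name for name, _ in layer.named_parameters()]
print(registered)                 # ['scale']
print(layer.scale.requires_grad)  # True
print(layer.shift.requires_grad)  # False
```

An optimizer built from `layer.parameters()` would therefore only update `scale`.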


___

## Very simple example of a module

A module has to implement the `forward` function, which is executed during the forward pass.

All your model's submodules and parameters should be instantiated in the `__init__` function. This way, PyTorch knows that they exist and registers them.

In [0]:
# A simple module
class MySuperSimpleModule(nn.Module):
    def __init__(self, input_size, num_classes):
        super(MySuperSimpleModule, self).__init__()
        self.linear = nn.Linear(input_size, num_classes)
    
    def forward(self, x):
        out = self.linear(x)
        return out
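
Registration only works when submodules are assigned as attributes (or stored in a PyTorch container). The sketch below uses two hypothetical classes to illustrate a common pitfall: layers kept in a regular Python list are invisible to PyTorch, whereas `nn.ModuleList` registers them.

```python
import torch.nn as nn


class PlainListModel(nn.Module):  # hypothetical, shows the pitfall
    def __init__(self):
        super().__init__()
        # A regular Python list hides the layers from PyTorch
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]


class ModuleListModel(nn.Module):  # hypothetical, shows the fix
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer it contains
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])


n_hidden = len(list(PlainListModel().parameters()))
n_registered = len(list(ModuleListModel().parameters()))
print(n_hidden, n_registered)  # 0 4
```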

You can use the print function to list a model's submodules/parameters:

In [0]:
model = MySuperSimpleModule(input_size=20, num_classes=5)
print(model)

You can use **`model.parameters()`** to get the list of parameters of your model automatically inferred by PyTorch.

In [0]:
for name, p in model.named_parameters():  # Here we use a slightly different version of the parameters() function
    print(name, ":\n", p)                 # which also returns the parameter name
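
The same iterator can be used to count parameters. As a minimal sketch, assume a stand-in `nn.Linear(20, 5)` named `linear_model` (a hypothetical name), which holds the same parameters as the single layer inside `MySuperSimpleModule(input_size=20, num_classes=5)`:

```python
import torch.nn as nn

# Stand-in for the single linear layer of the simple module (hypothetical name)
linear_model = nn.Linear(20, 5)

shapes = {name: tuple(p.shape) for name, p in linear_model.named_parameters()}
n_params = sum(p.numel() for p in linear_model.parameters())
print(shapes)    # {'weight': (5, 20), 'bias': (5,)}
print(n_params)  # 105 = 20 * 5 weights + 5 biases
```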

___

## Simple network for image classification

![We need to go deeper](https://github.com/terezaif/amld-pytorch-workshop/blob/master/figures/deeper.jpeg?raw=1)

## Your turn!

### Let's create a more complicated model.

Implement a simple multilayer perceptron with two hidden layers and the following structure:

![](https://raw.githubusercontent.com/ledell/sldm4-h2o/master/mlp_network.png)

- Input-size: *input_size*
- 1st hidden layer: 75
- 2nd hidden layer: 50
- Output layer: *num_classes*

Additionally, we use `ReLU`s as activation functions.

You will need some PyTorch NN modules - Find them in the [PyTorch doc](https://pytorch.org/docs/master/nn.html) (especially nn.Linear)!

In [0]:
from torch.nn import Parameter
import torch.nn.functional as F  # provides functional helpers such as relu, sigmoid, tanh, etc.


class MyMultilayerPerceptron(nn.Module):
    def __init__(self, input_size, num_classes):
        super(MyMultilayerPerceptron, self).__init__()
        
        self.input_size = input_size
        self.num_classes = num_classes
        
        self.linear_1 = nn.Linear(input_size, 75)
        self.linear_2 = nn.Linear(75, 50)
        self.linear_3 = nn.Linear(50, num_classes)
        
    
    def forward(self, x):
        out = F.relu(self.linear_1(x))
        out = F.relu(self.linear_2(out))
        out = self.linear_3(out)  # no activation here: nn.CrossEntropyLoss expects raw scores (logits)
        return out
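
For a purely sequential architecture like this one, `nn.Sequential` offers a more compact alternative. This is a sketch of an equivalent network, not the workshop's reference solution; the name `mlp` and the MNIST-sized values are assumptions for illustration.

```python
import torch
import torch.nn as nn

input_size, num_classes = 784, 10  # MNIST-sized values, chosen for illustration

mlp = nn.Sequential(
    nn.Linear(input_size, 75),
    nn.ReLU(),
    nn.Linear(75, 50),
    nn.ReLU(),
    nn.Linear(50, num_classes),  # raw scores (logits), no activation
)

n_params = sum(p.numel() for p in mlp.parameters())
out_shape = tuple(mlp(torch.rand(2, input_size)).shape)
print(n_params)   # 63185
print(out_shape)  # (2, 10)
```

Writing a custom class remains the more flexible option once the forward pass needs branching or multiple inputs.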

### Print your network's parameters

In [0]:
model = MyMultilayerPerceptron(784, 10)
print(model)

### Feed an input to your network

In [0]:
x = torch.rand(16, 784)  # the first dimension is reserved for the 'batch_size'
out = model(x)  # equivalent to model.forward(x)
out[0, :]
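
The network returns one raw score per class for every sample in the batch. The sketch below, which assumes a stand-in `nn.Linear(784, 10)` in place of the full network, shows how such scores could be turned into probabilities and predicted classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Linear(784, 10)  # stand-in for the network (hypothetical)
x = torch.rand(16, 784)   # batch of 16 flattened inputs

logits = net(x)                   # raw scores, one row per sample
probs = F.softmax(logits, dim=1)  # normalize each row into probabilities
predicted = probs.argmax(dim=1)   # most likely class per sample

print(logits.shape)      # torch.Size([16, 10])
print(probs.sum(dim=1))  # every entry is 1 up to floating point error
print(predicted.shape)   # torch.Size([16])
```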

___

## Training a model

Most functions that train a model follow a similar pattern in PyTorch.
In most cases, it consists of the following steps:
- Loop over data (in batches)
- Forward pass
- Zero gradients!
- Backward pass
- Parameter update (Optimizer)

In [0]:
def train(model, num_epochs, data_loader, device):
    model = model.to(device)
    
    # Define the Loss function and Optimizer that you want to use
    criterion = nn.CrossEntropyLoss()  
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # NOTE: model.parameters()
    
    # Outer training loop
    for epoch in range(num_epochs):
        # Inner training loop
        cum_loss = 0
        for (inputs, labels) in data_loader:
            # Prepare inputs and labels for processing by the model (e.g. reshape, move to device, ...)
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            # original shape is [batch_size, 1, 28, 28]: one grayscale channel per 28x28 image
            inputs = inputs.view(-1, 28*28)

            # Do Forward -> Loss Computation -> Backward -> Optimization
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            cum_loss += loss.item()
        print("Epoch %d, Loss=%.4f" % (epoch+1, cum_loss/len(data_loader)))

Note:
- we can call `.to` on the model directly. Since PyTorch knows all the model's parameters, it can move all of them to the requested device.
- we use `model.parameters()` to get all the parameters of the model, so we can instantiate an optimizer that will optimize these parameters, e.g. `torch.optim.Adam(model.parameters(), lr=1e-3)`.
- to apply the forward function of the module, we write `model(inputs)`. In most cases, `model.forward(inputs)` would also work, but there is a slight difference: PyTorch allows you to register hook functions on a module, which are called automatically when you do a forward pass. Calling `model(inputs)` runs these hooks along with the forward function, while calling `model.forward(inputs)` silently skips them.
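
The difference between the two call styles can be observed with a forward hook. In this minimal sketch, the layer, the hook function `log_output_shape`, and the list `calls` are hypothetical names introduced for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
calls = []


def log_output_shape(module, inputs, output):
    # Forward hooks receive the module, its inputs, and its output
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(log_output_shape)

x = torch.rand(4, 3)
layer(x)          # goes through __call__: the hook runs
layer.forward(x)  # bypasses __call__: the hook is skipped
print(calls)      # [(4, 2)]

handle.remove()   # detach the hook when it is no longer needed
```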

Do you feel the power of Modules?

## Loss functions

PyTorch comes with a lot of predefined loss functions:
- L1Loss
- MSELoss
- CrossEntropyLoss
- NLLLoss
- PoissonNLLLoss
- KLDivLoss
- BCELoss
- MarginRankingLoss
- HingeEmbeddingLoss
- MultiLabelMarginLoss
- CosineEmbeddingLoss
- TripletMarginLoss
- ...

Check out the [PyTorch Documentation](https://pytorch.org/docs/master/nn.html#loss-functions).
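
Some of these losses are closely related. As a small sketch on random data (the tensors `logits` and `targets` are made up for illustration), `CrossEntropyLoss` applied to raw scores gives the same value as `NLLLoss` applied to log-softmax outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # raw scores for 4 samples and 10 classes
targets = torch.tensor([3, 0, 9, 1])    # made-up class labels

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True
```

This is why a network trained with `CrossEntropyLoss` should output raw scores rather than probabilities.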

___

## Let's train our model on the MNIST digit classification task


![MNIST](https://github.com/terezaif/amld-pytorch-workshop/blob/master/figures/mnist.jpeg?raw=1)

First, we have to load the training and test images. MNIST is a widely used dataset, therefore the torchvision package provides simple functionalities to load images from it.

In [8]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms

batch_size = 64

# MNIST Dataset (Images and Labels)
train_dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

# Dataset Loader (Input Batcher)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
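
To get a feeling for what a `DataLoader` yields without downloading anything, here is a sketch that assumes random tensors shaped like MNIST images. The names `fake_images`, `fake_labels`, and `loader` are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for 150 grayscale 28x28 images and their labels
fake_images = torch.rand(150, 1, 28, 28)
fake_labels = torch.randint(0, 10, (150,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64, shuffle=False)

batch_sizes = []
for images, labels in loader:
    flat = images.view(-1, 28 * 28)  # flatten each image into a vector of 784 values
    batch_sizes.append(flat.shape[0])

print(len(loader))  # 3 batches
print(batch_sizes)  # [64, 64, 22]: the last batch is smaller
```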


### Call the actual training function

In [9]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MyMultilayerPerceptron(input_size=784, num_classes=10)
num_epochs = 5

train(model, num_epochs, train_loader, device)

Epoch 1, Loss=0.3763
Epoch 2, Loss=0.1661
Epoch 3, Loss=0.1195
Epoch 4, Loss=0.0937
Epoch 5, Loss=0.0767


### How can we now assess the model's performance?

This function loops over another `data_loader` (usually containing test/validation data) and computes the model's accuracy on it.

In [0]:
def accuracy(model, data_loader, device):
    with torch.no_grad(): # during model evaluation, we don't need the autograd mechanism (speeds things up)
        correct = 0
        total = 0
        for inputs, labels in data_loader:
            inputs = inputs.to(device)     
            inputs = inputs.view(-1, 28*28)
            
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            
            correct += (predicted.cpu() == labels).sum().item()
            total += labels.size(0)
            
    acc = correct / total
    return acc

In [11]:
accuracy(model, test_loader, device)  # look at: accuracy(model, train_loader, device)

0.9704
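
A possible variant of the accuracy function is sketched below. It is an assumption-laden alternative, not the workshop's version: the name `accuracy_eval_mode` is hypothetical, it calls `model.eval()` (which matters for layers such as dropout or batch normalization, neither of which is used by the perceptron above), and it keeps the labels on the same device as the predictions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy_eval_mode(model, data_loader, device):  # hypothetical variant of accuracy()
    model.eval()  # switch layers such as dropout and batch norm to evaluation behaviour
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in data_loader:
            inputs = inputs.to(device).view(-1, 28 * 28)
            labels = labels.to(device)
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Random data and an untrained stand-in model: the value only demonstrates the call
data = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))
acc = accuracy_eval_mode(nn.Linear(784, 10), DataLoader(data, batch_size=32), torch.device("cpu"))
print(acc)
```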

### We get an accuracy of ~97.0%, can we do better?

____

## How can we now store our trained model?

In [12]:
torch.save(model, "my_model.pt")


In [0]:
my_model_loaded = torch.load("my_model.pt")

In [14]:
model.linear_3.bias, my_model_loaded.linear_3.bias

(Parameter containing:
 tensor([-0.1698,  0.1200,  0.0855, -0.1010,  0.0383,  0.1774, -0.1378, -0.1402,
          0.0647, -0.1008], requires_grad=True), Parameter containing:
 tensor([-0.1698,  0.1200,  0.0855, -0.1010,  0.0383,  0.1774, -0.1378, -0.1402,
          0.0647, -0.1008], requires_grad=True))
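
Saving the whole module pickles the model object together with a reference to its class, so loading requires the class definition to be available. A commonly used alternative is to save only the `state_dict`. The sketch below assumes a stand-in `nn.Linear(50, 10)` and a hypothetical temporary file name.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(50, 10)  # stand-in for the trained model (hypothetical)

# Save only the parameters, not the whole Python object
path = os.path.join(tempfile.mkdtemp(), "my_model_state.pt")  # hypothetical file name
torch.save(net.state_dict(), path)

# The architecture has to be rebuilt before the parameters can be restored
restored = nn.Linear(50, 10)
restored.load_state_dict(torch.load(path, map_location="cpu"))
print(torch.equal(net.bias, restored.bias))  # True
```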

____

## Don't forget to download the notebook, otherwise your changes may be lost!

![Download the notebook](https://github.com/terezaif/amld-pytorch-workshop/blob/master/figures/notebook-download.png?raw=1)