### Reference
https://adventuresinmachinelearning.com/convolutional-neural-networks-tutorial-in-pytorch/ <br>

Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285.
### Building a Convolutional Neural Network (CNN) classifier
**Dataset:** MNIST (0-9 digits) <br>
**CNN architecture:**
1. **input:** 28x28 pixel greyscale image
2. **1st layer: using nn.Sequential object** 
    1. a convolutional layer with 32 output channels (i.e. 32 filters), each with a 5x5 convolutional kernel, stride = 1 (the filter slides one pixel at a time along the x and y axes) and padding = 2: input = 1x28x28, output = 32x28x28
    2. a ReLU activation is applied to the convolution output
    3. a max pooling layer with a 2x2 kernel and stride = 2 (the pooling window slides 2 pixels at a time along x and y): input = 32x28x28, output = 32x14x14
3. **2nd layer: using nn.Sequential object** 
    1. a convolutional layer with 64 output channels (i.e. 64 filters), each with a 5x5 convolutional kernel, stride = 1 and padding = 2: input = 32x14x14, output = 64x14x14
    2. a ReLU activation is applied to the convolution output
    3. a max pooling layer with a 2x2 kernel and stride = 2: input = 64x14x14, output = 64x7x7
4. flattening the output of the previous max pooling layer, i.e. from (64, 7, 7) to a vector of 3136 elements, since 64x7x7 = 3136, then applying dropout
5. **3rd layer:** a fully connected layer with input = 3136 nodes and output = 1000 nodes
6. **4th layer:** a fully connected layer with input = 1000 nodes and output = 10 nodes = the number of classes (the digits 0 to 9)
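
As a quick sanity check of the shapes listed above, the sketch below applies the output-size formula from Dumoulin & Visin (2016), floor((W_in - K + 2P) / S) + 1, stage by stage in plain Python. The helpers `out_size` and `trace_shapes` are hypothetical names introduced only for this illustration; in the notebook, PyTorch computes these sizes itself.

```python
def out_size(w_in: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer: floor((W_in - K + 2P) / S) + 1 (dilation = 1)."""
    return (w_in - kernel + 2 * padding) // stride + 1


def trace_shapes(size: int = 28) -> list:
    """Return (stage name, (channels, height, width)) after each stage of the CNN above."""
    shapes = [("input", (1, size, size))]
    for name, channels in (("layer1", 32), ("layer2", 64)):
        size = out_size(size, kernel=5, stride=1, padding=2)  # 5x5 conv, padding 2 keeps the size
        shapes.append((name + " conv", (channels, size, size)))
        size = out_size(size, kernel=2, stride=2)             # 2x2 max pool, stride 2 halves it
        shapes.append((name + " pool", (channels, size, size)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(f"{name:12s} {shape}")
c, h, w = shapes[-1][1]
print("flattened:", c * h * w)  # number of input nodes of the first fully connected layer
```

Since height and width are equal here, the helper reuses one size for both; for non-square inputs the formula is simply applied to each axis separately.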

**NB:**
- the **nn.Sequential** container lets us create **sequentially ordered layers** in our network: the output of each layer is passed as input to the next one

In [22]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
import torchvision.datasets
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.models import LinearAxis, Range1d
import numpy as np

#### Loading Data
1. **Create the train and test datasets** using torchvision.datasets.MNIST(root = storage location, train = bool, transform = transform object, download = bool):
    1. root = the directory where the MNIST data files are stored (or will be downloaded to)
    2. train = selects either the training split (True, 60,000 images) or the test split (False, 10,000 images)
    3. transform = the transformation(s) applied to each input image. Here we chain two transformations with transforms.Compose:
        1. convert the image into a PyTorch tensor (ToTensor also rescales the pixel values from [0, 255] to [0.0, 1.0])
        2. normalize the tensor using mean = 0.1307 and standard deviation = 0.3081 (the mean and standard deviation of the MNIST training pixels), so that the inputs have roughly zero mean and unit variance; note that this rescales the data but does not make it normally distributed <br> **NB:** if we have several channels, we need to provide the mean and std of each channel in this way:
            1. transforms.Normalize((M1, M2, ... Mn), (Std1, Std2, ... Stdn))
            2. the normalization formula is: input[channel] = (input[channel] - mean[channel]) / std[channel]
    4. download = tells the MNIST dataset class to download the data from an online source if it is not already present in root
2. **Load the dataset** using torch.utils.data.DataLoader, which offers several advantages:
    1. the ability to easily **shuffle** the data
    2. the ability to easily **batch** the data
    3. more efficient data consumption, since the data can be loaded in parallel using **multiprocessing** (num_workers > 0)
    4. a DataLoader is **iterable**, so to **extract the data** we can just **use** standard Python constructs such as a for loop with **enumerate**
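
To make the two transformations concrete, here is a minimal plain-Python sketch of the arithmetic they perform on a few 8-bit pixel values: the [0, 255] to [0.0, 1.0] rescaling done by ToTensor, followed by the per-channel formula above. The helpers `to_tensor_scale` and `normalize` are hypothetical and only mimic the math; they are not the torchvision implementation.

```python
def to_tensor_scale(pixel: int) -> float:
    """Mimic the [0, 255] -> [0.0, 1.0] rescaling that ToTensor applies to 8-bit images."""
    return pixel / 255.0


def normalize(channels, mean, std):
    """Per-channel standardisation: out[c] = (in[c] - mean[c]) / std[c]."""
    return [[(v - m) / s for v in channel] for channel, m, s in zip(channels, mean, std)]


# One greyscale channel with a few example pixels (0 = black, 255 = white)
pixels = [[0, 128, 255]]
scaled = [[to_tensor_scale(p) for p in channel] for channel in pixels]
normed = normalize(scaled, mean=(0.1307,), std=(0.3081,))
print([round(v, 4) for v in normed[0]])
```

A black pixel ends up slightly below zero and a white pixel a little under 3, which is the value range the network actually sees.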

      

In [5]:
# Hyperparameters
num_epochs = 6
num_classes = 10
batch_size = 100
learning_rate = 0.001

# directory where torchvision stores (and, if needed, downloads) the MNIST data
DATA_PATH = './data/MNIST'
# directory where the trained CNN model parameters will be saved
# when training is complete
MODEL_STORE_PATH = './models/'

In [6]:
# transforms to apply to the data
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root=DATA_PATH, train=True, transform=trans, download=True)
test_dataset = torchvision.datasets.MNIST(root=DATA_PATH, train=False, transform=trans)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST\MNIST\raw\train-images-idx3-ubyte.gz
Extracting ./data/MNIST\MNIST\raw\train-images-idx3-ubyte.gz to ./data/MNIST\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST\MNIST\raw\train-labels-idx1-ubyte.gz
Extracting ./data/MNIST\MNIST\raw\train-labels-idx1-ubyte.gz to ./data/MNIST\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST\MNIST\raw\t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data/MNIST\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST\MNIST\raw\t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data/MNIST\MNIST\raw
Processing...
Done!


In [7]:
# Split into batches
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
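
Conceptually, shuffling and batching amount to permuting the sample indices once per epoch and slicing them into chunks of batch_size. The pure-Python sketch below illustrates only that idea; `iterate_minibatches` is a hypothetical helper, not how torch.utils.data.DataLoader is implemented (it has no worker processes, collation into tensors, etc.).

```python
import random


def iterate_minibatches(dataset, batch_size, shuffle=True, seed=None):
    """Yield lists of items in mini-batches, optionally in a shuffled order.

    Like DataLoader's default (drop_last=False), a final incomplete batch is kept.
    """
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # a new order on every call unless a seed is fixed
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# MNIST has 60,000 training images; with batch_size = 100 this gives 600 batches per epoch,
# which is the total_step value printed as "Step [.../600]" during training.
fake_train = list(range(60_000))  # stand-in for the real dataset
batches = list(iterate_minibatches(fake_train, batch_size=100, seed=0))
print(len(batches), len(batches[0]))
```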

#### Creating the CNN
- The most straightforward way of creating a neural network structure in PyTorch is to create a class which **inherits from the nn.Module base class**
- nn.Conv2d(number of input channels = 1, number of output channels = 32, kernel_size=(x-size=5, y-size=5), stride=1, padding = number of zero pixels added on each side along the x and y axes):
    - the padding is derived from the output-size formula W_out = (W_in - K + 2P)/S + 1. Here S = 1, K = 5, W_in = input width = 28 and we want W_out = output width = 28, so 28 = 28 - 5 + 2P + 1, i.e. P = 2. The formula applies to both width and height, but since width = height here we only need to compute it once
    - since the kernel is square (x-size = y-size = 5), we can simply write kernel_size=5
- nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, return_indices=False, ceil_mode=False):
    - here too the kernel is square (2x2), so we can write kernel_size=2
    - since width = height, the same formula with S = 2, K = 2, P = 0 and W_in = 28 gives W_out = (28 - 2 + 0)/2 + 1 = 14, i.e. the pooling layer halves the width and height
- a **dropout layer** (nn.Dropout, which by default zeroes each activation with probability 0.5 during training) to reduce **overfitting** in the model
- fully connected layers nn.Linear(number of input nodes, number of output nodes)
- **overriding the forward method**, which defines how an input flows through the layers
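
Rearranging the same formula for P gives P = ((W_out - 1) * S - W_in + K) / 2, which is a handy way to pick the padding for a desired output size. A small sketch, where `required_padding` is a hypothetical helper name; a non-integer result would mean no symmetric padding produces exactly that output size.

```python
def required_padding(w_in: int, w_out: int, kernel: int, stride: int = 1) -> float:
    """Solve W_out = (W_in - K + 2P) / S + 1 for the padding P."""
    return ((w_out - 1) * stride - w_in + kernel) / 2


print(required_padding(w_in=28, w_out=28, kernel=5, stride=1))  # 2.0: first conv layer keeps 28x28
print(required_padding(w_in=14, w_out=14, kernel=5, stride=1))  # 2.0: second conv layer keeps 14x14
print(required_padding(w_in=28, w_out=14, kernel=2, stride=2))  # 0.0: max pooling halves 28 -> 14
```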

**NB:**
- we haven't applied a softmax activation to the final classification layer because the **CrossEntropyLoss** function **combines a LogSoftmax activation and a negative log-likelihood loss** (NLLLoss) in a single function, so it expects the raw scores (logits) as input
- To keep track of the accuracy we need to count the number of correct predictions:
    1. **torch.max(input, dim)** returns both the maximum values and their indices along the given dimension. For each sample the model outputs a 10-element tensor of scores (logits rather than probabilities, since no softmax is applied; a softmax would not change which index is largest), where the score at index i corresponds to digit i, e.g. outputs[0, 3] is the score of image sample 0 for digit 3:
        - the first argument is the tensor to be examined
        - the second argument is the dimension over which to find the maximum
        - the output tensor of the model has size (batch_size, 10). To obtain the model prediction for each sample in the batch, we find the maximum over the 10 output nodes, each of which corresponds to one of the handwritten digits; the index of the node with the highest score is the prediction of the model. Therefore, we set the second argument of torch.max() to 1, so that it examines the output-node dimension (dimension 0 is the batch dimension), and we keep only the returned indices: _, predicted = torch.max(outputs.data, 1)
    2. Count the number of correct predictions using **(predicted == labels).sum().item()**: the output of sum() is still a (0-dimensional) tensor, so to access its value as a Python number you need to call **.item()**
    3. Divide the number of correct predictions by the batch size (labels.size(0)) to obtain the accuracy.
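
The steps above can be mimicked in plain Python on a toy batch: take the index of the largest score in each row (what the indices returned by torch.max(outputs, 1) contain), compare with the labels, and divide by the batch size. `predict` and `batch_accuracy` are hypothetical helpers for illustration; tie-breaking may differ from torch.max, which does not guarantee which index it returns for equal maxima.

```python
def predict(outputs):
    """For each row (one sample), return the index of its largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in outputs]


def batch_accuracy(outputs, labels):
    """Fraction of samples whose predicted index equals the label."""
    predicted = predict(outputs)
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


# A toy batch of 3 samples x 10 classes (raw scores, one row per image)
outputs = [
    [0.1, 0.0, 0.2, 3.5, 0.0, 0.1, 0.0, 0.3, 0.0, 0.2],  # largest score at index 3 -> digit 3
    [2.0, 0.1, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.1, 0.0],  # -> digit 0
    [0.0, 0.2, 0.1, 0.0, 0.0, 0.3, 0.0, 0.0, 0.1, 1.7],  # -> digit 9
]
labels = [3, 0, 4]
print(predict(outputs))                 # [3, 0, 9]
print(batch_accuracy(outputs, labels))  # 2 correct out of 3
```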

In [15]:
class ConvNet(nn.Module):
    def __init__(self):
        # initialise the nn.Module base class
        super(ConvNet, self).__init__()
        # 5x5 conv (padding 2 keeps 28x28) -> ReLU -> 2x2 max pool: 1x28x28 -> 32x14x14
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        # same pattern: 32x14x14 -> 64x7x7
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()              # p = 0.5 by default
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)    # 3136 -> 1000
        self.fc2 = nn.Linear(1000, 10)            # 1000 -> 10 class scores (logits)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)        # flatten to (batch_size, 3136)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)                       # no softmax: CrossEntropyLoss expects logits
        return out

In [16]:
model = ConvNet()

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
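
What nn.CrossEntropyLoss computes from the logits can be written out by hand: for one sample, loss = -log(softmax(logits)[target]) = logsumexp(logits) - logits[target], averaged over the batch (the default reduction='mean'). The plain-Python sketch below only illustrates that formula; `cross_entropy` and `mean_cross_entropy` are hypothetical helpers, not PyTorch functions.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample from raw scores: logsumexp(logits) - logits[target].

    The maximum is subtracted before exponentiating (log-sum-exp trick) to avoid overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, targets):
    """Average the per-sample losses over the batch."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
print(mean_cross_entropy(batch_logits, [0, 2]))
```

With all-equal logits over 10 classes the loss is log(10), about 2.30, which is roughly where an untrained classifier starts.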

In [17]:
# print(model)

In [18]:
# Train the model
model.train()  # make sure dropout is active (matters if this cell is re-run after model.eval())
total_step = len(train_loader)
loss_list = []
acc_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Run the forward pass. We don't call model.forward(images) directly:
        # calling model(images) runs forward() through nn.Module.__call__
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss_list.append(loss.item())  # stored to plot the training progress later

        # Backprop and perform Adam optimisation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the accuracy
        total = labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == labels).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))

Epoch [1/6], Step [100/600], Loss: 0.2169, Accuracy: 93.00%
Epoch [1/6], Step [200/600], Loss: 0.0798, Accuracy: 98.00%
Epoch [1/6], Step [300/600], Loss: 0.0622, Accuracy: 99.00%
Epoch [1/6], Step [400/600], Loss: 0.2029, Accuracy: 96.00%
Epoch [1/6], Step [500/600], Loss: 0.0321, Accuracy: 99.00%
Epoch [1/6], Step [600/600], Loss: 0.0552, Accuracy: 97.00%
Epoch [2/6], Step [100/600], Loss: 0.0254, Accuracy: 99.00%
Epoch [2/6], Step [200/600], Loss: 0.0527, Accuracy: 98.00%
Epoch [2/6], Step [300/600], Loss: 0.0062, Accuracy: 100.00%
Epoch [2/6], Step [400/600], Loss: 0.0230, Accuracy: 100.00%
Epoch [2/6], Step [500/600], Loss: 0.0675, Accuracy: 97.00%
Epoch [2/6], Step [600/600], Loss: 0.0213, Accuracy: 100.00%
Epoch [3/6], Step [100/600], Loss: 0.1210, Accuracy: 98.00%
Epoch [3/6], Step [200/600], Loss: 0.0103, Accuracy: 100.00%
Epoch [3/6], Step [300/600], Loss: 0.0227, Accuracy: 99.00%
Epoch [3/6], Step [400/600], Loss: 0.0618, Accuracy: 98.00%
Epoch [3/6], Step [500/600], Loss: 0

### Testing the model
1. **!** Set the model to ***evaluation mode*** by running **model.eval()**: this **disables dropout** and makes any **batch normalization** layers use their running statistics instead of per-batch statistics (this model has dropout but no batch normalization). Leaving the model in training mode would randomly drop activations at test time and distort the evaluation.
2. The **torch.no_grad()** context manager **disables gradient tracking (autograd)**, which is **not needed for testing / evaluation**; this **saves memory and speeds up the computations**.
3. The rest is the same as the accuracy calculation during training, except that here the code iterates through test_loader and accumulates the counts over the whole test set.
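
To see why model.eval() matters for dropout, here is a plain-Python sketch of inverted dropout, the scheme nn.Dropout follows: in training mode each element is zeroed with probability p and the survivors are scaled by 1 / (1 - p), while in evaluation mode the input passes through unchanged. The `dropout` function is a hypothetical illustration assuming 0 <= p < 1, not PyTorch's implementation.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of floats (assumes 0 <= p < 1)."""
    if not training or p == 0:
        return list(values)          # evaluation mode: identity
    scale = 1.0 / (1.0 - p)          # keeps the expected value of each element unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]


x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, training=True, rng=random.Random(0)))  # some entries 0.0, the rest doubled (p = 0.5)
print(dropout(x, training=False))                       # [1.0, 2.0, 3.0, 4.0]
```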

In [19]:
# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format((correct / total) * 100))

Test Accuracy of the model on the 10000 test images: 99.18 %


### Plotting the progress of the CNN during training

In [20]:
# one point per training batch: loss on the left axis, accuracy (%) on a second, right-hand axis
p = figure(y_axis_label='Loss', width=850, y_range=(0, 1), title='PyTorch ConvNet results')
p.extra_y_ranges = {'Accuracy': Range1d(start=0, end=100)}
p.add_layout(LinearAxis(y_range_name='Accuracy', axis_label='Accuracy (%)'), 'right')
p.line(np.arange(len(loss_list)), loss_list)  # loss curve
p.line(np.arange(len(loss_list)), np.array(acc_list) * 100, y_range_name='Accuracy', color='red')  # accuracy curve
show(p)

### Saving the model

In [21]:
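import os

# Save the trained model parameters (the state_dict); create the target folder first if needed
os.makedirs(MODEL_STORE_PATH, exist_ok=True)
torch.save(model.state_dict(), MODEL_STORE_PATH + 'conv_net_model.ckpt')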
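
A saved state_dict holds only the parameters, so restoring it means building the same architecture again and copying the weights in with load_state_dict. The self-contained sketch below shows the round trip on a small stand-in model (the hypothetical `make_model` helper, a tiny nn.Sequential) in a temporary folder; with this notebook you would instead build ConvNet() and load MODEL_STORE_PATH + 'conv_net_model.ckpt'.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    """Hypothetical stand-in architecture; the notebook would use ConvNet() here."""
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


model = make_model()
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.ckpt")
    torch.save(model.state_dict(), path)

    restored = make_model()                                        # same architecture, fresh weights
    restored.load_state_dict(torch.load(path, map_location="cpu"))  # copy the saved parameters in
    restored.eval()                                                # evaluation mode before inference

x = torch.randn(5, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)  # True
```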