![picture](https://drive.google.com/uc?export=view&id=1eCsjNAtjXuXfqBLxeEnsBpOikUO06msr)

<br>

---
---

<div class="alert alert-block alert-warning">
<h1><span style="color:green"> Foundations of Artificial Intelligence<br> (AI701-Fall2022) </span></h1>

<h2><span style="color:green"> Lab-11: Convolutional Neural Network (CNN) for MNIST classification </span></h2>
</div>

---
---

## Loading the dataset from Google Drive (For Google Colab Users only)

Sign in to Google Drive and upload the `lab11_material.zip` file to your Drive.
Then run the following commands to access the dataset.

In [1]:
from google.colab import drive
drive.mount('/content/drive')
!cp -r "/content/drive/MyDrive/lab11_material" "/content/"

Mounted at /content/drive


In [2]:
!unzip -o lab11_material.zip  # Unzip the dataset (the file name contains an underscore, not a space)


In [4]:
!mv "/content/lab11_material/data" "/content/"
!mv "/content/lab11_material/dataset" "/content/"
!mv "/content/lab11_material/MNISTtools.py" "/content/"

mv: cannot stat '/content/lab11_material/data': No such file or directory
mv: cannot stat '/content/lab11_material/dataset': No such file or directory
mv: cannot stat '/content/lab11_material/MNISTtools.py': No such file or directory


Lab machine users: download `lab11_material.zip`, unzip it, and keep the unzipped files and this Jupyter notebook in the same working directory.

In [5]:
import numpy as np
import torch
import time
import platform

In [6]:
print(f'Pytorch version: {torch.__version__}')
print(f'cuda version: {torch.version.cuda}')
print(f'Python version: {platform.python_version()}')

Pytorch version: 1.12.1+cu113
cuda version: 11.3
Python version: 3.7.15


In [7]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cuda


## MNIST Data preparation

### Load MNIST data and normalize

In [8]:
from MNISTtools import load, show

In [9]:
xtrain, ltrain = load(dataset='training', path='dataset/')
xtest, ltest = load(dataset='testing', path='dataset/')

In [10]:
def normalize_MNIST_images(x):
    '''
    Args:
        x: data
    '''
    x_norm = x.astype(np.float32)
    return x_norm*2/255-1

In [11]:
# normalization
xtrain = normalize_MNIST_images(xtrain)
xtest = normalize_MNIST_images(xtest)
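
As a quick sanity check of the normalization, the sketch below applies the same formula to the two extreme pixel values. The function `normalize` is a stand-alone copy written for illustration only, and the two-element array is made-up input, not MNIST data.

```python
import numpy as np

def normalize(x):
    # Same formula as normalize_MNIST_images: map [0, 255] to [-1, 1]
    return x.astype(np.float32) * 2 / 255 - 1

pixels = np.array([0, 255], dtype=np.uint8)   # darkest and brightest pixel values
normalized = normalize(pixels)
print(normalized, normalized.dtype)
```

The darkest pixel maps to `-1` and the brightest to `1`, so the inputs are centered around zero before they reach the network.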

### Reshape
PyTorch expects the input of a convolutional layer to be stored in the following format:
`Batch size × Number of input channels × Image height × Image width`

In [12]:
# reshape to (28, 28, N), then insert a singleton channel axis: (28, 28, 1, N)
xtrain = xtrain.reshape([28,28,-1])[:,:,None,:]
xtest = xtest.reshape([28,28,-1])[:,:,None,:]
print(f'shape of xtrain after reshape is {xtrain.shape}.')
print(f'shape of xtest after reshape is {xtest.shape}.')

shape of xtrain after reshape is (28, 28, 1, 60000).
shape of xtest after reshape is (28, 28, 1, 10000).


In [13]:
# move the batch axis to the front and the channel axis to position 1
xtrain = np.moveaxis(xtrain, (2,3), (1,0))
xtest = np.moveaxis(xtest, (2,3), (1,0))
print(f'shape of xtrain after moveaxis is {xtrain.shape}.')
print(f'shape of xtest after moveaxis is {xtest.shape}.')

shape of xtrain after moveaxis is (60000, 1, 28, 28).
shape of xtest after moveaxis is (10000, 1, 28, 28).
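
The two reshaping steps can be checked on a tiny made-up array. This is only a sketch: it assumes, as in the cells above, that the samples are stored along the last axis, and `n_samples = 3` is an arbitrary illustrative value.

```python
import numpy as np

n_samples = 3  # hypothetical tiny dataset
x = np.arange(28 * 28 * n_samples, dtype=np.float32).reshape(28, 28, n_samples)

x4 = x[:, :, None, :]                      # (28, 28, 1, 3): add a channel axis
x_nchw = np.moveaxis(x4, (2, 3), (1, 0))   # (3, 1, 28, 28): batch first, channel second

print(x_nchw.shape)

# Each image is unchanged; only the axis order differs
same_image = np.array_equal(x_nchw[1, 0], x[:, :, 1])
print(same_image)
```

`np.moveaxis` only relabels the axes, so the pixel values of each image are preserved.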


### Wrap all the data into torch Tensor

In [14]:
xtrain = torch.from_numpy(xtrain)
ltrain = torch.from_numpy(ltrain)
xtest = torch.from_numpy(xtest)
ltest = torch.from_numpy(ltest)

In [15]:
xtrain_gpu = xtrain.to(device)
ltrain_gpu = ltrain.to(device)
xtest_gpu = xtest.to(device)
ltest_gpu = ltest.to(device)

---
## LeNet-5 network

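* Convolutional layers can be created as `nn.Conv2d(N, C, K)`, where `N` is the number of input channels, `C` the number of output channels, and `K` the kernel size. For input images of size `W×H`, the output feature maps have size `[W−K+1]×[H−K+1]` (no padding, stride 1).  

* Max pooling is applied as a function, like any other non-linear function (such as ReLU or softmax). With a pooling window of size `L`, input feature maps of size `W×H` produce output feature maps of size `[W/L]×[H/L]`.  

* A fully connected layer can be created as `nn.Linear(M, N)`, with `M` input features and `N` output features.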
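
The size formulas in the list above can be written as two small helpers. This is a minimal sketch: the names `conv_output_size` and `pool_output_size` are hypothetical, and the `padding` argument is an addition that reduces to `W−K+1` when it is zero.

```python
def conv_output_size(size, kernel, padding=0):
    # Stride-1 convolution: W + 2P - K + 1 (equals W - K + 1 without padding)
    return size + 2 * padding - kernel + 1

def pool_output_size(size, window):
    # Non-overlapping max pooling with window L: floor(W / L)
    return size // window

print(conv_output_size(32, 5))             # 28
print(conv_output_size(28, 5, padding=2))  # 28
print(pool_output_size(28, 2))             # 14
print(conv_output_size(14, 5))             # 10
print(pool_output_size(10, 2))             # 5
```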

Architecture:  

(a) a convolutional layer connecting the input image to `6` feature maps with `5×5` convolutions (`K=5`) and followed by ReLU and maxpooling (`L=2`)  

(b) a convolutional layer connecting the `6` input channels to `16` output channels with `5×5` convolutions and followed by ReLU and maxpooling (`L=2`)  

(c) a fully-connected layer connecting the `16` feature maps of size `5×5` (`400` values after flattening) to `120` output units and followed by ReLU  

(d) a fully-connected layer connecting `120` inputs to `84` output units and followed by ReLU  

(e) a final linear layer connecting `84` inputs to `10` linear outputs (one for each of our digits)

First layer  
* input: `(28, 28, 1)`  
* after *padding*: `(32, 32, 1)`
* after convolution(kernel=`5x5`): `(28, 28, 6)` where `28=32-5+1`  
* after ReLU: `(28, 28, 6)`  
* after maxpooling(stride=`2x2`): `(14, 14, 6)` $\Rightarrow$ **OUTPUT**  


Second layer
* input: `(14, 14, 6)`
* after convolution(kernel=`5x5`): `(10, 10, 16)`
* after ReLU: `(10, 10, 16)`  
* after maxpooling(stride=`2x2`): `(5, 5, 16)` $\Rightarrow$ **OUTPUT**  


Third layer
* input: `(5, 5, 16)` $\Rightarrow$ `5x5x16=400`  
* after fully-connected: `(120, 1)`
* after ReLU: `(120, 1)` $\Rightarrow$ **OUTPUT**  


Fourth layer
* input: `(120, 1)`
* after fully-connected: `(84, 1)`
* after ReLU: `(84, 1)` $\Rightarrow$ **OUTPUT**  


Fifth layer
* input: `(84, 1)`
* after fully-connected: `(10, 1)` $\Rightarrow$ **OUTPUT** (raw class scores; no ReLU after the final layer)  
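
The layer-by-layer sizes can also be traced numerically. The sketch below pushes one dummy all-zero image through freshly created layers with the sizes listed above. It is only an illustration with random weights, and PyTorch reports shapes as `(batch, channels, height, width)` rather than `(height, width, channels)`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Layers with the sizes listed above (weights are random; only shapes matter here)
conv1 = nn.Conv2d(1, 6, 5, padding=2)   # padding=2 on each side turns 28x28 into 32x32
conv2 = nn.Conv2d(6, 16, 5)
fc1, fc2, fc3 = nn.Linear(400, 120), nn.Linear(120, 84), nn.Linear(84, 10)

shapes = []
with torch.no_grad():
    x = torch.zeros(1, 1, 28, 28)            # one dummy image in (N, C, H, W) layout
    x = F.max_pool2d(F.relu(conv1(x)), 2)
    shapes.append(tuple(x.shape))            # first layer output
    x = F.max_pool2d(F.relu(conv2(x)), 2)
    shapes.append(tuple(x.shape))            # second layer output
    x = torch.flatten(x, 1)
    shapes.append(tuple(x.shape))            # flattened to 16*5*5 values
    x = F.relu(fc1(x))
    shapes.append(tuple(x.shape))            # third layer output
    x = F.relu(fc2(x))
    shapes.append(tuple(x.shape))            # fourth layer output
    x = fc3(x)                               # final layer: linear only
    shapes.append(tuple(x.shape))

for s in shapes:
    print(s)
```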

In [16]:
import torch.nn as nn
import torch.nn.functional as F

In [42]:
class LeNet(nn.Module):

    # network structure
    def __init__(self):
        super(LeNet, self).__init__()
        ## TO DO
        self.conv1 = nn.Conv2d(1, 6, 5, padding=2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)
        ##
    def forward(self, x):
        '''
        One forward pass through the network.
        
        Args:
            x: input
        '''
        ## To DO
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))   # (N, 6, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))   # (N, 16, 5, 5)
        x = x.view(-1, self.num_flat_features(x))         # (N, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                                   # raw class scores, no softmax
        ##
        return x

    def num_flat_features(self, x):
        '''
        Get the number of features in a batch of tensors `x`.
        '''
        size = x.size()[1:]
        return np.prod(size)

### Check the network structure

In [43]:
net = LeNet()
print(net)

<__main__.LeNet object at 0x7f7ebb4f9310>


### Check the network parameters

In [33]:
xtrain.shape
example = xtrain[0]
example = example[np.newaxis, :, :, :]
example = torch.Tensor(example)
example.shape

torch.Size([1, 1, 28, 28])

In [45]:
# Shape after the first convolution + ReLU + max-pooling stage
output = F.max_pool2d(F.relu(net.conv1(example)), (2, 2))
output.shape

torch.Size([1, 6, 14, 14])

In [19]:
for name, param in net.named_parameters():
    print(name, param.size(), param.requires_grad)

conv1.weight torch.Size([6, 1, 5, 5]) True
conv1.bias torch.Size([6]) True
conv2.weight torch.Size([16, 6, 5, 5]) True
conv2.bias torch.Size([16]) True
fc1.weight torch.Size([120, 400]) True
fc1.bias torch.Size([120]) True
fc2.weight torch.Size([84, 120]) True
fc2.bias torch.Size([84]) True
fc3.weight torch.Size([10, 84]) True
fc3.bias torch.Size([10]) True
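
From the shapes printed above, the number of trainable parameters per layer follows by simple arithmetic. The sketch below copies the shapes by hand into a list (the `layers` list and the `numel` helper are written for illustration) and sums them.

```python
# (name, weight shape, bias size), copied from the listing above
layers = [
    ("conv1", (6, 1, 5, 5), 6),
    ("conv2", (16, 6, 5, 5), 16),
    ("fc1", (120, 400), 120),
    ("fc2", (84, 120), 84),
    ("fc3", (10, 84), 10),
]

def numel(shape):
    # Product of all dimensions of a weight tensor
    n = 1
    for d in shape:
        n *= d
    return n

counts = {name: numel(wshape) + bias for name, wshape, bias in layers}
total = sum(counts.values())

for name, n in counts.items():
    print(name, n)
print("total", total)
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.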


### The accuracy without backprop

In [20]:
# Disable gradient tracking during evaluation to save memory and computation time
with torch.no_grad():
    yinit = net(xtest)

In [21]:
_, lpred = yinit.max(1)
print(100 * (ltest == lpred).float().mean())

tensor(10.6500)


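`ltest == lpred` generates a boolean tensor that is `True` where the prediction matches the label and `False` otherwise; `.float()` converts it to `1` and `0`. The mean of these values, `(ltest == lpred).float().mean()`, is therefore the accuracy. With random initial weights it is close to the 10% expected from guessing among 10 classes.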
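
The same computation on a small hand-made example, as an illustration only: the `scores` and `labels` tensors below are invented and stand in for the network output and the test labels.

```python
import torch

# Made-up scores for 4 samples and 3 classes
scores = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3],
                       [0.0, 0.1, 3.0],
                       [2.0, 0.5, 0.1]])
labels = torch.tensor([1, 0, 1, 0])

_, predicted = scores.max(1)       # index of the largest score in each row
correct = (labels == predicted)    # boolean tensor, one entry per sample
accuracy = 100 * correct.float().mean()

print(predicted)   # tensor([1, 0, 2, 0])
print(accuracy)    # tensor(75.)
```

Three of the four predictions match their labels, hence 75%.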

### (Mini-Batch) Stochastic Gradient Descent (SGD) with cross-entropy and momentum

**Note**: PyTorch’s `CrossEntropyLoss` combines a (log-)softmax activation with the standard cross-entropy loss, so the network should output raw scores, with no softmax after its last layer.
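
The note can be verified numerically. The sketch below uses random scores and arbitrary labels (both invented for illustration) to compare `nn.CrossEntropyLoss` with a manual log-softmax followed by the average negative log-probability of the correct class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 samples and 10 classes
labels = torch.tensor([3, 0, 7, 1])    # arbitrary target classes

loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual version: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(4), labels].mean()

print(torch.allclose(loss_builtin, loss_manual))
```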

In [24]:
def backprop_deep(xtrain, ltrain, net, T, B=100, gamma=.001, rho=.9):
    '''
    Train `net` with mini-batch SGD with momentum and a cross-entropy loss.
    
    Args:
        xtrain: training samples
        ltrain: training labels
        net: neural network
        T: number of epochs
        B: minibatch size
        gamma: step size (learning rate)
        rho: momentum
    '''
    N = xtrain.size()[0]     # Training set size
    NB = N//B                # Number of minibatches
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=gamma, momentum=rho)
    
    for epoch in range(T):
        running_loss = 0.0
        shuffled_indices = np.random.permutation(NB)
        for k in range(NB):
            # Extract k-th minibatch from xtrain and ltrain
            minibatch_indices = range(shuffled_indices[k]*B, (shuffled_indices[k]+1)*B)
            inputs = xtrain[minibatch_indices]
            labels = ltrain[minibatch_indices]

            # Initialize the gradients to zero
            optimizer.zero_grad()

            # Forward propagation
            outputs = net(inputs)

            # Error evaluation
            labels = labels.long()
            loss = criterion(outputs, labels)

            # Back propagation
            loss.backward()

            # Parameter update
            optimizer.step()

            # Accumulate the loss and print its average every 100 minibatches
            running_loss += loss.item()
            if k % 100 == 99:
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, k + 1, running_loss / 100))
                running_loss = 0.0
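
One detail of the function above is worth noting: it shuffles the order of fixed, contiguous minibatches, not the individual samples. The sketch below reproduces only the index construction with tiny made-up sizes (`N = 10`, `B = 3`); the seeded generator is an illustrative choice, not what the function uses.

```python
import numpy as np

N, B = 10, 3     # hypothetical tiny sizes
NB = N // B      # 3 full minibatches; the remaining N % B = 1 sample is never visited

rng = np.random.default_rng(0)
order = rng.permutation(NB)    # random order in which the minibatches are visited

# Same index construction as in backprop_deep
batches = [list(range(k * B, (k + 1) * B)) for k in order]
print(batches)

visited = sorted(i for batch in batches for i in batch)
print(visited)
```

Every minibatch always holds the same neighbouring samples, and when `N` is not a multiple of `B` the last `N % B` samples are skipped. With `N = 60000` and `B = 100`, no sample is left out.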

In [25]:
net = LeNet()

In [26]:
start = time.time()
backprop_deep(xtrain, ltrain, net, T=20)
end = time.time()
print(f'It takes {end-start:.6f} seconds.')

[1,   100] loss: 2.301
[1,   200] loss: 2.290
[1,   300] loss: 2.270
[1,   400] loss: 2.231
[1,   500] loss: 2.123
[1,   600] loss: 1.703
[2,   100] loss: 0.917
[2,   200] loss: 0.592
[2,   300] loss: 0.475
[2,   400] loss: 0.377
[2,   500] loss: 0.345
[2,   600] loss: 0.305
[3,   100] loss: 0.264
[3,   200] loss: 0.253
[3,   300] loss: 0.255
[3,   400] loss: 0.231
[3,   500] loss: 0.210
[3,   600] loss: 0.190
[4,   100] loss: 0.186
[4,   200] loss: 0.181
[4,   300] loss: 0.178
[4,   400] loss: 0.170
[4,   500] loss: 0.144
[4,   600] loss: 0.146
[5,   100] loss: 0.130
[5,   200] loss: 0.128
[5,   300] loss: 0.155
[5,   400] loss: 0.147
[5,   500] loss: 0.133
[5,   600] loss: 0.133
[6,   100] loss: 0.128
[6,   200] loss: 0.121
[6,   300] loss: 0.120
[6,   400] loss: 0.118
[6,   500] loss: 0.106
[6,   600] loss: 0.114
[7,   100] loss: 0.108
[7,   200] loss: 0.119
[7,   300] loss: 0.093
[7,   400] loss: 0.094
[7,   500] loss: 0.107
[7,   600] loss: 0.102
[8,   100] loss: 0.100
[8,   200] 

<h3> Task: </h3> 
Analyze the impact of modifying the provided LeNet architecture by (1) adding ONLY ONE additional convolution layer, (2) introducing one or more batch normalization layers, and (3) applying various data augmentation methods.
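
As a possible starting point for item (2), the sketch below adds batch normalization after each convolution. The class name `LeNetBN` and the placement of the normalization layers are assumptions made for illustration; other placements are equally valid to study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNetBN(nn.Module):
    # Hypothetical variant: LeNet with batch normalization after each convolution
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding=2)
        self.bn1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(16)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.bn1(self.conv1(x))), 2)
        x = F.max_pool2d(F.relu(self.bn2(self.conv2(x))), 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)           # raw class scores

net_bn = LeNetBN()
net_bn.eval()                        # use running statistics instead of batch statistics
with torch.no_grad():
    out = net_bn(torch.zeros(2, 1, 28, 28))
print(out.shape)
```

Batch normalization behaves differently in training and evaluation, so call `net_bn.train()` before training and `net_bn.eval()` before measuring test accuracy.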

In [None]:
# Evaluate on the testing dataset (CPU)
y = net(xtest)
print(100 * (ltest==y.max(1)[1]).float().mean())

tensor(98.5100)


In [None]:
# Network on GPU
net_gpu = LeNet().to(device)
start = time.time()
backprop_deep(xtrain_gpu, ltrain_gpu, net_gpu, T=10)
end = time.time()
print(f'It takes {end-start:.6f} seconds.')

[1,   100] loss: 2.302
[1,   200] loss: 2.295
[1,   300] loss: 2.287
[1,   400] loss: 2.274
[1,   500] loss: 2.252
[1,   600] loss: 2.207
[2,   100] loss: 2.065
[2,   200] loss: 1.642
[2,   300] loss: 0.933
[2,   400] loss: 0.549
[2,   500] loss: 0.425
[2,   600] loss: 0.346
[3,   100] loss: 0.290
[3,   200] loss: 0.274
[3,   300] loss: 0.244
[3,   400] loss: 0.245
[3,   500] loss: 0.206
[3,   600] loss: 0.198
[4,   100] loss: 0.183
[4,   200] loss: 0.186
[4,   300] loss: 0.175
[4,   400] loss: 0.161
[4,   500] loss: 0.157
[4,   600] loss: 0.143
[5,   100] loss: 0.144
[5,   200] loss: 0.136
[5,   300] loss: 0.141
[5,   400] loss: 0.135
[5,   500] loss: 0.121
[5,   600] loss: 0.125
[6,   100] loss: 0.117
[6,   200] loss: 0.114
[6,   300] loss: 0.123
[6,   400] loss: 0.118
[6,   500] loss: 0.111
[6,   600] loss: 0.106
[7,   100] loss: 0.101
[7,   200] loss: 0.103
[7,   300] loss: 0.109
[7,   400] loss: 0.104
[7,   500] loss: 0.094
[7,   600] loss: 0.093
[8,   100] loss: 0.097
[8,   200] 

In [None]:
# Re-evaluate on the testing dataset (GPU)
y = net_gpu(xtest_gpu)
print(100 * (ltest==y.max(1)[1].cpu()).float().mean())

tensor(97.9200)


### Congratulations, you have successfully completed Lab 11!

---

