# Lab5A - Constructing a CNN Network

For spatial data such as images or video, a Convolutional Neural Network (CNN or ConvNet) typically performs much better than a standard neural network. In this practical, we shall learn how to build a CNN.

#### Objectives:
1. Learn how to build a convolutional neural network (CNN)
2. Learn how to build a network or layer using `nn.Sequential`

Remember to **enable the GPU** (Edit > Notebook settings > GPU) to keep the training time short.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
cd "/content/gdrive/MyDrive/UCCD3074_Labs/UCCD3074_Lab5"

Import the required libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

import torch.optim as optim
from torch.utils.data import DataLoader

from torchsummary import summary

from cifar10 import CIFAR10

%load_ext autoreload
%autoreload 2

---

# SECTION 1. DEFINING A CNN MODULE WITH `torch.nn.Module`

In this section, we create a CNN using `nn.Module`. `Module` is the main building block: it is the base class for all neural networks, and you MUST subclass it to define your own network.
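
As a minimal sketch of the subclassing pattern: layers are registered in `__init__` and the computation is described in `forward`. The `TinyNet` class and its layer sizes are illustrative assumptions, not the network built in this lab.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical example class, not the lab network
    def __init__(self):
        super().__init__()            # must run before any layer is assigned
        self.fc = nn.Linear(4, 2)     # layers assigned as attributes are registered automatically

    def forward(self, x):
        return torch.relu(self.fc(x))


tiny = TinyNet()
out = tiny(torch.randn(3, 4))         # calling the module invokes forward()
n_params = sum(p.numel() for p in tiny.parameters())
print(out.shape)                      # torch.Size([3, 2])
print(n_params)                       # 4*2 weights + 2 biases = 10
```

Forgetting `super().__init__()` raises an error as soon as the first layer is assigned, which is why the constructor call comes first.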

## 1.1 Build the network

**Exercise**. Build the following CNN. You will need the following modules:
* To define a conv2d layer: [`torch.nn.Conv2d(in_channel, out_channel, kernel_size, stride=1, padding=0)`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
   * `in_channel`: number of channels in the input tensor. 
   * `out_channel`: number of channels in the output tensor. This is equivalent to the number of filters in the current convolutional layer.
   * `kernel_size`: size of the filter (`f`).
   * `stride`: stride (`s`). Default value is 1.
   * `padding`: padding (`p`). Default value is 0.

* To define a max pooling layer: [`torch.nn.functional.max_pool2d (x, kernel_size, stride=None, padding=0)`](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool2d.html#torch.nn.functional.max_pool2d)
   * `x`: input tensor of shape `(b, c, h, w)`. This is required as this is a `functional` operation.
   * `kernel_size`: size of the filter (`f`).
   * `stride`: stride (`s`). Default value is `kernel_size`.
   * `padding`: padding (`p`). Default value is 0.

* To define a linear layer: [`torch.nn.Linear (in_features, out_features)`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear)
  * `in_features`:  size of each input sample. This is equivalent to the number of units or signals in the previous layer.
  * `out_features`: size of the output sample. This is equivalent to the number of units / neurons in the current layer.

* To define the global average pooling: [`torch.mean (x, dim)`](https://pytorch.org/docs/stable/generated/torch.mean.html)
    * `x`: the input tensor
    * `dim`: the dimensions to reduce. For an input tensor of shape `(b, c, h, w)`, set `dim = [2, 3]` to average over the spatial dimensions `h` and `w`. By default (`keepdim=False`) the reduced dimensions are dropped and the output has shape `(b, c)`. With `keepdim=True` the output has shape `(b, c, 1, 1)`, and you can then use [`torch.squeeze`](https://pytorch.org/docs/stable/generated/torch.squeeze.html#torch.squeeze) to remove the two singleton dimensions and get a tensor of shape `(b, c)`.

* Alternatively, the global average pooling can be defined using the following command: [`torch.nn.AdaptiveAvgPool2d (output_size)`](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html)
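    * `output_size`: the target output size (`o`). The layer chooses its pooling windows (of size roughly `(input_size+output_size-1)//output_size`) so that the output tensor has spatial shape `output_size`. With `output_size=(1, 1)`, each window covers the whole feature map, which gives global average pooling.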
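
The building blocks listed above can be tried on a dummy batch to see how each one changes the tensor shape. This is only a sketch: the channel count of 16 and the batch size of 8 are arbitrary choices for illustration and do not match the exercise network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(8, 3, 32, 32)                  # dummy batch of shape (b, c, h, w)

conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
y = conv(x)                                    # (8, 16, 32, 32)

y = F.max_pool2d(y, kernel_size=2)             # stride defaults to kernel_size -> (8, 16, 16, 16)

gap_mean = torch.mean(y, dim=[2, 3])           # keepdim=False (default) -> (8, 16)
gap_layer = nn.AdaptiveAvgPool2d((1, 1))(y)    # (8, 16, 1, 1)

fc = nn.Linear(16, 10)
logits = fc(gap_mean)                          # (8, 10)

# both global pooling routes should produce the same values
same = torch.allclose(gap_mean, gap_layer.flatten(1), atol=1e-5)
print(y.shape, gap_mean.shape, gap_layer.shape, logits.shape, same)
```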

<br><center><b>Network Architecture </b></center>

|Layer | Name | Description | OutputShape |
|:--:|:--|:---:|---|
| - | Input       | -                            | (?,  3, 32, 32) |
| 1 | conv1       | Conv2d (k=32, f=3, s=1, p=1) | (?, 32, 32, 32) |
|   |             | relu                         | (?, 32, 32, 32) | 
| 2 | conv2       | Conv2d (k=32, f=3, s=1, p=1) | (?, 32, 32, 32) | 
|   |             | relu                         | (?, 32, 32, 32) |
|   | pool1       | maxpool (f=2, s=2, p=0)      | (?, 32, 16, 16) |
| 3 | conv3       | Conv2d (k=64, f=3, s=1, p=1) | (?, 64, 16, 16) | 
|   |             | relu                         | (?, 64, 16, 16) |  
| 4 | conv4       | Conv2d (k=64, f=3, s=1, p=1) | (?, 64, 16, 16) | 
|   |             | relu                         | (?, 64, 16, 16) |  
|   | global_pool | AdaptiveAvgPool (o=(1,1))    | (?, 64, 1, 1)   | 
|   |             | view                         | (?, 64)         | 
| 5 | fc1         | Linear (#units=10)           | (?, 10)         | 

Notes: `k`: number of filters, `f`: filter or kernel size, `s`: stride, `p`: padding, `o`: output shape
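
The spatial sizes in the table follow from the standard output-size formula for convolution and pooling layers, `floor((n + 2p - f) / s) + 1`, assuming no dilation and the default floor rounding. The helper name `out_size` below is made up for this sketch.

```python
def out_size(n, f, s=1, p=0):
    """Spatial output size of a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


print(out_size(32, f=3, s=1, p=1))   # conv1 / conv2: 32
print(out_size(32, f=2, s=2, p=0))   # pool1: 16
print(out_size(16, f=3, s=1, p=1))   # conv3 / conv4: 16
```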


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
class Net(nn.Module):
    def __init__(self):
        # call super constructor
        # ... your code here ...

        # create the conv1 layer
        # ... your code here ...
        
        # create the conv2 layer
        # ... your code here ...
        
        # create the conv3 layer
        # ... your code here ...
        
        # create the conv4 layer
        # ... your code here ...
        
        # create the global pooling layer
        # ... your code here ...

        # fully connected layer
        # ... your code here ...
        
    def forward(self, x):

        # conv1 layer
        # ... your code here ...
        pass

        # conv2 layer
        # ... your code here ...
        
        # pooling layer
        # ... your code here ...

        # conv3 layer
        # ... your code here ...
        
        # conv4 layer
        # ... your code here ...

        # global pooling
        # ... your code here ...

        # remove the spatial dimension
        # ... your code here ...

        # fc1 layer
        # ... your code here ...

        return x

Create the network and test it
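
One common way to test a network is to pass a dummy batch through it and check the output shape. The sketch below uses a stand-in model (an assumption for illustration only) so that it runs on its own; in the lab you would use your own `Net` instance instead.

```python
import torch
import torch.nn as nn

# stand-in model for illustration only; replace it with the network under test
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

dummy = torch.randn(4, 3, 32, 32)      # batch of 4 fake CIFAR-10-sized images
with torch.no_grad():                  # no gradients are needed for a shape check
    out = model(dummy)

print(out.shape)                       # torch.Size([4, 10])
print(model)                           # printing a module lists its layers
```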

In [None]:
# ... your code here ...

Display the network

In [None]:
# ... your code here ...

## 1.2 Load the dataset

1. Load the dataset. Define a transformation pipeline that:
* converts an image (NumPy array with values in the range [0, 255]) to a tensor, and
* normalizes the tensor with mean = 0.5 and std = 1.0.

In [None]:
# transformation pipeline applied to each image
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (1., 1., 1.))
])

# Load the dataset
trainset = CIFAR10(train=True,  transform=transform, download=True, num_samples=10000)
testset  = CIFAR10(train=False,  transform=transform, download=True, num_samples=2000)

print('Size of trainset:', len(trainset))
print('Size of testset:', len(testset))

2. Create the dataloaders for the train set and test set. Use a batch size of 16, enable shuffling, and use 2 CPU workers to load the data. The transformation pipeline defined above is already attached to the datasets, so the dataloaders apply it automatically.

In [None]:
trainloader = DataLoader(trainset, batch_size=16, shuffle=True, num_workers=2)
testloader  = DataLoader(testset, batch_size=16, shuffle=True, num_workers=2)

## 1.3 Train the model

In [None]:
def train(net, trainloader, max_epochs, lr=0.1, momentum=0.9):
    
    loss_iterations = int(np.ceil(len(trainloader)/3))
    
    # transfer model to GPU
    net = net.to(device)
    
    # set the optimizer. Use the SGD optimizer. Use the lr and momentum settings passed by the user
    optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)
    
    # set to training mode
    net.train()

    # variables
    best_loss = np.inf
    saturate_count = 0
    
    # train the network
    for e in range(max_epochs):    

        running_loss = 0
        running_count = 0

        # for all batch samples
        for i, (inputs, labels) in enumerate(trainloader):

            # Clear all the gradient to zero
            optimizer.zero_grad()

            # transfer data to GPU
            inputs = inputs.to(device)
            labels = labels.to(device)

            # forward propagation to get the network outputs
            outs = net(inputs)

            # compute loss 
            loss = F.cross_entropy(outs, labels)

            # backpropagation to get gradients of all parameters
            loss.backward()

            # update parameters
            optimizer.step()

            # get the loss
            running_loss += loss.item()
            running_count += 1

            # display the averaged loss value
            if i % loss_iterations == loss_iterations-1 or i == len(trainloader) - 1:                
                train_loss = running_loss / running_count
                running_loss = 0. 
                running_count = 0.
                print(f'[Epoch {e+1:2d} Iter {i+1:5d}/{len(trainloader)}]: train_loss = {train_loss:.4f}')       
                
                if train_loss < best_loss:
                    best_loss = train_loss
                    saturate_count = 0
                else:
                    saturate_count += 1
                    if saturate_count >= 3:
                        return
    print("Training completed.")

Now, train the model for at most 50 epochs. Training stops early once the loss stops improving (three consecutive checks without a new best training loss). Use a learning rate of 0.01 and a momentum of 0.9.
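
The stopping rule used by `train` can be sketched in plain Python: a counter is reset whenever the averaged loss reaches a new best and is incremented otherwise, and training stops when the counter reaches 3. The function name `stop_index` and the loss values are made up for illustration.

```python
def stop_index(losses, patience=3):
    """Return the index of the check at which training would stop, or None if it never stops."""
    best, count = float("inf"), 0
    for i, loss in enumerate(losses):
        if loss < best:
            best, count = loss, 0      # new best loss: reset the counter
        else:
            count += 1                 # no improvement at this check
            if count >= patience:
                return i
    return None


print(stop_index([2.0, 1.5, 1.6, 1.4, 1.45, 1.41, 1.5]))   # 6
print(stop_index([2.0, 1.9, 1.8, 1.7]))                    # None
```

Note that the counter only resets on a strict improvement over the best loss so far, so a run of small fluctuations around the best value still ends training.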

In [None]:
train(net, trainloader, max_epochs=50, lr=0.01, momentum=0.9)

## 1.4 Evaluate the model

Now let's evaluate the model. Remember that a 2-layered neural network only achieves an accuracy of around 38%. With a CNN architecture, you should be able to achieve a higher accuracy of more than 50%.
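
The accuracy computation can be illustrated on a small hand-made batch. The scores and targets below are invented for this sketch: the predicted class is the index of the largest score in each row, and accuracy is the fraction of predictions that match the targets.

```python
import torch

outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 0.1, 1.5],
                        [0.9, 1.1, 0.4],
                        [1.2, 0.3, 0.8]])      # made-up class scores for 4 samples
targets = torch.tensor([0, 2, 0, 0])

_, predicted = torch.max(outputs, 1)           # index of the largest score per row: [0, 2, 1, 0]
corrects = (targets == predicted).sum().item() # 3 of the 4 predictions are correct
accuracy = 100 * corrects / len(targets)
print(f'Accuracy = {accuracy:.2f}%')           # Accuracy = 75.00%
```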

In [None]:
def evaluate(net, testloader):
    
    # set to evaluation mode
    net.eval() 

    # running_correct
    running_corrects = 0

    # Repeat for all batch data in the test set
    for inputs, targets in testloader:

        # transfer to the GPU
        inputs = inputs.to(device)
        targets = targets.to(device)

        # disable gradient computation
        with torch.no_grad():
            
            # perform inference
            outputs = net(inputs)

            # predict as the best result  
            _, predicted = torch.max(outputs, 1)

            running_corrects += (targets == predicted).double().sum()


    print('Accuracy = {:.2f}%'.format(100*running_corrects/len(testloader.dataset)))

Now, let's evaluate our model.

In [None]:
evaluate(net, testloader)

---
# SECTION 2. CREATING A CNN NETWORK DIRECTLY USING `torch.nn.Sequential`

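In this section, we shall learn how to create a network using `torch.nn.Sequential`. `Sequential` is a container of `Module`s that are stacked together and executed one after another, in the order in which they are passed to the constructor, with each module's output fed to the next.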

```
net = nn.Sequential(
    nn.Conv2d(....),
    nn.ReLU(),
    ....
)

x = ... # get the input tensor
output = net(x)  # perform inference
```

We can see immediately that it is a very convenient way to build a network.
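
A small runnable instance of this pattern is sketched below. The toy layer sizes and the 8x8 single-channel input are arbitrary assumptions, not the exercise network.

```python
import torch
import torch.nn as nn

toy = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),  # (b, 1, 8, 8) -> (b, 4, 8, 8)
    nn.ReLU(),
    nn.Flatten(),                               # (b, 4, 8, 8) -> (b, 256)
    nn.Linear(4 * 8 * 8, 2),                    # (b, 256) -> (b, 2)
)

x = torch.randn(5, 1, 8, 8)
out = toy(x)            # runs the layers in order
print(out.shape)        # torch.Size([5, 2])
print(toy[0])           # layers can be retrieved by index
print(len(toy))         # 4
```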

*Limitations: Note that you cannot add functional operations (e.g., `torch.relu`) into a `Sequential` model. If the `nn` module version does not exist for the function, then you have to create your own `nn` module for the function.*
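
As a sketch of the "create your own `nn` module" route, a functional operation can be wrapped in a small `nn.Module` subclass so that it can be placed inside `Sequential`. The class name `GlobalAvgPool` and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn


class GlobalAvgPool(nn.Module):  # hypothetical wrapper around torch.mean
    def forward(self, x):
        return torch.mean(x, dim=[2, 3])   # (b, c, h, w) -> (b, c)


head = nn.Sequential(GlobalAvgPool(), nn.Linear(8, 3))
out = head(torch.randn(2, 8, 5, 5))
print(out.shape)                           # torch.Size([2, 3])
```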

**Exercise**. Reimplement the network above using `torch.nn.Sequential`. 

Since you cannot use functional operations for `Sequential` models, you use their corresponding module versions:
* `torch.nn.functional.max_pool2d` --> [`torch.nn.MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d)
* `torch.nn.functional.relu` --> [`torch.nn.ReLU`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU)
* `torch.Tensor.view` --> [`nn.Flatten`](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html#torch.nn.Flatten)
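
A quick check, on an arbitrary random tensor, that each module version produces the same result as its functional counterpart:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 4, 6, 6)

same_pool = torch.equal(F.max_pool2d(x, kernel_size=2), nn.MaxPool2d(kernel_size=2)(x))
same_relu = torch.equal(F.relu(x), nn.ReLU()(x))
same_flat = torch.equal(x.view(x.size(0), -1), nn.Flatten()(x))   # Flatten keeps the batch dimension
print(same_pool, same_relu, same_flat)   # True True True
```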

In [None]:
net2 = # ... your code here ...

In [None]:
summary(net2, (3, 32, 32), device = "cpu")

### Train the model

In [None]:
train(net2, trainloader, max_epochs=50, lr=0.01, momentum=0.9)

### Evaluate the model on the test set

In [None]:
evaluate(net2, testloader)

---
# SECTION 3. COMBINING `torch.nn.Sequential` AND `torch.nn.Module`

### Build the Network

We can embed `nn.Sequential` objects (Section 2) to group layers inside an `nn.Module` definition (Section 1). In the following, we shall group `conv1` and `conv2` into `block_1`, and `conv3` and `conv4` into `block_2`.


| Block |Layer | Name | Description | OutputShape |
|:---:|:--:|:--|:---:|:---:|
|input|-|-|-|(?, 3, 32, 32)|
||||||
| block_1 | 1 <br><br> - <br><br> 2 <br><br> - | conv1 <br><br> ReLU <br><br> conv2 <br><br> ReLU| Conv2d (k=32,f=3,s=1,p=1)<br><br> relu <br><br> Conv2d (k=32,f=3,s=1,p=1)<br><br> relu| (?, 32, 32, 32) <br><br> (?, 32, 32, 32)<br><br> (?, 32, 32, 32)<br><br> (?, 32, 32, 32) | 
||||||
| - | - | pool1 | maxpool (f=2,s=2,p=0) | (?, 32, 16, 16) | 
||||||
| block_2 | 3 <br><br> - <br><br> 4 <br><br> - | conv3 <br><br> ReLU <br><br> conv4 <br><br> ReLU| Conv2d (k=64,f=3,s=1,p=1)<br><br> relu <br><br> Conv2d (k=64,f=3,s=1,p=1)<br><br> relu| (?, 64, 16, 16) <br><br> (?, 64, 16, 16)<br><br> (?, 64, 16, 16)<br><br> (?, 64, 16, 16) | 
||||||
|  | - | global_pool | AdaptiveAvgPool, o=(1,1) | (?, 64, 1, 1) | 
|  | - | -           | view                     | (?, 64) | 
||||||
|  | 5 | fc1         | Linear(#units=10)        | (?, 10) | 
|  | - | -           | view                     | (?, 10) | 

Notes: `k`: number of filters, `f`: filter or kernel size, `s`: stride, `p`: padding, `o`: output shape

To do this, rather than declaring each layer individually, you can declare a block of multiple layers using `nn.Sequential`:

```
self.conv_block1 = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
)
```
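
A block declared this way behaves like a single layer. As a sketch, the block above can be built as a standalone variable (outside any class, for illustration only) and checked with a dummy batch:

```python
import torch
import torch.nn as nn

conv_block1 = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
)

x = torch.randn(2, 3, 32, 32)                 # dummy batch of 2 images
out = conv_block1(x)
n_params = sum(p.numel() for p in conv_block1.parameters())
print(out.shape)                              # torch.Size([2, 32, 32, 32])
print(n_params)                               # (3*32*9 + 32) + (32*32*9 + 32) = 10144
```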

In [None]:
class Net3(nn.Module):
    def __init__(self):
        super().__init__()
        
        # define block 1
        self.conv_block1 = # ... your code here ...
        
        # define block 2
        self.conv_block2 = # ... your code here ...

        # define global_pool
        self.global_pool = # ... your code here ...
        
        # define fc1
        self.fc1 = # ... your code here ...
        
    def forward(self, x):
        # block 1
        # ... your code here ...
        
        # max pool
        # ... your code here ...
        
        # block 2
        # ... your code here ...
        
        # global pool
        # ... your code here ...

        # view
        # ... your code here ...
        
        # fc1
        # ... your code here ...
        
        return x

In [None]:
net3 = Net3()
summary(net3, (3, 32, 32), device="cpu")

### Train the model

In [None]:
train(net3, trainloader, max_epochs=50, lr=0.01, momentum=0.9)

### Evaluate the model

In [None]:
evaluate(net3, testloader)

<center>--- End of Practical ---</center>