<a href="https://colab.research.google.com/github/woncoh1/EVA5/blob/main/Assignment_Session_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Instructions
1. We have considered many points in our last 4 lectures, some directly and some indirectly. They are:
  1. How many layers,
  2. MaxPooling,
  3. 1x1 Convolutions,
  4. 3x3 Convolutions,
  5. Receptive Field,
  6. SoftMax,
  7. Learning Rate,
  8. Kernels and how do we decide the number of kernels?
  9. Batch Normalization,
  10. Image Normalization,
  11. Position of MaxPooling,
  12. Concept of Transition Layers,
  13. Position of Transition Layer,
  14. DropOut,
  15. When do we introduce DropOut, or when do we know we have some overfitting
  16. The distance of MaxPooling from Prediction,
  17. The distance of Batch Normalization from Prediction,
  18. When do we stop convolutions and go ahead with a larger kernel or some other alternative (GAP, which we have not yet covered)
  19. How do we know our network is not going well, comparatively, very early
  20. Batch Size, and effects of batch size
  21. Etc (you can add more if we missed it here)
2. Refer to this code: [COLABLINK](https://colab.research.google.com/drive/1uJZvJdi5VprOQHROtJIHy0mnY2afjNlx). WRITE IT AGAIN SUCH THAT IT ACHIEVES:
    1. You can use anything from above you want. 
    2. **99.4% validation (test) accuracy**
    3. **Less than 20k parameters**
    4. **Less than 20 epochs**
    5. **No fully connected layer**
    6. To learn how to add the different things we covered in this session, you can refer to this code: https://www.kaggle.com/enwei26/mnist-digits-pytorch-cnn-99 DON'T COPY THE ARCHITECTURE, JUST LEARN HOW TO INTEGRATE THINGS LIKE DROPOUT, BATCHNORM, ETC.
---
**Submission details**
3. This is a slightly time-consuming assignment, so please make sure you start early. You are going to put a lot of effort into running the programs multiple times.
4. Once you are done, submit your results in S4-Assignment-Solution
5. You must upload your assignment to a public GitHub repository. Create a folder called S4 in it, and add your .ipynb notebook to it. THE LOGS MUST BE VISIBLE. Before adding the link to the submission, make sure you have opened the file in an "incognito" window.
6. If you misrepresent your answers, you will be awarded -100% of the score.
7. If you submit a Colab link instead of a notebook uploaded to GitHub, or redirect the GitHub page to Colab, you will be awarded -50%.
8. This assignment is worth 300 pts.

In [1]:
# define hyperparameters (constants)

# data
BATCH_SIZE = 128
# model
DROPOUT_RATE = 0.1
# optimizer
LEARNING_RATE = 0.01
MOMENTUM = 0.9
# training
EPOCHS = 20
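
The `LEARNING_RATE` and `MOMENTUM` constants are later passed to `optim.SGD`. As a rough illustration of what those two numbers control, here is a minimal pure-Python sketch of one momentum update in the form PyTorch documents for plain SGD with momentum (no dampening, no Nesterov). The helper name `sgd_momentum_step` and the toy objective are assumptions made for this example; they are not part of the notebook.

```python
# Hypothetical helper: one SGD-with-momentum step on plain Python floats.
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    # The velocity accumulates a decaying sum of past gradients.
    velocity = momentum * velocity + grad
    # The parameter moves against the velocity, scaled by the learning rate.
    param = param - lr * velocity
    return param, velocity

# Toy objective (an assumption for illustration): f(w) = w**2, gradient 2*w.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2 * w, v)

print(f"w after 200 steps: {w:.2e}")
```

With momentum at 0.9 each gradient keeps influencing later steps, which is one reason a fairly small learning rate such as 0.01 can still make quick progress.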

# 1. Prepare data

In [2]:
# Import libraries

from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [3]:
# Construct batches of training and testing data

# for reproducibility
torch.manual_seed(1) 

# GPU configurations for data loaders
use_cuda = torch.cuda.is_available()
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

# divide and process training dataset into batch iterator
train_dataset = datasets.MNIST(
    '../data',
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]),
)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE, 
    shuffle=True, 
    **kwargs
)

# divide and process testing dataset into batch iterator
test_dataset = datasets.MNIST(
    '../data',
    train=False,
    download=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]),
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE, 
    shuffle=True, 
    **kwargs
)
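
Both datasets use `transforms.ToTensor()` followed by `transforms.Normalize((0.1307,), (0.3081,))`. `ToTensor` scales 8-bit pixels to the range [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic on single pixel values in plain Python; the function name `normalize` is hypothetical and only mirrors the transform for illustration.

```python
MEAN, STD = 0.1307, 0.3081  # constants used in the data loaders above

def normalize(pixel):
    """Apply the same per-pixel formula as transforms.Normalize."""
    return (pixel - MEAN) / STD

for raw in (0, 128, 255):
    scaled = raw / 255.0  # what ToTensor does to an 8-bit pixel
    print(f"raw={raw:3d} scaled={scaled:.4f} normalized={normalize(scaled):.4f}")
```

Background pixels (raw value 0) end up slightly negative and bright strokes end up well above 1, so the inputs are roughly centered before the first convolution.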

# 2. Build model

In [4]:
# Define model (neural network) structure

class Net(nn.Module):

    # prepare layers and blocks
    def __init__(self, dropout_rate):
        super().__init__()

        # 3x3 feature extraction block #1
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=8, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=8),
            nn.Dropout2d(dropout_rate)
        )
        
        # 3x3 feature extraction block #2
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=8, out_channels=16, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=16),
            nn.Dropout2d(dropout_rate)
        )
        
        # 2x2 max pooling #1
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        # 1x1 convolution for reducing # of parameters
        self.trans = nn.Conv2d(in_channels=16, out_channels=8, 
                                kernel_size=1, padding=1)
        
        # 3x3 feature extraction block #3
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=8, out_channels=16, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=16),
            nn.Dropout2d(dropout_rate)
        )

        # 3x3 feature extraction block #4
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=16),
            nn.Dropout2d(dropout_rate)
        )

        # 2x2 max pooling #2
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # 3x3 feature extraction block #5
        self.conv5 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=32),
            nn.Dropout2d(dropout_rate)
        )
        
        # 3x3 feature extraction block #6
        self.conv6 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=32),
            nn.Dropout2d(dropout_rate)
        )

        # 1x1 convolution for reducing # of output channels to # of classes
        # CEO, so no ReLU, batch norm, dropout, etc
        self.conv7 = nn.Conv2d(in_channels=32, out_channels=10,
                               kernel_size=1, padding=1)
        
        # global average pooling instead of fully connected layer
        self.gap = nn.AvgPool2d(kernel_size=10)
    
    # assemble layers and blocks
    def forward(self, x):
        # input layer
        x = x
        # hidden layers
        # RF follows r_out = r_in + (kernel - 1) * jump, where jump is the product of earlier strides
        x = self.conv1(x) # Input: 28 x 28 x 01 | Kernel: 03 x 03 x 01 x 08 | Output: 28 x 28 x 08, RF: 3
        x = self.conv2(x) # Input: 28 x 28 x 08 | Kernel: 03 x 03 x 08 x 16 | Output: 28 x 28 x 16, RF: 5
        x = self.pool1(x) # Input: 28 x 28 x 16 | Kernel: 02 x 02 x 01 x 16 | Output: 14 x 14 x 16, RF: 6
        x = self.trans(x) # Input: 14 x 14 x 16 | Kernel: 01 x 01 x 16 x 08 | Output: 16 x 16 x 08, RF: 6
        x = self.conv3(x) # Input: 16 x 16 x 08 | Kernel: 03 x 03 x 08 x 16 | Output: 16 x 16 x 16, RF: 10
        x = self.conv4(x) # Input: 16 x 16 x 16 | Kernel: 03 x 03 x 16 x 16 | Output: 16 x 16 x 16, RF: 14
        x = self.pool2(x) # Input: 16 x 16 x 16 | Kernel: 02 x 02 x 01 x 16 | Output: 08 x 08 x 16, RF: 16
        x = self.conv5(x) # Input: 08 x 08 x 16 | Kernel: 03 x 03 x 16 x 32 | Output: 08 x 08 x 32, RF: 24
        x = self.conv6(x) # Input: 08 x 08 x 32 | Kernel: 03 x 03 x 32 x 32 | Output: 08 x 08 x 32, RF: 32
        x = self.conv7(x) # Input: 08 x 08 x 32 | Kernel: 01 x 01 x 32 x 10 | Output: 10 x 10 x 10, RF: 32
        x = self.gap(x)   # Input: 10 x 10 x 10 | Kernel: 10 x 10 x 01 x 10 | Output: 01 x 01 x 10, RF: 68
        # output layer
        x = x.view(-1, 10) # flatten (# of classes = 10)
        return F.log_softmax(x, dim=1) # fed to NLL loss for cross entropy loss
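
The shape and receptive-field annotations in `forward` can be cross-checked with a short calculation. The sketch below assumes the usual output-size formula `(size + 2*padding - kernel) // stride + 1` and the standard receptive-field recurrence `rf += (kernel - 1) * jump`, where `jump` is the product of the strides seen so far. The `LAYERS` table and the `trace` helper are hypothetical names written for this example; the kernel, stride and padding values are copied from the `Net` definition, with `nn.AvgPool2d(kernel_size=10)` using its default stride equal to the kernel size.

```python
# (name, kernel, stride, padding) for each layer in forward(), in order.
LAYERS = [
    ("conv1", 3, 1, 1), ("conv2", 3, 1, 1), ("pool1", 2, 2, 0),
    ("trans", 1, 1, 1), ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
    ("pool2", 2, 2, 0), ("conv5", 3, 1, 1), ("conv6", 3, 1, 1),
    ("conv7", 1, 1, 1), ("gap", 10, 10, 0),
]

def trace(layers, size=28):
    """Return (name, output size, receptive field) for every layer."""
    rf, jump, rows = 1, 1, []
    for name, kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        rf += (kernel - 1) * jump   # growth is scaled by the current jump
        jump *= stride              # striding widens every later step
        rows.append((name, size, rf))
    return rows

for name, size, rf in trace(LAYERS):
    print(f"{name:5s} output={size:2d}x{size:<2d} RF={rf}")
```

Two things stand out under these assumptions. The 1x1 convolutions use `padding=1`, which is why the feature maps grow from 14 to 16 and from 8 to 10. And a 2x2, stride-2 pooling layer adds one pixel to the receptive field and doubles the jump, rather than doubling the receptive field itself.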

In [5]:
# Instantiate model into device and
# view model summary

!pip install torchsummary
from torchsummary import summary

device = torch.device("cuda" if use_cuda else "cpu")
model = Net(DROPOUT_RATE).to(device)
summary(model, input_size=(1, 28, 28)) # default input size of MNIST dataset

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              80
              ReLU-2            [-1, 8, 28, 28]               0
       BatchNorm2d-3            [-1, 8, 28, 28]              16
         Dropout2d-4            [-1, 8, 28, 28]               0
            Conv2d-5           [-1, 16, 28, 28]           1,168
              ReLU-6           [-1, 16, 28, 28]               0
       BatchNorm2d-7           [-1, 16, 28, 28]              32
         Dropout2d-8           [-1, 16, 28, 28]               0
         MaxPool2d-9           [-1, 16, 14, 14]               0
           Conv2d-10            [-1, 8, 16, 16]             136
           Conv2d-11           [-1, 16, 16, 16]           1,168
             ReLU-12           [-1, 16, 16, 16]               0
      BatchNorm2d-13           [-1, 16, 16, 16]              32
        Dropout2d-14           [-1, 16,
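
The summary output above is cut off before the totals, so the parameter budget cannot be read from it directly. As a cross-check, the count can be estimated by hand from the layer definitions in `Net`, assuming each `Conv2d` has `k*k*c_in*c_out` weights plus `c_out` biases and each `BatchNorm2d` has a learnable scale and shift per channel. The helper names and the `CONVS` / `BN_CHANNELS` lists below are written for this sketch only.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def bn_params(channels):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * channels

# (in_channels, out_channels, kernel_size), copied from the Net definition.
CONVS = [
    (1, 8, 3), (8, 16, 3), (16, 8, 1), (8, 16, 3),
    (16, 16, 3), (16, 32, 3), (32, 32, 3), (32, 10, 1),
]
BN_CHANNELS = [8, 16, 16, 16, 32, 32]

total = (sum(conv_params(*conv) for conv in CONVS)
         + sum(bn_params(c) for c in BN_CHANNELS))
print(f"estimated trainable parameters: {total:,}")
print("under the 20k budget:", total < 20_000)
```

The first few terms agree with the visible rows of the summary (80, 16, 1,168, 32 and 136), which suggests the formulas match how the summary counts parameters.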

# 3. Train and test the model

In [6]:
# Instantiate criterion
criterion = nn.NLLLoss()

# Instantiate optimizer
optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)
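
The model returns `F.log_softmax` outputs and the criterion is `nn.NLLLoss`, a pairing that is equivalent to cross-entropy on the raw scores. The pure-Python sketch below checks that identity for a single example; the function names and the sample logits are made up for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

def cross_entropy(logits, target):
    """Cross-entropy computed directly from softmax probabilities."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

logits, target = [2.0, 0.5, -1.0], 0   # hypothetical scores and label
print(nll(log_softmax(logits), target))
print(cross_entropy(logits, target))
```

Both lines print the same value up to floating-point rounding, which is why the network needs no separate softmax layer before the loss.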

In [7]:
# Define how to train and test model

from tqdm import tqdm

# Optimize model parameters
def train(model: 'neural network', 
          device: 'CPU or GPU', 
          train_loader: 'training batch', 
          optimizer: 'weight-optimizing algorithm', 
          epoch: 'zero-based index of the current pass over the training set'):
  
    # set model to train mode: dropout is active and batch norm uses batch statistics
    model.train() 
    # instantiate progress meter
    pbar = tqdm(train_loader)
    # iterate for every batch
    for batch_idx, (data, target) in enumerate(pbar): 
        # move data and target to device
        data, target = data.to(device), target.to(device) 
        
        # zero previously accumulated gradients for mini-batch update
        optimizer.zero_grad()
        # forward pass
        output = model(data)
        # calculate loss
        loss = criterion(output, target) 
        # calculate gradients 
        loss.backward() 
        # update parameters
        optimizer.step() 
        
        # display progress meter                                            
        pbar.set_description(desc=(f'epoch: {epoch + 1} | '
                                   f'loss= {loss.item():.4f} | '
                                   f'batch_id= {batch_idx}'))

# Test how well model parameters are optimized
def test(model, device, test_loader: 'testing batch'):

    # set model to evaluation mode: dropout is off and batch norm uses running statistics
    model.eval()
    # iteratively accumulate test loss and # of correct predictions
    test_loss = 0
    correct = 0
        # disable gradient tracking (no backprop needed for evaluation)
    with torch.no_grad():
        # iterate for every batch
        for data, target in test_loader:
            # move data and target to device
            data, target = data.to(device), target.to(device)
            # forward pass
            output = model(data)

            # add this batch's mean loss (nn.NLLLoss averages over the batch by default)
            test_loss += criterion(output, target).sum().item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            # update # of correct predictions  
            correct += pred.eq(target.view_as(pred)).sum().item()
    
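    # divide the summed per-batch means by the number of samples; note that this
    # is roughly 1/BATCH_SIZE of the true per-sample average loss
    test_loss /= len(test_loader.dataset)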

    # display results
    print((f'\nTest set: Average loss= {test_loss:.4f} | '
           f'Accuracy= {correct}/{len(test_loader.dataset)} '
           f'({100. * correct / len(test_loader.dataset):.1f}%)\n'))
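
In `test`, the criterion already returns the mean loss of each batch, yet the running total is divided by the number of samples. The reported "Average loss" is therefore smaller than the per-sample average by roughly the batch size. The pure-Python sketch below illustrates the gap with made-up per-sample losses; both function names are hypothetical.

```python
def sum_of_batch_means_over_n(losses, batch_size):
    """Mimic test(): add up per-batch mean losses, then divide by sample count."""
    batches = [losses[i:i + batch_size]
               for i in range(0, len(losses), batch_size)]
    return sum(sum(b) / len(b) for b in batches) / len(losses)

def per_sample_mean(losses):
    """The conventional average: total loss divided by sample count."""
    return sum(losses) / len(losses)

# Assumed data: 10,000 test samples that each have a loss of 0.1.
losses = [0.1] * 10_000
print(sum_of_batch_means_over_n(losses, batch_size=128))
print(per_sample_mean(losses))
```

In PyTorch, one way to get the per-sample average would be to accumulate `F.nll_loss(output, target, reduction='sum').item()` before dividing by `len(test_loader.dataset)`. The accuracy figures are unaffected either way.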

In [8]:
# Bring batches, model, loss function and optimizer together to 
# carry out training and testing of the model
for epoch in range(EPOCHS):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

epoch: 1 | loss= 0.2395 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.41it/s]
Test set: Average loss= 0.0014 | Accuracy= 9601/10000 (96.0%)

epoch: 2 | loss= 0.2131 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.88it/s]
Test set: Average loss= 0.0007 | Accuracy= 9755/10000 (97.5%)

epoch: 3 | loss= 0.2334 | batch_id= 468: 100%|██████████| 469/469 [00:15<00:00, 29.65it/s]
Test set: Average loss= 0.0004 | Accuracy= 9845/10000 (98.5%)

epoch: 4 | loss= 0.0511 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.92it/s]
Test set: Average loss= 0.0003 | Accuracy= 9875/10000 (98.8%)

epoch: 5 | loss= 0.0807 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.55it/s]
Test set: Average loss= 0.0003 | Accuracy= 9893/10000 (98.9%)

epoch: 6 | loss= 0.1188 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.80it/s]
Test set: Average loss= 0.0003 | Accuracy= 9907/10000 (99.1%)

epoch: 7 | loss= 0.0321 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.14it/s]
Test set: Average loss= 0.0002 | Accuracy= 9915/10000 (99.2%)

epoch: 8 | loss= 0.1501 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.17it/s]
Test set: Average loss= 0.0002 | Accuracy= 9910/10000 (99.1%)

epoch: 9 | loss= 0.0788 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.11it/s]
Test set: Average loss= 0.0002 | Accuracy= 9913/10000 (99.1%)

epoch: 10 | loss= 0.0495 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.23it/s]
Test set: Average loss= 0.0002 | Accuracy= 9915/10000 (99.2%)

epoch: 11 | loss= 0.0868 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.62it/s]
Test set: Average loss= 0.0002 | Accuracy= 9921/10000 (99.2%)

epoch: 12 | loss= 0.0492 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.09it/s]
Test set: Average loss= 0.0002 | Accuracy= 9927/10000 (99.3%)

epoch: 13 | loss= 0.0655 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.18it/s]
Test set: Average loss= 0.0002 | Accuracy= 9928/10000 (99.3%)

epoch: 14 | loss= 0.0714 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.99it/s]
Test set: Average loss= 0.0002 | Accuracy= 9929/10000 (99.3%)

epoch: 15 | loss= 0.0498 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.10it/s]
Test set: Average loss= 0.0002 | Accuracy= 9930/10000 (99.3%)

epoch: 16 | loss= 0.0351 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.30it/s]
Test set: Average loss= 0.0001 | Accuracy= 9935/10000 (99.3%)

epoch: 17 | loss= 0.0344 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 29.00it/s]
Test set: Average loss= 0.0001 | Accuracy= 9930/10000 (99.3%)

epoch: 18 | loss= 0.0680 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.99it/s]
Test set: Average loss= 0.0002 | Accuracy= 9929/10000 (99.3%)

epoch: 19 | loss= 0.0349 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.47it/s]
Test set: Average loss= 0.0001 | Accuracy= 9936/10000 (99.4%)

epoch: 20 | loss= 0.0541 | batch_id= 468: 100%|██████████| 469/469 [00:16<00:00, 28.92it/s]
Test set: Average loss= 0.0002 | Accuracy= 9929/10000 (99.3%)
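
To compare the run against the 99.4% target, the per-epoch correct counts from the log above can be tabulated directly. The sketch below copies those counts by hand into a list (the variable names are made up for this example) and reports the best epoch at two decimal places, since the log rounds to one.

```python
# Correct predictions out of 10,000 for epochs 1..20, copied from the log.
CORRECT = [
    9601, 9755, 9845, 9875, 9893, 9907, 9915, 9910, 9913, 9915,
    9921, 9927, 9928, 9929, 9930, 9935, 9930, 9929, 9936, 9929,
]
TEST_SIZE = 10_000

best_epoch = max(range(len(CORRECT)), key=CORRECT.__getitem__) + 1
best_acc = 100.0 * CORRECT[best_epoch - 1] / TEST_SIZE
epochs_at_target = [i + 1 for i, c in enumerate(CORRECT)
                    if 100.0 * c / TEST_SIZE >= 99.4]

print(f"best epoch: {best_epoch}, accuracy: {best_acc:.2f}%")
print("epochs at or above 99.40%:", epochs_at_target)
```

On these numbers the best result is epoch 19, which the log prints as 99.4% after rounding to one decimal place. Whether that satisfies the target depends on how strictly the threshold is read.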

