# NNIA Assignment 8

**DEADLINE: 19. 1. 2022 08:00 CET**
Submission more than 10 minutes past the deadline will **not** be graded!

- Trevor Atkins & trat00001@uni-saarland.de 
- Tsiamfei Prakapenka & tspr00001@uni-saarland.de 
- Hours of work per person: Prakapenka ~2.5h Atkins ~2h

# Submission Instructions

**IMPORTANT** Please make sure you read the following instructions carefully. If you are unclear about any part of the assignment, ask questions **before** the assignment deadline. All course-related questions can be addressed on the course **[Piazza Platform](https://piazza.com/class/kvc3vzhsvh55rt)**.

* Assignments are to be submitted in a **team of 2**.
* Please include your **names**, **IDs**, **Teams usernames**, and **approximate total time spent per person** at the beginning of the Notebook in the space provided.
* Make sure you appropriately comment your code wherever required.
* Your final submission should contain this completed Jupyter Notebook, including the bonus question (if you attempt it), and any necessary Python files.
* Do **not** submit any data or cache files (e.g. `__pycache__`).
* Upload the **zipped** folder (*.zip* is the only accepted extension) in **Teams**.
* Only **one member** of the group should make the submission.
* **Important** please name the submitted zip folder as: `Name1_id1_Name2_id2.zip`. The Jupyter Notebook should also be named: `Name1_id1_Name2_id2.ipynb`. This is **very important** for our internal organization. Students repeatedly fail to do this.

## Neural Network Implementation: Convolutional Neural Networks

### Theory Review (7 points)

#### True or False

After reviewing the lecture slides and Chapter 9 of the Deep learning book on [Convolutional Neural Networks](http://www.deeplearningbook.org/contents/convnets.html), determine whether each of the following statements is **True (T)** or **False (F)**. **Also provide a brief justification as to why you think so.**

a) Pooling needs to be removed for handling inputs of varying size. *(1 pt)* 

b) Given a multilayered convolutional neural network, a cell in a second convolutional layer has the same-sized receptive field as a cell in the first convolutional layer. *(1 pt)*

c) In the context of edge detection, a convolutional neural network learns features for each pixel separately. *(1 pt)*

d) There is an exponential increase in kernel parameters when a convolutional neural network's capabilities are increased to handle transformations like rotation, scaling etc. *(1 pt)*

#### Convolutional Neural Networks Parameters

a) In your own words, how are the concepts of *sparse interactions*, *parameter sharing*, and *equivariant representations* applied to convolutional neural networks? *(1 pt)*


b) When applying a CNN to language sequences, why is it useful to combine kernels of different sizes? *(1 pt)*


c) If a pixel represents an input unit for an image input to a CNN, what can be considered analogous to a pixel for a linguistic data input? *(1 pt)*




## <font color="green">Answers</font> 
Theory Review <br>
a.) False - pooling can be essential for handling inputs of varying size. For example, when classifying images of variable size, the input to the classification layer must have a fixed size. By varying the size of the offset between pooling regions, the classification layer always receives the same number of summary statistics regardless of the input size.
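The idea of varying the pooling regions with the input size can be sketched in a few lines of plain Python. This is only an illustration for 1-D inputs: the function name `adaptive_max_pool_1d` and the way the region boundaries are chosen are assumptions made for this example, not something prescribed by the lecture.

```python
import math


def adaptive_max_pool_1d(values, output_size):
    """Max-pool a list of any length down to exactly output_size values."""
    n = len(values)
    pooled = []
    for i in range(output_size):
        # Region boundaries scale with the input length, so the number
        # of regions (and therefore outputs) stays fixed.
        start = (i * n) // output_size
        end = math.ceil((i + 1) * n / output_size)
        pooled.append(max(values[start:end]))
    return pooled


short_input = [1, 5, 2, 8]
long_input = [3, 1, 4, 1, 5, 9, 2, 6, 5]

print(adaptive_max_pool_1d(short_input, 2))  # [5, 8]
print(adaptive_max_pool_1d(long_input, 2))   # [5, 9]
```

Both inputs end up with two summary values, which is what a fixed-size classification layer needs.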

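b.) False - the receptive field is the spatially contiguous region of the input that affects a hidden unit or cell, and it grows with depth. A cell in the first convolutional layer sees only a kernel-sized patch of the input, whereas a cell in the second layer combines several neighboring first-layer cells and therefore covers a larger region of the input.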

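c.) False - because of parameter sharing, the same kernel is applied at every position, so an edge detector is learned once and reused across the whole image rather than learned separately for each pixel. A simple edge filter, for example, subtracts from each pixel the value of a neighboring pixel, using the same weights everywhere.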

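d.) False - the network can learn several separate filters, each responding to a differently transformed version of a feature, and then pool over their outputs. The number of parameters then grows with the number of filters rather than exponentially.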
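For statement b), the receptive field of a cell at each depth can be computed directly. The sketch below uses the usual recurrence (the receptive field grows by `(kernel - 1)` times the accumulated stride); the function name `receptive_fields` and the `(kernel, stride)` layer encoding are assumptions made for this example.

```python
def receptive_fields(layers):
    """Receptive field size (in input pixels) after each (kernel, stride) layer."""
    field, jump = 1, 1  # jump = distance in input pixels between adjacent cells
    sizes = []
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
        sizes.append(field)
    return sizes


# Two stacked 3x3 convolutions with stride 1.
print(receptive_fields([(3, 1), (3, 1)]))          # [3, 5]

# 3x3 convolution, 2x2 max pooling with stride 2, then another 3x3 convolution.
print(receptive_fields([(3, 1), (2, 2), (3, 1)]))  # [3, 4, 8]
```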

Convolutional Neural Networks Parameters

a.) Sparse interactions mean that, unlike in a fully connected feed-forward network, each output unit depends only on a small local region of the input, because the kernel is much smaller than the input. Parameter sharing means that we apply the same filter at every position instead of learning a separate set of parameters for each position. Equivariant representations mean that when the input changes in a certain way, the output changes in the same way (for convolution, shifting the input shifts the output accordingly), without any need to retrain the model.
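A small 1-D example may make the three properties concrete. It is a sketch in plain Python with made-up values: `correlate_1d`, the kernel, and the signal are illustrative assumptions, and the operation is the cross-correlation that deep learning libraries typically call convolution.

```python
def correlate_1d(signal, kernel):
    """Valid cross-correlation: each output depends on len(kernel) inputs only."""
    width = len(kernel)
    return [
        sum(weight * signal[i + j] for j, weight in enumerate(kernel))
        for i in range(len(signal) - width + 1)
    ]


kernel = [1, 0, -1]                # the same three weights are reused at every position
signal = [0, 0, 1, 2, 3, 0, 0, 0]
shifted = [0] + signal[:-1]        # the same signal moved one step to the right

out = correlate_1d(signal, kernel)
out_shifted = correlate_1d(shifted, kernel)

print(out)          # [-1, -2, -2, 2, 3, 0]
print(out_shifted)  # [0, -1, -2, -2, 2, 3]
```

Each output uses only three inputs (sparse interactions), all positions share one kernel (parameter sharing), and shifting the input by one step shifts the output by one step (equivariance to translation).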

b.) We may want to extract information at different levels. A kernel of width k covers k consecutive input units, so small kernels are presumably useful for char/morpheme-level analysis, while larger kernels are good for word/sentence-level analysis.
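To illustrate what kernels of different widths see, the sketch below lists the windows that a kernel of a given width slides over. The example sentence and the helper name `windows` are hypothetical and chosen only for illustration.

```python
def windows(tokens, width):
    """All spans of `width` consecutive tokens, i.e. what a kernel of that width covers."""
    return [tuple(tokens[i:i + width]) for i in range(len(tokens) - width + 1)]


tokens = ['the', 'movie', 'was', 'not', 'good']  # hypothetical input sequence

for width in (2, 3):
    print(width, windows(tokens, width))
```

A width-2 kernel can react to the pair `('not', 'good')`, while a width-3 kernel also sees `('was', 'not', 'good')`; combining both widths gives the model access to patterns of both lengths.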

c.) A morpheme could be considered analogous to a pixel for linguistic text data input such as a corpus of sentences, or a phoneme (or a unit of wavelength/frequency of audio input) for linguistic audio data input such as a database of spoken words or phrases. A single character can also be considered analogous to a pixel.

### Implementation (3 pts)

In this exercise, we will continue to work with [the PyTorch Datasets Class](https://pytorch.org/vision/stable/datasets.html) to obtain
[the CIFAR10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html). Instead of the simple neural network from the previous assignment, we are going to implement a convolutional neural network (CNN) model to classify the images in this dataset into their proper classes.

Your CNN model will have the following architecture:


* It has five convolution blocks. 
* Each block consists of *convolution*, *max pooling* and *ReLU* operation in that order. 
* We will use 3×3 kernels in all convolutional layers. Set the padding and stride of the convolutional layers so that they maintain the spatial dimensions. 
* Max pooling operations are done with 2×2 kernels, with a stride of 2, thereby halving the spatial resolution each time. 
* Finally, stacking these five blocks leads to a 512 × 1 × 1 feature map. 
* Classification is achieved by a fully connected layer. 
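The spatial sizes implied by this architecture can be checked with the standard output-size formula for convolution and pooling. The sketch below assumes `padding=1` and `stride=1` for the 3×3 convolutions (one setting that maintains the spatial dimensions); the helper names are made up for this example.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


size = 32  # CIFAR-10 images are 32 x 32
sizes = []
for _ in range(5):
    size = conv_output_size(size)  # 3x3 conv with padding 1 keeps the size
    size = pool_output_size(size)  # 2x2 pooling with stride 2 halves it
    sizes.append(size)

print(sizes)  # [16, 8, 4, 2, 1]
```

After the fifth block the feature map is 1 × 1, so with 512 filters in the last layer the fully connected layer receives 512 values per image.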

Implement a class *ConvNet* to define the model described. The ConvNet takes 32 × 32 color images as inputs and has 5 hidden layers with 128, 512, 512, 512, 512 filters, and produces a 10-class classification. We will train the convolutional neural network on the CIFAR-10 dataset. Feel free to incorporate dropout, batch normalization, and early stopping if desired. Evaluate your trained model on the test set and report your findings. Hyperparameter values and some helper functions are provided in the `solution.py` file where you can continue your implementation.

For loss, you can use `nn.CrossEntropyLoss()` and for optimization, you can use the Adam optimizer with a learning rate of 2e-3 and weight decay of 0.001. 
       
**Note**: To speed up training on the entire dataset, you may want access to a GPU (CPU runtime > 10 hrs vs < 5 mins GPU). We recommend you make use of [Google Colab](https://colab.research.google.com/?utm_source=scs-index). You may also partition the dataset and work with a smaller sample size if you do not have access to a GPU. For tutorials on how to use Google Colab for deep learning and their file organization, you may find the following tutorials helpful: 

* [Neptune AI Blog: Using Google Colab for Deep Learning](https://neptune.ai/blog/how-to-use-google-colab-for-deep-learning-complete-tutorial)  
* [Neptune AI Blog: Dealing with Files in Google Colab](https://neptune.ai/blog/google-colab-dealing-with-files)

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
# from solution import your functions
import torch
import torchvision.transforms as transforms
from torch.utils.data import random_split
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import matplotlib.pyplot as plt
from tqdm import tqdm, trange
from pathlib import Path

from typing import List

In [3]:
# global constants
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INPUT_SIZE = 3
NUM_CLASSES = 10
HIDDEN_SIZE = [128, 512, 512, 512, 512, 512]
NUM_EPOCHS = 20  # default is 20, changeable via cl
BATCH_SIZE = 32
LR = 2e-3
LR_DECAY = 0.95
REG = 0.001
TRAINING_SIZE = 49000
VAL_SIZE = 1000
DROP_OUT = 0.2

In [4]:
def get_cifar10_dataset(val_size: int = VAL_SIZE, batch_size: int = BATCH_SIZE):
    """
    Load and transform the CIFAR10 dataset. Make Validation set. Create dataloaders for
    train, test, validation sets. Only train_loader uses batch_size; val_loader and
    test_loader have 1 batch (i.e. batch_size == len(val_set) etc.)

    DO NOT CHANGE THE CODE IN THIS FUNCTION. YOU MAY CHANGE THE BATCH_SIZE PARAM IF NEEDED.

    If you get an error related num_workers, you may change that parameter to a different value.

    :param val_size: size of the validation partition
    :param batch_size: number of samples in a batch
    :return: train_loader, test_loader, val_loader and the list of class names
    """

    # the datasets.CIFAR10 __getitem__ returns each image in PIL format, so convert it
    # to a tensor and normalize every channel with the given per-channel mean and std
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.24703233, 0.24348505, 0.26158768))
                                    ])

    # Load the train_set and test_set from PyTorch, applying the transform to each sample
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                            download=True, transform=transform)
    classes = train_set.classes

    # Split data and define train_loader, test_loader, val_loader
    train_size = len(train_set) - val_size
    train_set, val_set = random_split(train_set, [train_size, val_size])

    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                               shuffle=True, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=len(test_set),
                                              shuffle=False, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=val_size,
                                             shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader, classes

In [5]:
class ConvNet(nn.Module):
    def __init__(self):
        """Initializes CNN."""
        super().__init__()
        self.conv1 = nn.Conv2d(INPUT_SIZE, HIDDEN_SIZE[0], (3, 3), padding=1)
        self.conv2 = nn.Conv2d(HIDDEN_SIZE[0], HIDDEN_SIZE[1], (3, 3), padding=1)
        self.conv3 = nn.Conv2d(HIDDEN_SIZE[1], HIDDEN_SIZE[2], (3, 3), padding=1)
        self.conv4 = nn.Conv2d(HIDDEN_SIZE[2], HIDDEN_SIZE[3], (3, 3), padding=1)
        self.conv5 = nn.Conv2d(HIDDEN_SIZE[3], HIDDEN_SIZE[4], (3, 3), padding=1)

        self.fc = nn.Linear(HIDDEN_SIZE[4], NUM_CLASSES)
        self.pool = nn.MaxPool2d(2, 2)
        

    def forward(self, x):
        """Forward path."""
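        # ReLU is applied before max pooling here; because ReLU is monotonic,
        # this gives the same result as pooling first and applying ReLU afterwards.
        # [BATCH_SIZE, 3, 32, 32]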
        x = self.pool(F.relu(self.conv1(x)))
        # [BATCH_SIZE, 128, 16, 16]
        x = self.pool(F.relu(self.conv2(x)))
        # [BATCH_SIZE, 512, 8, 8]
        x = self.pool(F.relu(self.conv3(x)))
        # [BATCH_SIZE, 512, 4, 4]
        x = self.pool(F.relu(self.conv4(x)))
        # [BATCH_SIZE, 512, 2, 2]
        x = self.pool(F.relu(self.conv5(x)))

        # [BATCH_SIZE, 512, 1, 1]
        x = torch.flatten(x, 1)
        # [BATCH_SIZE, 512]
        x = self.fc(x)

        # [BATCH_SIZE, 10]
        return x

In [6]:
def training(cnn_model: ConvNet, trainloader: torch.utils.data.DataLoader, 
             criterion, optimizer: optim.Optimizer):
    """Trains model and reports metrics."""
    for epoch in range(NUM_EPOCHS):
        total_batches = running_loss = 0
        for i, data in tqdm(enumerate(trainloader, 0)):
            inputs, labels = data
            inputs = inputs.to(DEVICE)
            labels = labels.to(DEVICE)
            
            optimizer.zero_grad()
            outputs = cnn_model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

            total_batches += 1
        print()
        print(f'{epoch + 1} epoch loss: {running_loss / total_batches:.3f}')
    print('Finished Training!')

In [7]:
def evaluation(cnn_model: ConvNet, classes: List[str], 
               testloader: torch.utils.data.DataLoader):
    """Evaluation loop."""
    correct_pred = {classname: 0 for classname in classes}
    total_pred = {classname: 0 for classname in classes}
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images = images.to(DEVICE)
            labels = labels.to(DEVICE)
            outputs = cnn_model(images)
            _, predictions = torch.max(outputs, 1)

            for label, prediction in zip(labels, predictions):
                if label == prediction:
                    correct_pred[classes[label]] += 1
                total_pred[classes[label]] += 1

    for classname, correct_count in correct_pred.items():
        accuracy = 100 * float(correct_count) / total_pred[classname]
        print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

In [8]:
train_loader, test_loader, val_loader, classes = get_cifar10_dataset()

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [9]:
cnn_model = ConvNet().to(DEVICE)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(cnn_model.parameters(), lr=LR, weight_decay=REG)

In [10]:
training(cnn_model, train_loader, criterion, optimizer)

1532it [00:34, 44.89it/s]
1 epoch loss: 1.567
1532it [00:33, 45.67it/s]
2 epoch loss: 1.162
1532it [00:33, 45.56it/s]
3 epoch loss: 1.012
1532it [00:34, 44.70it/s]
4 epoch loss: 0.924
1532it [00:34, 44.79it/s]
5 epoch loss: 0.857
1532it [00:34, 44.18it/s]
6 epoch loss: 0.820
1532it [00:34, 43.99it/s]
7 epoch loss: 0.788
1532it [00:34, 43.86it/s]
8 epoch loss: 0.759
1532it [00:35, 43.63it/s]
9 epoch loss: 0.741
1532it [00:34, 43.98it/s]
10 epoch loss: 0.726
1532it [00:35, 43.67it/s]
11 epoch loss: 0.715
1532it [00:35, 43.33it/s]
12 epoch loss: 0.708
1532it [00:35, 43.01it/s]
13 epoch loss: 0.695
1532it [00:35, 43.41it/s]
14 epoch loss: 0.687
1532it [00:35, 43.43it/s]
15 epoch loss: 0.672
1532it [00:35, 43.18it/s]
16 epoch loss: 0.671
1532it [00:35, 43.40it/s]
17 epoch loss: 0.668
1532it [00:35, 43.53it/s]
18 epoch loss: 0.661
1532it [00:35, 43.54it/s]
19 epoch loss: 0.651
1532it [00:35, 43.55it/s]
20 epoch loss: 0.649
Finished Training!

In [11]:
evaluation(cnn_model, classes, val_loader)

Accuracy for class: airplane is 70.5 %
Accuracy for class: automobile is 88.6 %
Accuracy for class: bird  is 48.0 %
Accuracy for class: cat   is 59.4 %
Accuracy for class: deer  is 73.5 %
Accuracy for class: dog   is 57.3 %
Accuracy for class: frog  is 75.3 %
Accuracy for class: horse is 88.5 %
Accuracy for class: ship  is 79.5 %
Accuracy for class: truck is 85.9 %
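
As a short summary of these findings, the per-class values printed above can be averaged into a single macro-averaged accuracy. The sketch below only re-uses the printed numbers; it is not the overall accuracy, since the classes are not guaranteed to be equally frequent in the validation split.

```python
# Per-class validation accuracies (in %) copied from the output above.
per_class_accuracy = {
    'airplane': 70.5, 'automobile': 88.6, 'bird': 48.0, 'cat': 59.4,
    'deer': 73.5, 'dog': 57.3, 'frog': 75.3, 'horse': 88.5,
    'ship': 79.5, 'truck': 85.9,
}

# Unweighted mean over the ten classes.
macro_average = sum(per_class_accuracy.values()) / len(per_class_accuracy)
best = max(per_class_accuracy, key=per_class_accuracy.get)
worst = min(per_class_accuracy, key=per_class_accuracy.get)

print(f'Macro-averaged accuracy: {macro_average:.2f} %')
print(f'Best class: {best}, worst class: {worst}')
```

The model does best on automobile and horse and worst on bird, with cat and dog also clearly below the average of roughly 72.7 %.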


### Bonus

Implement a function to calculate the number of trainable parameters (1 pt) and another function to visualize the filter weights (1 pt). 

In [15]:
def count_parameters(model: nn.Module) -> int:
    """Returns number of trainable params for the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
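
As a cross-check on `count_parameters`, the expected total can also be derived by hand from the layer shapes of the `ConvNet` defined earlier: a convolution with a k×k kernel has `(k * k * in_channels + 1) * out_channels` parameters (weights plus one bias per filter), and a linear layer has `(in_features + 1) * out_features`. The helper names below are assumptions made for this sketch.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per output channel of a k x k convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def linear_params(in_features, out_features):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (in_features + 1) * out_features


# Input channels followed by the number of filters in each of the five conv layers.
channels = [3, 128, 512, 512, 512, 512]

total = sum(conv2d_params(c_in, c_out)
            for c_in, c_out in zip(channels, channels[1:]))
total += linear_params(512, 10)  # final classification layer

print(total)  # 7678474
```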

In [16]:
count_parameters(cnn_model)

7678474
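
For the second part of the bonus, one possible way to visualize the filter weights is sketched below. It is not part of the original submission: `grid_shape`, `visualize_filters`, and `max_cols` are hypothetical names, and the sketch assumes that matplotlib is installed and that the weight tensor comes from a first convolutional layer with 3 input channels (shape `[out_channels, 3, k, k]`), so that each filter can be drawn as a small RGB image.

```python
import math


def grid_shape(num_filters, max_cols=16):
    """Rows and columns of a subplot grid that fits num_filters images."""
    cols = min(num_filters, max_cols)
    rows = math.ceil(num_filters / cols)
    return rows, cols


def visualize_filters(conv_weight, max_cols=16):
    """Plot every filter of a [out_channels, 3, k, k] weight tensor as an RGB image."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    weights = conv_weight.detach().cpu()
    # Min-max scale all weights to [0, 1] so they can be rendered as colors.
    weights = (weights - weights.min()) / (weights.max() - weights.min())

    rows, cols = grid_shape(weights.shape[0], max_cols)
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows), squeeze=False)
    for ax in axes.flat:
        ax.axis('off')
    for i, ax in zip(range(weights.shape[0]), axes.flat):
        # [3, k, k] -> [k, k, 3], the channel-last layout that imshow expects.
        ax.imshow(weights[i].permute(1, 2, 0).numpy())
    return fig
```

With the model trained above, a call such as `visualize_filters(cnn_model.conv1.weight)` would draw the 128 first-layer filters in an 8 × 16 grid. Deeper layers have 128 or 512 input channels and would need a different layout, for example one grayscale image per input channel.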