# Machine Vision - Assignment 3: CNNs in PyTorch

---

Prof. Dr. Markus Enzweiler, Esslingen University of Applied Sciences

markus.enzweiler@hs-esslingen.de

---

### This is the third assignment for the "Machine Vision" lecture.
It covers:
* training and finetuning CNNs for traffic sign recognition
* working with public benchmark datasets ([German Traffic Sign Recognition Benchmark](https://benchmark.ini.rub.de/gtsrb_news.html))

**Make sure that "GPU" is selected in Runtime -> Change runtime type**

To successfully complete this assignment, it is assumed that you already have some experience in Python and numpy. You can either use [Google Colab](https://colab.research.google.com/) for free with a private (dedicated) Google account (recommended) or a local Jupyter installation.

---


## Preparations and Imports


### Package Path

In [None]:
# Package Path (this needs to be adapted)
packagePath = "./" # local
if 'google.colab' in str(get_ipython()):
  packagePath = "/content/drive/MyDrive/a3-cnn/a3-cnn/template"   # Colab

### Import important libraries (you should probably start with these lines all the time ...)

In [None]:
# os, glob, time, logging
import os, glob, time, logging

# NumPy
import numpy as np

# OpenCV
import cv2

# Matplotlib
import matplotlib.pyplot as plt
# make sure we show all plots directly below each cell
%matplotlib inline

# PyTorch
import torch
import torchvision
import torchvision.transforms as transforms

# Some Colab specific packages
if 'google.colab' in str(get_ipython()):
  # image display
  from google.colab.patches import cv2_imshow

  # torchinfo
  %pip install torchinfo


### Some helper functions that we will need

In [None]:
def my_imshow(image, windowTitle=None, size=20, depth=3):
  '''
  Displays an image and differentiates between Google Colab and a local Python installation.

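  Args:
    image: The image to be displayed
    windowTitle: Optional title shown together with the image
    size: Figure width in inches for local display (None or 0 keeps the default figure size)
    depth: Number of image channels (1 = grayscale, 3 = color)

  Returns:
    -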
  '''

  if 'google.colab' in str(get_ipython()):
    print(windowTitle)
    cv2_imshow(image)
  else:
    if (size):
      (h, w) = image.shape[:2]
      aspectRatio = float(h)/w
      figsize=(size, size * aspectRatio)
      plt.figure(figsize=figsize)

    if (windowTitle):
      plt.title(windowTitle)

    if (depth == 1):
      plt.imshow(image, cmap='gray', vmin=0, vmax=255)
    elif (depth == 3):
      plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # OpenCV stores BGR, Matplotlib expects RGB
    else:
      plt.imshow(image)
    plt.axis('off')
    plt.show()

### In Google Colab only:
Mount the Google Drive associated with your Google account. You will have to click the authorization link and enter the obtained authorization code here in Colab.

In [None]:
# Mount Google Drive
if 'google.colab' in str(get_ipython()):
  from google.colab import drive
  drive.mount('/content/drive', force_remount=True)

### PyTorch Trainer and Test Class

In the project package, I have provided a Python file `torchHelpers.py` that contains two classes `Trainer` and `Tester`. These classes contain the neural network training loop and test code, similar to what you have already seen in the tutorial.

The classes can be used as follows (see the documentation of the individual classes):

```
# Train a neural network model
# create a trainer
trainer = Trainer(model, lossFunction, optimizer, device, logLevel=logging.INFO)
# train the model
trainer.train(trainLoader, valLoader, numberOfEpochs)

# Test a neural network model
# create a tester
tester = Tester(model, device, logLevel=logging.INFO)
# test the model
tester.test(testLoader)
```

Error and accuracy metrics are available after training / testing via `trainer.metrics` and `tester.metrics`.

In [None]:
import sys
sys.path.append(packagePath)
print("If the import does not work, most likely your 'packagePath' is not set correctly!")
print(f'packagePath: {packagePath}')

from torchHelpers import Trainer, Tester

help(Trainer)

## Exercise 1 - Traffic Sign Classification using Convolutional Neural Networks in PyTorch (10 points)

In this exercise you will train a convolutional neural network using PyTorch on the [German Traffic Sign Recognition Benchmark](https://benchmark.ini.rub.de/gtsrb_news.html) dataset. There will be no previous feature transform, i.e. the raw pixel values are the input to the neural network.

**For that, you can use exactly the same procedure as in the previous assignment, with the same datasets, loss function, optimizer, hyperparameters, `Trainer` and `Tester` class. You will just have to define a different network model.**

### Automatically select the best available device (**PROVIDED**)

**In Colab: Make sure that "GPU" is selected in Runtime -> Change runtime type**

You should have a GPU device available, e.g.:

```
Using device: cuda
Tesla T4
```

We will transfer our model and data to this device later on in the assignment. PyTorch takes care of all the particular device handling automatically, i.e. we will not have to explicitly deal with CUDA / MPS.
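
As a minimal sketch of what "transferring to the device" looks like in practice, both the model and every batch are moved with `.to(device)`. The toy layer, the batch size and the simplified device selection are placeholders for illustration only and are not part of the assignment code:

```python
import torch
import torch.nn as nn

# Simplified stand-in for the device selection (the notebook's own function also checks MPS)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical toy layer and batch, only for illustration
layer = nn.Linear(4, 2).to(device)     # the parameters now live on `device`
batch = torch.rand(8, 4).to(device)    # the inputs must be on the same device

out = layer(batch)                     # the result stays on `device`
print(out.shape, out.device)
```

If the model and the data end up on different devices, PyTorch raises a runtime error, so both transfers are needed.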

In [None]:
# Check the devices that we have available and prefer CUDA over MPS and CPU
def autoselectDevice(verbose=1):

    # default: CPU
    device = torch.device('cpu')

    if torch.cuda.is_available():
        # CUDA
        device = torch.device('cuda')
    elif torch.backends.mps.is_available() and torch.backends.mps.is_built():
        # MPS (acceleration on Apple M-series SoC)
        device = torch.device('mps')

    if verbose:
        print('Using device:', device)

    # Additional Info when using cuda
    if verbose and device.type == 'cuda':
        print(torch.cuda.get_device_name(0))

    return device

# We transfer our model and data later to this device. If this is a GPU
# PyTorch will take care of everything automatically.
device = autoselectDevice(verbose=1)


### Getting familiar with the GTSRB dataset (**PROVIDED**)

In [None]:
# GTSRB is available as standard dataset in PyTorch. Nice :)

# Data is loaded and processed in batches of 'batchSize' images
batchSize = 24

# We can add a chain of transforms that is applied to the original data, e.g.
#    resize all images to the same dimensions, e.g. 64x64 pixels
#    convert (batch of) images to a tensor
#    normalize pixel values (to 0-1)

transform = transforms.Compose(
    [transforms.Resize((64, 64)), # resize to 64x64 pixels
     transforms.ToTensor()        # convert to tensor. This will also normalize pixels to 0-1
     ])

# We construct several DataLoaders that take care of loading, storing, caching, pre-fetching the dataset.
# We will have one DataLoader for training, validation and test data.

# Training data
trainSet = torchvision.datasets.GTSRB(root='./data', split='train',
                                      download=True, transform=transform)
trainLoader = torch.utils.data.DataLoader(trainSet, batch_size=batchSize,
                                          shuffle=True, pin_memory=True, num_workers=2)
numTrainSamples = len(trainSet)

# Validation and test data
# GTSRB only provides a single test set. To create a validation and test set,
# we split the original GTSRB test set into two parts. The validation set is
# used to tune performance during training. The test set is only used AFTER
# training to evaluate the final performance.

gtsrbTestSet = torchvision.datasets.GTSRB(root='./data', split='test',
                                          download=True, transform=transform)

# Split the original GTSRB test data into 75% used for validation and 25% used for testing
# We do not need to shuffle the data, as we are processing every validation / test image exactly once
length75Percent = int(0.75 * len(gtsrbTestSet))
length25Percent = len(gtsrbTestSet) - length75Percent
lengths = [length75Percent, length25Percent]
valSet, testSet = torch.utils.data.random_split(gtsrbTestSet, lengths)
valLoader = torch.utils.data.DataLoader(valSet, batch_size=batchSize,
                                        shuffle=False, pin_memory=True, num_workers=2)
numValSamples = len(valSet)

testLoader = torch.utils.data.DataLoader(testSet, batch_size=batchSize,
                                         shuffle=False, pin_memory=True, num_workers=2)
numTestSamples = len(testSet)

# Available traffic sign classes in the dataset
classes = [
    "Speed limit (20km/h)",
    "Speed limit (30km/h)",
    "Speed limit (50km/h)",
    "Speed limit (60km/h)",
    "Speed limit (70km/h)",
    "Speed limit (80km/h)",
    "End of speed limit (80km/h)",
    "Speed limit (100km/h)",
    "Speed limit (120km/h)",
    "No passing",
    "No passing for vehicles over 3.5 metric tons",
    "Right-of-way at the next intersection",
    "Priority road",
    "Yield",
    "Stop",
    "No vehicles",
    "Vehicles over 3.5 metric tons prohibited",
    "No entry",
    "General caution",
    "Dangerous curve to the left",
    "Dangerous curve to the right",
    "Double curve",
    "Bumpy road",
    "Slippery road",
    "Road narrows on the right",
    "Road work",
    "Traffic signals",
    "Pedestrians",
    "Children crossing",
    "Bicycles crossing",
    "Beware of ice/snow",
    "Wild animals crossing",
    "End of all speed and passing limits",
    "Turn right ahead",
    "Turn left ahead",
    "Ahead only",
    "Go straight or right",
    "Go straight or left",
    "Keep right",
    "Keep left",
    "Roundabout mandatory",
    "End of no passing",
    "End of no passing by vehicles over 3.5 metric tons",
]

numClasses = len(classes)

### Print dataset statistics (**PROVIDED**)

In [None]:
print("Dataset Statistics")
print(f"  # of training samples:   {numTrainSamples}")
print(f"  # of validation samples: {numValSamples}")
print(f"  # of test samples:       {numTestSamples}")
print(f"  # of different classes:  {numClasses}")

### Visualize the data (**PROVIDED**)

In [None]:
# Visualize a random batch of data from the data set

def imshow(img):
    npimg = img.cpu().numpy() # make sure image is in host memory

    # normalize to 0-1 for visualization
    minPixel = np.min(npimg)
    maxPixel = np.max(npimg)
    npimg = npimg - minPixel
    npimg = npimg / (maxPixel-minPixel)

    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.axis("off")
    plt.show()


numRows = 8

# get a single random batch of training images
dataIter = iter(trainLoader)
images, labels = next(dataIter)

# print labels
for i in range( batchSize // numRows ):
    print('\n'.join(f'Image {j:2d}: {classes[labels[j]]:5s}' for j in range((i*numRows), (i*numRows)+numRows)))

# show images
imshow(torchvision.utils.make_grid(images, nrow=numRows))

### Define a Convolutional Neural Network (**add your code here**)

We want to design a Convolutional Neural Network as seen in this [tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). The overall structure of our network class is the same as before for the case of multilayer perceptrons. We will however use additional layers (convolutional and pooling layers).

We will need the following layers (input to output):

* The input will be the raw pixel values, i.e. 64x64x3 (**not flattened as for the multilayer perceptron**)
* 1 [`torch.nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) convolutional layer with 16 output channels (different features), 3x3 kernels and `padding='same'`
* 1 [`torch.nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) convolutional layer with 16 output channels (different features), 3x3 kernels and `padding='same'`
* 1 [`torch.nn.MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d) pooling layer with 2x2 kernels
* 1 [`torch.nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) convolutional layer with 32 output channels (different features), 3x3 kernels and `padding='same'`
* 1 [`torch.nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) convolutional layer with 32 output channels (different features), 3x3 kernels and `padding='same'`
* 1 [`torch.nn.MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d) pooling layer with 2x2 kernels
* 1 [`torch.nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) convolutional layer with 64 output channels (different features), 3x3 kernels and `padding='same'`
* 1 [`torch.nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) convolutional layer with 64 output channels (different features), 3x3 kernels and `padding='same'`
* 1 [`torch.nn.MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d) pooling layer with 2x2 kernels
* 1 [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) fully connected layer with 64 neurons
* The output layer is a [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) fully connected layer with `numClasses` neurons, one neuron per class
* All neurons except for neurons in the output layer and pooling layers should have [`torch.nn.functional.leaky_relu`](https://pytorch.org/docs/stable/generated/torch.nn.functional.leaky_relu.html#torch.nn.functional.leaky_relu) activation functions
* **Important: The output layer must not have any activation function. It will be automatically applied in the loss computation (softmax activation for CrossEntropy loss, as seen in the lecture).**
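
The effect of `padding='same'` and of 2x2 pooling on the tensor shape can be checked on a single layer pair. This is a small sketch with a random input tensor (an assumption for illustration), not the full network:

```python
import torch
import torch.nn as nn

# One random RGB "image" of 64x64 pixels in PyTorch's (batch, channels, height, width) layout
x = torch.rand(1, 3, 64, 64)

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding='same')
pool = nn.MaxPool2d(kernel_size=2)

y = conv(x)   # 'same' padding keeps the spatial size, only the channel count changes
z = pool(y)   # 2x2 pooling halves height and width

print(y.shape)  # torch.Size([1, 16, 64, 64])
print(z.shape)  # torch.Size([1, 16, 32, 32])
```

After three such pooling steps a 64x64 input shrinks to 8x8, which is why the first fully connected layer receives 64 * 8 * 8 input features.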


The model summary should look similar to:
```
==========================================================================================
Layer (type (var_name))                  Output Shape              Param #
==========================================================================================
ConvNet (ConvNet)                        [1, 43]                   --
├─Conv2d (conv1)                         [1, 16, 64, 64]           448
├─Conv2d (conv2)                         [1, 16, 64, 64]           2,320
├─MaxPool2d (pool)                       [1, 16, 32, 32]           --
├─Conv2d (conv3)                         [1, 32, 32, 32]           4,640
├─Conv2d (conv4)                         [1, 32, 32, 32]           9,248
├─MaxPool2d (pool)                       [1, 32, 16, 16]           --
├─Conv2d (conv5)                         [1, 64, 16, 16]           18,496
├─Conv2d (conv6)                         [1, 64, 16, 16]           36,928
├─MaxPool2d (pool)                       [1, 64, 8, 8]             --
├─Linear (fc1)                           [1, 64]                   262,208
├─Linear (fc2)                           [1, 43]                   2,795
==========================================================================================
Total params: 337,083
Trainable params: 337,083
Non-trainable params: 0
Total mult-adds (M): 40.01
==========================================================================================
Input size (MB): 0.05
Forward/backward pass size (MB): 1.84
Params size (MB): 1.35
Estimated Total Size (MB): 3.23
==========================================================================================
```
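
The parameter counts in the summary can be reproduced by hand: a convolutional layer has one `k x k x c_in` kernel plus one bias per output channel, and a fully connected layer has one weight per input plus one bias per neuron. The helper names below are made up for this sketch:

```python
def conv_params(c_in, c_out, k=3):
    """Weights (k * k * c_in) plus one bias, for each of the c_out filters."""
    return (k * k * c_in + 1) * c_out


def linear_params(n_in, n_out):
    """One weight per input plus one bias, for each of the n_out neurons."""
    return (n_in + 1) * n_out


counts = {
    "conv1": conv_params(3, 16),
    "conv2": conv_params(16, 16),
    "conv3": conv_params(16, 32),
    "conv4": conv_params(32, 32),
    "conv5": conv_params(32, 64),
    "conv6": conv_params(64, 64),
    "fc1": linear_params(64 * 8 * 8, 64),   # flattened 64 x 8 x 8 feature map
    "fc2": linear_params(64, 43),           # 43 traffic sign classes
}

for name, count in counts.items():
    print(f"{name}: {count:,}")

total = sum(counts.values())
print(f"Total params: {total:,}")  # Total params: 337,083
```

Note that most parameters sit in `fc1`, not in the convolutional layers.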


In [None]:
###### YOUR CODE GOES HERE ######

import torch.nn as nn
import torch.nn.functional as F


class ConvNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
    self.conv2 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
    self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
    self.conv3 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
    self.conv4 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1)
    self.conv5 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
    self.conv6 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
    self.fc1 = nn.Linear(in_features=64*8*8, out_features=64)
    self.fc2 = nn.Linear(in_features=64, out_features=43)

  def forward(self,x):
    x = F.leaky_relu(self.conv1(x))
    x = F.leaky_relu(self.conv2(x))
    x = self.pool(x)
    x = F.leaky_relu(self.conv3(x))
    x = F.leaky_relu(self.conv4(x))
    x = self.pool(x)
    x = F.leaky_relu(self.conv5(x))
    x = F.leaky_relu(self.conv6(x))
    x = self.pool(x)
    x = x.view(-1, 64*8*8)  # Flatten the tensor
    x = F.leaky_relu(self.fc1(x))
    x = self.fc2(x)  # No activation function for the output layer
    return x
#################################


In [None]:
# Instantiate the CNN
cnn = ConvNet()
from torchinfo import summary
summary(cnn, input_size=(1, 3, 64, 64), row_settings=["var_names"])

### Train your CNN on the training set and use the validation set to report performance in each iteration (**add your code here**).

To train our network, we will first have to define a loss function, an optimizer and hyperparameters that control the training process:

* [`torch.optim.AdamW`](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html?highlight=adamw#torch.optim.AdamW) is used as an optimizer with default parameters except for the learning rate which is set to `lr=3e-4`.
* [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) is used as the loss function. Note that the softmax activation is applied during loss computation, as stated in the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss).
* The number of training epochs is 15

The overall training should take about 20 seconds per epoch (**on a GPU**, depending on what GPU is assigned). Reported accuracies on the training (validation) data should be approx. 99.5% (93%) after 15 training epochs.
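
To see why the output layer must not have its own activation, the following sketch compares `CrossEntropyLoss` on raw outputs (logits) with an explicit log-softmax followed by the negative log-likelihood loss. The random logits and the labels are made-up example values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw network outputs for a batch of 4 images and 43 classes
logits = torch.randn(4, 43)
labels = torch.tensor([0, 5, 12, 42])

# CrossEntropyLoss expects raw logits ...
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# ... because it applies log-softmax internally before the negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_ce, loss_manual))  # True
```

Applying a softmax in the network's `forward()` as well would therefore apply it twice.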

In [None]:
###### YOUR CODE GOES HERE ######
lr = 3e-4
model = cnn
param_optimizer = model.parameters()
lossFunction = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(param_optimizer, lr=lr)

trainer = Trainer(model, lossFunction, optimizer, device)
trainer.train(trainLoader, valLoader, 15)
#################################

### Visualize the behavior of the loss and accuracy (**add your code here**)

Using the data available in `trainer.metrics`, create the following two plots:
* Training loss and validation loss as a function of epochs.  
* Training accuracy and validation accuracy as a function of epochs.  

In [None]:
##### YOUR CODE GOES HERE ######
trainLoss = trainer.metrics["epochTrainLoss"]
valLoss = trainer.metrics["epochValLoss"]
trainacc = trainer.metrics["epochTrainAccuracy"]
trainValacc = trainer.metrics["epochValAccuracy"]

# Number of epochs
epochs = range(1, len(trainLoss) + 1)

# Plotting training and validation loss
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs, trainLoss, 'b', label='Training loss')
plt.plot(epochs, valLoss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Plotting training and validation accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, trainacc, 'b', label='Training accuracy')
plt.plot(epochs, trainValacc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()
################################

### Run your network on some images to get predictions (**PROVIDED**)

In [None]:
# Run some batches of unseen test data through the network and visualize its predictions
numBatches = 2
numRows = 8

# Iterate through the test DataLoader
dataIter = iter(testLoader)

# for each batch
for batch in range(numBatches):

    # get images and ground truth labels
    images, labels = next(dataIter)

    # push to the device used
    images, labels = images.to(device), labels.to(device)

    # forward pass of the batch of images
    outputs = cnn(images)

    # find the index of the class with the largest output
    _, predictedLabels         = torch.max(outputs, 1)

    # print labels and outputs
    countCorrect = 0
    for i in range( batchSize // numRows ):
        for j in range((i*numRows), (i*numRows)+numRows):
            print(f'Image {j:2d} - Label: {classes[labels[j]]:5s} | Prediction: {classes[predictedLabels[j]]:5s}')
            if labels[j] == predictedLabels[j]:
                countCorrect = countCorrect + 1

    print(f"\n{(countCorrect / batchSize) * 100.0:.2f}% of test images correctly classified")

    # show images
    imshow(torchvision.utils.make_grid(images))

### Evaluate the performance on the unseen test data set.  (**add your code here**)

Use the provided `Tester` class (see above) and test your trained network on the unseen test set available via `testLoader`. Accuracy on the unseen test dataset should be approx. 90%.
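
For reference, classification accuracy is the fraction of samples whose highest-scoring class matches the label. The sketch below shows this computation on a tiny made-up batch; how the `Tester` class computes its metrics internally is not shown in this notebook, so this is only an illustration of the metric itself:

```python
import torch

# Hypothetical network outputs (logits) for 4 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.2, 0.4, 0.1]])
labels = torch.tensor([0, 1, 2, 1])

# The predicted class is the index of the largest output per sample
predicted = logits.argmax(dim=1)

# Fraction of correct predictions (the last sample is misclassified)
accuracy = (predicted == labels).float().mean().item()
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 75.00%
```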

In [None]:
###### YOUR CODE GOES HERE ######
# Test a neural network model
# create a tester
tester = Tester(model, device)
# test the model
tester.test(testLoader)

#################################

## Exercise 2 - Finetuning of CNNs for Traffic Sign Classification

In this exercise you will finetune a pre-trained CNN using PyTorch on the [German Traffic Sign Recognition Benchmark](https://benchmark.ini.rub.de/gtsrb_news.html) dataset. Finetuning / transfer-learning involves using a pre-trained CNN as a backbone for feature extraction and replacing the final classification layers with custom classification layers (remember the encoder vs. decoder discussion in our lecture). PyTorch comes with a large set of [pre-trained models](https://pytorch.org/vision/stable/models.html) that have been trained on ImageNet. We will finetune a MobileNet V2 to our dataset.


Some more links for finetuning / transfer learning:
* [PyTorch transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)
* [CS231n lecture notes on transfer learning](https://cs231n.github.io/transfer-learning/)

### Redefine the dataset (**PROVIDED**)

Changes:
* 256x256 pixel input (that's what the pretrained models use)
* Decreased the batch size to 16 to save GPU VRAM
* Added the data transforms which were used when MobileNet was trained

In [None]:
# GTSRB is available as standard dataset in PyTorch. Nice :)

# Data is loaded and processed in batches of 'batchSize' images
batchSize = 16

# We can add a chain of transforms that is applied to the original data, e.g.
#    resize all images to the same dimensions, e.g. 256x256 pixels
#    convert (batch of) images to a tensor
#    normalize pixel values (to 0-1)

transform = transforms.Compose(
    [transforms.Resize((256, 256)), # resize to 256x256 pixels
     transforms.ToTensor(),         # convert to tensor. This will also normalize pixels to 0-1

     # add the transforms which were used when MobileNet was trained
     torchvision.models.MobileNet_V2_Weights.IMAGENET1K_V2.transforms(antialias=True)

     ])

# We construct several DataLoaders that take care of loading, storing, caching, pre-fetching the dataset.
# We will have one DataLoader for training, validation and test data.

# Training data
trainSet = torchvision.datasets.GTSRB(root='./data', split='train',
                                      download=True, transform=transform)
trainLoader = torch.utils.data.DataLoader(trainSet, batch_size=batchSize,
                                          shuffle=True, pin_memory=True, num_workers=2)
numTrainSamples = len(trainSet)

# Validation and test data
# GTSRB only provides a single test set. To create a validation and test set,
# we split the original GTSRB test set into two parts. The validation set is
# used to tune performance during training. The test set is only used AFTER
# training to evaluate the final performance.

gtsrbTestSet = torchvision.datasets.GTSRB(root='./data', split='test',
                                          download=True, transform=transform)

# Split the original GTSRB test data into 75% used for validation and 25% used for testing
# We do not need to shuffle the data, as we are processing every validation / test image exactly once
length75Percent = int(0.75 * len(gtsrbTestSet))
length25Percent = len(gtsrbTestSet) - length75Percent
lengths = [length75Percent, length25Percent]
valSet, testSet = torch.utils.data.random_split(gtsrbTestSet, lengths)
valLoader = torch.utils.data.DataLoader(valSet, batch_size=batchSize,
                                        shuffle=False, pin_memory=True, num_workers=2)
numValSamples = len(valSet)

testLoader = torch.utils.data.DataLoader(testSet, batch_size=batchSize,
                                         shuffle=False, pin_memory=True, num_workers=2)
numTestSamples = len(testSet)

# Available traffic sign classes in the dataset
classes = [
    "Speed limit (20km/h)",
    "Speed limit (30km/h)",
    "Speed limit (50km/h)",
    "Speed limit (60km/h)",
    "Speed limit (70km/h)",
    "Speed limit (80km/h)",
    "End of speed limit (80km/h)",
    "Speed limit (100km/h)",
    "Speed limit (120km/h)",
    "No passing",
    "No passing for vehicles over 3.5 metric tons",
    "Right-of-way at the next intersection",
    "Priority road",
    "Yield",
    "Stop",
    "No vehicles",
    "Vehicles over 3.5 metric tons prohibited",
    "No entry",
    "General caution",
    "Dangerous curve to the left",
    "Dangerous curve to the right",
    "Double curve",
    "Bumpy road",
    "Slippery road",
    "Road narrows on the right",
    "Road work",
    "Traffic signals",
    "Pedestrians",
    "Children crossing",
    "Bicycles crossing",
    "Beware of ice/snow",
    "Wild animals crossing",
    "End of all speed and passing limits",
    "Turn right ahead",
    "Turn left ahead",
    "Ahead only",
    "Go straight or right",
    "Go straight or left",
    "Keep right",
    "Keep left",
    "Roundabout mandatory",
    "End of no passing",
    "End of no passing by vehicles over 3.5 metric tons",
]

numClasses = len(classes)

### Neural Network Model Definition (**add your code here**)

To finetune an existing CNN in PyTorch the procedure is as follows:
* Create a custom network class in Python that subclasses [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module) and contains `__init__()` and `forward()`.
* Add an instance variable, e.g. `self.model` to your class and initialize this with your pre-trained model (see `model_ft` in the [PyTorch transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html))
* We will use [MobileNet V2](https://pytorch.org/vision/stable/models/mobilenetv2.html)
* Read through the [documentation of MobileNet](https://pytorch.org/vision/stable/models/generated/torchvision.models.mobilenet_v2.html#torchvision.models.mobilenet_v2) and select pre-trained weights (network parameters). Most of the time, only ImageNet weights are available; some models also provide weights pre-trained on other datasets. Also make sure to find out what the input size of your pre-trained model is. In the case of MobileNet, the input images should have 256x256 pixels. When preparing the dataloaders (see above), check your `transforms` so that they scale the images to the correct size. This has already been prepared in the dataloader code above. Since the input images are larger now, you might want to reduce your batch size so that you do not run out of GPU memory. Here, we will use a batch size of 16 images instead of 24 as in the previous assignment.
* Before using the pre-trained models, one must preprocess the images (resize with the right resolution/interpolation, apply transforms, rescale the values, etc.). There is no standard way to do this, as it depends on how a given model was trained. It can vary across model families, variants or even weight versions. Using the correct preprocessing method is critical; failing to do so may lead to decreased accuracy or incorrect outputs. All the necessary information for the transforms of each pre-trained model is provided in its weights documentation. To simplify inference, TorchVision bundles the necessary preprocessing transforms into each model weight. These are accessible via the `weight.transforms` attribute. This has already been prepared in the dataloader code above.
* Every pre-trained model consists of a backbone network for feature extraction and some (fully connected) classifier layers on top. To replace the ImageNet classification layers present in our pre-trained model (instance variables of the model), we need to find their variable name. This unfortunately involves a bit of guessing or browsing through the [model source code](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html). Usually, the variables are named either `classifier` or `fc` (fully connected). There is some code below to list all available layer names. Typically, the classification layer is the last one.
* Set the `classifier` or `fc` attribute of your model to your new classification layer. If you only want to use a single classification layer, this should be a [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) layer. If you want to use a stack of layers as classifier, they need to be added to an [`nn.Sequential` container](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html). If the pre-trained model already has a stack of layers as classifier, it is usually enough to replace the final layer with your new classification layer. In any case, you will need to know the output dimension of the previous layer (same as the input dimension of your new classification layer). This is available as `in_features` of the final original classifier layer (where exactly depends on the particular model). Save this value before replacing the classifier layer.
* Re-train your whole model  


In this assignment, we will replace the final MobileNet classification layer using a single [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) layer with `numClasses` output neurons.


**Set up your network!**

**Important: The output layer must not have any activation function. It will be automatically applied in the loss computation (softmax activation for CrossEntropy loss, as seen in the lecture).**


**Some hints**

In [None]:
# Load a pre-trained MobileNet model with ImageNet weights
backbone = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.IMAGENET1K_V2)

# Print the layer names to guess what the classification layer(s) are named
print("Mobilenet layer names:")
for name, child in backbone.named_children():
    print(f"     {name}")

# Looks like "classifier" is the correct layer here.
# Let's find out the input dimension of "classifier"

print(f"The Mobilenet final classification layer has an input dimension of {backbone.classifier[1].in_features}.")



**torchinfo.summary() might also be useful here**

It shows an `nn.Sequential()` container layer named `classifier`, which contains the 1000-neuron classification layer (1000 = number of ImageNet output classes) as its second element.

```
MobileNetV2 (MobileNetV2)                     [1, 1000]                 --
├─Sequential (features)                       [1, 1280, 8, 8]           --
│    └─Conv2dNormActivation (0)               [1, 32, 128, 128]         --
│    │    └─Conv2d (0)                        [1, 32, 128, 128]         864
│    │    └─BatchNorm2d (1)                   [1, 32, 128, 128]         64
│    │    └─ReLU6 (2)                         [1, 32, 128, 128]         --
│    └─InvertedResidual (1)                   [1, 16, 128, 128]         --
│    │    └─Sequential (conv)                 [1, 16, 128, 128]         896
│    └─InvertedResidual (2)                   [1, 24, 64, 64]           --
│    │    └─Sequential (conv)                 [1, 24, 64, 64]           5,136
│    └─InvertedResidual (3)                   [1, 24, 64, 64]           --
│    │    └─Sequential (conv)                 [1, 24, 64, 64]           8,832
│    └─InvertedResidual (4)                   [1, 32, 32, 32]           --
│    │    └─Sequential (conv)                 [1, 32, 32, 32]           10,000
│    └─InvertedResidual (5)                   [1, 32, 32, 32]           --
│    │    └─Sequential (conv)                 [1, 32, 32, 32]           14,848
│    └─InvertedResidual (6)                   [1, 32, 32, 32]           --
│    │    └─Sequential (conv)                 [1, 32, 32, 32]           14,848
│    └─InvertedResidual (7)                   [1, 64, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 64, 16, 16]           21,056
│    └─InvertedResidual (8)                   [1, 64, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 64, 16, 16]           54,272
│    └─InvertedResidual (9)                   [1, 64, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 64, 16, 16]           54,272
│    └─InvertedResidual (10)                  [1, 64, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 64, 16, 16]           54,272
│    └─InvertedResidual (11)                  [1, 96, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 96, 16, 16]           66,624
│    └─InvertedResidual (12)                  [1, 96, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 96, 16, 16]           118,272
│    └─InvertedResidual (13)                  [1, 96, 16, 16]           --
│    │    └─Sequential (conv)                 [1, 96, 16, 16]           118,272
│    └─InvertedResidual (14)                  [1, 160, 8, 8]            --
│    │    └─Sequential (conv)                 [1, 160, 8, 8]            155,264
│    └─InvertedResidual (15)                  [1, 160, 8, 8]            --
│    │    └─Sequential (conv)                 [1, 160, 8, 8]            320,000
│    └─InvertedResidual (16)                  [1, 160, 8, 8]            --
│    │    └─Sequential (conv)                 [1, 160, 8, 8]            320,000
│    └─InvertedResidual (17)                  [1, 320, 8, 8]            --
│    │    └─Sequential (conv)                 [1, 320, 8, 8]            473,920
│    └─Conv2dNormActivation (18)              [1, 1280, 8, 8]           --
│    │    └─Conv2d (0)                        [1, 1280, 8, 8]           409,600
│    │    └─BatchNorm2d (1)                   [1, 1280, 8, 8]           2,560
│    │    └─ReLU6 (2)                         [1, 1280, 8, 8]           --
├─Sequential (classifier)                     [1, 1000]                 --
│    └─Dropout (0)                            [1, 1280]                 --
│    └─Linear (1)                             [1, 1000]                 1,281,000
===============================================================================================
Total params: 3,504,872
Trainable params: 3,504,872
Non-trainable params: 0
Total mult-adds (M): 392.49
===============================================================================================
Input size (MB): 0.79
Forward/backward pass size (MB): 139.57
Params size (MB): 14.02
Estimated Total Size (MB): 154.37
===============================================================================================
```

In [None]:
from torchinfo import summary
summary(backbone, input_size=(1, 3, 256, 256), row_settings=["var_names"])

In [None]:
## Define your finetuned network consisting of MobileNet backbone and our custom classification layer

###### YOUR CODE GOES HERE ######
import torch.nn as nn
import torch.nn.functional as F


class ConvNetFinetuned(nn.Module):
    pass
#################################


### Print a summary of the structure of our network using the `torchinfo` package. (**PROVIDED**)

The result should look similar to:
```
====================================================================================================
Layer (type (var_name))                            Output Shape              Param #
====================================================================================================
ConvNetFinetuned (ConvNetFinetuned)                [1, 43]                   --
├─MobileNetV2 (model)                              --                        --
│    └─Sequential (features)                       [1, 1280, 8, 8]           --
│    │    └─Conv2dNormActivation (0)               [1, 32, 128, 128]         928
│    │    └─InvertedResidual (1)                   [1, 16, 128, 128]         896
│    │    └─InvertedResidual (2)                   [1, 24, 64, 64]           5,136
│    │    └─InvertedResidual (3)                   [1, 24, 64, 64]           8,832
│    │    └─InvertedResidual (4)                   [1, 32, 32, 32]           10,000
│    │    └─InvertedResidual (5)                   [1, 32, 32, 32]           14,848
│    │    └─InvertedResidual (6)                   [1, 32, 32, 32]           14,848
│    │    └─InvertedResidual (7)                   [1, 64, 16, 16]           21,056
│    │    └─InvertedResidual (8)                   [1, 64, 16, 16]           54,272
│    │    └─InvertedResidual (9)                   [1, 64, 16, 16]           54,272
│    │    └─InvertedResidual (10)                  [1, 64, 16, 16]           54,272
│    │    └─InvertedResidual (11)                  [1, 96, 16, 16]           66,624
│    │    └─InvertedResidual (12)                  [1, 96, 16, 16]           118,272
│    │    └─InvertedResidual (13)                  [1, 96, 16, 16]           118,272
│    │    └─InvertedResidual (14)                  [1, 160, 8, 8]            155,264
│    │    └─InvertedResidual (15)                  [1, 160, 8, 8]            320,000
│    │    └─InvertedResidual (16)                  [1, 160, 8, 8]            320,000
│    │    └─InvertedResidual (17)                  [1, 320, 8, 8]            473,920
│    │    └─Conv2dNormActivation (18)              [1, 1280, 8, 8]           412,160
│    └─Sequential (classifier)                     [1, 43]                   --
│    │    └─Dropout (0)                            [1, 1280]                 --
│    │    └─Linear (1)                             [1, 43]                   55,083
====================================================================================================
Total params: 2,278,955
Trainable params: 2,278,955
Non-trainable params: 0
Total mult-adds (M): 391.27
====================================================================================================
Input size (MB): 0.79
Forward/backward pass size (MB): 139.56
Params size (MB): 9.12
Estimated Total Size (MB): 149.46
====================================================================================================
```

In [None]:
# Instantiate our neural network
cnnFinetuned = ConvNetFinetuned()

# Print a summary of the net

from torchinfo import summary
summary(cnnFinetuned, input_size=(1, 3, 256, 256), row_settings=["var_names"])

### Neural Network Training (**add your code here**)

Train your network using the `Trainer` class for 5 epochs using the same optimizer and loss as before:

* [`torch.optim.AdamW`](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html?highlight=adamw#torch.optim.AdamW) is used as an optimizer with default parameters except for the learning rate which is set to `lr=1e-4`.
* [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) is used as loss function. Note that the softmax activation is applied during loss computation, as stated in the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss).
* The number of training epochs is 3

Train your finetuned network using the `Trainer` class and provide `trainLoader` as the DataLoader for training data and `valLoader` as the DataLoader for validation data.

The overall training should take 2 to 3 minutes per epoch (**on a GPU**, depending on what GPU is assigned). Reported accuracies on the training (validation) data should be > 99% (98%) after training.
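
The `Trainer` class is defined earlier in the notebook and its interface is not repeated here, so the sketch below does not call it. Instead it shows, in plain PyTorch, roughly what one optimization loop with the specified optimizer and loss does. The linear model and the random batch are hypothetical stand-ins for `cnnFinetuned` and `trainLoader`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a linear classifier and one fixed random batch
model = nn.Linear(16, 43)
inputs = torch.randn(32, 16)
targets = torch.randint(0, 43, (32,))

# Optimizer and loss as specified above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

losses = []
model.train()
for step in range(50):
    optimizer.zero_grad()                      # clear gradients of the previous step
    loss = criterion(model(inputs), targets)   # forward pass on raw logits
    loss.backward()                            # backpropagation
    optimizer.step()                           # AdamW parameter update
    losses.append(loss.item())

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In the assignment, the same optimizer and loss objects would be handed to the `Trainer` together with the two DataLoaders.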




In [None]:
##### YOUR CODE GOES HERE ######

################################

### Visualize the behavior of the loss and accuracy (**add your code here**)

Using the data available in `trainer.metrics`, create the following two plots:
* Training loss and validation loss as a function of epochs.  
* Training accuracy and validation accuracy as a function of epochs.  
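
One possible layout for the two plots is sketched below. The dictionary keys (`trainLoss`, `valLoss`, `trainAcc`, `valAcc`) and the numbers are made up for illustration only. The actual structure of `trainer.metrics` is defined by the `Trainer` class and may differ, so inspect it first (for example with `print(trainer.metrics.keys())` if it is a dictionary).

```python
import matplotlib.pyplot as plt

# Hypothetical per-epoch values; the real ones come from trainer.metrics,
# whose key names are assumed here and may differ in the notebook
metrics = {
    "trainLoss": [1.20, 0.35, 0.12],
    "valLoss":   [0.60, 0.25, 0.15],
    "trainAcc":  [0.70, 0.93, 0.98],
    "valAcc":    [0.85, 0.94, 0.97],
}
epochs = range(1, len(metrics["trainLoss"]) + 1)

# Two plots side by side: loss on the left, accuracy on the right
fig, (axLoss, axAcc) = plt.subplots(1, 2, figsize=(12, 4))

axLoss.plot(epochs, metrics["trainLoss"], label="training")
axLoss.plot(epochs, metrics["valLoss"], label="validation")
axLoss.set_xlabel("Epoch")
axLoss.set_ylabel("Loss")
axLoss.legend()

axAcc.plot(epochs, metrics["trainAcc"], label="training")
axAcc.plot(epochs, metrics["valAcc"], label="validation")
axAcc.set_xlabel("Epoch")
axAcc.set_ylabel("Accuracy")
axAcc.legend()

fig.tight_layout()
```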

In [None]:
##### YOUR CODE GOES HERE ######

################################

### Run your network on some images to get predictions (**PROVIDED**)

In [None]:
# Run some batches of unseen test data through the network and visualize its predictions
numBatches = 2
numRows = 8

# Iterator over the test DataLoader
dataIter = iter(testLoader)

# for each batch
for batch in range(numBatches):

    # get images and ground truth labels
    images, labels = next(dataIter)

    # push to the device used
    images, labels = images.to(device), labels.to(device)

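    # forward pass of the batch of images
    # (eval mode disables dropout, no gradients are needed for inference)
    cnnFinetuned.eval()
    with torch.no_grad():
        outputs = cnnFinetuned(images)

    # find the index of the class with the largest output
    _, predictedLabels = torch.max(outputs, 1)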

    # print labels and outputs
    countCorrect = 0
    for i in range( batchSize // numRows ):
        for j in range((i*numRows), (i*numRows)+numRows):
            print(f'Image {j:2d} - Label: {classes[labels[j]]:5s} | Prediction: {classes[predictedLabels[j]]:5s}')
            if labels[j] == predictedLabels[j]:
                countCorrect = countCorrect + 1

    print(f"\n{(countCorrect / batchSize) * 100.0:.2f}% of test images correctly classified")

    # show images
    imshow(torchvision.utils.make_grid(images))

### Evaluate the performance on the unseen test data set.  (**add your code here**)

Use the provided `Tester` class (see above) and test your trained network on the unseen test set available via `testLoader`. Your trained network should have approximately 97% accuracy.
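
For reference, the accuracy that such a test run reports can be computed as sketched below. This is a generic illustration and not the interface of the `Tester` class. The function name `evaluate_accuracy` is hypothetical, and the toy model and random data merely stand in for `cnnFinetuned` and `testLoader`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device):
    """Return the fraction of correctly classified samples in loader."""
    model.eval()                      # disable dropout, use BatchNorm running statistics
    correct, total = 0, 0
    with torch.no_grad():             # no gradients needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)   # class with the largest logit
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Toy usage with hypothetical stand-ins for the model and the test data
torch.manual_seed(0)
device = torch.device("cpu")
toyModel = nn.Linear(8, 43).to(device)
toyData = TensorDataset(torch.randn(20, 8), torch.randint(0, 43, (20,)))
accuracy = evaluate_accuracy(toyModel, DataLoader(toyData, batch_size=5), device)
print(f"accuracy: {accuracy * 100:.2f}%")
```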

In [None]:
###### YOUR CODE GOES HERE ######

#################################