# Assignment 7: Transfer Learning


The goal of this exercise is to learn how to use pre-trained networks in transfer learning tasks.
We will make use of networks trained on ImageNet, and apply them to related problems, i.e., the classification of $10$ objects not contained in ImageNet.

## Dataset

For this exercise, we use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, which can be downloaded from the official website [here](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz).
The dataset contains $60000$ color images of size $32\times 32$ pixels in $10$ classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck, with $6000$ images per class.

### Task 1: Data Transformation

We need to instantiate a proper `torchvision.transforms` pipeline to create the same input structure as used for training our network.
We need to combine 4 transforms, which are documented on the PyTorch website: https://pytorch.org/vision/stable/models.html

1. We need to resize the image such that the shorter side has size 256.
2. We need to take the center crop of size $224\times224$ from the image.
3. We need to convert the image into a tensor (including pixel value scaling).
4. We need to normalize the pixel values with mean $(0.485, 0.456, 0.406)$ and standard deviation $(0.229, 0.224, 0.225)$.

Since we will use networks pre-trained on ImageNet, we need to perform the exact same transform as used for ImageNet testing.
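The effect of the four steps on a CIFAR-10 image can be checked by hand. The following is a minimal sketch in plain Python, not the torchvision implementation: the helper names `resize_shorter_side` and `normalize_pixel` are hypothetical, and the rounding in the resize helper is an assumption that does not matter for square images.

```python
def resize_shorter_side(width, height, target=256):
    """Return (width, height) after scaling so that the shorter side equals target."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def normalize_pixel(value, mean, std):
    """Map an 8-bit pixel value to the normalized value fed to the network."""
    scaled = value / 255.0        # tensor conversion scales [0, 255] to [0.0, 1.0]
    return (scaled - mean) / std  # normalization is applied per channel


MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

# CIFAR-10 images are 32x32, so resizing yields 256x256 before the 224x224 center crop
resized = resize_shorter_side(32, 32)

# extreme input values per channel after scaling and normalization
white = [normalize_pixel(255, m, s) for m, s in zip(MEAN, STD)]
black = [normalize_pixel(0, m, s) for m, s in zip(MEAN, STD)]
print(resized, white, black)
```

Note that the normalized values are not restricted to $[0, 1]$; they roughly span $[-2.1, 2.6]$ depending on the channel.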

In [12]:
import torch
import torchvision

# Compose the ImageNet test-time preprocessing pipeline
imagenet_transform = torchvision.transforms.Compose([
    # resize the image such that the shorter side has size 256
    torchvision.transforms.Resize(256),
    # take the center crop of size 224x224 from the image
    torchvision.transforms.CenterCrop((224, 224)),
    # convert the image into a tensor (including pixel value scaling)
    torchvision.transforms.ToTensor(),
    # normalize the pixel values with the ImageNet mean and standard deviation
    torchvision.transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

### Task 2: Dataset Loading

We here use the [torchvision.datasets.CIFAR10](https://pytorch.org/vision/0.12/generated/torchvision.datasets.CIFAR10.html) dataset interface for processing images. 
You can use the `train` argument or flag to distinguish between training and test set.

This task consists of two parts:

1. Create two datasets, one for the training set, one for the test set. Use the transform defined above.
2. Once the datasets are created, create two data loaders, one for training set, one for test set. Use a proper value of the batch-size $B$.

In [13]:
trainset = torchvision.datasets.CIFAR10(
  root = "./data",
  train=True, download=True, transform=imagenet_transform
)

testset = torchvision.datasets.CIFAR10(
  root = "./data",
  train=False, download=True, transform=imagenet_transform
)

Files already downloaded and verified
Files already downloaded and verified


In [14]:
B = 16
trainloader = torch.utils.data.DataLoader(trainset, batch_size=B, shuffle=True)
# shuffling is not needed for evaluation
testloader = torch.utils.data.DataLoader(testset, batch_size=B, shuffle=False)
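
The batch size determines how many iterations one epoch takes. As a rough sketch, assuming the standard CIFAR-10 split of $50000$ training and $10000$ test images, the number of batches per epoch can be computed as follows (the helper `batches_per_epoch` is hypothetical and only mirrors how a data loader counts batches):

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to iterate once over num_samples samples."""
    if drop_last:
        # an incomplete final batch is discarded
        return num_samples // batch_size
    # an incomplete final batch is kept
    return math.ceil(num_samples / batch_size)


B = 16
train_batches = batches_per_epoch(50_000, B)
test_batches = batches_per_epoch(10_000, B)
# a batch size that does not divide the dataset size leaves a smaller last batch
train_batches_64 = batches_per_epoch(50_000, 64)
print(train_batches, test_batches, train_batches_64)
```

The number of batches is different from the number of samples, which matters later when losses and accuracies are averaged.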

### Test 1: Data Size and Types

We check that all input images are of type `torch.Tensor`, have size $3\times224\times224$ and data type `torch.float`, and that all labels are of type `int`.

Note: the sanity check is only performed on the test set.

In [15]:
for x, t in testset:
  assert isinstance(x, torch.Tensor)
  assert isinstance(t, int)
  assert x.shape==(3,224,224)
  assert x.dtype==torch.float

### Task 3: Pre-trained Network Instantiation

Instantiate two pre-trained networks of type ResNet-50.

1. Freeze the feature layers of the first network.

Note: Make use of the old TorchVision interface to load your pre-trained network. Here is the link: https://pytorch.org/vision/0.12/models.html
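
Freezing means setting the `requires_grad` attribute of a parameter to `False`. The spelling matters: as far as we are aware, assigning to a misspelled attribute name raises no error and leaves the parameter trainable. The sketch below illustrates the idea on a tiny stand-in model (hypothetical, chosen only so that the example runs quickly) rather than on ResNet-50.

```python
import torch

# A tiny stand-in for a pre-trained network (hypothetical, for illustration only)
model = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 2),
)

# Freeze every parameter: no gradients are computed for them anymore
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer; parameters of a newly created layer are trainable by default
model[2] = torch.nn.Linear(3, 2)

# Only the parameters of the new last layer remain trainable
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Listing the trainable parameter names in this way is a quick check that freezing worked as intended.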

In [16]:
# instantiate the first pre-trained ResNet-50 network
network_1 = torchvision.models.resnet50(pretrained=True)
# Freeze all layers of the first network so that their weights are not updated.
# https://androidkt.com/pytorch-freeze-layer-fixed-feature-extractor-transfer-learning/
for param in network_1.parameters():
    param.requires_grad = False

# instantiate the second pre-trained ResNet-50 network (optional)
network_2 = torchvision.models.resnet50(pretrained=True)



### Task 4: Network Implementation

We want to modify the network such that we extract the logits for the 10 classes from CIFAR-10 from the last fully-connected layer of the network.

Implement a function that:
1. Replaces the current last linear layer of the pre-trained network with a new linear layer that has $O$ units ($O$ represents the number of classes in our dataset).
2. Initializes the weights of the new linear layer using Xavier's method **(Optional)**.

Note: Use `torch.nn.init.xavier_uniform_` function to initialize the weights of the new linear layer.
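
For intuition, Xavier uniform initialization draws each weight from $U(-a, a)$ with $a = \text{gain}\cdot\sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$. The following sketch computes this bound for a layer with $2048$ inputs (the input size of the last ResNet-50 layer) and $10$ outputs; the helper name `xavier_uniform_bound` is hypothetical.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Half-width a of the uniform distribution U(-a, a) used by Xavier initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


bound = xavier_uniform_bound(2048, 10)
# the variance of U(-a, a) is a^2 / 3, which equals 2 / (fan_in + fan_out)
variance = bound ** 2 / 3.0
print(bound, variance)
```

The initial weights of the new layer are therefore small, on the order of $\pm 0.05$.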

In [23]:
# https://discuss.pytorch.org/t/how-to-perform-finetuning-in-pytorch/419/11

def replace_last_layer(network, O=10):
  # define a new linear layer with the input features of the last linear layer & O units
  new_layer = torch.nn.Linear(network.fc.in_features, O, bias=True)
  # initialise the weights of the new linear layer using Xavier's method
  torch.nn.init.xavier_uniform_(new_layer.weight)
  # Replace the last linear layer of the pre-trained model with the new linear layer
  network.fc = new_layer
  return network

### Test 2: Last layer dimensions

This test ensures that the function returns a network with the correct number of input and output units in the last layer.

In [24]:
O = 10
for network in (network_1, network_2):
    new_model = replace_last_layer(network, O=O)
    assert new_model.fc.out_features == O
    assert new_model.fc.in_features == 2048

## Network Training
Implement a function that takes all necessary parameters to run a training on a given dataset. 
Select the optimizer to be `torch.optim.SGD` and `torch.nn.CrossEntropyLoss` as the loss function. 
The test set will be used as the validation set.
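
As a reminder of what the loss function computes: `torch.nn.CrossEntropyLoss` takes the raw logits (no softmax layer is needed in the network) and, by default, averages the per-sample losses over the batch. The per-sample value can be sketched in plain Python as follows; the helper `cross_entropy` is hypothetical and handles a single sample only.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]


# all 10 classes equally likely: the loss is log(10)
uniform = cross_entropy([0.0] * 10, target=3)
# a confident and correct prediction: the loss is close to 0
confident = cross_entropy([10.0] + [0.0] * 9, target=0)
print(uniform, confident)
```

A loss of about $\log 10 \approx 2.3$ in the first iterations hence indicates a network that has not yet learned to separate the $10$ classes.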

### Task 5: Training and Evaluation Loop

Implement a training loop over a specific number of epochs (10) with a learning rate of $\eta=0.001$ and momentum of $\mu = 0.9$. 
Make sure that you train on the training data only, and not on the validation data.
In each epoch, compute and print the training loss, training accuracy, validation loss and validation accuracy.
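
To see what the two hyperparameters do, here is a minimal sketch of a single SGD step with momentum for one scalar parameter. It follows the formulation $v \leftarrow \mu v + g$, $w \leftarrow w - \eta v$ that `torch.optim.SGD` uses with its default settings (no dampening, no weight decay, no Nesterov momentum); the function name and the toy objective $f(w) = w^2$ are illustrative assumptions.

```python
def sgd_momentum_step(param, grad, velocity, eta=0.001, mu=0.9):
    """One SGD update with momentum for a single scalar parameter."""
    velocity = mu * velocity + grad  # accumulate a decaying sum of past gradients
    param = param - eta * velocity   # step against the accumulated direction
    return param, velocity


# toy objective f(w) = w^2 with gradient 2w, starting at w = 1
w1, v1 = sgd_momentum_step(1.0, 2.0 * 1.0, 0.0)
w2, v2 = sgd_momentum_step(w1, 2.0 * w1, v1)
print(w1, v1, w2, v2)
```

The second step is larger than the first because the velocity still contains $90\%$ of the previous gradient.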

In [30]:
# https://www.learnpytorch.io/06_pytorch_transfer_learning/
import numpy as np

def train_eval(network, trainloader, testloader, epochs=10, eta=0.001, mu=0.9):
    # select loss function and optimizer
    loss = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        params=network.parameters(),
        lr=eta, momentum=mu
    )

    # instantiate the correct device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    network = network.to(device)

    # collect losses & accuracies over training & validation epochs
    train_loss, train_acc = [], []
    test_loss, test_acc = [], []

    for epoch in range(epochs):
        # training process
        network.train()
        train_loss_epoch = []
        train_correct = 0
        for x, t in trainloader:
            # put data to device
            x, t = x.to(device), t.to(device)
            # forward pass and loss computation
            optimizer.zero_grad()
            z = network(x)
            J = loss(z, t)
            train_loss_epoch.append(J.item())
            # count correctly classified training samples
            train_correct += torch.sum(torch.argmax(z, dim=1) == t).item()
            # backward pass and parameter update
            J.backward()
            optimizer.step()
        # accuracy is the fraction of correct samples (not of batches)
        train_acc.append(train_correct / len(trainloader.dataset))
        train_loss.append(np.mean(train_loss_epoch))

        # validation process (no gradients, network in evaluation mode)
        network.eval()
        with torch.no_grad():
            loss_epoch = []
            correct = 0
            for x, t in testloader:
                # put data to device
                x, t = x.to(device), t.to(device)
                # compute validation loss
                z = network(x)
                J = loss(z, t)
                loss_epoch.append(J.item())
                # compute validation accuracy
                correct += torch.sum(torch.argmax(z, dim=1) == t).item()
            test_acc.append(correct / len(testloader.dataset))
            test_loss.append(np.mean(loss_epoch))

        # print losses and accuracies for the current epoch
        print(f"Epoch {epoch + 1}: "
              f"train loss {train_loss[-1]:.4f}, train acc {train_acc[-1]:.4f}, "
              f"val loss {test_loss[-1]:.4f}, val acc {test_acc[-1]:.4f}")

    return train_loss, train_acc, test_loss, test_acc
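
Accuracy is the fraction of correctly classified samples, so the count of correct predictions has to be divided by the number of samples, not by the number of batches. The following plain-Python sketch with made-up logits illustrates the difference; `count_correct` and the toy batches are hypothetical.

```python
def count_correct(logits_batch, targets):
    """Count samples whose highest logit is at the target index."""
    correct = 0
    for logits, target in zip(logits_batch, targets):
        predicted = max(range(len(logits)), key=logits.__getitem__)  # argmax
        correct += int(predicted == target)
    return correct


# three toy batches of (logits, targets); the last batch is smaller
batches = [
    ([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]], [0, 1]),
    ([[0.1, 0.2, 3.0], [1.0, 0.9, 0.8]], [2, 1]),
    ([[0.5, 0.4, 0.1]], [0]),
]

total_correct = sum(count_correct(z, t) for z, t in batches)
total_samples = sum(len(t) for _, t in batches)
accuracy = total_correct / total_samples
# dividing by the number of batches instead can exceed 1 and is not an accuracy
per_batch_ratio = total_correct / len(batches)
print(accuracy, per_batch_ratio)
```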

### Task 6: Network Fine-Tuning with Frozen Layers

Create a network with frozen feature layers and $10$ output units.
Fine-tune the created network on our CIFAR-10 data using the previous function.

In [31]:
network_with_frozen_layers = replace_last_layer(network_1, 10)
train_eval(network_with_frozen_layers, trainloader, testloader)

KeyboardInterrupt: 

### Task 7 (Optional): Network Fine-Tuning without Frozen Layers 

Create a network from the second pre-trained network with $10$ output units. 
Fine-tune the created network on our CIFAR-10 data.

Note:

  * The fine-tuning of the network can take a long time when the layers are not frozen.
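
The difference in training cost can be estimated from the number of parameters that receive updates. With frozen feature layers, only the new last layer is trained; without freezing, all layers of ResNet-50 are updated, which is on the order of 25 million parameters. A small sketch for the size of a fully-connected layer (the helper `linear_param_count` is hypothetical):

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of weights (and biases) of a fully-connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# new last layer for the 10 CIFAR-10 classes
new_head = linear_param_count(2048, 10)
# original last layer for the 1000 ImageNet classes, for comparison
old_head = linear_param_count(2048, 1000)
print(new_head, old_head)
```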

In [None]:
network_normal = replace_last_layer(network_2, 10)
train_eval(network_normal, trainloader, testloader)

## Plotting

Finally, we want to plot the confusion matrix of the test set.
For this, we need to compute the predictions for all of our test samples, and the list of target values.
Then, we can make use of `sklearn.metrics.confusion_matrix` to compute the confusion matrix.
You can utilize `sklearn.metrics.ConfusionMatrixDisplay` for displaying the confusion matrix, or use `pyplot.imshow` and add the corresponding labels.

Note:

  * The documentation for the confusion matrix can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
  * The interface and an example for the `ConfusionMatrixDisplay` can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html
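
To make the convention explicit: in the matrix returned by `sklearn.metrics.confusion_matrix`, entry $(i, j)$ counts the samples with true class $i$ that were predicted as class $j$. The plain-Python sketch below reproduces this counting on made-up labels; it is an illustration only, and `confusion_matrix_counts` is a hypothetical helper, not part of scikit-learn.

```python
def confusion_matrix_counts(targets, predictions, num_classes):
    """Entry [i][j] counts samples of true class i that were predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, predictions):
        matrix[t][p] += 1
    return matrix


toy_targets = [0, 0, 1, 1, 2, 2]
toy_predictions = [0, 1, 1, 1, 2, 0]
toy_matrix = confusion_matrix_counts(toy_targets, toy_predictions, 3)
for row in toy_matrix:
    print(row)
```

Correct predictions end up on the diagonal, so the accuracy is the sum of the diagonal divided by the total number of samples.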

### Task 8: Confusion Matrix Plotting

Plot the confusion matrix for the fine-tuned network with frozen layers.
Optionally, also plot the confusion matrix for the second fine-tuned network.

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# compute predictions and collect targets
predictions = ...
targets = ...

# compute confusion matrix
matrix = confusion_matrix(...)

# plot confusion matrix
...

# add axis labels if required
...
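
One possible way to obtain the two lists is sketched below. This is not the reference solution: the helper `collect_predictions` is hypothetical, and it is demonstrated on a tiny random model and dataset instead of the fine-tuned ResNet-50 so that it runs quickly.

```python
import torch


def collect_predictions(network, loader, device="cpu"):
    """Return (predictions, targets) as flat Python lists for all samples in loader."""
    network = network.to(device)
    network.eval()  # evaluation mode, e.g., for batch normalization layers
    predictions, targets = [], []
    with torch.no_grad():
        for x, t in loader:
            z = network(x.to(device))
            # the predicted class is the index of the largest logit
            predictions.extend(torch.argmax(z, dim=1).cpu().tolist())
            targets.extend(t.tolist())
    return predictions, targets


# toy stand-ins for the network and the test loader (assumptions for illustration)
torch.manual_seed(0)
toy_data = torch.utils.data.TensorDataset(
    torch.randn(10, 4), torch.randint(0, 3, (10,))
)
toy_loader = torch.utils.data.DataLoader(toy_data, batch_size=4)
toy_predictions, toy_targets = collect_predictions(torch.nn.Linear(4, 3), toy_loader)
print(toy_predictions, toy_targets)
```

The resulting lists can be passed as `confusion_matrix(targets, predictions)`, with the targets as the first argument.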

In [1]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

In [3]:
type(clf.classes_)

numpy.ndarray

In [None]:
cm = confusion_matrix(y_test, predictions, labels=clf.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot()
plt.show()