## **In this short assignment, you will be asked to fine-tune the pretrained VGG model to classify 37 categories of cat and dog breeds.**



Please submit your code and answers to the questions in the form of a Jupyter notebook, containing PyTorch code with explanations, along with Markdown text explaining the different parts if needed.

# Short Assignment 1: More dogs and cats!

This time, you are going to use the [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/) by [O. M. Parkhi et al., 2012](http://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf) which features 12 cat breeds and 25 dog breeds. You will need to adapt the code from lesson 1 to this new task, i.e. classification with 37 categories.

##  Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os
import torch
import torch.nn as nn
import torchvision
from torchvision import models,transforms,datasets
import time
%matplotlib inline

In [None]:
torch.__version__

In [None]:
import sys
sys.version

Check if a GPU is available and, if not, change the [runtime](https://www.geeksforgeeks.org/how-to-use-google-colab/).

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

print('Using gpu: %s ' % torch.cuda.is_available())

## Downloading the data

The data given on the website [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/) is made of two files: `images.tar.gz` and `annotations.tar.gz`. We first need to download and decompress these files.

Depending on whether you use Google Colab or your own computer, you can adapt the code below to choose where to store the data.

To see where you are, you can use the standard Unix commands:

In [None]:
%pwd

In [None]:
%mkdir data
# the line below needs to be adapted if not running on google colab
%cd ./data/

Now that you are in the right directory, you can download the data:

In [None]:
!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz

and uncompress them:

In [None]:
!tar zxvf images.tar.gz
!tar zxvf annotations.tar.gz

Check that everything went correctly!

In [None]:
%ls

## Warning

If you are running this notebook on your own computer, you need to download the data only once. If you want to run this notebook a second time, you can safely skip this section and the section below as your dataset will be stored nicely on your computer.

If you are running this notebook on google colab, you need to download the data and to do the data wrangling each time you are running this notebook as data will be cleared once you log off.
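
On a local machine, one way to avoid downloading the archives twice is to check whether they already exist before fetching them. The sketch below is only an illustration and is not part of the original notebook: `download_if_missing` is a hypothetical helper built on the standard library's `urllib.request.urlretrieve`, and the `fetch` parameter is an assumption added so the function can be tried without network access.

```python
import os
import urllib.request

def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the file was already there.
    `fetch` (hypothetical parameter) is any callable taking (url, dest).
    """
    if os.path.exists(dest):
        return False
    fetch(url, dest)
    return True

# Possible usage (commented out so that nothing is downloaded by accident):
# download_if_missing('http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz',
#                     'images.tar.gz')
```

Note that this only checks that the file exists; it does not verify that an earlier download completed successfully.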

## 1. Data wrangling

You will first need to do a bit of [data wrangling](https://en.wikipedia.org/wiki/Data_wrangling) to organize your dataset in order to use the PyTorch `dataloader`.

If you want to understand how the files are organized, have a look at the `README` file in the folder `annotations`.

First, we need to split the dataset into a test set and a train/validation set. For this, we can use the files `annotations/test.txt` and `annotations/trainval.txt`, which contain the names of the images in the test and train/validation sets of the original paper.

In [None]:
!head annotations/test.txt

In [None]:
!head annotations/trainval.txt

Above you see that the authors of the original paper made a partition of the dataset: `./images/Abyssinian_201.jpg` is in the test set while `./images/Abyssinian_100.jpg` is in the train/validation set and so on.

BTW, if you wonder what Abyssinian means, it is explained [here](https://en.wikipedia.org/wiki/Abyssinian_cat)

We first create two directories where we will put images from the test and trainval sets.

In [None]:
%mkdir test
%mkdir trainval

In [None]:
%ls

All the images are in the `./images/` folder and you want to store the data according to the following structure:
```bash
.
├── test
|   └── Abyssinian # contains images of Abyssinian from the test set
|   └── Bengal # contains images of Bengal from the test set
|    ...
|   └── american_bulldog # contains images of american bulldog from the test set
|    ...
├── trainval
|   └── Abyssinian # contains images of Abyssinian from the trainval set
|   └── Bengal # contains images of Bengal from the trainval set
|    ...
|   └── american_bulldog # contains images of american bulldog from the trainval set
|    ...
```

Note that every image whose name starts with an uppercase letter is a cat, and every image whose name starts with a lowercase letter is a dog.
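
The naming convention above can be turned into a one-line check. This is a minimal sketch; `species_from_filename` is a hypothetical helper name, not something defined in the dataset or in the lesson code.

```python
def species_from_filename(name):
    """Return 'cat' if the image name starts with an uppercase letter, else 'dog'."""
    return 'cat' if name[0].isupper() else 'dog'

print(species_from_filename('Abyssinian_201'))        # cat
print(species_from_filename('american_bulldog_100'))  # dog
```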

So here is one way to achieve your task: you will read the `./annotations/test.txt` file line by line; from each line, you will extract the name of the corresponding file and then copy it from the `./images/filename_##.jpg` to `./test/filename/filename_##.jpg`, where `##` is a number.

Then you'll do the same thing for `trainval.txt` file.

Below is a little piece of code to show you how to open a file and read it line by line:

In [None]:
# Open the file 'test.txt' in read mode
with open('./annotations/test.txt') as fp:
    # Read the first line from the file
    line = fp.readline()

    # Continue looping until there are no more lines to read
    while line:
        # Split the line on spaces. Each line is expected to hold exactly four
        # fields; otherwise the unpacking below raises a ValueError.
        # The first field (the image name) is stored in f, while the
        # underscores (_) are throwaway variables for the other three fields.
        f, _, _, _ = line.split(' ')

        # Print the first part of the line
        print(f)

        # Read the next line from the file
        line = fp.readline()  # Read the next line to continue the loop


In order to remove the `_201` in the example above, you can use the `re` [regular expression lib](https://docs.python.org/3.6/library/re.html) as follows:

In [None]:
import re
pat = re.compile(r'_\d')
res,_ = pat.split(f)
print(res)
#this is to remove the appending digits at the end
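
An alternative that does not rely on a regular expression is to split on the last underscore only. This is a sketch under the assumption that every image name ends in `_<number>`, as in the examples above; `breed_from_stem` is a hypothetical helper name.

```python
def breed_from_stem(stem):
    """Strip the trailing '_<number>' from an image name such as 'Abyssinian_201'."""
    breed, index = stem.rsplit('_', 1)   # split once, starting from the right
    if not index.isdigit():
        raise ValueError(f'unexpected image name: {stem!r}')
    return breed

print(breed_from_stem('Abyssinian_201'))        # Abyssinian
print(breed_from_stem('american_bulldog_100'))  # american_bulldog
```

Splitting from the right keeps underscores that belong to the breed name itself, and the `isdigit` check fails loudly on a name that does not follow the expected pattern.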

This small piece of code is useful for creating a directory:

In [None]:
# create directory if it does not exist
def check_dir(dir_path):
    dir_path = dir_path.replace('//','/')
    os.makedirs(dir_path, exist_ok=True)

Some more functions that will be useful
- for moving files around you can use the `shutil` lib, see [here](https://docs.python.org/3.6/library/shutil.html#shutil.copy)
- you can use `os.path.join`
- have a look at python [f-string](https://cito.github.io/blog/f-strings/)

In [None]:
import os      # path handling and directory creation
import re      # needed for the compiled pattern `pat` defined above
import shutil  # for copying files around

In [None]:
# Here you read the ./annotations/test.txt file line by line,
# extract the name of the corresponding file
# copy the file from the ./images folder
# store it in the ./test folder at the right subfolder
path_test_dataset = 'test/'
with open('./annotations/test.txt') as fp:
    line = fp.readline()
    while line:
        f,_,_,_ = line.split(' ')
        res,_ = pat.split(f)
        path = os.path.join(path_test_dataset,res)
        check_dir(path)  # check and make directory
        shutil.copy(f'./images/{f}.jpg',os.path.join(path,f'{f}.jpg')) #here we use python f-string
        line = fp.readline()

In [None]:
# Here you do the same thing as above but for trainval data.
path_train_dataset='trainval/'
with open('./annotations/trainval.txt') as fp:
    line = fp.readline()
    while line:
        f,_,_,_ = line.split(' ')
        res,_ = pat.split(f)
        path = os.path.join(path_train_dataset,res)
        check_dir(path)
        shutil.copy(f'./images/{f}.jpg',os.path.join(path,f'{f}.jpg'))
        line = fp.readline()
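
Once both folders are filled, it can be worth checking that the copy produced the expected layout. The sketch below counts the `.jpg` files in each class sub-folder; `count_images_per_class` is a hypothetical helper, not part of the lesson code.

```python
import os

def count_images_per_class(root):
    """Map each sub-folder of `root` to the number of .jpg files it contains."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if os.path.isdir(class_dir):
            counts[entry] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith('.jpg')
            )
    return counts

# Possible usage once the folders exist:
# counts = count_images_per_class('test')
# print(len(counts), sum(counts.values()))  # number of classes, number of images
```

If everything went well, the number of classes reported for both `test` and `trainval` should be 37.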

## Data processing

In [None]:
%cd ..

Now you are ready to redo what we did during lesson 1.

Below, you give the path where the data is stored. If you are running this code on your computer, you should modify this cell.

In [None]:
data_dir = '/content/data'

`datasets` is a module of the `torchvision` package (see [torchvision.datasets](https://pytorch.org/vision/main/datasets.html)) and deals with data loading. It integrates a multi-threaded loader that fetches images from the disk, groups them in mini-batches and serves them continuously to the GPU right after each _forward_/_backward_ pass through the network.

Images need a bit of preparation before being passed through the network. They must all have the same size $224\times 224 \times 3$, plus some extra formatting done below by the normalize transform (explained later).

In [None]:
ls

In [None]:
cd '/content/data'

In [None]:
ls

In [None]:
# Per-channel ImageNet mean and standard deviation: subtracting the mean and dividing
# by the std gives each channel roughly zero mean and unit standard deviation.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

vgg_format = transforms.Compose([
                transforms.CenterCrop(224), #This crops the central 224x224 pixels from the image. VGG models typically require input images of this size.
                transforms.ToTensor(),
                normalize,
            ])
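
To make the effect of `normalize` concrete, the sketch below reproduces the per-channel arithmetic with NumPy on a made-up 2x2 image (the pixel values are random and purely illustrative) and then inverts it, which is what a display helper has to do before plotting.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# A made-up image in channel-first layout (3, H, W) with values in [0, 1],
# the layout produced by transforms.ToTensor()
img = np.random.default_rng(0).random((3, 2, 2))

# Normalize subtracts the channel mean and divides by the channel std
normalized = (img - mean[:, None, None]) / std[:, None, None]

# Inverting the transform recovers the original pixel values
restored = normalized * std[:, None, None] + mean[:, None, None]
print(np.allclose(restored, img))  # True
```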

In [None]:
dsets = {x: datasets.ImageFolder(os.path.join(data_dir, x), vgg_format)
         for x in ['trainval', 'test']}

In [None]:
os.path.join(data_dir,'trainval')

We now have 37 different classes.

In [None]:
dsets['trainval'].classes

In [None]:
dsets['trainval'].class_to_idx

In [None]:
dset_sizes = {x: len(dsets[x]) for x in ['trainval', 'test']}
dset_sizes

In [None]:
dset_classes = dsets['trainval'].classes

The `torchvision` package allows complex pre-processing/transforms of the input data (_e.g._ normalization, cropping, flipping, jittering). A sequence of transforms can be grouped in a pipeline with the help of the `torchvision.transforms.Compose` function, see [torchvision.transforms](https://pytorch.org/vision/main/transforms.html).

**Exercise 1**: Fill in the following code cell to load the trainval data, with `batch_size` of 64, and `num_workers` of 6.

In [None]:
#your code here
loader_train = torch.utils.data.DataLoader(dsets['trainval'], batch_size=64, shuffle = True, num_workers = 6)

**Exercise 2**: Fill in the following code cell to load the test data, with batch_size of 5, and num_workers of 6.

In [None]:
#your code here
loader_valid = torch.utils.data.DataLoader(dsets['test'], batch_size=5, shuffle = True, num_workers = 6)

Check that your dataloader works as expected. The following cell counts the number of batches in `loader_valid`, which should be 734.

In [None]:
count = 1
for data in loader_valid:
    print(count, end=',')
    if count == 1:
        inputs_try,labels_try = data
    count += 1

# There are 734 batches because the test set holds 3669 images and ceil(3669 / 5) = 734.
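
The batch count can also be checked by hand. The short sketch below uses only the two numbers given above (3669 test images, batches of 5) and additionally shows the size of the final, incomplete batch; the variable names are illustrative.

```python
import math

n_test_images = 3669   # size of the test set
batch_size = 5

n_batches = math.ceil(n_test_images / batch_size)
last_batch = n_test_images - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 734 4
```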

In [None]:
labels_try #labels of the first batch

The following code should output `torch.Size([5, 3, 224, 224])`.

In [None]:
inputs_try.shape #images of the first batch
# 5 is the batch-size
# 3 is the RGB
# 224 is the height and width

A small function to display images:

In [None]:
def imshow(inp, title=None):
#   Imshow for Tensor.
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = np.clip(std * inp + mean, 0,1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

In [None]:
# Make a grid from batch
out = torchvision.utils.make_grid(inputs_try)

imshow(out, title=[dset_classes[x] for x in labels_try])

In [None]:
inputs_try #just to check what the tensor looks like

In [None]:
# Get a batch of training data
inputs, classes = next(iter(loader_train))

n_images = 8

# Make a grid from batch
out = torchvision.utils.make_grid(inputs[0:n_images])

imshow(out, title=[dset_classes[x] for x in classes[0:n_images]])

## 2. Modifying VGG Model

The torchvision module comes with a zoo of popular CNN architectures that are already trained on [ImageNet](http://www.image-net.org/) (1.2M training images). The first time a pretrained model is requested (with `pretrained=True` in older torchvision releases, or with the `weights` argument in recent ones), the weights are fetched over the internet and cached locally (`~/.torch/models` in old releases). For subsequent calls, the model is read directly from that cache.

**Exercise 3**: In the following cell, write code to load the pretrained VGG16 model.

In [None]:
import torchvision.models as models # Import the necessary module

#your code here
model_vgg = models.vgg16(weights='DEFAULT')


In [None]:
VGG_total_params = sum(p.numel() for p in model_vgg.parameters())
VGG_total_params/1000000

In [None]:
!wget https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json

In [None]:
import json

fpath = '/content/data/imagenet_class_index.json'

with open(fpath) as f:
    class_dict = json.load(f)
dic_imagenet = [class_dict[str(i)][1] for i in range(len(class_dict))]

In [None]:
dic_imagenet[:4]

In [None]:
inputs_try , labels_try = inputs_try.to(device), labels_try.to(device)

model_vgg = model_vgg.to(device)

In [None]:
outputs_try = model_vgg(inputs_try)

In [None]:
outputs_try

In [None]:
outputs_try.shape #batch of 5, 1000 categories

### Modifying the last layer and freezing all the other layers

In [None]:
print(model_vgg)

We'll learn about what these different blocks do later in the course. For now, it's enough to know that:

- Convolution layers are for finding small to medium size patterns in images -- analyzing the images locally
- Dense (fully connected) layers are for combining patterns across an image -- analyzing the images globally
- Pooling layers downsample -- in order to reduce image size and to improve invariance of learned features

![vgg16](https://dataflowr.github.io/notebooks/Module1/img/vgg16.png)

Here, our goal is to use the already trained model and just change the number of output classes. To this end, we replace the last ```nn.Linear``` layer, trained for 1000 classes, with one for 37 classes. In order to freeze the weights of the other layers during training, we set the field ```requires_grad=False``` on their parameters. In this manner, no gradient is computed for them during backprop and hence their weights are not updated. Only the weights of the new 37-class layer will be updated.
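
The freeze-then-replace idea can be tried on a toy network before applying it to VGG. The sketch below is an illustration only: the layer sizes are arbitrary and the model is not pretrained. It relies on the fact that a newly created `nn.Linear` has `requires_grad=True` on its parameters by default.

```python
import torch.nn as nn

# A toy stand-in for a pretrained network (layer sizes are arbitrary)
toy = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 10))

# Freeze every existing parameter
for param in toy.parameters():
    param.requires_grad = False

# Replace the last layer; the new layer's parameters are trainable by default
toy[2] = nn.Linear(16, 3)

trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in toy.parameters() if not p.requires_grad)
print(trainable, frozen)  # 51 144
```

The order matters: freezing after replacing the layer would also freeze the new layer, leaving nothing to train.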

PyTorch documentation for [LogSoftmax](https://pytorch.org/docs/stable/generated/torch.nn.LogSoftmax.html#logsoftmax)

In [None]:
m_softm = nn.Softmax(dim=1)
probs = m_softm(outputs_try)
vals_try,preds_try = torch.max(probs,dim=1)

In [None]:
probs

In [None]:
torch.sum(probs,1)

In [None]:
vals_try

In [None]:
print([dic_imagenet[i] for i in preds_try.data])

In [None]:
out = torchvision.utils.make_grid(inputs_try.data.cpu())

imshow(out, title=[dset_classes[x] for x in labels_try.data.cpu()])

**Exercise 4**: Write the code to modify the last layer.

In [None]:
for param in model_vgg.parameters():
    param.requires_grad = False
# your code here
model_vgg.classifier._modules['6'] = nn.Linear(4096, 37)
model_vgg.classifier._modules['7'] = torch.nn.LogSoftmax(dim = 1)

In [None]:
print(model_vgg.classifier)

**Exercise 5**: Once you have modified the architecture of the network, write the code to move the model to the GPU for fast training.

In [None]:
# your code here
model_vgg = model_vgg.to(device)

## Training fully connected module

### Creating loss function and optimizer

PyTorch documentation for [NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#nllloss) and the [torch.optim module](https://pytorch.org/docs/stable/optim.html#module-torch.optim)
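
To see how a log-softmax output and the negative log-likelihood loss fit together, here is a small NumPy sketch of both operations. It mirrors the mathematical definitions rather than PyTorch's implementation, the scores are made up, and the function names are illustrative.

```python
import numpy as np

def log_softmax(scores):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, targets):
    """Mean negative log-likelihood of the target classes."""
    return -log_probs[np.arange(len(targets)), targets].mean()

# Made-up scores: 2 samples, 3 classes
scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = np.array([0, 2])

log_probs = log_softmax(scores)
print(np.exp(log_probs).sum(axis=1))   # each row of probabilities sums to 1
print(nll_loss(log_probs, targets))    # small, since both targets have the top score

# Applying log_softmax to values that are already log-probabilities changes nothing
print(np.allclose(log_softmax(log_probs), log_probs))  # True
```

Taking `exp` of the log-probabilities recovers ordinary probabilities, which is why `torch.exp` is useful for reading the output of a network that ends with `LogSoftmax`.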

**Exercise 6**: Write code to set the optimizer parameter so that only the linear layer just added is trained.

In [None]:
criterion = nn.NLLLoss()
lr = 0.001
# your code here
optimizer_vgg = torch.optim.SGD(model_vgg.classifier[6].parameters(),lr = lr)

### Training the model

In [None]:
# Modify the train_model function:
import torch.nn.functional as F

def train_model(model,dataloader,size,epochs=1,optimizer=None):
    model.train()

    for epoch in range(epochs):
        running_loss = 0.0
        running_corrects = 0
        for inputs,classes in dataloader:
            inputs = inputs.to(device)
            classes = classes.to(device)
            outputs = model(inputs)
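            # The model already ends with a LogSoftmax layer, so `outputs` are log-probabilities.
            # log_softmax leaves log-probabilities unchanged: this call is redundant but harmless.
            log_probs = F.log_softmax(outputs, dim=1)
            loss = criterion(log_probs, classes)  # NLLLoss expects log-probabilities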
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _,preds = torch.max(outputs.data,1)
            # statistics
            running_loss += loss.data.item()
            running_corrects += torch.sum(preds == classes.data)
        epoch_loss = running_loss / size
        epoch_acc = running_corrects.data.item() / size
        print('Loss: {:.4f} Acc: {:.4f}'.format(
                     epoch_loss, epoch_acc))

In [None]:
%%time
train_model(model_vgg,loader_train,size=dset_sizes['trainval'],epochs=2,optimizer=optimizer_vgg)

In [None]:
def test_model(model,dataloader,size):
    model.eval()
    predictions = np.zeros(size)
    all_classes = np.zeros(size)
    all_proba = np.zeros((size,37))
    i = 0
    running_loss = 0.0
    running_corrects = 0
    #print(size)
    for inputs,classes in dataloader:
        inputs = inputs.to(device)
        classes = classes.to(device)
        outputs = model(inputs)
        loss = criterion(outputs,classes)
        _,preds = torch.max(outputs.data,1)
            # statistics
        running_loss += loss.data.item()
        running_corrects += torch.sum(preds == classes.data)
        predictions[i:i+len(classes)] = preds.to('cpu').numpy()
        all_classes[i:i+len(classes)] = classes.to('cpu').numpy()
        all_proba[i:i+len(classes),:] = outputs.data.to('cpu').numpy()
        i += len(classes)
    epoch_loss = running_loss / size
    epoch_acc = running_corrects.data.item() / size
    print('Loss: {:.4f} Acc: {:.4f}'.format(
                     epoch_loss, epoch_acc))
    return predictions, all_proba, all_classes

In [None]:
predictions, all_proba, all_classes = test_model(model_vgg,loader_valid,size=dset_sizes['test'])

In [None]:
# Get a batch of validation data
inputs, classes = next(iter(loader_valid))

out = torchvision.utils.make_grid(inputs[0:n_images])

imshow(out, title=[dset_classes[x] for x in classes[0:n_images]])

In [None]:
outputs = model_vgg(inputs[:n_images].to(device))
print(torch.exp(outputs))

**Exercise 7**: Write code to compute the predictions made by your network for `inputs[:n_images]` and the associated probabilities.

Hint: use `torch.max` and `torch.exp`.

Do not forget to put your inputs on the device!

In [None]:
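# your code here
# Run the network on the batch, then keep the largest log-probability and its index per image
outputs = model_vgg(inputs[:n_images].to(device))
vals_try, preds_try = torch.max(outputs.data, 1)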

In [None]:
preds_try

In [None]:
classes[:n_images]

In [None]:
torch.exp(vals_try)

What is your observation? Does the model predict well? Why?

## Speeding up the learning by precomputing features

In [None]:
def preconvfeat(dataloader):
    conv_features = []
    labels_list = []
    for data in dataloader:
        inputs,labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
        x = model_vgg.features(inputs)
        conv_features.extend(x.data.cpu().numpy())
        labels_list.extend(labels.data.cpu().numpy())
    conv_features = np.concatenate([[feat] for feat in conv_features])
    return (conv_features,labels_list)

In [None]:
%%time
conv_feat_train,labels_train = preconvfeat(loader_train)

In [None]:
conv_feat_train.shape

In [None]:
%%time
conv_feat_valid,labels_valid = preconvfeat(loader_valid)

### Creating a new data generator

We will not load images anymore, so we need to build our own data loader.

In [None]:
dtype=torch.float
datasetfeat_train = [[torch.from_numpy(f).type(dtype),torch.tensor(l).type(torch.long)] for (f,l) in zip(conv_feat_train,labels_train)]
datasetfeat_train = [(inputs.reshape(-1), classes) for [inputs,classes] in datasetfeat_train]
loaderfeat_train = torch.utils.data.DataLoader(datasetfeat_train, batch_size=128, shuffle=True)
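
An alternative to a list of `(features, label)` pairs is `torch.utils.data.TensorDataset`, which wraps whole tensors directly. The sketch below uses random stand-in data instead of the real precomputed features, and assumes feature maps of shape `(512, 7, 7)`, which is what the convolutional part of VGG16 produces for a 224x224 input.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Made-up stand-ins for precomputed features: 10 samples of shape (512, 7, 7)
feats = np.random.default_rng(0).random((10, 512, 7, 7)).astype(np.float32)
labels = np.arange(10) % 37

# Flatten each feature map to a vector of length 512*7*7 = 25088
x = torch.from_numpy(feats).reshape(len(feats), -1)
y = torch.tensor(labels, dtype=torch.long)

loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([4, 25088]) torch.Size([4])
```

Both approaches feed the classifier the same flattened vectors; `TensorDataset` simply avoids building a Python list with one entry per sample.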

Now you can train for more epochs.

In [None]:
%%time
train_model(model_vgg.classifier,dataloader=loaderfeat_train,size=dset_sizes['trainval'],epochs=80,optimizer=optimizer_vgg)

In [None]:
datasetfeat_valid = [[torch.from_numpy(f).type(dtype),torch.tensor(l).type(torch.long)] for (f,l) in zip(conv_feat_valid,labels_valid)]
datasetfeat_valid = [(inputs.reshape(-1), classes) for [inputs,classes] in datasetfeat_valid]
loaderfeat_valid = torch.utils.data.DataLoader(datasetfeat_valid, batch_size=128, shuffle=False)

Now you can compute the accuracy on the test set.

In [None]:
predictions, all_proba, all_classes = test_model(model_vgg.classifier,dataloader=loaderfeat_valid,size=dset_sizes['test'])

## Confusion matrix

For 37 classes, plotting a confusion matrix is useful to see the performance of the algorithm per class.

In [None]:
from sklearn.metrics import confusion_matrix
import itertools
def make_fig_cm(cm):
    fig = plt.figure(figsize=(12,12))
    plt.imshow(cm, interpolation='nearest', cmap='Blues')
    tick_marks = np.arange(37);
    plt.xticks(tick_marks, dset_classes, rotation=90);
    plt.yticks(tick_marks, dset_classes, rotation=0);
    plt.tight_layout();
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        coeff = f'{cm[i, j]}'
        plt.text(j, i, coeff, horizontalalignment="center", verticalalignment="center", color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('Actual');
    plt.xlabel('Predicted');

In [None]:
cm = confusion_matrix(all_classes,predictions)

In [None]:
make_fig_cm(cm)
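
Reading 37 rows by eye is tedious, so it can help to compute the per-class accuracy directly from the confusion matrix. The sketch below uses a made-up 3-class matrix; `per_class_accuracy` is a hypothetical helper. It assumes the convention used by `sklearn.metrics.confusion_matrix`, where rows are actual classes and columns are predicted classes.

```python
import numpy as np

def per_class_accuracy(cm):
    """Fraction of each actual class (row) that was predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    # Leave the accuracy at 0 for classes that never occur (avoids dividing by zero)
    return np.divide(np.diag(cm), row_totals,
                     out=np.zeros_like(row_totals), where=row_totals > 0)

# Made-up 3-class confusion matrix (rows: actual, columns: predicted)
toy_cm = np.array([[8, 1, 1],
                   [2, 6, 2],
                   [0, 0, 10]])
acc = per_class_accuracy(toy_cm)
print(acc)
print(np.argsort(acc)[:2])  # indices of the two worst classes: [1 0]
```

With the real matrix, `[dset_classes[i] for i in np.argsort(per_class_accuracy(cm))[:5]]` would list the five breeds with the lowest accuracy.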

**Exercise 8**: What is your observation? Which breeds have the worst predicting performance?


Most classes show good performance, but the American Pit Bull Terrier, Ragdoll and British Shorthair are among those with weaker results. The worst-performing breed is the Staffordshire Bull Terrier.

## Well done!