<a href="https://colab.research.google.com/github/yandexdataschool/MLatImperial2022/blob/master/Seminars/lab_07_01_TransferLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

In [None]:
!mkdir -p /content/.kaggle
!cp /content/gdrive/My\ Drive/kaggle.json /content/.kaggle/
!chmod 600 /content/.kaggle/kaggle.json
!ls -l /content/.kaggle

%env KAGGLE_CONFIG_DIR=/content/.kaggle

Go to https://www.kaggle.com/c/dogs-vs-cats and accept the rules to be able to get the data.

In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle

In [None]:
!kaggle competitions download --force -c dogs-vs-cats
!unzip -q -o dogs-vs-cats.zip
!unzip -q -o train.zip

In [None]:
!ls

### Using pre-trained model

Today we're going to build and fine-tune a CNN starting from weights pre-trained on ImageNet, one of the largest labelled image classification datasets.
More about ImageNet: http://image-net.org/
Setup: classify each image into one of 1000 classes.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

In [None]:
import requests

# class labels
LABELS_URL = 'https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json'
labels = list(requests.get(LABELS_URL).json())

In [None]:
print(f"labels len: {len(labels)}")
print(labels[:5])

### TorchVision
PyTorch has several companion libraries, one of them being [torchvision](https://github.com/pytorch/vision/tree/master/) - it contains a number of popular vision datasets, preprocessing tools and most importantly, [pre-trained models](https://github.com/pytorch/vision/tree/master/torchvision/models).

We're going to use torchvision's Inception-v3 network:
![img](https://3811644265-files.gitbook.io/~/files/v0/b/gitbook-28427.appspot.com/o/assets%2F-LK1Q5wVABDXPa7Mueaw%2F-LWJ8IPgylwd7IEfGxE2%2F-LWJHJ9-6CrYABFt1F4T%2F1_rXcdL9OV5YKlYyks9XK-wA.png?alt=media&token=de746b98-05fe-47fd-afc1-6d3a6f0f2ca6)

Let's first look at the code here: [url](https://github.com/pytorch/vision/blob/master/torchvision/models/inception.py).

![img](https://habrastorage.org/files/449/171/7f8/4491717f88c34940b67947c1bc769bcd.png)

In [None]:
from torchvision.models.inception import inception_v3

# NOTE: newer torchvision versions prefer the `weights=` argument over `pretrained=True`
model = inception_v3(pretrained=True,       # load weights pre-trained on ImageNet
                     transform_input=False, # do not re-scale the input inside the model
                     progress=True)         # show a download progress bar

model.aux_logits = False # don't return the auxiliary logits (yellow layers at the bottom)
model.eval()
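
A note on `transform_input`: as far as we can tell from the torchvision source, setting it to `True` makes the model convert ImageNet-normalised inputs to the `(pixel - 0.5) / 0.5` scaling before the first layer, and `False` leaves the input untouched. The sketch below reproduces that affine map in NumPy under this assumption; `imagenet_to_inception` is a hypothetical helper name, not a torchvision function.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def imagenet_to_inception(x):
    """Map a (3, H, W) ImageNet-normalised array to (pixel - 0.5) / 0.5 scaling."""
    scale = (IMAGENET_STD / 0.5).reshape(3, 1, 1)
    shift = ((IMAGENET_MEAN - 0.5) / 0.5).reshape(3, 1, 1)
    return x * scale + shift

# raw pixels in [0, 1], channels first
pixels = np.random.RandomState(0).rand(3, 4, 4)
# the usual ImageNet normalisation
normalised = (pixels - IMAGENET_MEAN.reshape(3, 1, 1)) / IMAGENET_STD.reshape(3, 1, 1)
# after the map, values are the raw pixels re-scaled to [-1, 1]
rescaled = imagenet_to_inception(normalised)
```

Whichever setting you choose, the point is that the preprocessing outside the model and the flag inside it have to agree.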

In [None]:
# calculate the number of (scalar) parameters:
n_parameters = sum(parameter.numel() for parameter in model.parameters())

print(n_parameters)

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from skimage.transform import resize

### Predict class probabilities

In [None]:
!wget https://upload.wikimedia.org/wikipedia/commons/d/de/Northern_Royal_Albatross_-_Kaikorua_-_New_Zealand_%2839039196692%29.jpg -O albatross.jpg

In [None]:
img = resize(plt.imread('albatross.jpg'), (299, 299))
plt.axis("off")
plt.imshow(img)
plt.show()

img = torch.FloatTensor(img.reshape([1, 299, 299, 3]).transpose([0, 3, 1, 2]))

with torch.no_grad():  # inference only, no gradients needed
    probs = torch.nn.functional.softmax(model(img), dim=-1)

probs = probs.numpy()

top_ix = probs.ravel().argsort()[::-1][:10]  # indices of the 10 most probable classes
print('top-10 classes are: \n [prob : class label]')
for l in top_ix:
    print('%.4f :\t%s' % (probs.ravel()[l], labels[l].split(',')[0]))
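
The "softmax, then pick the most probable classes" step can be sketched on a toy vector of logits. This is a minimal NumPy illustration with made-up numbers; `top_k` is a hypothetical helper. Keep in mind that a reversed slice excludes its stop index, so counting the returned items is a cheap sanity check.

```python
import numpy as np

def top_k(probs, k):
    """Indices of the k largest entries, most probable first."""
    return probs.argsort()[::-1][:k]

toy_logits = np.array([0.1, 2.0, -1.0, 3.0, 0.5])
# numerically stable softmax: subtract the maximum before exponentiating
toy_probs = np.exp(toy_logits - toy_logits.max())
toy_probs /= toy_probs.sum()

top3 = top_k(toy_probs, 3)

# a reversed slice stops *before* its stop index
nine_items = np.arange(20)[-1:-10:-1]
ten_items = np.arange(20)[-1:-11:-1]
```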



### Having fun with pre-trained nets

In [None]:
!wget http://cdn.com.do/wp-content/uploads/2017/02/Donal-Trum-Derogar.jpeg -O img.jpg

In [None]:
img = resize(plt.imread('img.jpg')[:-100,200:-150], (299,299))
plt.imshow(img)
plt.axis("off")
plt.show()

img = torch.FloatTensor(img.reshape([1, 299, 299, 3]).transpose([0, 3, 1, 2]))

with torch.no_grad():  # inference only, no gradients needed
    probs = torch.nn.functional.softmax(model(img), dim=-1)

probs = probs.numpy()

top_ix = probs.ravel().argsort()[::-1][:10]  # indices of the 10 most probable classes
print('top-10 classes are: \n [prob : class label]')
for l in top_ix:
    print('%.4f :\t%s' % (probs.ravel()[l], labels[l].split(',')[0]))



# Grand-quest: Dogs Vs Cats
* original competition
* https://www.kaggle.com/c/dogs-vs-cats
* 25k JPEG images of various sizes, 2 classes (guess what)

### Your main objective
* In this seminar your goal is to fine-tune a pre-trained model to distinguish between the two rivaling animals
* The first step is to just reuse some network layer as features

### As before, we will use the auxiliary function you have seen on Monday

In [None]:
from IPython.display import clear_output
from sklearn.metrics import accuracy_score

class Logger:
  def __init__(self):
    self.train_loss_batch = []
    self.train_loss_epoch = []
    self.test_loss_batch = []
    self.test_loss_epoch = []
    self.train_batches_per_epoch = 0
    self.test_batches_per_epoch = 0
    self.epoch_counter = 0
    
    self.accuracy = []

  def fill_train(self, loss):
    self.train_loss_batch.append(loss)
    self.train_batches_per_epoch += 1

  def fill_test(self, loss):
    self.test_loss_batch.append(loss)
    self.test_batches_per_epoch += 1
    
  def fill_accuracy(self, y_true, y_pred):    
    self.accuracy.append(accuracy_score(y_true, y_pred))

  def finish_epoch(self):
    self.train_loss_epoch.append(np.mean(
        self.train_loss_batch[-self.train_batches_per_epoch:]
    ))
    self.test_loss_epoch.append(np.mean(
        self.test_loss_batch[-self.test_batches_per_epoch:]
    ))
    self.train_batches_per_epoch = 0
    self.test_batches_per_epoch = 0
    
    clear_output()
  
    print("epoch #{} \t train_loss: {:.8} \t test_loss: {:.8} \t test_acc: {:.8}".format(
              self.epoch_counter,
              self.train_loss_epoch[-1],
              self.test_loss_epoch [-1],
              self.accuracy[-1]
          ))
    
    self.epoch_counter += 1

    plt.figure(figsize=(18, 5))

    plt.subplot(1, 3, 1)
    plt.plot(self.train_loss_batch, label='train loss')
    plt.xlabel('# batch iteration')
    plt.ylabel('loss')
    plt.legend()

    plt.subplot(1, 3, 2)
    plt.plot(self.train_loss_epoch, label='average train loss')
    plt.plot(self.test_loss_epoch , label='average test loss' )
    plt.legend()
    plt.xlabel('# epoch')
    plt.ylabel('loss')
    
    plt.subplot(1, 3, 3)
    plt.plot(self.accuracy, label='test acc')
    plt.xlabel('# epoch')
    plt.ylabel('acc')
    plt.legend()    
    
    plt.show()

We also introduce some new PyTorch utilities. They are very convenient when you work with data that does not fit in memory but can easily be loaded from disk in batches, for example images.
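
The idea of loading items lazily can be sketched with a toy dataset. This is a minimal illustration, not the dataset used below: `LazyToyDataset` is a hypothetical class whose items are created only when they are requested, which is where a real dataset would read an image from disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class LazyToyDataset(Dataset):
    """Toy dataset: item i is built on demand, nothing is stored up front."""

    def __init__(self, n_items):
        self.n_items = n_items

    def __getitem__(self, index):
        x = torch.full((3,), float(index))  # stands in for "load an image from disk"
        y = torch.tensor(float(index % 2))  # stands in for the label
        return x, y

    def __len__(self):
        return self.n_items

# the DataLoader asks the dataset for items and stacks them into batches
toy_dl = DataLoader(LazyToyDataset(10), batch_size=4, shuffle=False)
batch_shapes = [tuple(x.shape) for x, _ in toy_dl]
```

With 10 items and `batch_size=4` the last batch is smaller than the others, which is worth remembering when averaging per-batch losses.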

In [None]:
from torch.utils.data import Dataset 
from PIL import Image
from torchvision import transforms
import os

In [None]:
class PathDataset(Dataset):
    """
    This class inherits from the PyTorch Dataset.
    It defines how each image is loaded from disk and preprocessed.
    """
    
    def __init__(self, data_paths, transform_X=None):
        self.data_paths = data_paths
        self.transform_X = transform_X
    
    def __getitem__(self, index):
        # force 3 channels in case an image is not stored as RGB
        x = Image.open(self.data_paths[index]).convert("RGB")
        if self.transform_X:
            x = self.transform_X(x)
        # the label is taken from the file name only: 1 for cats, 0 for dogs
        y = "cat" in os.path.basename(self.data_paths[index])
        return x, np.float32(y)

    def __len__(self):
        return len(self.data_paths)

In [None]:
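# Define path to folder with images
train_paths = sorted("./train/" + name for name in os.listdir("train/"))

# os.listdir returns files in arbitrary order, so shuffle with a fixed seed
# to get a reproducible split in which both classes are mixed
np.random.RandomState(0).shuffle(train_paths)

# Here I split val/train half and half
val_paths = train_paths[:12500]
train_paths = train_paths[12500:]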

len(val_paths), len(train_paths), np.sum(["cat" in path for path in val_paths]),\
                                  np.sum(["cat" in path for path in train_paths])

Since we are going to use a pretrained model, we need **TO MAKE SURE** that we preprocess the data in the same way as it was done during training.

In this case, we need to

- Resize the image
- Normalise it
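
What the normalisation step does to a single image can be sketched in NumPy. This is an illustration on a tiny all-white image with the standard ImageNet channel statistics; the variable names are chosen for this sketch only.

```python
import numpy as np

imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

# a 2x2 white RGB image in the usual height x width x channel uint8 layout
toy_img = np.full((2, 2, 3), 255, dtype=np.uint8)

# step 1: channels first and values scaled to [0, 1]
chw = toy_img.transpose(2, 0, 1).astype(np.float32) / 255.0

# step 2: per-channel standardisation
normalised = (chw - imagenet_mean[:, None, None]) / imagenet_std[:, None, None]
```

After this step a white pixel is no longer `1.0` but roughly `2.2` to `2.6` depending on the channel, so a model trained on normalised inputs sees very different numbers if the step is skipped.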

In [None]:
# ImageNet mean and std based on millions of images
means = np.array((0.485, 0.456, 0.406))
stds = np.array((0.229, 0.224, 0.225))

transform_X = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(means, stds),
])

subset_of_train = 5000
subset_of_val = 1000

# Init train dataloader
train_ds = PathDataset(train_paths[:subset_of_train], transform_X=transform_X)
train_dl = torch.utils.data.DataLoader(train_ds, 
                                              batch_size=256,
                                              shuffle=True)

# Init validation dataloader
val_ds = PathDataset(val_paths[:subset_of_val], transform_X=transform_X)
val_dl = torch.utils.data.DataLoader(val_ds, 
                                            batch_size=256,
                                            shuffle=False)

# Task 1. Use standard sklearn to train

Now we will take the Inception model loaded above and use its output as features. Since we do not need the 1000-class ImageNet classification, we replace the last layer with an identity, so the model returns the 2048-dimensional activations that used to feed that layer.
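
The effect of swapping the last layer for an identity can be sketched on a toy network. `TinyClassifier` is a hypothetical stand-in for Inception, with made-up layer sizes; the sketch uses the built-in `torch.nn.Identity`, which behaves like a hand-written identity module.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Toy stand-in for a pretrained net: a feature layer followed by `fc`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 10)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))

toy_model = TinyClassifier().eval()
x = torch.randn(2, 8)

with torch.no_grad():
    logits = toy_model(x)          # 10 class scores per sample
    toy_model.fc = nn.Identity()   # drop the classification layer
    features = toy_model(x)        # now the 4-dimensional features come out
```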

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

In [None]:
# create layer that returns unchanged input
class Identity(torch.nn.Module):

    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x

In [None]:
# visualises loop progress bar
from tqdm import tqdm

# Extract outputs of InceptionNet on train dataset
model.eval()
model.fc = Identity()
model.to(device)

new_X_train, new_y_train = [], []
for (X_batch, y_batch) in tqdm(train_dl):
    with torch.no_grad():
        new_X_train.extend(model(X_batch.to(device)).detach().cpu().numpy())
        new_y_train.extend(y_batch.detach().cpu().numpy())

new_X_train = np.array(new_X_train)
new_y_train = np.array(new_y_train)        

In [None]:
# Extract outputs of InceptionNet on validation dataset
new_X_val, new_y_val = [], []
for (X_batch, y_batch) in tqdm(val_dl):
    with torch.no_grad():
        new_X_val.extend(model(X_batch.to(device)).detach().cpu().numpy())
        new_y_val.extend(y_batch.detach().cpu().numpy())
        
new_X_val = np.array(new_X_val)
new_y_val = np.array(new_y_val)

In [None]:
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(max_iter=400)
logreg.fit(new_X_train, new_y_train)

print((logreg.predict(new_X_val) == new_y_val).mean())

# Task 2. Use our backbone model (Inception) to train Head Network

In reality, when you want to apply a large pretrained neural network to YOUR problem, you usually don't have many samples to train on.

Let's say we only have a few hundred labelled samples: below we use 256 for training and 256 for validation.

Let's train a HEAD network on this small subset of the training data.
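
Training only a head on top of a frozen backbone can be sketched with two toy linear layers. This is a minimal illustration with random data and made-up sizes, assuming the backbone is never updated, which is why its outputs can be computed once and reused in every epoch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_backbone = nn.Linear(8, 4)  # stands in for the pretrained network
toy_head = nn.Linear(4, 1)      # the small part we actually train

# freeze the backbone: no gradients are computed for its weights
for p in toy_backbone.parameters():
    p.requires_grad = False

# the optimizer only ever sees the head parameters
toy_optimizer = torch.optim.Adam(toy_head.parameters(), lr=1e-2)
toy_criterion = nn.BCEWithLogitsLoss()

x = torch.randn(16, 8)
y = (torch.rand(16, 1) > 0.5).float()

backbone_before = toy_backbone.weight.clone()
head_before = toy_head.weight.clone()

with torch.no_grad():
    cached_features = toy_backbone(x)  # computed once, reused in every step

for _ in range(5):
    toy_optimizer.zero_grad()
    loss = toy_criterion(toy_head(cached_features), y)
    loss.backward()
    toy_optimizer.step()

backbone_unchanged = torch.equal(toy_backbone.weight, backbone_before)
head_changed = not torch.equal(toy_head.weight, head_before)
```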

In [None]:
class PathDataset(Dataset):
    """
    This class inherits from the PyTorch Dataset.
    It defines how each image is loaded from disk and preprocessed.
    Unlike the version above, it returns tensors that already live on `device`.
    """
    
    def __init__(self, data_paths, transform_X=None):
        self.data_paths = data_paths
        self.transform_X = transform_X
    
    def __getitem__(self, index):
        # force 3 channels in case an image is not stored as RGB
        x = Image.open(self.data_paths[index]).convert("RGB")
        if self.transform_X:
            x = self.transform_X(x)
        # the label is taken from the file name only: 1 for cats, 0 for dogs
        y = "cat" in os.path.basename(self.data_paths[index])
        return x.to(device), torch.tensor(y).float().to(device)

    def __len__(self):
        return len(self.data_paths)

In [None]:
subset_of_train = 256
subset_of_val = 256

HEAD_train_ds = PathDataset(train_paths[:subset_of_train], transform_X=transform_X)
val_ds = PathDataset(val_paths[:subset_of_val], transform_X=transform_X)

HEAD_train_dl = torch.utils.data.DataLoader(HEAD_train_ds, 
                                              batch_size=128,
                                              shuffle=True)
val_dl = torch.utils.data.DataLoader(val_ds, 
                                            batch_size=128,
                                            shuffle=False)

Now we define our new NN head

In [None]:
class HeadNet(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        
        self.backbone = backbone
        self.head = nn.Sequential(
            torch.nn.Linear(2048, 16),
            torch.nn.ELU(),
            torch.nn.Linear(16, 1)
        )

    def cache_train(self, dl_train):
        self.train_cache = []
        with torch.no_grad():
            for batch_X, batch_y in dl_train:
                self.train_cache.append((self.backbone(batch_X), batch_y.view(-1, 1)))
        

    def cache_val(self, dl_val):
        self.val_cache = []
        with torch.no_grad():
            for batch_X, batch_y in dl_val:
                self.val_cache.append((self.backbone(batch_X), batch_y.view(-1, 1)))
        
    def reset(self):
        self.train_cache_iter = iter(self.train_cache)
        self.val_cache_iter = iter(self.val_cache)
        
    def cached_forward(self, mode='train'):
        if (mode == 'train'):
            X, y = next(self.train_cache_iter)

        if (mode == 'val'):
            X, y = next(self.val_cache_iter)

        return self.head(X), y

    def forward(self, X):
        out = self.backbone(X)
        out = self.head(out)
        return out


def train_head(model, optimizer, dl_train, dl_val, criterion, n_epochs):
    logger = Logger()

    model.cache_train(dl_train)
    model.cache_val(dl_val)
    
    for i_epoch in range(n_epochs):
        model.reset()
        model.head.train()
        for _ in range(len(dl_train)):
            optimizer.zero_grad()
            out, y = model.cached_forward(mode='train')
            loss = criterion(out, y)
            loss.backward()
            optimizer.step()

            logger.fill_train(loss.item())
            
        y_true = []
        y_pred = []
        model.head.eval()

        with torch.no_grad():
            for _ in range(len(dl_val)):
                out, y = model.cached_forward(mode='val')
                loss = criterion(out, y)

                y_pred.extend(out.view(-1).cpu().numpy())
                y_true.extend(y.view(-1).cpu().numpy())
                logger.fill_test(loss.item())
        # `out` holds raw logits: logit > 0 is equivalent to sigmoid(logit) > 0.5
        logger.fill_accuracy(np.array(y_true), np.array(y_pred) > 0)
        logger.finish_epoch()

And train it as we did before

In [None]:
head_net = HeadNet(model).to(device)

for param in head_net.backbone.parameters():
    param.requires_grad = False

criterion = torch.nn.BCEWithLogitsLoss() # Binary Cross Entropy with log-sum-exp trick (subtracting maximum)
learning_rate = 1e-3
optimizer = torch.optim.Adam(head_net.head.parameters(), lr=learning_rate)

train_head(head_net, optimizer, HEAD_train_dl, val_dl, criterion, n_epochs=20)
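
Since the head outputs raw logits and the loss works on logits directly, it helps to see both the stable loss and the decision rule written out. The sketch below is a standard numerically stable form of binary cross-entropy on logits in NumPy; we assume it is equivalent in value to what `BCEWithLogitsLoss` computes, not that it mirrors PyTorch's internals. The numbers are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(logits, targets):
    """Mean BCE computed from logits: max(z, 0) - z * y + log(1 + exp(-|z|))."""
    return np.mean(
        np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    )

toy_logits = np.array([-3.0, -0.2, 0.4, 2.5])
toy_targets = np.array([0.0, 1.0, 0.0, 1.0])

# naive version: apply the sigmoid first, then take logs
p = sigmoid(toy_logits)
naive = -np.mean(toy_targets * np.log(p) + (1 - toy_targets) * np.log(1 - p))
stable = bce_with_logits(toy_logits, toy_targets)

# the stable form stays finite even for a huge logit
extreme = bce_with_logits(np.array([1000.0]), np.array([0.0]))

# thresholding: a logit of 0 corresponds to a probability of 0.5
pred_from_logits = toy_logits > 0
pred_from_probs = sigmoid(toy_logits) > 0.5
```

The last two lines are the practical takeaway: predictions are obtained either by comparing logits with 0 or by comparing sigmoid outputs with 0.5, and the two give the same labels.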

Impressive, right?

In [None]:
torch.save(head_net.state_dict(), "trained_head.pth")

# Task 3. Use pretrained net to define new model (Transfer Learning)

OK, now, to get an even better result, one can fine-tune the body network as well.
This procedure is unstable and requires a very small learning rate and a simple optimisation algorithm.
Also, since the body is huge, we can only work with a small batch size to fit in GPU memory.

In [None]:
head_net = HeadNet(model).to(device)
head_net.load_state_dict(torch.load('trained_head.pth'))

In [None]:
def train_TL(model, optimizer, scheduler, dl_train, dl_val, criterion, n_epochs):
    logger = Logger()

    for i_epoch in range(n_epochs):
        model.train()
        for batch_X, batch_y in dl_train:
            optimizer.zero_grad()
            out = model(batch_X)
            loss = criterion(out, batch_y.view(-1, 1))
            loss.backward()
            optimizer.step()

            logger.fill_train(loss.item())
            
        y_true = []
        y_pred = []
        model.eval()

        with torch.no_grad():
            for batch_X, batch_y in dl_val:
                out = model(batch_X)
                loss = criterion(out, batch_y.view(-1, 1))

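                y_pred.extend(out.view(-1).cpu().numpy())
                y_true.extend(batch_y.view(-1).cpu().numpy())
                logger.fill_test(loss.item())
        # `out` holds raw logits: logit > 0 is equivalent to sigmoid(logit) > 0.5
        logger.fill_accuracy(np.array(y_true), np.array(y_pred) > 0)
        logger.finish_epoch()
        scheduler.step()  # decay the learning rate once per epoch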

In [None]:
from torch.optim.lr_scheduler import StepLR

subset_of_train = 1024

train_ds = PathDataset(train_paths[:subset_of_train], transform_X=transform_X)

train_dl = torch.utils.data.DataLoader(train_ds, 
                                              batch_size=32,
                                              shuffle=True)

for param in head_net.backbone.parameters():
    param.requires_grad = True


criterion = torch.nn.BCEWithLogitsLoss()
learning_rate = 5e-4
optimizer = torch.optim.SGD(head_net.parameters(), lr=learning_rate)
scheduler = StepLR(optimizer, step_size=3, gamma=0.9)

In [None]:
train_TL(head_net, optimizer, scheduler, train_dl, val_dl, criterion, n_epochs=18)
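
To see how small the learning rate gets during fine-tuning, the step schedule can be written out in plain Python. This is a sketch of the closed form `lr = initial_lr * gamma ** (epoch // step_size)`, assuming `scheduler.step()` is called exactly once per epoch; `step_lr` is a hypothetical helper, not part of PyTorch.

```python
def step_lr(initial_lr, step_size, gamma, n_epochs):
    """Learning rate used in each epoch under a step decay schedule."""
    return [initial_lr * gamma ** (epoch // step_size) for epoch in range(n_epochs)]

# same settings as in the cell above: lr=5e-4, step_size=3, gamma=0.9, 18 epochs
schedule = step_lr(initial_lr=5e-4, step_size=3, gamma=0.9, n_epochs=18)

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

With these settings the learning rate drops by 10% every three epochs, so the decay over 18 epochs is gentle rather than aggressive.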

### Bonus reading: [Incremental learning](https://arxiv.org/pdf/1705.04228.pdf)