In [1]:
%matplotlib inline

torchvision model tuning
------------------------------

In this task you will have to run all the steps of the lab and save final model weights into `final_model.pt` file, which you will have to submit as the result of your work into Coursera Lab environment by clicking Submit Assignment button.

Part of the steps are complete, others will require you to do simple excercises.



https://pytorch.org/docs/stable/torchvision/models.html - models trained on ImageNet

First we get the pre-trained model and then use feature extraction for final layers training.

Steps:
-  get data
-  init pre-trained model
-  add new head layer, change output shape for desired dataset
-  define how we train our model (update only final layers or all)
-  train



In [2]:
from __future__ import print_function, division

import time
import os
import copy

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

import torchvision
from torchvision import datasets, models, transforms

Data
------

We will use *hymenoptera_data*: https://download.pytorch.org/tutorial/hymenoptera_data.zip.
It has 2 classes - bees and ants.
To get data we will use class ImageFolder - https://pytorch.org/docs/stable/torchvision/datasets.html#torchvision.datasets.ImageFolder

As a model we take vgg11_bn - VGG11 trained with batch normalization

Other params:
batch_size - size of the batch, num_classes - amount of different classes in data, num_epochs - how many epochs to train for, finetune - flag to determine if we train only last layers or the whole model





In [None]:
! wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
! mkdir data
! unzip hymenoptera_data.zip

In [None]:
! mv hymenoptera_data data/
! ls ./data

In [5]:
data_dir = "./data/hymenoptera_data" # path to data
num_classes = 2                      # amount of classes in new data

batch_size = 8                       # data batch size
num_epochs = 5                       # epochs count
feature_extract = True               # should we train with feature extraction (last layers fine tuning)

Model training
---------------

And helper for layers params setup

In [6]:
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25):
    start = time.time()
    _hist = []
    best_model = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))

        for phase in ['train', 'val']:
            model.train() if phase == 'train' else model.eval()
            _loss, _acc = 0.0, 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                _loss += loss.item() * inputs.size(0)
                _acc += torch.sum(preds == labels.data)

            epoch_loss = _loss / len(dataloaders[phase].dataset)
            epoch_acc = _acc.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.6f} Acc: {:.6f}'.format(phase, epoch_loss, epoch_acc))

            if phase == 'val':
                _hist.append(epoch_acc)
                if epoch_acc > best_acc:
                  best_acc = epoch_acc
                  best_model = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - start
    print('Training finished: {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Validation: best score Accuracy: {:6f}'.format(best_acc))

    model.load_state_dict(best_model)
    return model, _hist


def set_requires_grad(model, feature_extract):
    if feature_extract:
        for param in model.parameters():
            param.requires_grad = False

Model init and update
-----------------------------------

For more details - https://pytorch.org/docs/stable/torchvision/models.html

In this block we change model's final layer. It is hard to automate this step, as each model has its own characteristics. Last layer of CNN model (usually fully conntected) has the same amount of outputs as the dataset classes. All models in torchvision are trained on Imagenet, so the size of the final layer is 1000.

Our goal - get the final layer to have the same inputs amount and change amount of outputs to satisfy the requirements of new dataset.

Important notice to distinguish retraining and feature extraction (final layers training, finetuning): in the last case we want to update only final layer (layers), meaning we ignore gradients calculation for the previous layers, in order to do so we set `required_grad=False` parameter of the layers. By default, the parameter is `True` (including newly created layer, but we do want to update it, that is why we do not set that param for the new layer).

VGG
---

More details about the model - https://arxiv.org/pdf/1409.1556.pdf

In torchvision library there is 8 versions of pre-trained VGG model with different size and batch-normalization usage. We will use VGG-11 with batch-normalization.

In model description we can see: classifier (model's head) includes final layer - Linear with 4096 input params and 1000 outputs:.
```
   (classifier): Sequential(
       ...
       (6): Linear(in_features=4096, out_features=1000, bias=True)
    )
```
We can change it by using following code:

`model.classifier[6] = nn.Linear(4096,num_classes)`

We update sixth (last) layer in classifier block of model's layers sequence.

In [None]:
def initialize_model(num_classes, feature_extract, use_pretrained=True):
    model_ft = models.vgg11_bn(pretrained=use_pretrained)
    
    set_requires_grad(model_ft, feature_extract)

    num_ftrs = model_ft.classifier[6].in_features

    model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)

    input_size = 224
    
    return model_ft, input_size

model_ft, input_size = initialize_model(num_classes, feature_extract, use_pretrained=True)
print(model_ft)

Data loader
---------

Now knowing input data params we can init data loader.
Important notice: models are trained with normalization values, details - https://pytorch.org/docs/master/torchvision/models.html





In [None]:
# Data normalization
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

image_datasets = {
    x: datasets.ImageFolder(os.path.join(data_dir, x),
                            data_transforms[x])
    for x in ['train', 'val']
}
dataloaders_dict = {
    x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                   shuffle=True, num_workers=4)
    for x in ['train', 'val']
}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Init optimizer
--------------------

Now we have model and data, last thing left is to create optimizer, which will update only required params. We already specified `required_grad` param before.

We have to pass those (and only those) params into SGD for optimization




In [None]:
model_ft = model_ft.to(device)

params_to_update = model_ft.parameters() # all params by default
print("Params to update while training:")

if feature_extract:
    # only last layer, update the list
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    # all params, just output
    for name, param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)

optimizer_ft = optim.SGD(params_to_update, lr=0.002, momentum=0.9)

Training and validation
--------------------------------

Now we have to determine loss function and start training process for specified amount of epochs. CPU learning might require some time (depending on the model), and learning rate could be optimized as well (as it is not optimal by default).





In [18]:
# loss function
criterion = nn.CrossEntropyLoss()

# training
# TODO: call train_model will all required params:
model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft)

Epoch 0/24


  cpuset_checked))


train Loss: 0.374047 Acc: 0.807377
val Loss: 0.144770 Acc: 0.960784
Epoch 1/24
train Loss: 0.177782 Acc: 0.926230
val Loss: 0.154126 Acc: 0.934641
Epoch 2/24
train Loss: 0.203547 Acc: 0.909836
val Loss: 0.134668 Acc: 0.947712
Epoch 3/24
train Loss: 0.231375 Acc: 0.905738
val Loss: 0.125119 Acc: 0.941176
Epoch 4/24
train Loss: 0.163230 Acc: 0.922131
val Loss: 0.109707 Acc: 0.947712
Epoch 5/24
train Loss: 0.215465 Acc: 0.930328
val Loss: 0.140373 Acc: 0.934641
Epoch 6/24
train Loss: 0.149236 Acc: 0.926230
val Loss: 0.173948 Acc: 0.928105
Epoch 7/24
train Loss: 0.174040 Acc: 0.922131
val Loss: 0.189856 Acc: 0.921569
Epoch 8/24
train Loss: 0.188477 Acc: 0.938525
val Loss: 0.176673 Acc: 0.934641
Epoch 9/24
train Loss: 0.133974 Acc: 0.938525
val Loss: 0.151183 Acc: 0.941176
Epoch 10/24
train Loss: 0.218024 Acc: 0.905738
val Loss: 0.159795 Acc: 0.934641
Epoch 11/24
train Loss: 0.193587 Acc: 0.930328
val Loss: 0.171715 Acc: 0.941176
Epoch 12/24
train Loss: 0.164147 Acc: 0.954918
val Loss: 0.15

In [19]:
torch.save(model_ft.state_dict(), 'final_model.pt')