## Transfer Learning

In [2]:
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, models, transforms
import numpy as np
import matplotlib.pyplot as plt
import time
import os
import copy

Setting up GPU (if available)

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


Defining the data setup

In [4]:
batch_size=32
num_epochs=4
lr=1e-2

In [5]:
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

In [6]:
data_transforms = {
    "train": transforms.Compose(
        [transforms.RandomResizedCrop(size=(224, 224)),
         transforms.RandomHorizontalFlip(),
         transforms.ToTensor(),
         transforms.Normalize(mean, std)]),
    "val": transforms.Compose(
        [transforms.Resize(size=(256,256)),
         transforms.CenterCrop(size=(224,224)),
         transforms.ToTensor(),
         transforms.Normalize(mean, std)])
}

torchvision provides the ```torchvision.datasets.ImageFolder``` class for loading generic image data in which each class has its own subfolder. ```ImageFolder``` takes the following common arguments:
- ```root```: Root directory where the data is stored.
- ```transform```: Transformations to apply to the data.
- ```target_transform```: Transformations to apply to the targets.
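
As a rough illustration of the folder layout that ```ImageFolder``` expects, the sketch below builds a throwaway directory tree and derives the class-to-index mapping in the same spirit: subfolder names are sorted and numbered from zero. It uses only the standard library. The temporary directory and the helper name ```find_classes_sketch``` are assumptions made for this example and are not part of torchvision.

```python
import os
import tempfile


def find_classes_sketch(root):
    """Return (classes, class_to_idx) from the subfolders of `root`.

    Hypothetical helper that mirrors how ImageFolder labels data:
    one subfolder per class, names sorted alphabetically.
    """
    classes = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Build a toy layout: <root>/train/{bees,ants}/ (created out of order on purpose).
root = tempfile.mkdtemp()
for class_name in ["bees", "ants"]:
    os.makedirs(os.path.join(root, "train", class_name))

classes, class_to_idx = find_classes_sketch(os.path.join(root, "train"))
print(classes)
print(class_to_idx)
```

Because the names are sorted, the label assigned to a class does not depend on the order in which the folders were created.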

In [7]:
data_dir = "./data/hymenoptera_data"
sets = ["train", "val"]
image_datasets = {x: datasets.ImageFolder(root=os.path.join(data_dir, x),
                                          transform=data_transforms[x])
                  for x in sets}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x],
                                            batch_size=batch_size,
                                            shuffle=True)
             for x in ["train", "val"]}

In [8]:
print(next(iter(dataloaders["train"]))[0].shape)

torch.Size([32, 3, 224, 224])


To get the classes of the images loaded using an ```ImageFolder```, use the ```classes``` attribute.

In [9]:
class_names = image_datasets["train"].classes
dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
print(class_names)

['ants', 'bees']


Defining a training function

The context manager ```torch.set_grad_enabled(bool)``` turns gradient tracking on or off for the code inside it, so the same forward pass can be used for both training and validation.
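
A minimal sketch of what the context manager changes, using a small tensor in place of real model inputs (the tensor and variable names are assumptions for this example):

```python
import torch

x = torch.ones(3, requires_grad=True)

# Gradient tracking on: the result is attached to the autograd graph.
with torch.set_grad_enabled(True):
    y_train = (x * 2).sum()

# Gradient tracking off: no graph is built, which saves memory during validation.
with torch.set_grad_enabled(False):
    y_val = (x * 2).sum()

print(y_train.requires_grad, y_val.requires_grad)
```

Passing a boolean expression such as ```phase == "train"``` lets a single code path cover both cases.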

Certain layers, such as Batch Normalization and Dropout, behave differently during training and prediction. To facilitate this, any model subclassed from ```torch.nn.Module``` can be set to evaluation mode using ```model.eval()``` and back to training mode using ```model.train()```. This acts as a switch: in evaluation mode, Dropout is disabled and Batch Normalization uses its running statistics instead of per-batch statistics.
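
The effect of the switch can be seen on a single Dropout layer. This is only a sketch: the layer, the input of ones, and the seed are assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(8)

layer.train()           # training mode: each element is zeroed with probability p
out_train = layer(x)    # surviving elements are scaled by 1 / (1 - p) = 2

layer.eval()            # evaluation mode: dropout is a no-op
out_eval = layer(x)

print(out_train)
print(out_eval)
```

In evaluation mode the output equals the input, which is why predictions become deterministic.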

PyTorch allows the learning rate to be adapted during training. Various scheduling methods can be found in ```torch.optim.lr_scheduler```. A scheduler wraps the optimizer and is typically advanced once at the end of each epoch, after the optimizer updates, using the ```.step()``` method.
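
As an illustration, the sketch below records the learning rate that ```StepLR``` would produce over 25 epochs with ```step_size=7``` and ```gamma=0.1```. A single dummy parameter stands in for a real model, which is an assumption made to keep the example self-contained.

```python
import torch

# A single dummy parameter stands in for a real model (assumption for this sketch).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(25):
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used during this epoch
    optimizer.step()    # the real training batches would run here
    scheduler.step()    # advance the schedule once per epoch

print([f"{lr:.0e}" for lr in lrs[::7]])
```

The rate stays constant for seven epochs and is then multiplied by ```gamma```, so it drops by a factor of ten at epochs 7, 14, and 21.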

**IMPORTANT: Do not use the setup below as a template for a combined training and testing function. For a better-structured approach, refer to** https://cs230.stanford.edu/blog/pytorch/. **Coding examples can be found in the following GitHub repo** https://github.com/cs230-stanford/cs230-code-examples

In [30]:
def train_model(model, objective, optimizer, scheduler, num_epochs = 25):
    since = time.time()
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    
    for epoch in range(num_epochs):
        print(f"Epoch {epoch}/{num_epochs-1}")
        
        for phase in ["train", "val"]:
            if phase == "train":
                model.train()
            else:
                model.eval()
            
            running_loss = 0.0
            running_corrects = 0
            
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, dim=1)
                    loss = objective(outputs, labels)
                    
                    if (phase == "train"):
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)

            if (phase == "train"):
                scheduler.step()
        
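        # NOTE: this block runs once per epoch, after both phases have finished,
        # so `phase` is "val" here and only validation metrics are reported.
        epoch_loss = running_loss / dataset_sizes[phase]
        epoch_acc = running_corrects.double() / dataset_sizes[phase]
        
        print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

        # Keep a deep copy of the weights with the best validation accuracy
        if (phase == 'val' and epoch_acc > best_acc):
            best_acc = epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())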
            
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

        

Loading one of the given models

Standard pre-trained models can be found in ```torchvision.models```.

Loading a pre-trained ResNet-18 and replacing its last fully connected layer so that it outputs one score per class.

In [31]:
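# On torchvision >= 0.13, `pretrained=True` is deprecated in favor of
# `weights=models.ResNet18_Weights.DEFAULT`.
model = models.resnet18(pretrained=True)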
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, len(class_names))

In [32]:
model = model.to(device)

In [33]:
objective = nn.CrossEntropyLoss()

In [34]:
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
step_lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                       step_size=7,
                                       gamma=0.1)

In [35]:
model = train_model(model, objective, optimizer,
                   step_lr_scheduler, num_epochs=25)

Epoch 0/24
val Loss: 0.6490 Acc: 0.6471
Epoch 1/24
val Loss: 0.2880 Acc: 0.9150
Epoch 2/24
val Loss: 0.2449 Acc: 0.9150
Epoch 3/24
val Loss: 0.2494 Acc: 0.9150
Epoch 4/24
val Loss: 0.2301 Acc: 0.9216
Epoch 5/24
val Loss: 0.2346 Acc: 0.9216
Epoch 6/24
val Loss: 0.2266 Acc: 0.9216
Epoch 7/24
val Loss: 0.2197 Acc: 0.9281
Epoch 8/24
val Loss: 0.2154 Acc: 0.9281
Epoch 9/24
val Loss: 0.2133 Acc: 0.9346
Epoch 10/24
val Loss: 0.2130 Acc: 0.9346
Epoch 11/24
val Loss: 0.2143 Acc: 0.9346
Epoch 12/24
val Loss: 0.2123 Acc: 0.9346
Epoch 13/24
val Loss: 0.2134 Acc: 0.9346
Epoch 14/24
val Loss: 0.2128 Acc: 0.9346
Epoch 15/24
val Loss: 0.2126 Acc: 0.9412
Epoch 16/24
val Loss: 0.2132 Acc: 0.9346
Epoch 17/24
val Loss: 0.2097 Acc: 0.9346
Epoch 18/24
val Loss: 0.2109 Acc: 0.9346
Epoch 19/24
val Loss: 0.2118 Acc: 0.9346
Epoch 20/24
val Loss: 0.2113 Acc: 0.9346
Epoch 21/24
val Loss: 0.2107 Acc: 0.9346
Epoch 22/24
val Loss: 0.2115 Acc: 0.9346
Epoch 23/24
val Loss: 0.2111 Acc: 0.9346
Epoch 24/24
val Loss: 0.21

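It is possible to freeze all the weights of a loaded model and train only the weights of the newly added layers. Setting ```requires_grad = False``` on a parameter excludes it from gradient computation, so the optimizer never updates it.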
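
One way to check which parameters remain trainable after freezing is to list them by name. The sketch below uses a small ```nn.Sequential``` as a stand-in for a pre-trained network, which is an assumption made to avoid downloading weights.

```python
import torch.nn as nn

# Toy stand-in for a pre-trained network (assumption for this sketch).
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

# Freeze every existing parameter.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the last layer; a newly constructed layer is trainable by default.
backbone[2] = nn.Linear(8, 2)

trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]
frozen = [name for name, p in backbone.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)
```

Only the replaced layer shows up as trainable, which is why the order matters: freeze first, then attach the new layer.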

In [37]:
model_2 = models.resnet18(pretrained=True)

In [38]:
# Freeze all pre-trained weights so they are not updated during training
for param in model_2.parameters():
    param.requires_grad = False


In [44]:
num_features = model_2.fc.in_features
model_2.fc = nn.Linear(num_features, len(class_names))
optimizer_2 = torch.optim.SGD(model_2.fc.parameters(), lr=lr, momentum=0.9)
scheduler_2 = torch.optim.lr_scheduler.StepLR(optimizer_2,
                                              step_size=7, gamma=0.1)
model_2 = model_2.to(device)
model_2 = train_model(model_2, objective, optimizer_2,
                     scheduler_2, num_epochs=25)

Epoch 0/24
val Loss: 0.2490 Acc: 0.9085
Epoch 1/24
val Loss: 0.2641 Acc: 0.8889
Epoch 2/24
val Loss: 0.1585 Acc: 0.9477
Epoch 3/24
val Loss: 0.1630 Acc: 0.9477
Epoch 4/24
val Loss: 0.2208 Acc: 0.9085
Epoch 5/24
val Loss: 0.3651 Acc: 0.8758
Epoch 6/24
val Loss: 0.3627 Acc: 0.8627
Epoch 7/24
val Loss: 0.1701 Acc: 0.9412
Epoch 8/24
val Loss: 0.2799 Acc: 0.8954
Epoch 9/24
val Loss: 0.2235 Acc: 0.9085
Epoch 10/24
val Loss: 0.1790 Acc: 0.9477
Epoch 11/24
val Loss: 0.1717 Acc: 0.9477
Epoch 12/24
val Loss: 0.1843 Acc: 0.9477
Epoch 13/24
val Loss: 0.1892 Acc: 0.9412
Epoch 14/24
val Loss: 0.1886 Acc: 0.9346
Epoch 15/24
val Loss: 0.1896 Acc: 0.9412
Epoch 16/24
val Loss: 0.1884 Acc: 0.9477
Epoch 17/24
val Loss: 0.1852 Acc: 0.9477
Epoch 18/24
val Loss: 0.1841 Acc: 0.9477
Epoch 19/24
val Loss: 0.1839 Acc: 0.9412
Epoch 20/24
val Loss: 0.1832 Acc: 0.9542
Epoch 21/24
val Loss: 0.1827 Acc: 0.9542
Epoch 22/24
val Loss: 0.1760 Acc: 0.9477
Epoch 23/24
val Loss: 0.1734 Acc: 0.9477
Epoch 24/24
val Loss: 0.17