# Transfer Learning

There are two primary types of transfer learning:

* Feature Extraction
* Fine Tuning

## Feature Extraction

Transfer learning means **retraining the final layer** of a deep network. Not only is this useful for solving problems with **limited training examples**, but also when you don't have adequate **computing resources** to train a network from scratch. 

However, if you have sufficient data, adapting weights via transfer learning is not preferable because the features that were extracted from the original training process are unlikely to be ideal for another application.

Feature extraction in the context of a **CNN** is not necessarily an explicit process, rather a sort of high-level product of the training process. Feature extraction refers to the portion of the training process by which a CNN learns to map input space to a latent space that can subsequently be used for classification via the final layer. 

In other words, the hidden layers learn discriminatory features in the form of weight-adjusted convolutional filters. Thus the term "feature extraction" generally refers to the portion of the training process that occurs before the final layer. So it is not part of transfer learning in which only the last layer is trained.

### Data Preparation

In [1]:
from pyimagesearch import config
from torchvision import models
from torchvision import transforms
from tqdm import tqdm
from torch import nn
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchinfo import summary
import os

# torch.multiprocessing.set_sharing_strategy('file_system')
# torch.multiprocessing.freeze_support()

train_tansforms = transforms.Compose([
    transforms.RandomResizedCrop(config.IMAGE_SIZE),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(90),
    transforms.ToTensor(),
    transforms.Normalize(mean=config.MEAN, std=config.STD)
])

val_transforms = transforms.Compose([
    transforms.Resize((config.IMAGE_SIZE, config.IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=config.MEAN, std=config.STD)
])

train_dataset = datasets.ImageFolder(root=config.TRAIN, transform=train_tansforms)
train_loader = DataLoader(train_dataset, batch_size=config.FEATURE_EXTRACTION_BATCH_SIZE, shuffle=True, num_workers=os.cpu_count(), pin_memory=True if config.DEVICE == "cuda" else False)

val_dataset = datasets.ImageFolder(root=config.VAL, transform=val_transforms)
val_loader = DataLoader(val_dataset, batch_size=config.FEATURE_EXTRACTION_BATCH_SIZE, shuffle=False, num_workers=os.cpu_count(), pin_memory=True if config.DEVICE == "cuda" else False)


### Create Model

In [2]:
# load ResNet50 model as feature extractor
model = models.resnet50(pretrained=True)

# freeze parameters to non-trainable (by default they are trainable)
for param in model.parameters():
    param.requires_grad = False

# append a new classification top to our feature extractor and pop it on to the current device
outFeatures = model.fc.in_features
model.fc = nn.Linear(outFeatures, len(train_dataset.classes))
model = model.to(config.DEVICE)

# initialize loss function and optimizer (notice that we are only providing the parameters of the classification top to our optimizer)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=config.LR)
#optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)

summary(model, (1, 3, 1024, 1024), row_settings=('depth', 'var_names'), depth=2)

Layer (type (var_name):depth-idx)                  Output Shape              Param #
ResNet                                             --                        --
├─Conv2d (conv1): 1-1                              [1, 64, 512, 512]         (9,408)
├─BatchNorm2d (bn1): 1-2                           [1, 64, 512, 512]         (128)
├─ReLU (relu): 1-3                                 [1, 64, 512, 512]         --
├─MaxPool2d (maxpool): 1-4                         [1, 64, 256, 256]         --
├─Sequential (layer1): 1-5                         [1, 256, 256, 256]        --
│    └─Bottleneck (0): 2-1                         [1, 256, 256, 256]        (75,008)
│    └─Bottleneck (1): 2-2                         [1, 256, 256, 256]        (70,400)
│    └─Bottleneck (2): 2-3                         [1, 256, 256, 256]        (70,400)
├─Sequential (layer2): 1-6                         [1, 512, 128, 128]        --
│    └─Bottleneck (0): 2-4                         [1, 512, 128, 128]        (379,392)
│ 

### Train

In [None]:
train_steps = len(train_dataset) // config.FEATURE_EXTRACTION_BATCH_SIZE
val_steps = len(val_dataset) // config.FEATURE_EXTRACTION_BATCH_SIZE

log = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}

for epoch in tqdm(range(config.EPOCHS)):

    model.train()

    total_train_loss = 0
    total_val_loss = 0
    train_correct = 0
    val_correct = 0

    # loop over the training set
    for (batch_idx, (X, y)) in enumerate(train_loader):

        (X, y) = (X.to(config.DEVICE), y.to(config.DEVICE))
        pred = model(X)            # perform a forward pass
        loss = loss_fn(pred, y)    # calculate the training loss
        loss.backward()            # calculate the gradients

        # check if we are updating the model parameters and if so update them, and zero out the previously accumulated gradients
        if (batch_idx + 2) % 2 == 0:
            optimizer.step()
            optimizer.zero_grad()

        total_train_loss += loss
        train_correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # validation
    with torch.no_grad():
        model.eval()
        for (X, y) in val_loader:
            (X, y) = (X.to(config.DEVICE), y.to(config.DEVICE))
            pred = model(X)
            total_val_loss += loss_fn(pred, y)
            val_correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # calculate the average training and validation loss
    avg_train_loss = total_train_loss / train_steps
    avg_val_loss = total_val_loss / val_steps

    # calculate the training and validation accuracy
    train_correct = train_correct / len(train_dataset)
    val_correct = val_correct / len(val_dataset)

    # update our training history
    log['train_loss'].append(avg_train_loss.cpu().detach().numpy())
    log['train_acc'].append(train_correct)
    log['val_loss'].append(avg_val_loss.cpu().detach().numpy())
    log['val_acc'].append(val_correct)

    # print the model training and validation information
    print(f'EPOCH: {epoch + 1}/{config.EPOCHS}')
    print(f'Train Loss: {avg_train_loss:.6f}, Train Accuracy: {train_correct:.4f}')
    print(f'Validation Loss: {avg_val_loss:.6f}, Validation Accuracy: {val_correct:.4f}')

# display the total time needed to perform the training
end_time = time.time()
print(f'Total time to train the model: {(end_time - start_time):.2f}s')


  3%|█████▊                                                                                                                                                                         | 1/30 [02:04<1:00:01, 124.18s/it]

[INFO] EPOCH: 1/30
Train loss: 1.700284, Train accuracy: 0.3406
Val loss: 2.758565, Val accuracy: 0.5036


  7%|███████████▊                                                                                                                                                                     | 2/30 [04:07<57:42, 123.64s/it]

[INFO] EPOCH: 2/30
Train loss: 1.339692, Train accuracy: 0.6149
Val loss: 1.874355, Val accuracy: 0.6752


 10%|█████████████████▋                                                                                                                                                               | 3/30 [06:10<55:32, 123.44s/it]

[INFO] EPOCH: 3/30
Train loss: 1.106096, Train accuracy: 0.7079
Val loss: 1.728146, Val accuracy: 0.7445


 13%|███████████████████████▌                                                                                                                                                         | 4/30 [08:13<53:28, 123.40s/it]

[INFO] EPOCH: 4/30
Train loss: 0.967500, Train accuracy: 0.7597
Val loss: 1.390400, Val accuracy: 0.7701


 17%|█████████████████████████████▌                                                                                                                                                   | 5/30 [10:18<51:31, 123.68s/it]

[INFO] EPOCH: 5/30
Train loss: 0.863759, Train accuracy: 0.7605
Val loss: 1.329366, Val accuracy: 0.7628


 20%|███████████████████████████████████▍                                                                                                                                             | 6/30 [12:21<49:27, 123.65s/it]

[INFO] EPOCH: 6/30
Train loss: 0.788425, Train accuracy: 0.7836
Val loss: 1.122735, Val accuracy: 0.8029


In [None]:
import matplotlib.pyplot as plt

plt.plot(log['train_loss'], label='train loss')
plt.plot(log['val_loss'], label='val loss')
plt.plot(log['train_acc'], label='train acc')
plt.plot(log['val_acc'], label='val acc')
plt.xlabel('Epoch')
plt.ylabel('Loss / Accuracy')
plt.legend()

## Fine Tuning

Transfer learning via **fine-tuning**: When applying fine-tuning, we again remove the FC layer head from the pre-trained network, but this time we construct a brand new, freshly initialized FC layer head and place it on top of the original body of the network. The weights in the body of the CNN are frozen, and then we train the new layer head (typically with a very small learning rate). We may then choose to unfreeze the body of the network and train the entire network.