# Transfer Learning

There are two primary types of transfer learning from a pre-trained CNN model:

* Feature Extraction
* Fine Tuning

### Pre-Trained Model

In [6]:
from torchvision import models
from torch import nn
from torchinfo import summary

# load ResNet50 model as feature extractor
model = models.resnet50(pretrained=True)

summary(model, (1, 3, 1024, 1024), row_settings=('depth', 'var_names'), depth=2)

Layer (type (var_name):depth-idx)                  Output Shape              Param #
ResNet                                             --                        --
├─Conv2d (conv1): 1-1                              [1, 64, 512, 512]         9,408
├─BatchNorm2d (bn1): 1-2                           [1, 64, 512, 512]         128
├─ReLU (relu): 1-3                                 [1, 64, 512, 512]         --
├─MaxPool2d (maxpool): 1-4                         [1, 64, 256, 256]         --
├─Sequential (layer1): 1-5                         [1, 256, 256, 256]        --
│    └─Bottleneck (0): 2-1                         [1, 256, 256, 256]        75,008
│    └─Bottleneck (1): 2-2                         [1, 256, 256, 256]        70,400
│    └─Bottleneck (2): 2-3                         [1, 256, 256, 256]        70,400
├─Sequential (layer2): 1-6                         [1, 512, 128, 128]        --
│    └─Bottleneck (0): 2-4                         [1, 512, 128, 128]        379,392
│    └─Bottlen

## Feature Extraction

Transfer learning means **retraining the final layer** of a deep network. Not only is this useful for solving problems with **limited training examples**, but also when you don't have adequate **computing resources** to train a network from scratch. 

However, if you have sufficient data, adapting weights via transfer learning is not preferable because the features that were extracted from the original training process are unlikely to be ideal for another application.

Feature extraction in the context of a **CNN** is not necessarily an explicit process, rather a sort of high-level product of the training process. Feature extraction refers to the portion of the training process by which a CNN learns to map input space to a latent space that can subsequently be used for classification via the final layer. 

In other words, the hidden layers learn discriminatory features in the form of weight-adjusted convolutional filters. Thus the term "feature extraction" generally refers to the portion of the training process that occurs before the final layer. So it is not part of transfer learning in which only the last layer is trained.

### Create Model for Feature Extraction

In [4]:
from torchvision import models
from torch import nn
from torchinfo import summary

# load ResNet50 model as feature extractor
model = models.resnet50(pretrained=True)

# freeze parameters to non-trainable (by default they are trainable)
for param in model.parameters():
    param.requires_grad = False

# append a new classification top to our feature extractor and pop it on to the current device
num_features = model.fc.in_features
num_classes = 5
model.fc = nn.Linear(num_features, num_classes)

summary(model, (1, 3, 1024, 1024), row_settings=('depth', 'var_names'), depth=2)

Layer (type (var_name):depth-idx)                  Output Shape              Param #
ResNet                                             --                        --
├─Conv2d (conv1): 1-1                              [1, 64, 512, 512]         (9,408)
├─BatchNorm2d (bn1): 1-2                           [1, 64, 512, 512]         (128)
├─ReLU (relu): 1-3                                 [1, 64, 512, 512]         --
├─MaxPool2d (maxpool): 1-4                         [1, 64, 256, 256]         --
├─Sequential (layer1): 1-5                         [1, 256, 256, 256]        --
│    └─Bottleneck (0): 2-1                         [1, 256, 256, 256]        (75,008)
│    └─Bottleneck (1): 2-2                         [1, 256, 256, 256]        (70,400)
│    └─Bottleneck (2): 2-3                         [1, 256, 256, 256]        (70,400)
├─Sequential (layer2): 1-6                         [1, 512, 128, 128]        --
│    └─Bottleneck (0): 2-4                         [1, 512, 128, 128]        (379,392)
│ 

## Fine Tuning

On **transfer learning** we remove the FC layer head from the pre-trained network, but this time we construct a new, freshly initialized FC layer head and place it on top of the original body of the network. 

The weights in the body of the CNN are frozen, and then we train the new layer head (typically with a very small learning rate). We may then choose to unfreeze the body of the network and train the entire network.

### Create Fine Tuned Model

In [5]:
from torchvision import models
from torch import nn
from torchinfo import summary

# load ResNet50 model for fine tuning
model = models.resnet50(pretrained=True)

num_features = model.fc.in_features

# loop over the modules of the model and set the parameters of batch normalization modules as not trainable
for module, param in zip(model.modules(), model.parameters()):
    if isinstance(module, nn.BatchNorm2d):
        param.requires_grad = False

# define the network head and attach it to the model
num_classes = 5
model.fc = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.25),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, num_classes)
)

summary(model, (1, 3, 1024, 1024), row_settings=('depth', 'var_names'), depth=2)

Layer (type (var_name):depth-idx)                  Output Shape              Param #
ResNet                                             --                        --
├─Conv2d (conv1): 1-1                              [1, 64, 512, 512]         9,408
├─BatchNorm2d (bn1): 1-2                           [1, 64, 512, 512]         128
├─ReLU (relu): 1-3                                 [1, 64, 512, 512]         --
├─MaxPool2d (maxpool): 1-4                         [1, 64, 256, 256]         --
├─Sequential (layer1): 1-5                         [1, 256, 256, 256]        --
│    └─Bottleneck (0): 2-1                         [1, 256, 256, 256]        75,008
│    └─Bottleneck (1): 2-2                         [1, 256, 256, 256]        70,400
│    └─Bottleneck (2): 2-3                         [1, 256, 256, 256]        70,400
├─Sequential (layer2): 1-6                         [1, 512, 128, 128]        --
│    └─Bottleneck (0): 2-4                         [1, 512, 128, 128]        379,392
│    └─Bottlen