# 1. Transfer Learning - Cheat Sheet

## a. Introduction

- ***Transfer learning*** consists of taking features learned on one problem and leveraging them on a new, similar problem.

Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
The two most common ways of doing transfer learning are feature extraction and fine tuning (two other methods, **Joint Training** and **Learning Without Forgetting**, are described in [Zhizhong Li's paper](https://arxiv.org/abs/1606.09282)):

1. ***Feature Extraction***: 
    - What it is: 
        - Take a pretrained model, remove the last fully-connected layer, and replace it with a new network ***block*** at the end. 
        - Freeze ***all*** of the previous layers. 
    - In more detail: 
        - When we train on the new dataset, we only train the new final layers; we don't need to retrain the entire model.
        - We freeze all of the previous layers of the model but still use the activations they produce. In this way, we treat these layers as a fixed feature extractor for the new dataset, since the base convolutional network already contains features that are generically useful for classifying images. 

2. ***Fine Tuning***: 
    - What it is: 
        - Take a pretrained model, remove the last fully-connected layer, and replace it with a new network ***block*** at the end (the same as in Feature Extraction).
        - Freeze ***none/some*** of the previous layers. This means we ***fine-tune all/some of the weights*** of the pretrained model. 
    - In more detail: 
        - The first step is the same as in feature extraction. 
        - In the second step, fine-tuning all of the weights is the same as training them on our new dataset, except that training starts from the weights pretrained on the original dataset instead of from randomly initialized weights.
    - Note: 
        - When fine-tuning only some of the layers, we should 
            - **keep some of the earlier layers fixed** (due to overfitting concerns) and 
            - only **fine-tune some higher-level portion** of the network. 
        
        This is motivated by the observation that the earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful for many tasks, while later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset.
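
The "fixed feature extractor" idea can be sketched in PyTorch as follows. This is a minimal illustration rather than a full training script: the tiny `backbone` is a hypothetical stand-in for a pretrained network, and the random tensors stand in for a real dataset. Because the frozen layers never change, their activations can be computed once under `torch.no_grad()`, and only the new head is trained on them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained convolutional base (assumption:
# in practice this would be a real pretrained model without its last layer).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# Freeze every layer of the base.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# New block that replaces the old classifier; only this part is trained.
head = nn.Linear(8, 2)

# Random data standing in for the new, small dataset.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 2, (16,))

# The frozen layers act as a fixed feature extractor: compute activations once.
with torch.no_grad():
    features = backbone(images)

backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.detach().clone()

optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(head(features), labels)
    loss.backward()
    optimizer.step()

# The base is untouched by training; only the head has moved.
backbone_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(backbone_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight)
```

Fine tuning would differ only in leaving some or all of the backbone parameters with `requires_grad = True` and running the backbone inside the training loop.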

## b. How to decide the type of transfer learning that we need to use

# 2. Transfer Learning with Keras (TensorFlow)

## a. Feature Extraction

In [None]:
import tensorflow as tf

# Assumed setup: 160x160 RGB inputs, matching the Input layer below
IMG_SHAPE = (160, 160, 3)
# Light augmentation; an illustrative choice, swap in your own pipeline
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
])
# Rescales pixel values from [0, 255] to the [-1, 1] range MobileNetV2 expects
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input

# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
# Freeze the convolutional base:
# setting the whole model's trainable flag to False freezes all of the base model's layers
base_model.trainable = False
# Layers of the classification head
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
prediction_layer = tf.keras.layers.Dense(1)  # a single logit for binary classification
# Build the new model
inputs = tf.keras.Input(shape=IMG_SHAPE)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)  # keep BatchNormalization layers in inference mode
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)

Compile the model

In [None]:
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

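Train the model

In [None]:
# train_dataset and validation_dataset are assumed to be tf.data.Dataset objects
# that yield batches of (image, label) pairs
initial_epochs = 10  # illustrative value
history = model.fit(train_dataset,
                    epochs=initial_epochs,
                    validation_data=validation_dataset)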

When you set `layer.trainable = False` on a `BatchNormalization` layer, the layer runs in inference mode and does not update its mean and variance statistics.

When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing `training=False` when calling the base model. Otherwise, the updates applied to the non-trainable weights will destroy what the model has learned.
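
A small check of this behavior, as a sketch: it uses a standalone `BatchNormalization` layer and random data (both illustrative, not part of the model above) and compares the layer's moving mean before and after a forward pass with `training=True`. The helper name `moving_mean_changes` is hypothetical.

```python
import numpy as np
import tensorflow as tf

# Random batch whose mean is far from the layer's initial moving mean of 0.
rng = np.random.RandomState(0)
x = tf.constant(rng.normal(5.0, 1.0, size=(32, 4)), dtype=tf.float32)

def moving_mean_changes(trainable):
    """Return True if a training-mode forward pass updates the moving mean."""
    bn = tf.keras.layers.BatchNormalization()
    bn(x, training=False)          # first call builds the layer's weights
    bn.trainable = trainable
    before = bn.moving_mean.numpy().copy()
    bn(x, training=True)           # updates the statistics only if the layer is trainable
    after = bn.moving_mean.numpy()
    return not np.allclose(before, after)

updated_when_trainable = moving_mean_changes(trainable=True)
updated_when_frozen = moving_mean_changes(trainable=False)
print(updated_when_trainable, updated_when_frozen)
```

With a trainable layer the moving mean should shift toward the batch mean, while the frozen layer should leave it untouched even though `training=True` was passed.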

## b. Fine Tuning

The code is the same as for Feature Extraction, except that we don't set `base_model.trainable = False`.

In [None]:
# Uses the same tf, IMG_SHAPE, data_augmentation and preprocess_input as the Feature Extraction cell
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
# The base model is left trainable (the default), so all of its weights are fine-tuned
# Layers of the classification head
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
prediction_layer = tf.keras.layers.Dense(1)  # a single logit for binary classification
# Build the new model
inputs = tf.keras.Input(shape=IMG_SHAPE)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)  # keep BatchNormalization layers in inference mode
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)
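
To fine-tune only some of the layers, one common pattern (following the TensorFlow tutorial listed in the references) is to unfreeze the base model, re-freeze its first layers, and recompile with a lower learning rate. The sketch below wraps this in a hypothetical helper, `build_partially_frozen_model`; the cut-off `fine_tune_at=100` and the learning rate are illustrative values, and data augmentation is omitted for brevity.

```python
import tensorflow as tf

def build_partially_frozen_model(weights='imagenet', fine_tune_at=100,
                                 learning_rate=1e-5):
    """Build a MobileNetV2 classifier whose first `fine_tune_at` base layers are frozen."""
    base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                                   include_top=False,
                                                   weights=weights)
    # Unfreeze everything, then freeze the earlier, more generic layers again.
    base_model.trainable = True
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False

    inputs = tf.keras.Input(shape=(160, 160, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base_model(x, training=False)  # BatchNormalization stays in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)

    # Compile after changing the trainable flags so the changes take effect.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=learning_rate),
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model, base_model
```

Calling `build_partially_frozen_model(weights=None)` builds the same architecture with random weights, which is handy for a quick offline check; actual transfer learning needs the pretrained weights.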

# 3. Transfer Learning with PyTorch

## a. Feature Extraction

Keywords: ***pretrained = True, named_parameters, requires_grad, fc, classifier, in_features, out_features, do not freeze BatchNorm layers***.

In [None]:
import torch.nn as nn
import torchvision

# Create a pretrained ResNet-50 model:
transfer_model = torchvision.models.resnet50(pretrained=True)
#
# Freeze the layers by setting requires_grad = False,
# skipping parameters whose name contains "bn" so that those BatchNorm layers stay trainable
# Note: you might not want to freeze the BatchNorm layers in a model
for name, param in transfer_model.named_parameters():
    if "bn" not in name:
        param.requires_grad = False
#
# Then replace the final classification block with a new one for your new problem
# In this example, we replace it with a couple of Linear layers, a ReLU, and Dropout, 
# but you could have extra CNN layers here too
# Note: torchvision models store the final classifier *block* as an instance variable, fc or classifier
# (fc for ResNet), so all we need to do is replace fc or classifier with our new structure
your_new_number_of_categories = 2  # placeholder: set to the number of classes in your problem
transfer_model.fc = nn.Sequential(
    nn.Linear(transfer_model.fc.in_features, 500),
    nn.ReLU(),
    nn.Dropout(),
    nn.Linear(500, your_new_number_of_categories))


In [None]:
# In newer torchvision versions (0.13+), pretrained=True is deprecated in favor of the weights argument
from torchvision.models import resnet50, ResNet50_Weights

# Create a pretrained ResNet-50 model: 
# Reference for other weight options: https://pytorch.org/vision/stable/models.html
transfer_model = resnet50(weights=ResNet50_Weights.DEFAULT)



NOTE: **you might not want to freeze the BatchNorm layers in a model, as they will be trained to approximate the mean and standard deviation of the dataset that the model was originally trained on, not the dataset that you want to fine-tune on**. Some of the signal from your data may end up being lost as BatchNorm corrects your input.

## b. Fine Tuning

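In [None]:
import torch.nn as nn
import torchvision

# Create a pretrained ResNet-50 model:
transfer_model = torchvision.models.resnet50(pretrained=True)
# No parameters are frozen, so all of the pretrained weights are fine-tuned
# Replace the final layer; use nn.Sequential here if you want to add more layers
number_of_your_problem_classes = 2  # placeholder: set to the number of classes in your problem
transfer_model.fc = nn.Linear(transfer_model.fc.in_features,
                              number_of_your_problem_classes)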

# Other methods to boost the performance of your transfer model

- Finding a good learning rate: grid search, `fit_one_cycle` (fastai)
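
A learning-rate grid search can be sketched as a loop that trains briefly with each candidate and keeps the one with the lowest validation loss. Everything here is illustrative: the toy linear model, the random data, the candidate values, and the hypothetical `validation_loss_for` helper stand in for your transfer model and datasets.

```python
import torch
import torch.nn as nn

def validation_loss_for(lr, steps=50):
    """Train a fresh toy model with learning rate `lr` and return its validation loss."""
    torch.manual_seed(0)  # same data and initial weights for every candidate
    x_train, x_val = torch.randn(64, 10), torch.randn(32, 10)
    true_w = torch.randn(10, 1)
    y_train, y_val = x_train @ true_w, x_val @ true_w

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(x_train), y_train).backward()
        optimizer.step()

    with torch.no_grad():
        return loss_fn(model(x_val), y_val).item()

# Try each candidate and keep the one with the lowest validation loss.
candidates = [1e-4, 1e-3, 1e-2, 1e-1]
results = {lr: validation_loss_for(lr) for lr in candidates}
best_lr = min(results, key=results.get)
print(results, best_lr)
```

For the one-cycle policy that fastai's `fit_one_cycle` implements, plain PyTorch offers `torch.optim.lr_scheduler.OneCycleLR`.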



# References

[CS231n CNN for Visual Recognition](https://cs231n.github.io/) \
[Keras Transfer learning & fine-tuning](https://keras.io/guides/transfer_learning/#do-a-round-of-finetuning-of-the-entire-model) \
[Pytorch transfer learning for computer vision tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) \
[Tensorflow Transfer learning and fine-tuning](https://www.tensorflow.org/tutorials/images/transfer_learning)

Others (Pytorch): \
[https://androidkt.com/modify-pre-train-pytorch-model-for-finetuning-and-feature-extraction/](https://androidkt.com/modify-pre-train-pytorch-model-for-finetuning-and-feature-extraction/) \
[https://pyimagesearch.com/2021/10/11/pytorch-transfer-learning-and-image-classification/](https://pyimagesearch.com/2021/10/11/pytorch-transfer-learning-and-image-classification/) \
[https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html) \
[https://harinramesh.medium.com/transfer-learning-in-pytorch-f7736598b1ed](https://harinramesh.medium.com/transfer-learning-in-pytorch-f7736598b1ed) \
[https://www.pluralsight.com/guides/expediting-deep-learning-with-transfer-learning:-pytorch-playbook](https://www.pluralsight.com/guides/expediting-deep-learning-with-transfer-learning:-pytorch-playbook) \
[http://seba1511.net/tutorials/beginner/transfer_learning_tutorial.html](http://seba1511.net/tutorials/beginner/transfer_learning_tutorial.html)

# Need to read

[https://www.learndatasci.com/tutorials/hands-on-transfer-learning-keras/](https://www.learndatasci.com/tutorials/hands-on-transfer-learning-keras/)

