<a href="https://colab.research.google.com/github/tonyscan6003/etivities/blob/main/Example_4_1_transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer learning and Fine-tuning

In this example, you will learn how to perform binary classification between images of cats and dogs by using transfer learning from a pre-trained network.

This example is adapted from the [TensorFlow tutorial](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/images/transfer_learning.ipynb).
In this case the same cats & dogs dataset is loaded with the more recently developed [TensorFlow Datasets](https://www.tensorflow.org/datasets) API. This is a more general approach to loading datasets into TensorFlow that can be used instead of the Keras preprocessing utility `tf.keras.preprocessing.image_dataset_from_directory` (detailed in the original tutorial). A slightly different approach to data augmentation is also used in this notebook (i.e. the Keras preprocessing layers are not used).

**Introduction:**
A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. You can either use the pretrained model as is or use transfer learning to customize this model to a given task.

The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, this model will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to start from scratch by training a large model on a large dataset.

![Image](https://github.com/tonyscan6003/etivities/blob/main/CE6003_section3_partB-Page-14.jpg?raw=true)

In this notebook, you will try two ways to customize a pretrained model:

1. Feature Extraction: Use the representations learned by a previous network to extract meaningful features from new samples. You simply add a new classifier, which will be trained from scratch, on top of the pretrained model so that you can repurpose the feature maps learned previously for the dataset.

 You do not need to (re)train the entire model. The base convolutional network already contains features that are generically useful for classifying pictures. However, the final, classification part of the pretrained model is specific to the original classification task.

1. Fine-Tuning: Unfreeze a few of the top layers of a frozen model base and jointly train both the newly-added classifier layers and the last layers of the base model. This allows us to "fine-tune" the higher-order feature representations in the base model in order to make them more relevant for the specific task.

You will follow the general machine learning workflow.

* Examine and understand the data
* Build an input pipeline, in this case using TensorFlow Datasets
* Build the model
   * Load in the pretrained base model (and pretrained weights)
   * Stack the required classification layers on top
* Train and fine-tune the model
* Evaluate the model


# Housekeeping
Import packages and set variables.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing import image_dataset_from_directory

2.3.0


In [None]:
batch_size = 64  # Batch size to use
H_trg = 160      # Height of image to input to network
W_trg = 160      # Width of image to input to network

# Data download
In this example, you will use the [cats_vs_dogs dataset](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs) containing several thousand images of cats and dogs. Using the TensorFlow Datasets API (tfds), the dataset can be downloaded directly into a `tf.data.Dataset`, which is a streaming input pipeline that avoids loading the whole dataset into memory and enables the use of very large datasets.

In the cell below we initially load the dataset using the tfds API. As the original dataset only has training data, we split it into train, val and test splits using `tfds.load`. Note that we don't use all the available data. For this transfer learning problem we demonstrate that we only need a small amount of data, with data augmentation applied, to re-train the neural network. Not requiring too much data is one of the major advantages of [transfer learning](https://cs231n.github.io/transfer-learning/) over training a network to perform a task from scratch.

In [None]:

import tensorflow_datasets as tfds

# Use non-overlapping slices so that no image appears in more than one split
train_ds = tfds.load('cats_vs_dogs', split='train[0%:20%]')
val_ds = tfds.load('cats_vs_dogs', split='train[20%:21%]')
test_ds = tfds.load('cats_vs_dogs', split='train[21%:22%]')
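
Percentage slices select index ranges of the `train` split, so it is worth checking that the ranges chosen for different splits do not share examples. The following is a minimal sketch in plain Python, not part of tfds: `percent_range` and `overlaps` are hypothetical helpers, the rounding rule is an assumption (tfds may round slice boundaries differently), and the split size of 23,262 examples is assumed from the dataset catalog page.

```python
# Hypothetical helpers for reasoning about percentage-based splits.
NUM_EXAMPLES = 23262  # assumed size of the cats_vs_dogs 'train' split


def percent_range(start_pct, end_pct, n=NUM_EXAMPLES):
    """Convert a [start%, end%) slice into a half-open range of indices."""
    return range(round(n * start_pct / 100), round(n * end_pct / 100))


def overlaps(a, b):
    """Return True if two index ranges share at least one example."""
    return max(a.start, b.start) < min(a.stop, b.stop)


# Illustrative ranges: 20% for training, 1% each for validation and test
train_idx = percent_range(0, 20)
val_idx = percent_range(20, 21)
test_idx = percent_range(21, 22)

print(len(train_idx), len(val_idx), len(test_idx))
print(overlaps(train_idx, val_idx), overlaps(train_idx, test_idx))
```

With these helpers, a validation range such as 11% to 12% would be reported as overlapping a 0% to 20% training range, which would leak training images into the validation set.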



We have now very quickly set up our raw dataset using the tfds API. However, before this data can be sent to the network for training it is necessary to preprocess it to (a) make it compatible with the network inputs and (b) apply some data augmentation. In the cell below, two functions called `train_pipe` and `test_pipe` are shown. Both functions resize the input image and shift it into the value range used for the network input. The `train_pipe` function also uses some methods from the `tf.image` package to perform data augmentation, including random crops of the image and random left/right flips. Data augmentation helps to avoid overfitting, particularly when the dataset is small.

In [None]:
# List of functions that we can apply to the dataset before sending to network for training
def test_pipe(image,label):
  image = tf.image.convert_image_dtype(image, tf.float32) # Cast and normalize the image to [0,1]
  image = tf.image.resize(image, (H_trg,W_trg), method='bilinear')
  image = image-0.5
  label = tf.cast(label, tf.float32)
  return image,label

def train_pipe(image,label):
  image = tf.image.convert_image_dtype(image, tf.float32) # Cast and normalize the image to [0,1]
  image = tf.image.resize(image, (H_trg+48,W_trg+48), method='bilinear')
  image = image-0.5
  image = tf.image.random_crop(image, size=[H_trg, W_trg, 3]) # Random crop back to H_trgxW_trg
  #image = tf.image.random_brightness(image, max_delta=0.1) # Random brightness 
  image=tf.cond(tf.random.uniform(()) < 0.25, lambda:  tf.image.flip_left_right(image), lambda: image) #Random mirroring (left/right)
  label = tf.cast(label, tf.float32)
  return image,label
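
To make the augmentation step concrete, here is a minimal NumPy sketch of a random crop followed by a random left/right flip. It is an analogue written for illustration, not the `tf.image` implementation: `random_crop_and_flip` is a hypothetical helper, and the input is a random array standing in for an image that has already been resized to 208x208 and shifted to the range [-0.5, 0.5].

```python
import numpy as np


def random_crop_and_flip(image, out_h, out_w, flip_prob, rng):
    """NumPy analogue of a random crop followed by a random left/right flip."""
    h, w, _ = image.shape
    top = rng.integers(0, h - out_h + 1)    # random top-left corner of the crop
    left = rng.integers(0, w - out_w + 1)
    crop = image[top:top + out_h, left:left + out_w, :]
    if rng.random() < flip_prob:
        crop = crop[:, ::-1, :]             # mirror along the width axis
    return crop


rng = np.random.default_rng(0)
# Stand-in for a resized image already shifted to the range [-0.5, 0.5]
image = rng.random((208, 208, 3), dtype=np.float32) - 0.5
augmented = random_crop_and_flip(image, 160, 160, flip_prob=0.25, rng=rng)
print(augmented.shape)
```

Because the crop position and the flip are drawn afresh for every call, the same source image yields a slightly different training example in each epoch.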

The cell below contains higher-level functions that use the dataset `.map` method to apply the `train_pipe` and `test_pipe` functions to the datasets. A lambda function is used to extract the image and label from the originally downloaded dataset. (Note that the original tfds dataset entries may contain a lot of information, e.g. image, labels, classes, bounding boxes etc. For each problem we will likely only need to send part of this information to the network model.) Also shown below is the `.batch` method, which allows the batch size sent to the network to be set up directly as part of the dataset.

In [None]:
def gen_tr_datasets(src_dataset):    
    # Define Datasets 
    tr_dataset = src_dataset.map(lambda x: (x['image'],x['label']))  
    tr_dataset = tr_dataset.map(train_pipe)
    tr_dataset = tr_dataset.batch(batch_size) 
    return tr_dataset

def gen_val_datasets(src_dataset): 
    # Define Datasets 
    test_dataset = src_dataset.map(lambda x: (x['image'],x['label']))  
    test_dataset = test_dataset.map(test_pipe)
    test_dataset = test_dataset.batch(batch_size) 
    return test_dataset

# Apply pre-processing functions, creating final datasets.
train_dataset = gen_tr_datasets(train_ds)
val_dataset = gen_val_datasets(val_ds)  # no augmentation for validation data
test_dataset = gen_val_datasets(test_ds)
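
When the number of examples is not a multiple of the batch size, `tf.data.Dataset.batch` produces a smaller final batch unless `drop_remainder=True` is passed. The plain-Python sketch below only mirrors that arithmetic; `batch_sizes` is a hypothetical helper and the example count of 4652 is an illustrative assumption.

```python
def batch_sizes(num_examples, batch_size, drop_remainder=False):
    """Return the size of each batch produced from num_examples items."""
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_remainder:
        sizes.append(remainder)  # final, smaller batch
    return sizes


sizes = batch_sizes(4652, 64)
print(len(sizes), sizes[-1])
```

Code that assumes every batch has exactly `batch_size` images should therefore either drop the remainder or read the batch dimension from the data.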



Show the train dataset.

In [None]:

i=0
n_plots = 12 # number of plots
f, axarr = plt.subplots(1,n_plots,figsize=(20,10))

for image, label in train_dataset.take(n_plots):  # Take n_plots batches and show the first image of each
  axarr[i].imshow(image[0,:,:,:]+0.5)
  axarr[i].axis('off')
  axarr[i].title.set_text(str(label.numpy()[0]))
  i = i+1

## Import Base Model
We will import the [**`MobileNet V2`**](https://arxiv.org/pdf/1801.04381.pdf) model developed at Google. This model was pre-trained on the ImageNet dataset. MobileNet V2 is a small, lightweight model, which helps to minimise computation. This makes it suitable for various transfer learning and other deep learning problems that can be performed within the Colab environment.

We will instantiate the MobileNet V2 model using `tf.keras.applications` pre-loaded with weights trained on ImageNet. By specifying the `include_top=False` argument, you load a network that doesn't include the fully connected layers, only the convolutional portion which is ideal for feature extraction.

In [None]:
# Create the base model from the pre-trained model MobileNet V2
IMG_SHAPE = (H_trg,W_trg) + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

As we are using only the convolutional part of the model as a feature extractor, the network is not tied to one fixed image size (here it is built for `160x160x3` inputs). This feature extractor converts each `160x160x3` image into a `5x5x1280` block of features. We can confirm this by sending a batch of images from the dataset through the model:

In [None]:
image_batch, label_batch = next(iter(train_dataset))
feature_batch = base_model(image_batch)
print(feature_batch.shape)
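
The `5x5x1280` shape follows from the architecture: MobileNet V2 reduces the spatial resolution by an overall factor of 32 and, at the default width multiplier, ends with 1280 channels. A minimal sketch of that arithmetic is shown below; `mobilenet_v2_feature_shape` is a hypothetical helper, and it only handles input sizes that are multiples of 32.

```python
def mobilenet_v2_feature_shape(height, width, stride=32, channels=1280):
    """Estimate the feature-map shape for inputs that are multiples of the stride."""
    if height % stride or width % stride:
        raise ValueError("this sketch only handles sizes divisible by the stride")
    return (height // stride, width // stride, channels)


print(mobilenet_v2_feature_shape(160, 160))
print(mobilenet_v2_feature_shape(224, 224))
```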

Note that we can also obtain the size of the output (i.e. last convolution) by inspection of the base model using `.summary`

In [None]:
# Let's take a look at the base model architecture
base_model.summary()

# Train with base model as Feature Extractor
In this step, we will use the base model as a Feature Extractor. Additionally, we will add a classifier on top of it and train the top-level classifier part only.
Freezing (by setting layer.trainable = False) prevents the weights in a given layer from being updated during training. MobileNet V2 has many layers, so setting the entire model's `trainable` flag to False will freeze all of them.

In [None]:
base_model.trainable = False

### Important note about BatchNormalization layers

Many models contain `tf.keras.layers.BatchNormalization` layers. This layer is a special case and precautions should be taken in the context of fine-tuning, as shown later in this tutorial. 

When you set `layer.trainable = False`, the `BatchNormalization` layer will run in inference mode, and will not update its mean and variance statistics. 

When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing `training=False` when calling the base model. Otherwise, the updates applied to the non-trainable weights will destroy what the model has learned.
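
The difference between the two modes can be illustrated with a simplified NumPy sketch. `TinyBatchNorm` is a hypothetical class written for this note, not the Keras layer: it omits the learned scale and offset, and its `momentum` and `eps` values are assumed defaults.

```python
import numpy as np


class TinyBatchNorm:
    """Simplified batch normalization (no learned scale/offset), for illustration only."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Training mode: normalize with the statistics of the current batch
            # and fold them into the stored moving averages.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: use the stored statistics and leave them untouched.
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)


rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 4))

bn = TinyBatchNorm(num_features=4)
stats_before = bn.moving_mean.copy()
bn(batch, training=False)
unchanged_in_inference = np.array_equal(bn.moving_mean, stats_before)
bn(batch, training=True)
changed_in_training = not np.array_equal(bn.moving_mean, stats_before)
print(unchanged_in_inference, changed_in_training)
```

In training mode the stored statistics drift towards those of the new data, which is why the pre-trained base model is kept in inference mode.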

### Add a classification backend

To generate predictions from the block of features, average over the `5x5` spatial locations using a `tf.keras.layers.GlobalAveragePooling2D` layer, converting the features to a single 1280-element vector per image.

In [None]:
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
feature_batch_average = global_average_layer(feature_batch)
print(feature_batch_average.shape)

(64, 1280)
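
Global average pooling is simply a mean over the two spatial axes. A minimal NumPy sketch, using a random array as a stand-in for the real feature block:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a batch of 64 feature blocks of shape 5x5x1280
feature_block = rng.random((64, 5, 5, 1280))

# Average over height (axis 1) and width (axis 2), keeping batch and channels
pooled = feature_block.mean(axis=(1, 2))
print(pooled.shape)
```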


Apply a `tf.keras.layers.Dense` layer to convert these features into a single prediction per image. You don't need an activation function here because this prediction will be treated as a `logit`, or a raw prediction value.  Positive numbers predict class 1, negative numbers predict class 0.

In [None]:
prediction_layer = tf.keras.layers.Dense(1)
prediction_batch = prediction_layer(feature_batch_average)
print(prediction_batch.shape)

(64, 1)
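
As a sketch of what a single-unit dense layer computes, the NumPy snippet below forms one logit per example and maps its sign to a class. `dense_logit` and `logit_to_class` are hypothetical helpers, and the tiny 2-element feature vectors and weights are made-up values chosen for readability.

```python
import numpy as np


def dense_logit(features, weights, bias):
    """Linear layer with no activation: one raw score (logit) per example."""
    return features @ weights + bias


def logit_to_class(logits):
    """Positive logits map to class 1, all other logits to class 0."""
    return (logits > 0).astype(int)


features = np.array([[1.0, 2.0],
                     [-1.0, -2.0]])   # two examples, two features each
weights = np.array([[0.5],
                    [0.25]])          # shape (num_features, 1)
bias = np.array([0.0])

logits = dense_logit(features, weights, bias)
classes = logit_to_class(logits)
print(logits.ravel(), classes.ravel())
```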


Build a model by chaining together the base model, the global average pooling layer, a dropout layer and the prediction layer using the [Keras Functional API](https://www.tensorflow.org/guide/keras/functional). Data augmentation and rescaling are already handled in the input pipeline, so they are not part of the model. As previously mentioned, use `training=False` because the base model contains BatchNormalization layers.

In [None]:
inputs = tf.keras.Input(shape=(160, 160, 3))
x = base_model(inputs, training=False)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)

### Compile the model

Compile the model before training it. Because there are two classes, use a binary cross-entropy loss, with `from_logits=True` since the model provides a linear output.

In [None]:
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
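
Computing the loss directly from logits avoids evaluating `log(sigmoid(x))` explicitly, which can overflow or underflow for large magnitudes. The NumPy sketch below uses the standard numerically stable formulation, `max(x, 0) - x*z + log(1 + exp(-|x|))`, and compares it with the naive formula; `bce_from_logits` and `bce_naive` are hypothetical helpers, not the Keras implementation.

```python
import numpy as np


def bce_from_logits(labels, logits):
    """Numerically stable binary cross-entropy computed from raw logits."""
    z = np.asarray(labels, dtype=np.float64)
    x = np.asarray(logits, dtype=np.float64)
    per_example = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
    return per_example.mean()


def bce_naive(labels, logits):
    """Reference version: apply the sigmoid first, then the textbook formula."""
    z = np.asarray(labels, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return -(z * np.log(p) + (1 - z) * np.log(1 - p)).mean()


labels = [1.0, 0.0, 1.0, 0.0]
logits = [2.0, -1.0, 0.5, 3.0]
stable = bce_from_logits(labels, logits)
naive = bce_naive(labels, logits)
print(stable, naive)
```

The two versions agree for moderate logits, while the stable form also remains finite for very large ones.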

In [None]:
model.summary()

The 2.5M parameters in MobileNet are frozen, but there are 1.2K _trainable_ parameters in the Dense layer.  These are divided between two `tf.Variable` objects, the weights and biases.

In [None]:
len(model.trainable_variables)
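
The trainable parameter count can be checked by hand: a dense layer with 1280 inputs and one output has a kernel of 1280 weights plus a single bias. `dense_param_count` below is a hypothetical helper that only performs this arithmetic.

```python
def dense_param_count(in_features, units):
    """Return (kernel parameters, bias parameters) for a dense layer."""
    kernel = in_features * units
    bias = units
    return kernel, bias


kernel_params, bias_params = dense_param_count(1280, 1)
total_trainable = kernel_params + bias_params
print(kernel_params, bias_params, total_trainable)
```

The pooling and dropout layers have no weights, so the kernel and the bias are the only two trainable variables at this stage.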

### Train the model

After training for 10 epochs, you should see ~94% accuracy on the validation set.


In [None]:
initial_epochs = 10

loss0, accuracy0 = model.evaluate(val_dataset)

In [None]:
print("initial loss: {:.2f}".format(loss0))
print("initial accuracy: {:.2f}".format(accuracy0))

In [None]:
history = model.fit(train_dataset,
                    epochs=initial_epochs,
                    validation_data=val_dataset)

### Learning curves

Let's take a look at the learning curves of the training and validation accuracy/loss when using the MobileNet V2 base model as a fixed feature extractor.

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

Note: If you are wondering why the validation metrics are clearly better than the training metrics, the main reason is that layers like `tf.keras.layers.BatchNormalization` and `tf.keras.layers.Dropout` affect accuracy during training. They are turned off when calculating validation loss.

To a lesser extent, it is also because training metrics report the average for an epoch, while validation metrics are evaluated after the epoch, so validation metrics see a model that has trained slightly longer.

# Fine tuning
In the feature extraction experiment, you were only training a few layers on top of a MobileNet V2 base model. The weights of the pre-trained network were **not** updated during training.

One way to increase performance even further is to train (or "fine-tune") the weights of the top layers of the pre-trained model alongside the training of the classifier you added. The training process will force the weights to be tuned from generic feature maps to features associated specifically with the dataset.

Note: This should only be attempted after you have trained the top-level classifier with the pre-trained model set to non-trainable. If you add a randomly initialized classifier on top of a pre-trained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier) and your pre-trained model will forget what it has learned.

Also, you should try to fine-tune a small number of top layers rather than the whole MobileNet model. In most convolutional networks, the higher up a layer is, the more specialized it is. The first few layers learn very simple and generic features that generalize to almost all types of images. As you go higher up, the features are increasingly more specific to the dataset on which the model was trained. The goal of fine-tuning is to adapt these specialized features to work with the new dataset, rather than overwrite the generic learning.

### Un-freeze the top layers of the model


All you need to do is unfreeze the `base_model` and set the bottom layers to be un-trainable. Then, you should recompile the model (necessary for these changes to take effect), and resume training.

In [None]:
base_model.trainable = True

In [None]:
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
  layer.trainable =  False

Number of layers in the base model:  155
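
The slice `[:fine_tune_at]` covers layers 0 to 99, so layer 100 is the first one left trainable. The plain-Python sketch below reproduces only this indexing logic; `FakeLayer` is a hypothetical stand-in for a Keras layer, and the count of 155 layers is taken from the output above.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Stand-in for a Keras layer: only a name and a trainable flag."""
    name: str
    trainable: bool = True


layers = [FakeLayer(f"layer_{i}") for i in range(155)]
fine_tune_at = 100

# Freeze every layer before index `fine_tune_at`
for layer in layers[:fine_tune_at]:
    layer.trainable = False

num_frozen = sum(not layer.trainable for layer in layers)
num_trainable = sum(layer.trainable for layer in layers)
print(num_frozen, num_trainable)
```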


### Compile the model

As you are training a much larger model and want to readapt the pretrained weights, it is important to use a lower learning rate at this stage. Otherwise, your model could overfit very quickly.

In [None]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate/10),
              metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
len(model.trainable_variables)

### Continue training the model

If you trained to convergence earlier, this step will improve your accuracy by a few percentage points.

In [None]:
fine_tune_epochs = 10
total_epochs =  initial_epochs + fine_tune_epochs

history_fine = model.fit(train_dataset,
                         epochs=total_epochs,
                         initial_epoch=len(history.epoch),  # resume after the last completed epoch
                         validation_data=val_dataset)
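
In `model.fit`, `epochs` is the index of the final epoch rather than a count, so the number of epochs actually run is `epochs - initial_epoch`. The plain-Python sketch below works through that arithmetic; `epochs_run` is a hypothetical helper and `history_epoch` mimics the list of epoch indices that a `History` object holds after 10 epochs.

```python
def epochs_run(epochs, initial_epoch):
    """Number of epochs a call to fit() executes when resuming from initial_epoch."""
    return max(0, epochs - initial_epoch)


initial_epochs = 10
fine_tune_epochs = 10
total_epochs = initial_epochs + fine_tune_epochs

# Epoch indices recorded after 10 completed epochs: 0, 1, ..., 9
history_epoch = list(range(initial_epochs))

# Resuming from the last recorded index runs one extra epoch
resume_from_last_index = epochs_run(total_epochs, history_epoch[-1])
# Resuming from the number of completed epochs runs exactly fine_tune_epochs
resume_from_count = epochs_run(total_epochs, len(history_epoch))
print(resume_from_last_index, resume_from_count)
```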

Let's take a look at the learning curves of the training and validation accuracy/loss when fine-tuning the last few layers of the MobileNet V2 base model and training the classifier on top of it. The validation loss is much higher than the training loss, so you may get some overfitting.

You may also get some overfitting as the new training set is relatively small and similar to the dataset on which MobileNet V2 was originally trained.


After fine-tuning, the model nearly reaches 98% accuracy on the validation set.

In [None]:
acc += history_fine.history['accuracy']
val_acc += history_fine.history['val_accuracy']

loss += history_fine.history['loss']
val_loss += history_fine.history['val_loss']

In [None]:
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.8, 1])
plt.plot([initial_epochs-1,initial_epochs-1],
          plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 1.0])
plt.plot([initial_epochs-1,initial_epochs-1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

### Evaluation and prediction

Finally, you can verify the performance of the model on new data using the test set.

In [None]:
loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)

And now you are all set to use this model to predict if your pet is a cat or dog.

In [None]:
#Retrieve a batch of images from the test set
image_batch, label_batch = test_dataset.as_numpy_iterator().next()
predictions = model.predict_on_batch(image_batch).flatten()

# Apply a sigmoid since our model returns logits
predictions = tf.nn.sigmoid(predictions)
predictions = tf.where(predictions < 0.5, 0, 1)
print('Predictions:\n', predictions.numpy())


i=0
n_plots = 12 # number of plots
f, axarr = plt.subplots(1,n_plots,figsize=(20,10))

for image in image_batch[0:n_plots,:,:,:]:  # Loop over the first n_plots images of the batch
  axarr[i].imshow(image[:,:,:]+0.5)
  axarr[i].axis('off')
  label = ('dog' if predictions[i] == 1 else 'cat')
  axarr[i].title.set_text(label)
  i = i+1
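
Because the sigmoid is monotonic and maps 0 to 0.5, thresholding the probability at 0.5 gives the same classes as thresholding the logit at 0. A minimal NumPy sketch with made-up logits:

```python
import numpy as np


def sigmoid(x):
    """Map raw logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


logits = np.array([-3.2, -0.1, 0.0, 0.4, 5.0])  # illustrative values

via_sigmoid = np.where(sigmoid(logits) < 0.5, 0, 1)  # threshold the probability
via_sign = np.where(logits < 0, 0, 1)                # threshold the logit directly
print(via_sigmoid, via_sign)
```

The sigmoid is still useful when a probability is wanted, for example to report how confident a prediction is.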

## Summary

* **Using a pre-trained model for feature extraction**:  When working with a small dataset, it is a common practice to take advantage of features learned by a model trained on a larger dataset in the same domain. This is done by instantiating the pre-trained model and adding a fully-connected classifier on top. The pre-trained model is "frozen" and only the weights of the classifier get updated during training.
In this case, the convolutional base extracted all the features associated with each image and you just trained a classifier that determines the image class given that set of extracted features.

* **Fine-tuning a pre-trained model**: To further improve performance, one might want to repurpose the top-level layers of the pre-trained models to the new dataset via fine-tuning.
In this case, you tuned your weights such that your model learned high-level features specific to the dataset. This technique is usually recommended when the training dataset is large and very similar to the original dataset that the pre-trained model was trained on.

To learn more, visit the [Transfer learning guide](https://www.tensorflow.org/guide/keras/transfer_learning).
