# Activity 2.2 - Transfer Learning

## Objective(s):

This activity introduces how to apply transfer learning to a neural network.

## Intended Learning Outcomes (ILOs):
* Demonstrate how to build and train a neural network
* Demonstrate how to apply transfer learning in a neural network


## Resources:
* Jupyter Notebook
* MNIST dataset (loaded through `keras.datasets.mnist`)

## Procedures

Load the necessary libraries

In [1]:
from __future__ import print_function

import datetime
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

Set the parameters

In [2]:
now = datetime.datetime.now
batch_size = 128
num_classes = 5
epochs = 5
img_rows, img_cols = 28, 28
filters = 32
pool_size = 2
kernel_size = 3

Set how the input data is loaded

In [3]:
if K.image_data_format() == 'channels_first':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

* Write a function to include all the training steps.
* Use the model, training set, test set and number of classes as function parameters


In [4]:
def train_model(model, train, test, num_classes):
    x_train = train[0].reshape((train[0].shape[0],) + input_shape)
    x_test = test[0].reshape((test[0].shape[0],) + input_shape)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(train[1], num_classes)
    y_test = keras.utils.to_categorical(test[1], num_classes)

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    t = now()
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    print('Training time: %s' % (now() - t))

    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
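
The preprocessing steps inside `train_model` can be illustrated on a tiny array. This is a minimal sketch, assuming NumPy only; `one_hot` is a hypothetical helper written here to mimic what `keras.utils.to_categorical` produces for integer labels, and the 2x2 "image" is made-up data.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical stand-in for keras.utils.to_categorical on integer labels."""
    labels = np.asarray(labels, dtype="int64")
    out = np.zeros((labels.shape[0], num_classes), dtype="float32")
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

# One made-up 2x2 grayscale image with uint8 pixel values.
images = np.array([[[0, 255], [128, 64]]], dtype="uint8")

# Add the channel axis (channels_last) and scale pixels to [0, 1].
toy_input_shape = (2, 2, 1)
x = images.reshape((images.shape[0],) + toy_input_shape).astype("float32") / 255

# Integer class labels become rows with a single 1 in the label's column.
y = one_hot([0, 4, 2], num_classes=5)

print(x.shape)
print(y)
```

Scaling keeps the inputs in a small numeric range, and the one-hot rows match the five-way softmax output that `categorical_crossentropy` expects.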

Load the data, which is already split between train and test sets

In [5]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()


Create two datasets
* one with digits below 5
* one with 5 and above

In [6]:
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5
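
The split relies on NumPy boolean masks. A minimal sketch with made-up labels, assuming NumPy only (the `toy_*` names are illustrative and not part of the notebook), shows how the mask selects samples and why 5 is subtracted from the second group's labels:

```python
import numpy as np

# Made-up labels; the "images" are just sample indices for illustration.
toy_labels = np.array([0, 7, 4, 5, 9, 2])
toy_images = np.arange(len(toy_labels))

# A boolean mask keeps only the positions where the condition holds.
toy_x_lt5 = toy_images[toy_labels < 5]
toy_y_lt5 = toy_labels[toy_labels < 5]

toy_x_gte5 = toy_images[toy_labels >= 5]
# Shift labels 5..9 down to 0..4 so both tasks use the same 5 output classes.
toy_y_gte5 = toy_labels[toy_labels >= 5] - 5

print(toy_x_lt5, toy_y_lt5)
print(toy_x_gte5, toy_y_gte5)
```

Because both label sets end up in the range 0 to 4, the same 5-unit softmax layer can serve either task.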

* Define the feature layers that will be used for transfer learning
* Freeze these layers during the fine-tuning process

In [7]:
feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),
]

Define the classification layers

In [8]:
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax')
]

Create a model by combining the feature layers and classification layers

In [9]:
model = Sequential(feature_layers + classification_layers)

Check the model summary

In [10]:
model.summary()
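
The parameter counts reported by the summary can be cross-checked by hand. The sketch below assumes the architecture defined above (two 3x3 `valid` convolutions with 32 filters, 2x2 max pooling, then `Dense(128)` and `Dense(5)`) and the standard weights-plus-biases formulas; the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, out_channels):
    # One kernel x kernel x in_channels filter per output channel, plus a bias each.
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit.
    return in_units * out_units + out_units

side = 28                            # MNIST images are 28x28
conv1 = conv2d_params(3, 1, 32)
side -= 2                            # 'valid' 3x3 convolution: 28 -> 26
conv2 = conv2d_params(3, 32, 32)
side -= 2                            # 26 -> 24
side //= 2                           # 2x2 max pooling: 24 -> 12
flat_units = side * side * 32        # Flatten output size

dense1 = dense_params(flat_units, 128)
dense2 = dense_params(128, 5)

feature_params = conv1 + conv2           # parameters in feature_layers
classifier_params = dense1 + dense2      # parameters in classification_layers
total_params = feature_params + classifier_params

print(conv1, conv2, dense1, dense2)
print(feature_params, classifier_params, total_params)
```

Activation, pooling, dropout, and flatten layers contribute no parameters, so almost all of the model's weights sit in the first dense layer rather than in the feature layers.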

Train the model on the digits 5, 6, 7, 8, 9

In [11]:
train_model(model,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 35s 127ms/step - accuracy: 0.1909 - loss: 1.6126 - val_accuracy: 0.2096 - val_loss: 1.5873
Epoch 2/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 31s 137ms/step - accuracy: 0.2457 - loss: 1.5909 - val_accuracy: 0.3746 - val_loss: 1.5634
Epoch 3/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 29s 128ms/step - accuracy: 0.3114 - loss: 1.5670 - val_accuracy: 0.5550 - val_loss: 1.5382
Epoch 4/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 28s 123ms/step - accuracy: 0.3852 - loss: 1.5449 - val_accuracy: 0.6715 - val_loss: 1.5105
Epoch 5/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 40s 119ms/step - accuracy: 0.4519 - loss: 1.5174 - val_accuracy: 0.7365 - val_loss: 1.4794
Training time: 0:02:45.653644
Test score: 1.4794206619262695
Test accuracy: 0.7364739775657654
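
The `230/230` in the progress output is the number of batches per epoch. A small sketch, assuming each epoch covers every training sample once and that the last batch may be smaller than `batch_size`, reproduces that figure from the sample counts printed by `train_model`:

```python
import math

batch_size = 128

def steps_per_epoch(num_samples, batch_size):
    # The final partial batch still counts as one step, hence the ceiling.
    return math.ceil(num_samples / batch_size)

steps_gte5 = steps_per_epoch(29404, batch_size)   # digits 5-9 training set
steps_lt5 = steps_per_epoch(30596, batch_size)    # digits 0-4 training set

print(steps_gte5, steps_lt5)
```

The second value corresponds to the step count to expect when the same function is run on the 30596 training samples for digits 0 to 4.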


Freeze only the feature layers

In [12]:
for l in feature_layers:
    l.trainable = False
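
To make the effect of freezing concrete, here is a toy NumPy sketch. It is not how Keras implements `trainable`; it assumes a made-up two-matrix linear model with random data and a hypothetical `trainable` dictionary, and simply skips the gradient update for the frozen part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: a "feature" matrix followed by a "head" matrix (both made up).
W_feat = rng.normal(size=(4, 3))
W_head = rng.normal(size=(3, 2))
trainable = {"feat": False, "head": True}   # the feature part is frozen

x = rng.normal(size=(8, 4))   # random inputs
y = rng.normal(size=(8, 2))   # random targets

W_feat_before = W_feat.copy()
W_head_before = W_head.copy()

lr = 0.01
for _ in range(10):
    h = x @ W_feat                  # frozen feature extraction
    pred = h @ W_head               # trainable head
    err = pred - y                  # gradient of 0.5 * squared error w.r.t. pred
    grad_head = h.T @ err / len(x)
    grad_feat = x.T @ (err @ W_head.T) / len(x)
    # Only parameters marked trainable receive an update.
    if trainable["head"]:
        W_head -= lr * grad_head
    if trainable["feat"]:
        W_feat -= lr * grad_feat

feat_unchanged = np.array_equal(W_feat, W_feat_before)
head_changed = not np.array_equal(W_head, W_head_before)
print(feat_unchanged, head_changed)
```

In the notebook, the frozen weights are the convolution kernels in `feature_layers`, while the dense layers in `classification_layers` keep learning. The change takes effect once the model is compiled again, which `train_model` does on every call.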

Check the summary again and compare the trainable and non-trainable parameter counts with the previous summary

In [13]:
model.summary()

Train the model again on the digits 0 to 4

In [14]:
train_model(model,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 17s 56ms/step - accuracy: 0.2909 - loss: 1.5827 - val_accuracy: 0.4627 - val_loss: 1.5473
Epoch 2/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 12s 51ms/step - accuracy: 0.3608 - loss: 1.5446 - val_accuracy: 0.5758 - val_loss: 1.5052
Epoch 3/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 13s 52ms/step - accuracy: 0.4468 - loss: 1.5046 - val_accuracy: 0.6377 - val_loss: 1.4632
Epoch 4/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 12s 50ms/step - accuracy: 0.5004 - loss: 1.4671 - val_accuracy: 0.6972 - val_loss: 1.4219
Epoch 5/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 12s 51ms/step - accuracy: 0.5543 - loss: 1.4301 - val_accuracy: 0.7523 - val_loss: 1.3811
Training time: 0:01:05.902568
Test score: 1.3812743425369263
Test accuracy: 0.752286434173584


## Supplementary Activity
Now write code to reverse this training process. That is, train on the digits 0-4, and then fine-tune only the last layers on the digits 5-9.

In [15]:
### Train on the digits 0-4

train_model(model,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 14s 52ms/step - accuracy: 0.5984 - loss: 1.3940 - val_accuracy: 0.7933 - val_loss: 1.3420
Epoch 2/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 12s 48ms/step - accuracy: 0.6511 - loss: 1.3557 - val_accuracy: 0.8319 - val_loss: 1.3031
Epoch 3/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 21s 51ms/step - accuracy: 0.6871 - loss: 1.3223 - val_accuracy: 0.8583 - val_loss: 1.2651
Epoch 4/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 12s 49ms/step - accuracy: 0.7159 - loss: 1.2858 - val_accuracy: 0.8770 - val_loss: 1.2278
Epoch 5/5
240/240 ━━━━━━━━━━━━━━━━━━━━ 12s 50ms/step - accuracy: 0.7465 - loss: 1.2489 - val_accuracy: 0.8885 - val_loss: 1.1908
Training time: 0:01:11.013238
Test score: 1.190819263458252
Test accuracy: 0.8884997367858887


In [16]:
# Freeze the feature layers by setting the `trainable` attribute on each layer
for layer in feature_layers:
    layer.trainable = False

In [17]:
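# compile
# Note: train_model() calls model.compile() again with 'adadelta',
# so the optimizer chosen here is replaced before training starts.

model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])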

In [22]:
train_model(model,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5),
            num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 16s 58ms/step - accuracy: 0.4272 - loss: 1.4919 - val_accuracy: 0.5569 - val_loss: 1.4458
Epoch 2/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 12s 54ms/step - accuracy: 0.4688 - loss: 1.4547 - val_accuracy: 0.5933 - val_loss: 1.4130
Epoch 3/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 21s 56ms/step - accuracy: 0.5027 - loss: 1.4276 - val_accuracy: 0.6392 - val_loss: 1.3815
Epoch 4/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 21s 55ms/step - accuracy: 0.5356 - loss: 1.3995 - val_accuracy: 0.6741 - val_loss: 1.3510
Epoch 5/5
230/230 ━━━━━━━━━━━━━━━━━━━━ 14s 60ms/step - accuracy: 0.5751 - loss: 1.3707 - val_accuracy: 0.7060 - val_loss: 1.3211
Training time: 0:01:24.542641
Test score: 1.3211255073547363
Test accuracy: 0.7060275673866272


## Conclusion

In summary, this activity demonstrates transfer learning, a technique in which a model trained on one task is reused or adapted for a second, related task. We use the MNIST dataset. In the procedure, we first train the model on the digits 5 to 9, which gives a test accuracy of 73.65%. We then freeze the feature layers so that the feature-extraction weights learned on the first task are reused without being modified. With the feature layers frozen, we train the model on the digits 0 to 4, resulting in a test accuracy of 75.23%.

In the supplementary activity, we reverse the training process: we train on the digits 0-4 and then fine-tune on the digits 5-9. Training on the digits 0-4 gives a test accuracy of 88.85%. We then freeze the feature layers and fine-tune on the digits 5-9, which gives a test accuracy of 70.60%, noticeably lower. Accuracy on the digits 0-4 is relatively high because the model was trained specifically on those digits; note also that the same `model` object from the procedure was reused, so this run continued from weights that had already been trained on 0-4 rather than starting fresh. Accuracy on the digits 5-9 is lower because, with the feature layers frozen, only the classification layers are updated, so the model must recognize those digits using the features learned in the previous training.