# Transfer Learning

The idea of transfer learning is to take a high-quality network trained on a similar problem and adapt it to the problem at hand. What counts as a high-quality network or model? It depends, but for image recognition a good source is the ImageNet competition, whose winning models are published. For other domains, many models are hosted in places like the Keras or TensorFlow GitHub repos. Keras itself wraps a few models in easy-to-use, well-documented classes; have a look at [Keras Applications](https://keras.io/applications/). Many other models can be found in the [TensorFlow models repo](https://github.com/tensorflow/models).

In [24]:
import os
import sys
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator

import numpy as np
import math

# fixed random seed to have consistent results
np.random.seed(123)

train_dir = "data/train"
val_dir = "data/test"
epochs = 5
batch_size = 30
nb_train_samples = 3000
nb_validation_samples = 300
img_width, img_height = 299, 299 # fixed size for InceptionV3

The idea here is simple. Load a high-quality pre-trained network and remove its last layer, the one that makes the final prediction. Replace it with a new final layer that predicts our two classes, and train only the weights of the added layers. As long as the network was trained on a similar problem, such as recognising animals or birds, the layers before the prediction layer have already learned features and representations that will probably apply well to dogs and cats.
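
To make the idea concrete without downloading any weights, here is a toy NumPy sketch. It is not the Keras workflow used in this notebook: a fixed random projection stands in for the pre-trained layers (an assumption made purely for illustration), and only a small logistic-regression head is trained on top of it. All names (`frozen_W`, `head_w`, `extract_features`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 4-D; the label is 1 when the first coordinate is positive.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

# Stand-in for the pre-trained layers: a fixed projection + ReLU that is never updated.
frozen_W = 0.5 * rng.normal(size=(4, 8))
frozen_W_before = frozen_W.copy()

def extract_features(inputs):
    return np.maximum(inputs @ frozen_W, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t):
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

# New head: the only trainable parameters.
head_w = np.zeros(8)
head_b = 0.0

features = extract_features(X)  # computed once, since the base never changes
loss_before = bce(sigmoid(features @ head_w + head_b), y)

learning_rate = 0.05
for _ in range(300):
    p = sigmoid(features @ head_w + head_b)
    grad = p - y  # gradient of the loss w.r.t. the logit for sigmoid + cross-entropy
    head_w -= learning_rate * features.T @ grad / len(y)
    head_b -= learning_rate * grad.mean()

loss_after = bce(sigmoid(features @ head_w + head_b), y)
print(loss_before, loss_after)
```

The loss drops even though `frozen_W` is untouched, which is the same division of labour the notebook relies on: reused features, newly trained head.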

In [25]:
# Data prep: augment the training images with random rotations, shifts, shears, zooms and flips
train_datagen = ImageDataGenerator(
  preprocessing_function=preprocess_input,
  rotation_range=30,
  width_shift_range=0.2,
  height_shift_range=0.2,
  shear_range=0.2,
  zoom_range=0.2,
  horizontal_flip=True
)

# Validation images are only preprocessed, not augmented, so the metrics reflect unmodified inputs
test_datagen = ImageDataGenerator(
  preprocessing_function=preprocess_input
)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='binary'
)

validation_generator = test_datagen.flow_from_directory(
    val_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='binary'
)

Found 3000 images belonging to 2 classes.
Found 300 images belonging to 2 classes.
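
Both generators pass every image through `preprocess_input`, which for InceptionV3 rescales raw pixel values from [0, 255] to [-1, 1]. A minimal NumPy sketch of that arithmetic follows; the helper name `scale_to_unit_range` is hypothetical and not part of Keras.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1], as Inception-style preprocessing does."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0

sample = np.array([0, 127.5, 255])
print(scale_to_unit_range(sample))  # [-1.  0.  1.]
```

Feeding the network inputs on the same scale it saw during its original training is what lets the pre-trained weights behave as intended.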


In the next step, we load the Inception V3 network with its pre-trained weights, leave out its last layer, and add our own `Dense` layers on top.

In [26]:
# setup model
base_model = InceptionV3(weights='imagenet', include_top=False) # include_top=False excludes final fully connected layer

x = base_model.output
x = GlobalAveragePooling2D()(x) # averages each feature map to a single value, which keeps the dense layers small and helps limit overfitting
x = Dense(1024, activation='relu')(x) # new fully connected layer
prediction = Dense(1, activation='sigmoid')(x) # new sigmoid output: one probability for the binary cat/dog label
model = Model(inputs=base_model.input, outputs=prediction)
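
For a 299×299 input, the convolutional part of InceptionV3 ends in an 8×8 grid of 2048 feature maps. `GlobalAveragePooling2D` collapses each 8×8 map to its mean, leaving 2048 values per image. A small NumPy sketch of that operation, with random numbers standing in for real features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the base model output: (batch, height, width, channels)
feature_maps = rng.normal(size=(2, 8, 8, 2048))

# Global average pooling: mean over the two spatial axes
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)  # (2, 2048)
```

The pooled vector is what the new `Dense(1024)` layer receives, so the head never has to deal with the spatial layout.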

To get a sense of the depth and complexity of the network, print its summary.
You will likely appreciate the idea of transfer learning by the end of this exercise.

In [27]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            (None, None, None, 3 0                                            
__________________________________________________________________________________________________
conv2d_283 (Conv2D)             (None, None, None, 3 864         input_4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_283 (BatchN (None, None, None, 3 96          conv2d_283[0][0]                 
__________________________________________________________________________________________________
activation_283 (Activation)     (None, None, None, 3 0           batch_normalization_283[0][0]    
__________________________________________________________________________________________________
conv2d_284

In [28]:
# Mark every layer of the base model as non-trainable, so only the layers we added are trained.
# This keeps the pre-trained weights intact and speeds up training.
# Freeze the layers first, then compile the model.
for layer in base_model.layers:
    layer.trainable = False
    
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
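
With the base frozen, only the two new `Dense` layers are trained. Assuming the pooled output has 2048 channels (InceptionV3's final feature depth), their sizes can be worked out by hand. `dense_params` is a hypothetical helper written for this note.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(2048, 1024)  # Dense(1024, activation='relu')
output = dense_params(1024, 1)     # Dense(1, activation='sigmoid')
trainable_total = hidden + output
print(hidden, output, trainable_total)  # 2098176 1025 2099201
```

The total should match the "Trainable params" figure at the bottom of `model.summary()` when it is called after freezing.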

In [29]:
# Train the new layers. Note: newer Keras releases deprecate fit_generator; model.fit() accepts generators there.
model.fit_generator(
    generator = train_generator,
    epochs = epochs,
    steps_per_epoch = math.ceil(nb_train_samples / batch_size),    
    validation_data = validation_generator,
    validation_steps = math.ceil(nb_validation_samples/batch_size))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fdc72e965f8>
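
The step counts passed to `fit_generator` above come from dividing the number of samples by the batch size and rounding up, so that a final partial batch is still used. Using the values defined at the top of this notebook:

```python
import math

nb_train_samples = 3000
nb_validation_samples = 300
batch_size = 30

steps_per_epoch = math.ceil(nb_train_samples / batch_size)
validation_steps = math.ceil(nb_validation_samples / batch_size)
print(steps_per_epoch, validation_steps)  # 100 10

# Rounding up matters when the division is not exact:
print(math.ceil(3001 / batch_size))  # 101
```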

**Wow, ~98% accuracy! I think this will do the job for our demo.**

Next, we save the current model and evaluate its accuracy on the test dataset.

In [30]:
model.save("dogs-vs-cats-transfer-learning.h5")

In [31]:
test_datagen1 = ImageDataGenerator(
  preprocessing_function=preprocess_input 
)

test_generator1 = test_datagen1.flow_from_directory(val_dir, 
                                                    target_size=(img_width, img_height), 
                                                    batch_size=batch_size,  class_mode='binary')

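# 300 images at 30 per batch need 10 steps for exactly one pass over the data
test_loss, test_acc = model.evaluate_generator(test_generator1, steps=math.ceil(nb_validation_samples / batch_size))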
print('test acc:', test_acc)
print('test loss:', test_loss)

Found 300 images belonging to 2 classes.
test acc: 0.9933333289623261
test loss: 0.04084457136457786
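
The reported accuracy is consistent with 298 of the 300 test images being classified correctly. As a rough sketch of how the two metrics are derived from the sigmoid outputs, assuming a 0.5 decision threshold, the following plain-Python functions compute them for a handful of made-up predictions. The function names and sample values are illustrative only and are not taken from the run above.

```python
import math

def binary_accuracy(probs, labels, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum((p > threshold) == bool(t) for p, t in zip(probs, labels))
    return hits / len(labels)

def binary_crossentropy(probs, labels, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for p, t in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(labels)

probs = [0.9, 0.2, 0.6, 0.4]   # made-up sigmoid outputs
labels = [1, 0, 0, 1]          # made-up ground truth

print(binary_accuracy(probs, labels))      # 0.5
print(binary_crossentropy(probs, labels))
print(round(298 / 300, 4))                 # 0.9933
```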
