# Transfer Learning with VGG, Inception and ResNet

In this lab, you will continue exploring transfer learning. You've already explored feature extraction with AlexNet and TensorFlow. Next, you will use Keras to explore feature extraction with the VGG, Inception and ResNet architectures. The models you will use were trained for days or weeks on the [ImageNet dataset](http://www.image-net.org/). Thus, the weights encapsulate higher-level features learned from a thousand image classes.

We'll use two datasets in this lab:

1. [German Traffic Sign Dataset](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset)
2. [Cifar10](https://www.cs.toronto.edu/~kriz/cifar.html)

How will the pretrained model perform on the new datasets?

In [1]:
from keras.applications.resnet50 import ResNet50
from keras.applications.inception_v3 import InceptionV3
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten, Input
from keras.models import Model
from keras.datasets import cifar10
from skimage.transform import resize
import numpy as np
import pickle

Using TensorFlow backend.


In [16]:
# Load data functions
# NOTE: set these to the paths of your pickled traffic sign data before calling load_traffic()
traffic_training_file = ''
traffic_testing_file = ''

def load_traffic():
    with open(traffic_training_file, mode='rb') as f:
        train = pickle.load(f)
    with open(traffic_testing_file, mode='rb') as f:
        test = pickle.load(f)
    return train['features'], train['labels'], test['features'], test['labels']


# NOTE: it will take a while on first use since Keras will download the dataset
def load_cifar10():
    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    return X_train, y_train, X_test, y_test

# Resizing everything at once would take up too much memory, so we resize per batch
# VGG16 and ResNet50 default to a (224, 224, 3) input; InceptionV3 defaults to
# (299, 299, 3) but accepts other sizes when include_top=False
def gen_and_resize_data(data, labels, batch_size, size=(224, 224)):
    def _f():
        start = 0
        end = start + batch_size
        n = data.shape[0]
        while True:
            X_batch_old, y_batch = data[start:end], labels[start:end]
            X_batch = []
            for i in range(X_batch_old.shape[0]):
                img = resize(X_batch_old[i, ...], size)
                X_batch.append(img)

            X_batch = np.array(X_batch)
            start += batch_size
            end += batch_size
            if start >= n:
                start = 0
                end = batch_size

            yield (X_batch, y_batch)
    return _f
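To see how the wrap-around batching above behaves, here is a minimal sketch on toy data. It is not the lab's generator: `batch_generator` and `upscale_nearest` are hypothetical names, and `upscale_nearest` is a crude stand-in for `skimage.transform.resize` so that the example depends only on NumPy.

```python
import numpy as np

def upscale_nearest(img, factor):
    """Hypothetical stand-in for skimage's resize: repeat each pixel `factor` times."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def batch_generator(data, labels, batch_size, factor=7):
    """Yield (X_batch, y_batch) forever, wrapping around at the end of the data."""
    n = data.shape[0]
    start = 0
    while True:
        end = start + batch_size
        # Only the current batch is upscaled, so memory use stays small.
        X_batch = np.array([upscale_nearest(img, factor) for img in data[start:end]])
        yield X_batch, labels[start:end]
        # Restart from the beginning once the data is exhausted.
        start = end if end < n else 0

# Toy data: 5 random "images" of shape (32, 32, 3); the values are made up.
data = np.random.rand(5, 32, 32, 3).astype("float32")
labels = np.arange(5)

gen = batch_generator(data, labels, batch_size=2)
batch_shapes = [next(gen)[0].shape for _ in range(4)]
print(batch_shapes)
```

With 5 samples and a batch size of 2, the third batch holds a single image and the fourth starts over from the first sample. The last batch of each pass can therefore be smaller than `batch_size`.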

## Feature Extraction

Before you try feature extraction on pretrained models, it's a good idea to take a moment and run the classifier you used in the Traffic Sign project on the Cifar10 dataset. Cifar10 images are also (32, 32, 3), so the only thing you'll need to change is the number of classes: 10 instead of 43.

Cool, now you have something to compare the Cifar10 feature extraction results with!

Keep in mind the following as you experiment:

_Does feature extraction outperform the Traffic Signs classifier on the Cifar10 dataset? Why?_

_Does feature extraction outperform the Traffic Signs classifier on the Traffic Signs dataset? Why?_



In [None]:
# load and preprocess data
X_train, y_train, X_test, y_test = load_cifar10()
# X_train, y_train, X_test, y_test = load_traffic()

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# 0-255 -> 0-1
X_train /= 255
X_test /= 255
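The cast to `float32` before the in-place division matters. The small sketch below uses a made-up three-pixel array and assumes the images arrive as `uint8`, as the Cifar10 loader returns them. It shows that dividing an integer array in place fails, while the same operation works after the cast.

```python
import numpy as np

# Toy stand-in for image data: uint8 pixels in [0, 255] (assumed dtype).
X = np.array([[0, 128, 255]], dtype=np.uint8)

# In-place true division on an integer array fails, because the float
# result cannot be written back into uint8 storage.
try:
    X /= 255
    inplace_on_uint8_failed = False
except TypeError:
    inplace_on_uint8_failed = True

# Converting to float32 first makes the in-place scaling valid.
X_scaled = X.astype("float32")
X_scaled /= 255

print(inplace_on_uint8_failed)
print(X_scaled.min(), X_scaled.max())
```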

In [15]:
# constants
nb_epochs = 3
batch_size = 32
nb_classes = 10 # NOTE: change this to 43 if using traffic sign data

# define model
input_tensor = Input(shape=(224, 224, 3))

# NOTE: It will take a while on the first use since Keras will download the weights for the model
# pretrained_model = VGG16(input_tensor=input_tensor, include_top=False, weights='imagenet')
# pretrained_model = InceptionV3(input_tensor=input_tensor, include_top=False, weights='imagenet')
pretrained_model = ResNet50(input_tensor=input_tensor, include_top=False, weights='imagenet')

# NOTE: feel free to change this
x = pretrained_model.output
x = Flatten()(x)
x = Dense(nb_classes, activation='softmax')(x)
model = Model(pretrained_model.input, x)

# freeze pretrained model layers
for layer in pretrained_model.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
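The idea behind freezing the pretrained layers can be illustrated without Keras. The following is a toy sketch, not the lab's model: a fixed random projection plays the role of the frozen base, the data is synthetic, and all names (`W_base`, `W_head`, `base_features`) are hypothetical. Only the softmax head is updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" base: a fixed projection followed by ReLU.
W_base = rng.normal(size=(8, 16)) / np.sqrt(8)
W_base_before = W_base.copy()

def base_features(X):
    return np.maximum(X @ W_base, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic data: 60 samples, 8 inputs, 3 classes.
X = rng.normal(size=(60, 8))
y = rng.integers(0, 3, size=60)
Y = np.eye(3)[y]

# The base is frozen, so its features can be computed once and reused.
F = base_features(X)
W_head = np.zeros((16, 3))

def loss(W):
    P = softmax(F @ W)
    return -np.mean(np.log(P[np.arange(len(y)), y]))

loss_before = loss(W_head)
for _ in range(300):
    P = softmax(F @ W_head)
    grad = F.T @ (P - Y) / len(y)  # gradient w.r.t. the head only
    W_head -= 0.1 * grad           # W_base is never touched
loss_after = loss(W_head)

print(loss_before, loss_after)
print(np.array_equal(W_base, W_base_before))
```

Because the base never changes, the training cost is dominated by the small head, which is the same reason feature extraction is cheap with a large pretrained network.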

In [None]:
# train the model
train_gen = gen_and_resize_data(X_train, y_train, batch_size)
test_gen = gen_and_resize_data(X_test, y_test, batch_size)
model.fit_generator(
    train_gen(),
    X_train.shape[0],
    nb_epochs,
    nb_val_samples=X_test.shape[0],
    validation_data=test_gen())
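The call above uses the Keras 1 convention of counting samples per epoch. Keras 2 counts batches instead, through the `steps_per_epoch` and `validation_steps` arguments. A minimal sketch of the conversion follows, assuming the Cifar10 split sizes and the batch size of 32 used in this lab; `steps_for` is a hypothetical helper name.

```python
import math

def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

# Cifar10 has 50,000 training and 10,000 test images.
train_steps = steps_for(50000, 32)
val_steps = steps_for(10000, 32)
print(train_steps, val_steps)

# Sketch of the equivalent Keras 2 call (not executed here):
# model.fit_generator(train_gen(), steps_per_epoch=train_steps, epochs=nb_epochs,
#                     validation_data=test_gen(), validation_steps=val_steps)
```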

In [18]:
for i, l in enumerate(pretrained_model.layers):
    print(i, l)

0 <keras.engine.topology.InputLayer object at 0x14f487208>
1 <keras.layers.convolutional.ZeroPadding2D object at 0x14f487128>
2 <keras.layers.convolutional.Convolution2D object at 0x14f487630>
3 <keras.layers.normalization.BatchNormalization object at 0x14f488f28>
4 <keras.layers.core.Activation object at 0x154547d30>
5 <keras.layers.pooling.MaxPooling2D object at 0x154756828>
6 <keras.layers.convolutional.Convolution2D object at 0x14f756fd0>
7 <keras.layers.normalization.BatchNormalization object at 0x14f774dd8>
8 <keras.layers.core.Activation object at 0x14f949630>
9 <keras.layers.convolutional.Convolution2D object at 0x14ad00f60>
10 <keras.layers.normalization.BatchNormalization object at 0x14ad29e80>
11 <keras.layers.core.Activation object at 0x14ad46668>
12 <keras.layers.convolutional.Convolution2D object at 0x14ad4bda0>
13 <keras.layers.convolutional.Convolution2D object at 0x14af37978>
14 <keras.layers.normalization.BatchNormalization object at 0x14af0ff28>
15 <keras.layers.norm

## Summary

By now you should have a good feel for feature extraction and when it might be a good choice. To end this lab, let's summarize when we should consider:

1. Feature extraction (train only the top-level of the network, the rest of the network remains fixed)
2. Finetuning (train the entire network end-to-end, start with pretrained weights)
3. Training from scratch (train the entire network end-to-end, start from random weights)

**Consider feature extraction when ...**

The dataset is small and similar to the original dataset. The higher-level features learned from the original dataset should be relevant to the new dataset.

**Consider finetuning when ...**

The dataset is large and similar to the original dataset. In this case we can be much more confident that we won't overfit, so it should be safe to alter the original weights.

The dataset is small and very different from the original dataset. You could also make the case for training from scratch. If we choose to finetune, it might be a good idea to use only features found earlier in the network, since features found later might be too dataset-specific.

**Consider training from scratch when ...**

The dataset is large and very different from the original dataset. In this case we have enough data to confidently train from scratch. However, even in this case it might be more beneficial to finetune the entire network from pretrained weights.
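The guidance above can be condensed into a small lookup. This is only a sketch of the rules of thumb in this summary, with a hypothetical function name and coarse labels ("small"/"large", "similar"/"different") standing in for what is really a judgment call.

```python
def suggest_strategy(dataset_size, similarity):
    """Map the summary's rules of thumb to a suggested training strategy."""
    table = {
        ("small", "similar"): "feature extraction",
        ("large", "similar"): "finetuning",
        ("small", "different"): "finetuning from earlier layers, or training from scratch",
        ("large", "different"): "training from scratch, or finetuning",
    }
    return table[(dataset_size, similarity)]

print(suggest_strategy("small", "similar"))
print(suggest_strategy("large", "different"))
```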

---

Most importantly, keep in mind that for many problems you won't need an architecture as complicated and powerful as VGG, Inception, or ResNet. These architectures were built to classify a thousand complex classes. A much smaller network might be a much better fit for your problem, especially if you can comfortably train it on moderate hardware.