This workshop aims to show you the impact of transfer learning applied to images!
Let's classify some images: cat or dog?

# Imports

In [None]:
import numpy as np
from matplotlib import pyplot as plt

import pandas as pd
from tqdm import tqdm
import PIL
from PIL import Image
import cv2
import tensorflow as tf

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, BatchNormalization, Dropout
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import SGD, RMSprop, Adam
from tensorflow.keras.metrics import categorical_crossentropy, categorical_accuracy
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from imageio import imread
from skimage.transform import resize
import os
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

# Dataset

Let's start by downloading our example data, a .zip of 2,000 JPG pictures of cats and dogs, and extracting it locally in /tmp.

NOTE: The 2,000 images used in this exercise are excerpted from the "Dogs vs. Cats" dataset available on Kaggle, which contains 25,000 images. Here, we use a subset of the full dataset to decrease training time for educational purposes.

In [None]:
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip \
    -O /tmp/cats_and_dogs_filtered.zip

In [None]:
# Unzip the downloaded folder
import os
import zipfile

local_zip = "/tmp/cats_and_dogs_filtered.zip"
# The context manager closes the archive even if extraction fails
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('/tmp')

In [None]:
IMG_SIZE = 224
input_shape = (IMG_SIZE, IMG_SIZE)  # Dimensions of the images

DATA_DIR = "/tmp/cats_and_dogs_filtered/"

train_data_dir = DATA_DIR + "train"
validation_data_dir = DATA_DIR + "validation"

The contents of the .zip are extracted to the base directory /tmp/cats_and_dogs_filtered, which contains train and validation subdirectories for the training and validation datasets.
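As a quick sanity check on the extracted layout, the sketch below counts the files in each class subdirectory. It assumes one subfolder per class (here `cats` and `dogs`, the names used later in this notebook); `count_images_per_class` is a hypothetical helper written for illustration, not part of any library.

```python
import os

def count_images_per_class(split_dir):
    """Return {class_name: number_of_files} for each subdirectory of split_dir."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = len(os.listdir(class_dir))
    return counts

# Usage, assuming the archive was extracted to /tmp as described above:
# for split in ("train", "validation"):
#     split_dir = os.path.join("/tmp/cats_and_dogs_filtered", split)
#     print(split, count_images_per_class(split_dir))
```

If the two classes come out with very different counts, plain accuracy becomes a misleading metric, so it is worth checking before training.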

In [None]:
#@title display utilities [RUN ME]
def plot_examples(path: str):
    """ Plot 10 images found in input path"""
    print("Images belonging to class:", path)
    listdir = os.listdir(path)
    fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
    ax = ax.ravel()
    for idx, e in enumerate(listdir[:10]):
        img = imread(os.path.join(path, e))
        img = resize(img, input_shape)
        ax[idx].imshow(img)

def plot_training_curve(metric):
    """Plot `metric` for the training and validation sets (reads the global `history` and `EPOCHS`)."""
    plt.plot(range(EPOCHS), history.history[metric], label=f'{metric} training set')
    plt.plot(range(EPOCHS), history.history[f"val_{metric}"], label=f'{metric} validation set')
    plt.legend()
    plt.show()

Let's build the datasets!

In [None]:
BATCH_SIZE = 64

training_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    train_data_dir,
    labels="inferred",  # The labels will be inferred from the name of the directories /cats/ and /dogs/
    label_mode="int",
    color_mode="rgb",
    batch_size=BATCH_SIZE,
    image_size=(IMG_SIZE, IMG_SIZE),
    shuffle=True,
    seed=123,  # for reproducibility
    class_names=["cats", "dogs"]  # Must be same as the directories. Is specified in order to choose the order (0=cats, 1=dogs)
)
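# Note: Dataset.repeat() returns a new dataset instead of modifying this one in place,
# so calling it without assigning the result has no effect. It is not needed here:
# 2,000 images in batches of 64 give 32 batches, which matches steps_per_epoch below.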
training_dataset = training_dataset.prefetch(tf.data.AUTOTUNE) # prefetch next batch while training (autotune prefetch buffer size)
# See doc: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#prefetch

validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    validation_data_dir,
    labels="inferred",  # The labels will be inferred from the name of the directories /cats/ and /dogs/
    label_mode="int",
    color_mode="rgb",
    batch_size=BATCH_SIZE,
    image_size=(IMG_SIZE, IMG_SIZE),
    shuffle=False,
    seed=None,
    class_names=["cats", "dogs"]  # Must be same as the directories. Is specified in order to choose the order (0=cats, 1=dogs)
)
validation_dataset = validation_dataset.prefetch(tf.data.AUTOTUNE)


In [None]:
for x, y in training_dataset.as_numpy_iterator():
    print(x)
    print(y)
    break

Let's visualize some pictures!

In [None]:
plot_examples(DATA_DIR + "train/cats")

In [None]:
plot_examples(DATA_DIR + "train/dogs")

# 💡 Model [WORK REQUIRED]
1. Start with a dummy single-layer model using one dense layer:

* Use a tf.keras.Sequential model. The constructor takes a list of layers.
* First, flatten the pixel values of the input image into a 1D vector so that a dense layer can consume it. The first layer must also specify the input shape (hint: image height × width × 3 for the red/green/blue channels).
* Add a single dense layer with the appropriate activation and the correct number of units (hint: 2 classes).
* Add the last bits and pieces with model.compile(). Because the labels are integers (0 = cat, 1 = dog), use the 'sparse_categorical_crossentropy' loss.

**==>Train this model: not very good... but all the plumbing is in place.**
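To get a feel for the size of this dummy model and for the loss it reports, here is a small pure-Python sketch. It is an illustration only, not the Keras implementation: the helper names are hypothetical, and the loss is computed for a single example with an integer label.

```python
import math

IMG_SIZE = 224      # same value as in this notebook
NUM_CLASSES = 2     # cats, dogs

def dense_param_count(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probs[label])

flat_size = IMG_SIZE * IMG_SIZE * 3          # length of the flattened image vector
print(flat_size, dense_param_count(flat_size, NUM_CLASSES))

probs = softmax([0.0, 0.0])                  # an undecided model: 50/50
print(sparse_categorical_crossentropy(0, probs))   # equals log(2), about 0.693
```

A loss that stays close to 0.69 on two balanced classes therefore means the model is doing no better than guessing.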

In [None]:
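# NOTE: `pretrained_model` is created in the transfer-learning section further down.
# For the dummy model, remove that line and the dropout, and give Flatten() an input shape.
model = tf.keras.Sequential([
  pretrained_model,
  tf.keras.layers.Flatten(),
  # Add some dropout / dense layers
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(units=2, activation='softmax'),
])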


# 💡 For fine-tuning a pretrained model, you need a very low learning rate
# and a conservative optimizer, such as SGD or RMSprop:
# optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)
# When every pretrained layer is frozen, you can use a higher learning rate because
# there is no risk of losing what the pretrained model has already learned.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-3)


model.compile(
  loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=["accuracy"]
)

model.summary()

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True, dpi=90)

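If you're using a pretrained model with frozen weights, check in the summary above that the pretrained layers contribute no trainable parameters: only the layers you added on top should be trainable!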

In [None]:
EPOCHS = 5

history = model.fit(training_dataset, steps_per_epoch=2000//BATCH_SIZE+1, epochs=EPOCHS, validation_data=validation_dataset)

In [None]:
plot_training_curve(metric="loss")

In [None]:
plot_training_curve(metric="accuracy")

Let's look at some examples that the model got wrong!

In [None]:
# predict_generator is deprecated; predict accepts a tf.data.Dataset directly
predictions = model.predict(validation_dataset)

Here, we'll display cats wrongly predicted as dogs by our model.

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
ax = ax.ravel()
fig_idx = 0

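# The validation set is not shuffled, so the first 500 predictions should be the cats.
# Sort the file names so that they line up with the sorted order used to build the dataset.
cat_files = sorted(os.listdir(validation_data_dir + "/cats"))
for idx, prediction in enumerate(predictions[:500]):
    if prediction[0] < prediction[1]:  # the "dog" class got the higher score
        if fig_idx < 10:
            img = imread(os.path.join(validation_data_dir, "cats", cat_files[idx]))
            img = resize(img, input_shape)
            ax[fig_idx].imshow(img)
        fig_idx += 1
print(f"{fig_idx} cats wrongly predicted as dogs!")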

Here, we'll display dogs wrongly predicted as cats by our model.

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
ax = ax.ravel()
fig_idx = 0

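# The remaining predictions should be the dogs; sort the file names to match the dataset order.
dog_files = sorted(os.listdir(validation_data_dir + "/dogs"))
for idx, prediction in enumerate(predictions[500:]):
    if prediction[0] > prediction[1]:  # the "cat" class got the higher score
        if fig_idx < 10:
            img = imread(os.path.join(validation_data_dir, "dogs", dog_files[idx]))
            img = resize(img, input_shape)
            ax[fig_idx].imshow(img)
        fig_idx += 1
print(f"{fig_idx} dogs wrongly predicted as cats!")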

# Now try transfer learning!

💡 WORK REQUIRED

Instead of trying to figure out a better architecture, we will adapt a pretrained model to our data. Please remove all your layers to restart from scratch.

* Instantiate a pre-trained model from tf.keras.applications.* You do not need its final softmax layer (include_top=False) because you will be adding your own to fine-tune. The code is already written in the cell below.
* Use pretrained_model as your first "layer" in your Sequential model.
* Follow with Flatten() to turn the data from the pretrained model into a flat 1D vector.
* Add your Dense layer with softmax activation and the correct number of units (hint: 2 classes).
* You also have to preprocess the datasets according to the pretrained model.
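To make the preprocessing step less of a black box, here is a NumPy sketch of what VGG16-style preprocessing does to a batch of RGB images: reverse the channel order to BGR, then subtract the per-channel ImageNet means, without any scaling. `vgg16_style_preprocess` is a hypothetical helper for illustration; in the notebook you would rely on `tf.keras.applications.vgg16.preprocess_input` instead.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by VGG16-style preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(images_rgb):
    """Illustrative re-implementation: RGB -> BGR, then subtract channel means.

    images_rgb: float array of shape (..., 3) with values in [0, 255].
    """
    images_bgr = images_rgb[..., ::-1]        # reverse the channel axis
    return images_bgr - IMAGENET_MEAN_BGR     # zero-center, no scaling

pixel = np.array([[[255.0, 0.0, 0.0]]], dtype=np.float32)   # one pure-red pixel
print(vgg16_style_preprocess(pixel))
```

Note that the output is no longer in the [0, 255] range, which is why images should be preprocessed exactly once before being fed to the model.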

**==>Train the model: you should be able to reach above 90% accuracy by training for 5 epochs**


This technique is called "transfer learning". The pretrained model has been trained on a different dataset, but its layers have still learned to recognize bits and pieces of images that can be useful for telling cats from dogs. You are retraining the last layer only; the pretrained weights are frozen. With far fewer weights to adjust, it works with less data.
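The "far fewer weights" point can be checked with a little arithmetic. The sketch below counts the parameters of the VGG16 convolutional base from its published layout (five blocks of 3×3 convolutions) and compares them with a `Dense(2)` head placed on the flattened 7×7×512 output obtained for 224×224 inputs. The helper names are hypothetical; `model.summary()` remains the authoritative source for your own model.

```python
# (filters, number of conv layers) for the five VGG16 convolutional blocks
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one Conv2D layer with square kernels."""
    return kernel * kernel * in_channels * out_channels + out_channels

def vgg16_base_params(input_channels=3):
    """Total parameter count of the convolutional base (pooling layers have none)."""
    total, channels = 0, input_channels
    for filters, n_layers in VGG16_BLOCKS:
        for _ in range(n_layers):
            total += conv_params(channels, filters)
            channels = filters
    return total

frozen = vgg16_base_params()
# Five 2x2 max-pools shrink 224x224 down to 7x7, with 512 channels at the end.
flat_features = 7 * 7 * 512
trainable = flat_features * 2 + 2     # Dense(2) head on the flattened features

print(f"frozen: {frozen:,}  trainable: {trainable:,}")
```

Under these assumptions, the head you train is a few hundred times smaller than the frozen base, which is why a couple of thousand images can be enough.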

In [None]:
# Each Keras Application expects a specific kind of input preprocessing.
# For VGG16, call tf.keras.applications.vgg16.preprocess_input on your inputs before passing them to the model.
# It converts the input images from RGB to BGR, then zero-centers each color channel with respect to the ImageNet dataset, without scaling.
# Dataset.map returns a new dataset, so the result must be assigned back.
# Run this cell only once, otherwise the preprocessing is applied twice.
preprocess = tf.keras.applications.vgg16.preprocess_input

training_dataset = training_dataset.map(lambda images, labels: (preprocess(images), labels))
validation_dataset = validation_dataset.map(lambda images, labels: (preprocess(images), labels))

In [None]:
pretrained_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_shape=[*input_shape, 3])
pretrained_model.trainable = False
for layer in pretrained_model.layers:
    layer.trainable = False
# Check that the list of trainable weights is empty!
assert not pretrained_model.trainable_weights

💡 WORK REQUIRED

OK, it's working very well! Nice, but maybe that's just because VGG has many more parameters than a simple model.

In order to really see the impact of transfer learning, we'll try to train the VGG model from scratch (without any pre-training). You need to initialize its parameters randomly and make them trainable (see below).
Train it again and compare the results with the pretrained version.

In [None]:
pretrained_model = tf.keras.applications.VGG16(weights=None, include_top=False, input_shape=[*input_shape, 3])

for layer in pretrained_model.layers:
    layer.trainable = True

# Now, let's fine-tune!

It's cool to be able to use an existing model, but it can be even better to fine-tune the hidden layers of the model for our specific use case, in the hope of getting better results.
For that, you can unfreeze all the layers of the model, or just the last ones:

In [None]:
pretrained_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_shape=[*input_shape, 3])

for idx, layer in enumerate(pretrained_model.layers):
    if idx >= 11:
        layer.trainable = True
    else:
        layer.trainable = False

layers = [(layer, layer.name, layer.trainable) for layer in pretrained_model.layers]
pd.DataFrame(layers, columns=['Layer Type', 'Layer Name', 'Layer Trainable'])

💡 WORK REQUIRED

Try to unfreeze some of the last layers and see what happens!
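To see what an index threshold such as `idx >= 11` covers, the sketch below rebuilds the layer order of VGG16 without its top and lists which layers stay trainable. It assumes the usual Keras naming (`blockN_convM`, `blockN_pool`) with a single input layer at index 0; the input layer's name here is a placeholder, so compare against the table printed by the cell above.

```python
# Layer names of VGG16 without its top; "input" is a placeholder name.
VGG16_LAYER_NAMES = ["input"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    VGG16_LAYER_NAMES += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    VGG16_LAYER_NAMES.append(f"block{block}_pool")

def trainable_layers(first_trainable_idx):
    """Names of the layers that stay trainable for a given index threshold."""
    return [name for idx, name in enumerate(VGG16_LAYER_NAMES)
            if idx >= first_trainable_idx]

print(trainable_layers(11))   # the threshold used above: blocks 4 and 5
print(trainable_layers(15))   # a more cautious choice: block 5 only
```

Raising the threshold keeps more of the generic early filters frozen and only adapts the most task-specific layers.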

# Visualize the convolutions!

In [None]:
from matplotlib import pyplot

In [None]:
# summarize filter shapes
for layer in pretrained_model.layers:
	# check for convolutional layer
	if 'conv' not in layer.name:
		continue
	# get filter weights
	filters, biases = layer.get_weights()
	print(layer.name, filters.shape)

We can see that all convolutional layers use 3×3 filters, which are small and perhaps easy to interpret.

An architectural concern with a convolutional neural network is that the depth of a filter must match the depth of the input for the filter (e.g. the number of channels).

We can see that, for an input image with three channels (red, green and blue), each filter has a depth of three (here we are working with a channels-last format). We could visualize one filter as a plot with three images, one for each channel, or compress all three down to a single color image, or even just look at the first channel and assume the other channels will look the same. The problem is that we then have 63 other filters that we might like to visualize.
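The depth-matching rule is easy to verify numerically. In this NumPy sketch, random values stand in for an image patch and for weights laid out like those of `block1_conv1`, i.e. (height, width, input channels, number of filters); only the shapes matter here, not the values.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.normal(size=(3, 3, 3))          # 3x3 window of an RGB image (channels last)
filters = rng.normal(size=(3, 3, 3, 64))    # same layout as the block1_conv1 weights
biases = np.zeros(64)

# Each filter spans the full input depth, so one window yields one number per filter.
responses = np.tensordot(patch, filters, axes=3) + biases
print(responses.shape)

# Equivalent explicit computation for the first filter
first = np.sum(patch * filters[:, :, :, 0]) + biases[0]
print(np.isclose(responses[0], first))
```

If the filter depth did not match the number of input channels, the element-wise product in the last step would not be defined.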

We can retrieve the filters from the first layer as follows:

In [None]:
layer = pretrained_model.get_layer('block1_conv1')
filters, biases = layer.get_weights()

Now we can enumerate the first six filters out of the 64 in that layer and plot each of the three channels of each filter.

We use the matplotlib library and plot each filter as a new row of subplots, and each filter channel or depth as a new column.

In [None]:
f_min, f_max = filters.min(), filters.max()

fig = pyplot.figure(figsize=(16,16))
# plot first few filters
n_filters, ix = 6, 1
for i in range(n_filters):
	# get the filter (named `kernel` so that it does not overwrite the figure handle)
	kernel = filters[:, :, :, i]
	# plot each channel separately
	for j in range(3):
		# specify subplot and turn off axis ticks
		ax = pyplot.subplot(n_filters, 3, ix)
		ax.set_xticks([])
		ax.set_yticks([])
		# plot filter channel on a two-color scale,
		# from red (small, negative) to blue (large, positive)
		pyplot.imshow(kernel[:, :, j], vmin=f_min, vmax=f_max, cmap='RdBu')
		ix += 1
# show the figure
pyplot.show()

# Some insights on the solution

You should notice that using the pretrained model with all of its weights frozen is the best solution here. Indeed, the task is so closely related to the ImageNet dataset that fine-tuning is risky: you may lose some of the information the model has already learned.

During fine-tuning, we start from a model that is already very good, so we don't want to change its weights too much. We therefore use an optimizer with a very low learning rate (for example lr=1e-4). In general, SGD is a good choice for this, as opposed to adaptive methods such as Adam.
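To illustrate why the learning rate matters so much here, the sketch below applies one plain SGD update, w ← w − lr · g, with two different learning rates. The weights and gradients are made-up stand-ins, and `sgd_step` is a hypothetical helper, not the Keras optimizer.

```python
import numpy as np

def sgd_step(weights, grads, learning_rate):
    """One plain SGD update: w <- w - lr * g."""
    return weights - learning_rate * grads

weights = np.array([0.5, -0.2, 0.1])     # stand-in for pretrained weights
grads = np.array([2.0, -1.0, 4.0])       # stand-in for a gradient on the new task

for lr in (1e-2, 1e-4):
    moved = np.abs(sgd_step(weights, grads, lr) - weights).max()
    print(f"lr={lr:g}: largest weight change {moved:.4f}")
```

With the smaller rate, each step nudges the pretrained weights by a hundredth of what the larger rate would, so the useful features are much less likely to be overwritten by a few noisy batches.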

