<a href="https://colab.research.google.com/github/touseefashraf/DL_course/blob/main/DL_Lab_2_2_homework.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DL Lab 2.2 - Homework - Transfer Learning and Fine-Tuning of Pretrained Models

In the last lab, you built a simple ConvNet, trained it from scratch, added augmentation and regularization techniques, and achieved about 90% accuracy.

In this homework, you will use two techniques for (re-)using the "experience" stored in **pre-trained** models. These techniques are extremely useful if you don't have sufficient data for training a bigger model from scratch, as is the case for our images of 10 different classes of flowers. ;-)

In detail, you will use **transfer learning** and **fine-tuning** of a model that was already trained on a very large dataset, namely the *ImageNet ILSVRC* data.

Finally, you will investigate how our ConvNet actually perceives and recognizes the given data by **visualizing feature maps** as well as performing **activation maximization**.

***

**After completing this homework you will be able to**

- Use **pretrained models** for **transfer learning** and **fine-tuning**
- Alter the **train scope** of specific layers of your models
- **Interpret** what your model has learned

***

**Instructions**

- You'll be using Python 3 in the IPython-based Google Colaboratory.
- Lines marked by "<font color='green'>`# TODO`</font>" denote the code fragments to be completed by you.
- There's no need to write any other code.
- After writing your code, you can run the cell by either pressing `SHIFT`+`ENTER` or by clicking on the play symbol on the left side of the cell.
- We may specify "<font color='green'>`(≈ X LOC)`</font>" in the "<font color='green'>`# TODO`</font>" comments to tell you about how many lines of code you need to write. This is just a rough estimate, so don't feel bad if your code is longer or shorter.
- If you get stuck, check your Lecture and Lab notes and use the [discussion forum](https://moodle2.tu-ilmenau.de/mod/forum/view.php?id=166458) in Moodle.

Let's get started!

**Note**: Training a ConvNet is a computationally expensive process. Most of the computations can be parallelized very efficiently, making them a perfect fit for GPU acceleration. To enable a GPU for your Colab session, do the following steps:
- Click '*Runtime*' -> '*Change runtime type*'
- In the pop-up window for '*Hardware accelerator*', select '*GPU*' 
- Click '*Save*'

# 0 - Test for GPU

Execute the code below for printing the TF version and testing for GPU availability.

In [None]:
#@title Print TF version and GPU stats
import tensorflow as tf
import sys
print('TensorFlow version:', tf.__version__)

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
   raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name), '', sep='\n')
!nvidia-smi

# 1 - Download and Prepare the Data
Execute the cells below to download the data and configure the data generators.

In [None]:
#@title Dataset Downloader

import os
import numpy as np

DEST_PATH = '/tmp/flowers10.zip'

!wget -nv -t 0 --show-progress -O $DEST_PATH 'https://cloud.tu-ilmenau.de/s/WGNk32LRQ847rS6/download/flowers10.zip'
!sleep 1
!unzip -uq $DEST_PATH
!rm  $DEST_PATH

base_dir = 'flowers10/'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'val')

In [None]:
#@title Prepare Data Generators
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 128

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.4,
    brightness_range=(.5, 1.5),
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
val_datagen = ImageDataGenerator(rescale=1./255)

# Flow training images in batches using train_datagen generator
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150,150),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    seed=42
)

num_classes = train_generator.num_classes

# Flow validation images in batches using val_datagen generator
validation_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size=(150,150),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

# Cast to int: `steps_per_epoch` and `validation_steps` expect integer step counts
train_steps = int(np.ceil(train_generator.samples / train_generator.batch_size))  # 800 images = batch_size * steps
val_steps = int(np.ceil(validation_generator.samples / validation_generator.batch_size))  # 200 images = batch_size * steps
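
As a quick sanity check of the step counts, here is a minimal sketch of the same arithmetic in plain Python. The sample counts 800 and 200 are taken from the comments above; the helper name `steps_per_epoch` is only illustrative and not part of the lab code.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

# Sample counts taken from the comments in the cell above
train_steps_example = steps_per_epoch(800, 128)
val_steps_example = steps_per_epoch(200, 128)

print(train_steps_example, val_steps_example)
```

With a batch size of 128, the last training batch and the last validation batch are both only partially filled.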

# 2 - Transfer-Learning Using Pretrained Models

## 2.1 - Feature Extraction for Downstream Classification

One thing that is typically done in computer vision tasks is to take a model trained on a very large dataset, e.g., the *ImageNet ILSVRC* data, and use this model for feature extraction on smaller datasets. Even though the dataset and task the model was originally trained on might be quite different to the actual dataset and task, the extracted features are typically very informative. This versatility and repurposability of ConvNets is one of the most interesting aspects of Deep Learning. The extracted features (aka **representations**) can then be used for downstream tasks such as classification.

In this homework, you will use the famous [VGG16 model](https://arxiv.org/abs/1409.1556) named after the **V**isual **G**eometry **G**roup from Oxford. The model was pre-trained on the ImageNet ILSVRC data, i.e., a large dataset of web images (1.4M images and 1000 classes). Although outperformed by more recent architectures, this simple but powerful model was among the top entries of the ILSVRC 2014 challenge (first place in localization, second place in classification) and has remained popular for feature extraction ever since. Let's see how these features help for our flower classification task!

First, we need to pick which intermediate layer of VGG16 we will use for feature extraction. A common practice is to use the output of the last convolution layer before the fully connected (aka dense) layers. The fully connected layers are too specialized for the original task the network was trained on. Hence, using their outputs as features won't be very useful for a new task.

The zoo of pre-trained models is provided in the [`keras.applications` module](https://www.tensorflow.org/api_docs/python/tf/keras/applications) in TF. You can directly load the architecture from there. If you set `weights='imagenet'`, the `imagenet` weights will be loaded automatically.

If `include_top=False` is specified, the network is loaded without the original classification layers at the top. Instead, the network's last layer is a `MaxPooling2D` layer pooling across the output of the last convolution layer.

**Task**: Complete the code below for building a feature extraction model based on the VGG16. Use a `GlobalMaxPooling2D` layer on the output of the pre-trained model.

**Hint**: Layers are callable objects. Same as models, they have the properties `.input` and `.output`, providing their input and output tensors.

**Hint$^2$**: `pre_trained_model.input` provides the input tensor of the pretrained VGG16 model.
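
To make the effect of global max pooling concrete, here is a small NumPy sketch (not the Keras solution). It assumes the VGG16 stub maps a 150x150 input to a `(4, 4, 512)` output, which follows from its five 2x2 pooling stages (150 → 75 → 37 → 18 → 9 → 4). The random array is only a stand-in for real activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of the last VGG16 block for a batch of 2 images:
# (batch, height, width, channels)
conv_output = rng.random((2, 4, 4, 512))

# Global max pooling keeps the largest activation per channel,
# collapsing the two spatial axes
pooled = conv_output.max(axis=(1, 2))

print(conv_output.shape, '->', pooled.shape)
```

Each image is thus described by a single 512-dimensional feature vector, which is what the downstream classifier receives as input.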

In [None]:
from tensorflow.keras import layers, Model
from tensorflow.keras.applications.vgg16 import VGG16

def build_feature_extraction_model(input_shape, summary=True):

  ### START YOUR CODE HERE ###  (≈3 LOC)

  # Initialize the pre-trained model
  pre_trained_model = 
  
  # Add global maxpooling layer on output of pre-trained model

  # Define the feature_extraction_model
  feature_extraction_model = 
  
  ### END YOUR CODE HERE ###

  # Freeze pre-trained model
  pre_trained_model.trainable = False

  if summary:
    print(feature_extraction_model.summary())

  return feature_extraction_model

In [None]:
feature_extraction_model = build_feature_extraction_model( (150,150,3) )

You can now use the `feature_extraction_model` for computing powerful representations of the images in your dataset. We create a utility function that reads the data from the data generators in batches and accumulates the extracted features and their associated class labels in memory:

In [None]:
def extract_features_in_batches(model, generator, repetitions=1):
  ''' Loop over `generator` batches and return extracted features and labels '''

  X, Y = ( list(), list() )
  generator.reset()
  
  for i in range(repetitions):

    if i:
      print('\nrepetition', i)

    batch_index = 0
    while batch_index <= generator.batch_index:
        sys.stderr.write('\rbatch {}'.format(batch_index) )
        batch_x, batch_y = generator.next()
        Y.extend( batch_y )
        X.extend( model.predict( batch_x ) )
        batch_index += 1
  
  return np.asarray(X), np.asarray(Y)

**Task**: Extract validation and training features (and their associated labels) using the `feature_extraction_model` on the `validation_generator` and the `train_generator`. For **augmenting** the training data, use **five repetitions** on the training data.

In [None]:
### START YOUR CODE HERE ###  (≈2 LOC)

# Validation features `X_val` and labels `Y_val`
X_val, Y_val = 

# Training features `X_train` and labels `Y_train`
X_train, Y_train = 

### END YOUR CODE HERE ###

Next, build a small Neural Network consisting of two hidden dense layers of 512 neurons each and 20% dropout, followed by a dense layer for classification. Don't forget to make reasonable decisions about the activation functions.
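
As a reminder of how the classification output and the loss fit together, here is a NumPy sketch of softmax followed by categorical cross-entropy, the loss used in `model.compile` below. The logits and the one-hot label are made-up values for illustration, and the helper functions are not part of the lab code.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy between one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)

logits = np.array([[2.0, 1.0, 0.1]])   # hypothetical scores for 3 classes
y_true = np.array([[1.0, 0.0, 0.0]])   # one-hot label: class 0

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)

print('probabilities:', probs.round(3), 'loss:', loss.round(3))
```

Because the labels are one-hot, the loss reduces to the negative log-probability assigned to the true class.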

In [None]:
from tensorflow.keras.optimizers import Adam

def build_classifier_model(input_shape, num_classes, init_lr=1e-3):
    ### START YOUR CODE HERE ### (≈7 LOC)
    
    ### END YOUR CODE HERE ###

    model.compile(
        loss='categorical_crossentropy',
        optimizer=Adam(learning_rate=init_lr),
        metrics=['accuracy']
    )
  
    return model

In [None]:
#@title `plot_history()` definition
from matplotlib import pyplot as plt

def plot_history(history):
  fig, (ax1, ax2) = plt.subplots(2,1, sharex=True, dpi=150)
  ax1.plot(history.history['loss'], label='training')
  ax1.plot(history.history['val_loss'], label='validation')
  ax1.set_ylabel('Cross-Entropy Loss')
  ax1.set_yscale('log')
  if 'lr' in history.history:
    ax1b = ax1.twinx()
    ax1b.plot(history.history['lr'], 'g-', linewidth=1)
    ax1b.set_yscale('log')
    ax1b.set_ylabel('Learning Rate', color='g')

  ax2.plot(history.history['accuracy'], label='training')
  ax2.plot(history.history['val_accuracy'], label='validation')
  ax2.set_ylabel('Accuracy')
  ax2.set_xlabel('Epochs')
  ax2.legend()
  plt.show() 

Using the extracted features, you can now train your classification model. Let's see how it performs after 50 epochs of training:

In [None]:
classifier_model = build_classifier_model( X_val.shape[1], Y_val.shape[1], init_lr=1e-4)

history = classifier_model.fit(
    x=X_train,
    y=Y_train,
    batch_size=batch_size,
    epochs=50,
    validation_data=(X_val, Y_val),
    verbose=2
)
plot_history(history)

You should make two observations:

1.   Without much effort, you achieve roughly the same validation accuracy as with the simple ConvNet designed in the last lab. :-D
2.   The model is converging quite fast but is plateauing on the validation data. Remember that every training image was only augmented five times.

## 2.2 - Learning new Heads on pre-trained Models

Instead of a) using the pre-trained ConvNet as feature extractor and b) another classifier model, you can do both jobs a) + b) in one and the same model. For this, you simply add your classifier on top of your ConvNet backbone (or replace the "old" classifier by your "new" classifier).

**Task**: Complete the code below in order to add a classifier on top of the VGG16 stub.

In [None]:
from tensorflow.keras import layers, Model
from tensorflow.keras.applications.vgg16 import VGG16

def build_model(input_shape, num_classes, summary=True):

    pre_trained_model = VGG16(
        input_shape=input_shape,
        weights='imagenet',
        include_top=False
    )

    ### START YOUR CODE HERE ### (≈7 LOC)
    
    # Add global maxpooling layer on output of pre-trained model
    
    # Add a fully connected layer with 512 hidden units and ReLU activation
    
    # Add a dropout layer with rate 0.2
    
    # Add a fully connected layer with 512 hidden units and ReLU activation
    
    # Add a dropout layer with rate 0.2
    
    # Add a final softmax layer for classification
    
    # Define the model

    ### END YOUR CODE HERE ###

    if summary:
        print(model.summary())

    return model, pre_trained_model

In [None]:
model, pre_trained_model = build_model((150,150,3), num_classes)

You want to use the pretrained VGG16 backbone only for feature extraction. In order to prevent any weights of the pretrained model from being updated during training, you will **freeze** it. Only the new layers on top shall be trainable.

In Keras, you can easily access the list of layers of a model using its `layers` property. Each layer has a `trainable` property defining whether or not to update its weights during training. You can set `trainable = False` for all layers of the pretrained VGG16 model.

Instead of setting the `trainable` property per layer, you can also set the `trainable` property of entire models!

In either case, the model summary should show the reduced number of trainable parameters, amounting to ≈530k parameters. Also compare the total number of parameters: the simple ConvNet designed in last lab's homework assignment had ≈24k parameters. The VGG16-based model contains ≈15M parameters!
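
The parameter counts can be verified by hand. The sketch below assumes the head described above (two dense layers with 512 units and a 10-class output on top of a 512-dimensional pooled feature vector) and the standard VGG16 layout of thirteen 3x3 convolution layers. The helper functions are illustrative only.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv3x3_params(c_in, c_out):
    """Weights plus biases of a 3x3 convolution layer."""
    return 3 * 3 * c_in * c_out + c_out

# (input channels, output channels) of the 13 convolution layers in VGG16
vgg16_channels = [
    (3, 64), (64, 64),                      # block1
    (64, 128), (128, 128),                  # block2
    (128, 256), (256, 256), (256, 256),     # block3
    (256, 512), (512, 512), (512, 512),     # block4
    (512, 512), (512, 512), (512, 512),     # block5
]

backbone_params = sum(conv3x3_params(c_in, c_out) for c_in, c_out in vgg16_channels)
head_params = dense_params(512, 512) + dense_params(512, 512) + dense_params(512, 10)

print('backbone:', backbone_params)
print('head    :', head_params)
print('total   :', backbone_params + head_params)
```

Pooling and dropout layers have no weights, so they do not contribute to either count.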

**Task**: Freeze all weights of the `pre_trained_model`.

In [None]:
### START YOUR CODE HERE ### (≈1 LOC)

### END YOUR CODE HERE ###

print(model.summary())

Whenever you change the `trainable` setting of any layer, you need to (re-)compile your model for the changes to take effect.

**Task**: Compile the model. Use `'categorical_crossentropy'` as the loss function, the Adam optimizer with learning rate `INITIAL_LEARNING_RATE`, and `'accuracy'` as the metric.

In [None]:
from tensorflow.keras.optimizers import Adam

INITIAL_LEARNING_RATE = 1e-3

### START YOUR CODE HERE ### (≈1 LOC)

### END YOUR CODE HERE ###

Train the model for 30 epochs and check the results.

In [None]:
history = model.fit(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=val_steps,
    verbose=2
)

plot_history( history )

Despite the smaller number of samples per epoch, the time for each epoch is notably longer, as every image needs to be loaded from disk and propagated through the entire ConvNet. On the other hand, every image is randomly augmented in every epoch, which should improve the model's ability to generalize. Training for more epochs while using learning rate decay would likely improve the accuracy.

# 3 - Fine-Tuning Pre-trained Models

## 3.1 - Unfreeze Layers
In the previous two experiments, you only trained the weights of the new layers, both as an independent model and as part of the joint model for feature extraction and classification. The weights of the pre-trained network were not updated during training.

To increase the accuracy, you can also **fine-tune** the weights of the pre-trained network for learning more discriminative representations of your data. All you need to do is **unfreeze** the layers of the pre-trained network. Note that you have to recompile the model for this to take effect. Compared to the learning rate used during transfer learning, the learning rate used for fine-tuning is typically smaller.

**Task**: Unfreeze the layers of the pre-trained model, i.e., make them trainable. Then compile the model again and compare with the model summary.

In [None]:
INITIAL_LEARNING_RATE = 3e-5

### START YOUR CODE HERE ### (≈2 LOC)

### END YOUR CODE HERE ###

print(model.summary())

## 3.2 - Using Callbacks

You will define [callbacks](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks) for scheduling the learning rate as training progresses. 
Callbacks are utility classes that are called in every epoch during training. In the cell below, the custom function `lr_step_decay` defines a step-wise schedule that multiplies the learning rate by `drop=0.9`, i.e., reduces it by 10%, after every ten epochs. The
[Learning rate scheduler](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler) callback `LRDecayCallback` then uses this function for actually setting the learning rate.

In [None]:
def lr_step_decay(epoch, lr, drop=.9, drop_epochs=10):
    if epoch < 10:
        return INITIAL_LEARNING_RATE
    else:
        return INITIAL_LEARNING_RATE * np.power(drop, np.floor(epoch/drop_epochs))

LRDecayCallback = tf.keras.callbacks.LearningRateScheduler(lr_step_decay, verbose=1)
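
To see which learning rates this schedule produces, here is a standalone sketch of the same formula. The function `step_decay` is a hypothetical copy that takes the initial learning rate as an argument instead of reading the global `INITIAL_LEARNING_RATE`; the default `3e-5` is the value set in section 3.1.

```python
import numpy as np

def step_decay(epoch, initial_lr=3e-5, drop=0.9, drop_epochs=10):
    """Multiply the initial learning rate by `drop` once per `drop_epochs` completed epochs."""
    return initial_lr * np.power(drop, np.floor(epoch / drop_epochs))

# Learning rate at a few selected epochs (epochs are counted from 0)
schedule = {epoch: step_decay(epoch) for epoch in (0, 9, 10, 19, 20, 30)}

for epoch, lr in schedule.items():
    print('epoch {:3d}: lr = {:.3g}'.format(epoch, lr))
```

With `drop=0.9`, the learning rate stays constant within each block of ten epochs and shrinks by 10% at epochs 10, 20, 30, and so on.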

In addition to the learning rate callback, you will also use an [Early stopping callback](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) that stops the training process as soon as the validation loss stops decreasing. You will use this `StopCallback` for preventing your model from overfitting on the training data.
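
The logic behind early stopping with a *patience* can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras implementation, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the (0-based) epoch at which training would stop, or None if it never stops.

    Training stops once the validation loss has not improved on its best
    value for `patience` consecutive epochs.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # new best value: reset the counter
            wait = 0
        else:
            wait += 1        # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached in epoch 1
stop_epoch = early_stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=2)
print('training stops after epoch', stop_epoch)
```

A larger patience tolerates longer plateaus before stopping, at the cost of more training time.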

**Task**: Create an early stopping callback that monitors the validation loss and stops the training if the validation loss did not decrease for ten epochs.

In [None]:
### START YOUR CODE HERE ### (1 LOC)
StopCallback = 
### END YOUR CODE HERE ###

Now, continue to train the model for a maximum of 200 epochs. Please note the VGG16 model is really large and quite slow during training.

**Task**: Add the `LRDecayCallback` and `StopCallback` to the call of the `fit` method and fine-tune your model.

In [None]:
### START YOUR CODE HERE ### (1 LOC)
history = model.fit(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=200,
    validation_data=validation_generator,
    validation_steps=val_steps,
    verbose=2,
)
### END YOUR CODE HERE ###

plot_history( history )

**Notes on fine-tuning**:

- Fine-tuning should only be attempted *after* you have trained the new head layers with the frozen pre-trained model.
- Depending on the amount of training data available, you might fine-tune only *a few layers* of the pre-trained model rather than all of its layers.

# 4 - Interpreting ConvNets

## 4.1 - Visualizing Feature Maps

Feature maps display the output of a convolution layer after forwarding the input data through the network. The idea of visualizing a feature map for a specific input image is to understand which features of the input activate that feature map. We expect feature maps close to the input to respond to small details, such as corners, edges, and colors, whereas feature maps close to the output of the model capture more abstract and high-level concepts.

First, we need a clearer idea about the output shape and layer index of the convolution layers:

In [None]:
for layer_idx, layer in enumerate(model.layers):
    # check if convolutional layer
    if not isinstance(layer, layers.Conv2D):
        continue
    print(layer_idx, layer.name, layer.output.shape)

You can now use this information for designing a new model that contains a subset of the layers from the full VGG16 model. The new model has the same input as your original model, but its output is the output of a given convolutional layer.

**Task**: Create a new model `visualization_model` that returns the output of the first convolutional layer (`layer_idx=1`):

**Hint**: Use the `layers` property of the model and the `output` property of the respective layer.

In [None]:
### START YOUR CODE HERE ### (1 LOC)
visualization_model = 
### END YOUR CODE HERE ###

print(visualization_model.summary())

Let's pick an image from your validation data for exploring the feature map visualizations:

In [None]:
def show(img):
    '''display image'''
    plt.figure(figsize=(6,6))
    plt.grid(False)
    plt.axis('Off')
    plt.imshow(img)
    plt.show()

img = validation_generator.next()[0][0]
show(img)

# expand dimensions so that it fakes a batch containing a single sample
img = np.expand_dims(img, axis=0)

You are now ready to get the feature maps by forwarding the image through the network calling `visualization_model.predict()`.

The result has shape `(1, 150, 150, 64)`, i.e., 64 feature maps of size 150x150 for the single input image. Let's plot them as an 8x8 array of images.

In [None]:
feature_maps = visualization_model.predict(img) 

In [None]:
#@title Plot feature maps
square = 8
fig = plt.gcf()
fig.set_size_inches(square*2,square*2)
idx = 1
for _ in range(square):
    for _ in range(square):
        sp = plt.subplot(square, square, idx)
        sp.axis('Off')
        sp.title.set_text(str(idx-1))
        plt.imshow(feature_maps[0, :, :, idx-1])
        idx += 1

plt.show()   

You can see that the result of applying the filters of the first convolution layer is a lot of versions of the original flower image with different features highlighted.

In order to print feature maps of deeper convolution layers, you simply update the `visualization_model` for the index of the targeted layer. You can also collect feature maps from each block of the model in a single forward pass, and then create a square of images for each block.

In the VGG16 architecture, there are five main blocks `block1`, `block2`, and so on. The layer indices of the first convolutional layer in each block are `[1, 4, 7, 11, 15]`.

**Task**: Define a new `visualization_model` that returns the outputs of the first convolutional layers in each block:

In [None]:
### START YOUR CODE HERE ### (≈2 LOC)
layer_indices = 
visualization_model = 
### END YOUR CODE HERE ###

For the sake of clarity, let us cap the number of displayed feature maps per layer at 16:

In [None]:
#@title Plot feature maps
plt.imshow(img[0])
plt.axis('Off')
plt.title('Input Image')
plt.show()

square = 8
feature_maps = visualization_model.predict(img)
for layer_idx, fmap in enumerate(feature_maps):
  fig = plt.gcf()
  fig.set_size_inches(square*2,square*2)
  fig.suptitle( model.layers[ layer_indices[layer_idx] ].name )
  idx = 0
  for _ in range(square):
    for _ in range(2):
      fm = fmap[0, :, :, idx]
      sp = plt.subplot(square, square, idx+1)
      sp.axis('Off')
      sp.title.set_text(str(idx))
      plt.imshow(fm)
      idx += 1

  plt.show()

As expected, the feature maps closer to the input capture a lot of fine details in the image. As you progress deeper into the model, the feature maps show fewer and fewer details. Some channels show large activations at the original location of the flower in the image.

Analyzing all channels would be rather time-consuming. Instead, let's visualize where your model *looks* on average, i.e., the average of the feature maps:

In [None]:
#@title Plot average feature maps
plt.imshow(img[0])
plt.axis('Off')
plt.title('Input Image')
plt.show()

fig=plt.figure(figsize=(14, 14))
for layer_idx, fmap in enumerate(feature_maps):
  sp = fig.add_subplot(1, len(feature_maps), layer_idx+1)
  sp.axis('Off')
  sp.title.set_text(model.layers[ layer_indices[layer_idx] ].name)
  plt.imshow(np.squeeze(fmap.mean(axis=-1)))

Instead of plotting the average of the feature maps, it can be quite informative to visualize the feature map that had the largest overall activation:

In [None]:
#@title Plot max activation feature maps
plt.imshow(img[0])
plt.axis('Off')
plt.title('Input Image')
plt.show()

fig=plt.figure(figsize=(14, 14))
for layer_idx, fmap in enumerate(feature_maps):
  sp = fig.add_subplot(1, len(feature_maps), layer_idx+1)
  sp.axis('Off')
  sp.title.set_text(model.layers[ layer_indices[layer_idx] ].name)
  max_idx = np.argmax(np.sum(fmap, axis=(1,2)), axis=-1)
  plt.imshow(np.squeeze(fmap[:,:,:,max_idx]))

## 4.2 - Visualizing Convolution Filters

Now that you have learned how to analyze where your network *looks*, let us visualize how your network actually perceives images. In detail, you will visualize input patterns that activate specific filters of your network by **activation maximization**.

The idea is to optimize the pixel values of a random input image via **gradient ascent** in order to maximize the average of a specific feature map.

For performing gradient ascent, you need to:
1. Freeze the network (you don't want to update any weights of the network).
2. Forward a randomly initialized image through the network.
3. Compute the average of a specific feature map in a layer.
4. Compute the gradient of the average feature map with respect to the pixel values of the input image.
5. Update the pixel values based on the backpropagated gradient.
6. Repeat steps 2-5
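
The steps above can be sketched with a toy example in NumPy. Instead of a ConvNet, the "network" here is a single fixed weight mask `w` (a made-up stand-in), so the gradient of the mean activation can be written down analytically. The normalization and clipping mirror what the lab code does later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: fixed ("frozen") weights of a toy filter; they are never updated
w = rng.normal(size=(8, 8))

# Step 2: randomly initialized gray-ish input image
img = rng.uniform(0.3, 0.4, size=(8, 8))

def activation(x):
    """Step 3: toy stand-in for the average of a feature map."""
    return float(np.mean(w * x))

def gradient(x):
    """Step 4: gradient of mean(w * x) with respect to the pixels of x."""
    return w / x.size

start = activation(img)
for _ in range(50):                          # Step 6: repeat
    g = gradient(img)
    g = g / (np.linalg.norm(g) + 1e-8)       # normalize the gradient
    img = np.clip(img + 0.1 * g, -1.0, 1.0)  # Step 5: gradient *ascent* on the pixels
end = activation(img)

print('activation before: {:.4f}, after: {:.4f}'.format(start, end))
```

Note that the update adds the gradient to the image because the goal is to maximize the activation, whereas training subtracts the gradient from the weights to minimize a loss.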

Make it so!

**Task**: Freeze all layers of `model`.

In [None]:
from tensorflow.keras import backend as K
import tensorflow as tf

### START YOUR CODE HERE ### (≈2 LOC)

### END YOUR CODE HERE ###

**Task**: Define `vis_model` to output the feature map specified by `filter_idx` of the layer specified by `layer_idx`.

**Hint**: The shape of a convolution layer's output is `(num_samples, height, width, num_filters)`.

In [None]:
def deprocess(img):
  return tf.cast( 255*(img[0,:] + 1.) / 2., tf.uint8)

def calc_loss(img, model):
  '''Compute loss as average activation of the model'''
  return K.mean( model(img) )

def filter_activation_maximization(model, img, layer_idx, filter_idx, 
                                   steps=100, 
                                   step_size=1, 
                                   vis_steps=100,
                                   display=True):

    # Define the model
    ### START YOUR CODE HERE ### (1 LOC)
    vis_model = Model(
        model.input,
        
    )
    ### END YOUR CODE HERE ###

    for step in range(steps):
        # Record only the forward pass on the tape
        with tf.GradientTape() as gtape:
            gtape.watch(img)
            loss = calc_loss(img, vis_model)

        # Compute the gradient of the loss with respect to the input image
        grads = gtape.gradient(loss, img)

        # Normalize the gradients
        grads = K.l2_normalize(grads) + 1e-8

        # Perform gradient ascent to make the image increasingly activate the filter
        img += grads*step_size
        img = tf.clip_by_value(img, -1, 1)

        if ((step+1) % vis_steps == 0) or step+1 == steps:
            print('step {} - loss: {:.3g} - avg. gradient: {:.3g}'.format(step+1, loss, np.mean(np.abs(grads))))
            if display:
                show( deprocess(img) )
    return img
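
The `deprocess` helper above maps pixel values from the clipping range `[-1, 1]` to displayable 8-bit values in `[0, 255]`. Here is a NumPy sketch of the same mapping, using a hypothetical `deprocess_np` that works on plain arrays and skips the batch indexing:

```python
import numpy as np

def deprocess_np(img):
    """Map values from [-1, 1] to uint8 values in [0, 255] (fractions are truncated)."""
    return (255 * (img + 1.0) / 2.0).astype(np.uint8)

values = deprocess_np(np.array([-1.0, 0.0, 1.0]))
print(values)
```

A pixel value of 0 therefore ends up as mid-gray, and the clipping in the optimization loop guarantees that the cast to `uint8` never overflows.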

Run the cell below to restore a checkpoint where the model was already fine-tuned on our flowers dataset.

In [None]:
#@title Restore fine-tuned model from checkpoint

ckp_path = '/tmp/checkpoints/fi_flowers_vgg16_fine_tuned'
os.makedirs(ckp_path, exist_ok=True)
DEST_PATH = os.path.join(ckp_path, 'model.zip')

# download and unzip
!wget -nv -t 0 --show-progress -O $DEST_PATH 'https://cloud.tu-ilmenau.de/s/zW7t9oD9dHJA8jj/download/lab_2.2_ckp.zip'
!sleep 1
!unzip -uq $DEST_PATH
!rm  $DEST_PATH

# restore model
model.load_weights(os.path.join( os.path.basename(ckp_path), 'model'))

Evaluation using this checkpoint should return 97% accuracy on the validation data:

In [None]:
loss, accuracy = model.evaluate(validation_generator)
print('\nloss: {:.4f}\nacc: {:.4f}'.format(loss, accuracy))

Now run `filter_activation_maximization` on an image initialized with random noise and visualize the input patterns activating the first twelve filters of the first convolution layer (`layer_idx = 1`).

In [None]:
layer_idx = 1
num_filters = 12

fig = plt.gcf()
fig.set_size_inches(num_filters*2,8)
fig.suptitle( model.layers[ layer_idx ].name )

for filter_idx in range(num_filters):
    filter_idx_offset = 0
    img = filter_activation_maximization(
        model, 
        tf.cast( np.random.random((1, 150, 150, 3))*.1 +.3 , tf.float32 ),
        layer_idx, 
        filter_idx + filter_idx_offset,
        steps=200,
        display=False
    )

    sp = plt.subplot(num_filters // 6, 6, filter_idx+1)
    sp.axis('Off')
    sp.title.set_text(str(filter_idx+filter_idx_offset))
    plt.imshow(deprocess(img))

plt.show()

Filters of later layers respond to more abstract, high-level concepts:

In [None]:
# Define the layer to be activated
layer_idx = 17 # corresponds to layer 'block5_conv3'

# Define the layer's filter to be activated
# Try out different filters: 3, 9, 17, 136, 157, 160, 168, 179, 185, 199, 202, 222, 224
filter_idx = 136 

# Init random gray image with some noise
img = tf.cast( np.random.random((1, 150, 150, 3))*.1 +.3 , tf.float32 )

# Run activation maximization
img2 = filter_activation_maximization(model, img, layer_idx, filter_idx, 200, 1, 200)

***

# Congratulations!

You've learned how to apply **transfer learning** and **fine-tuning** of large models on small datasets. In addition, you learned how to use **learning rate schedules** and **early stopping** as **callbacks**. Finally, you saw how your model can be tweaked to return intermediate results such as feature maps and how to do **activation maximization** for interpreting the filters your model has learned.

***