> **DO NOT EDIT IF INSIDE annadl_f19 folder**


# Week 5: Transfer learning and multitask learning

Deep neural networks are **extremely expensive** to train. Training a good classifier on a complex task, like telling objects in images apart or determining whether a move in a board game is good, can take weeks on multiple GPUs, cost millions of dollars in cloud computing fees, and release massive amounts of CO$_2$ into the atmosphere ([in some cases more than 5 cars emit over their entire lifetimes!](https://arxiv.org/pdf/1906.02243.pdf)). Because of this, we want to be able to **reuse** the weights of models that have already been trained. This is called transfer learning. The fundamental idea is that things learned in one context can be *transferred* to another context.



## Exercises

We will follow [a very nice blog post](https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/) written by Jason Brownlee of 'Machine Learning
Mastery' for most of these exercises. In his blog post, Jason takes the reader through
the process of using pretrained models in Keras. Below I have outlined the steps you
will go through, with references to his blog post. I strongly recommend you read from the
top down to 'Models for Transfer Learning' before proceeding.

### Loading pretrained models

The first practical thing we need to figure out when doing transfer learning is loading pretrained models. Keras makes this very easy by offering a number of pretrained models for image classification which can be downloaded through their [Applications API](https://keras.io/applications/#densenet). 

#### Applications API arguments

When loading pretrained models, we will want to provide some arguments that depend on what
we want to do with the model after loading. Below I ask you to explain, in your own words,
what some of these parameters do. See the Applications API reference for some of the models
and the 'Models for Transfer Learning' section in Jason's blog post for help.

> **Ex. 6.1.1**: In your own words, explain what the following function arguments do in
the different model loading functions:
1. `include_top`
1. `weights`
1. `input_shape`
1. `pooling`
1. `classes`
1. Explain what 'global pooling' does, and why it is needed when `include_top=False`

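1. `include_top` specifies whether to include the fully connected classification layer(s) at the "top" (the output end) of the network. Setting it to `False` leaves only the convolutional base.
2. `weights` specifies how the network is initialized: `None` for random initialization, `'imagenet'` for weights pretrained on ImageNet, or a path to a weights file.
3. `input_shape` specifies the shape of the input images, e.g. `(224, 224, 3)`. With the ImageNet weights, a custom shape is only possible when `include_top=False`.
4. `pooling` only matters when `include_top=False`: `None` returns the 4D output of the last convolutional block, while `'avg'` or `'max'` applies global average or global max pooling to it.
5. `classes` specifies the number of classes to classify images into. It is only used when `include_top=True` and the ImageNet weights are not loaded.
6. Global pooling reduces each feature map to a single number by taking the average (or maximum) over all spatial positions, so the output becomes one fixed-length vector per image instead of a 4D tensor. With `include_top=False` nothing else flattens the feature maps, so global pooling (or a `Flatten` layer) is needed before the features can be fed to a dense layer or a classical classifier.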
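
To make the effect of global pooling concrete, here is a minimal NumPy sketch. It does not load any Keras model: the random array is only a stand-in for the output of a convolutional base, and the 7x7x1024 shape is assumed to match what DenseNet121 produces for 224x224 inputs.

```python
import numpy as np

# Stand-in for the output of a convolutional base loaded with include_top=False:
# a batch of 2 images, each described by a 7x7 grid of 1024 feature channels.
# The values are random and purely illustrative.
rng = np.random.default_rng(0)
feature_maps = rng.random((2, 7, 7, 1024))

# Global average pooling: average over the two spatial axes (height, width),
# keeping one number per channel.
gap = feature_maps.mean(axis=(1, 2))

# Global max pooling: keep the strongest activation per channel instead.
gmp = feature_maps.max(axis=(1, 2))

print(feature_maps.shape)  # (2, 7, 7, 1024)
print(gap.shape)           # (2, 1024)
print(gmp.shape)           # (2, 1024)
```

Either way, every image ends up as a single 1024-dimensional vector, which is the kind of input a dense layer or an SVM expects.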

#### Load a model and predict an image

> **Ex. 6.1.2**: Following Jason's example under 'Pre-Trained Model as Classifier'
classify [this image](https://66.media.tumblr.com/tumblr_mc46e7Zm4R1qbqngeo1_1280.jpg).
Print not just the most likely label, but everything that `decode_predictions` returns.
>
> ***Important***: *Don't use VGG as he does. It's 500 MB to download, and will take too long.
> Use one of the smaller models instead ([here](https://keras.io/applications/#documentation-for-individual-models)'s an overview of model sizes), such as DenseNet121.*

In [23]:
from keras.applications.densenet import DenseNet121

model = DenseNet121()

In [18]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
zero_padding2d_5 (ZeroPadding2D (None, 230, 230, 3)  0           input_3[0][0]                    
__________________________________________________________________________________________________
conv1/conv (Conv2D)             (None, 112, 112, 64) 9408        zero_padding2d_5[0][0]           
__________________________________________________________________________________________________
conv1/bn (BatchNormalization)   (None, 112, 112, 64) 256         conv1/conv[0][0]                 
__________________________________________________________________________________________________
conv1/relu

In [24]:
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.densenet import preprocess_input
from keras.applications.densenet import decode_predictions

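# Load the image at the 224x224 resolution the pretrained model expects
image = load_img('ulfaslak.jpg', target_size=(224, 224))
image = img_to_array(image)
# Add a leading batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

# Scale the pixel values the same way as the DenseNet training data
image = preprocess_input(image)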

In [25]:
yhat = model.predict(image)
yhat

array([[1.88591900e-06, 3.30823235e-09, 1.05562708e-06, 6.98458507e-07,
        6.72708438e-06, 2.54616054e-07, 3.14656944e-07, 3.95036288e-08,
        8.53716264e-10, 2.53497645e-09, 1.10890186e-08, 3.99433508e-09,
        1.02950395e-08, 1.41748231e-08, 4.83069451e-09, 5.33371458e-10,
        2.11743001e-09, 7.73878384e-09, 1.47610333e-08, 5.17284420e-08,
        1.98311434e-09, 1.70079928e-09, 5.09798648e-10, 1.08855778e-08,
        3.69480659e-08, 7.55279984e-07, 8.78806645e-08, 4.32128147e-08,
        4.30566843e-07, 9.66730624e-08, 1.41663151e-07, 2.18568047e-07,
        4.70635905e-06, 3.76720936e-06, 1.46494611e-04, 2.53633698e-06,
        5.18799152e-06, 1.56465632e-07, 3.38856019e-08, 5.91074354e-07,
        1.64300697e-08, 3.51538887e-08, 1.17251737e-07, 1.76167612e-06,
        2.70661019e-07, 2.97074416e-06, 7.75385445e-09, 2.18628273e-07,
        2.56786848e-07, 1.76132247e-07, 2.05485026e-06, 8.92312255e-06,
        4.09186896e-06, 1.61318428e-06, 4.48744601e-07, 1.046599

In [32]:
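# decode_predictions returns, per image, the top-5 (class_id, class_name, probability) tuples;
# [0][0] keeps only the most likely one for the first (and only) image
label = decode_predictions(yhat)[0][0]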

In [34]:
print('%s : %.4f' % (label[1], label[2]))

hammer : 0.4042
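
The exercise asks for everything `decode_predictions` returns, not just the best label. As a rough sketch of the top-k selection behind it, the NumPy snippet below sorts a probability vector and keeps the most likely entries. The helper `top_k`, the class names, and the probabilities are all made up for illustration; they are not output from the model above.

```python
import numpy as np

def top_k(probabilities, class_names, k=5):
    """Return the k most probable (class_name, probability) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]

# Hypothetical 6-class example; names and numbers are invented.
class_names = ['class_a', 'class_b', 'class_c', 'class_d', 'class_e', 'class_f']
probabilities = np.array([0.05, 0.40, 0.10, 0.25, 0.03, 0.17])

for name, p in top_k(probabilities, class_names, k=3):
    print('%s : %.4f' % (name, p))
```

With the real model, the same loop can run directly over the library's result, for example `for _, name, p in decode_predictions(yhat)[0]: print('%s : %.4f' % (name, p))`.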


### Adapting pretrained models

#### Simple feature extractor for ML prediction

By removing the last layer, we can turn a pretrained convolutional neural network into a
feature extractor. We can then use it to extract features of a large number of images and
classify those using any machine learning model. Jason describes this under 'Pre-Trained Model as Feature Extractor Preprocessor'.

> **Ex. 6.2.1:** Extract features for every datapoint in the [fashion-mnist dataset](https://keras.io/datasets/#fashion-mnist-database-of-fashion-articles), and build a feature matrix X. Train an SVM classifier on the learned features, and report the accuracy on the test data.
>
> *Hint: You can import SVM from sklearn. It has a simple API, just check out some of the examples on the [documentation page](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).*

In [36]:
from keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz


In [77]:
import numpy as np

x = x_train
x = np.append(x, x_test, axis=0)
x = x.reshape(-1, 28, 28, 1)
x.shape

(70000, 28, 28, 1)

In [75]:
from keras.models import Model

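# DenseNet121 with ImageNet weights needs 3-channel inputs of at least 32x32 pixels.
# Passing input_shape=(28, 28, 1) fails with:
#   ValueError: The input must have 3 channels; got `input_shape=(28, 28, 1)`
# include_top=False already drops the classifier, so no layers.pop() is needed, and
# pooling='avg' turns the output into one 1024-dimensional feature vector per image.
model = DenseNet121(include_top=False, input_shape=(32, 32, 3), pooling='avg')

model.summary()

In [None]:
# Zero-pad each image from 28x28 to 32x32 (resizing would also work) and
# copy the grayscale channel three times to get the 3 channels the model needs
x_rgb = np.pad(x, ((0, 0), (2, 2), (2, 2), (0, 0)), mode='constant')
x_rgb = np.repeat(x_rgb, 3, axis=-1).astype('float32')
x_rgb = preprocess_input(x_rgb)

# Predict in batches instead of one image at a time; rows 0-59999 come from
# x_train and rows 60000-69999 from x_test
matrix = model.predict(x_rgb, batch_size=256)

print(matrix.shape)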
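
The remaining step of the exercise, training an SVM on the feature matrix and reporting test accuracy, could look roughly like the sketch below. To keep it self-contained it uses synthetic feature vectors instead of real DenseNet features, so the sizes, the class layout, and the resulting accuracy are illustrative assumptions only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for extracted features: 3 classes of 16-dimensional
# vectors scattered around class-specific centers.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 100, 16, 3
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.concatenate([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Shuffle, then hold out 20% of the rows as a test set.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit an RBF-kernel SVM on the training features and score it on the test set.
clf = SVC(kernel='rbf', gamma='scale')
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print('test accuracy: %.3f' % accuracy)
```

For the real data, the feature rows belonging to `x_train` and `x_test` would take the place of `X_train` and `X_test`, with `y_train` and `y_test` from `fashion_mnist.load_data()` as labels. Fitting a kernel SVM on all 60,000 training rows can be slow, so starting with a subsample is a reasonable first step.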

#### Changing the prediction task (switching out the last layer)

Another way to achieve roughly the same thing is to remove the last layer and insert a new one with a different number of outputs. Jason describes this under 'Pre-Trained Model as Feature Extractor in Model'.

> **Ex. 6.2.2**: Do the same as above, but by following Jason's example under 'Pre-Trained Model as Feature Extractor in Model'.
> Compare the result to the accuracy you got in 6.2.1.
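
The idea behind switching out the last layer can be sketched without Keras: the pretrained base is frozen, so its features are fixed, and only the weights of a new softmax output layer are learned. The NumPy example below trains such a layer with plain gradient descent. The synthetic features, the layer sizes, and the learning rate are assumptions chosen for illustration, not Jason's code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 300, 16, 3

# Synthetic stand-in for the output of a frozen pretrained base.
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
y = rng.integers(0, n_classes, size=n_samples)
features = centers[y] + rng.normal(size=(n_samples, n_features))
features = (features - features.mean(axis=0)) / features.std(axis=0)
one_hot = np.eye(n_classes)[y]

# The new output layer: these are the only parameters that get updated.
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)

learning_rate = 0.1
for step in range(300):
    probs = softmax(features @ W + b)
    grad_logits = (probs - one_hot) / n_samples  # gradient of the mean cross-entropy
    W -= learning_rate * features.T @ grad_logits
    b -= learning_rate * grad_logits.sum(axis=0)

train_accuracy = (softmax(features @ W + b).argmax(axis=1) == y).mean()
print('train accuracy: %.3f' % train_accuracy)
```

In Keras, the corresponding steps are to set `layer.trainable = False` on the layers of the pretrained base, attach a new `Dense(10, activation='softmax')` layer to its output, and then compile and fit the combined model as usual.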