##### Copyright 2019 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Discriminative Layer Training

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/addons/tutorials/image_ops"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/addons/blob/master/docs/tutorials/_template.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/addons/blob/master/docs/tutorials/_template.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/docs/tutorials/image_ops.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>


## Overview

This tutorial demonstrates how to implement discriminative layer training and how it can help in transfer learning.

In this example, we fine-tune a ResNet50 pretrained on ImageNet to classify a subset of the CIFAR-100 dataset that contains only tanks and trains.

The goal is to show that discriminative layer training can speed up training. The intuition is that lower layers learn more generalizable features and should be preserved, while higher layers are task-specific. Setting a lower learning rate for the lower layers helps preserve general features for the higher layers to use and helps prevent overfitting.



## Setup

In [0]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf

# Training is much faster on a GPU, but you can still run this on a CPU.
tf.config.list_physical_devices('GPU')

In [0]:
!pip install --no-deps tensorflow-addons~=0.7
!pip install typeguard

# The discriminative wrapper is not available in the current tfa release,
# so we clone a repository that contains it.
!git clone https://github.com/hyang0129/addons

In [0]:
# Temporary workaround to get the import working.
# This will be changed or removed once the wrapper can be imported from the main tfa modules.

import shutil 

shutil.copy("addons/tensorflow_addons/optimizers/discriminative_layer_training.py", "discriminative_layer_training.py")

from discriminative_layer_training import DiscriminativeWrapper

## Prepare Data

First, we prepare the dataset. We download CIFAR-100 and keep only the examples labeled 85 or 90 (tanks and trains).

In [0]:
from skimage import io 
import numpy as np 


train, test = tf.keras.datasets.cifar100.load_data()

# Find the tanks and trains and filter down the dataset.
train_tanksandtrains = np.isin(train[1], [85, 90]).flatten()

train_x = train[0][train_tanksandtrains]
train_y = train[1][train_tanksandtrains]
# Label is 1 if the image is a tank (class 85), else 0 (train, class 90).
train_y = (train_y == 85) * 1

# do the same for test dataset
test_tanksandtrains = np.isin(test[1], [85, 90]).flatten()

test_x = test[0][test_tanksandtrains] 
test_y = test[1][test_tanksandtrains] 
test_y = (test_y == 85) * 1


# Print the label of the first training example (1 = tank, 0 = train) and show the image.
print(train_y[0])
io.imshow(train_x[0])
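
The filter-and-relabel step above can be easier to follow on a tiny example. The sketch below mirrors the same logic with plain Python lists instead of NumPy arrays; the toy labels and image names are made up for illustration and are not taken from CIFAR-100.

```python
# Toy labels standing in for CIFAR-100 fine labels (values are made up).
labels = [3, 85, 90, 12, 85, 90, 90]
images = ["img0", "img1", "img2", "img3", "img4", "img5", "img6"]

# Class ids used in the tutorial: 85 (tank) and 90 (train).
TANK, TRAIN = 85, 90

# Boolean mask: True where the label is one of the two classes we keep.
mask = [label in (TANK, TRAIN) for label in labels]

# Keep only the selected examples.
kept_images = [img for img, keep in zip(images, mask) if keep]
kept_labels = [label for label, keep in zip(labels, mask) if keep]

# Relabel to a binary target: 1 for tank, 0 for train.
binary_labels = [int(label == TANK) for label in kept_labels]

print(kept_images)
print(binary_labels)
```

In the tutorial code, `np.isin(...)` plays the role of `mask`, and `(train_y == 85) * 1` plays the role of the relabeling step.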


We also use data augmentation because the training set is very small (1,000 images).

In [0]:

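# Create a data generator for augmentation.
# Note: the featurewise_* options only take effect after datagen.fit(...) has
# computed the dataset statistics; without that call Keras warns and skips them.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)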

# We only have 1,000 training images, so we limit the steps per epoch to the number of full batches.
epochs = 10 
steps = 1000//64
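
As a quick check of the step count, the arithmetic can be worked through directly. This is only a sketch: the batch size of 64 is the one passed to `datagen.flow` in the training cells, and the variable names are chosen here for illustration.

```python
num_images = 1000   # tanks + trains in the CIFAR-100 training split
batch_size = 64     # batch size passed to datagen.flow in the training cells

# Floor division drops the final partial batch.
steps = num_images // batch_size
images_per_epoch = steps * batch_size
leftover = num_images - images_per_epoch

print(steps, images_per_epoch, leftover)
```

So each epoch covers 15 full batches, or 960 augmented images.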


## Define Model

This is our model function. It is a simple ResNet50 with a pooling layer as the output, which feeds into our classifier head. We will initialize the model once for regular training, then reinitialize it for discriminative layer training.

In [0]:
# Build a pretrained ResNet50 with a custom classification head.
def get_model():
  model = tf.keras.Sequential()
  model.add(tf.keras.applications.resnet50.ResNet50(weights='imagenet',
                                                  input_shape=(32, 32, 3),
                                                  include_top=False,
                                                  pooling='avg'))
  model.add(tf.keras.layers.Dense(1))
  model.add(tf.keras.layers.Activation('sigmoid'))
  return model

example_model = get_model()
example_model.summary()

## Training Comparison

This is regular training. We assign a single learning rate to the whole model, then train for 10 epochs. However, because Adam is a momentum-based optimizer, it can pick up on irrelevant low-level features and overfit the data.

In [0]:
# Get a fresh copy of the model before any training.
model = get_model()

# Define the optimizer and compile.
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=opt)

# Fit for 10 epochs.
model.fit(datagen.flow(train_x, train_y, batch_size=64),
          steps_per_epoch=steps,
          epochs=epochs,
          validation_data=(test_x, test_y))

Now we attempt to correct that behavior. The lower-level features should not need to change much, so we assign the ResNet50 base a learning rate multiplier of 0.1. With an overall learning rate of 0.001, the ResNet50 layers learn at 0.1 * 0.001 = 0.0001, which is slower than the head.
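
The effective learning rates can be tabulated with a few lines of Python. This is only an illustrative sketch: the group names are hypothetical, and it assumes that a layer without an explicit multiplier behaves as if its multiplier were 1.0, which matches the description of the head above but is not verified against the wrapper's code.

```python
base_lr = 0.001

# Hypothetical layer groups for illustration. Only the ResNet50 base gets a
# multiplier in the tutorial; the head is ASSUMED to default to 1.0.
lr_mults = {"resnet50_base": 0.1, "dense_head": 1.0}

# Effective learning rate = overall learning rate * per-group multiplier.
effective_lr = {name: base_lr * mult for name, mult in lr_mults.items()}

for name, lr in effective_lr.items():
    print(f"{name}: {lr:.4f}")
```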

In [0]:
# Get a fresh copy of the model before any training.
model = get_model()

# Intuitively, the lower layers contain general features such as shapes.
# These features should not need to change drastically for the new task.

# Assign layer 0, which is the ResNet50 model, an lr_mult of 0.1 to reduce its learning rate.
model.layers[0].lr_mult = 0.1

# Wrap the Adam class itself (do not pass an instance).
# Any other kwargs passed to the wrapper go straight to the base_optimizer,
# because the wrapper creates a copy of the base_optimizer
# for each unique learning rate multiplier.
opt = DiscriminativeWrapper(base_optimizer=tf.keras.optimizers.Adam,
                            model=model,
                            learning_rate=0.001)

# Compile in the same way as a regular model.
model.compile(loss='binary_crossentropy',
              optimizer=opt)

# Fit in the same way as a regular model.
model.fit(datagen.flow(train_x, train_y, batch_size=64),
          steps_per_epoch=steps,
          epochs=epochs,
          validation_data=(test_x, test_y))
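
To build some intuition for what a per-layer multiplier does, here is a minimal sketch on a toy problem. It is not the wrapper's implementation: it uses plain SGD instead of Adam, and the function names, parameter names, and toy loss are all hypothetical. Each parameter group is simply updated with the base learning rate times its own multiplier.

```python
def sgd_with_multipliers(params, grads_fn, base_lr, lr_mults, steps):
    """Plain SGD where each parameter group uses base_lr * its multiplier."""
    params = dict(params)  # copy so the caller's starting values are untouched
    for _ in range(steps):
        grads = grads_fn(params)
        for name in params:
            # Groups without an explicit multiplier use 1.0 (an assumption).
            params[name] -= base_lr * lr_mults.get(name, 1.0) * grads[name]
    return params


def grads_fn(params):
    # Gradient of the toy loss sum((p - 1)^2) with respect to each parameter.
    return {name: 2.0 * (value - 1.0) for name, value in params.items()}


start = {"backbone": 0.0, "head": 0.0}
end = sgd_with_multipliers(start, grads_fn, base_lr=0.1,
                           lr_mults={"backbone": 0.1}, steps=10)

# How far each group moved from its starting value.
moved = {name: abs(end[name] - start[name]) for name in start}
print(moved)
```

With the same gradients, the group with the 0.1 multiplier moves far less than the other group over the same number of steps, which is the preservation effect the tutorial relies on for the pretrained layers.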

Based on the results, you can see that slowing down the lower layers can help in transfer learning. This method requires more hyperparameter tuning, but it can save you a lot of time on transfer learning tasks. By lowering the learning rate for the lower layers, you preserve the more generalizable features and allow your model to generalize better.

I hope you find this tutorial helpful and discover great ways to apply transfer learning and discriminative layer training.