# Load the pre-trained BiT model

### Where to find the models

Models that output image features (pre-logit layer) can be found at
* `https://tfhub.dev/google/bit/m-{archi, e.g. r50x1}/1`

whereas models that return outputs in the ImageNet (ILSVRC-2012) label space can be found at

* `https://tfhub.dev/google/bit/m-{archi, e.g. r50x1}/ilsvrc2012_classification/1`

The available architectures are R50x1, R50x3, R101x1, R101x3 and R152x4. In the URLs, the architecture names are written in lowercase (e.g. `r50x1`).
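As a small illustration of the two URL patterns above, the hypothetical helper below (`bit_m_url` is not part of TF Hub) lowercases the architecture name and fills in the feature-vector or ImageNet-classifier handle. It only builds strings and assumes the `/1` version suffix shown in the patterns.

```python
# Hypothetical helper (not a TF Hub API): build a BiT-M handle from an architecture name.
BIT_ARCHITECTURES = ("R50x1", "R50x3", "R101x1", "R101x3", "R152x4")


def bit_m_url(architecture: str, classification: bool = False) -> str:
    """Return the TF Hub URL for a BiT-M model, following the patterns listed above."""
    if architecture not in BIT_ARCHITECTURES:
        raise ValueError(f"Unknown BiT architecture: {architecture!r}")
    arch = architecture.lower()  # the URLs use lowercase architecture names
    if classification:
        # Model whose outputs are in the ImageNet (ILSVRC-2012) label space
        return f"https://tfhub.dev/google/bit/m-{arch}/ilsvrc2012_classification/1"
    # Feature-extractor model (pre-logit outputs)
    return f"https://tfhub.dev/google/bit/m-{arch}/1"


print(bit_m_url("R50x1"))
print(bit_m_url("R101x3", classification=True))
```

The first call yields the same handle that is loaded in the next cell.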

In [1]:
import numpy as np
import tensorflow_hub
# Load model into KerasLayer
module = tensorflow_hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1")

In [2]:
# Hyperparameters
batch_size = 64 # Training batch size
num_classes = 15  # Classes in dataset
num_epochs = 40   # Epochs for training   

#### Add new head to the BiT model

Since we want to use BiT on a new dataset (not the one it was trained on), we need to replace the final layer with one that has the correct number of output classes. This final layer is called the head.

Note that it is important to **initialise the new head to all zeros**.
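One way to see what the zero initialisation does is this minimal NumPy sketch. It is not the notebook's code; the 2048-dimensional features and the labels are made-up values. With a zero kernel and Keras's default zero bias, every logit is 0. The softmax is therefore uniform, and the initial cross-entropy equals exactly log(num_classes). The gradient flowing back into the backbone features on the first step is also zero, because it is multiplied by the zero kernel, so the untrained head does not push noisy gradients into the pre-trained weights.

```python
import numpy as np

n_classes = 15                         # matches num_classes in this notebook
feature_dim = 2048                     # illustrative embedding size (assumption)
rng = np.random.default_rng(0)

features = rng.normal(size=(4, feature_dim))   # made-up batch of 4 BiT embeddings
labels = np.array([0, 3, 7, 14])               # made-up integer labels

kernel = np.zeros((feature_dim, n_classes))    # kernel_initializer='zeros'
bias = np.zeros(n_classes)                     # Dense bias is zero-initialised by default
logits = features @ kernel + bias              # every logit is 0

# Softmax cross-entropy computed on the logits
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Gradient of the mean loss w.r.t. the features: (probs - one_hot) @ kernel.T / batch
one_hot = np.eye(n_classes)[labels]
grad_features = (probs - one_hot) @ kernel.T / len(labels)

print(loss, np.log(n_classes))         # both ≈ 2.708
print(np.abs(grad_features).max())     # 0.0
```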

In [3]:
import tensorflow as tf

class MyBiTModel(tf.keras.Model):
  """BiT with a new head."""

  def __init__(self, num_classes, module):
    super().__init__()

    self.num_classes = num_classes
    self.head = tf.keras.layers.Dense(num_classes, kernel_initializer='zeros')
    self.bit_model = module
  
  def call(self, images):
    # No need to cut head off since we are using feature extractor model
    bit_embedding = self.bit_model(images)
    return self.head(bit_embedding)

model = MyBiTModel(num_classes=num_classes, module=module)

### Data and preprocessing

In [4]:
from google.colab import drive
drive.mount('/content/gdrive')  # Mount Google Drive in Colab
%cd gdrive/My Drive/Colab Notebooks

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
/content/gdrive/My Drive/Colab Notebooks


If the last layer (Dense) uses `activation=None`, it outputs logits.

Use `class_mode='sparse'` so that the generator yields integer class labels.

The loss should then be `SparseCategoricalCrossentropy(from_logits=True)`.


---

If the last layer (Dense) uses `activation='softmax'`, it outputs probabilities.

Use `class_mode='categorical'` so that the generator yields one-hot labels.

The loss should then be `'categorical_crossentropy'`.
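The two set-ups above compute the same quantity. Only the label format (integers vs one-hot) and the place where the softmax is applied differ. A minimal NumPy check with made-up logits for 3 classes:

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])        # Dense output with activation=None (made-up)
sparse_labels = np.array([0, 2])            # what class_mode='sparse' yields
one_hot_labels = np.eye(3)[sparse_labels]   # what class_mode='categorical' yields

# Sparse cross-entropy on logits (from_logits=True): the softmax happens inside the loss
probs = softmax(logits)
sparse_ce = -np.log(probs[np.arange(len(sparse_labels)), sparse_labels]).mean()

# Categorical cross-entropy on probabilities (activation='softmax' inside the model)
categorical_ce = -(one_hot_labels * np.log(probs)).sum(axis=1).mean()

print(sparse_ce, categorical_ce)  # identical values
```

Mixing the pairs, e.g. integer labels with `'categorical_crossentropy'`, is a label-shape mismatch rather than a different loss.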

In [5]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator


# Augment and rescale the training images; only rescale the validation images
train_datagen = ImageDataGenerator(horizontal_flip=True, brightness_range=[0.5, 1.5],
                                   zoom_range=[0.8, 1], rescale=1/255)

train_generator = train_datagen.flow_from_directory('hw5_data/train/',
                                                    target_size=(256, 256),
                                                    batch_size=batch_size,
                                                    class_mode='sparse',  # integer labels
                                                    shuffle=True)

# The validation set is read from a separate directory, not split off the training data
validation_datagen = ImageDataGenerator(rescale=1/255)

validation_generator = validation_datagen.flow_from_directory('hw5_data/test/',
                                                              target_size=(256, 256),
                                                              batch_size=batch_size,
                                                              class_mode='sparse')

Found 1500 images belonging to 15 classes.
Found 150 images belonging to 15 classes.


Using TensorFlow backend.


**Hyperparameter heuristic details**

In BiT-HyperRule, we use a vanilla SGD optimiser with an initial learning rate of 0.003, momentum 0.9 and batch size 512. We decay the learning rate by a factor of 10 at 30%, 60% and 90% of the training steps. 
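The decay rule can be written as a plain function of the step number. This is a sketch, not the BiT reference code. It assumes the linear batch-size scaling used in this notebook's optimiser cell (`0.003 * batch_size / 512`) and mirrors the Keras `PiecewiseConstantDecay` convention that the rate drops once the step is strictly past a boundary. `bit_learning_rate` is a hypothetical name.

```python
def bit_learning_rate(step, total_steps, batch_size, base_lr=0.003, base_batch_size=512):
    """Learning rate at `step` under the decay rule described above (sketch)."""
    lr = base_lr * batch_size / base_batch_size                   # linear batch-size scaling
    boundaries = [int(total_steps * f) for f in (0.3, 0.6, 0.9)]  # 30%, 60%, 90% of training
    num_decays = sum(step > b for b in boundaries)                # boundaries already passed
    return lr * 0.1 ** num_decays                                 # divide by 10 per boundary


for step in (0, 450, 750, 950):
    print(step, bit_learning_rate(step, total_steps=1000, batch_size=64))
```

With batch size 64 the rate starts at 0.000375 and ends at 0.000375 × 10⁻³ after the last boundary.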

As data preprocessing, we resize the image, take a random crop, and then do a random horizontal flip (details in table below). We do random crops and horizontal flips for all tasks except those where such actions destroy label semantics. E.g. we don’t apply random crops to counting tasks, or random horizontal flip to tasks where we’re meant to predict the orientation of an object.
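The crop-then-flip step can be sketched in NumPy as below. This is illustrative only. This notebook itself relies on `ImageDataGenerator` (`horizontal_flip`, `zoom_range`) and does not take an explicit random crop. The resize step is omitted, and the 256/224 sizes are arbitrary. The hypothetical `flip` flag stands in for tasks where flipping would destroy label semantics.

```python
import numpy as np


def random_crop_and_flip(image, crop_size, rng, flip=True):
    """Take a random crop_size x crop_size crop, then flip it left-right with probability 0.5."""
    h, w = image.shape[:2]
    if crop_size > h or crop_size > w:
        raise ValueError("crop_size is larger than the image")
    top = rng.integers(0, h - crop_size + 1)     # random top-left corner
    left = rng.integers(0, w - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size]
    if flip and rng.random() < 0.5:
        crop = crop[:, ::-1]                     # horizontal flip (reverse the width axis)
    return crop


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))                # made-up "resized" image
out = random_crop_and_flip(image, 224, rng)
print(out.shape)                                 # (224, 224, 3)
```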

In [6]:
# Define the optimiser and learning-rate schedule

# Scale BiT-HyperRule's base learning rate (0.003 at batch size 512) linearly to our batch size
lr = 0.003 * batch_size / 512

# Decay the learning rate by a factor of 10 at 30%, 60% and 90% of the training steps.
step_size_train = len(train_generator)  # batches per epoch, including the final partial batch
total_step = num_epochs * step_size_train

lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[int(total_step * 0.3), int(total_step * 0.6), int(total_step * 0.9)],
    values=[lr, lr * 0.1, lr * 0.01, lr * 0.001])

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

In [7]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

# Model.fit accepts generators directly; fit_generator is deprecated
history = model.fit(train_generator, validation_data=validation_generator,
                    epochs=num_epochs)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40
