#  Using Keras models without pre-trained weights

- So far, we've seen the effectiveness of models pre-trained on ImageNet, but what if we specify **weights=None** when we load a model?

- Well, we'll instead be randomly initializing the weights, as if we had built a model on our own and were starting from scratch.

- There are few situations where this might even be a potential use case - basically, when you have data that is very different from the original data. 

- However, given the large size of the ImageNet dataset (remember, the full dataset has over 14 million images, and the subset used for these pre-trained weights has roughly 1.2 million training images across 1,000 classes!), it's highly unlikely this is really the case - it will almost always make the most sense to start with ImageNet pre-trained weights, and only fine-tune from there.


Let's check out what happens when we use a pre-made model but set the weights to **None**.
- This means no training has occurred yet!

In [1]:
# VGG without Pre-trained weights. Set weights=None.
from keras.applications.vgg16 import VGG16, decode_predictions

from glob import glob
import numpy as np

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input


image_paths = glob('images/*.jpg')
img_path = image_paths[2]

img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
plt.imshow(img)

### Load VGG16 model, without pre-trained weights. Get wacky predictions.
model = VGG16(weights=None)
predictions = model.predict(x)
print('Predicted:', decode_predictions(predictions, top=3)[0])

Using TensorFlow backend.


Predicted: [('n02892767', 'brassiere', 0.0022033758), ('n03794056', 'mousetrap', 0.0019655433), ('n01734418', 'king_snake', 0.0017350846)]
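
These scores are close to what pure chance would give. With 1,000 classes, a uniform guess assigns 0.001 to every class, and the top score above is only about 0.0022. As a rough illustration, pushing small random values through a softmax gives a similarly flat distribution. This is a sketch using only the Python standard library; the random logits are a hypothetical stand-in, not VGG16's actual untrained outputs.

```python
import math
import random

random.seed(0)
num_classes = 1000

# Hypothetical stand-in for the raw outputs of an untrained network
# (NOT VGG16's real initialization, just small random numbers).
logits = [random.gauss(0.0, 0.5) for _ in range(num_classes)]


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
uniform = 1.0 / num_classes
top3 = sorted(probs, reverse=True)[:3]

print("uniform probability:", uniform)
print("top-3 probabilities:", top3)
```

The top probabilities stay within a small multiple of the uniform value, which is the same pattern as the "wacky" predictions above.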


# Lab: Transfer Learning

- Train a network with ImageNet pre-trained weights as a base
    - but with additional network layers of our own added on... 
    
- See the difference between using frozen weights and training on all layers.

### GoogLeNet 

#### Inception architecture 

```python
from keras.applications.inception_v3 import InceptionV3
input_size = 139
weights_flag = 'imagenet'  # set to None to start from random weights instead

# Using Inception with ImageNet pre-trained weights
inception = InceptionV3(weights=weights_flag, include_top=False,
                        input_shape=(input_size, input_size, 3))
```


![Inception architecture](inception.PNG)

#### Batch normalization 

- https://keras.io/api/layers/normalization_layers/batch_normalization
- Paper: https://arxiv.org/abs/1502.03167


### Pre-trained with frozen weights

- Frozen weights are often used when only fine-tuning the model. Backpropagation and weight updates will not be applied to any frozen layers during training.

- If we have an ImageNet pre-trained model, most of the network is likely applicable to our situation, so we may only need to cut off the top fully-connected layer, freeze all other layers, and just add one or more layers at the end that are not frozen to perform some fine-tuning.

- There is also the option of not freezing the weights, which will start the model from the ImageNet pre-trained weights (if applicable) and then perform further training on all layers from there.

- Freezing the weights also reduces memory usage and speeds up training.
    - For larger networks such as VGG, backpropagation and weight updates across all layers require substantially more memory and time than updating only a small portion of the layers.


In [2]:
from keras.applications.inception_v3 import InceptionV3
from keras.utils import plot_model

freeze_flag = True
weights_flag = 'imagenet' 
preprocess_flag = True 

inception = InceptionV3(weights=weights_flag, include_top=False, input_shape=(139,139,3))
# plot_model(inception)

# Check out layers of the model.
inception.summary()

# for idx, layer in enumerate(inception.layers):
#     print("{:4}: {}".format(idx, layer)) 
#     print(inception.layers[-5].activation) 
  
if freeze_flag:
    # Freeze every layer so its weights are not updated during training
    for layer in inception.layers:
        layer.trainable = False
        

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 139, 139, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 69, 69, 32)   864         input_2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 69, 69, 32)   96          conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 69, 69, 32)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (
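
The first few rows of this summary can be checked by hand. The sketch below assumes that the first convolution uses a 3x3 kernel with stride 2, no padding and no bias term, and that the batch normalization layer has no learned scale. These settings are assumptions chosen because they reproduce the numbers in the summary, and the helper function names are our own.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias):
    """Number of weights in a 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if use_bias else 0)


def batchnorm_params(channels, scale=True, center=True):
    """Moving mean and variance per channel, plus optional gamma and beta."""
    return channels * (2 + int(scale) + int(center))


out_size = conv_output_size(139, kernel=3, stride=2)
conv_count = conv2d_params(3, 3, in_channels=3, filters=32, use_bias=False)
bn_count = batchnorm_params(32, scale=False)

print(out_size)    # matches the 69 x 69 output shape of conv2d_1
print(conv_count)  # matches the 864 parameters of conv2d_1
print(bn_count)    # matches the 96 parameters of batch_normalization_1
```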

### Dropping layers

#### inception.layers.pop()

A normal Inception network ends as follows:
- The last two layers are a global average pooling layer and a fully-connected "Dense" layer.
- However, since our model was constructed with $ InceptionV3(..., include\_top=False) $, both of these layers are dropped.

To drop additional layers, one would use $model.layers.pop()$, which works from the end of the model backwards.

```python
model.layers.pop()
```
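
Note that in recent Keras versions, `model.layers` returns a fresh list, so calling `pop()` on it may leave the model itself unchanged. A common alternative is to build a new model that ends at an earlier layer. The sketch below assumes TensorFlow 2.x with its bundled Keras, and uses a tiny hypothetical network (the layer names `hidden` and `classifier` are our own) rather than Inception.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Tiny hypothetical model standing in for a larger network.
inputs = keras.Input(shape=(8,))
x = layers.Dense(16, activation="relu", name="hidden")(inputs)
outputs = layers.Dense(4, activation="softmax", name="classifier")(x)
full_model = keras.Model(inputs=inputs, outputs=outputs)

# "Drop" the last layer by ending a new model at the layer before it.
truncated = keras.Model(inputs=full_model.input,
                        outputs=full_model.get_layer("hidden").output)

print(full_model.output_shape)
print(truncated.output_shape)
```

The truncated model shares its weights with the original, so any pre-trained values in the kept layers are preserved.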

### Adding new layers

- Keras's Sequential model is designed for simplicity. Here we use the Model API instead, which functions a little differently.
    - Instead of using model.add(), we explicitly tell each new layer which previous layer it takes its input from.
    - This is useful for advanced features, e.g. skip connections, which are used heavily in ResNet.
    
```python

from keras.layers import Dropout

# Attach a new dropout layer x, with its input coming from a layer with the variable name inp
x = Dropout(0.2)(inp)


```
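
To make the skip-connection idea concrete, here is a minimal sketch of the Model API on a small hypothetical network. It assumes TensorFlow 2.x with its bundled Keras, and the layer sizes are arbitrary choices for illustration. It is not how ResNet itself is defined.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer is called on the tensor it should receive as input.
inp = keras.Input(shape=(16,))
x = layers.Dense(16, activation="relu")(inp)
x = layers.Dropout(0.2)(x)

# Skip connection: add the original input back onto the transformed tensor.
# Both tensors must have the same shape, which is why Dense uses 16 units.
x = layers.Add()([x, inp])

out = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inp, outputs=out)

print(model.output_shape)
```

A Sequential model cannot express this, because the input tensor is used in two places.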

**Let's use CIFAR-10 dataset, which consists of 60,000 32x32 images of 10 classes.**
- Use Keras's Input function
- Re-size the images up to the **input_size** specified earlier (139x139).


In [4]:
from keras.layers import Input, Lambda
from keras.layers import Dense, GlobalAveragePooling2D
import tensorflow as tf
from keras.models import Model

input_size = 139
cifar_input = Input(shape=(32, 32, 3))

# Re-sizes the input with Keras's Lambda layer & attaches it to cifar_input
# (tf.image.resize_images is the TensorFlow 1.x name; TensorFlow 2.x uses tf.image.resize)
resized_input = Lambda(lambda x: tf.image.resize_images(x, (input_size, input_size)))(cifar_input)

# Feeds the re-sized input into Inception model
inp = inception(resized_input)

## Setting `include_top=False` removed both GlobalAveragePool and Dense layers
## Add it back here, and make sure to connect it to the end of Inception
x = GlobalAveragePooling2D()(inp)
x = Dense(512, activation = 'relu')(x)
predictions = Dense(10, activation = 'softmax')(x)

# NOW use the actual Model API to create the full model.
model = Model(inputs=cifar_input, outputs=predictions)

# Compile the model
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Check the summary of this new model to confirm the architecture
# Notice how adding layers before InceptionV3 and appending layers after it
# made InceptionV3 condense down into one line in the summary
# If we instead built on the Inception model's own input (inception.input), the summary
# would show all of its layers like before.
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
lambda_2 (Lambda)            (None, 139, 139, 3)       0         
_________________________________________________________________
inception_v3 (Model)         (None, 3, 3, 2048)        21802784  
_________________________________________________________________
global_average_pooling2d_1 ( (None, 2048)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               1049088   
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130      
Total params: 22,857,002
Trainable params: 1,054,218
Non-trainable params: 21,802,784
________________________________________________________
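
The parameter counts in this summary can be verified with a little arithmetic. A Dense layer has one weight per input-output pair plus one bias per output. The sketch below uses only the numbers printed above; the helper name `dense_params` is our own.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


inception_params = 21802784          # frozen InceptionV3 base, from the summary
dense_1 = dense_params(2048, 512)    # pooled Inception features -> 512 units
dense_2 = dense_params(512, 10)      # 512 units -> 10 CIFAR-10 classes

trainable = dense_1 + dense_2
total = inception_params + trainable

print(dense_1, dense_2)   # 1049088 5130
print(trainable)          # 1054218
print(total)              # 22857002
```

Only about 1 million of the nearly 23 million parameters are trainable, which is why training with frozen weights is comparatively fast.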

### Keras Callbacks

Keras callbacks allow you to gather and store additional information during training, such as the best model, or even stop training early if the validation accuracy has stopped improving. These methods can help to avoid overfitting, since the best model is kept, or training ends, before validation performance degrades.

There are two key callbacks to mention here, **ModelCheckpoint** and **EarlyStopping**. As the names suggest, model checkpoint saves the best model so far based on a given metric, while early stopping will end training before the specified number of epochs if the chosen metric has not improved for a given number of epochs.

**To set these callbacks:**

```python
from keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint(filepath=save_path, monitor='val_loss', save_best_only=True)
```

This would save a model to the specified $save\_path$, monitoring validation loss and only saving when the model is the best seen so far.

If *save_best_only* is set to False, a version of the model is saved after every single epoch.
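
The effect of *save_best_only* can be illustrated with a simplified sketch in plain Python. This is not Keras's actual implementation; the function name and the validation losses are hypothetical.

```python
def epochs_saved(val_losses, save_best_only=True):
    """Return the 1-based epochs at which a checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        # Save every epoch, or only when the loss beats the best so far.
        if not save_best_only or loss < best:
            saved.append(epoch)
        best = min(best, loss)
    return saved


# Hypothetical validation losses for five epochs.
losses = [0.9, 0.7, 0.75, 0.6, 0.65]

print(epochs_saved(losses))                        # [1, 2, 4]
print(epochs_saved(losses, save_best_only=False))  # [1, 2, 3, 4, 5]
```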

```python
from keras.callbacks import EarlyStopping
stopper = EarlyStopping(monitor='val_acc', min_delta=0.0003, patience=5)
```

This will monitor validation accuracy, and if it has not improved by more than 0.0003 over the previous best validation accuracy for 5 epochs, training will end early. (In newer Keras versions this metric is named 'val_accuracy' rather than 'val_acc'.)
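
The stopping rule can be traced by hand with a simplified sketch in plain Python. This is an illustration of the rule as described here, not Keras's actual implementation; the function name and the accuracy history are hypothetical.

```python
def early_stopping_epoch(val_accuracies, min_delta=0.0003, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc - min_delta > best:
            # Improved by more than min_delta: record it and reset the counter.
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies: the gains after epoch 3 are too small to count.
history = [0.60, 0.65, 0.68, 0.6801, 0.6802, 0.679, 0.6801, 0.6800, 0.70]

print(early_stopping_epoch(history))            # 8
print(early_stopping_epoch([0.1, 0.2, 0.3]))    # None
```

In this example training stops at epoch 8, five epochs after the last meaningful improvement, so the jump to 0.70 at epoch 9 is never reached. This is the trade-off when choosing *patience*.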

Feed these callbacks into $ fit() $ when training the model (along with all other relevant data to feed into fit):


```python
model.fit(callbacks=[checkpoint, stopper])
```

#### Check out Keras's ImageDataGenerator docs: https://faroit.com/keras-docs/2.0.9/preprocessing/image

- Additional image augmentation can also be applied through this class.
- We are skipping augmentation in the code below, but explore it in the upcoming project.

# GPU time

In [5]:
from sklearn.utils import shuffle
from sklearn.preprocessing import LabelBinarizer
from keras.datasets import cifar10

from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input

(X_train, y_train), (X_val, y_val) = cifar10.load_data()

label_binarizer = LabelBinarizer()
y_one_hot_train = label_binarizer.fit_transform(y_train)
# Reuse the encoding fitted on the training labels for the validation labels
y_one_hot_val = label_binarizer.transform(y_val)

X_train, y_one_hot_train = shuffle(X_train, y_one_hot_train)
X_val, y_one_hot_val = shuffle(X_val, y_one_hot_val)

# Use the first 10,000 images for speed reasons
X_train = X_train[:10000]
y_one_hot_train = y_one_hot_train[:10000]
X_val = X_val[:2000]
y_one_hot_val = y_one_hot_val[:2000]

# Use a generator to pre-process images for ImageNet
if preprocess_flag:
    datagen     = ImageDataGenerator(preprocessing_function=preprocess_input)
    val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
else:
    datagen     = ImageDataGenerator()
    val_datagen = ImageDataGenerator()
    
batch_size = 128
epochs = 10
# Note: we aren't using callbacks here since we are only using 10 epochs to conserve GPU time
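# Round up so the step counts are whole numbers that cover the final partial batch
import math
train_steps = math.ceil(len(X_train) / batch_size)
val_steps = math.ceil(len(X_val) / batch_size)
model.fit_generator(datagen.flow(X_train, y_one_hot_train, batch_size=batch_size),
                    steps_per_epoch=train_steps, epochs=epochs, verbose=1,
                    validation_data=val_datagen.flow(X_val, y_one_hot_val, batch_size=batch_size),
                    validation_steps=val_steps)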



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fdd84848240>
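
The step counts passed to the training call follow from the dataset and batch sizes used above. Since neither 10,000 nor 2,000 divides evenly by 128, the last batch of each pass is smaller than the rest. A quick check in plain Python (the variable names are our own):

```python
import math

batch_size = 128
num_train = 10000
num_val = 2000

# Number of batches needed to see every image once.
train_steps = math.ceil(num_train / batch_size)
val_steps = math.ceil(num_val / batch_size)

# Size of the final, partial batch in each pass.
last_train_batch = num_train - (train_steps - 1) * batch_size
last_val_batch = num_val - (val_steps - 1) * batch_size

print(train_steps, last_train_batch)  # 79 16
print(val_steps, last_val_batch)      # 16 80
```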

## Comparison

#### Test without frozen weights, or by training from scratch.

- If the majority of the model was frozen above, training speed is pretty quick.
- Next, check the training speed and final accuracy without freezing the weights.
    - Note that this can be fairly slow, so we're marking this as optional in order to conserve GPU time.
    - To see results, go back to the code cell where the flags are defined and set **freeze_flag=False**.
    - To completely train from scratch without ImageNet pre-trained weights, set **weights_flag=None**.
    - Then, go to Kernel > Restart & Run All.

Training Mode | Val Acc @ 1 epoch | Val Acc @ 5 epochs | Time per epoch
---- | :----: | :----: | ----:
Frozen weights | 65.5% | 70.3% | 50 seconds
Unfrozen weights | 50.6% | 71.6% | 142 seconds
No pre-trained weights | 19.2% | 39.2% | 142 seconds

From the above, we can see that the pre-trained model with frozen weights actually began converging the fastest (already at 65.5% after 1 epoch), while the model re-training from the pre-trained weights slightly edged it out after 5 epochs.

However, this does not tell the whole story - the training accuracy was substantially higher for the unfrozen weights model, nearing 87%. It actually began to overfit the data much more under this method. We would likely be able to counteract some of this issue by using data augmentation. On the flip side, the model using frozen weights could also have been improved by freezing only a portion of the weights; layers later in the network are likely more specific to ImageNet classes, as opposed to the simpler features extracted early in the network.
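
Freezing only a portion of the weights could look like the following minimal sketch. It assumes TensorFlow 2.x with its bundled Keras and uses a small hypothetical network in place of Inception. The layer names and the choice of freezing the first two layers are for illustration only.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small hypothetical network standing in for a pre-trained base model.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(16, activation="relu", name="early_1"),
    layers.Dense(16, activation="relu", name="early_2"),
    layers.Dense(16, activation="relu", name="late_1"),
    layers.Dense(4, activation="softmax", name="head"),
])

# Freeze only the earliest layers; the later ones remain trainable.
num_frozen = 2
for layer in model.layers[:num_frozen]:
    layer.trainable = False

# Compile after changing `trainable` so training respects the new settings.
model.compile(optimizer="adam", loss="categorical_crossentropy")

for layer in model.layers:
    print(layer.name, layer.trainable)
```

How many layers to freeze is a tuning choice. A reasonable starting point is to keep the early, generic feature extractors frozen and unfreeze progressively more of the later layers.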

### The Power of Transfer Learning
Comparing the last row of the table to the other two really shows the power of transfer learning. After five epochs, a model without ImageNet pre-training had only achieved 39.2% accuracy, compared to over 70% for the other two. As such, pre-training the network has saved substantial time, especially given the additional training time needed when the weights are not frozen.
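
The figures from the comparison table can be turned into total training times and accuracy gaps with a few lines of plain Python. The numbers below are copied from the table; the dictionary layout and variable names are our own.

```python
# (validation accuracy after 5 epochs in %, seconds per epoch), from the table
results = {
    "Frozen weights": (70.3, 50),
    "Unfrozen weights": (71.6, 142),
    "No pre-trained weights": (39.2, 142),
}

epochs = 5
total_time = {name: secs * epochs for name, (_, secs) in results.items()}

# How much slower an unfrozen epoch is than a frozen one.
slowdown = round(results["Unfrozen weights"][1] / results["Frozen weights"][1], 2)

# Accuracy gained from pre-training when all layers are trained.
gain = round(results["Unfrozen weights"][0] - results["No pre-trained weights"][0], 1)

print(total_time)  # frozen: 250 s, unfrozen and from scratch: 710 s each
print(slowdown)    # 2.84
print(gain)        # 32.4
```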

There is also evidence in the research literature that starting from ImageNet pre-trained weights can result in higher overall accuracy than training completely from scratch, even when using a substantially different dataset.