# Transfer Learning

---

Let’s load an Xception model, pretrained on ImageNet. We exclude the top of
the network by setting include_top=False. This excludes the global average pooling
layer and the dense output layer. We then add our own global average pooling layer
(feeding it the output of the base model), followed by a dense output layer with one
unit per class, using the softmax activation function. Finally, we wrap all this in a
Keras Model:
```python
base_model = tf.keras.applications.xception.Xception(weights="imagenet", include_top=False)
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
output = tf.keras.layers.Dense(n_classes, activation="softmax")(avg)
model = tf.keras.Model(inputs=base_model.input, outputs=output)
```
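To see what the new head receives, a quick shape check can help. The sketch below rebuilds the same layers with `weights=None` (an assumption made here only to skip the ImageNet download; the architecture is the same) and a hypothetical `n_classes = 5`:

```python
import tensorflow as tf

n_classes = 5  # hypothetical value, for illustration only

# weights=None builds the same architecture without downloading pretrained weights
base_model = tf.keras.applications.xception.Xception(weights=None, include_top=False)
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
output = tf.keras.layers.Dense(n_classes, activation="softmax")(avg)

# Convert to plain tuples so the shapes print the same way across Keras versions
feature_shape = tuple(base_model.output.shape)  # feature maps, spatial size unknown
pooled_shape = tuple(avg.shape)                 # one value per feature map
output_shape = tuple(output.shape)              # one probability per class
print(feature_shape, pooled_shape, output_shape)
```

The global average pooling layer collapses the two spatial dimensions, so the dense layer sees a flat vector of 2,048 features per image regardless of the input size.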
As explained in Chapter 11, it’s usually a good idea to freeze the weights of the
pretrained layers, at least at the beginning of training:
```python
for layer in base_model.layers:
    layer.trainable = False
```
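> Our model uses the base model’s layers directly rather than the base_model object itself, so the loop above sets trainable on each layer explicitly. In recent Keras versions, setting base_model.trainable=False also propagates to every layer inside base_model, so it should freeze the same layers.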
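
One way to confirm that freezing took effect is to count trainable and frozen parameters. The following is a minimal sketch: the `count_params` helper and the tiny convolutional stand-in for the pretrained base are both hypothetical, chosen so that the example runs in a moment without downloading anything.

```python
import tensorflow as tf

def count_params(model):
    """Return (trainable, frozen) parameter counts for a Keras model."""
    trainable = sum(w.numpy().size for w in model.trainable_weights)
    frozen = sum(w.numpy().size for w in model.non_trainable_weights)
    return trainable, frozen

# Tiny stand-in for the pretrained base model (hypothetical, not Xception)
inputs = tf.keras.Input(shape=(32, 32, 3))
features = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
base_model = tf.keras.Model(inputs=inputs, outputs=features)

# Same head as in the text, with 5 classes chosen arbitrarily
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
output = tf.keras.layers.Dense(5, activation="softmax")(avg)
model = tf.keras.Model(inputs=base_model.input, outputs=output)

# Freeze every layer of the base, exactly as in the loop above
for layer in base_model.layers:
    layer.trainable = False

trainable, frozen = count_params(model)
print(f"trainable: {trainable}, frozen: {frozen}")
```

Only the dense head's 45 parameters (8 × 5 weights plus 5 biases) remain trainable here, while the convolutional layer's 224 parameters are frozen. With the real Xception base, `model.summary()` reports the same split at the bottom of its output.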

Finally, we can compile the model and start training:
```python
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
history = model.fit(train_set, validation_data=valid_set, epochs=3)
```
> If you are running in Colab, make sure the runtime is using a GPU: select Runtime → “Change runtime type”, choose “GPU” in the “Hardware accelerator” drop-down menu, then click Save. It’s possible to train the model without a GPU, but it will be terribly slow (minutes per epoch, as opposed to seconds).
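
Before launching a long training run, it can be worth checking whether TensorFlow actually sees a GPU. A minimal check, which works in Colab or locally:

```python
import tensorflow as tf

# An empty list means TensorFlow found no GPU and training will run on the CPU
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
```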

---

After training the model for a few epochs, its validation accuracy should reach a bit
over 80% and then stop improving. This means that the top layers are now pretty
well trained, and we are ready to unfreeze some of the base model’s top layers, then
continue training. For example, let’s unfreeze layers 56 and above (that’s the start of
residual unit 7 out of 14, as you can see if you list the layer names):
```python
for layer in base_model.layers[56:]:
    layer.trainable = True
```
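To pick a cutoff index like 56, one option is to print the layer names next to their indices. The sketch below assumes `weights=None`, used only to avoid downloading the pretrained weights, since the list of layers does not depend on them:

```python
import tensorflow as tf

# weights=None skips the ImageNet download; the architecture is the same
base_model = tf.keras.applications.xception.Xception(weights=None, include_top=False)

# Pair each layer with its index so that a slice like layers[56:] is easy to locate
indexed_names = [(index, layer.name) for index, layer in enumerate(base_model.layers)]
for index, name in indexed_names[50:60]:
    print(f"{index:3d}  {name}")
```

Printing the whole list instead of a slice shows where each residual unit begins.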
Don’t forget to recompile the model whenever you freeze or unfreeze layers, since changes to trainable only take effect once compile() is called again. Also make
sure to use a much lower learning rate to avoid damaging the pretrained weights:
```python
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
history = model.fit(train_set, validation_data=valid_set, epochs=10)
```
This model should reach around 92% accuracy on the test set, in just a few minutes of training (with a GPU). If you tune the hyperparameters, lower the learning rate, and train for quite a bit longer, you should be able to reach 95% to 97%. With that, you can start training amazing image classifiers on your own images and classes!
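
The two-phase schedule (train the new head with the base frozen, then unfreeze the top of the base and continue at a lower learning rate) can be rehearsed end to end before committing GPU time. The sketch below is illustrative only: the random data, the two-layer stand-in for the pretrained base, and the `compile_and_fit` helper are all hypothetical, and the accuracy it reaches is meaningless.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 64 random 32x32 RGB images spread over 3 classes
rng = np.random.default_rng(42)
X = rng.random((64, 32, 32, 3)).astype("float32")
y = rng.integers(0, 3, size=64)

# Small stand-in for the pretrained base model (not Xception)
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = tf.keras.layers.Conv2D(16, 3, activation="relu")(x)
base_model = tf.keras.Model(inputs=inputs, outputs=x)

avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
output = tf.keras.layers.Dense(3, activation="softmax")(avg)
model = tf.keras.Model(inputs=base_model.input, outputs=output)

def compile_and_fit(learning_rate, epochs):
    # Recompile with a fresh optimizer after every change to trainable
    optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)
    model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
                  metrics=["accuracy"])
    return model.fit(X, y, epochs=epochs, batch_size=16, verbose=0)

# Phase 1: freeze the whole base and train only the new head
for layer in base_model.layers:
    layer.trainable = False
n_trainable_phase1 = len(model.trainable_weights)
compile_and_fit(learning_rate=0.1, epochs=1)

# Phase 2: unfreeze the last base layer and continue with a lower learning rate
for layer in base_model.layers[-1:]:
    layer.trainable = True
n_trainable_phase2 = len(model.trainable_weights)
compile_and_fit(learning_rate=0.01, epochs=1)

print(n_trainable_phase1, n_trainable_phase2)
```

In phase 1 only the dense layer's kernel and bias are trainable. Phase 2 adds the kernel and bias of the last convolutional layer, which mirrors unfreezing the upper residual units of the real base model.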

---