# Transfer Learning

In this module, you will learn what transfer learning is and how it works. You will implement transfer learning in five general steps using a variety of popular pre-trained CNN architectures, such as VGG-16 and ResNet-50. You will study the differences among these CNN architectures and see how each new design addresses the limitations of its predecessors. Finally, because we are moving to deeper neural networks, you will also learn regularization techniques that help prevent complex models and networks from overfitting.

## Learning Objectives

- Understand the concept of transfer learning
- Describe modern CNN architectures such as ResNet-50 and VGG-16
- Implement transfer learning using modern CNN architectures
- Describe how different regularization techniques help prevent overfitting

## Introduction to Transfer Learning

### Motivation

Early layers in a neural network are the hardest (i.e., slowest) to train.
- This is due to the vanishing gradient problem: gradients shrink as they are backpropagated through many layers.
- But these "primitive" features should be general across many image classification tasks.

Later layers in the network capture features that are more specific
to the particular image classification problem.
- Later layers are easier (quicker) to train, since adjusting their weights
has a more immediate impact on the final result.
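The contrast between early and later layers can be made concrete with a toy calculation. The sketch below assumes a chain of scalar sigmoid units a_k = σ(w·a_{k-1}) with every weight w set to 1 (a deliberate simplification; real CNNs use weight matrices and often ReLU activations), and backpropagates a gradient of 1 from the output. Because σ'(z) = σ(z)(1 - σ(z)) is at most 0.25, every layer the gradient passes back through shrinks it by at least a factor of 4 in this setting, so the first layer ends up with a far smaller gradient than the last. The function name `layer_gradients` is illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer_gradients(num_layers: int, x: float = 0.0, w: float = 1.0) -> list[float]:
    """Backpropagate through a toy chain of scalar units a_k = sigmoid(w * a_{k-1}).

    Returns |dL/dz_k| for k = 1..num_layers (first layer first), assuming dL/da_n = 1.
    """
    # Forward pass: remember every pre-activation z_k.
    zs = []
    a = x
    for _ in range(num_layers):
        z = w * a
        zs.append(z)
        a = sigmoid(z)

    # Backward pass: dL/dz_k = dL/da_k * sigmoid'(z_k), then dL/da_{k-1} = dL/dz_k * w.
    grads = [0.0] * num_layers
    upstream = 1.0  # dL/da_n at the output
    for k in reversed(range(num_layers)):
        s = sigmoid(zs[k])
        dz = upstream * s * (1.0 - s)
        grads[k] = abs(dz)
        upstream = dz * w
    return grads


grads = layer_gradients(10)
print(f"last layer:  {grads[-1]:.2e}")
print(f"first layer: {grads[0]:.2e}")
```

The last layer's gradient comes out on the order of 10^-1, while the first layer's is several orders of magnitude smaller, which is why updates to the early layers are so slow.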

Famous, competition-winning models are difficult to train from scratch because they require:
- Huge datasets (like ImageNet)
- A large number of training iterations
- Substantial computing resources
- Extensive experimentation to get the hyperparameters just right

However, the basic features (edges, shapes) learned in the early layers
of the network should generalize.
- Results of the training are just weights (numbers) that are easy to store.
- Idea: keep the early layers of a pre-trained network
and re-train the later layers for a specific application.
- This is called transfer learning.
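The idea in the list above can be sketched in plain Python under a toy setting: a hypothetical "pre-trained" early layer is kept fixed, and only a new logistic-regression output layer is trained on a small target task. The hand-set weights in `FROZEN_W` merely stand in for weights that would normally be loaded from a network trained on a large source dataset; the names `FROZEN_W`, `extract_features`, `train_head` and the toy task (label 1 when x0 + x1 > 1) are illustrative assumptions, not part of any library.

```python
import math
import random

# HYPOTHETICAL stand-in for a pre-trained early layer. In practice these weights
# would come from a network trained on a large source dataset; here they are
# hand-set so the sketch runs without a deep learning library.
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def extract_features(x: list[float]) -> list[float]:
    # Frozen layer: ReLU(W x). Its weights are read but never updated.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def train_head(data, epochs: int = 200, lr: float = 0.1):
    """Train only a new logistic-regression head on top of the frozen features."""
    head_w = [0.0] * len(FROZEN_W)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = extract_features(x)
            p = sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            head_w = [w - lr * err * fi for w, fi in zip(head_w, f)]
            head_b -= lr * err
    return head_w, head_b


def predict(head, x: list[float]) -> int:
    head_w, head_b = head
    f = extract_features(x)
    return int(sum(w * fi for w, fi in zip(head_w, f)) + head_b > 0)


def make_data(n: int, rng: random.Random):
    # Toy target task: label 1 when x0 + x1 > 1; points near the boundary are skipped.
    data = []
    while len(data) < n:
        x = [rng.random(), rng.random()]
        s = x[0] + x[1]
        if abs(s - 1.0) >= 0.2:
            data.append((x, int(s > 1.0)))
    return data


rng = random.Random(0)
train, test = make_data(200, rng), make_data(100, rng)
snapshot = [row[:] for row in FROZEN_W]
head = train_head(train)
accuracy = sum(predict(head, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
print("frozen layer unchanged:", FROZEN_W == snapshot)
```

This corresponds to the most conservative option, training only the new last layer. Because the head treats the frozen layer's outputs as fixed inputs, training it is a small, cheap optimization problem; deeper fine-tuning would also update the earlier weights.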

![](./images/28_TransferLearning.png)


## Transfer Learning and Fine Tuning

### Transfer Learning Options

The additional training of a pre-trained network on a specific new dataset
is referred to as "fine-tuning".
- There are different options for how much of the network to fine-tune and how far back to go:
- Should I train just the very last layer?
- Go back a few layers?
- Re-train the entire network (starting from the pre-trained weights)?
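The options above amount to choosing how many of the final layers stay trainable. The sketch below models this with a hypothetical `Layer` record and a hypothetical `set_fine_tuning_depth` helper; in real frameworks the equivalent switch is, for example, a Keras layer's `trainable` attribute or a PyTorch parameter's `requires_grad` flag. The layer names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True


def set_fine_tuning_depth(layers: list[Layer], num_trainable: int) -> list[Layer]:
    """Freeze every layer except the last `num_trainable` ones."""
    if not 0 <= num_trainable <= len(layers):
        raise ValueError("num_trainable must be between 0 and len(layers)")
    cutoff = len(layers) - num_trainable
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    return layers


def trainable_names(layers: list[Layer]) -> list[str]:
    return [layer.name for layer in layers if layer.trainable]


# Hypothetical layer names for a small pre-trained CNN plus a new output layer.
names = ["conv1", "conv2", "conv3", "conv4", "fc1", "output"]

options = [("just the last layer", 1), ("a few layers back", 3), ("entire network", len(names))]
for label, k in options:
    layers = set_fine_tuning_depth([Layer(n) for n in names], k)
    print(f"{label}: {trainable_names(layers)}")
```

With `num_trainable=1` only `output` is updated; with 3, `conv4`, `fc1` and `output` are; with `len(names)` every layer is re-trained, starting from the pre-trained weights rather than from a random initialization.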

### Guiding Principles for Fine-Tuning

While there are no "hard and fast" rules, there are some guiding principles to keep in mind:
- The more similar your data and problem are to the source data
of the pre-trained network, the less fine-tuning is necessary.
- E.g., a network trained on ImageNet should need relatively little fine-tuning
to distinguish dogs from cats.
- It already distinguishes different breeds of dogs and cats,
so it likely has all the features you need.

The more data you have about your specific problem,
the more the network will benefit from longer and deeper fine-tuning.
- E.g., if you have only 100 dogs and 100 cats in your training data,
you probably want to do very little fine-tuning.
- If you have 10,000 dogs and 10,000 cats, you may get more value
from longer and deeper fine-tuning.

If your data is substantially different in nature from the data the source model was trained on,
transfer learning may be of little value.
- E.g., a network trained to recognize typed Latin alphabet characters
would not be useful for distinguishing cats from dogs.
- But it would likely be a useful starting point for recognizing Cyrillic alphabet characters.



## Convolutional Neural Network Architectures - LeNet

## Convolutional Neural Network Architectures - AlexNet

## Convolutional Neural Network Architectures - Inception

## Convolutional Neural Network Architectures - ResNet

## Regularization Techniques for Deep Learning