## Deep Neural Networks
Deep neural networks can take weeks to train on complex datasets, even on fast GPU servers. The MNIST example we are using is simple enough to train quickly on a local machine.

![file_name](https://cloud.githubusercontent.com/assets/17914936/20403127/2862931e-acc5-11e6-853c-02cac20c4ce1.png?style=centerme)

To be useful on complex data, a network often needs many more layers than we are using here.

## Transfer Learning

One solution is transfer learning, which is becoming more popular as people make their trained weights and network architectures available for others to use.

![file_name](https://cloud.githubusercontent.com/assets/17914936/20403126/286226fe-acc5-11e6-9855-693183fab83e.png?style=centerme)

In transfer learning, forward propagation still runs through every layer, but backward propagation only updates the weights in the unfrozen layers. This is much faster than training the whole network, while still taking advantage of what the frozen layers have already learned.

![file_name](https://github.com/JostineHo/neural-nets/blob/master/images/forward_back_prop.png?raw=true)
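
To make the idea of frozen layers concrete, here is a minimal NumPy sketch. It is not the Keras model used in this notebook: the two weight matrices `W1` and `W2`, the random data, and the mean squared error loss are all assumptions chosen for illustration. The forward pass uses both layers, but only the second layer receives a gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network (hypothetical, for illustration only):
# W1 plays the role of the frozen feature layer, W2 the trainable classifier.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
W1_before = W1.copy()
W2_before = W2.copy()

X = rng.normal(size=(8, 4))  # 8 samples, 4 input features
Y = rng.normal(size=(8, 2))  # random regression targets, to keep the math short


def forward(X, W1, W2):
    """Forward pass through both layers, frozen or not."""
    H = np.maximum(X @ W1, 0.0)  # ReLU features from the frozen layer
    return H, H @ W2


def mse(pred, Y):
    return float(np.mean((pred - Y) ** 2))


_, pred = forward(X, W1, W2)
loss_before = mse(pred, Y)

learning_rate = 0.01
for _ in range(100):
    H, pred = forward(X, W1, W2)
    grad_pred = 2.0 * (pred - Y) / pred.size
    grad_W2 = H.T @ grad_pred      # gradient for the trainable layer only
    W2 -= learning_rate * grad_W2  # W1 is never touched

_, pred = forward(X, W1, W2)
loss_after = mse(pred, Y)

print("W1 unchanged:", np.array_equal(W1, W1_before))
print("loss before:", loss_before, "loss after:", loss_after)
```

Because no gradient is computed for `W1`, each step does less work than full training, yet the output layer still adapts to the new targets.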

In [None]:
from __future__ import print_function
import numpy as np
import datetime
import os

# Everyone should stop and pip install h5py. It is needed for loading weights.
import h5py

np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.models import model_from_json
from keras import backend as K

In [None]:
batch_size = 128
nb_classes = 5
nb_epoch = 5

architecture_path = 'weights/final_arch.json'
weights_path = 'weights/final_weights.h5'

now = datetime.datetime.now

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = 2
# convolution kernel size
kernel_size = 3

if K.image_dim_ordering() == 'th':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

Below we load the second half of the data (the digits 5 through 9), which our neural network hasn't seen yet.

In [None]:
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# create a dataset with digits including 5 and above
X_train_gte5 = X_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5  # make classes start at 0 for
X_test_gte5 = X_test[y_test >= 5]         # np_utils.to_categorical
y_test_gte5 = y_test[y_test >= 5] - 5
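
The boolean-mask filtering and the label shift used above can be seen on a tiny example. The arrays below are made up for illustration: `labels` stands in for `y_train`, and `images` holds sample indices in place of real pixel data.

```python
import numpy as np

labels = np.array([0, 7, 5, 3, 9, 6, 1, 8])
images = np.arange(len(labels))  # stand-in "images": just their positions

mask = labels >= 5               # True where the digit is 5 or above
images_gte5 = images[mask]       # keep only the matching samples
labels_gte5 = labels[mask] - 5   # shift 5..9 down to 0..4

print(images_gte5)  # [1 2 4 5 7]
print(labels_gte5)  # [2 0 4 1 3]
```

Shifting the labels matters because the output layer has `nb_classes = 5` units, so the class indices must fall in the range 0 to 4.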

In [None]:
def train_model(model, train, test, nb_classes):
    X_train = train[0].reshape((train[0].shape[0],) + input_shape)
    X_test = test[0].reshape((test[0].shape[0],) + input_shape)
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255
    print('X_train shape:', X_train.shape)
    print(X_train.shape[0], 'train samples')
    print(X_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    Y_train = np_utils.to_categorical(train[1], nb_classes)
    Y_test = np_utils.to_categorical(test[1], nb_classes)

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    t = now()
    model.fit(X_train, Y_train,
              batch_size=batch_size, nb_epoch=nb_epoch,
              verbose=1,
              validation_data=(X_test, Y_test))
    print('Training time: %s' % (now() - t))
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
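
The "binary class matrices" step in `train_model` is one-hot encoding. The sketch below uses a hypothetical helper, `one_hot`, written in plain NumPy to show roughly what `np_utils.to_categorical` produces; it is not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


Y = one_hot([2, 0, 4], num_classes=5)
print(Y)
```

Each row has a 1 in the column of its class and 0 elsewhere, which is the target format that `categorical_crossentropy` expects.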

In [None]:
# define two groups of layers: feature (convolutions) and classification (dense)
feature_layers = [
    Convolution2D(nb_filters, kernel_size, kernel_size,
                  border_mode='valid',
                  input_shape=input_shape),
    Activation('relu'),
    Convolution2D(nb_filters, kernel_size, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=(pool_size, pool_size)),
    Dropout(0.25),
    Flatten(),
]
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(nb_classes),
    Activation('softmax')
]

In [None]:
# create complete model
model = Sequential(feature_layers + classification_layers)
model.load_weights(weights_path)

In [None]:
# freeze feature layers; the change takes effect when train_model compiles the model
for l in feature_layers:
    l.trainable = False
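
As a rough check on what freezing changes, the arithmetic below counts the weights in each group of layers. It assumes 'valid' padding and stride 1 for both convolutions and one bias per filter or unit; `conv_params` and `dense_params` are hypothetical helpers, not Keras functions.

```python
def conv_params(in_channels, filters, kernel):
    """Weights plus one bias per filter for a square 2D convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


img, kernel, filters, pool, classes = 28, 3, 32, 2, 5

after_conv1 = img - kernel + 1            # 26
after_conv2 = after_conv1 - kernel + 1    # 24
after_pool = after_conv2 // pool          # 12
flat = after_pool * after_pool * filters  # 4608 inputs to the first Dense layer

frozen = conv_params(1, filters, kernel) + conv_params(filters, filters, kernel)
trainable = dense_params(flat, 128) + dense_params(128, classes)

print("frozen parameters:", frozen)        # 9568
print("trainable parameters:", trainable)  # 590597
```

Under these assumptions most of the parameters sit in the dense layers, so the saving from freezing here likely comes less from the number of weights updated and more from not having to backpropagate through the convolutions.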

In [None]:
# transfer: train dense layers for new classification task [5..9]
train_model(model,
            (X_train_gte5, y_train_gte5),
            (X_test_gte5, y_test_gte5), nb_classes)

All code was adapted from https://github.com/fchollet/keras/blob/master/examples/mnist_transfer_cnn.py