# Neural network models

This notebook picks up after the `simple_models` notebook. Having tried a range of classification algorithms there, we'll now try out some of the neural network models in [1]. These include fully-connected models of varying layer sizes and, finally, convolutional models including the famous LeNet-5.

Along the way, we'll be using Keras, a library that sits on top of Theano or TensorFlow and makes it easy to construct, train and evaluate neural nets. Before we get started, here's a recap of the `simple_models` notebook models.

`[1]` - [Gradient-Based Learning Applied to Document Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf),  LeCun et al, Nov 1998

In [1]:
# Load in dependencies, may not use all of these
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import pickle

%matplotlib inline

## Load pickle files

In [2]:
# Set up the file directory and names
DIR = '../input/'
X_TRAIN = DIR + 'train-images-idx3-ubyte.pkl'
Y_TRAIN = DIR + 'train-labels-idx1-ubyte.pkl'
X_TEST = DIR + 't10k-images-idx3-ubyte.pkl'
Y_TEST = DIR + 't10k-labels-idx1-ubyte.pkl'

print('Loading pickle files ...')
def load_pickle(path):
    # Open in binary mode and close the file handle once the object is loaded
    with open(path, 'rb') as f:
        return pickle.load(f)

X_train = load_pickle(X_TRAIN)
y_train = load_pickle(Y_TRAIN)
X_test = load_pickle(X_TEST)
y_test = load_pickle(Y_TEST)

n_train = X_train.shape[0]
n_test = X_test.shape[0]

w = X_train.shape[1]
h = X_train.shape[2]

# Reshape the images so they're a single row in the numpy array
X_train_input = X_train.reshape((n_train, w * h))
X_test_input = X_test.reshape((n_test, w * h))
y_train_input = y_train.squeeze()
y_test_input = y_test.squeeze()

assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]

print('Loaded train images shape {}, labels shape {}'.format(X_train.shape, y_train.shape))
print('Loaded test images shape {}, labels shape {}'.format(X_test.shape, y_test.shape))

Loading pickle files ...
Loaded train images shape (60000, 28, 28), labels shape (60000, 1)
Loaded test images shape (10000, 28, 28), labels shape (10000, 1)


## Helper functions

Before trying a few different algorithms out, let's define a reusable set of functions to cross-validate and predict on the test set. Because scikit-learn has such a uniform interface, we can re-use these on pretty much any classification algorithm out there.

In [3]:
# todo !
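
The helper cell is still a placeholder. As a rough sketch of what such helpers could look like, the following assumes only that a model exposes `fit` and `predict` in the scikit-learn style. The names `k_fold_indices`, `cross_validate` and the `MajorityClassifier` stand-in are hypothetical and used purely for illustration; they are not part of this notebook or of scikit-learn. The folds are shuffled but not stratified.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, seed):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.RandomState(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

def cross_validate(make_model, X, y, n_folds=5, seed=1234):
    """Return the validation accuracy of a freshly built model on each fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), n_folds, seed):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[val_idx])
        scores.append(float(np.mean(predictions == y[val_idx])))
    return scores

class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

# Toy data (assumed for illustration): 80 samples of class 0, 20 of class 1
X_demo = np.zeros((100, 4))
y_demo = np.array([0] * 80 + [1] * 20)
cv_scores = cross_validate(MajorityClassifier, X_demo, y_demo, n_folds=5)
print('Fold accuracies:', cv_scores)
```

The sketch compares integer class labels directly, so a model trained on one-hot targets would need its predictions converted back to class ids (for example with `argmax`) before scoring.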

## Data preparation

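Before we start training, the class labels are converted to a one-hot encoding and the pixel values are z-normalized (zero mean and unit variance per pixel, using statistics from the training set).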

In [17]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, StandardScaler

N_JOBS=-2 # Leave 1 core free for UI updates
VERBOSE=1 # 3 is the most verbose level
SEED = 1234 # Fix the seed for repeatability
MAX_ITER = 100 # L-BFGS may show warnings that it doesn't converge
N_FOLDS = 5 # How many folds to use for k-fold cross-validation

# Create a stratified, shuffled subset of the training data if needed
N = n_train # How many training examples to use
if N < n_train:
    print('Reducing the X_train size from {} to {} examples'.format(n_train, N))
    # stratify keeps the class proportions of the full training set
    X_train, _, y_train, _ = train_test_split(X_train_input, y_train_input,
                                              train_size=N, random_state=SEED,
                                              stratify=y_train_input)
else:
    X_train = X_train_input
    y_train = y_train_input
    
X_test = X_test_input
y_test = y_test_input


# Need to convert the classes to one-hot encoding
print('Converting y variables to one-hot encoding')
lbe = LabelBinarizer()
lbe.fit(y_train_input)
# Transform the (possibly reduced) labels so they stay aligned with X_train
y_train = lbe.transform(y_train)
y_test = lbe.transform(y_test)

print('Z-normalizing X')
std = StandardScaler()
X_train_float = X_train.astype(np.float32)
X_test_float = X_test.astype(np.float32)
std.fit(X_train_float)
X_train = std.transform(X_train_float)
X_test = std.transform(X_test_float)

print('Train images shape {}, labels shape {}'.format(X_train.shape, y_train.shape))
print('Test images shape {}, labels shape {}'.format(X_test.shape, y_test.shape))

Converting y variables to one-hot encoding
Z-normalizing X
Train images shape (60000, 784), labels shape (60000, 10)
Test images shape (10000, 784), labels shape (10000, 10)
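
To make the two transformations concrete, here is a small NumPy-only sketch of what they compute. It is an illustration under assumed toy values, not the scikit-learn implementation: for more than two classes `LabelBinarizer` produces one column per class in sorted order, and `StandardScaler` subtracts the per-column mean and divides by the per-column standard deviation, leaving constant columns (such as MNIST border pixels that are always zero) unscaled.

```python
import numpy as np

# Toy labels and "pixels" (assumed values, not taken from MNIST)
labels = np.array([0, 2, 1, 2])
pixels = np.array([[0.0, 10.0, 5.0],
                   [0.0, 20.0, 5.0],
                   [0.0, 30.0, 5.0],
                   [0.0, 40.0, 8.0]], dtype=np.float32)

# One-hot encoding: row i has a 1 in the column of its class
classes = np.unique(labels)
one_hot = (labels[:, None] == classes[None, :]).astype(np.int64)

# Z-normalization: statistics are computed per column
mean = pixels.mean(axis=0)
scale = pixels.std(axis=0)
scale[scale == 0] = 1.0  # avoid dividing constant columns by zero
normalized = (pixels - mean) / scale

print(one_hot)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

In the notebook cell the scaler is fitted on the training images only and the same statistics are then applied to the test images, so no information from the test set leaks into the preprocessing.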


## [1] C.5 - One hidden layer models

The paper continues with an evaluation of single hidden-layer models. We'll be training a selection of three models from the paper, which are listed below. All use sigmoid activations.

* Model a - 28x28-300-10: 4.7% Error
* Model b - 20x20-300-10: 1.6% Error (images were reduced to 20x20 and centred in a 28x28 background)
* Model c - 28x28-1000-10: 4.5% Error
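
The architecture names read as input size, hidden units, and output units. As a quick sanity check on model size, the sketch below counts the trainable parameters of models a and c, assuming fully-connected layers with one bias per unit (which is what a Keras `Dense` layer uses by default). The helper `dense_param_count` is a hypothetical name introduced here, and the paper may count weights differently.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully-connected net given its layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Layer sizes for models a and c, with the 28x28 image flattened to 784 inputs
model_a = [28 * 28, 300, 10]
model_c = [28 * 28, 1000, 10]

print(dense_param_count(model_a))  # 238510
print(dense_param_count(model_c))  # 795010
```

Model c has over three times as many parameters as model a, yet the listed error rates differ by only 0.2 percentage points.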


In [19]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(300, input_dim=784))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(X_train, y_train,
          nb_epoch=20,
          batch_size=64,
          verbose=2)

score = model.evaluate(X_test, y_test, batch_size=16)

print(score)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20

KeyboardInterrupt: 

In [10]:
print(score)

[0.10804161474906322, 0.97470000000000001]
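
Since the model was compiled with `metrics=['accuracy']`, the two printed values are the test loss followed by the test accuracy. The paper reports error rates, so a small conversion makes the comparison easier. The sketch below simply reuses the numbers printed above.

```python
# Values copied from the printed output above: [test loss, test accuracy]
score = [0.10804161474906322, 0.97470000000000001]

test_loss, test_accuracy = score
error_rate = 100.0 * (1.0 - test_accuracy)
print('Test error: {:.2f}%'.format(error_rate))  # Test error: 2.53%
```

If this score belongs to the 28x28-300-10 model, a 2.53% test error compares favourably with the 4.7% listed for model a, though the preprocessing, optimizer and loss used here differ from the paper's setup.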
