Theano + Lasagne :: MNIST CNN
====================================

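This is a quick illustration of a Convolutional Neural Network. The workbook started as an MNIST example; the cells below train it on 300x300 images, loaded from CSV files, with two output classes.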

( Credit for initially creating this workbook : Eben Olson :: https://github.com/ebenolson/pydata2015 )


In [21]:
import numpy as np
import theano
import theano.tensor as T
import lasagne

import matplotlib.pyplot as plt
%matplotlib inline

import gzip
import pickle

import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split

In [22]:
# Seed for reproducibility
np.random.seed(42)

### Get the data
Load the four CSV files (300x300 images rather than the original MNIST digits), put them into useful subsets, and show some of it as a sanity check.

In [23]:
data1 = pd.read_csv('./300_300_data/1.csv', sep=",")
data2 = pd.read_csv('./300_300_data/2.csv', sep=",")
data3 = pd.read_csv('./300_300_data/3.csv', sep=",")
data4 = pd.read_csv('./300_300_data/4.csv', sep=",")
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames in one call
data = pd.concat([data1, data2, data3, data4], ignore_index=True)
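
A minimal sketch of such a sanity check, assuming (as the later cells do) that each row holds 90000 pixel values followed by a label in the last column. The `fake` array is a hypothetical stand-in for `data.values`, so the snippet runs without the CSV files.

```python
import numpy as np

SIDE = 300  # images are SIDE x SIDE, flattened into SIDE*SIDE columns

# Hypothetical stand-in for data.values: 4 rows of pixels plus a label column.
rng = np.random.RandomState(0)
pixels = rng.randint(0, 256, size=(4, SIDE * SIDE))
labels = np.array([0, 1, 1, 0])
fake = np.column_stack([pixels, labels])

# Split features from labels the same way the notebook indexes its columns.
X = fake[:, :SIDE * SIDE]
y = fake[:, SIDE * SIDE]

# Each flat row should reshape cleanly into a square image.
first_image = X[0].reshape(SIDE, SIDE)
print(first_image.shape)

# Class balance: number of examples per label.
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```

On the real data, `plt.imshow(first_image, cmap='gray')` would display the reshaped row.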

In [46]:
X_train, X_val, y_train, y_val = train_test_split(
    data[data.columns[0:90000]],   # 300*300 pixel columns
    data[data.columns[90000]],     # label column
    test_size=0.25, random_state=200)

In [47]:
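# DataFrame.as_matrix() and np.float have been removed from recent pandas / NumPy;
# .values and np.float64 are the equivalents (batch_gen casts each batch later).
X_train = X_train.values.astype(np.float64)
X_val = X_val.values.astype(np.float64)
y_train = y_train.values.astype(np.float64)
y_val = y_val.values.astype(np.float64)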

In [48]:
def batch_gen(X, y, N):
    while True:
        idx = np.random.choice(len(y), N)
        yield X[idx].astype('float32'), y[idx].astype('int32')
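
To make the generator's behaviour concrete, here is a small usage sketch on hypothetical toy arrays (`X_toy` and `y_toy` are not part of the workbook). It shows that every batch comes back as `float32` features and `int32` labels regardless of the input dtype.

```python
import numpy as np

def batch_gen(X, y, N):
    # Endless generator: each batch is N rows drawn at random, with replacement.
    while True:
        idx = np.random.choice(len(y), N)
        yield X[idx].astype('float32'), y[idx].astype('int32')

# Hypothetical toy data: 5 examples with 3 features each, stored as float64.
X_toy = np.arange(15, dtype=np.float64).reshape(5, 3)
y_toy = np.array([0, 1, 0, 1, 1], dtype=np.float64)

np.random.seed(42)
batches = batch_gen(X_toy, y_toy, 8)
X_batch, y_batch = next(batches)

print(X_batch.shape, X_batch.dtype)
print(y_batch.shape, y_batch.dtype)
```

Because `np.random.choice` samples with replacement by default, a batch can be larger than the dataset (8 rows from 5 examples here), and a pass of `len(X) // N` batches is not guaranteed to visit every example exactly once.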

### Create the Network
This is a Convolutional Neural Network (CNN), where each 'filter' in a given layer is produced by scanning a small matrix (typically 3x3; the layer defined below uses a single 1x1 filter) over the whole of the previous layer (a convolution operation). Depending on their weights, these filters can produce effects such as averaging or edge detection.
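
The 'averaging' and 'edge detection' effects can be sketched in plain NumPy. This is an illustration only, not the Lasagne implementation: the helper `conv2d_valid` and the two hand-written kernels are hypothetical, whereas a trained network learns its kernel values.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark left part, bright right part (a vertical edge).
image = np.zeros((5, 5))
image[:, 3:] = 1.0

averaging = np.ones((3, 3)) / 9.0
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

blurred = conv2d_valid(image, averaging)    # smooths the step into a ramp
edges = conv2d_valid(image, vertical_edge)  # large only where the edge is

print(blurred[0])
print(edges[0])
```

Strictly speaking the loop computes a cross-correlation (the kernel is not flipped); for learned filters the difference is only a convention.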

In [32]:
# We need to reshape from a 1D feature vector to a 1 channel 2D image.
# Then we apply a single convolutional filter with a 1x1 kernel;
# with pad=1 the layer's output grows from 300x300 to 302x302.
l_in = lasagne.layers.InputLayer((None, 90000))

l_shape = lasagne.layers.ReshapeLayer(l_in, (-1, 1, 300, 300))

l_conv = lasagne.layers.Conv2DLayer(l_shape, num_filters=1, filter_size=1, pad=1, stride=(1, 1))

l_out = lasagne.layers.DenseLayer(l_conv,
                                  num_units=2,
                                  nonlinearity=lasagne.nonlinearities.softmax)
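
The layer sizes implied by this definition can be checked by hand. The sketch below assumes the usual output-size formula for a padded convolution, `(input + 2*pad - filter) // stride + 1`, and that the dense layer flattens every convolution output into its input; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, filter_size, pad, stride):
    # Standard output-length formula for a padded, strided convolution.
    return (input_size + 2 * pad - filter_size) // stride + 1

num_filters = 1
side = conv_output_size(300, filter_size=1, pad=1, stride=1)

# One weight per filter element (1x1 kernel, 1 input channel) plus one bias per filter.
conv_params = num_filters * 1 * 1 * 1 + num_filters

# The dense layer sees every conv output and has 2 units, each with its own bias.
dense_inputs = num_filters * side * side
dense_params = dense_inputs * 2 + 2

print(side, conv_params, dense_inputs, dense_params)
```

Under these assumptions the 1x1 filter with `pad=1` yields a 302x302 feature map, so almost all of the trainable parameters sit in the dense layer.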


###  Compile and train the network.
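In the run logged below, training accuracy rises to about 0.8 by epoch 7, but validation accuracy stays close to 0.5 (chance level for two classes) and the loss values are very large. The inputs are raw pixel values in the 0-255 range, which probably contributes to the size of the losses.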

In [33]:
X_sym = T.matrix()
y_sym = T.ivector()

output = lasagne.layers.get_output(l_out, X_sym)
pred = output.argmax(-1)

loss = T.mean(lasagne.objectives.categorical_crossentropy(output, y_sym))

acc = T.mean(T.eq(pred, y_sym))

params = lasagne.layers.get_all_params(l_out)
grad = T.grad(loss, params)
updates = lasagne.updates.adam(grad, params, learning_rate=0.005)

f_train = theano.function([X_sym, y_sym], [loss, acc], updates=updates)
f_val = theano.function([X_sym, y_sym], [loss, acc])
f_predict = theano.function([X_sym], pred)

BATCH_SIZE = 8
N_BATCHES = len(X_train) // BATCH_SIZE
N_VAL_BATCHES = len(X_val) // BATCH_SIZE

train_batches = batch_gen(X_train, y_train, BATCH_SIZE)
val_batches = batch_gen(X_val, y_val, BATCH_SIZE)

for epoch in range(20):
    train_loss = 0
    train_acc = 0
    for _ in range(N_BATCHES):
        X, y = next(train_batches)
        loss, acc = f_train(X, y)
        train_loss += loss
        train_acc += acc
    train_loss /= N_BATCHES
    train_acc /= N_BATCHES

    val_loss = 0
    val_acc = 0
    for _ in range(N_VAL_BATCHES):
        X, y = next(val_batches)
        loss, acc = f_val(X, y)
        val_loss += loss
        val_acc += acc
    val_loss /= N_VAL_BATCHES
    val_acc /= N_VAL_BATCHES
    
    print('Epoch {:2d}, Train loss {:.03f}     (validation : {:.03f}) ratio {:.03f}'.format(
            epoch, train_loss, val_loss, val_loss/train_loss))
    print('          Train accuracy {:.03f} (validation : {:.03f})'.format(train_acc, val_acc))
print("DONE")

Epoch  0, Train loss 5657.252     (validation : 1882.568) ratio 0.333
          Train accuracy 0.558 (validation : 0.500)
Epoch  1, Train loss 1418.408     (validation : 2460.667) ratio 1.735
          Train accuracy 0.567 (validation : 0.500)
Epoch  2, Train loss 1645.487     (validation : 2769.371) ratio 1.683
          Train accuracy 0.633 (validation : 0.475)
Epoch  3, Train loss 1592.316     (validation : 5850.717) ratio 3.674
          Train accuracy 0.683 (validation : 0.525)
Epoch  4, Train loss 3713.718     (validation : 5020.209) ratio 1.352
          Train accuracy 0.567 (validation : 0.550)
Epoch  5, Train loss 2167.960     (validation : 2495.761) ratio 1.151
          Train accuracy 0.667 (validation : 0.450)
Epoch  6, Train loss 428.090     (validation : 1502.483) ratio 3.510
          Train accuracy 0.800 (validation : 0.625)
Epoch  7, Train loss 339.264     (validation : 1226.909) ratio 3.616
          Train accuracy 0.808 (validation : 0.425)
Epoch  8, Train loss 331.3
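
For reference when reading the log above, the two reported quantities can be reproduced in NumPy. This is a sketch of the same definitions (mean categorical cross-entropy and mean argmax accuracy) on hypothetical probabilities, not output from the trained network.

```python
import numpy as np

def mean_cross_entropy(probs, targets):
    # Negative log-probability assigned to the correct class, averaged over the batch.
    picked = probs[np.arange(len(targets)), targets]
    return float(np.mean(-np.log(picked)))

def accuracy(probs, targets):
    # Fraction of rows whose most probable class equals the target.
    return float(np.mean(np.argmax(probs, axis=-1) == targets))

# Hypothetical softmax outputs for 4 examples and 2 classes.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.5, 0.5]])
targets = np.array([0, 1, 1, 0])

loss = mean_cross_entropy(probs, targets)
acc = accuracy(probs, targets)

# Baseline: a classifier that always outputs 0.5 / 0.5 scores ln(2) per example.
chance_loss = mean_cross_entropy(np.full((4, 2), 0.5), targets)

print(loss, acc, chance_loss)
```

The `chance_loss` baseline of ln 2 (about 0.693) gives a sense of scale: losses in the hundreds or thousands mean the network is assigning near-zero probability to the correct class on some examples.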

### Look at the Output after the Convolutional Layer 
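The convolutional layer here has a single filter, so its output has one channel and can be shown as a grayscale image. (With 3 filters, the channels could instead be mapped to red, green and blue for easier visualisation.)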


In [49]:
filtered = lasagne.layers.get_output(l_conv, X_sym)
f_filter = theano.function([X_sym], filtered)

In [50]:
# Filter the first few training examples.
# X_sym is a float32 matrix, so cast the float64 data explicitly; passing
# X_train[:2] directly raised "TypeError: ... TensorType(float32, matrix)
# cannot store a value of dtype float64 without risking loss of precision".
im = f_filter(X_train[:2].astype('float32'))
print(im.shape)

In [44]:
# Rearrange dimensions from (N, C, H, W) to (N, H, W, C) so we can plot the result as images
im = np.transpose(im, (0, 2, 3, 1))
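
The axis rearrangement can be checked on a small dummy array. The shapes below are hypothetical; the point is that the nested `np.rollaxis` calls and a single `np.transpose` both turn a channels-first batch `(N, C, H, W)` into the channels-last layout `(N, H, W, C)` that `plt.imshow` expects for colour images.

```python
import numpy as np

# Hypothetical batch: 2 images, 3 channels, 4 rows, 5 columns.
batch = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

rolled = np.rollaxis(np.rollaxis(batch, 3, 1), 3, 1)
transposed = np.transpose(batch, (0, 2, 3, 1))

print(rolled.shape)
print(np.array_equal(rolled, transposed))

# With a single channel, dropping the channel axis gives plain 2D images.
single = np.zeros((2, 1, 4, 5))
gray = single[:, 0]
print(gray.shape)
```

For a one-channel output, a 2D array per image is the safer input to `plt.imshow`, since, depending on the matplotlib version, an array with a trailing axis of length 1 may be rejected.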

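With several 3x3 filters, each one would typically respond to different features in the images, e.g. horizontal / diagonal / vertical segments. The single 1x1 filter used here can only rescale and shift each pixel (followed by the layer's nonlinearity), so its output looks like a re-weighted copy of the input.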


In [None]:
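# Plot however many filtered examples were computed, one panel each
n_images = len(im)
plt.figure(figsize=(16,8))
for i in range(n_images):
    plt.subplot(1, n_images, i+1)
    # np.squeeze drops a length-1 channel axis so single-channel output plots as grayscale
    plt.imshow(np.squeeze(im[i]), cmap='gray', interpolation='nearest')
    plt.axis('off')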