
## Reproducibility
This experiment is part of a larger reproducibility study in which different software platforms are tested to investigate how well they support reproducibility. The experiment is presented here to show how the different requirements can be represented; these requirements are discussed in a separate document. We have also published a kernel with a script file that serves the same purpose: https://www.kaggle.com/zoukon/keras2/code.

## Overview
The purpose of this experiment is to classify the MNIST dataset using a convolutional neural network (CNN). MNIST is a set of 70,000 labeled
28x28-pixel grayscale images of handwritten digits (60,000 for training and 10,000 for testing), with each digit centered in the image. The goal is
to train the network to recognize handwritten digits of this form and to verify its accuracy on the test set. We expect the network to classify more
than 98% of the test set correctly after training, similar to the results we obtained when running the experiment locally.
To build the CNN, I use TensorFlow and Keras, basing the implementation on the example in the TensorFlow guide for CNNs
(https://www.tensorflow.org/tutorials/estimators/cnn) as well as the CNN example from
https://towardsdatascience.com/a-simple-2d-cnn-for-mnist-digit-recognition-a998dbc1e79a .


## CNN
A CNN is a type of deep neural network commonly used in image recognition. Compared to other image classification algorithms, it usually
requires very little preprocessing. The network typically consists of multiple hidden layers of three different types, each with a different purpose.

#### Convolutional layer: 
This layer applies a set of learned filters (kernels) to the input. Each filter slides over the input, and
for each subregion the layer computes a weighted sum plus a bias, producing a single value in the output feature map.
Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearities into the model.
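To make the subregion computation concrete, here is a minimal NumPy sketch of a single convolutional filter with stride 1, no padding and a ReLU. It rests on simplifying assumptions (one input channel, one filter, a hand-picked kernel) and is not how Keras implements `Conv2D` internally; like Keras, it computes a cross-correlation, i.e. the kernel is not flipped. The helper name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Hypothetical helper: one filter, stride 1, no padding, then ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the weighted sum of one kh x kw subregion
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU: negative responses become 0

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
kernel = np.array([[-1.0, 0.0],
                   [0.0, 1.0]])                   # toy 2x2 filter
fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (3, 3)
```

With a 3x3 kernel, the same arithmetic turns a 28x28 MNIST image into a 26x26 feature map.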
#### Pooling layer:
The pooling layers combine the outputs of neuron clusters in one layer into a single neuron in the next layer, using a pooling
operation such as max, min or average pooling. The main purpose of this layer is to downsample the feature maps, which reduces the number of values (and therefore the computation) in later layers while keeping the most prominent features.
There is usually a pooling layer after every convolutional layer in the network.
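A minimal NumPy sketch of 2x2 max pooling with stride 2 and no padding, assuming a single 2D feature map; a trailing odd row or column is simply dropped, which is what "no padding" implies. `max_pool_2x2` is a hypothetical helper name and the feature map values are made up.

```python
import numpy as np

def max_pool_2x2(fmap):
    """Hypothetical helper: 2x2 max pooling, stride 2, no padding."""
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    trimmed = fmap[:h * 2, :w * 2]  # drop an odd trailing row/column
    # View the map as non-overlapping 2x2 blocks and keep the max of each
    return trimmed.reshape(h, 2, w, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [3, 2, 1, 6]])
print(max_pool_2x2(fmap))
```

Each 2x2 block keeps only its largest value, so a 26x26 map becomes 13x13 and an 11x11 map becomes 5x5.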
#### Dense layer:
The dense layers, or fully connected (FC) layers, perform classification
on the features extracted and downsampled by the previous layers,
by connecting every neuron in one layer to every neuron in the next layer. The principle of this
layer is similar to a traditional multi-layer perceptron. The dense layers are typically at the end of the network.
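As a sketch, a dense layer is a matrix product plus a bias, followed by the activation. The sizes below (1600 inputs, 128 units) are assumptions chosen to match the flattened output and the first dense layer of the model in this notebook; the random weights stand in for trained ones, and `dense_relu` is a hypothetical helper name.

```python
import numpy as np

def dense_relu(x, weights, bias):
    """Hypothetical helper: every input feeds every output unit, then ReLU."""
    # x: (batch, n_in), weights: (n_in, n_out), bias: (n_out,)
    return np.maximum(x @ weights + bias, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1600))             # two flattened 5x5x64 feature maps
weights = rng.standard_normal((1600, 128)) * 0.01
bias = np.zeros(128)
h = dense_relu(x, weights, bias)
print(h.shape)  # (2, 128)
```

Because every input connects to every unit, this layer alone has 1600 * 128 weights plus 128 biases.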

## Preprocessing
CNNs traditionally use very little preprocessing of the data.
In this case we simply import the data as a numpy.ndarray,
normalize the pixel values to the range [0, 1], and reshape the arrays to pass them into the network.
We also convert the class vectors to binary class matrices (one-hot encoding) for the classification.
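The preprocessing steps can be sketched in plain NumPy. `preprocess` is a hypothetical helper, the all-white images are dummy data, and the one-hot line `np.eye(num_classes)[labels]` is assumed to produce the same matrix as `keras.utils.to_categorical` for integer labels 0 to 9.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Hypothetical helper: scale pixels, add a channel axis, one-hot encode labels."""
    # (n, 28, 28) uint8 in [0, 255] -> (n, 28, 28, 1) float32 in [0, 1]
    x = images.reshape(images.shape[0], 28, 28, 1).astype("float32") / 255.0
    # Row i of the identity matrix is the one-hot vector for class i
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

images = np.full((3, 28, 28), 255, dtype=np.uint8)  # dummy images
labels = np.array([0, 3, 9])
x, y = preprocess(images, labels)
print(x.shape, y.shape)  # (3, 28, 28, 1) (3, 10)
```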

In [None]:
import keras
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, MaxPooling2D, Conv2D, Flatten

In [None]:
batch_size = 128
num_epoch = 5

a = np.load("../input/mnist.npz")
X_test = a['x_test']
y_test = a['y_test']

X_train = a['x_train']
y_train = a['y_train']

X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

In [None]:
#convert to float32 and scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape) #X_train shape: (60000, 28, 28, 1)

#set number of categories
num_category = 10
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_category)
y_test = keras.utils.to_categorical(y_test, num_category)

## Method
The first thing we do is load the data and prepare it so that we can pass it into the network. The full MNIST dataset is
also available through the Keras datasets module (`keras.datasets.mnist`). In the notebook in the Keras-TensorFlow workspace, we downloaded the

This implementation uses two convolutional layers, each followed by a pooling layer. After the convolutions we apply
dropout to reduce overfitting. We then flatten the data to pass it on to the dense layers. There are two dense layers,
with a dropout layer in between. The last dense layer uses softmax as its activation function, while all previous layers use the rectified linear unit (ReLU) as their
activation function. The loss function is categorical cross-entropy.
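To illustrate the difference between the activations and the loss, here is a small NumPy sketch of ReLU, softmax and categorical cross-entropy. These are textbook definitions rather than Keras's internal code; the clipping constant `eps` is an assumption added to avoid `log(0)`, and the three-class logits are made up.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x) element-wise."""
    return np.maximum(x, 0.0)

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # assumed clipping to avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1]])    # made-up scores for 3 classes
probs = softmax(logits)
y_true = np.array([[1.0, 0.0, 0.0]])    # the correct class is 0
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

With a one-hot target, the loss reduces to minus the log of the probability assigned to the correct class, so it shrinks as softmax puts more mass on that class.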

This leaves us with the following structure of the network: 

`Conv(relu) -> Pool(Max) -> Conv(relu) -> Pool(Max) -> Dropout(0.25) -> Flatten -> Dense(relu) -> Dropout(0.5) -> Dense(softmax)`

For the first convolution, I chose 32 filters (output channels), a kernel size of 3x3, a stride of 1x1 and no padding.
The second convolution is identical, except that it has 64 filters.
Both pooling layers are identical and use max pooling with a pool size of 2x2, a stride of 2x2 and no padding.
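The feature-map sizes and parameter counts implied by these choices can be worked out with the usual formula floor((n + 2p - k) / s) + 1 per spatial axis. The sketch below assumes square inputs and the layer settings described above; the helper names are hypothetical. If the arithmetic is right, the total should agree with what `model.summary()` reports for this model, but treat that as something to check rather than a given.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, c_in, c_out):
    """Weights (k * k * c_in * c_out) plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per unit."""
    return n_in * n_out + n_out

side = 28
side = conv_out(side, kernel=3)            # Conv2D(32, 3x3): 26
side = conv_out(side, kernel=2, stride=2)  # MaxPooling2D(2x2): 13
side = conv_out(side, kernel=3)            # Conv2D(64, 3x3): 11
side = conv_out(side, kernel=2, stride=2)  # MaxPooling2D(2x2): 5
flat = side * side * 64                    # Flatten: 1600 values

# Pooling, dropout and flatten layers have no trainable parameters
total = (conv_params(3, 1, 32)             # 320
         + conv_params(3, 32, 64)          # 18,496
         + dense_params(flat, 128)         # 204,928
         + dense_params(128, 10))          # 1,290
print(side, flat, total)                   # 5 1600 225034
```

Most of the parameters sit in the first dense layer, which is one reason dropout is placed around it.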

Rough overview of the code layout: 
```
Import MNIST dataset
Reshape arrays and normalize pixel values
Convert class vectors to binary class matrices
Initialize model
Conv(relu) -> Pool(Max) -> Conv(relu) -> Pool(Max) ->
Dropout(0.25) -> Flatten -> Dense(relu) -> Dropout(0.5) -> Dense(softmax)
Compile model
Train model
Evaluate model
Save results to file
Save model as HDF5
```


In [None]:
##model building
model = Sequential()
#convolutional layer with rectified linear unit activation
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
#32 convolution filters used each of size 3x3
#choose the best features via pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
#second convolutional layer
model.add(Conv2D(64, (3, 3), activation='relu'))
#64 convolution filters used each of size 3x3
#choose the best features via pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
#randomly drop neurons during training to reduce overfitting
model.add(Dropout(0.25))
#flatten the 3D feature maps into a 1D vector for the dense layers
model.add(Flatten())
#fully connected layer that combines all extracted features
model.add(Dense(128, activation='relu'))
#a second dropout, again to reduce overfitting
model.add(Dropout(0.5))
#softmax output layer: one probability per digit class
model.add(Dense(num_category, activation='softmax'))
#We use adam as our optimizer
#categorical cross-entropy since we have multiple classes (10)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer="adam",
              metrics=['accuracy'])

#model training
model_log = model.fit(X_train, y_train, batch_size=batch_size, epochs=num_epoch,
          verbose=1, validation_data=(X_test, y_test))

#Print scores
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0]) #Test loss: 0.0296396646054
print('Test accuracy:', score[1]) #Test accuracy: 0.9904
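For one-hot targets, the accuracy reported above is, as far as we understand Keras's behaviour with `categorical_crossentropy`, the fraction of samples whose highest predicted probability falls on the correct class. A minimal NumPy sketch with made-up probabilities; `categorical_accuracy` is a hypothetical helper, and with the trained model one could pass `model.predict(X_test)` and `y_test` to it to cross-check `score[1]`.

```python
import numpy as np

def categorical_accuracy(y_true, y_prob):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    return float(np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1)))

y_true = np.eye(3)[[0, 1, 2, 1]]          # one-hot labels for 4 samples
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2],       # wrong: predicts 0, label is 2
                   [0.1, 0.6, 0.3]])
print(categorical_accuracy(y_true, y_prob))  # 0.75
```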



In [None]:
#Save the classifications to a .csv file called results.csv
Y_predicted = model.predict(X_test)
pred_label = Y_predicted.argmax(axis = 1)
image_id = range(1,len(Y_predicted)+1)
df = {'ImageId':image_id,'Label':pred_label}
df = pd.DataFrame(df)
df.to_csv('results.csv',index = False)

#Save the model as an HDF5 file called my_model.h5
model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'
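A self-contained sketch of the results file format: predicted probabilities are turned into labels with `argmax`, paired with 1-based `ImageId`s, and round-tripped through CSV. An in-memory `io.StringIO` buffer stands in for `results.csv`, the two probability rows are invented for illustration, and `predictions_to_frame` is a hypothetical helper.

```python
import io

import numpy as np
import pandas as pd

def predictions_to_frame(probs):
    """Hypothetical helper: (n, 10) class probabilities -> ImageId/Label table."""
    labels = probs.argmax(axis=1)              # most likely digit per image
    image_ids = np.arange(1, len(probs) + 1)   # 1-based ids, as in the cell above
    return pd.DataFrame({"ImageId": image_ids, "Label": labels})

probs = np.zeros((2, 10))
probs[0, 1] = 0.9   # first image: most likely a 1
probs[0, 7] = 0.1
probs[1, 9] = 1.0   # second image: a 9

frame = predictions_to_frame(probs)
buffer = io.StringIO()                         # stand-in for results.csv
frame.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip)
```

Reading the file back like this is a cheap way to confirm that the saved labels match the in-memory predictions when comparing runs across platforms.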