<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Deep-Learning-with-Raw-Pixel" data-toc-modified-id="Deep-Learning-with-Raw-Pixel-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Deep Learning with Raw Pixel</a></span><ul class="toc-item"><li><span><a href="#A-Simple-Network" data-toc-modified-id="A-Simple-Network-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>A Simple Network</a></span></li><li><span><a href="#A-More-Complex-Network" data-toc-modified-id="A-More-Complex-Network-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>A More Complex Network</a></span><ul class="toc-item"><li><span><a href="#Adding-More-Nodes" data-toc-modified-id="Adding-More-Nodes-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Adding More Nodes</a></span></li><li><span><a href="#Adding-More-Layers" data-toc-modified-id="Adding-More-Layers-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Adding More Layers</a></span></li><li><span><a href="#Adding-More-Nodes-and-More-Layers-to-the-Network" data-toc-modified-id="Adding-More-Nodes-and-More-Layers-to-the-Network-1.2.3"><span class="toc-item-num">1.2.3&nbsp;&nbsp;</span>Adding More Nodes and More Layers to the Network</a></span></li></ul></li><li><span><a href="#Discussion" data-toc-modified-id="Discussion-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Discussion</a></span></li></ul></li></ul></div>

# Deep Learning with Raw Pixel
This chapter implements and explains a neural network for handwritten digit recognition.

## A Simple Network 
First, we import the libraries and load the training and test data.

In [1]:
import tensorflow as tf

In [2]:
import pandas as pd
from autograd import numpy as np

In [3]:
df_train = pd.read_csv("./traindata.csv", dtype=np.uint8)

In [4]:
df_train.head(3)

Unnamed: 0,id,0,1,2,3,4,5,6,7,8,...,775,776,777,778,779,780,781,782,783,label
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,5
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,4


In [5]:
df_train.shape

(60000, 786)

In [6]:
df_imgs = df_train.drop(['label', 'id'], axis=1)
# .values replaces as_matrix(), which was removed in pandas 1.0
x_original = df_imgs.values.T

print("x_original shape: ", x_original.shape)

x_original shape:  (784, 60000)

In [7]:
y_original = df_train['label'].values

print("y_original shape: ", y_original.shape)

y_original shape:  (60000,)


In [8]:
num_sample = 5000
inds = np.random.permutation(y_original.shape[0])[:num_sample]
x_sample = x_original[:,inds].T
y_sample = y_original[inds]

In [9]:
print("x_sample shape: ", x_sample.shape)
print("y_sample shape: ", y_sample.shape)

x_sample shape:  (5000, 784)
y_sample shape:  (5000,)


In [10]:
x = x_original.T
y = y_original

print("x shape: ", x.shape)
print("y shape: ", y.shape)

x shape:  (60000, 784)
y shape:  (60000,)


We scale the inputs to the range [0, 1] by dividing each pixel by the maximum pixel value in the training data.

In [11]:
xmax = np.max(x)
x_scale = x/xmax
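
A minimal sketch of this scaling step on a toy array, assuming 8-bit pixel values as in the CSV files. The array `pixels` and the helper `scale_by_max` are illustrative names that do not appear in the notebook. The maximum is taken from the training data so that the same value can be reused for unseen data later.

```python
import numpy as np

def scale_by_max(x, x_max=None):
    """Divide x by a maximum value; pass x_max to reuse it for unseen data."""
    if x_max is None:
        x_max = np.max(x)
    return x / x_max, x_max

# Toy stand-in for two 4-pixel images stored as uint8 (hypothetical data).
pixels = np.array([[0, 51, 102, 255],
                   [255, 0, 204, 153]], dtype=np.uint8)

scaled, x_max = scale_by_max(pixels)
print(x_max)                        # largest pixel value in the toy data
print(scaled.dtype)                 # true division promotes uint8 to a float type
print(scaled.min(), scaled.max())   # smallest and largest scaled value
```

Reusing `x_max` for the test set keeps both sets on the same scale, which is what the notebook does when it divides the test data by `xmax`.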

In [12]:
input_dim = x_scale.shape[1]
nb_classes = 10

print("input_dim: ", input_dim)
print("nb_classes: ", nb_classes)

input_dim:  784
nb_classes:  10


We start with a simple network of three layers: 784 input nodes, a hidden layer of 512 nodes, and a 10-node output layer. We also add dropout with a rate of 0.2 after the hidden layer to reduce overfitting.

In [13]:
model = tf.keras.models.Sequential()
# model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, 
                                input_dim=input_dim, 
                                activation=tf.nn.relu))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
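
To illustrate what a dropout rate of 0.2 means, the sketch below applies inverted dropout to an array with NumPy. It is a simplified illustration of the idea, not the code that `tf.keras.layers.Dropout` runs internally; the function name `inverted_dropout`, the array `hidden`, and the seed are assumptions made for the example.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged,
    so no adjustment is needed at prediction time.
    """
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)     # fixed seed, for repeatability only
hidden = np.ones((1000, 512))      # stand-in for hidden-layer outputs
dropped = inverted_dropout(hidden, rate=0.2, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean activation:", dropped.mean())
```

Roughly one unit in five is zeroed on each pass, while the mean activation stays close to its original value.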

In [14]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])

Instructions for updating:
keep_dims is deprecated, use keepdims instead


Train the network on the scaled inputs and their labels for 10 epochs.

In [15]:
model.fit(x_scale, y, epochs=10, verbose=2)

Epoch 1/10
 - 21s - loss: 0.2191 - acc: 0.9351
Epoch 2/10
 - 19s - loss: 0.0970 - acc: 0.9702
Epoch 3/10
 - 20s - loss: 0.0703 - acc: 0.9785
Epoch 4/10
 - 19s - loss: 0.0530 - acc: 0.9837
Epoch 5/10
 - 19s - loss: 0.0423 - acc: 0.9861
Epoch 6/10
 - 20s - loss: 0.0373 - acc: 0.9876
Epoch 7/10
 - 19s - loss: 0.0303 - acc: 0.9903
Epoch 8/10
 - 20s - loss: 0.0287 - acc: 0.9905
Epoch 9/10
 - 21s - loss: 0.0241 - acc: 0.9920
Epoch 10/10
 - 21s - loss: 0.0227 - acc: 0.9923


<tensorflow.python.keras._impl.keras.callbacks.History at 0x7f728dd17ef0>

Next, we read and preprocess the test data, then evaluate the model on it.

In [16]:
df_testdata = pd.read_csv("./testdata.csv", dtype=np.uint8)

df_testdata.head(3)

Unnamed: 0,id,0,1,2,3,4,5,6,7,8,...,775,776,777,778,779,780,781,782,783,label
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,7
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
2,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [17]:
df_testimgs = df_testdata.drop(['label', 'id'], axis=1)
xtest = df_testimgs.values

print("xtest shape:", xtest.shape)

xtest shape: (10000, 784)

In [18]:
ytest = df_testdata['label'].values
# ytest = ytest.reshape((10000, 1)).T
print("ytest shape:", ytest.shape)

ytest shape: (10000,)


In [19]:
xtest_scale = xtest/xmax

In [20]:
model.evaluate(xtest_scale, ytest)




[0.076664467417354901, 0.98129999999999995]

As the output of the `evaluate` method shows, the simple model reaches a test accuracy of about 98% (0.9813). For this task, that is good enough to adopt the model as a classifier.
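
The two numbers returned by `evaluate` are the loss followed by the metrics passed to `compile`, here the accuracy. As a rough sketch of how these two quantities are defined, the following computes sparse categorical cross-entropy and accuracy for a hand-made batch. The probabilities and labels are invented for illustration, and the helper functions are not Keras code.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

def accuracy(probs, labels):
    """Fraction of samples whose most probable class is the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Invented softmax outputs for 3 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])   # integer labels, as in the 'label' column

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(loss, acc)
```

The "sparse" variant takes integer class labels directly, which is why the notebook can pass the `label` column without one-hot encoding it.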

## A More Complex Network
Although the accuracy of the model is already good, we would like to see whether extending it can bring the accuracy closer to 100%. The approach is to add more nodes to the hidden layer, or more hidden layers, until the network starts to overfit, and then prune it or apply other techniques to mitigate the overfitting.
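
One common way to detect the point where a network starts to overfit is early stopping: hold out part of the training data, track the validation loss after each epoch, and stop once it has not improved for a few epochs. The notebook does not do this. The sketch below only illustrates the stopping rule in plain Python, and the loss values, the `patience` setting, and the function name are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops at the first epoch where the validation loss has failed
    to improve on the best value seen so far for `patience` epochs in a row.
    If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses: improving at first, then rising (overfitting).
val_losses = [0.120, 0.090, 0.080, 0.082, 0.085, 0.091]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

In Keras, a comparable rule is available through the `tf.keras.callbacks.EarlyStopping` callback, which takes `monitor` and `patience` arguments and can be combined with the `validation_split` argument of `fit`.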


### Adding More Nodes
In this section, we increase the number of nodes in the hidden layer from 512 to 784 (the input dimension).

In [23]:
model = tf.keras.models.Sequential()
# model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(input_dim, 
                                input_dim=input_dim, 
                                activation=tf.nn.relu))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])

A network with more nodes may need more epochs of training to reach its best accuracy. The number of epochs is chosen by trial and error; here we keep it at 10.

In [24]:
model.fit(x_scale, y, epochs=10, verbose=2)

Epoch 1/10
 - 31s - loss: 0.2063 - acc: 0.9394
Epoch 2/10
 - 29s - loss: 0.0904 - acc: 0.9718
Epoch 3/10
 - 32s - loss: 0.0652 - acc: 0.9799
Epoch 4/10
 - 29s - loss: 0.0500 - acc: 0.9841
Epoch 5/10
 - 29s - loss: 0.0397 - acc: 0.9875
Epoch 6/10
 - 32s - loss: 0.0339 - acc: 0.9888
Epoch 7/10
 - 28s - loss: 0.0294 - acc: 0.9898
Epoch 8/10
 - 32s - loss: 0.0250 - acc: 0.9920
Epoch 9/10
 - 30s - loss: 0.0248 - acc: 0.9918
Epoch 10/10
 - 28s - loss: 0.0209 - acc: 0.9935


<tensorflow.python.keras._impl.keras.callbacks.History at 0x7f7268043160>

In [25]:
model.evaluate(xtest_scale, ytest)




[0.072015329911428852, 0.98340000000000005]

Increasing the number of nodes improves the test accuracy only marginally, from 0.9813 to 0.9834.

### Adding More Layers
In this section, we will try to add more layers to the network in order to improve accuracy. The following example shows a network with one more hidden layer.

In [26]:
model = tf.keras.models.Sequential()
# model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, 
                                input_dim=input_dim, 
                                activation=tf.nn.relu))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(512,
                                activation=tf.nn.relu))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(nb_classes, activation=tf.nn.softmax))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])

In [27]:
model.fit(x_scale, y, epochs=10, verbose=2)

Epoch 1/10
 - 32s - loss: 0.2127 - acc: 0.9352
Epoch 2/10
 - 37s - loss: 0.1057 - acc: 0.9675
Epoch 3/10
 - 35s - loss: 0.0836 - acc: 0.9744
Epoch 4/10
 - 34s - loss: 0.0683 - acc: 0.9785
Epoch 5/10
 - 35s - loss: 0.0581 - acc: 0.9812
Epoch 6/10
 - 34s - loss: 0.0518 - acc: 0.9842
Epoch 7/10
 - 35s - loss: 0.0462 - acc: 0.9853
Epoch 8/10
 - 36s - loss: 0.0434 - acc: 0.9870
Epoch 9/10
 - 36s - loss: 0.0394 - acc: 0.9875
Epoch 10/10
 - 35s - loss: 0.0358 - acc: 0.9886


<tensorflow.python.keras._impl.keras.callbacks.History at 0x7f71a75e0470>

In [28]:
model.evaluate(xtest_scale, ytest)




[0.084818751688119295, 0.98260000000000003]

Even with one more hidden layer, the test accuracy (0.9826) is not significantly better than that of the simple network (0.9813).

### Adding More Nodes and More Layers to the Network 
In this section, we add both a second hidden layer and more nodes per hidden layer (784 each).

In [29]:
model = tf.keras.models.Sequential()
# model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(input_dim, 
                                input_dim=input_dim, 
                                activation=tf.nn.relu))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(input_dim,
                                activation=tf.nn.relu))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(nb_classes, activation=tf.nn.softmax))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])

In [30]:
model.fit(x_scale, y, epochs=10, verbose=2)

Epoch 1/10
 - 53s - loss: 0.2055 - acc: 0.9377
Epoch 2/10
 - 54s - loss: 0.1003 - acc: 0.9694
Epoch 3/10
 - 53s - loss: 0.0823 - acc: 0.9745
Epoch 4/10
 - 62s - loss: 0.0686 - acc: 0.9791
Epoch 5/10
 - 60s - loss: 0.0614 - acc: 0.9821
Epoch 6/10
 - 56s - loss: 0.0514 - acc: 0.9842
Epoch 7/10
 - 59s - loss: 0.0495 - acc: 0.9847
Epoch 8/10
 - 55s - loss: 0.0446 - acc: 0.9868
Epoch 9/10
 - 54s - loss: 0.0427 - acc: 0.9875
Epoch 10/10
 - 55s - loss: 0.0421 - acc: 0.9880


<tensorflow.python.keras._impl.keras.callbacks.History at 0x7f71a4ff5f28>

In [31]:
model.evaluate(xtest_scale, ytest)




[0.082450668865201443, 0.97860000000000003]

Adding both more layers and more nodes does not improve the accuracy as we expected: at 0.9786, the test accuracy is slightly lower than that of the simple model (0.9813).

## Discussion
We built a simple model with one hidden layer, and three more complex models by increasing the number of nodes in the hidden layer, adding a hidden layer, or both. All three extended networks have roughly the same test accuracy (about 98%) as the simple network. This raises the questions of which network structure, which optimizer, and which loss function to choose in order to achieve the best accuracy.
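
To put the comparison in perspective, the sketch below counts the trainable parameters of the four dense networks (dropout layers have none) and lists them next to the test accuracies reported by `evaluate` above. The helper `dense_param_count` and the model names are illustrative; the accuracies are copied from the notebook output.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer sizes (input, hidden..., output) and test accuracy reported above.
models = {
    "simple (512)":         ([784, 512, 10], 0.9813),
    "more nodes (784)":     ([784, 784, 10], 0.9834),
    "more layers (512x2)":  ([784, 512, 512, 10], 0.9826),
    "more of both (784x2)": ([784, 784, 784, 10], 0.9786),
}

for name, (sizes, test_acc) in models.items():
    print(f"{name:22s} params={dense_param_count(sizes):>9,d}  test acc={test_acc:.4f}")
```

The largest network has roughly three times as many parameters as the simple one, yet its test accuracy is slightly lower, which is consistent with the observation that extra capacity alone does not help on this task.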