# Neural Networks

This study evaluates how the performance of a neural network changes with two architecture parameters: the number of hidden layers and the number of nodes per layer. The MNIST data set is used throughout.

Four models are trained and compared.

## Import packages

In [2]:
# ensure common functions across Python 2 and 3
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from sklearn.preprocessing import StandardScaler

import time
#import tensorflow as tf
import tensorflow.compat.v1 as tf

import pandas as pd
import numpy as np

tf.disable_v2_behavior() 
RANDOM_SEED=1234


Instructions for updating:
non-resource variables are not supported in the long term


## Data preparation for the model

The MNIST data set is loaded from the provided files train.csv and test.csv. The data is scaled using a StandardScaler. A validation set of 4,000 entries is taken from the end of the training set.
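The scaling step can be sketched in plain NumPy as follows. This is a minimal illustration on synthetic data, not the notebook's code: `standardize_fit` and `standardize_apply` are hypothetical helper names, and in this sketch the statistics are computed on the training rows only and then reused for the other split, which keeps all splits on the same scale.

```python
import numpy as np

def standardize_fit(X):
    """Return the per-column mean and standard deviation of X."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns (e.g. border pixels that are always 0) have std 0;
    # use 1 instead to avoid dividing by zero.
    std[std == 0] = 1.0
    return mean, std

def standardize_apply(X, mean, std):
    """Scale X with statistics that were fitted elsewhere."""
    return ((X - mean) / std).astype(np.float32)

# Synthetic stand-in for pixel data (assumption: values in 0..255)
rng = np.random.default_rng(0)
X_tr = rng.integers(0, 256, size=(100, 5)).astype(np.float64)
X_te = rng.integers(0, 256, size=(20, 5)).astype(np.float64)
X_tr[:, 0] = 0.0  # a constant column, like an always-black pixel
X_te[:, 0] = 0.0

mean, std = standardize_fit(X_tr)          # fit on training rows only
X_tr_scaled = standardize_apply(X_tr, mean, std)
X_te_scaled = standardize_apply(X_te, mean, std)  # reuse training statistics
print(X_tr_scaled.shape, X_te_scaled.shape)
```

After scaling, each non-constant training column has mean 0 and standard deviation 1, while the other split is only approximately centered because it uses the training statistics.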

In [3]:
# read the MNIST data
# create the data frames
mnist_tr_df = pd.read_csv('train.csv')
mnist_ts_df = pd.read_csv('test.csv')

# check the pandas DataFrame object MNIST
print('\n training DataFrame (first five rows):')
print(mnist_tr_df.head())
print('\n test DataFrame (first five rows):')
print(mnist_ts_df.head())

# basic info of the DataFrames
print('\nGeneral description of the training MNIST DataFrame:')
print(mnist_tr_df.info())

print('\nGeneral description of the test MNIST DataFrame:')
print(mnist_ts_df.info())

# summary statistics of the DataFrames
print('\nSummary statistics of the training MNIST DataFrame:')
print(mnist_tr_df.describe())

print('\nSummary statistics of the test MNIST DataFrame:')
print(mnist_ts_df.describe())


 training DataFrame (first five rows):
   label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  \
0      1       0       0       0       0       0       0       0       0   
1      0       0       0       0       0       0       0       0       0   
2      1       0       0       0       0       0       0       0       0   
3      4       0       0       0       0       0       0       0       0   
4      0       0       0       0       0       0       0       0       0   

   pixel8  ...  pixel774  pixel775  pixel776  pixel777  pixel778  pixel779  \
0       0  ...         0         0         0         0         0         0   
1       0  ...         0         0         0         0         0         0   
2       0  ...         0         0         0         0         0         0   
3       0  ...         0         0         0         0         0         0   
4       0  ...         0         0         0         0         0         0   

   pixel780  pixel781  pixel782  p

In [4]:
scaler = StandardScaler()

y_train_tot = mnist_tr_df.loc[:, 'label']
x_train = mnist_tr_df.loc[:,'pixel0':'pixel783']

x_test = mnist_ts_df.loc[:,'pixel0':'pixel783']

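X_train_scl = scaler.fit_transform(x_train).astype(np.float32)
# reuse the statistics fitted on the training data; refitting on the test set
# would scale it differently from the data the model was trained on
X_test = scaler.transform(x_test).astype(np.float32)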

X_train = X_train_scl[:38000]
X_val =  X_train_scl[-4000:]    

y_train = y_train_tot[:38000]
y_val =  y_train_tot[-4000:]   

print('\nX_train object:', type(X_train), X_train.shape)    
print('\ny_train object:', type(y_train),  y_train.shape)  
print('\nX_validation object:', type(X_val),  X_val.shape)  
print('\ny_validation object:', type(y_val),  y_val.shape)  
print('\nX_test object:', type(X_test),  X_test.shape)


X_train object: <class 'numpy.ndarray'> (38000, 784)

y_train object: <class 'pandas.core.series.Series'> (38000,)

X_validation object: <class 'numpy.ndarray'> (4000, 784)

y_validation object: <class 'pandas.core.series.Series'> (4000,)

X_test object: <class 'numpy.ndarray'> (28000, 784)


In [5]:
# Initialize metrics

metrics = {}

# Initialize metric names
names = ['Number of Hidden Layers', 'Nodes per Layer', 'Time in Seconds',
         'Training Set Accuracy', 'Validation Set Accuracy']

# Set fixed parameters for models
n_epochs = 20
batch_size = 50
learning_rate = 0.01

# Function that creates batch generator used in training
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
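
A quick way to see what the generator yields is to run it on a toy array. The sketch below repeats the function so that it is self-contained, and passes NumPy arrays for both `X` and `y`. That is an assumption that differs from the notebook, which passes `y_train` as a pandas Series. With a Series, `y[batch_idx]` selects by index label rather than by position, so it only works when the index is the default 0..n-1 range; converting with `.to_numpy()` removes that dependency.

```python
import numpy as np

def shuffle_batch(X, y, batch_size):
    # Same logic as the notebook's generator
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]

np.random.seed(1234)
# Toy data: row i of X_toy is [2*i, 2*i + 1] and its label is i
X_toy = np.arange(20, dtype=np.float32).reshape(10, 2)
y_toy = np.arange(10)

batches = list(shuffle_batch(X_toy, y_toy, batch_size=3))
for X_b, y_b in batches:
    print(X_b.shape, y_b)
```

Because `np.array_split` spreads the remainder over the batches, some batches hold one more sample than `batch_size` when the data set size is not a multiple of it.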

## Model 1

This model uses 2 hidden layers of 100 nodes each. The training and validation accuracy are displayed after each epoch, along with the total execution time. The resulting predictions are stored in a CSV file to be uploaded to Kaggle for testing (score: 0.95928, UserId: Vittorio Pepe).
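
A note on the timing: `time.clock()`, which the training cells use, was removed in Python 3.8. A minimal sketch of the same measurement with `time.perf_counter()` is shown below. `perf_counter()` measures wall-clock time, whereas `time.clock()` returned CPU time on Unix, so the two are not strictly comparable. `train_stub` is a hypothetical placeholder for the training loop.

```python
import time

def train_stub():
    # Placeholder workload standing in for the training loop
    return sum(i * i for i in range(100_000))

start = time.perf_counter()
train_stub()
duration = time.perf_counter() - start
print('DURATION :', duration)
```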

In [6]:
# Model 1: 2 Hidden Layers with 100 Nodes per Layer

print('\nModel 1')

# Start timer
start = time.clock()

n_hidden = 100

# Reset the session
#tf.disable_v2_behavior() 
tf.reset_default_graph()
tf.set_random_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Set X and y placeholders
X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden, name="hidden2",
                              activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, 10, name="outputs")
    y_proba = tf.nn.softmax(logits)

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})
        acc_val = accuracy.eval(feed_dict={X: X_val, y: y_val})
        print(epoch, "Train accuracy:", acc_train, "Val accuracy:", acc_val)
    save_path = tf.train.Saver().save(sess, "./my_model_final.ckpt")
    
#        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})

# Record the clock time it takes
duration = time.clock() - start
print('DURATION :', duration)
metrics['Model 1'] = [2, n_hidden, duration, acc_train, acc_val]


Model 1
Instructions for updating:
Use keras.layers.Dense instead.
Instructions for updating:
Please use `layer.__call__` method instead.
0 Train accuracy: 0.90544736 Val accuracy: 0.907
1 Train accuracy: 0.9317632 Val accuracy: 0.9245
2 Train accuracy: 0.94486845 Val accuracy: 0.9335
3 Train accuracy: 0.95321053 Val accuracy: 0.93925
4 Train accuracy: 0.9607895 Val accuracy: 0.94025
5 Train accuracy: 0.96507895 Val accuracy: 0.9455
6 Train accuracy: 0.96878946 Val accuracy: 0.94675
7 Train accuracy: 0.97239476 Val accuracy: 0.94825
8 Train accuracy: 0.974421 Val accuracy: 0.9525
9 Train accuracy: 0.9773421 Val accuracy: 0.952
10 Train accuracy: 0.9797895 Val accuracy: 0.95325
11 Train accuracy: 0.98128945 Val accuracy: 0.954
12 Train accuracy: 0.9829737 Val accuracy: 0.9545
13 Train accuracy: 0.9847105 Val accuracy: 0.956
14 Train accuracy: 0.98605263 Val accuracy: 0.958
15 Train accuracy: 0.98739475 Val accuracy: 0.95625
16 Train accuracy: 0.98839474 Val accuracy: 0.96
17 Train accu

The predictions for this model are calculated and saved in a CSV file for submission to Kaggle.
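
The prediction cell takes the arg max of the logits rather than of the softmax probabilities. Since softmax is strictly increasing in each logit, both give the same class. A small NumPy check on made-up logits (the values are illustrative, not model output):

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Three fake samples with three classes each
logits = np.array([[ 2.0, -1.0,  0.5],
                   [ 0.1,  0.2,  3.0],
                   [-4.0, -2.0, -3.0]])

proba = softmax(logits)
pred_from_logits = np.argmax(logits, axis=1)
pred_from_proba = np.argmax(proba, axis=1)
print(pred_from_logits, pred_from_proba)
```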

In [7]:
with tf.Session() as sess:
    tf.train.Saver().restore(sess, "./my_model_final.ckpt")
    Z = logits.eval(feed_dict={X: X_test})
    y_pred = np.argmax(Z, axis=1)
  
df = pd.DataFrame(y_pred, columns=['Label'])
df.index += 1 
df.to_csv('Subm_mod_1.csv', index_label='ImageId')

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


## Model 2

This model uses 2 hidden layers of 200 nodes each. The training and validation accuracy are displayed after each epoch, along with the total execution time. The resulting predictions are stored in a CSV file to be uploaded to Kaggle for testing (score: 0.96157, UserId: Vittorio Pepe).
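
The accuracy reported in the training log is top-1 accuracy (`tf.nn.in_top_k(logits, y, 1)`), and the loss is the mean sparse softmax cross-entropy. A NumPy sketch of both quantities on a hand-made batch follows. It illustrates the definitions and is not a replacement for the TensorFlow ops; the function names are hypothetical.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-sample cross-entropy for integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_proba = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_proba[np.arange(len(labels)), labels]

def top1_accuracy(logits, labels):
    """Fraction of samples whose highest logit is the true class."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

# Made-up batch: 4 samples, 3 classes
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 1.0, 3.0],
                   [1.0, 1.5, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2, 0, 1])

per_sample = sparse_softmax_cross_entropy(logits, labels)
loss = per_sample.mean()
acc = top1_accuracy(logits, labels)
print(loss, acc)
```

The last sample has equal logits for all three classes, so its loss is log(3), the value expected from a uniform guess.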

In [8]:
# Model 2: 2 Hidden Layers with 200 Nodes per Layer

print('\nModel 2')

# Start timer
start = time.clock()

n_hidden = 200

# Reset the session
tf.reset_default_graph()
tf.set_random_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Set X and y placeholders
X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden, name="hidden2",
                              activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, 10, name="outputs")
    y_proba = tf.nn.softmax(logits)

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})
        acc_val = accuracy.eval(feed_dict={X: X_val, y: y_val})
        print(epoch, "Train accuracy:", acc_train, "Val accuracy:", acc_val)
    save_path = tf.train.Saver().save(sess, "./my_model_final.ckpt")
    
#        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})

# Record the clock time it takes
duration = time.clock() - start
print('DURATION :', duration)
metrics['Model 2'] = [2, n_hidden, duration, acc_train, acc_val]


Model 2
0 Train accuracy: 0.91492105 Val accuracy: 0.91075
1 Train accuracy: 0.93747365 Val accuracy: 0.93
2 Train accuracy: 0.9495 Val accuracy: 0.93475
3 Train accuracy: 0.9578684 Val accuracy: 0.94025
4 Train accuracy: 0.9641316 Val accuracy: 0.94475
5 Train accuracy: 0.9695263 Val accuracy: 0.94625
6 Train accuracy: 0.9725 Val accuracy: 0.95125
7 Train accuracy: 0.9755526 Val accuracy: 0.9505
8 Train accuracy: 0.9785789 Val accuracy: 0.953
9 Train accuracy: 0.9811842 Val accuracy: 0.95325
10 Train accuracy: 0.983 Val accuracy: 0.95625
11 Train accuracy: 0.9848158 Val accuracy: 0.956
12 Train accuracy: 0.98626316 Val accuracy: 0.95525
13 Train accuracy: 0.9876842 Val accuracy: 0.95725
14 Train accuracy: 0.9888684 Val accuracy: 0.95825
15 Train accuracy: 0.99021053 Val accuracy: 0.95825
16 Train accuracy: 0.9915263 Val accuracy: 0.95925
17 Train accuracy: 0.99226314 Val accuracy: 0.959
18 Train accuracy: 0.9930263 Val accuracy: 0.95975
19 Train accuracy: 0.9937895 Val accuracy: 0.95

The predictions for this model are calculated and saved in a CSV file for submission to Kaggle.

In [9]:
with tf.Session() as sess:
    tf.train.Saver().restore(sess, "./my_model_final.ckpt")
    Z = logits.eval(feed_dict={X: X_test})
    y_pred = np.argmax(Z, axis=1)
    
df = pd.DataFrame(y_pred, columns=['Label'])
df.index += 1 
df.to_csv('Subm_mod_2.csv', index_label='ImageId')

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


## Model 3

This model uses 6 hidden layers of 100 nodes each. The training and validation accuracy are displayed after each epoch, along with the total execution time. The resulting predictions are stored in a CSV file to be uploaded to Kaggle for testing (score: 0.95600, UserId: Vittorio Pepe).
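
The four models differ only in depth and width, so each architecture can be described by two numbers. The NumPy sketch below builds a forward pass for an arbitrary number of ReLU hidden layers. It illustrates the layer stacking and is not the TensorFlow graph used in the notebook: `init_params` and `forward` are hypothetical names, and the random initialization is arbitrary and does not reproduce TensorFlow's default initializer.

```python
import numpy as np

def init_params(n_inputs, n_hidden_layers, n_hidden, n_outputs, seed=1234):
    """Create one (weights, biases) pair per dense layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_hidden] * n_hidden_layers + [n_outputs]
    return [(rng.normal(0.0, 0.05, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(X, params):
    """Apply ReLU hidden layers, then a linear output layer (logits)."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer
    W, b = params[-1]
    return h @ W + b  # logits, no activation

# Depth and width of Model 3: 6 hidden layers of 100 nodes
params = init_params(784, n_hidden_layers=6, n_hidden=100, n_outputs=10)
X_dummy = np.zeros((4, 784))
logits = forward(X_dummy, params)
print(len(params), logits.shape)
```

Changing `n_hidden_layers` and `n_hidden` is enough to describe any of the four configurations compared in this study.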

In [10]:
# Model 3: 6 Hidden Layers with 100 Nodes per Layer
print('\nModel 3')

# Start timer
start = time.clock()

n_hidden = 100

# Reset the session
tf.reset_default_graph()
tf.set_random_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Set X and y placeholders
X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden, name="hidden2",
                              activation=tf.nn.relu)
    hidden3 = tf.layers.dense(hidden2, n_hidden, name="hidden3",
                              activation=tf.nn.relu)
    hidden4 = tf.layers.dense(hidden3, n_hidden, name="hidden4",
                              activation=tf.nn.relu)
    hidden5 = tf.layers.dense(hidden4, n_hidden, name="hidden5",
                              activation=tf.nn.relu)
    hidden6 = tf.layers.dense(hidden5, n_hidden, name="hidden6",
                              activation=tf.nn.relu)    
    logits = tf.layers.dense(hidden6, 10, name="outputs")
    y_proba = tf.nn.softmax(logits)

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})
        acc_val = accuracy.eval(feed_dict={X: X_val, y: y_val})
        print(epoch, "Train accuracy:", acc_train, "Val accuracy:", acc_val)
    save_path = tf.train.Saver().save(sess, "./my_model_final.ckpt")
    
#        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})

# Record the clock time it takes
duration = time.clock() - start
print('DURATION :', duration)
metrics['Model 3'] = [6, n_hidden, duration, acc_train, acc_val]


Model 3
0 Train accuracy: 0.8893421 Val accuracy: 0.88375
1 Train accuracy: 0.9158684 Val accuracy: 0.91275
2 Train accuracy: 0.9477105 Val accuracy: 0.934
3 Train accuracy: 0.95721054 Val accuracy: 0.9405
4 Train accuracy: 0.96771055 Val accuracy: 0.94525
5 Train accuracy: 0.975 Val accuracy: 0.951
6 Train accuracy: 0.9793158 Val accuracy: 0.95175
7 Train accuracy: 0.9805526 Val accuracy: 0.94775
8 Train accuracy: 0.9853947 Val accuracy: 0.95375
9 Train accuracy: 0.98844737 Val accuracy: 0.95225
10 Train accuracy: 0.9907105 Val accuracy: 0.95475
11 Train accuracy: 0.9911579 Val accuracy: 0.9525
12 Train accuracy: 0.99352634 Val accuracy: 0.95425
13 Train accuracy: 0.9951579 Val accuracy: 0.95425
14 Train accuracy: 0.9942368 Val accuracy: 0.954
15 Train accuracy: 0.99739474 Val accuracy: 0.95625
16 Train accuracy: 0.99789476 Val accuracy: 0.95475
17 Train accuracy: 0.9980263 Val accuracy: 0.95425
18 Train accuracy: 0.999 Val accuracy: 0.9535
19 Train accuracy: 0.9990789 Val accuracy: 

In [11]:
with tf.Session() as sess:
    tf.train.Saver().restore(sess, "./my_model_final.ckpt")
    Z = logits.eval(feed_dict={X: X_test})
    y_pred = np.argmax(Z, axis=1)

df = pd.DataFrame(y_pred, columns=['Label'])
df.index += 1 
df.to_csv('Subm_mod_3.csv', index_label='ImageId')

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


## Model 4

This model uses 6 hidden layers of 200 nodes each. The training and validation accuracy are displayed after each epoch, along with the total execution time. The resulting predictions are stored in a CSV file to be uploaded to Kaggle for testing (score: 0.95971, UserId: Vittorio Pepe).

In [12]:
# Model 4: 6 Hidden Layers with 200 Nodes per Layer
print('\nModel 4')

# Start timer
start = time.clock()

n_hidden = 200

# Reset the session
tf.reset_default_graph()
tf.set_random_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Set X and y placeholders
X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden, name="hidden2",
                              activation=tf.nn.relu)
    hidden3 = tf.layers.dense(hidden2, n_hidden, name="hidden3",
                              activation=tf.nn.relu)
    hidden4 = tf.layers.dense(hidden3, n_hidden, name="hidden4",
                              activation=tf.nn.relu)
    hidden5 = tf.layers.dense(hidden4, n_hidden, name="hidden5",
                              activation=tf.nn.relu)
    hidden6 = tf.layers.dense(hidden5, n_hidden, name="hidden6",
                              activation=tf.nn.relu)    
    logits = tf.layers.dense(hidden6, 10, name="outputs")
    y_proba = tf.nn.softmax(logits)

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})
        acc_val = accuracy.eval(feed_dict={X: X_val, y: y_val})
        print(epoch, "Train accuracy:", acc_train, "Val accuracy:", acc_val)
    save_path = tf.train.Saver().save(sess, "./my_model_final.ckpt")
    
#        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})

# Record the clock time it takes
duration = time.clock() - start
print('DURATION :', duration)
metrics['Model 4'] = [6, n_hidden, duration, acc_train, acc_val]




Model 4
0 Train accuracy: 0.9123684 Val accuracy: 0.90075
1 Train accuracy: 0.94239473 Val accuracy: 0.92625
2 Train accuracy: 0.95721054 Val accuracy: 0.9405
3 Train accuracy: 0.9673947 Val accuracy: 0.94825
4 Train accuracy: 0.97673684 Val accuracy: 0.953
5 Train accuracy: 0.9817632 Val accuracy: 0.95425
6 Train accuracy: 0.9847105 Val accuracy: 0.95675
7 Train accuracy: 0.9855263 Val accuracy: 0.95625
8 Train accuracy: 0.9919737 Val accuracy: 0.95825
9 Train accuracy: 0.99339473 Val accuracy: 0.95775
10 Train accuracy: 0.9950263 Val accuracy: 0.9585
11 Train accuracy: 0.99671054 Val accuracy: 0.96075
12 Train accuracy: 0.9973421 Val accuracy: 0.96175
13 Train accuracy: 0.9985 Val accuracy: 0.96125
14 Train accuracy: 0.9988684 Val accuracy: 0.96075
15 Train accuracy: 0.99897367 Val accuracy: 0.9615
16 Train accuracy: 0.9993684 Val accuracy: 0.96225
17 Train accuracy: 0.9993684 Val accuracy: 0.961
18 Train accuracy: 0.9996842 Val accuracy: 0.96225
19 Train accuracy: 0.9994737 Val acc

In [13]:
with tf.Session() as sess:
    tf.train.Saver().restore(sess, "./my_model_final.ckpt")
    Z = logits.eval(feed_dict={X: X_test})
    y_pred = np.argmax(Z, axis=1)
    
df = pd.DataFrame(y_pred, columns=['Label'])
df.index += 1 
df.to_csv('Subm_mod_4.csv', index_label='ImageId')

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


## Benchmark results

The table below summarizes the models, their characteristics, and their performance.

In [14]:
# Convert metrics dictionary to dataframe for display
results_summary = pd.DataFrame.from_dict(metrics, orient='index')
results_summary.columns = names

# Sort by model number
results_summary.reset_index(inplace=True)
results_summary.sort_values(by=['index'], axis=0, inplace=True)
results_summary.set_index(['index'], inplace=True)
results_summary.index.name = None

# Export to csv
results_summary.to_csv('results_summary.csv')
results_summary

| | Number of Hidden Layers | Nodes per Layer | Time in Seconds | Training Set Accuracy | Validation Set Accuracy |
|---|---|---|---|---|---|
| Model 1 | 2 | 100 | 51.229236 | 0.990974 | 0.958 |
| Model 2 | 2 | 200 | 73.878427 | 0.993789 | 0.95925 |
| Model 3 | 6 | 100 | 65.446789 | 0.999079 | 0.95425 |
| Model 4 | 6 | 200 | 128.955616 | 0.999474 | 0.96125 |
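
One way to read the timing column is through model size. The sketch below counts the trainable parameters (weights plus biases) of each architecture, assuming 784 inputs and 10 outputs as in the cells above. `count_params` is a hypothetical helper, and parameter count is only a rough proxy for training time.

```python
def count_params(n_hidden_layers, n_hidden, n_inputs=784, n_outputs=10):
    """Weights plus biases of a fully connected network."""
    sizes = [n_inputs] + [n_hidden] * n_hidden_layers + [n_outputs]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

# (number of hidden layers, nodes per layer) for each model
configs = {
    'Model 1': (2, 100),
    'Model 2': (2, 200),
    'Model 3': (6, 100),
    'Model 4': (6, 200),
}

param_counts = {name: count_params(layers, nodes)
                for name, (layers, nodes) in configs.items()}
for name, n in param_counts.items():
    print(name, n)
# Model 1 89610
# Model 2 199210
# Model 3 130010
# Model 4 360010
```

The ordering by parameter count (Model 1, Model 3, Model 2, Model 4) matches the ordering by training time in the table.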


## Conclusions

- Model 1 score: 0.95928
- Model 2 score: 0.96157
- Model 3 score: 0.95600
- Model 4 score: 0.95971

The best model on the test set is Model 2. This model has 2 hidden layers and 200 nodes per layer. 

Compared with the other models, increasing the number of nodes per layer increases the accuracy of the model. On the other hand, increasing the number of hidden layers increases training time and produces more overfitting: the 6-layer models reach a training accuracy above 0.999 while their validation accuracy stays between 0.95 and 0.96. In general, and especially for the networks with more layers, the models should be modified by adding a regularization technique such as dropout to reduce overfitting.
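
As an illustration of the dropout idea mentioned above, here is a minimal NumPy sketch of inverted dropout applied to a layer's activations. It is not part of the notebook; the function name `dropout` and the rate of 0.5 are assumptions chosen for the example.

```python
import numpy as np

def dropout(h, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training
    and rescale the rest so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return h  # at evaluation time the layer is the identity
    keep = 1.0 - rate
    mask = rng.random(h.shape) < keep
    return h * mask / keep

rng = np.random.default_rng(1234)
h = np.ones((1000, 100))  # stand-in for hidden-layer activations

h_train = dropout(h, rate=0.5, training=True, rng=rng)
h_eval = dropout(h, rate=0.5, training=False, rng=rng)
print(h_train.mean(), h_eval.mean())
```

During training roughly half of the activations are zeroed and the survivors are doubled, so the mean stays close to 1; at evaluation time the activations pass through unchanged.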


 