### **This notebook compares different TensorFlow neural network designs using the MNIST dataset**

In [0]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [0]:
from functools import wraps
from time import time
def timing(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = time()
        result = f(*args, **kw)
        te = time()
        print(f"fun: {f.__name__}, args: [{args}, {kw}] took: {te-ts} sec")
        return result
    return wrap
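
As a quick check of the decorator above, the sketch below applies it to a small stand-in function. `slow_square` is a hypothetical name used only for illustration, and the decorator is repeated so the block runs on its own.

```python
from functools import wraps
from time import sleep, time


def timing(f):
    """Print the wall-clock time of each call to f and pass its result through."""
    @wraps(f)
    def wrap(*args, **kw):
        ts = time()
        result = f(*args, **kw)
        te = time()
        print(f"fun: {f.__name__}, args: [{args}, {kw}] took: {te-ts} sec")
        return result
    return wrap


@timing
def slow_square(n):
    # Hypothetical workload standing in for a training loop.
    sleep(0.01)
    return n * n


value = slow_square(4)
```

Because the wrapper uses `@wraps`, the decorated function keeps its original name, which is why the timing lines later in the notebook report `training_loop_A`, `training_loop_B`, and so on.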

### **Global Parameters**

Consistent for every model

In [0]:
# Global Parameters
batch_size = 100
display_step = 9

# Network Parameters
n_input = 784 # MNIST data input (28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
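
The two global settings determine how many optimizer steps each epoch takes and which epochs show up in the logs. The following is a minimal sketch, assuming the training split holds 55,000 examples (the usual default for this loader; the notebook does not print the value). `batches_per_epoch` and `logged_epochs` are illustrative helper names, not part of the notebook.

```python
def batches_per_epoch(num_examples, batch_size):
    # Mirrors int(mnist.train.num_examples / batch_size) in the training loops.
    return int(num_examples / batch_size)


def logged_epochs(training_epochs, display_step):
    # The loops print when epoch % display_step == 0 and label the line epoch + 1.
    return [epoch + 1 for epoch in range(training_epochs)
            if epoch % display_step == 0]


steps = batches_per_epoch(55000, 100)  # 55000 is an assumed training-set size
print(steps)
print(logged_epochs(100, 9))
print(logged_epochs(200, 9)[-1], logged_epochs(50, 9)[-1])
```

With `display_step = 9`, a 100-epoch run logs its final epoch, but 200-epoch and 50-epoch runs stop logging at epochs 199 and 46, so the last printed cost for those runs is not the cost of the final epoch.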

**Multilayer Perceptron Function**

In [0]:
# Create model
def multilayer_perceptron(x, weights, biases):
  
    # Use tf.matmul (broadcast)
    print( 'x:', x.get_shape(), 'W1:', weights['h1'].get_shape(), 'b1:', biases['b1'].get_shape())  
    
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1']) #(x*weights['h1']) + biases['b1']
    layer_1 = tf.nn.relu(layer_1)

    # Hidden layer with RELU activation
    print( 'layer_1:', layer_1.get_shape(), 'W2:', weights['h2'].get_shape(), 'b2:', biases['b2'].get_shape())        
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']) # (layer_1 * weights['h2']) + biases['b2'] 
    layer_2 = tf.nn.relu(layer_2)

    # Output layer with linear activation
    print( 'layer_2:', layer_2.get_shape(), 'W3:', weights['out'].get_shape(), 'b3:', biases['out'].get_shape())        
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out'] # (layer_2 * weights['out']) + biases['out']    
    print('out_layer:',out_layer.get_shape())

    return out_layer
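
To get a feel for how large this network becomes as the hidden layers grow, the sketch below counts trainable parameters for a two-hidden-layer perceptron of the shape built above. `count_parameters` is a hypothetical helper, and the hidden-layer sizes are the ones used by the models in this notebook.

```python
def count_parameters(n_input, hidden_1, hidden_2, n_classes):
    # Each dense layer has (inputs x outputs) weights plus one bias per output.
    layer_1 = n_input * hidden_1 + hidden_1
    layer_2 = hidden_1 * hidden_2 + hidden_2
    out = hidden_2 * n_classes + n_classes
    return layer_1 + layer_2 + out


designs = {
    "28/28": (28, 28),
    "150/300": (150, 300),
    "300/300": (300, 300),
    "500/500": (500, 500),
}
counts = {name: count_parameters(784, h1, h2, 10)
          for name, (h1, h2) in designs.items()}
for name, total in counts.items():
    print(name, total)
```

The first weight matrix dominates in every design because it is multiplied by the 784 input features.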

## **Model A**

*Try nodes equal to the square root of the input features*

In [0]:
# Model A learning Rate
learning_rate_A = 0.001
training_epochs_A = 100

# Model A Network Parameters
A_hidden_1 = 28 # 1st layer number of features
A_hidden_2 = 28 # 2nd layer number of features


In [0]:
# Store layers weight & bias for Model A
weights_A = {
    'h1': tf.Variable(tf.random_normal([n_input, A_hidden_1])),    #784x28
    'h2': tf.Variable(tf.random_normal([A_hidden_1, A_hidden_2])), #28x28
    'out': tf.Variable(tf.random_normal([A_hidden_2, n_classes]))  #28x10
}
biases_A = {
    'b1': tf.Variable(tf.random_normal([A_hidden_1])),             #28x1
    'b2': tf.Variable(tf.random_normal([A_hidden_2])),             #28x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model A
pred_A = multilayer_perceptron(x, weights_A, biases_A)

x: (?, 784) W1: (784, 28) b1: (28,)
layer_1: (?, 28) W2: (28, 28) b2: (28,)
layer_2: (?, 28) W3: (28, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model A
cost_A = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_A, labels=y))

# Optimizer for Model A
optimizer_A = tf.train.AdamOptimizer(learning_rate=learning_rate_A).minimize(cost_A)
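
For readers unfamiliar with the loss used here, the sketch below computes softmax cross entropy for a single example in plain Python. It is an illustration of the formula only, not TensorFlow's implementation, and `softmax_cross_entropy` is a hypothetical helper name.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    # Subtract the max logit before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Cross entropy: -sum(label * log(prob)); only the true class contributes.
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # one-hot label for digit 3
uniform = softmax_cross_entropy([0.0] * 10, label)
confident = softmax_cross_entropy(
    [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], label)
print(uniform, confident)
```

A model that assigns equal probability to all ten digits scores ln(10), about 2.30, which is a useful reference point when reading the cost values logged during training.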

In [0]:
# Model A
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_A():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_A):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_A, cost_A], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model A Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_A, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model A:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_A()

Epoch: 0001 cost= 33.519423692
Epoch: 0010 cost= 0.896983690
Epoch: 0019 cost= 0.364546590
Epoch: 0028 cost= 0.227477948
Epoch: 0037 cost= 0.170262680
Epoch: 0046 cost= 0.142203039
Epoch: 0055 cost= 0.116627858
Epoch: 0064 cost= 0.104153705
Epoch: 0073 cost= 0.092194962
Epoch: 0082 cost= 0.079351897
Epoch: 0091 cost= 0.071888410
Epoch: 0100 cost= 0.065250891
Model A Optimization Finished!
Accuracy of Model A: 0.9497
fun: training_loop_A, args: [(), {}] took: 136.6531891822815 sec
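
The test step at the end of the training loop compares the index of the largest logit with the index of the 1 in the one-hot label and averages the matches. A plain-Python sketch of the same calculation on made-up three-class data (the values are illustrative only, and `argmax` and `accuracy` are hypothetical helpers):

```python
def argmax(values):
    # Index of the largest entry in a list.
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits_batch, labels_batch):
    # Fraction of rows where the predicted class matches the labelled class.
    correct = [argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, labels_batch)]
    return sum(correct) / len(correct)


logits_batch = [[2.0, 0.1, -1.0], [0.3, 0.9, 0.2], [1.5, 1.4, 3.0], [0.2, 0.1, 0.0]]
labels_batch = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
score = accuracy(logits_batch, labels_batch)
print(score)
```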


## **Model B**

*Same nodes and epochs as Model A, adjust learning rate*

In [0]:
# Model B learning Rate
learning_rate_B = 0.1
training_epochs_B = 100

# Model B Network Parameters
B_hidden_1 = 28 # 1st layer number of features
B_hidden_2 = 28 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model B
weights_B = {
    'h1': tf.Variable(tf.random_normal([n_input, B_hidden_1])),    #784x28
    'h2': tf.Variable(tf.random_normal([B_hidden_1, B_hidden_2])), #28x28
    'out': tf.Variable(tf.random_normal([B_hidden_2, n_classes]))  #28x10
}
biases_B = {
    'b1': tf.Variable(tf.random_normal([B_hidden_1])),             #28x1
    'b2': tf.Variable(tf.random_normal([B_hidden_2])),             #28x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model B
pred_B = multilayer_perceptron(x, weights_B, biases_B)

x: (?, 784) W1: (784, 28) b1: (28,)
layer_1: (?, 28) W2: (28, 28) b2: (28,)
layer_2: (?, 28) W3: (28, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model B
cost_B = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_B, labels=y))

# Model B Optimizer
optimizer_B = tf.train.AdamOptimizer(learning_rate=learning_rate_B).minimize(cost_B)

In [0]:
# Model B
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_B():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_B):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_B, cost_B], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model B Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_B, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model B:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_B()

Epoch: 0001 cost= 2.910077302
Epoch: 0010 cost= 1.733886617
Epoch: 0019 cost= 1.915577514
Epoch: 0028 cost= 1.860436428
Epoch: 0037 cost= 1.880057339
Epoch: 0046 cost= 1.845974970
Epoch: 0055 cost= 1.857720120
Epoch: 0064 cost= 1.860988697
Epoch: 0073 cost= 1.894884940
Epoch: 0082 cost= 1.894137257
Epoch: 0091 cost= 1.840635065
Epoch: 0100 cost= 1.834093482
Model B Optimization Finished!
Accuracy of Model B: 0.2094
fun: training_loop_B, args: [(), {}] took: 137.72603511810303 sec


## **Model C**

*Increase hidden layer nodes to 300*

In [0]:
# Model C learning Rate
learning_rate_C = 0.001
training_epochs_C = 100

# Model C Network Parameters
C_hidden_1 = 300 # 1st layer number of features
C_hidden_2 = 300 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model C
weights_C = {
    'h1': tf.Variable(tf.random_normal([n_input, C_hidden_1])),    #784x300
    'h2': tf.Variable(tf.random_normal([C_hidden_1, C_hidden_2])), #300x300
    'out': tf.Variable(tf.random_normal([C_hidden_2, n_classes]))  #300x10
}
biases_C = {
    'b1': tf.Variable(tf.random_normal([C_hidden_1])),             #300x1
    'b2': tf.Variable(tf.random_normal([C_hidden_2])),             #300x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model C
pred_C = multilayer_perceptron(x, weights_C, biases_C)

x: (?, 784) W1: (784, 300) b1: (300,)
layer_1: (?, 300) W2: (300, 300) b2: (300,)
layer_2: (?, 300) W3: (300, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model C
cost_C = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_C, labels=y))

# Model C Optimizer
optimizer_C = tf.train.AdamOptimizer(learning_rate=learning_rate_C).minimize(cost_C)

In [0]:
# Model C
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_C():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_C):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_C, cost_C], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model C Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_C, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model C:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_C()

Epoch: 0001 cost= 177.714622806
Epoch: 0010 cost= 2.648205742
Epoch: 0019 cost= 0.482994352
Epoch: 0028 cost= 0.406804922
Epoch: 0037 cost= 0.212329539
Epoch: 0046 cost= 0.204479781
Epoch: 0055 cost= 0.210383427
Epoch: 0064 cost= 0.142848322
Epoch: 0073 cost= 0.138408895
Epoch: 0082 cost= 0.124967998
Epoch: 0091 cost= 0.117228675
Epoch: 0100 cost= 0.102824890
Model C Optimization Finished!
Accuracy of Model C: 0.9682
fun: training_loop_C, args: [(), {}] took: 155.43393301963806 sec


## **Model D**

*Same nodes as Model C, adjust learning rate*

In [0]:
# Model D learning Rate
learning_rate_D = 0.01
training_epochs_D = 100

# Model D Network Parameters
D_hidden_1 = 300 # 1st layer number of features
D_hidden_2 = 300 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model D
weights_D = {
    'h1': tf.Variable(tf.random_normal([n_input, D_hidden_1])),    #784x300
    'h2': tf.Variable(tf.random_normal([D_hidden_1, D_hidden_2])), #300x300
    'out': tf.Variable(tf.random_normal([D_hidden_2, n_classes]))  #300x10
}
biases_D = {
    'b1': tf.Variable(tf.random_normal([D_hidden_1])),             #300x1
    'b2': tf.Variable(tf.random_normal([D_hidden_2])),             #300x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model D
pred_D = multilayer_perceptron(x, weights_D, biases_D)

x: (?, 784) W1: (784, 300) b1: (300,)
layer_1: (?, 300) W2: (300, 300) b2: (300,)
layer_2: (?, 300) W3: (300, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model D
cost_D = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_D, labels=y))

# Model D Optimizer
optimizer_D = tf.train.AdamOptimizer(learning_rate=learning_rate_D).minimize(cost_D)

In [0]:
# Model D
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_D():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_D):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_D, cost_D], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model D Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_D, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model D:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_D()

Epoch: 0001 cost= 53.967102251
Epoch: 0010 cost= 1.687477235
Epoch: 0019 cost= 0.595191199
Epoch: 0028 cost= 0.209805833
Epoch: 0037 cost= 0.142061190
Epoch: 0046 cost= 0.120157408
Epoch: 0055 cost= 0.084393110
Epoch: 0064 cost= 0.085849609
Epoch: 0073 cost= 0.097922791
Epoch: 0082 cost= 0.102317256
Epoch: 0091 cost= 0.076751651
Epoch: 0100 cost= 0.076111232
Model D Optimization Finished!
Accuracy of Model D: 0.9697
fun: training_loop_D, args: [(), {}] took: 155.47146081924438 sec


## **Model E**

*Increase hidden layer nodes to 500*

In [0]:
# Model E learning Rate
learning_rate_E = 0.001
training_epochs_E = 100

# Model E Network Parameters
E_hidden_1 = 500 # 1st layer number of features
E_hidden_2 = 500 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model E
weights_E = {
    'h1': tf.Variable(tf.random_normal([n_input, E_hidden_1])),    #784x500
    'h2': tf.Variable(tf.random_normal([E_hidden_1, E_hidden_2])), #500x500
    'out': tf.Variable(tf.random_normal([E_hidden_2, n_classes]))  #500x10
}
biases_E = {
    'b1': tf.Variable(tf.random_normal([E_hidden_1])),             #500x1
    'b2': tf.Variable(tf.random_normal([E_hidden_2])),             #500x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model E
pred_E = multilayer_perceptron(x, weights_E, biases_E)

x: (?, 784) W1: (784, 500) b1: (500,)
layer_1: (?, 500) W2: (500, 500) b2: (500,)
layer_2: (?, 500) W3: (500, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model E
cost_E = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_E, labels=y))

# Model E Optimizer
optimizer_E = tf.train.AdamOptimizer(learning_rate=learning_rate_E).minimize(cost_E)

In [0]:
# Model E
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_E():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_E):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_E, cost_E], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model E Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_E, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model E:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_E()

Epoch: 0001 cost= 245.892399372
Epoch: 0010 cost= 2.685946415
Epoch: 0019 cost= 1.043761824
Epoch: 0028 cost= 0.701647883
Epoch: 0037 cost= 0.515014493
Epoch: 0046 cost= 0.606561599
Epoch: 0055 cost= 0.341497255
Epoch: 0064 cost= 0.547253780
Epoch: 0073 cost= 0.439078821
Epoch: 0082 cost= 0.171760189
Epoch: 0091 cost= 0.244424880
Epoch: 0100 cost= 0.226764511
Model E Optimization Finished!
Accuracy of Model E: 0.974
fun: training_loop_E, args: [(), {}] took: 178.98494935035706 sec


## **Model F**

*Different node counts in the two hidden layers*

In [0]:
# Model F learning Rate
learning_rate_F = 0.001
training_epochs_F = 100

# Model F Network Parameters
F_hidden_1 = 150 # 1st layer number of features
F_hidden_2 = 300 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model F
weights_F = {
    'h1': tf.Variable(tf.random_normal([n_input, F_hidden_1])),    #784x150
    'h2': tf.Variable(tf.random_normal([F_hidden_1, F_hidden_2])), #150x300
    'out': tf.Variable(tf.random_normal([F_hidden_2, n_classes]))  #300x10
}
biases_F = {
    'b1': tf.Variable(tf.random_normal([F_hidden_1])),             #150x1
    'b2': tf.Variable(tf.random_normal([F_hidden_2])),             #300x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model F
pred_F = multilayer_perceptron(x, weights_F, biases_F)

x: (?, 784) W1: (784, 150) b1: (150,)
layer_1: (?, 150) W2: (150, 300) b2: (300,)
layer_2: (?, 300) W3: (300, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model F
cost_F = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_F, labels=y))

# Model F Optimizer
optimizer_F = tf.train.AdamOptimizer(learning_rate=learning_rate_F).minimize(cost_F)

In [0]:
# Model F
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_F():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_F):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_F, cost_F], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model F Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_F, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model F:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_F()

Epoch: 0001 cost= 159.146627426
Epoch: 0010 cost= 3.651651075
Epoch: 0019 cost= 0.571011755
Epoch: 0028 cost= 0.302446496
Epoch: 0037 cost= 0.169019668
Epoch: 0046 cost= 0.101171108
Epoch: 0055 cost= 0.130114581
Epoch: 0064 cost= 0.075273835
Epoch: 0073 cost= 0.080754856
Epoch: 0082 cost= 0.079123368
Epoch: 0091 cost= 0.068936129
Epoch: 0100 cost= 0.059727455
Model F Optimization Finished!
Accuracy of Model F: 0.9649
fun: training_loop_F, args: [(), {}] took: 148.33707761764526 sec


## **Model G**

*Same nodes as Model F, increase training epochs*

In [0]:
# Model G learning Rate
learning_rate_G = 0.001
training_epochs_G = 200

# Model G Network Parameters
G_hidden_1 = 150 # 1st layer number of features
G_hidden_2 = 300 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model G
weights_G = {
    'h1': tf.Variable(tf.random_normal([n_input, G_hidden_1])),    #784x150
    'h2': tf.Variable(tf.random_normal([G_hidden_1, G_hidden_2])), #150x300
    'out': tf.Variable(tf.random_normal([G_hidden_2, n_classes]))  #300x10
}
biases_G = {
    'b1': tf.Variable(tf.random_normal([G_hidden_1])),             #150x1
    'b2': tf.Variable(tf.random_normal([G_hidden_2])),             #300x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model G
pred_G = multilayer_perceptron(x, weights_G, biases_G)

x: (?, 784) W1: (784, 150) b1: (150,)
layer_1: (?, 150) W2: (150, 300) b2: (300,)
layer_2: (?, 300) W3: (300, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model G
cost_G = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_G, labels=y))

# Model G Optimizer
optimizer_G = tf.train.AdamOptimizer(learning_rate=learning_rate_G).minimize(cost_G)

In [0]:
# Model G
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_G():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_G):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_G, cost_G], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model G Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_G, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model G:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_G()

Epoch: 0001 cost= 147.776164714
Epoch: 0010 cost= 3.741278299
Epoch: 0019 cost= 0.555338832
Epoch: 0028 cost= 0.309666292
Epoch: 0037 cost= 0.130260477
Epoch: 0046 cost= 0.110919527
Epoch: 0055 cost= 0.111610215
Epoch: 0064 cost= 0.088813144
Epoch: 0073 cost= 0.131664972
Epoch: 0082 cost= 0.089893032
Epoch: 0091 cost= 0.079846640
Epoch: 0100 cost= 0.082471903
Epoch: 0109 cost= 0.074195219
Epoch: 0118 cost= 0.070600756
Epoch: 0127 cost= 0.058322106
Epoch: 0136 cost= 0.050874336
Epoch: 0145 cost= 0.054817912
Epoch: 0154 cost= 0.048086062
Epoch: 0163 cost= 0.080321500
Epoch: 0172 cost= 0.058127463
Epoch: 0181 cost= 0.029928784
Epoch: 0190 cost= 0.033959699
Epoch: 0199 cost= 0.057468805
Model G Optimization Finished!
Accuracy of Model G: 0.9657
fun: training_loop_G, args: [(), {}] took: 297.589617729187 sec


## **Model H**

*Same nodes as Models F and G, fewer training epochs*

In [0]:
# Model H learning Rate
learning_rate_H = 0.001
training_epochs_H = 50

# Model H Network Parameters
H_hidden_1 = 150 # 1st layer number of features
H_hidden_2 = 300 # 2nd layer number of features

In [0]:
# Store layers weight & bias for Model H
weights_H = {
    'h1': tf.Variable(tf.random_normal([n_input, H_hidden_1])),    #784x150
    'h2': tf.Variable(tf.random_normal([H_hidden_1, H_hidden_2])), #150x300
    'out': tf.Variable(tf.random_normal([H_hidden_2, n_classes]))  #300x10
}
biases_H = {
    'b1': tf.Variable(tf.random_normal([H_hidden_1])),             #150x1
    'b2': tf.Variable(tf.random_normal([H_hidden_2])),             #300x1
    'out': tf.Variable(tf.random_normal([n_classes]))              #10x1
}

# Construct Model H
pred_H = multilayer_perceptron(x, weights_H, biases_H)

x: (?, 784) W1: (784, 150) b1: (150,)
layer_1: (?, 150) W2: (150, 300) b2: (300,)
layer_2: (?, 300) W3: (300, 10) b3: (10,)
out_layer: (?, 10)


In [0]:
# Cross entropy loss function for Model H
cost_H = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred_H, labels=y))

# Model H Optimizer
optimizer_H = tf.train.AdamOptimizer(learning_rate=learning_rate_H).minimize(cost_H)

In [0]:
# Model H
# Initializing the variables
init = tf.global_variables_initializer()

@timing
def training_loop_H():
  with tf.Session() as sess:
    sess.run(init)
    
    # Training cycle
    for epoch in range(training_epochs_H):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer_H, cost_H], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Model H Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred_H, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # To keep sizes compatible with model
    print ("Accuracy of Model H:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
    
training_loop_H()

Epoch: 0001 cost= 157.657458832
Epoch: 0010 cost= 3.629901862
Epoch: 0019 cost= 0.562861310
Epoch: 0028 cost= 0.272639280
Epoch: 0037 cost= 0.166516822
Epoch: 0046 cost= 0.172898383
Model H Optimization Finished!
Accuracy of Model H: 0.9579
fun: training_loop_H, args: [(), {}] took: 74.94109678268433 sec


In [0]:
import pandas as pd

Models = ['A','B','C','D','E','F','G','H']
Layers = [2,2,2,2,2,2,2,2]
LR = [.001, .1, .001, .01, .001, .001, .001, .001]
Layer1_Nodes = [28,28,300,300,500,150,150,150]
Layer2_Nodes = [28,28,300,300,500,300,300,300]
Execution = [136.7, 137.7, 155.4, 155.5, 179.0, 148.3, 297.6, 74.9]  # seconds, rounded from the timing output
_Accuracy = ['95%','21%','97%','97%','97%','96%','97%','96%']
# Last logged cost for each model (epoch 199 for Model G, epoch 46 for Model H)
Loss = [0.065, 1.83, 0.103, 0.076, 0.227, 0.059, 0.057, 0.17]
Epochs = [100,100,100,100,100,100,200,50]

table = pd.DataFrame(data={"Learning Rate": LR, "NN Layers": Layers, "Layer 1 Nodes": Layer1_Nodes, "Layer 2 Nodes": Layer2_Nodes, "Execution Time (sec)": Execution, "Epochs": Epochs, "Accuracy": _Accuracy, "Loss": Loss}, index=Models)
table

| Model | Accuracy | Epochs | Execution Time (sec) | Layer 1 Nodes | Layer 2 Nodes | Learning Rate | Loss | NN Layers |
|---|---|---|---|---|---|---|---|---|
| A | 95% | 100 | 136.7 | 28 | 28 | 0.001 | 0.065 | 2 |
| B | 21% | 100 | 137.7 | 28 | 28 | 0.1 | 1.83 | 2 |
| C | 97% | 100 | 155.4 | 300 | 300 | 0.001 | 0.103 | 2 |
| D | 97% | 100 | 155.5 | 300 | 300 | 0.01 | 0.076 | 2 |
| E | 97% | 100 | 179.0 | 500 | 500 | 0.001 | 0.227 | 2 |
| F | 96% | 100 | 148.3 | 150 | 300 | 0.001 | 0.059 | 2 |
| G | 97% | 200 | 297.6 | 150 | 300 | 0.001 | 0.057 | 2 |
| H | 96% | 50 | 74.9 | 150 | 300 | 0.001 | 0.17 | 2 |


## **Recommendation**

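*Learning rate*: The learning rate was varied across Models A through D. Models A and B share one node design and Models C and D another, and execution time within each pair was nearly identical. In the A-B pair, the smaller learning rate (0.001) produced a far lower loss (0.065 versus 1.83) and far higher accuracy (95% versus 21%). In the C-D pair, the larger learning rate (0.01) gave a slightly lower loss (0.076 versus 0.103) at essentially the same accuracy (97%). The learning rate's impact was much greater for the small networks in the A-B pair, although that pair also spans a wider range of learning rates (0.001 to 0.1) than the C-D pair (0.001 to 0.01), so the difference cannot be attributed to network size alone.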
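
One way to see why a learning rate of 0.1 is so much more aggressive than 0.001 under Adam is to look at the size of a single update. The sketch below implements the first step of the textbook Adam rule in plain Python, using the commonly cited default hyperparameters. It is an illustration under those assumptions, not TensorFlow's exact implementation, and `adam_first_step` is a hypothetical helper name.

```python
import math


def adam_first_step(gradient, learning_rate, beta1=0.9, beta2=0.999, eps=1e-8):
    # One textbook Adam update starting from zero moment estimates (t = 1).
    m = (1 - beta1) * gradient
    v = (1 - beta2) * gradient ** 2
    m_hat = m / (1 - beta1)  # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return learning_rate * m_hat / (math.sqrt(v_hat) + eps)


for lr in (0.001, 0.01, 0.1):
    small_gradient_step = adam_first_step(0.5, lr)
    large_gradient_step = adam_first_step(50.0, lr)
    print(lr, small_gradient_step, large_gradient_step)
```

Because Adam normalizes by the gradient magnitude, the first step is close to the learning rate itself whether the gradient is 0.5 or 50. A rate of 0.1 therefore moves each weight roughly 100 times further per update than 0.001, which is one plausible reason the Model B cost stalls instead of decreasing.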

*Execution time:* Execution time is driven mainly by the total number of nodes and, as expected, by the number of training epochs. Models with the same node counts but different learning rates ran in nearly the same time (136.7 versus 137.7 sec for Models A and B, 155.4 versus 155.5 sec for Models C and D). The largest differences are among Models F, G, and H, which share node counts and learning rate but train for 100, 200, and 50 epochs, taking 148.3, 297.6, and 74.9 sec respectively.
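
Dividing each run's wall-clock time by its epoch count makes the comparison more direct. The sketch below uses the timings printed by the `timing` decorator, rounded to two decimals. Note that each timed function also includes the final test evaluation, so these are approximate per-epoch costs.

```python
# (seconds, epochs) per model, taken from the timing output of each run
runs = {
    "A": (136.65, 100),
    "B": (137.73, 100),
    "C": (155.43, 100),
    "D": (155.47, 100),
    "E": (178.98, 100),
    "F": (148.34, 100),
    "G": (297.59, 200),
    "H": (74.94, 50),
}

seconds_per_epoch = {name: seconds / epochs
                     for name, (seconds, epochs) in runs.items()}
for name, value in seconds_per_epoch.items():
    print(name, round(value, 3))
```

Models F, G, and H come out at about 1.5 sec per epoch each, so their total times differ almost entirely because of the epoch count, while the per-epoch cost rises from Model A to C to E as the layers grow.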

*Loss*: Models B and E stand out. Model B, with the largest loss (1.83), combined the smallest layers (28 nodes each) with the highest learning rate (0.1). Model E had the most nodes (500 per layer) and a low learning rate (0.001), yet its loss (about 0.23) was higher than that of the 300-node Model C (0.103), which was trained with the same learning rate and epochs. This suggests there may be diminishing returns from adding nodes to the layers.


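Of the models tested, **Model H is the most efficient and is the recommended model**. Its accuracy (95.8%) is within about two percentage points of the best model (Model E, 97.4%), and it ran in 74.9 sec, roughly half the time of the next-fastest model (Model A, 136.7 sec).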
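
One rough way to put a number on "efficient" is test accuracy divided by execution time. This metric is an assumption made for illustration, not one the notebook defines. The sketch below ranks the eight models by it using the accuracies and timings printed above.

```python
# (test accuracy, execution time in seconds) from each model's output
results = {
    "A": (0.9497, 136.65),
    "B": (0.2094, 137.73),
    "C": (0.9682, 155.43),
    "D": (0.9697, 155.47),
    "E": (0.9740, 178.98),
    "F": (0.9649, 148.34),
    "G": (0.9657, 297.59),
    "H": (0.9579, 74.94),
}

# Accuracy per second is a crude, illustrative efficiency score.
efficiency = {name: acc / seconds for name, (acc, seconds) in results.items()}
ranked = sorted(efficiency, key=efficiency.get, reverse=True)
print(ranked)
```

Under this score Model H ranks first and Model B last. A different weighting of accuracy against time could change the order of the models in between.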

