# Module 3 - TensorFlow Mechanics 101


This is a step-by-step follow-along based on Google's TensorFlow tutorial.

[TensorFlow Mechanics 101](https://www.tensorflow.org/get_started/mnist/mechanics)

The tutorial walks through Python code that is meant to be run from the command line:

    python fully_connected_feed.py

But there are many Python and TensorFlow programming concepts which I want to explore in greater detail. So I am re-doing this section in Jupyter Notebook.

Section 1 deals primarily with Python's argparse module, which is important for writing user-friendly Python scripts run from the command line.

This section takes apart the command-line code provided by Google and rebuilds it from the ground up. A lot of attention is paid to getting TensorBoard to work.


In [None]:
"""Functions for downloading and reading MNIST data."""

# These three imports make Python 2 behave like Python 3 for absolute imports, true division, and print().
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# six is a package that helps in writing code that is compatible with both Python 2 and Python 3.
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin

import gzip
import os
import sys
import time
import math
import tempfile
import argparse
import numpy
import tensorflow as tf

# The mnist read_data_sets() function will be used in fully_connected_feed.py to download the MNIST dataset
# to your local training folder and then unpack it into a collection of DataSet instances (train, validation, test).
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

## SECTION 2 - Code Organization

The command-line code provided by Google is organized as follows:


| File | What it does |
|----|:----:|
|input_data.py  | import statements for downloading and reading MNIST data |
|mnist.py | The code that builds the computation graph for a fully-connected MNIST model |
|fully_connected_feed.py |The main code to train the built MNIST model against the downloaded dataset using a feed dictionary |


## input_data.py

This file is mostly import statements. The first half deals with Python version compatibility. The second half imports libraries such as numpy and tensorflow, plus the read_data_sets function.

In [None]:
"""Functions for downloading and reading MNIST data."""

# These three imports make Python 2 behave like Python 3 for absolute imports, true division, and print().
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# six is a package that helps in writing code that is compatible with both Python 2 and Python 3.
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin

import gzip
import os
import tempfile
import numpy
import tensorflow as tf

# The mnist read_data_sets() function will be used in fully_connected_feed.py to download the MNIST dataset
# to your local training folder and then unpack it into a collection of DataSet instances (train, validation, test).
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets

## mnist.py and fully_connected_feed.py

The computation graph is built from the **mnist.py** file according to a 3-stage pattern: inference(), loss(), and training().

    inference() - Builds the graph as far as is required for running the network forward to make predictions.
    loss() - Adds to the inference graph the ops required to generate loss.
    training() - Adds to the loss graph the ops required to compute and apply gradients.

**fully_connected_feed.py** trains the built MNIST model against the downloaded dataset using a feed dictionary. It is written to be run from the command line:

```sh
$ python fully_connected_feed.py

$ python3 fully_connected_feed.py --help
usage: fully_connected_feed.py [-h] [--learning_rate LEARNING_RATE]
                               [--max_steps MAX_STEPS] [--hidden1 HIDDEN1]
                               [--hidden2 HIDDEN2] [--batch_size BATCH_SIZE]
                               [--input_data_dir INPUT_DATA_DIR]
                               [--log_dir LOG_DIR] [--fake_data]

optional arguments:
  -h, --help            show this help message and exit
  --learning_rate LEARNING_RATE
                        Initial learning rate.
  --max_steps MAX_STEPS
                        Number of steps to run trainer.
  --hidden1 HIDDEN1     Number of units in hidden layer 1.
  --hidden2 HIDDEN2     Number of units in hidden layer 2.
  --batch_size BATCH_SIZE
                        Batch size. Must divide evenly into the dataset sizes.
  --input_data_dir INPUT_DATA_DIR
                        Directory to put the input data.
  --log_dir LOG_DIR     Directory to put the log data.
  --fake_data           If true, uses fake data for unit testing.
```

## fully_connected_feed.py

We first work with fully_connected_feed.py.

First, the import statements:

In [None]:
import argparse
import os.path
import sys
import time
import math
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

### (1) Populate Model Parameters

Here we use argparse to populate model parameters from command-line arguments.

In [None]:
# Basic model parameters as external flags.
FLAGS = None

# When prog is not given, ArgumentParser takes the program name from sys.argv[0]; parse_args() reads sys.argv[1:] by default.
parser = argparse.ArgumentParser()
    
parser.add_argument(
      '--learning_rate',
      type=float,
      default=0.01,
      help='Initial learning rate.'
)

parser.add_argument(
      '--max_steps',
      type=int,
      default=10000,
      help='Number of steps to run trainer.'
)
    
parser.add_argument(
      '--hidden1',
      type=int,
      default=128,
      help='Number of units in hidden layer 1.'
)

parser.add_argument(
      '--hidden2',
      type=int,
      default=32,
      help='Number of units in hidden layer 2.'
)
    
parser.add_argument(
      '--batch_size',
      type=int,
      default=100,
      help='Batch size.  Must divide evenly into the dataset sizes.'
)

parser.add_argument(
      '--input_data_dir',
      type=str,
      default='/tmp/tensorflow/mnist/input_data',
      help='Directory to put the input data.'
)
    
parser.add_argument(
      '--log_dir',
      type=str,
      default='/home/lukeliem/TensorFlow/logs/fully_connected_feed',
      help='Directory to put the log data.'
)

parser.add_argument(
      '--fake_data',
      default=False,
      help='If true, uses fake data for unit testing.',
      action='store_true'
)



In [None]:
# Sometimes a script parses only a few of the command-line arguments and passes the rest on to another
# script or program. parse_known_args() returns a two-item tuple: the populated namespace (stored in FLAGS) and
# the list of remaining argument strings. An explicit argument list is passed here instead of relying on sys.argv.
FLAGS, unparsed = parser.parse_known_args(['--max_steps','10000', '--learning_rate','0.001'])

# FLAGS is the Namespace which stores all the parameters
print (FLAGS)
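
To make the behavior of `parse_known_args()` concrete, here is a minimal standard-library sketch. The parser is a cut-down stand-in for the one above, and `--notebook_only_flag` is a made-up argument used only to show where unrecognized arguments end up.

```python
import argparse

# A cut-down parser with two of the flags defined above.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument('--learning_rate', type=float, default=0.01)
demo_parser.add_argument('--fake_data', default=False, action='store_true')

# Passing an explicit list keeps argparse from reading sys.argv.
# '--notebook_only_flag' is hypothetical: the parser does not know it.
demo_flags, demo_unparsed = demo_parser.parse_known_args(
    ['--learning_rate', '0.001', '--notebook_only_flag', '42'])

print(demo_flags.learning_rate)  # 0.001
print(demo_flags.fake_data)      # False (a store_true flag that was not passed)
print(demo_unparsed)             # ['--notebook_only_flag', '42']
```

Using `parse_args()` instead would exit with an error on the unknown flag, which is why the tolerant variant is convenient when experimenting in a notebook.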

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already
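
The same delete-then-recreate pattern can be sketched with the standard library alone. This is only an illustration for local paths: `reset_log_dir` is a hypothetical helper, not part of TensorFlow, and it runs against a throwaway temporary directory so that no real logs are touched.

```python
import os
import shutil
import tempfile

def reset_log_dir(log_dir):
    """Delete log_dir if it exists, then recreate it empty."""
    if os.path.exists(log_dir):
        shutil.rmtree(log_dir)
    os.makedirs(log_dir)

# Work inside a temporary directory (assumed layout, for demonstration only).
base = tempfile.mkdtemp()
demo_log_dir = os.path.join(base, 'logs', 'fully_connected_feed')

reset_log_dir(demo_log_dir)                                   # first call creates it
open(os.path.join(demo_log_dir, 'stale.tfevents'), 'w').close()
reset_log_dir(demo_log_dir)                                   # second call wipes the stale file

remaining = os.listdir(demo_log_dir)
print(remaining)        # []
shutil.rmtree(base)     # clean up the temporary directory
```

Starting each run from an empty directory keeps TensorBoard from mixing event files written by earlier versions of the graph.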


In [None]:
# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# InteractiveSession class allows you to interleave operations which build a computation graph with ones that
# run the graph. This is particularly convenient when working in interactive contexts like IPython.
# Otherwise, we should build the entire computation graph before starting a session and launching the graph.
sess = tf.InteractiveSession()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size))

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

### (2) Output Graph --> TensorBoard

The commands below output the computation graph to TensorBoard.

```sh
tensorboard --logdir='./logs/fully_connected_feed'
```

Then open http://localhost:6006/ in a browser.

![2 placeholders](images/Screenshot from 2017-06-15 15-27-06.png)

In [None]:
# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

summary_writer.close()  # Always remember to close the summary writer

### (3) Build the Inference Engine

The inference engine performs inferences (predictions). It takes the images placeholder as input and builds on top of it a pair of fully connected layers (hidden1 and hidden2) with ReLU activation, followed by a ten-node linear layer (softmax_linear) that outputs the logits.

    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    logits = tf.matmul(hidden2, weights) + biases
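
To follow how the shapes change through the three layers, here is a plain-Python sketch of the same forward pass. It assumes the default layer sizes from the flags (128 and 32 hidden units), uses a batch of 2 instead of 100 to stay fast, and fills the weights with arbitrary small random numbers. `dense` and `random_layer` are hypothetical helpers, not TensorFlow functions.

```python
import random

IMAGE_PIXELS, HIDDEN1, HIDDEN2, NUM_CLASSES = 784, 128, 32, 10

def dense(inputs, weights, biases, relu):
    """Compute inputs @ weights + biases, optionally followed by ReLU."""
    outputs = []
    for row in inputs:
        out_row = []
        for j, bias in enumerate(biases):
            total = sum(x * w_row[j] for x, w_row in zip(row, weights)) + bias
            out_row.append(max(total, 0.0) if relu else total)
        outputs.append(out_row)
    return outputs

def random_layer(rng, n_in, n_out):
    """Arbitrary small random weights and zero biases (illustration only)."""
    weights = [[rng.gauss(0.0, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

rng = random.Random(0)
images = [[rng.random() for _ in range(IMAGE_PIXELS)] for _ in range(2)]  # batch of 2

w1, b1 = random_layer(rng, IMAGE_PIXELS, HIDDEN1)
w2, b2 = random_layer(rng, HIDDEN1, HIDDEN2)
w3, b3 = random_layer(rng, HIDDEN2, NUM_CLASSES)

hidden1 = dense(images, w1, b1, relu=True)
hidden2 = dense(hidden1, w2, b2, relu=True)
logits = dense(hidden2, w3, b3, relu=False)   # no activation on the output layer

print(len(hidden1), len(hidden1[0]))  # 2 128
print(len(hidden2), len(hidden2[0]))  # 2 32
print(len(logits), len(logits[0]))    # 2 10
```

Each layer keeps the batch dimension and replaces the feature dimension with its own unit count, which is why the weight matrices are shaped (inputs, outputs).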

TensorBoard graph after adding the first hidden layer:

![2 placeholders + 1 hidden layer ](images/Screenshot from 2017-06-15 17-23-58.png)


In [None]:
import argparse
import os.path
import sys
import time
import math
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# InteractiveSession class allows you to interleave operations which build a computation graph with ones that
# run the graph. This is particularly convenient when working in interactive contexts like IPython.
# Otherwise, we should build the entire computation graph before starting a session and launching the graph.
# Reset the default graph BEFORE creating the session: a session binds to the graph that is current when it is
# created, so resetting afterwards would leave sess.graph pointing at the old graph.
tf.reset_default_graph()

sess = tf.InteractiveSession()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size))

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

# Hidden 1
with tf.name_scope('hidden1'):
    # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
    weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
        stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
    # Likewise, the unique name given to the biases variable would be "hidden1/biases".
    biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)
    
# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

summary_writer.close()  # Always remember to close the summary writer    
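
The weight initialization above draws from a truncated normal whose standard deviation shrinks with the number of inputs. The sketch below imitates that in plain Python, assuming the documented behavior of tf.truncated_normal (values more than two standard deviations from the mean are dropped and re-drawn). The `truncated_normal` function here is a hypothetical stand-in, not the TensorFlow op.

```python
import math
import random

def truncated_normal(rng, stddev):
    """Draw from N(0, stddev^2), re-drawing anything beyond two standard deviations."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value

IMAGE_PIXELS = 784                                   # 28 * 28 pixels per MNIST image
stddev = 1.0 / math.sqrt(float(IMAGE_PIXELS))        # same formula as the hidden1 weights

rng = random.Random(0)
samples = [truncated_normal(rng, stddev) for _ in range(10000)]

print(round(stddev, 4))                              # 0.0357
print(max(abs(s) for s in samples) <= 2 * stddev)    # True
```

Scaling by the square root of the fan-in keeps the initial pre-activations at a similar magnitude regardless of how many inputs feed each unit.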

TensorBoard graph after adding the inference engine:

![Inference Engine](images/Screenshot from 2017-06-15 17-44-00.png)

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# InteractiveSession class allows you to interleave operations which build a computation graph with ones that
# run the graph. This is particularly convenient when working in interactive contexts like IPython.
# Otherwise, we should build the entire computation graph before starting a session and launching the graph.
# Reset the default graph BEFORE creating the session: a session binds to the graph that is current when it is
# created, so resetting afterwards would leave sess.graph pointing at the old graph.
tf.reset_default_graph()

sess = tf.InteractiveSession()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size))

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

# Hidden 1
with tf.name_scope('hidden1'):
    # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
    weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
        stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
    # Likewise, the unique name given to the biases variable would be "hidden1/biases".
    biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

# Hidden 2
with tf.name_scope('hidden2'):
    # "hidden2/weights"
    weights = tf.Variable(
        tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
        name='weights')
    # "hidden2/biases"
    biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

# Linear
with tf.name_scope('softmax_linear'):
    # "softmax_linear/weights"    
    weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
    # "softmax_linear/biases" 
    biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases
   

# Add the variable initializer Op (it is only added to the graph here, not run).
init = tf.global_variables_initializer()

# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

summary_writer.close()  # Always remember to close the summary writer  
sess.close()

### (4) Build the Loss Function

Here we add the loss function to the inference engine.

![Inference + Loss](images/Screenshot from 2017-06-19 14-20-42.png)

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# InteractiveSession class allows you to interleave operations which build a computation graph with ones that
# run the graph. This is particularly convenient when working in interactive contexts like IPython.
# Otherwise, we should build the entire computation graph before starting a session and launching the graph.
# Reset the default graph BEFORE creating the session: a session binds to the graph that is current when it is
# created, so resetting afterwards would leave sess.graph pointing at the old graph.
tf.reset_default_graph()

sess = tf.InteractiveSession()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size))

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

'''
The Inference Engine
'''

# Hidden 1
with tf.name_scope('hidden1'):
    # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
    weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
        stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
    # Likewise, the unique name given to the biases variable would be "hidden1/biases".
    biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

# Hidden 2
with tf.name_scope('hidden2'):
    # "hidden2/weights"
    weights = tf.Variable(
        tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
        name='weights')
    # "hidden2/biases"
    biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

# Linear
with tf.name_scope('softmax_linear'):
    # "softmax_linear/weights"    
    weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
    # "softmax_linear/biases" 
    biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases

'''
The Loss Function
'''

labels = tf.to_int64(labels_placeholder) #typecasting in int64
    
# This op takes the integer class labels directly (no explicit 1-hot encoding is needed) and compares them
# against the logits from the inference engine
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

# This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')



# Add the variable initializer Op (it is only added to the graph here, not run).
init = tf.global_variables_initializer()

# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
summary_writer.flush()
summary_writer.close()  # Always remember to close the summary writer 
sess.close()
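
To see what the loss computes numerically, here is a small plain-Python sketch of sparse softmax cross entropy followed by a batch mean. The logits and labels are made-up numbers, and `sparse_softmax_cross_entropy` is a hypothetical helper that follows the textbook definition rather than TensorFlow's internal implementation.

```python
import math

def sparse_softmax_cross_entropy(logits_row, label):
    """Cross entropy between softmax(logits_row) and an integer class label."""
    shift = max(logits_row)                      # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits_row))
    return log_sum_exp - logits_row[label]       # equals -log(softmax(logits_row)[label])

batch_logits = [
    [0.0] * 10,                                           # uniform scores over 10 classes
    [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # confident in class 0
]
batch_labels = [3, 0]

per_example = [sparse_softmax_cross_entropy(z, y)
               for z, y in zip(batch_logits, batch_labels)]
mean_loss = sum(per_example) / len(per_example)  # the role of tf.reduce_mean above

print(round(per_example[0], 4))          # 2.3026, i.e. ln(10)
print(per_example[1] < per_example[0])   # True
```

A network that has learned nothing scores every class equally and pays ln(10) per example, so a loss well below 2.3 is an early sign that training is working.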

### (5) Build the Training Op

Here we add the training op using the Adam optimizer.

![Inference + Loss + Train](images/Screenshot from 2017-06-19 14-52-37.png)

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# InteractiveSession class allows you to interleave operations which build a computation graph with ones that
# run the graph. This is particularly convenient when working in interactive contexts like IPython.
# Otherwise, we should build the entire computation graph before starting a session and launching the graph.
# Reset the default graph BEFORE creating the session: a session binds to the graph that is current when it is
# created, so resetting afterwards would leave sess.graph pointing at the old graph.
tf.reset_default_graph()

sess = tf.InteractiveSession()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size))

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

'''
The Inference Engine
'''

# Hidden 1
with tf.name_scope('hidden1'):
    # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
    weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
        stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
    # Likewise, the unique name given to the biases variable would be "hidden1/biases".
    biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

# Hidden 2
with tf.name_scope('hidden2'):
    # "hidden2/weights"
    weights = tf.Variable(
        tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
        name='weights')
    # "hidden2/biases"
    biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

# Linear
with tf.name_scope('softmax_linear'):
    # "softmax_linear/weights"    
    weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
    # "softmax_linear/biases" 
    biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases

'''
The Loss Function
'''

labels = tf.to_int64(labels_placeholder) #typecasting in int64
    
# This op takes the integer class labels directly (no explicit 1-hot encoding is needed) and compares them
# against the logits from the inference engine
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

# This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

'''
The Training Op
'''
# tf.summary.scalar is an op for generating summary values into the events file when used with a 
# tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every time the
# summaries are written out.
tf.summary.scalar('loss', loss)

# Create the Adam optimizer with the given learning rate.
optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
# Create a variable to track the global step.
global_step = tf.Variable(0, name='global_step', trainable=False)
# Use the optimizer to apply the gradients that minimize the loss
# (and also increment the global step counter) as a single training step.
train_op = optimizer.minimize(loss, global_step=global_step)


# Add the variable initializer Op (it is only added to the graph here, not run).
init = tf.global_variables_initializer()

# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

summary_writer.close()  # Always remember to close the summary writer 
sess.close()
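
For intuition about what one training step does, here is a scalar sketch of the Adam update in its textbook form. It assumes the commonly cited defaults (beta1=0.9, beta2=0.999, epsilon=1e-8) and a toy loss f(x) = x^2. TensorFlow's own arithmetic may differ in detail, and `adam_minimize` is a hypothetical helper.

```python
import math

def adam_minimize(grad_fn, x, learning_rate, steps,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Run `steps` Adam updates on a single scalar parameter x."""
    m, v, global_step = 0.0, 0.0, 0
    for _ in range(steps):
        global_step += 1                           # plays the role of the global_step counter
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** global_step)   # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** global_step)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x, global_step

# Toy loss f(x) = x^2 with gradient 2x, starting from x = 1.0.
x_one, _ = adam_minimize(lambda x: 2.0 * x, 1.0, learning_rate=0.001, steps=1)
x_many, step_count = adam_minimize(lambda x: 2.0 * x, 1.0, learning_rate=0.001, steps=100)

print(round(x_one, 6))   # 0.999
print(step_count)        # 100
print(x_many < x_one)    # True
```

The first step moves the parameter by almost exactly the learning rate regardless of the gradient's size, which is one reason Adam is less sensitive to the scale of the loss than plain gradient descent.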

### (6) Evaluation

Add the graph for the evaluation op, which evaluates the quality of the logits at predicting the label. Also define a name_scope for the various ops and layers so that the graph is more readable.

![Inference + Loss + Train + Eval](images/Screenshot from 2017-06-19 16-46-45.png)

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# InteractiveSession class allows you to interleave operations which build a computation graph with ones that
# run the graph. This is particularly convenient when working in interactive contexts like IPython.
# Otherwise, we should build the entire computation graph before starting a session and launching the graph.
# Reset the default graph BEFORE creating the session: a session binds to the graph that is current when it is
# created, so resetting afterwards would leave sess.graph pointing at the old graph.
tf.reset_default_graph()

sess = tf.InteractiveSession()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size),name = 'truth_label')

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

'''
The Inference Engine
'''

# Hidden 1
with tf.name_scope('hidden1'):
    # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
    weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
        stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
    # Likewise, the unique name given to the biases variable would be "hidden1/biases".
    biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

# Hidden 2
with tf.name_scope('hidden2'):
    # "hidden2/weights"
    weights = tf.Variable(
        tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
        name='weights')
    # "hidden2/biases"
    biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

# Linear
with tf.name_scope('softmax_linear'):
    # "softmax_linear/weights"    
    weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
    # "softmax_linear/biases" 
    biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases

'''
The Loss Function
'''

with tf.name_scope('softmax'):
    labels = tf.to_int64(labels_placeholder) #typecasting in int64
    
    # This op takes the integer class labels directly (no explicit 1-hot encoding is needed) and compares them
    # against the logits from the inference engine
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

    # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
    loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

'''
The Training Op
'''

with tf.name_scope('adam_optimizer'):

    # tf.summary.scalar is an op for generating summary values into the events file when used with a 
    # tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every time the
    # summaries are written out.
    tf.summary.scalar('loss', loss)

    # Create the Adam optimizer with the given learning rate.
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    # Create a variable to track the global step.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Use the optimizer to apply the gradients that minimize the loss
    # (and also increment the global step counter) as a single training step.
    train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

'''
The Evaluation Op
'''

with tf.name_scope('eval'):
    # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
    # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
    correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
    # Return the number of true entries.
    eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')

# Add the variable initializer Op (it is only added to the graph here, not run).
init = tf.global_variables_initializer()

# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

summary_writer.flush()
summary_writer.close()  # Always remember to close the summary writer 
sess.close()
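
The evaluation op boils down to counting how many examples have their true label as the highest-scoring class. A plain-Python sketch with made-up logits (three classes instead of ten, and no tied scores) shows the idea. `count_correct_top1` is a hypothetical helper mirroring in_top_k with k=1 followed by reduce_sum.

```python
def count_correct_top1(batch_logits, batch_labels):
    """Count examples whose true label has the highest logit (k = 1)."""
    correct = 0
    for logits_row, label in zip(batch_logits, batch_labels):
        predicted = max(range(len(logits_row)), key=lambda i: logits_row[i])
        correct += int(predicted == label)
    return correct

batch_logits = [
    [0.1, 2.5, 0.3],   # highest score: class 1
    [1.7, 0.2, 0.4],   # highest score: class 0
    [0.0, 0.9, 3.1],   # highest score: class 2
    [0.6, 0.5, 0.1],   # highest score: class 0
]
batch_labels = [1, 0, 1, 0]

eval_correct = count_correct_top1(batch_logits, batch_labels)
precision = eval_correct / float(len(batch_labels))

print(eval_correct)  # 3
print(precision)     # 0.75
```

Dividing the count by the number of examples evaluated gives the precision figure that a training script would typically report.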

### (7) The Session

Once the graph has been completed and all of the necessary ops generated, a tf.Session is created for running the graph.

Note that we stop using tf.InteractiveSession().

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # Delete everything in the log directory if it already exists

tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# Remove all nodes from default graph
tf.reset_default_graph()

images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size),name = 'truth_label')

# verify the dimension of the placeholders
print (images_placeholder.get_shape())
print (labels_placeholder.get_shape())

'''
The Inference Engine
'''

# Hidden 1
with tf.name_scope('hidden1'):
    # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
    weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
        stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
    # Likewise, the unique name given to the biases variable would be "hidden1/biases".
    biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

# Hidden 2
with tf.name_scope('hidden2'):
    # "hidden2/weights"
    weights = tf.Variable(
        tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
        name='weights')
    # "hidden2/biases"
    biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

# Linear
with tf.name_scope('softmax_linear'):
    # "softmax_linear/weights"    
    weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
    # "softmax_linear/biases" 
    biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases

'''
The Loss Function
'''

with tf.name_scope('softmax'):
    labels = tf.to_int64(labels_placeholder) #typecasting in int64
    
    # This op takes the integer class labels directly (no explicit 1-hot encoding is needed) and compares them
    # against the logits from the inference engine
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

    # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
    loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

'''
The Training Op
'''

with tf.name_scope('adam_optimizer'):

    # tf.summary.scalar is an op for generating summary values into the events file when used with a 
    # tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every time the
    # summaries are written out.
    tf.summary.scalar('loss', loss)

    # Create the Adam optimizer with the given learning rate.
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    # Create a variable to track the global step.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Use the optimizer to apply the gradients that minimize the loss
    # (and also increment the global step counter) as a single training step.
    train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

'''
The Evaluation Op
'''

with tf.name_scope('eval'):
    # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
    # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
    correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
    # Return the number of true entries.
    eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')


# Add the variable initializer Op.
init = tf.global_variables_initializer()  

# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all() 

'''
The Session
'''
# Create a session for running Ops on the Graph.
sess = tf.Session()

# Run the Op to initialize the variables.
sess.run(init)

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

summary_writer.close()  # Always remember to close the summary writer 

### (8) The Train Loop

We stop using tf.InteractiveSession and use the regular tf.Session() instead.

There appears to be a bug in TensorFlow that makes the output of summary data and graphs to TensorBoard very unreliable. The workaround is as follows:

* To get a correct graph representation, stop TensorBoard and Jupyter, delete the TensorFlow log directory, restart Jupyter, run the script, and then restart TensorBoard.

[Imanol Schlag's Blog on Using TensorBoard](http://ischlag.github.io/2016/06/04/how-to-use-tensorboard/)
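
The "delete the log directory" step of this workaround can also be scripted outside of TensorFlow. The following is a minimal sketch using only the Python standard library; `reset_log_dir` is a hypothetical helper name chosen for illustration, and the demo runs against a throwaway temporary directory rather than a real TensorBoard log directory.

```python
import os
import shutil
import tempfile


def reset_log_dir(log_dir):
    """Delete log_dir (if it exists) and recreate it empty."""
    if os.path.isdir(log_dir):
        shutil.rmtree(log_dir)  # remove stale event files from earlier runs
    os.makedirs(log_dir)
    return log_dir


# Demonstrate on a throwaway directory so that no real logs are touched.
demo_dir = os.path.join(tempfile.mkdtemp(), 'logs')
os.makedirs(demo_dir)
with open(os.path.join(demo_dir, 'events.out.tfevents.stale'), 'w') as f:
    f.write('old run')

reset_log_dir(demo_dir)
remaining_files = os.listdir(demo_dir)
print(remaining_files)  # → []
```

Starting every run from an empty directory means TensorBoard only ever sees the event file of the most recent graph.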

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # If log directory exists, delete everything in it
tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# Remove all nodes from default graph
tf.reset_default_graph()

with tf.Graph().as_default():

    images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
    labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size),name = 'truth_label')

    # verify the dimension of the placeholders
    print (images_placeholder.get_shape())
    print (labels_placeholder.get_shape())

    '''
    The Inference Engine
    '''

    # Hidden 1
    with tf.name_scope('hidden1'):
        # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
        weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
            stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
        # Likewise, the unique name given to the biases variable would be "hidden1/biases".
        biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

    # Hidden 2
    with tf.name_scope('hidden2'):
        # "hidden2/weights"
        weights = tf.Variable(
            tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
            name='weights')
        # "hidden2/biases"
        biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    # Linear
    with tf.name_scope('softmax_linear'):
        # "softmax_linear/weights"    
        weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
        # "softmax_linear/biases" 
        biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    '''
    The Loss Function
    '''
    with tf.name_scope('softmax'):
        labels = tf.to_int64(labels_placeholder)  # cast the labels to int64
    
        # This op treats the integer labels as 1-hot targets and compares them against the logits from the
        # inference engine.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

        # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
        loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

    '''
    The Training Op
    '''
    with tf.name_scope('ADAM'):

        # tf.summary.scalar is an op for generating summary values into the events file when used with a 
        # tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every time the
        # summaries are written out.
        tf.summary.scalar('loss', loss)

        # Create the Adam optimizer with the given learning rate.
        optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
        # Create a variable to track the global step.
        global_step = tf.Variable(0, name='global_step', trainable=False)
        # Use the optimizer to apply the gradients that minimize the loss
        # (and also increment the global step counter) as a single training step.
        train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

    '''
    The Evaluation Op
    '''
    with tf.name_scope('eval'):
        # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
        # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
        correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
        # Return the number of true entries.
        eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')


    # Add the variable initializer Op.
    init = tf.global_variables_initializer()  

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all() 

    '''
    The Session
    '''
    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
    
    # Run the Op to initialize the variables.
    sess.run(init)

    '''
    The Train Loop
    '''
    for step in xrange(FLAGS.max_steps):

        # Create the feed_dict for the placeholders filled with the next
        # `batch size` examples.
        images_feed, labels_feed = data_sets.train.next_batch(FLAGS.batch_size, FLAGS.fake_data)
        feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
        }

        # Run one step of the model.  The return values are the activations
        # from the `train_op` (which is discarded) and the `loss` Op.  
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)


        # Print an overview every 100 steps
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.2f' % (step, loss_value))

    summary_writer.flush()  # Flush any pending events to disk before closing
    summary_writer.close()  # Always remember to close the summary writer 

### (9) Using tf.summary

The following tf.summary calls are used to write the summaries and the graph to TensorBoard:

```python
# Graph building: tf.summary.scalar is an op for generating summary values into the events file when
# used with a tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every
# time the summaries are written out.
tf.summary.scalar('loss', loss)

# Graph building: all the summaries (in this case, only tf.summary.scalar('loss', loss)) are collected
# into a single Tensor.
summary = tf.summary.merge_all()

# After the session is created, a tf.summary.FileWriter is instantiated to write the events files,
# which contain both the graph and the values of the summaries.
summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

# Inside the train loop (every 100 steps): the events file is updated with new summary values every
# time the summary is evaluated and the output is passed to the writer's add_summary() function.
summary_str = sess.run(summary, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
summary_writer.flush()

# After the train loop: always remember to close the summary writer.
summary_writer.close()
```

![TensorBoard Scalar](images/Screenshot from 2017-06-20 16-17-33.png)
        

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # If log directory exists, delete everything in it
tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# Remove all nodes from default graph
tf.reset_default_graph()

with tf.Graph().as_default():

    images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
    labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size),name = 'truth_label')

    # verify the dimension of the placeholders
    print (images_placeholder.get_shape())
    print (labels_placeholder.get_shape())

    '''
    The Inference Engine
    '''

    # Hidden 1
    with tf.name_scope('hidden1'):
        # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
        weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
            stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
        # Likewise, the unique name given to the biases variable would be "hidden1/biases".
        biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

    # Hidden 2
    with tf.name_scope('hidden2'):
        # "hidden2/weights"
        weights = tf.Variable(
            tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
            name='weights')
        # "hidden2/biases"
        biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    # Linear
    with tf.name_scope('softmax_linear'):
        # "softmax_linear/weights"    
        weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
        # "softmax_linear/biases" 
        biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    '''
    The Loss Function
    '''
    with tf.name_scope('softmax'):
        labels = tf.to_int64(labels_placeholder)  # cast the labels to int64
    
        # This op treats the integer labels as 1-hot targets and compares them against the logits from the
        # inference engine.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

        # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
        loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

    '''
    The Training Op
    '''
    with tf.name_scope('ADAM'):

        # tf.summary.scalar is an op for generating summary values into the events file when used with a 
        # tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every time the
        # summaries are written out.
        tf.summary.scalar('loss', loss)

        # Create the Adam optimizer with the given learning rate.
        optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
        # Create a variable to track the global step.
        global_step = tf.Variable(0, name='global_step', trainable=False)
        # Use the optimizer to apply the gradients that minimize the loss
        # (and also increment the global step counter) as a single training step.
        train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

    '''
    The Evaluation Op
    '''
    with tf.name_scope('eval'):
        # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
        # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
        correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
        # Return the number of true entries.
        eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')


    # Add the variable initializer Op.
    init = tf.global_variables_initializer()  

    # All the summaries (in this case, only tf.summary.scalar('loss', loss)) are collected into a single Tensor 
    # during the graph building phase.
    summary = tf.summary.merge_all() 

    '''
    The Session
    '''
    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # After the session is created, a tf.summary.FileWriter is instantiated to write the events files, which 
    # contain both the graph and the values of the summaries. 
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
    
    # Run the Op to initialize the variables.
    sess.run(init)

    '''
    The Train Loop
    '''
    for step in xrange(FLAGS.max_steps):

        # Create the feed_dict for the placeholders filled with the next
        # `batch size` examples.
        images_feed, labels_feed = data_sets.train.next_batch(FLAGS.batch_size, FLAGS.fake_data)
        feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
        }

        # Run one step of the model.  The return values are the activations
        # from the `train_op` (which is discarded) and the `loss` Op.  
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)


        # Print an overview every 100 steps
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.2f' % (step, loss_value))
            # The events file is updated with new summary values every time the summary is evaluated 
            # and the output passed to the writer's add_summary() function.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()
            
    summary_writer.close()  # Always remember to close the summary writer 

### (10) Evaluation

Every thousand steps (and again at the final step), the code evaluates the model against the training, validation, and test datasets. This is performed in the do_eval() function.

Inside the function, there is a loop for filling a feed_dict and calling sess.run() against the eval_correct op to evaluate the model on the given dataset (training, validation, or test).

The eval_correct op sums the output of a tf.nn.in_top_k op, which automatically scores each model output as correct if the true label can be found among the K most likely predictions (here K=1).
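
The bookkeeping described above can be sketched without TensorFlow. The example below is an illustration only: `in_top_k` and `evaluate` are hypothetical pure-Python stand-ins (not the TensorFlow ops), the logits are made-up numbers, and resolving ties in the label's favour is an assumption of this sketch.

```python
def in_top_k(logits, label, k=1):
    """Return True if `label` is among the k largest logits.

    Only strictly larger logits push the label down, so ties count as correct.
    """
    target = logits[label]
    num_larger = sum(1 for value in logits if value > target)
    return num_larger < k


def evaluate(logits_batches, labels_batches, dataset_size, batch_size):
    """Mimic the do_eval() bookkeeping on precomputed logits."""
    steps_per_epoch = dataset_size // batch_size  # leftover examples are skipped
    num_examples = steps_per_epoch * batch_size
    true_count = 0
    for step in range(steps_per_epoch):
        for logits, label in zip(logits_batches[step], labels_batches[step]):
            true_count += int(in_top_k(logits, label, k=1))
    precision = float(true_count) / num_examples
    return num_examples, true_count, precision


# Two batches of two examples each, three classes per example (made-up values).
logits_batches = [
    [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]],
    [[0.0, 0.1, 0.9], [0.4, 0.3, 0.2]],
]
labels_batches = [[1, 0], [2, 1]]

# A dataset of 5 examples with batch_size=2 yields 2 full batches; 1 example is dropped.
result = evaluate(logits_batches, labels_batches, dataset_size=5, batch_size=2)
print(result)  # → (4, 3, 0.75)
```

Note that the floor division means the reported number of examples can be smaller than the dataset size whenever the dataset size is not a multiple of the batch size.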

In [None]:
def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.

  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
  steps_per_epoch = data_set.num_examples // FLAGS.batch_size
  num_examples = steps_per_epoch * FLAGS.batch_size
  for step in xrange(steps_per_epoch):
    images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size, FLAGS.fake_data)
    feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
    }    
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
  precision = float(true_count) / num_examples
  print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
        (num_examples, true_count, precision))
    

# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # If log directory exists, delete everything in it
tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# Remove all nodes from default graph
tf.reset_default_graph()

with tf.Graph().as_default():

    images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
    labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size),name = 'truth_label')

    # verify the dimension of the placeholders
    print (images_placeholder.get_shape())
    print (labels_placeholder.get_shape())

    '''
    The Inference Engine
    '''

    # Hidden 1
    with tf.name_scope('hidden1'):
        # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
        weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
            stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
        # Likewise, the unique name given to the biases variable would be "hidden1/biases".
        biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

    # Hidden 2
    with tf.name_scope('hidden2'):
        # "hidden2/weights"
        weights = tf.Variable(
            tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
            name='weights')
        # "hidden2/biases"
        biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    # Linear
    with tf.name_scope('softmax_linear'):
        # "softmax_linear/weights"    
        weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
        # "softmax_linear/biases" 
        biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    '''
    The Loss Function
    '''
    with tf.name_scope('softmax'):
        labels = tf.to_int64(labels_placeholder)  # cast the labels to int64
    
        # This op treats the integer labels as 1-hot targets and compares them against the logits from the
        # inference engine.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

        # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
        loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

    '''
    The Training Op
    '''
    with tf.name_scope('ADAM'):

        # tf.summary.scalar is an op for generating summary values into the events file when used with a 
        # tf.summary.FileWriter. In this case, it will emit the snapshot value of the loss every time the
        # summaries are written out.
        tf.summary.scalar('loss', loss)

        # Create the Adam optimizer with the given learning rate.
        optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
        # Create a variable to track the global step.
        global_step = tf.Variable(0, name='global_step', trainable=False)
        # Use the optimizer to apply the gradients that minimize the loss
        # (and also increment the global step counter) as a single training step.
        train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

    '''
    The Evaluation Op
    '''
    with tf.name_scope('eval'):
        # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
        # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
        correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
        # Return the number of true entries.
        eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')


    # Add the variable initializer Op.
    init = tf.global_variables_initializer()  

    # All the summaries (in this case, only tf.summary.scalar('loss', loss)) are collected into a single Tensor 
    # during the graph building phase.
    summary = tf.summary.merge_all() 

    '''
    The Session
    '''
    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # After the session is created, a tf.summary.FileWriter is instantiated to write the events files, which 
    # contain both the graph and the values of the summaries. 
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
    
    # Run the Op to initialize the variables.
    sess.run(init)

    '''
    The Train Loop
    '''
    for step in xrange(FLAGS.max_steps):

        # Create the feed_dict for the placeholders filled with the next
        # `batch size` examples.
        images_feed, labels_feed = data_sets.train.next_batch(FLAGS.batch_size, FLAGS.fake_data)
        feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
        }

        # Run one step of the model.  The return values are the activations
        # from the `train_op` (which is discarded) and the `loss` Op.  
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)


        # Print an overview every 100 steps
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.2f' % (step, loss_value))
            # The events file is updated with new summary values every time the summary is evaluated 
            # and the output passed to the writer's add_summary() function.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()
            
        # Evaluate the model periodically.
        if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:

            # Evaluate against the training set.
            print('Training Data Eval:')
            do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.train)
            # Evaluate against the validation set.
            print('Validation Data Eval:')
            do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.validation)
            # Evaluate against the test set.
            print('Test Data Eval:')
            do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.test)

    summary_writer.close()  # Always remember to close the summary writer 

### (11) Add 2nd Scalar Summary "Accuracy"

In addition to "loss", we now add a second scalar summary, "accuracy", to track the fraction of correct predictions in the current batch. It is written out every 100 steps.

![Graph](images/Screenshot from 2017-06-21 14-04-22.png)

![TensorBoard Scalar](images/Screenshot from 2017-06-20 17-49-49.png)
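
As a rough illustration of what the per-batch accuracy measures, here is a minimal pure-Python sketch. The helper names `argmax` and `batch_accuracy` and the logits are made up for this example; the notebook itself computes the same quantity with TensorFlow ops.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(logits_batch, labels):
    """Fraction of examples whose highest logit matches the true label."""
    correct = [1.0 if argmax(logits) == label else 0.0
               for logits, label in zip(logits_batch, labels)]
    return sum(correct) / len(correct)


# One batch of four examples with three classes each (made-up values).
logits_batch = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.4, 0.3, 0.2],
]
labels = [1, 0, 2, 1]

accuracy = batch_accuracy(logits_batch, labels)
print(accuracy)  # → 0.75
```

Because the value is computed on a single batch, the curve in TensorBoard is noisier than an accuracy measured over a whole dataset.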

In [None]:
# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # If log directory exists, delete everything in it
tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# Remove all nodes from default graph
tf.reset_default_graph()

with tf.Graph().as_default():

    images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
    labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size),name = 'truth_label')

    # verify the dimension of the placeholders
    print (images_placeholder.get_shape())
    print (labels_placeholder.get_shape())

    '''
    The Inference Engine
    '''

    # Hidden 1
    with tf.name_scope('hidden1'):
        # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
        weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
            stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
        # Likewise, the unique name given to the biases variable would be "hidden1/biases".
        biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

    # Hidden 2
    with tf.name_scope('hidden2'):
        # "hidden2/weights"
        weights = tf.Variable(
            tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
            name='weights')
        # "hidden2/biases"
        biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    # Linear
    with tf.name_scope('softmax_linear'):
        # "softmax_linear/weights"    
        weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
        # "softmax_linear/biases" 
        biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    '''
    The Loss Function
    '''
    with tf.name_scope('softmax'):
        labels = tf.to_int64(labels_placeholder)  # cast the labels to int64
    
        # This op treats the integer labels as 1-hot targets and compares them against the logits from the
        # inference engine.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

        # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
        loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

    '''
    The Training Op
    '''
    with tf.name_scope('ADAM'):
      

        # Create the Adam optimizer with the given learning rate.
        optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
        # Create a variable to track the global step.
        global_step = tf.Variable(0, name='global_step', trainable=False)
        # Use the optimizer to apply the gradients that minimize the loss
        # (and also increment the global step counter) as a single training step.
        train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

    '''
    The Evaluation Op
    '''
    with tf.name_scope('eval'):
        # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
        # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
        correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
        # Return the number of true entries.
        eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')

    '''
    The Accuracy Op
    '''
    with tf.name_scope('Accuracy'):
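        # Fraction of examples in the batch whose highest logit matches the true label.
        # `labels` already holds int64 class indices, so it can be compared with tf.argmax directly.
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))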
        

    # tf.summary.scalar is an op for generating summary values into the events file when used with a 
    # tf.summary.FileWriter. In this case, it will emit the snapshot values of the loss and the accuracy
    # every time the summaries are written out.
    
    tf.summary.scalar('loss', loss)
    tf.summary.scalar('accuracy', accuracy)
    
    # Add the variable initializer Op.
    init = tf.global_variables_initializer()  

    # All the summaries (in this case, the 'loss' and 'accuracy' scalars) are collected into a single Tensor 
    # during the graph building phase.
    summary = tf.summary.merge_all() 

    '''
    The Session
    '''
    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # After the session is created, a tf.summary.FileWriter is instantiated to write the events files, which 
    # contain both the graph and the values of the summaries. 
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
    
    # Run the Op to initialize the variables.
    sess.run(init)

    '''
    The Train Loop
    '''
    for step in xrange(FLAGS.max_steps):

        # Create the feed_dict for the placeholders filled with the next
        # `batch size` examples.
        images_feed, labels_feed = data_sets.train.next_batch(FLAGS.batch_size, FLAGS.fake_data)
        feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
        }

        # Run one step of the model.  The return values are the result of
        # the `train_op` (which is discarded) and the values of the `loss` and `accuracy` Ops.
        _, loss_value, accuracy_value = sess.run([train_op, loss, accuracy], feed_dict=feed_dict)


        # Print an overview every 100 steps
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.3f accuracy = %.3f' % (step, loss_value, accuracy_value))
            # The events file is updated with new summary values every time the summary is evaluated 
            # and the output passed to the writer's add_summary() function.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()

    summary_writer.close()  # Always remember to close the summary writer 

### (12) Just for FUN!!!

The Accuracy and Evaluation Ops do more or less the same thing.

Since the entire model is structured to process batches of data from the datasets, it makes sense to add one extra operation within the Evaluation Op to get the accuracy measurement:

```python
prediction = tf.div(tf.cast(eval_correct, tf.float32), FLAGS.batch_size)
```

That way, we can remove the Accuracy Op altogether!!!
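
A quick way to convince yourself that the two formulations agree is to compare them on a handful of made-up per-example results. This is a pure-Python sketch with hypothetical values, not the TensorFlow graph itself.

```python
# Per-example results for one batch: True means the prediction was correct.
correct = [True, False, True, True, False]
batch_size = len(correct)

# Evaluation Op style: count the correct predictions, then divide by the batch size.
eval_correct = sum(int(c) for c in correct)
accuracy_from_count = float(eval_correct) / batch_size

# Accuracy Op style: cast each result to float and take the mean.
accuracy_from_mean = sum(float(c) for c in correct) / batch_size

print(eval_correct, accuracy_from_count, accuracy_from_mean)  # → 3 0.6 0.6
```

The identity only holds when every batch has exactly `batch_size` examples, which is the case here because the placeholders have a fixed batch dimension.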

![TensorBoard Graph](images/Screenshot from 2017-06-21 15-27-43.png)


In [None]:
def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.

  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
  steps_per_epoch = data_set.num_examples // FLAGS.batch_size
  num_examples = steps_per_epoch * FLAGS.batch_size
  for step in xrange(steps_per_epoch):
    images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size, FLAGS.fake_data)
    feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
    }    
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
  precision = float(true_count) / num_examples
  print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
        (num_examples, true_count, precision))
    

# Deal with the Log file - important for TensorBoard
if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)  # If log directory exists, delete everything in it
tf.gfile.MakeDirs(FLAGS.log_dir)  # Create the directory if it does not exist already

# Get the sets of images and labels for training, validation, and test on MNIST.
data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

# Remove all nodes from default graph
tf.reset_default_graph()

with tf.Graph().as_default():

    images_placeholder = tf.placeholder(tf.float32, shape=(FLAGS.batch_size, mnist.IMAGE_PIXELS), name = 'images')
    labels_placeholder = tf.placeholder(tf.int32, shape=(FLAGS.batch_size,), name = 'truth_label')

    # verify the dimension of the placeholders
    print (images_placeholder.get_shape())
    print (labels_placeholder.get_shape())

    '''
    The Inference Engine
    '''

    # Hidden 1
    with tf.name_scope('hidden1'):
        # Created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".
        weights = tf.Variable(tf.truncated_normal([mnist.IMAGE_PIXELS, FLAGS.hidden1],
            stddev=1.0 / math.sqrt(float(mnist.IMAGE_PIXELS))), name='weights')
        # Likewise, the unique name given to the biases variable would be "hidden1/biases".
        biases = tf.Variable(tf.zeros([FLAGS.hidden1]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images_placeholder, weights) + biases)

    # Hidden 2
    with tf.name_scope('hidden2'):
        # "hidden2/weights"
        weights = tf.Variable(
            tf.truncated_normal([FLAGS.hidden1, FLAGS.hidden2],stddev=1.0 / math.sqrt(float(FLAGS.hidden1))),
            name='weights')
        # "hidden2/biases"
        biases = tf.Variable(tf.zeros([FLAGS.hidden2]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    # Linear
    with tf.name_scope('softmax_linear'):
        # "softmax_linear/weights"    
        weights = tf.Variable(tf.truncated_normal([FLAGS.hidden2, mnist.NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(FLAGS.hidden2))), name='weights')
        # "softmax_linear/biases" 
        biases = tf.Variable(tf.zeros([mnist.NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    '''
    The Loss Function
    '''
    with tf.name_scope('softmax'):
        labels = tf.to_int64(labels_placeholder)  # cast the labels to int64

        # This op treats the integer labels from the labels_placeholder as if they were 1-hot encoded
        # and compares them against the logits from the inference engine.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

        # This op averages the cross entropy values across the batch dimension (the first dimension) as the total loss.
        loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

    '''
    The Training Op
    '''
    with tf.name_scope('ADAM'):
      

        # Create the Adam optimizer with the given learning rate.
        optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
        # Create a variable to track the global step.
        global_step = tf.Variable(0, name='global_step', trainable=False)
        # Use the optimizer to apply the gradients that minimize the loss
        # (and also increment the global step counter) as a single training step.
        train_op = optimizer.minimize(loss, global_step=global_step, name='minimize')

    '''
    The Evaluation Op
    '''
    with tf.name_scope('eval'):
        # For a classifier model, we can use the in_top_k Op. It returns a bool tensor with shape [batch_size] 
        # that is true for the examples where the label is in the top k (here k=1) of all logits for that example.
        correct = tf.nn.in_top_k(logits, labels, 1, name = 'top_k')
        # Return the number of true entries.
        eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32), name = 'reduce_sum')
        prediction = tf.div(tf.cast(eval_correct, tf.float32), FLAGS.batch_size)

    # tf.summary.scalar is an op for generating summary values into the events file when used with a 
    # tf.summary.FileWriter. In this case, the snapshot values of the loss and the batch accuracy
    # ('prediction') are emitted every time the summaries are written out.
    
    tf.summary.scalar('loss', loss)
    tf.summary.scalar('prediction', prediction)
    
    # Add the variable initializer Op.
    init = tf.global_variables_initializer()  

    # All the summaries (in this case, the 'loss' and 'prediction' scalars) are collected into a single Tensor 
    # during the graph building phase.
    summary = tf.summary.merge_all() 

    '''
    The Session
    '''
    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # After the session is created, a tf.summary.FileWriter is instantiated to write the events files, which 
    # contain both the graph and the values of the summaries. 
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
    
    # Run the Op to initialize the variables.
    sess.run(init)

    '''
    The Train Loop
    '''
    for step in xrange(FLAGS.max_steps):

        # Create the feed_dict for the placeholders filled with the next
        # `batch size` examples.
        images_feed, labels_feed = data_sets.train.next_batch(FLAGS.batch_size, FLAGS.fake_data)
        feed_dict = {
          images_placeholder: images_feed,
          labels_placeholder: labels_feed,
        }

        # Run one step of the model.  The return values are the activations from the `train_op`
        # (which is discarded), the `loss` Op, and the `prediction` Op.
        _, loss_value, prediction_value = sess.run([train_op, loss, prediction], feed_dict=feed_dict)


        # Print an overview every 100 steps
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.3f accuracy = %.3f' % (step, loss_value, prediction_value))
            # The events file is updated with new summary values every time the summary is evaluated 
            # and the output passed to the writer's add_summary() function.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()
            
        # Evaluate the model every 1000 steps and at the final step
        if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:

            # Evaluate against the training set.
            print('Training Data Eval:')
            do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.train)
            # Evaluate against the validation set.
            print('Validation Data Eval:')
            do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.validation)
            # Evaluate against the test set.
            print('Test Data Eval:')
            do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.test)

    summary_writer.close()  # Always remember to close the summary writer 

### (13) Parameter Optimization - Learning Rate

The learning rate for the Adam optimizer is a key parameter that we can tune:

| lr | Test Data Precision |
|----|:----:|
| 0.05 | 0.94 |
| 0.01 | 0.974 |
| 0.005 | 0.976 |
| 0.002 | 0.978 |
| 0.001 | 0.977 |
| 0.0005 | 0.977 |
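
One way such a sweep could be driven is sketched below. The parser here is a hypothetical stand-in for the notebook's own `parser` (the flag names match the ones used in this notebook, but the default values are assumptions), and the training run itself is left as a placeholder comment.

```python
import argparse

# Hypothetical stand-in for the notebook's parser; the defaults are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.01)
parser.add_argument('--max_steps', type=int, default=2000)

# The learning rates reported above.
candidate_rates = [0.05, 0.01, 0.005, 0.002, 0.001, 0.0005]

seen_rates = []
for lr in candidate_rates:
    # Arguments are passed as strings, just as they would arrive from the command line.
    FLAGS, unparsed = parser.parse_known_args(
        ['--max_steps', '10000', '--learning_rate', str(lr)])
    seen_rates.append(FLAGS.learning_rate)
    print(FLAGS.learning_rate, FLAGS.max_steps, unparsed)
    # ... rebuild the graph and rerun the training loop with this FLAGS here ...
```

Each pass through the loop yields a fresh `FLAGS` namespace, so the graph-building and training code can stay unchanged between runs.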


In [None]:
# Sometimes a script may only parse a few of the command-line arguments, passing the remaining arguments on to another 
# script or program. parse_known_args() returns a two-item tuple containing the populated namespace (stored in FLAGS)
# and the list of remaining argument strings (stored in unparsed).
FLAGS, unparsed = parser.parse_known_args(['--max_steps','10000', '--learning_rate','0.05'])

# FLAGS is the Namespace which stores all the parameters
print (FLAGS)