In [None]:
import input_data
import tensorflow as tf
import matplotlib.pyplot as plt

In [None]:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

In [None]:
learning_rate = 0.001
training_epochs = 20
batch_size = 100
display_step = 1
n_hidden_1 = 256
n_hidden_2 = 256
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10

<h2>Build the model</h2>
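The input layer is the x tensor. Each row holds one flattened 28x28 image, so a single image has shape [1x784]; the leading None dimension lets the placeholder accept a batch of any size: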

In [None]:
x = tf.placeholder("float", [None, n_input])

The y tensor holds the one-hot target labels, so its second dimension is equal to the number of classes:

In [None]:
y = tf.placeholder("float", [None, n_classes])
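
Because the data set is loaded with one_hot=True, each label fed into y is a vector of 10 values with a single 1 at the position of the digit. The following is a minimal sketch of that encoding in plain Python; the helper names one_hot and decode are hypothetical and are not part of TensorFlow or input_data:

```python
def one_hot(label, n_classes=10):
    """Return a list of n_classes floats with 1.0 at index `label`."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be in [0, n_classes)")
    vec = [0.0] * n_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    """Recover the class index as the position of the largest entry."""
    return max(range(len(vec)), key=lambda i: vec[i])


encoded = one_hot(3)
print(encoded)          # the vector that would be fed to y for the digit 3
print(decode(encoded))  # back to the class index
```

Decoding with the position of the largest entry is the same operation that tf.argmax performs on the labels in the test step.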

In the middle, we have two hidden layers. The first layer is defined by the weights tensor h, whose shape is [784x256], where 256 is the number of nodes in the layer:

In [None]:
h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))

Layer 1 also needs its own biases tensor:

In [None]:
bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))

Each neuron receives the pixels of the input image to be classified, weighted by the h_ij connections, and adds the corresponding value of the biases tensor. The sigmoid activation is then applied to the result:

In [None]:
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x,h),bias_layer_1))
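
To make the computation behind this line concrete, here is a rough sketch of the same operation, sigmoid(x·h + b), for a single input in plain Python. The function names and the toy sizes (3 inputs, 2 neurons instead of 784 and 256) are assumptions chosen only for illustration:

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense_sigmoid(inputs, weights, biases):
    """Compute sigmoid(inputs . weights + biases) for one example.

    inputs:  list of n_in values
    weights: n_in x n_out nested list (weights[i][j] connects input i to neuron j)
    biases:  list of n_out values
    """
    outputs = []
    for j in range(len(biases)):
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        outputs.append(sigmoid(z))
    return outputs


# Toy example: 3 "pixels", 2 hidden neurons.
toy_x = [1.0, 0.0, 0.5]
toy_h = [[0.2, -0.4], [0.7, 0.1], [-0.6, 0.8]]
toy_b = [0.1, 0.0]
print(dense_sigmoid(toy_x, toy_h, toy_b))
```

In this toy case the weighted sums of both neurons are (up to rounding) zero, so both activations come out very close to 0.5, the midpoint of the sigmoid.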

The second hidden layer is defined by the weights tensor w, whose shape is [256x256]:

In [None]:
w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))

With the tensor of biases:

In [None]:
bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))

Each neuron in this second layer receives inputs from the neurons of layer 1, weighted by the w_ij connections and added to the respective biases of layer 2:

In [None]:
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1,w),bias_layer_2))

It sends its output to the next layer, namely the output layer:

In [None]:
output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_output = tf.Variable(tf.random_normal([n_classes]))
output_layer = tf.matmul(layer_2, output) + bias_output
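
With all three weight and bias tensors defined, it can be useful to check how many trainable values the network holds. The short calculation below uses the layer sizes set at the top of the notebook; dense_params is a hypothetical helper name:

```python
n_input, n_hidden_1, n_hidden_2, n_classes = 784, 256, 256, 10


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out


layer_sizes = [
    (n_input, n_hidden_1),     # h and bias_layer_1
    (n_hidden_1, n_hidden_2),  # w and bias_layer_2
    (n_hidden_2, n_classes),   # output and bias_output
]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
total_params = sum(per_layer)
print(per_layer, total_params)  # [200960, 65792, 2570] 269322
```

Most of the parameters sit in the first layer, because it connects every one of the 784 input pixels to each of the 256 hidden nodes.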

The output layer receives as input the 256 activations coming from layer 2 and converts them into one unnormalized score (logit) per class. Applying softmax to these scores gives the probability of each digit.
As in logistic regression, we then define the cost function:

In [None]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output_layer, labels=y))

The TensorFlow function tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training. The logits are the unnormalized log probabilities output by the model (the values produced before the softmax normalization is applied to them).
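
As an illustration of what this cost measures, the sketch below computes softmax cross-entropy from logits in plain Python and averages it over a batch, mirroring the role of tf.reduce_mean. It is not TensorFlow's implementation; the function names are assumptions, and the log-sum-exp shift is one common way to keep the computation numerically stable:

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, shifted by the max for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def mean_cost(batch_logits, batch_labels):
    """Average the per-example losses over the batch."""
    losses = [softmax_cross_entropy(l, t)
              for l, t in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


# Uniform logits over 4 classes give a loss of log(4) whatever the label is.
print(softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], [0, 1, 0, 0]))
print(mean_cost([[2.0, 0.0], [0.0, 2.0]], [[1, 0], [1, 0]]))
```

The loss is small when the logit of the true class is much larger than the others and grows as the model assigns more probability to wrong classes.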

The corresponding optimizer that minimizes the cost function is:

In [None]:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

tf.train.AdamOptimizer uses Kingma and Ba's Adam algorithm to control the learning rate. Adam keeps moving averages of the gradient and of its square for every parameter, which gives it several advantages over the simple tf.train.GradientDescentOptimizer. In particular, it can use a larger effective step size, and the algorithm will converge to this step size without fine tuning.

A simple tf.train.GradientDescentOptimizer could equally be used in your MLP, but it would require more hyperparameter tuning before it could converge as quickly.
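
To give an idea of how Adam sets the step size, here is a rough sketch of one update for a single scalar parameter, following the update rule described by Kingma and Ba. The function adam_step and the toy objective are assumptions made for illustration, and TensorFlow's own implementation may differ in details such as where epsilon is applied:

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter.

    m and v are the running averages of the gradient and squared gradient;
    t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (hypothetical): f(p) = (p - 3)^2, with gradient 2 * (p - 3).
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (p - 3.0)
    p, m, v = adam_step(p, grad, m, v, t, lr=0.01)
print(p)  # ends close to the minimum at 3
```

Note that the very first update moves the parameter by roughly lr regardless of how large the gradient is, because the gradient is divided by the square root of its own square. This scale independence is one reason Adam tends to need less tuning of the learning rate.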

<h2>Launch the session</h2>
The following are the steps to launch the session: 
1. Define the lists that will collect the values to plot:

In [None]:
avg_set = []
epoch_set=[]

2. Initialize the variables:

In [None]:
init = tf.global_variables_initializer()

3. Launch the graph:

In [None]:
with tf.Session() as sess:
    sess.run(init)
    #4. Define the training cycle:
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        # 5. Loop over all the batches (each one holds batch_size = 100 examples):
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # 6. Fit training using the batch data:
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # 7. Compute the average loss:
            avg_cost += sess.run(cost,feed_dict={x: batch_xs, y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
            avg_set.append(avg_cost)
            epoch_set.append(epoch+1)
    print("Training phase finished")
    # 8. With these lines of code, we plot the training phase:
    plt.plot(epoch_set,avg_set, 'o', label='MLP Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # 9. Finally, we can test the MLP model:
    correct_prediction = tf.equal(tf.argmax(output_layer, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Model Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
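
The accuracy computed in the last step compares, for every test image, the index of the largest logit with the index of the 1 in the one-hot label, and then averages the matches. A minimal plain-Python sketch of the same logic follows; the helper names and the toy data are hypothetical:

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_logits, batch_one_hot_labels):
    """Fraction of examples whose predicted class matches the label."""
    correct = [
        1.0 if argmax(logits) == argmax(label) else 0.0
        for logits, label in zip(batch_logits, batch_one_hot_labels)
    ]
    return sum(correct) / len(correct)


# Toy batch with 3 classes: the first and third predictions are right.
toy_logits = [[2.0, -1.0, 0.5], [0.1, 0.3, 0.2], [-3.0, -2.0, -2.5]]
toy_labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(accuracy(toy_logits, toy_labels))
```

Since softmax preserves the ordering of its inputs, taking the argmax of the raw logits gives the same prediction as taking the argmax of the probabilities, which is why no softmax is applied to output_layer before the comparison.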