First of all, we have to import all the necessary libraries:

In [None]:
import input_data
import tensorflow as tf
import matplotlib.pyplot as plt

We use the input_data.read_data_sets function introduced in Chapter 3, Starting with Machine Learning, in the MNIST dataset section, to load the images for our problem:

In [None]:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
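
The one_hot=True flag asks for each label as a 10-element vector with a single 1 at the index of the digit. A minimal plain-Python sketch of that encoding follows; the helper name one_hot is illustrative only and is not part of input_data:

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

# Hypothetical label for illustration: the digit 3.
encoded = one_hot(3)
print(encoded)
```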

Then we set the total number of epochs for the training phase:

In [None]:
training_epochs = 25

We also define the other parameters needed to train the model: the learning rate, the batch size, and how often (in epochs) the log is displayed:

In [None]:
learning_rate = 0.01
batch_size = 100
display_step = 1
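
To see how these values interact, the sketch below counts the parameter updates they imply. It assumes a training set of 55,000 images, which should be the size of the training split that read_data_sets returns by default; treat that number as an assumption if your copy of the data differs:

```python
training_epochs = 25
batch_size = 100
num_train_examples = 55000  # assumed size of the MNIST training split

# One epoch visits every batch once; each batch triggers one update.
batches_per_epoch = num_train_examples // batch_size
total_updates = training_epochs * batches_per_epoch
print(batches_per_epoch, total_updates)
```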

<h2>Building the model</h2>
Define x as the input tensor; each row represents an MNIST image of size 28 x 28 = 784 pixels:

In [None]:
x = tf.placeholder("float", [None, 784])
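
Each 28 x 28 image reaches the model as a single row of 784 values. The following sketch shows the same row-by-row flattening on a tiny 2 x 3 image; the pixel values are made up for illustration:

```python
# A hypothetical 2 x 3 grayscale image (rows of pixel intensities).
image = [
    [0.0, 0.5, 1.0],
    [0.2, 0.0, 0.9],
]

# Concatenate the rows one after another into a single flat list.
flat = [pixel for row in image for pixel in row]
print(len(flat), flat)
```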

The output of the model will be a tensor of 10 probabilities, one for each digit (the probabilities must, of course, sum to one). The placeholder y holds the corresponding target labels in the same 10-element format:

In [None]:
y = tf.placeholder("float", [None, 10])

To assign probabilities to each image, we will use the so-called softmax activation function. The softmax function is specified in two main steps:
1. Calculate the evidence that a certain image belongs to a particular class.
2. Convert the evidence into probabilities of belonging to each of the 10 possible classes.
To evaluate the evidence, we first define the weights input tensor as W:

In [None]:
W = tf.Variable(tf.zeros([784, 10]))

For a given image, we can evaluate the evidence for each class i by simply multiplying the tensor W with the input tensor x. Using TensorFlow, we should have something like the following:
       evidence = tf.matmul(x, W)
       
In general, models include an extra parameter, the bias, which adds an offset to the evidence that does not depend on the input. In our case, the final formula for the evidence is as follows:
       evidence = tf.matmul(x, W) + b
       
This means that for every class i (from 0 to 9) we have a vector Wi of 784 (28 x 28) weights; each element j of Wi is multiplied by the corresponding component j of the input image, the 784 products are summed, and the corresponding bias element bi is added.
So to compute the evidence, we must also define the following tensor of biases:

In [None]:
b = tf.Variable(tf.zeros([10]))
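
The evidence formula can be spelled out without TensorFlow. The sketch below uses plain Python lists instead of tensors and a toy problem with 3 "pixels" and 2 classes; all numeric values are hypothetical and chosen only to make the arithmetic easy to follow:

```python
def evidence(x, W, b):
    """Compute evidence_i = sum_j x[j] * W[j][i] + b[i] for every class i."""
    num_classes = len(b)
    return [
        sum(x[j] * W[j][i] for j in range(len(x))) + b[i]
        for i in range(num_classes)
    ]

# Hypothetical toy values: 3 inputs and 2 classes.
x = [1.0, 0.0, 2.0]
W = [[0.5, -1.0],
     [0.3, 0.8],
     [-0.25, 0.5]]
b = [0.1, -0.2]

scores = evidence(x, W, b)
print(scores)
```

Here W has one row per input component and one column per class, which mirrors the [784, 10] shape used in the model.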

The second step is to finally use the softmax function to obtain the output vector of probabilities, namely activation:

In [None]:
activation = tf.nn.softmax(tf.matmul(x, W) + b)
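
To make the second step concrete, here is a minimal plain-Python sketch of the softmax function. It is an illustration of the formula, not the implementation behind tf.nn.softmax, and the evidence values are hypothetical:

```python
import math

def softmax(evidence):
    """Turn a list of evidence scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shift = max(evidence)
    exps = [math.exp(e - shift) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical evidence for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)

# All-zero evidence, as produced by zero-initialized W and b.
uniform = softmax([0.0] * 10)
print(uniform)
```

Because W and b are initialized to zeros, every evidence score is 0 on the first forward pass, so the model starts by assigning the same probability, 0.1, to each of the 10 digits.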

In order to train our model and know when we have a good one, we must define how to measure its quality. Our goal is to find values of the parameters W and b that minimize a metric indicating how bad the model is.
Different metrics measure the degree of error between the desired outputs and the outputs computed on the training data. A common measure of error is the mean squared error, or squared Euclidean distance. However, some research findings suggest using other metrics for a neural network like this.
In this example, we use the so-called cross-entropy error function. Its per-class term is defined as:
       <b>cross_entropy = y*tf.log(activation)</b>
In code, this element-wise product is written as follows; the negative sum over the classes is applied when we build the cost function:

In [None]:
cross_entropy = y*tf.log(activation)

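To minimize the cross-entropy, we build the cost function by combining tf.reduce_sum, which sums the terms over the 10 classes of each image (the sign is flipped to obtain the cross-entropy), and tf.reduce_mean, which averages the result over the batch: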

In [None]:
cost = tf.reduce_mean(-tf.reduce_sum(cross_entropy, reduction_indices=1))
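
The same cost can be written in plain Python to show what the two reductions do. This is a sketch on a hypothetical batch of two examples with three classes, not the TensorFlow implementation:

```python
import math

def cross_entropy_cost(labels, activations):
    """Mean over examples of -sum_i y_i * log(a_i)."""
    # Inner sum: over the classes of one example (activations must be > 0).
    per_example = [
        -sum(y * math.log(a) for y, a in zip(y_row, a_row))
        for y_row, a_row in zip(labels, activations)
    ]
    # Outer mean: over the examples in the batch.
    return sum(per_example) / len(per_example)

# Hypothetical one-hot labels and predicted probabilities.
labels = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
activations = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1]]

cost = cross_entropy_cost(labels, activations)
print(cost)
```

Since the labels are one-hot, only the probability assigned to the correct class contributes to each example's term; the closer that probability is to 1, the smaller the cost.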

Then we must minimize it using the gradient descent optimization algorithm:

In [None]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
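
TensorFlow derives the gradients automatically, so the update rule never appears in the code. As an illustration only, the sketch below performs one hand-written gradient descent step for a single example, using the standard result that the gradient of the softmax cross-entropy with respect to the evidence is the activation minus the label. The toy sizes and values are hypothetical, and this is not what the optimizer literally executes:

```python
import math

def evidence(x, W, b):
    """evidence_i = sum_j x[j] * W[j][i] + b[i] for every class i."""
    return [sum(x[j] * W[j][i] for j in range(len(x))) + b[i]
            for i in range(len(b))]

def softmax(z):
    """Convert evidence scores into probabilities that sum to 1."""
    shift = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(x, y, W, b):
    """Cross-entropy of a single example under the softmax model."""
    a = softmax(evidence(x, W, b))
    return -sum(yi * math.log(ai) for yi, ai in zip(y, a))

def gradient_step(x, y, W, b, learning_rate):
    """Apply one gradient descent update to W and b in place."""
    a = softmax(evidence(x, W, b))
    # For softmax with cross-entropy: d(loss)/d(evidence_i) = a_i - y_i.
    delta = [ai - yi for ai, yi in zip(a, y)]
    for j in range(len(x)):
        for i in range(len(b)):
            W[j][i] -= learning_rate * x[j] * delta[i]
    for i in range(len(b)):
        b[i] -= learning_rate * delta[i]

# Hypothetical toy problem: 3 inputs, 2 classes, parameters start at zero.
x = [1.0, 0.0, 2.0]
y = [0.0, 1.0]
W = [[0.0, 0.0] for _ in x]
b = [0.0, 0.0]

loss_before = loss(x, y, W, b)
gradient_step(x, y, W, b, learning_rate=0.01)
loss_after = loss(x, y, W, b)
print(loss_before, loss_after)
```

With a small learning rate such as 0.01, the loss after the step is slightly lower than before; repeating the step over many batches is what the training loop does.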

<h2>Launch the session</h2>
It's time to build the session and launch our neural net model. We define the following lists to record the average cost of each epoch, which we plot once the training session is over:

In [None]:
avg_set = []
epoch_set = []

Then we initialize the TensorFlow variables:

In [None]:
init = tf.global_variables_initializer()

Start the session:

In [None]:
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost,feed_dict={x: batch_xs, y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1),"cost=", "{:.9f}".format(avg_cost))
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print("Training phase finished")
    plt.plot(epoch_set,avg_set, 'o',label='Logistic Regression Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal(tf.argmax(activation, 1),tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Model accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
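
The accuracy computation at the end of the session compares, for each test image, the index of the largest predicted probability with the index of the 1 in the one-hot label, and then averages the matches. A plain-Python sketch of the same idea, on hypothetical outputs for four images and three classes:

```python
def argmax(values):
    """Index of the largest element (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(activations, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    hits = [argmax(a) == argmax(y) for a, y in zip(activations, labels)]
    return sum(hits) / len(hits)

# Hypothetical predicted probabilities and one-hot labels.
activations = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4],
               [0.6, 0.3, 0.1]]
labels = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1],
          [0, 1, 0]]

result = accuracy(activations, labels)
print(result)
```

In this toy data the first three predictions match their labels and the last one does not.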

