## MNIST
MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:

![mnist digits](mnist_example_digits.png)

It also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.

### MNIST Model
The model takes the raw images as input and predicts which digit each one shows.

### What we do today
Our goal isn't to train an elaborate model that achieves state-of-the-art performance (you can find one on tensorflow.org), but rather to dip a toe into using TensorFlow. As such, we're going to start with a very simple model called softmax regression.

### MNIST data
The MNIST data is hosted on Yann LeCun's website. For your convenience, we've already included it in our tutorial. Each image is 28 pixels by 28 pixels, which we interpret as a flat array of 28x28 = 784 floating-point numbers. Below is an image of the digit 1:

![mnist digits](mnist_example_array.png)
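
To make the flattening concrete, here is a minimal NumPy sketch. The `image` array is a made-up stand-in for a real digit, not data from the tutorial, and the sketch assumes row-major order (NumPy's default).

```python
import numpy as np

# Hypothetical 28x28 grayscale image with intensities in [0, 1].
image = np.zeros((28, 28), dtype=np.float32)
image[4:24, 13:15] = 1.0  # a crude vertical stroke, roughly a "1"

# Flatten row by row into a single vector of 28 * 28 = 784 values.
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# Pixel (row, col) ends up at index row * 28 + col.
row, col = 4, 13
assert flat[row * 28 + col] == image[row, col]

# The flattening is reversible, so no pixel values are lost.
restored = flat.reshape(28, 28)
```

Flattening discards the explicit 2D layout, which is acceptable for a model as simple as softmax regression because it treats every pixel as an independent input feature.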

### Softmax regression
We want to produce a probability distribution over the ten digits for a given input, so that we can decide which digit it is. Softmax regression handles this kind of task well. The architecture is shown below (y is the final probability for each class):

![softmax regression](softmax_regression.png)

The math behind softmax:
![softmax regression](softmax_equation.png)
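
As a rough illustration of the standard softmax definition, where each score is exponentiated and then divided by the sum of all exponentiated scores, here is a small NumPy sketch. The function name `softmax` and the sample scores are illustrative only; subtracting the maximum is a common numerical-stability trick added here, not something the tutorial code does explicitly.

```python
import numpy as np

def softmax(z):
    """Map scores to non-negative values that sum to 1 along the last axis."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes
probs = softmax(scores)
print(probs)        # larger scores receive larger probabilities
print(probs.sum())  # 1, up to floating-point rounding
```

Because the exponential is always positive, every class receives a non-zero probability, and the ordering of the scores is preserved in the probabilities.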


In [1]:
import tensorflow as tf

# Import data
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('data/', one_hot=True)

sess = tf.InteractiveSession()

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
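
The `one_hot=True` argument above asks for each label as a 10-element vector with a 1 at the digit's index, rather than as a single integer. A minimal NumPy sketch of that encoding, using the four example labels mentioned earlier (the helper `to_one_hot` is hypothetical and not part of the tutorial's `input_data` module):

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Turn integer labels into rows with a single 1 at the label's index."""
    one_hot = np.zeros((len(labels), num_classes), dtype=np.float32)
    one_hot[np.arange(len(labels)), labels] = 1.0
    return one_hot

labels = np.array([5, 0, 4, 1])
encoded = to_one_hot(labels)
print(encoded[0])  # the 1 sits at index 5

# argmax recovers the original integer labels.
decoded = encoded.argmax(axis=1)
print(decoded)
```

This layout matches the shape of the model's output (one probability per digit), which is what makes the loss and accuracy computations later in the notebook line up.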


In [8]:
# Let's take a look at one of the images.
image_x, image_y = mnist.train.next_batch(1)
print(image_x[0])



In [7]:

# Create the input placeholder and the model variables (weights W, biases b) first
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))


In [8]:

# The whole model is just one line.
y = tf.nn.softmax(tf.matmul(x, W) + b)
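
To see what this one line computes, here is a rough NumPy equivalent of the forward pass. The batch of random inputs is made up; the zero-initialized `W` and `b` mirror the variables defined above. With all-zero parameters every class gets the same score, so the untrained model predicts 0.1 for each digit.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((3, 784)).astype(np.float32)  # three fake flattened images

W = np.zeros((784, 10), dtype=np.float32)
b = np.zeros(10, dtype=np.float32)

# (3, 784) @ (784, 10) -> (3, 10); b is broadcast across the rows.
logits = batch @ W + b

# Softmax along each row turns the scores into probabilities.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
y = exp / exp.sum(axis=1, keepdims=True)

print(y.shape)  # (3, 10)
print(y[0])     # every entry is 0.1 before any training
```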


In [10]:

# Define loss, which is what we want to optimize/minimize
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# Define the training operation, which is to minimize the loss
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
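
The cross-entropy expression above can be sketched in NumPy as follows. The predictions are made-up numbers, and the `eps` clipping is an assumption added here to avoid `log(0)`; the tutorial code takes `tf.log(y)` directly.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over examples of -sum(y_true * log(y_pred)) across classes."""
    y_pred = np.clip(y_pred, eps, 1.0)
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_example.mean()

# Two examples, three classes, one-hot targets.
y_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.05, 0.9],
                      [0.8, 0.1, 0.1]])
uniform = np.full((2, 3), 1.0 / 3.0)

loss_confident = cross_entropy(y_true, confident)
loss_uniform = cross_entropy(y_true, uniform)
print(loss_confident)  # small: the true classes get high probability
print(loss_uniform)    # ln(3): the model is just guessing
```

With one-hot targets only the probability assigned to the true class contributes to the loss, so minimizing it pushes that probability toward 1.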


In [12]:

# Define how to compute accuracy; this helps us track how training is going
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialize all the variables before training starts.
# (TensorFlow 0.12 and later deprecate this name in favor of tf.global_variables_initializer().)
tf.initialize_all_variables().run()
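
The accuracy computation defined in this cell compares the most probable predicted class with the true class for each example and averages the matches. A small NumPy sketch with made-up predictions shows the same steps:

```python
import numpy as np

# Made-up predicted probabilities for four examples and three classes.
y_pred = np.array([[0.1, 0.7, 0.2],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.5, 0.4, 0.1]])
# One-hot true labels for the same four examples.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# argmax picks the index of the largest entry in each row.
correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)

# Casting booleans to floats and averaging gives the fraction correct.
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # three of the four predictions match
```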

In [14]:
# Now we can start the training loop
for i in range(200):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # Every 50 steps, report accuracy on the current batch before updating on it
    if i % 50 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch_xs, y_: batch_ys})
        print("step %6d, training accuracy %.2f" % (i, train_accuracy))
    train_op.run(feed_dict={x: batch_xs, y_: batch_ys})
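
TensorFlow derives the gradients for `train_op` automatically. As a rough picture of what each update does, here is a NumPy sketch of plain gradient descent on softmax regression, using the standard closed-form gradient for softmax with cross-entropy. Everything in it is an assumption for illustration: the data is synthetic, the sizes are tiny, and it uses the full dataset at every step instead of mini-batches of 100. Only the learning rate of 0.5 and the 200 steps are taken from the tutorial.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stability shift
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(x, y_true, W, b):
    y = softmax(x @ W + b)
    return -np.mean(np.sum(y_true * np.log(y + 1e-12), axis=1))

rng = np.random.default_rng(0)
num_features, num_classes, n = 20, 3, 300  # toy sizes, not MNIST's 784 and 10

# Synthetic data: labels come from a random linear rule.
x = rng.normal(size=(n, num_features))
true_W = rng.normal(size=(num_features, num_classes))
labels = np.argmax(x @ true_W, axis=1)
y_true = np.eye(num_classes)[labels]  # one-hot targets

W = np.zeros((num_features, num_classes))
b = np.zeros(num_classes)
learning_rate = 0.5

initial_loss = loss(x, y_true, W, b)
for step in range(200):
    y = softmax(x @ W + b)
    # Gradient of the mean cross-entropy with respect to the logits.
    grad_logits = (y - y_true) / n
    W -= learning_rate * (x.T @ grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0)
final_loss = loss(x, y_true, W, b)

print(initial_loss)  # ln(3), since the zero-initialized model guesses uniformly
print(final_loss)    # lower after training
```

Mini-batches, as used in the loop above, trade a noisier gradient estimate for much cheaper steps, which matters once the dataset has tens of thousands of images.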



In [15]:

# Test trained model with a different set of images
print("test accuracy %.2f" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

test accuracy 0.91
