Our goals:
- Learn about the MNIST data and softmax regression
- Create a function that is a model for recognizing digits, based on looking at every pixel in the image
- Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples
- Check the model's accuracy with our test data

In [3]:
from tensorflow.examples.tutorials.mnist import input_data

In [4]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


- Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label.
- Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers: a vector of 28x28 = 784 numbers.

- mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]
- The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.
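
As a rough illustration of how a 28x28 image becomes a vector of 784 numbers, here is a minimal pure-Python sketch. The `image` below is a made-up stand-in, not real MNIST data, and the row-by-row (row-major) flattening order is an assumption; any order works as long as it is applied consistently to every image.

```python
# Hypothetical stand-in for one MNIST image: 28 rows of 28 intensities in [0, 1].
ROWS, COLS = 28, 28
image = [[0.0] * COLS for _ in range(ROWS)]
image[5][7] = 0.8  # one arbitrary "ink" pixel at row 5, column 7

# Flatten row by row into a single vector of 784 numbers.
flat = [pixel for row in image for pixel in row]

# In row-major order, pixel (row, col) lands at index row * COLS + col.
index = 5 * COLS + 7
print(len(flat), index, flat[index])
```

Stacking 55000 such vectors gives the [55000, 784] shape described above.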

- For the purposes of this tutorial, we're going to want our labels as "one-hot vectors". A one-hot vector is a vector which is 0 in most dimensions, and 1 in a single dimension. In this case, the nth digit will be represented as a vector which is 1 in the nth dimension. For example, 3 would be [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [55000, 10] array of floats.
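
The one-hot encoding described above can be sketched in a few lines of plain Python. The helper name `one_hot` is hypothetical and only for illustration; `read_data_sets(..., one_hot=True)` already does this conversion for us.

```python
def one_hot(digit, num_classes=10):
    """Return a list that is 1.0 at position `digit` and 0.0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in the range [0, num_classes)")
    vector = [0.0] * num_classes
    vector[digit] = 1.0
    return vector

label = one_hot(3)
print(label)
# The digit can be recovered as the position of the single 1.
print(label.index(1.0))
```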

- When you want to assign probabilities to an object being one of several different things, you should consider softmax regression.
- Softmax gives us a list of values between 0 and 1 that add up to 1.
- Recognizing digits is a classic case where a softmax regression is a natural, simple model.
- A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities.
###### Even when we train more sophisticated models, the final step will be a softmax layer.
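
To make the "evidence into probabilities" step concrete, here is a small sketch of the softmax function in plain Python. It is not TensorFlow's implementation, and the evidence values are made up for illustration.

```python
import math

def softmax(evidence):
    """Convert a list of evidence scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    shift = max(evidence)
    exps = [math.exp(e - shift) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up evidence for three classes; more evidence means higher probability.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

Exponentiating makes every value positive, and dividing by the total makes the values add up to 1.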

- To tally up the evidence that a given image is in a particular class (much as in a linear regression), we do a weighted sum of the pixel intensities.
- We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input.
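
The weighted sum plus bias can be sketched for a toy case with 4 "pixels" and 3 classes instead of 784 and 10. The names `toy_x`, `toy_W`, `toy_b` and all the numbers are invented for illustration; the indexing is meant to mirror what `tf.matmul(x, W) + b` computes for a single image.

```python
def evidence(x, W, b):
    """evidence[i] = sum over pixels j of x[j] * W[j][i], plus the bias b[i]."""
    return [
        sum(x[j] * W[j][i] for j in range(len(x))) + b[i]
        for i in range(len(b))
    ]

toy_x = [0.0, 1.0, 0.5, 0.0]          # pixel intensities of one tiny "image"
toy_W = [[0.1, 0.0, -0.2],            # one row per pixel, one column per class
         [0.4, -0.3, 0.0],
         [0.2, 0.6, -0.1],
         [0.0, 0.1, 0.3]]
toy_b = [0.0, 0.1, -0.1]              # per-class bias, independent of the input

print(evidence(toy_x, toy_W, toy_b))
```

A positive weight means that ink at that pixel counts in favor of the class; a negative weight counts against it.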

In [11]:
import tensorflow as tf

In [12]:
# None lets the first dimension (the number of images in a batch) be any length.
x = tf.placeholder(tf.float32, [None, 784])

In [14]:
# W holds one weight per (pixel, class) pair; b holds one bias per class.
W = tf.Variable(tf.zeros([784, 10]), dtype=tf.float32)
b = tf.Variable(tf.zeros([10]), dtype=tf.float32)
# tf.nn is TensorFlow's neural network module.
y = tf.nn.softmax(tf.matmul(x, W) + b)

In [16]:
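# Cross-entropy: y_ holds the true one-hot labels, y holds the predicted probabilities.
y_ = tf.placeholder(tf.float32, [None, 10])
# Sum over the 10 classes (axis 1), then average over the images in the batch.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))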
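
For intuition about what the cell above computes, here is a plain-Python sketch of the same quantity on made-up data: the mean over examples of `-sum(y_true * log(y_pred))`. One deliberate difference, labeled as such: this sketch skips classes whose true probability is 0, which sidesteps `log(0)`; the TensorFlow expression above has no such guard.

```python
import math

def cross_entropy(true_dists, predicted_dists):
    """Mean over examples of -sum_i y_true[i] * log(y_pred[i])."""
    losses = []
    for y_true, y_pred in zip(true_dists, predicted_dists):
        # Terms with y_true == 0 contribute nothing, so skip them (avoids log(0)).
        losses.append(-sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0))
    return sum(losses) / len(losses)

# Two made-up examples with three classes each.
labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
confident = [[0.05, 0.9, 0.05], [0.8, 0.1, 0.1]]   # mostly right
unsure = [[0.3, 0.4, 0.3], [0.4, 0.3, 0.3]]        # right, but hesitant

print(cross_entropy(labels, confident))
print(cross_entropy(labels, unsure))
```

The more probability the model puts on the correct class, the lower the cross-entropy, which is why minimizing it pushes the predictions toward the labels.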

In [19]:
# Gradient descent is a simple procedure, where TensorFlow simply shifts each variable a little bit in the direction that reduces the cost.
# TensorFlow also provides many other optimization algorithms: using one is as simple as tweaking one line.
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cross_entropy)
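
The idea of shifting a variable a little in the direction that reduces the cost can be sketched on a one-variable toy problem. This is not what TensorFlow runs internally; the cost function `(w - 3)^2`, the helper name `gradient_descent`, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly move w a small step against the gradient of the cost."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy cost: (w - 3)^2, whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0),
                           start=0.0, learning_rate=0.1, steps=100)
print(w_final)
```

In the notebook, `minimize(cross_entropy)` plays the role of this update, applied to every entry of `W` and `b` at once.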

In [20]:
sess = tf.InteractiveSession()

In [21]:
tf.global_variables_initializer().run()

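- All the variables are now initialized. We run 10,000 training steps; each step feeds the model a batch of 1,000 examples from the MNIST training set (10,000,000 examples in total, or about 182 passes over the 55,000 training images).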

In [23]:
for i in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(1000)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

In [25]:
# Evaluation of our model
# tf.argmax gives the index of the highest entry in a tensor along some axis, here the most likely digit for each image.
# tf.equal checks, element by element, whether each prediction matches the true label.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

In [28]:
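# Cast the booleans to floats (True -> 1.0, False -> 0.0); their mean is the fraction of correct predictions.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))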

In [30]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9265
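
The evaluation steps above (argmax, compare, average) can be sketched in plain Python on four made-up predictions. The helper names `argmax` and `accuracy` and the data are hypothetical; they only mirror the logic of `tf.argmax`, `tf.equal`, and `tf.reduce_mean`.

```python
def argmax(values):
    """Index of the largest entry (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of examples whose most likely class matches the one-hot label."""
    matches = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(matches) / len(matches)

# Four made-up predictions over three classes, with their one-hot labels.
preds = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

print(accuracy(preds, labels))  # 3 of the 4 predictions are correct -> 0.75
```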
