# TensorFlow - Deep MNIST for Experts

The first part of this tutorial explains what is happening in the mnist_softmax.py code, which is a basic implementation of a TensorFlow model.

The second part shows some ways to improve the accuracy.

## Goals for this Tutorial

1. Create a softmax regression function that serves as a model for recognizing MNIST digits by looking at every pixel in the image

2. Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples (and run our first TensorFlow session to do so)

3. Check the model's accuracy with our test data

4. Build, train, and test a multilayer convolutional neural network to improve the results

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## Start TensorFlow InteractiveSession

TensorFlow relies on a highly efficient C++ backend to do its computation. 

The connection to this backend is called a **session**.

The common usage for TensorFlow programs is to first **create a graph** and then **launch it in a session**.

Here we instead use the convenient **InteractiveSession class**, which makes TensorFlow more flexible about how you structure your code. 

It allows you to interleave operations which build a computation graph with ones that run the graph. 

This is particularly convenient when working in interactive contexts like **IPython**. 

If you are not using an InteractiveSession, then you should build the entire computation graph before starting a session and launching the graph.

In [2]:
import tensorflow as tf
sess = tf.InteractiveSession()

### Computation Graph

To do efficient numerical computing in Python, we typically use libraries like **NumPy** that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. 

**Unfortunately, there can still be a lot of overhead from switching back to Python every operation.** This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there can be a high cost to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. **Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch.**

The role of the Python code is therefore to **build this external computation graph** and to **dictate which parts of the computation graph should be run.** See the Computation Graph section of Basic Usage for more detail.
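
To make the build-then-run split concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the `Node` class, the `const` helper, and the `run` method are hypothetical names invented for this illustration. Building the expression only records operations, and nothing is computed until `run` is called.

```python
class Node:
    """A hypothetical graph node: it records an operation, it does not execute it."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add", or "mul"
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # only used by constant nodes

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def run(self):
        """Evaluate this node by recursively evaluating its inputs."""
        if self.op == "const":
            return self.value
        left, right = (node.run() for node in self.inputs)
        return left + right if self.op == "add" else left * right


def const(value):
    """Hypothetical helper that wraps a plain number in a constant node."""
    return Node("const", value=value)


# Step 1: build the graph. No arithmetic happens here.
graph = const(2.0) * const(3.0) + const(1.0)

# Step 2: ask for a specific node to be evaluated.
result = graph.run()
print(result)
```

In this sketch `graph` is only a description of the computation; the arithmetic happens inside `run`, which loosely plays the role that a session plays in TensorFlow.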

### Placeholders

In [3]:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

Placeholders are values we'll input when we ask TensorFlow to run a computation.

The **input images x** will consist of a 2d tensor of floating point numbers. Here we assign it a shape of [None, 784], where 784 is the dimensionality of a single flattened 28 by 28 pixel MNIST image, and None indicates that the first dimension, corresponding to the batch size, can be of any size. The **target classes y_** are likewise a 2d tensor of shape [None, 10], where each row is a one-hot vector with one entry per digit class.
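
The shapes involved can be checked with a small NumPy sketch. The random images and the digit labels below are made-up stand-ins, not MNIST data; the one-hot rows mirror what `one_hot=True` requests when the dataset is loaded.

```python
import numpy as np

# Assumption: a stand-in batch of 5 random 28x28 grayscale "images".
batch_size = 5
images = np.random.rand(batch_size, 28, 28).astype(np.float32)

# Flatten each image to a 784-dimensional row, matching shape [None, 784].
x_batch = images.reshape(batch_size, 28 * 28)

# Assumption: made-up digit labels, converted to one-hot rows of length 10.
digits = np.array([3, 0, 9, 1, 3])
y_batch = np.eye(10, dtype=np.float32)[digits]

print(x_batch.shape)
print(y_batch.shape)
```

Arrays shaped like `x_batch` and `y_batch` are what would be fed to the placeholders; the batch size of 5 is arbitrary because the first dimension is None.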

### Variables

In [4]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

We pass the initial value for each parameter in the call to tf.Variable. In this case, we initialize both W and b as tensors full of zeros. W is a 784x10 matrix (because we have 784 input features and 10 outputs) and b is a 10-dimensional vector (because we have 10 classes).

Before Variables can be used within a session, they must be **initialized** using that session. **This step takes the initial values (in this case tensors full of zeros) that have already been specified, and assigns them to each Variable. This can be done for all Variables at once:**

In [5]:
sess.run(tf.initialize_all_variables())

### Predicted Class and Loss Function

The regression model takes one line: we multiply the vectorized input images x by the weight matrix W and add the bias b.

In [6]:
y = tf.matmul(x,W) + b
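
As a rough illustration of the shapes in this expression, the same computation can be written in NumPy. The names `x_batch`, `W_np`, and `b_np` are stand-ins chosen for this sketch, and the input batch is random data rather than MNIST images.

```python
import numpy as np

rng = np.random.RandomState(0)
x_batch = rng.rand(4, 784).astype(np.float32)  # 4 flattened stand-in images
W_np = np.zeros((784, 10), dtype=np.float32)   # zeros, like the initial W
b_np = np.zeros(10, dtype=np.float32)          # zeros, like the initial b

# One row of 10 unnormalized class scores (logits) per image.
# b_np is broadcast across the batch dimension.
logits = x_batch.dot(W_np) + b_np

print(logits.shape)
```

With all-zero parameters every class receives the same score, so the untrained model has no preference among the ten digits.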

We can specify a loss function just as easily. **Loss indicates how bad the model's prediction was on a single example; we try to minimize that while training across all the examples.** 

Here, our loss function is the **cross-entropy between the target and the softmax activation function applied to the model's prediction.** As in the beginners tutorial, we use the numerically stable formulation provided by tf.nn.softmax_cross_entropy_with_logits.

tf.nn.softmax_cross_entropy_with_logits internally applies the softmax to the model's unnormalized prediction and sums across all classes, and tf.reduce_mean takes the average over these sums.

In [7]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
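
For intuition, the same quantity can be sketched in NumPy. This is an illustration of the idea, not TensorFlow's actual implementation; the function name `softmax_cross_entropy` and the example inputs are assumptions made for this sketch.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum leaves the softmax unchanged
    # but keeps np.exp from overflowing on large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(labels * log_probs).sum(axis=1)  # sum across classes
    return per_example.mean()                        # average over the batch

# All-zero logits over 10 classes give a uniform prediction,
# so the loss equals log(10).
logits = np.zeros((2, 10))
labels = np.eye(10)[[3, 7]]
loss = softmax_cross_entropy(logits, labels)
print(loss)
```

Working with log-probabilities directly, instead of computing the softmax and then taking its logarithm, is what keeps the result finite when the logits are large.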

With our regression model and cost function defined, let's train the model!

We'll use steepest gradient descent with a step size of 0.5 to decrease the cross entropy.

In [8]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

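Each of the 1000 training iterations loads a batch of 100 training examples and runs train_step, using feed_dict to replace the placeholders x and y_ with that batch.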

In [9]:
for i in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
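
What each training step does can be sketched by hand in NumPy. This is a minimal sketch under stated assumptions: the data is synthetic (20 features, 3 classes) rather than MNIST, the gradient is written out manually instead of being derived by TensorFlow, and every name below is invented for the illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

# Assumption: synthetic data standing in for MNIST (20 features, 3 classes).
num_examples, num_features, num_classes = 600, 20, 3
true_W = rng.randn(num_features, num_classes)
inputs = rng.randn(num_examples, num_features)
targets = np.eye(num_classes)[inputs.dot(true_W).argmax(axis=1)]

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def mean_cross_entropy(W, b):
    probs = softmax(inputs.dot(W) + b)
    return -np.log((probs * targets).sum(axis=1)).mean()

W_np = np.zeros((num_features, num_classes))
b_np = np.zeros(num_classes)
learning_rate, batch_size = 0.5, 100
initial_loss = mean_cross_entropy(W_np, b_np)

for step in range(200):
    # Draw a mini-batch of 100 examples (here sampled at random).
    idx = rng.choice(num_examples, batch_size, replace=False)
    xb, yb = inputs[idx], targets[idx]
    # Gradient of the mean cross-entropy with respect to the logits.
    grad_logits = (softmax(xb.dot(W_np) + b_np) - yb) / batch_size
    # Step against the gradient, scaled by the learning rate.
    W_np -= learning_rate * xb.T.dot(grad_logits)
    b_np -= learning_rate * grad_logits.sum(axis=0)

final_loss = mean_cross_entropy(W_np, b_np)
print(initial_loss, final_loss)
```

In the TensorFlow version, the gradient computation and the parameter update are both added to the graph by the call to minimize, so the loop only has to feed batches.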

### Evaluate the Model

tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the true label. We can use tf.equal to check if our prediction matches the truth.

In [10]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

tf.equal returns a tensor of booleans. To get the fraction of correct predictions (the accuracy), we cast the booleans to floating point numbers and take the mean. For example, [True, False, True, True] becomes [1, 0, 1, 1], whose mean is 0.75.

In [15]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.922
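
The accuracy computation can be traced on a tiny example in NumPy. The scores and labels below are made up for illustration (4 examples, 3 classes) and are unrelated to the 0.922 result above.

```python
import numpy as np

# Assumption: made-up scores for 4 examples over 3 classes.
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.2, 0.4],
                   [0.1, 0.3, 2.2]])
one_hot_labels = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1],
                           [0, 0, 1]])

predicted = scores.argmax(axis=1)        # most likely class per row
actual = one_hot_labels.argmax(axis=1)   # index of the 1 in each one-hot row
correct = predicted == actual            # booleans, one per example
accuracy_np = correct.astype(np.float32).mean()

print(correct)
print(accuracy_np)
```

Three of the four predictions match their labels, so the mean of the cast booleans is 0.75.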


## Build a Multilayer Convolutional Network

### Weight Initialization

Initialize the weights with a small amount of noise to break symmetry and to prevent zero gradients.

Because we're using ReLU neurons, it is also good practice to initialize the biases with a slightly positive value to avoid "dead neurons".

In [12]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
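
To show what these initial values look like, here is a NumPy sketch of the same idea. It assumes the documented behavior of tf.truncated_normal, which redraws any sample more than two standard deviations from the mean; the `truncated_normal` function below is a stand-in written for this illustration, and the shapes are arbitrary examples.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Sample N(0, stddev^2), redrawing anything beyond 2 standard deviations."""
    rng = np.random.RandomState(seed)
    values = rng.normal(0.0, stddev, size=shape)
    outliers = np.abs(values) > 2 * stddev
    while outliers.any():
        # Redraw only the out-of-range entries until none remain.
        values[outliers] = rng.normal(0.0, stddev, size=outliers.sum())
        outliers = np.abs(values) > 2 * stddev
    return values

# Assumption: example shapes chosen only for illustration.
weights = truncated_normal((784, 10))
biases = np.full(10, 0.1)

print(np.abs(weights).max() <= 0.2)
print(weights.std())
```

The weights differ from one another, which breaks the symmetry between units, while staying within 0.2 of zero; the biases all start at the same small positive value.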

In [13]:
a = np.array([1,2,3,4])
print(a)

[1 2 3 4]
