# Logistic Regression

```python
%reset -f
import tensorflow as tf
tf.reset_default_graph()  # start from an empty default graph

# TensorFlow 1.x helper that downloads MNIST (if needed) and loads it with one-hot labels
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
%matplotlib inline
```

```
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
```


First, we create `placeholder` objects for the data. Placeholders are nodes in the default TensorFlow graph that receive their values only when the graph is run, so they are the entry point of the computation.

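```python
# Batches of flattened 28x28 images (784 pixels); None lets the batch size vary
X = tf.placeholder(tf.float32, shape=[None, 784])
# Matching batches of one-hot labels for the 10 digit classes
Y = tf.placeholder(tf.float32, shape=[None, 10])
```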

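Placeholders hold no data themselves; they are filled through `feed_dict` each time the graph is run. `X` will hold batches of flattened 28×28 images (784 pixel values each), and `Y` the corresponding one-hot class labels. The `None` dimension lets the batch size vary from run to run.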
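To make the two encodings concrete, here is a small pure-Python sketch (no TensorFlow needed). It assumes, as `read_data_sets` does for MNIST, that each image is flattened in row-major order into a 784-element vector and that each label becomes a one-hot vector of length 10. The helper names `flatten` and `one_hot` are hypothetical.

```python
def flatten(image):
    """Flatten a 2-D image (a list of rows) into a 1-D list, row by row."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# A blank 28x28 image with a single "on" pixel at row 14, column 7.
image = [[0.0] * 28 for _ in range(28)]
image[14][7] = 1.0
x = flatten(image)
print(len(x), x.index(1.0))   # 784 399  (14 * 28 + 7 = 399)
print(one_hot(3))             # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Pixel `(row, col)` therefore lands at index `row * 28 + col` in the flattened vector.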

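Next we create the model parameters. Unlike placeholders, these are `Variable` objects: they hold state that persists across runs of the graph and is updated during training. Both the weights and the bias are initialized with values drawn from a standard normal distribution.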

```python
# Weights map 784 pixel values to 10 class scores; both start as standard normal draws
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.random_normal([1, 10]))
```

Now we compute the model's output `y = XW + b` from the weights `W` and the bias `b`. These values are the logits: one unnormalised score per class for each image.

```python
# Logits, shape [batch_size, 10]; the [1, 10] bias is broadcast across every row
y = tf.matmul(X, W) + b
```
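`tf.matmul(X, W) + b` multiplies a [batch, 784] matrix by a [784, 10] matrix and then adds the [1, 10] bias to every row by broadcasting. The pure-Python sketch below spells out the same arithmetic on toy sizes (2 examples, 3 features, 2 classes instead of 784 and 10); the function name `affine` and the numbers are made up for illustration.

```python
def affine(X, W, b):
    """Compute X @ W + b with plain lists; the single row of b is added to every row."""
    n_in, n_out = len(W), len(W[0])
    out = []
    for row in X:
        if len(row) != n_in:
            raise ValueError("each row of X must have len(W) entries")
        out.append([sum(row[k] * W[k][j] for k in range(n_in)) + b[0][j]
                    for j in range(n_out)])
    return out

# Toy sizes: 2 examples x 3 features, 3 x 2 weights, 1 x 2 bias.
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
W = [[0.5, -1.0],
     [1.0,  0.0],
     [0.0,  2.0]]
b = [[0.1, 0.2]]
print(affine(X, W, b))   # approximately [[0.6, 3.2], [1.1, 2.2]]
```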

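How close are these predictions to the true labels? We need a loss function. Each image belongs to exactly one class, so we want the model to output a probability distribution over the 10 classes that sums to 1. The `softmax` function turns the logits into such a distribution, and the cross-entropy between that distribution and the one-hot label measures the error. TensorFlow's `softmax_cross_entropy_with_logits` does both in one step and expects the raw logits as input.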
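Written out for a single image with logits $y_j$ and one-hot label $Y_j$ over the 10 classes, the softmax probabilities, the per-image cross-entropy $\ell$, and the batch cost (summed over the $n$ images in the batch, as `tf.reduce_sum` does) are:

$$
p_j = \frac{e^{y_j}}{\sum_{k=1}^{10} e^{y_k}}, \qquad
\ell = -\sum_{j=1}^{10} Y_j \log p_j, \qquad
\text{cost} = \sum_{i=1}^{n} \ell^{(i)}
$$

Because $Y$ is one-hot, $\ell$ reduces to $-\log p_c$, where $c$ is the true class of the image.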

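```python
# Softmax cross-entropy per example, summed over the batch.
# TF 1.x only accepts named arguments (labels=..., logits=...) for this op.
cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=y))
```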
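Because the op receives raw logits and applies the softmax internally, it can avoid overflow in `exp`. A common way to do that is the log-sum-exp trick, which the pure-Python sketch below uses. It illustrates the arithmetic only and is not TensorFlow's actual implementation; the function names are hypothetical. `batch_cost` sums the per-example losses the same way `tf.reduce_sum` does in the cell above.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, one_hot_label):
    """-sum_j Y_j * log(p_j), using log p_j = z_j - logsumexp(z) for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, one_hot_label))

def batch_cost(batch_logits, batch_labels):
    """Sum of the per-example losses over a batch."""
    return sum(softmax_cross_entropy(z, t) for z, t in zip(batch_logits, batch_labels))

# Equal logits give p = [0.5, 0.5], so the loss for either label is log(2).
print(softmax([0.0, 0.0]))                               # [0.5, 0.5]
print(softmax_cross_entropy([0.0, 0.0], [1.0, 0.0]))     # ≈ 0.6931 (= log 2)
# Large logits stay finite: a naive math.exp(1000.0) would raise OverflowError.
print(softmax_cross_entropy([1000.0, 0.0], [0.0, 1.0]))  # 1000.0
```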

Now we define the optimiser. TensorFlow computes the gradients of the cost automatically by backpropagation; we only need to choose the update rule that uses them, for example `GradientDescentOptimizer` or `AdamOptimizer`. Here we use plain gradient descent with a learning rate of 0.01.

```python
# Plain gradient descent (learning rate 0.01) on the summed cost
optimiser = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
```
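For this model the gradients are simple enough to write by hand: for each example, the gradient of the cross-entropy with respect to the logits is `p - Y` (softmax output minus one-hot label). The pure-Python sketch below uses that fact to run plain gradient-descent updates on a toy two-class problem. The helper names, the toy data, and the learning rate of 0.5 are assumptions for illustration, and `b` is stored as a flat list rather than a [1, 10] matrix.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits_for(x, W, b):
    """Affine scores x @ W + b for a single example (b is a flat list here)."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) + b[j] for j in range(len(b))]

def summed_loss(X, Y, W, b):
    """Softmax cross-entropy summed over the batch."""
    total = 0.0
    for x, y in zip(X, Y):
        p = softmax(logits_for(x, W, b))
        total -= sum(t * math.log(q) for t, q in zip(y, p) if t > 0)
    return total

def gd_step(X, Y, W, b, lr):
    """One gradient-descent update computed from the given batch; returns new (W, b).

    Per example, d(loss)/d(logits) = softmax(logits) - one_hot_label, so the
    weight gradient accumulates x[k] * delta[j] and the bias gradient accumulates delta[j].
    """
    n_in, n_out = len(W), len(b)
    grad_W = [[0.0] * n_out for _ in range(n_in)]
    grad_b = [0.0] * n_out
    for x, y in zip(X, Y):
        delta = [p - t for p, t in zip(softmax(logits_for(x, W, b)), y)]
        for j in range(n_out):
            grad_b[j] += delta[j]
            for k in range(n_in):
                grad_W[k][j] += x[k] * delta[j]
    new_W = [[W[k][j] - lr * grad_W[k][j] for j in range(n_out)] for k in range(n_in)]
    new_b = [b[j] - lr * grad_b[j] for j in range(n_out)]
    return new_W, new_b

# Toy problem: 2 features, 2 classes, one example per class.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

print(summed_loss(X, Y, W, b))   # 2 * log(2) ≈ 1.386 with all-zero parameters
for _ in range(50):
    W, b = gd_step(X, Y, W, b, lr=0.5)
print(summed_loss(X, Y, W, b))   # much smaller after 50 updates
```

Each call to `gd_step` plays the role of one `sess.run(optimiser, ...)` on one batch; in the notebook that batch holds 100 images instead of two.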

We also want to measure how well the model performs on data it has not seen during training. For this we use accuracy: the fraction of images whose highest-scoring class matches the true label.

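```python
# A prediction is correct when the highest-scoring class matches the one-hot label
correct = tf.equal(tf.argmax(y, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```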
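The accuracy expression compares the index of the largest logit with the index of the 1 in the one-hot label. Applying softmax first would not change the result, because softmax preserves the ordering of the logits. Below is a pure-Python equivalent with hypothetical helper names and made-up logits; note that this `argmax` returns the first maximum on ties, whereas `tf.argmax` does not guarantee which index it returns in that case.

```python
def argmax(values):
    """Index of the largest value (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits, one_hot_labels):
    """Fraction of rows whose largest logit sits at the label's 1."""
    correct = [argmax(z) == argmax(t) for z, t in zip(logits, one_hot_labels)]
    return sum(correct) / len(correct)

logits = [[2.0, 0.1, -1.0],   # predicts class 0
          [0.3, 0.2, 0.9],    # predicts class 2
          [1.5, 1.7, 0.0]]    # predicts class 1
labels = [[1, 0, 0],
          [0, 1, 0],
          [0, 1, 0]]
print(accuracy(logits, labels))   # 0.6666666666666666 (2 of 3 correct)
```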

Before training, we need an operation that initializes all the variables; it is run once at the start of the session.

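```python
# Op that assigns every Variable its initial value
# (replaces the deprecated tf.initialize_all_variables)
init = tf.global_variables_initializer()
```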

Now we can train the model.

We use <b>Stochastic Gradient Descent</b> on mini-batches: each step feeds a small batch of training images into the model and updates the weights using that batch alone. One pass through all the training examples is called an <b>epoch</b>. With a batch size of 100 and 55,000 training examples, one epoch consists of 55,000 / 100 = 550 batches.

```python
batch_size = 100
n_training = len(mnist.train.labels)
n_batches = n_training // batch_size  # integer division, so range(n_batches) works in Python 3
print(n_batches)
```

```
550
```

```python
len(mnist.train.labels)
```

```
55000
```
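`mnist.train.next_batch` keeps track of its position in the training data for us. The pure-Python sketch below shows the underlying idea of one epoch of mini-batches: shuffle the example indices once, then slice them into consecutive batches. It is not how `next_batch` is implemented; `minibatches` and its `seed` parameter are hypothetical.

```python
import random

def minibatches(n_examples, batch_size, seed=0):
    """Yield lists of example indices covering one epoch in shuffled order.

    The final batch is smaller when n_examples is not a multiple of batch_size.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]

n_training, batch_size = 55000, 100
batches = list(minibatches(n_training, batch_size))
print(len(batches), n_training // batch_size)    # 550 550
print([len(bt) for bt in minibatches(10, 4)])    # [4, 4, 2]
```

Because 55,000 is an exact multiple of 100, `n_training // batch_size` counts every batch. When the sizes do not divide evenly, as with `minibatches(10, 4)`, the last batch is smaller and integer division alone would leave it out.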

```python
with tf.Session() as sess:
    sess.run(init)
    for ep in range(20):
        # Loop variable is `i`, not `b`, so the bias Variable's name is not shadowed
        for i in range(n_batches):
            b_images, b_labels = mnist.train.next_batch(batch_size)
            _, c = sess.run([optimiser, cost], feed_dict={X: b_images, Y: b_labels})
        # After each epoch: c is the last batch's summed cost; accuracy is on the test set
        acc = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels})
        print("Cost: {}, Accuracy: {}".format(c, acc))
```

```
Cost: 44.4421463013, Accuracy: 0.864500105381
Cost: 59.498714447, Accuracy: 0.879700124264
Cost: 33.9364624023, Accuracy: 0.898000121117
Cost: 39.7270812988, Accuracy: 0.898500084877
Cost: 50.8886604309, Accuracy: 0.902200102806
Cost: 39.3054199219, Accuracy: 0.904600024223
Cost: 63.7918243408, Accuracy: 0.907300114632
Cost: 24.2948684692, Accuracy: 0.905900120735
Cost: 60.6184577942, Accuracy: 0.909900128841
Cost: 45.2948265076, Accuracy: 0.914600133896
Cost: 30.7770385742, Accuracy: 0.914900124073
Cost: 25.4723052979, Accuracy: 0.916400134563
Cost: 22.3743152618, Accuracy: 0.909700155258
Cost: 42.5977516174, Accuracy: 0.911200106144
Cost: 35.0974197388, Accuracy: 0.90180015564
Cost: 33.2313575745, Accuracy: 0.904500126839
Cost: 12.0502147675, Accuracy: 0.917600095272
Cost: 40.4304847717, Accuracy: 0.916300058365
Cost: 50.6873168945, Accuracy: 0.908900022507
Cost: 40.7444076538, Accuracy: 0.917300105095
```


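The model does reasonably well: after 20 epochs it reaches about 91.7% accuracy on the test set, peaking at 91.8% in epoch 17. The printed cost is the summed loss of only the last mini-batch of each epoch, which is why it fluctuates so much; the test accuracy is the better measure of progress. A single linear layer followed by softmax is exactly multinomial logistic regression, so the class scores are still linear functions of the pixel values. We can do much better by stacking multiple layers with non-linear activation functions.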