# Introduction

In this tutorial, we'll walk through the TensorFlow code for building a convolutional neural network (CNN). Following the code and concepts requires some familiarity with building neural networks in TensorFlow. If you want to review or learn that material, the notes from last week's workshop are here (TODO link).


In [14]:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


Let's define some standard hyperparameters for our network.

In [41]:
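n_epochs = 20000      # number of minibatch training steps (not full passes over the data)
minibatch_size = 50   # examples per training step
lr = 1e-4             # learning rate passed to the Adam optimizer
keep = 0.5            # dropout keep probability used during training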
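To get a feel for these numbers, the arithmetic below estimates how many full passes over the training data the settings amount to. It is only a rough sketch: it assumes the 55,000-example training split that `read_data_sets` produces by default, and the names `train_size`, `n_updates`, and `batch_size` are illustrative rather than part of the notebook.

```python
# Assumption: the default MNIST split from read_data_sets has 55,000 training images.
train_size = 55000
n_updates = 20000   # same value as n_epochs above
batch_size = 50     # same value as minibatch_size above

examples_seen = n_updates * batch_size
passes_over_data = examples_seen / float(train_size)

print("examples seen: {}".format(examples_seen))
print("approximate passes over the training set: {:.1f}".format(passes_over_data))
```

Under that assumption, the loop performs roughly 18 true epochs in the usual sense of the word.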



# Inputs and Outputs

In this step, we define the network's input and output placeholders. `x` holds the input images, each flattened to 784 values, and `y_` holds the corresponding one-hot labels over the 10 digit classes. For each placeholder you have to specify a dtype and a shape; using `None` for the first dimension lets the batch size vary.

In [16]:
x = tf.placeholder(tf.float32, shape=[None, 784], name='x-input')
y_ = tf.placeholder(tf.float32, shape=[None, 10], name='y-labels')
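The shapes `[None, 784]` and `[None, 10]` follow from how the data is stored: each 28 x 28 image is flattened into one row, and because the data was loaded with `one_hot=True`, each label is a length-10 vector. A minimal plain-Python sketch of both ideas (the helper `one_hot` is hypothetical, not a TensorFlow function):

```python
def one_hot(digit, n_classes=10):
    """Return a list of length n_classes with 1.0 at index `digit`."""
    vec = [0.0] * n_classes
    vec[digit] = 1.0
    return vec

# A blank 28 x 28 image, flattened row by row as in the MNIST arrays.
image_rows = [[0.0] * 28 for _ in range(28)]
flat = [pixel for row in image_rows for pixel in row]

label = one_hot(3)
print(len(flat))
print(label)
```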

# Network Architecture

In [17]:
def weight_variable(shape):
    """Initializes weights randomly from a truncated normal distribution (stddev 0.1).
    Params: shape: list giving the dimensions of the tensor to be initialized
    """
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """Initializes the bias term to a small positive constant (0.1).
    Params: shape: list giving the dimensions of the bias term.
    """
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    """Convolves the input x with the filter W.
    Uses a stride of 1 and SAME padding (zero-padded at the edges), so the
    output has the same height and width as the input.
    Params:
    x: tensor: the image to be convolved over
    W: the kernel (tensor) with which to convolve.
    """
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    """Performs a max pooling operation over a 2 x 2 region"""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
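To make the pooling step concrete, here is a plain-Python sketch of 2 x 2 max pooling with stride 2 on a single small grid. It is an illustration only: the function `max_pool_2x2_plain` is hypothetical, handles only even-sized 2-D lists, and ignores the batch and channel dimensions that `tf.nn.max_pool` works with.

```python
def max_pool_2x2_plain(grid):
    """Max-pool a 2-D list with a 2 x 2 window and stride 2 (even sizes only)."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            window = [grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

grid = [[1, 2, 5, 6],
        [3, 4, 7, 8],
        [9, 10, 13, 14],
        [11, 12, 15, 16]]
print(max_pool_2x2_plain(grid))  # [[4, 8], [12, 16]]
```

Each output value is the largest entry of one non-overlapping 2 x 2 block, which is why pooling halves the height and width.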

In [18]:
x_image = tf.reshape(x, [-1, 28, 28, 1])  # convert x to a 4-D tensor: [batch, height, width, channels]


In [19]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # add the bias before the ReLU
h_pool1 = max_pool_2x2(h_conv1)


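The helper functions defined earlier (`weight_variable`, `bias_variable`, `conv2d`, `max_pool_2x2`) keep each layer definition short. The second convolutional layer follows the same pattern as the first: it takes the 32 feature maps produced by the first layer and computes 64 feature maps from 5 x 5 patches.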

In [20]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
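It can help to trace the spatial size of the tensor through these layers. With SAME padding, the output height and width are `ceil(input_size / stride)`, so the stride-1 convolutions preserve the size and each stride-2 pooling halves it. The sketch below is plain arithmetic under that rule; `same_out_size` is a hypothetical helper, not a TensorFlow function.

```python
import math

def same_out_size(size, stride):
    """Spatial output size under SAME padding: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))

size = 28
shapes = []
for layer, stride, channels in [("conv1", 1, 32), ("pool1", 2, 32),
                                ("conv2", 1, 64), ("pool2", 2, 64)]:
    size = same_out_size(size, stride)
    shapes.append((layer, size, size, channels))
    print("{}: {} x {} x {}".format(layer, size, size, channels))

flat_features = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("flattened features: {}".format(flat_features))
```

The final 7 x 7 x 64 shape gives 3136 values per image once the output of the second pooling layer is flattened.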


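Next come the fully connected layers. After two rounds of 2 x 2 pooling, the 28 x 28 image has shrunk to 7 x 7 with 64 channels, so it is flattened into a vector of 7 * 7 * 64 values before the first dense layer. Dropout is applied after each hidden dense layer to reduce overfitting.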

In [26]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
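`tf.nn.dropout` keeps each activation with probability `keep_prob` and scales the survivors by `1 / keep_prob`, so the expected activation stays the same. That is why `keep_prob` can simply be fed as 1.0 at evaluation time. The following is a plain-Python sketch of that behavior, not TensorFlow's implementation; `dropout_plain` and the seeded `rng` are illustrative names.

```python
import random

def dropout_plain(values, keep_prob, rng):
    """Keep each value with probability keep_prob; scale survivors by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000
dropped = dropout_plain(activations, 0.5, rng)

kept = sum(1 for v in dropped if v != 0.0)
mean_after = sum(dropped) / len(dropped)
print("kept {} of {}".format(kept, len(dropped)))
print("mean after dropout: {:.3f}".format(mean_after))
```

With `keep_prob = 0.5`, about half the values are zeroed and the rest are doubled, so the mean stays close to the original 1.0.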



In [27]:
W_fc2 = weight_variable([1024, 256])
b_fc2 = bias_variable([256])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)


In [28]:
W_fc3 = weight_variable([256, 10])
b_fc3 = bias_variable([10])
y_out = tf.matmul(h_fc2_drop, W_fc3) + b_fc3
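With all layers defined, it is worth counting the trainable parameters to see where the model's capacity sits. The sketch below multiplies out the weight shapes used in this notebook and adds the bias sizes; the `layers` list is just a hand-copied summary of those shapes.

```python
# (name, weight shape, bias size), copied from the layer definitions in this notebook
layers = [
    ("conv1", [5, 5, 1, 32], 32),
    ("conv2", [5, 5, 32, 64], 64),
    ("fc1", [7 * 7 * 64, 1024], 1024),
    ("fc2", [1024, 256], 256),
    ("fc3", [256, 10], 10),
]

total = 0
counts = {}
for name, w_shape, n_bias in layers:
    n_weights = 1
    for dim in w_shape:
        n_weights *= dim
    counts[name] = n_weights + n_bias
    total += counts[name]
    print("{}: {} parameters".format(name, counts[name]))
print("total: {}".format(total))
```

The first fully connected layer accounts for about 3.2 million of the roughly 3.5 million parameters, far more than both convolutional layers combined.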

In [35]:

# Average the per-example losses so the optimizer minimizes a single scalar
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_out, labels=y_))
train_step = tf.train.AdamOptimizer(lr).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_, axis = 1), tf.argmax(y_out, axis = 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.global_variables_initializer()
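For a single example, `softmax_cross_entropy_with_logits` computes the negative log of the softmax probability assigned to the true class, and the accuracy check compares the argmax of the logits with the argmax of the one-hot label. Here is a plain-Python sketch of both calculations for one example; the helper names are hypothetical, and the max-subtraction trick is a common way to keep the exponentials stable rather than a claim about TensorFlow's internals.

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label, for one example."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    log_sum = math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - shift - log_sum for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.0, 0.5, -1.0]
label = [1.0, 0.0, 0.0]
loss = softmax_cross_entropy(logits, label)
print("loss: {:.4f}".format(loss))
print("correct: {}".format(argmax(logits) == argmax(label)))
```

The loss shrinks toward zero as the logit for the true class grows relative to the others, and it equals `log(n_classes)` when all logits are equal.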


In [None]:
with tf.Session() as sess:
    sess.run(init)
    for i in range(n_epochs):
        batch = mnist.train.next_batch(minibatch_size)
        if i % 100 == 0:
            print("epoch: {}".format(i))
            train_acc = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob : 1.0})
            print("training accuracy: {}".format(train_acc))
        sess.run([train_step], feed_dict = {x: batch[0], y_: batch[1], keep_prob : keep})
    test_acc = accuracy.eval(feed_dict = {x: mnist.test.images, y_: mnist.test.labels, keep_prob : 1.0})
    print("test accuracy: {}".format(test_acc))
    

epoch: 0
training accuracy: 0.1599999964237213
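The first reported accuracy is close to the 10% expected from random guessing over ten classes, which is what an untrained network should give. One practical note on the final test evaluation: feeding all test images in a single call can use a lot of memory. A possible workaround, sketched below in plain Python, is to evaluate in batches and average the results weighted by batch size. The names `batched_accuracy` and `toy_accuracy` are hypothetical; `toy_accuracy` is a stand-in for the notebook's `accuracy.eval(...)` call.

```python
def batched_accuracy(images, labels, batch_size, accuracy_fn):
    """Average accuracy_fn over consecutive batches, weighted by batch size."""
    total_correct = 0.0
    for start in range(0, len(images), batch_size):
        xs = images[start:start + batch_size]
        ys = labels[start:start + batch_size]
        total_correct += accuracy_fn(xs, ys) * len(xs)
    return total_correct / len(images)

def toy_accuracy(xs, ys):
    """Stand-in for accuracy.eval: fraction of positions where xs matches ys."""
    return sum(1.0 for x, y in zip(xs, ys) if x == y) / len(xs)

images = [0, 1, 2, 3, 4, 5, 6]
labels = [0, 1, 2, 0, 4, 0, 6]
result = batched_accuracy(images, labels, 3, toy_accuracy)
print(result)
```

In the notebook, `accuracy_fn` would wrap a call such as `accuracy.eval(feed_dict={x: xs, y_: ys, keep_prob: 1.0})` inside the session. Weighting by batch size matters because the last batch may be smaller than the others.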


# Additional Resources

* CNN [tutorial](https://www.tensorflow.org/tutorials/deep_cnn) from the TensorFlow docs
* Stanford's [course](http://cs231n.github.io/convolutional-networks/) on CNNs
* Michael Nielsen's [chapter](http://neuralnetworksanddeeplearning.com/chap6.html) on CNNs in his book
* Facebook's [video](https://www.facebook.com/Engineering/videos/10154673882797200/) on ML and CNNs