# TensorFlow Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: 

### Convolutional Neural Network

Build a 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time and performance change, but to begin, try the following:

Image -> convolution (32 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> convolution (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (256 hidden units) -> nonlinearity (ReLU) -> fully connected (10 output units) -> softmax
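To see how the feature map sizes evolve through this pipeline, here is a minimal sketch in plain Python. It assumes SAME padding, stride 1 for the convolutions, and stride 2 for the pooling layers; the description above does not fix these choices, so treat them as assumptions. The helper name `same_padding_output` is made up for illustration.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a conv or pool layer with SAME padding."""
    return math.ceil(size / stride)

# (layer name, stride, number of output channels or None if unchanged)
layers = [
    ("conv 5x5, 32 filters", 1, 32),
    ("max pool 2x2", 2, None),
    ("conv 5x5, 64 filters", 1, 64),
    ("max pool 2x2", 2, None),
]

size, channels = 28, 1  # MNIST images are 28 x 28 with one channel
shapes = [(size, size, channels)]
for name, stride, out_channels in layers:
    size = same_padding_output(size, stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((size, size, channels))
    print(f"{name:<22} -> {size} x {size} x {channels}")

# Number of inputs to the first fully connected layer
flat_features = size * size * channels
print("flattened features:", flat_features)
```

Under these assumptions the first fully connected layer receives 7 x 7 x 64 values per image.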

Some tips:
- The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

- Since CNNs are more complex than the logistic regression and MLP models you've worked with before, you may find it helpful to use a more advanced optimizer. Your model will train faster if you use [`tf.train.AdamOptimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) instead of `tf.train.GradientDescentOptimizer`. A learning rate of 1e-4 is a good starting point.
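For intuition about what an adaptive optimizer does differently from plain gradient descent, the following is a minimal sketch of the Adam update rule on a toy one-dimensional loss. It is not TensorFlow's implementation; the loss `(w - 3)^2`, the function names, and the hyperparameters (`lr=0.1`, 500 steps) are all chosen for illustration only.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def adam_minimize(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0  # running averages of the gradient and of its square
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the running gradient magnitude
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_final = adam_minimize(w=0.0)
print(w_final)  # ends close to the minimizer w = 3
```

Because each step is normalized by the recent gradient magnitude, the effective step size adapts per parameter, which is one reason a small base learning rate such as 1e-4 can still make steady progress.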

In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tqdm import trange     



In [3]:
# Import MNIST Data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [4]:
# 1st hidden layer with max pooling (convolution)

# Image Input 
# Input for MNIST data is flat. Reshape the input images into a 4D batch: [batch, height, width, channels]
x_flat = tf.placeholder(tf.float32, [None, 784])
X0 = tf.reshape(x_flat, [-1, 28, 28, 1])

# Create convolutional kernel variable
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))

# Create bias variable
b1 = tf.Variable(tf.zeros([32]))

# Apply convolutional layer
conv1_preact = tf.nn.conv2d(X0, W1, strides=[1, 1, 1, 1], padding="SAME") + b1
conv1 = tf.nn.relu(conv1_preact)

# Print input/output shape
print(X0.shape)
print(conv1.shape)

# Max Pooling 
# Max pool and then print new shape
max_pool1 = tf.nn.max_pool(conv1, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")
print(max_pool1.shape)



(?, 28, 28, 1)
(?, 28, 28, 32)
(?, 14, 14, 32)
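The halving from 28 x 28 to 14 x 14 comes from the pooling step. As a small illustration, here is a plain-Python sketch of 2x2 max pooling with stride 2 on a single 4 x 4 feature map. The function `max_pool_2x2` and the input values are made up for this example, and it only handles even heights and widths.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            # Each output value is the maximum of one non-overlapping 2x2 window
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

`tf.nn.max_pool` applies the same idea independently to every channel of every image in the batch.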


In [5]:
# 2nd hidden layer with max pooling (convolution)

# Create convolutional kernel variable
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))

# Create bias variable
b2 = tf.Variable(tf.zeros([64]))

# Apply convolutional layer
conv2_preact = tf.nn.conv2d(max_pool1, W2, strides=[1, 1, 1, 1], padding="SAME") + b2
conv2 = tf.nn.relu(conv2_preact)

# Print input/output shape
print(max_pool1.shape)
print(conv2.shape)

# Max Pooling 
# Max pool and then print new shape
max_pool2 = tf.nn.max_pool(conv2, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")
print(max_pool2.shape)

(?, 14, 14, 32)
(?, 14, 14, 64)
(?, 7, 7, 64)


In [6]:
# 3rd hidden layer (fully connected with 256 hidden units + ReLU activation)

# Flatten convolutional feature maps into a vector
h_flat = tf.reshape(max_pool2, [-1, 7*7*64])

# Print output shape
print(h_flat.shape)

# Create weight matrix variable
W3 = tf.Variable(tf.truncated_normal([7*7*64, 256], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([256], stddev=0.1))

# Apply fully connected layer
A_preact_3 = tf.matmul(h_flat, W3) + b3
A3 = tf.nn.relu(A_preact_3)

print(A3.shape)

(?, 3136)
(?, 256)


In [7]:
# Output layer (fully connected with 10 units + softmax activation)

# Create weight matrix variable
W4 = tf.Variable(tf.truncated_normal([256, 10], stddev=0.1))
b4 = tf.Variable(tf.truncated_normal([10], stddev=0.1))

# Apply fully connected layer
A_preact_4 = tf.matmul(A3, W4) + b4
A4 = tf.nn.softmax(A_preact_4)

print(A4.shape)
# This completes forward propagation: A4 holds the probability the model assigns to each of the 10 digit classes.
# The model's prediction is the class with the highest probability.

(?, 10)
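To make the last step concrete, here is a small plain-Python sketch of softmax followed by picking the most probable class. The logit values are invented for illustration and do not come from the trained model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum avoids overflow and does not change the result
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-activations for one image, one score per digit 0..9
logits = [1.0, 2.0, 0.5, 4.0, 0.0, -1.0, 0.3, 2.5, 1.2, 0.7]
probs = softmax(logits)

# The predicted digit is the index of the largest probability
predicted_digit = max(range(len(probs)), key=probs.__getitem__)
print(predicted_digit)  # 3
```

Since softmax preserves the ordering of the scores, taking the argmax of the logits gives the same prediction as taking the argmax of the probabilities.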


In [8]:
# Need to learn optimal W and b.
    # Define loss: use cross entropy, a common loss function for classification problems
    # Compute gradients using backpropagation

In [9]:
# Define Labels to evaluate the Loss Function 
y = tf.placeholder(tf.float32, [None, 10])

In [10]:
# Loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=A_preact_4, labels=y))

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
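Note that the loss is computed from the pre-activations `A_preact_4` (the logits) rather than from the softmax output `A4`. The sketch below shows, in plain Python, one numerically stable way to compute cross entropy directly from logits using the log-sum-exp trick. It illustrates the idea only and is not TensorFlow's implementation; the function names and example values are hypothetical.

```python
import math

def cross_entropy_from_logits(logits, one_hot_label):
    """Cross entropy for one example, computed from raw scores."""
    peak = max(logits)
    # log(sum(exp(z))) computed stably by factoring out the largest score
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))

def mean_cross_entropy(batch_logits, batch_labels):
    """Average loss over a batch, analogous to wrapping the loss in tf.reduce_mean."""
    losses = [cross_entropy_from_logits(l, t)
              for l, t in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

# A model with no preference between 10 classes has loss log(10)
uniform_loss = cross_entropy_from_logits([0.0] * 10, [1] + [0] * 9)

# Toy batch of two 3-class examples
batch_loss = mean_cross_entropy(
    [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]],
    [[1, 0, 0], [0, 1, 0]],
)
print(uniform_loss, batch_loss)
```

The `uniform_loss` value, log(10) or about 2.30, is a useful sanity check: a freshly initialized 10-class model should start near it.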



In [11]:
# Optimizer
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

In [12]:
# Session 
# To train, we simply call the optimizer op we defined above. 
# First though, we need to start a session and initialize our variables:

# Create a session object and initialize all graph variables
sess = tf.Session()
sess.run(tf.global_variables_initializer())

In [14]:
# Train the model
# trange is a tqdm function. It's the same as range, but adds a progress bar

# Build the evaluation ops once, outside the loop, so the graph does not grow every epoch
correct_prediction = tf.equal(tf.argmax(A_preact_4, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

for epoch in trange(50):
    # 550 batches of 100 images cover the 55,000 training examples once per epoch
    for which_batch in range(550):
        batch_xs = mnist.train.images[which_batch*100:(which_batch+1)*100]
        batch_ys = mnist.train.labels[which_batch*100:(which_batch+1)*100]
        sess.run(train_step, feed_dict={x_flat: batch_xs, y: batch_ys})

    # Evaluate on the test set after each epoch
    print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={x_flat: mnist.test.images, y: mnist.test.labels})))


Test accuracy: 0.978600025177002
Test accuracy: 0.982699990272522
Test accuracy: 0.9861999750137329
Test accuracy: 0.987500011920929
Test accuracy: 0.9884999990463257
Test accuracy: 0.9883999824523926
Test accuracy: 0.9882000088691711
Test accuracy: 0.9883000254631042
Test accuracy: 0.9876999855041504
Test accuracy: 0.9876999855041504
Test accuracy: 0.9872000217437744
Test accuracy: 0.9879000186920166
Test accuracy: 0.9886000156402588
Test accuracy: 0.9886000156402588
Test accuracy: 0.9889000058174133
Test accuracy: 0.9889000058174133
Test accuracy: 0.9886999726295471
Test accuracy: 0.9887999892234802
Test accuracy: 0.9886999726295471
Test accuracy: 0.9886000156402588
Test accuracy: 0.9890000224113464
Test accuracy: 0.9894999861717224
Test accuracy: 0.9907000064849854
Test accuracy: 0.9904000163078308
Test accuracy: 0.9905999898910522
Test accuracy: 0.9905999898910522
Test accuracy: 0.9907000064849854
Test accuracy: 0.9908000230789185
Test accuracy: 0.9908000230789185
Test accuracy: 0.9912999868392944
Test accuracy: 0.9912999868392944
Test accuracy: 0.9912999868392944
Test accuracy: 0.9912999868392944
Test accuracy: 0.9915000200271606
Test accuracy: 0.991599977016449
Test accuracy: 0.9914000034332275
Test accuracy: 0.9915000200271606
Test accuracy: 0.9915000200271606
Test accuracy: 0.9915000200271606
Test accuracy: 0.9915000200271606
Test accuracy: 0.9914000034332275
Test accuracy: 0.9914000034332275
Test accuracy: 0.9912999868392944
Test accuracy: 0.9912999868392944
Test accuracy: 0.9912999868392944
Test accuracy: 0.9912999868392944
Test accuracy: 0.9914000034332275
Test accuracy: 0.9914000034332275
Test accuracy: 0.9914000034332275
Test accuracy: 0.9914000034332275
100%|██████████| 50/50 [56:31<00:00, 67.83s/it]
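The accuracy printed after each epoch is the fraction of test images whose highest-scoring class matches the label. A minimal plain-Python sketch of that computation follows; the helper names and the toy 3-class data are invented for illustration.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax along the class axis."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy(batch_logits, batch_one_hot_labels):
    """Fraction of examples where the predicted class equals the true class."""
    correct = [argmax(scores) == argmax(label)
               for scores, label in zip(batch_logits, batch_one_hot_labels)]
    return sum(correct) / len(correct)

# Four hypothetical examples with three classes each
toy_logits = [
    [2.0, 0.1, 0.3],  # predicts class 0
    [0.2, 1.5, 0.1],  # predicts class 1
    [0.3, 0.2, 0.9],  # predicts class 2
    [1.1, 0.4, 0.2],  # predicts class 0, but the label is class 1
]
toy_labels = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
acc = accuracy(toy_logits, toy_labels)
print(acc)  # 0.75
```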


In [None]:
### YOUR CODE HERE ###

### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

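The CNN is more accurate than the logistic regression and MLP models, reaching about 99.1% test accuracy after 50 epochs. It is also slower to train: each epoch took about 67 seconds, so the full run took roughly 56 minutes.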

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: By trainable parameters, I mean individual scalars. For example, a weight matrix that is 10x5 has 50.*

`5*5*32 + 32 + 5*5*32*64 + 64 + 7*7*64*256 + 256 + 256*10 + 10 = 857738`
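The same count can be checked programmatically. The sketch below lists the weight and bias shapes used in the cells above and sums the number of scalars; the dictionary layout and layer names are just one convenient way to organize it.

```python
from math import prod

# Weight and bias shapes for each layer of the network built above
layer_shapes = {
    "conv1": [(5, 5, 1, 32), (32,)],
    "conv2": [(5, 5, 32, 64), (64,)],
    "fc1": [(7 * 7 * 64, 256), (256,)],
    "fc2": [(256, 10), (10,)],
}

# A tensor contributes one trainable scalar per entry
per_layer = {
    name: sum(prod(shape) for shape in shapes)
    for name, shapes in layer_shapes.items()
}
total = sum(per_layer.values())

for name, count in per_layer.items():
    print(f"{name}: {count}")
print("total:", total)  # total: 857738
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.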

3\. When would you use a CNN versus a logistic regression model or an MLP?

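A CNN is well suited to image data, especially high-resolution images with many channels. Because convolutional filters share weights across spatial positions, the number of parameters in those layers does not grow with image size, unlike a fully connected layer applied to the raw pixels.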