### Convolutional Neural Networks

###### Keywords: Local connection, Weight sharing, Pooling or Down-sampling.

In pattern recognition, feature extraction plays a significant role. Before deep learning, engineers and scientists used algorithms such as SIFT and HoG for feature extraction, and classifiers such as SVM for image recognition.

CNNs were proposed to get rid of this complicated, hand-crafted feature extraction procedure. A CNN takes the raw pixel values as input, and its structure shares weights across image positions, which reduces the number of parameters. In other words, a CNN reduces the risk of overfitting and limits the complexity of the model.

Generally, a convolutional neural network is made up of several convolutional layers. Each layer typically performs the following operations:<br>
1. The image is filtered by several convolutional kernels and a bias is added, so that local features are extracted. Each kernel yields a new 2-D image (a feature map);<br>
2. The filtered results are passed through a non-linear activation function such as ReLU or Sigmoid;<br>
3. The activated results are pooled, in other words down-sampled. Max-pooling keeps the most significant response in each region.<br>
Further tricks are often applied in CNNs, such as LRN (Local Response Normalization) and Batch Normalization.
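The three steps above can be sketched in plain Python on a tiny image. This is only an illustration under stated assumptions: the helper names (`conv2d_valid`, `relu`, `max_pool_2x2`), the 5x5 image, and the 2x2 edge kernel are all made up for the example, and the "convolution" is computed without flipping the kernel, as deep learning libraries usually do.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and add `bias`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap):
    """Step 2: element-wise non-linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Step 3: non-overlapping 2x2 max pooling (assumes even height and width)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


# A 5x5 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to a left-to-right increase in brightness.
kernel = [[-1, 1],
          [-1, 1]]

filtered = conv2d_valid(image, kernel)  # step 1: 4x4 feature map
activated = relu(filtered)              # step 2
pooled = max_pool_2x2(activated)        # step 3: 2x2 after down-sampling
print(pooled)
```

The kernel only responds where the edge is, and pooling keeps that strong response while halving each spatial dimension.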

Another feature of CNNs is local connectivity, which was inspired by the receptive fields of the biological visual system. Weight sharing dramatically reduces the number of parameters: it no longer depends on how large the image is or how many hidden nodes there are, only on the kernel size (and the number of kernels).
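A quick back-of-the-envelope comparison may help here. The sketch below counts trainable parameters for a fully connected layer and for a convolutional layer on two image sizes; the function names and the choice of 1024 hidden nodes and 32 kernels of size 5x5 are assumptions made for illustration only.

```python
def dense_params(n_inputs, n_hidden):
    """Fully connected layer: one weight per (input, hidden) pair plus one bias per hidden node."""
    return n_inputs * n_hidden + n_hidden


def conv_params(kernel_h, kernel_w, in_channels, n_kernels):
    """Convolutional layer: weights are shared across positions, so image size does not appear."""
    return kernel_h * kernel_w * in_channels * n_kernels + n_kernels


for side in (28, 224):
    n_pixels = side * side  # single-channel image
    print(side, dense_params(n_pixels, 1024), conv_params(5, 5, 1, 32))
```

For a 28x28 image the dense layer needs 803,840 parameters while the convolutional layer needs 832, and the convolutional count stays at 832 when the image grows to 224x224.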

However, a single convolutional kernel extracts only one kind of feature, which forms one feature map. To extract more features, we use more kernels.

Although the number of parameters is reduced, the number of hidden nodes still depends on the stride (step size) of the kernel.
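To make the dependence on the stride (step size) concrete, here is a small sketch of how the side length of a feature map, and therefore the number of hidden nodes per feature map, changes with stride. It assumes the `'SAME'` and `'VALID'` padding conventions used by TensorFlow; `output_size` is a hypothetical helper, not a library function.

```python
import math


def output_size(n, kernel, stride, padding):
    """Side length of the output for an input of side `n` (TensorFlow-style padding)."""
    if padding == "SAME":
        # Input is zero-padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((n - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


for stride in (1, 2):
    for padding in ("SAME", "VALID"):
        side = output_size(28, 5, stride, padding)
        # side * side is the number of hidden nodes in one feature map
        print(stride, padding, side, side * side)
```

Doubling the stride roughly quarters the number of hidden nodes per feature map, while the number of kernel weights stays the same.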

#### Case Study: LeNet5

Invented in 1994 by Yann LeCun. Some of its features are still used in state-of-the-art CNNs.<br>
* Each conv-layer: Convolution, Pooling, and Non-linear Activation.
* Convolution to extract spatial features.
* Average pooling for subsampling.
* Tanh or Sigmoid for activation.
* MLP as final classifier (Euclidean Radial Basis Function).
* Sparse connections between layers.
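As a rough illustration of the "average pooling plus Tanh" combination listed above, the sketch below subsamples a small feature map and squashes the result. It is a simplification: as far as I know, the original subsampling layers also apply a trainable scale and bias, which are omitted here, and the function names and input values are invented for the example.

```python
import math


def average_pool_2x2(fmap):
    """Non-overlapping 2x2 average pooling (assumes even height and width)."""
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def tanh_activation(fmap):
    """Element-wise tanh, which squashes values into (-1, 1)."""
    return [[math.tanh(v) for v in row] for row in fmap]


fmap = [[1.0, 3.0, 0.0, 0.0],
        [5.0, 7.0, 0.0, 4.0],
        [-2.0, -2.0, 8.0, 8.0],
        [-2.0, -2.0, 8.0, 8.0]]

pooled = average_pool_2x2(fmap)      # [[4.0, 1.0], [-2.0, 8.0]]
activated = tanh_activation(pooled)
print(pooled)
print(activated)
```

Unlike max pooling, average pooling lets every value in the 2x2 window contribute, so a single large response is diluted rather than kept.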

Next, I am going to implement a simple CNN with two conv layers and one fully connected layer, followed by a softmax output layer.

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
sess = tf.InteractiveSession()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


When creating the weights and biases, it is important to initialize the weights with random noise to break symmetry, for example from a truncated normal distribution with a standard deviation of 0.1. Since we are using ReLU, we also add a small positive value (0.1) to the biases to avoid dead neurons.
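For intuition about what "truncated normal" means, here is a minimal pure-Python sketch. It assumes the usual definition in which samples farther than two standard deviations from the mean are discarded and redrawn; `truncated_normal` below is a hypothetical helper written for illustration, not the TensorFlow function.

```python
import random


def truncated_normal(n, stddev=0.1, mean=0.0, seed=None):
    """Draw `n` normal samples, redrawing any that fall more than 2 stddev from the mean."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2.0 * stddev:
            values.append(v)
    return values


# Weights for 32 kernels of size 5x5 on a 1-channel input, flattened for simplicity.
weights = truncated_normal(5 * 5 * 1 * 32, stddev=0.1, seed=0)
# Small positive constant bias, one per kernel, so ReLU units start out active.
biases = [0.1] * 32

print(min(weights), max(weights))  # both within [-0.2, 0.2]
```

Because every weight starts at a different small value, the kernels receive different gradients and can learn different features, while truncation avoids the occasional very large initial weight.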

In [3]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = 'SAME')

def max_pool2by2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1,28,28,1])

# 1st conv layer
W_conv1 = weight_variable([5,5,1,32]) # kernel size 5*5, 1 input channel, 32 kernels
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool2by2(h_conv1)

# 2nd conv layer
W_conv2 = weight_variable([5,5,32,64]) # kernel size 5*5, 32 input channels, 64 kernels
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool2by2(h_conv2)

# fully connected layer
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1)+b_fc1)

# dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Softmax after dropout
W_fc2 = weight_variable([1024,10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# loss function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y_conv),reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
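The `7*7*64` used for the fully connected layer follows from how the spatial size changes through the network. The sketch below traces the shapes, assuming `'SAME'` padding throughout (so only the stride changes the spatial size); the `layers` list simply mirrors the cell above, and `same_out` is a hypothetical helper.

```python
import math


def same_out(n, stride):
    """Output side length under 'SAME' padding."""
    return math.ceil(n / stride)


side, channels = 28, 1  # MNIST input after reshaping to 28x28x1
# (name, stride, output channels) for each stage of the network above
layers = [("conv1", 1, 32), ("pool1", 2, 32), ("conv2", 1, 64), ("pool2", 2, 64)]

shapes = []
for name, stride, out_channels in layers:
    side = same_out(side, stride)
    channels = out_channels
    shapes.append((name, side, side, channels))
    print(name, (side, side, channels))

flat_size = side * side * channels  # length of h_pool2_flat per example
print(flat_size)
```

Each 2x2 max-pooling step halves the side length (28 to 14 to 7), and the second conv layer produces 64 feature maps, so each example is flattened to 7*7*64 = 3136 values.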

In [None]:
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

tf.global_variables_initializer().run()
for i in range(2000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_:batch[1], keep_prob:1.0})
        print("Step %d, training accuracy %g"%(i,train_accuracy))
    # feed the labels that belong to this batch (not the test labels)
    train_step.run(feed_dict={x:batch[0], y_:batch[1], keep_prob:0.5})
    
# Uncomment to evaluate on the full test set (this may need a lot of memory):
#print("test accuracy %g"%accuracy.eval(feed_dict={x:mnist.test.images, y_:mnist.test.labels, keep_prob:1.0}))
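The loop feeds `keep_prob` as 0.5 during training and 1.0 during evaluation. The sketch below shows, in plain Python, what that does to the activations, assuming the "inverted dropout" behavior in which kept units are scaled by `1 / keep_prob`. The `dropout` function here is a hypothetical stand-in for illustration, not the TensorFlow implementation.

```python
import random


def dropout(activations, keep_prob, rng):
    """Keep each unit with probability `keep_prob` and scale it by 1/keep_prob; zero the rest."""
    if keep_prob >= 1.0:
        return list(activations)  # evaluation: nothing is dropped
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 1000

train_out = dropout(activations, 0.5, rng)  # roughly half zeros, the rest 2.0
eval_out = dropout(activations, 1.0, rng)   # unchanged

print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(sum(eval_out) / len(eval_out))    # exactly 1.0
```

Scaling the kept units keeps the expected activation the same in both modes, which is why no extra rescaling is needed at evaluation time.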

Because of weight sharing, the number of parameters in a convolutional neural network does not grow dramatically with the input size, which limits both computation and overfitting.