https://www.tensorflow.org/tutorials/mnist/pros/

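TF relies on an efficient C++ backend for computation. The connection to this backend is called a session. The usual workflow is to build a graph first and then launch it in a session.
tf.InteractiveSession() lets you interleave building operations with running the graph: you can start a session and launch the graph without first constructing the complete computation graph.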

In [36]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [23]:
import tensorflow as tf
sess = tf.InteractiveSession()

In [24]:
def weight_variable(shape):
    # tf.Variable is a tensor in the graph; think of it as a trainable parameter.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

2D convolution  
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)  
strides: 1-D of size 4. The stride of the sliding window for each dimension of input.

Shape of the input tensor: [batch, in_height, in_width, in_channels]  
Shape of the filter tensor: [filter_height, filter_width, in_channels, out_channels]  

How conv2d works   
1. Flatten the filter into a 2-D matrix of shape [filter_height \* filter_width \* in_channels, out_channels]  
2. Extract image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height \* filter_width \* in_channels]  
3. For each patch, multiply the image patch vector on the right by the filter matrix  

In NHWC format (default):  
$output[b, i, j, k] = \sum_{di, dj, q} input[b, strides[1] \cdot i + di, strides[2] \cdot j + dj, q] \cdot filter[di, dj, q, k]$  
strides[0] = strides[3] = 1 is required, so strides = [1, stride, stride, 1]; the horizontal and vertical strides are usually the same.
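
The NHWC formula above can be evaluated directly with nested loops. The following is only a minimal pure-Python sketch, assuming no padding (the `'VALID'` case) and the same stride in both spatial dimensions; the function name `conv2d_nhwc_valid` and the toy inputs are made up for illustration and are not part of TensorFlow.

```python
def conv2d_nhwc_valid(inp, filt, stride=1):
    """Directly evaluate the NHWC formula with no padding (hypothetical helper).

    inp:  nested lists, shape [batch, in_height, in_width, in_channels]
    filt: nested lists, shape [filter_height, filter_width, in_channels, out_channels]
    """
    batch, in_h, in_w = len(inp), len(inp[0]), len(inp[0][0])
    f_h, f_w = len(filt), len(filt[0])
    in_c, out_c = len(filt[0][0]), len(filt[0][0][0])
    # Output size when the window must fit entirely inside the input.
    out_h = (in_h - f_h) // stride + 1
    out_w = (in_w - f_w) // stride + 1

    output = [[[[0.0] * out_c for _ in range(out_w)]
               for _ in range(out_h)] for _ in range(batch)]
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(out_c):
                    total = 0.0
                    # Sum over the window offsets (di, dj) and input channels q.
                    for di in range(f_h):
                        for dj in range(f_w):
                            for q in range(in_c):
                                total += (inp[b][stride * i + di][stride * j + dj][q]
                                          * filt[di][dj][q][k])
                    output[b][i][j][k] = total
    return output


# One 3x3 single-channel image and one 2x2 all-ones filter.
image = [[[[1.0], [2.0], [3.0]],
          [[4.0], [5.0], [6.0]],
          [[7.0], [8.0], [9.0]]]]
ones_filter = [[[[1.0]], [[1.0]]],
               [[[1.0]], [[1.0]]]]
result = conv2d_nhwc_valid(image, ones_filter)
print(result)  # each output value is the sum of one 2x2 window
```

The loops are far slower than the flatten-and-multiply procedure described in the steps above; the sketch is only meant to make the index arithmetic concrete.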

In [25]:
# Wrap the convolution and pooling operations in helper functions
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1, 2, 2, 1], padding='SAME')

In [26]:
x = tf.placeholder(dtype=tf.float32, shape=[None, 784])
y_ = tf.placeholder(dtype=tf.float32, shape=[None, 10])

In [39]:
# layer 1
W_conv1 = weight_variable([5, 5, 1, 32]) # [patch_height, patch_width, in_channels, out_channels]
b_conv1 = bias_variable([32])

# Reshape x into a 4-D tensor: the middle two dimensions are the image height and width, the last is the number of color channels
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# layer 2
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# fc layer
W_fc3 = weight_variable([7*7*64, 1024])
b_fc3 = bias_variable([1024])

h_flat3 = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc3 = tf.nn.relu(tf.matmul(h_flat3, W_fc3) + b_fc3)
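
The `7*7*64` in the fully connected layer follows from how `padding='SAME'` sizes its outputs: each spatial dimension becomes ceil(input size / stride). A small sketch of that arithmetic is shown here; the helper name `same_out_size` is made up for illustration.

```python
import math


def same_out_size(in_size, stride):
    # With padding='SAME', the output size is ceil(in_size / stride).
    return math.ceil(in_size / stride)


size = 28                          # MNIST images are 28x28
size = same_out_size(size, 1)      # conv1, stride 1: stays 28
size = same_out_size(size, 2)      # pool1, stride 2: 14
size = same_out_size(size, 1)      # conv2, stride 1: stays 14
size = same_out_size(size, 2)      # pool2, stride 2: 7
flat_dim = size * size * 64        # 64 output channels after conv2
print(size, flat_dim)
```

So `h_pool2` has shape [batch, 7, 7, 64], which is why it is reshaped to [-1, 7*7*64] before the matrix multiplication.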

In [40]:
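# Add dropout to reduce overfitting. It is usually applied to fully connected layers; the convolutional layers extract features.
keep_prob = tf.placeholder(tf.float32) # enable dropout during training, disable it at test time
# tf.nn.dropout masks units and automatically scales the kept outputs by 1/keep_prob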
h_fc3_dropout = tf.nn.dropout(h_fc3, keep_prob)
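
The masking and scaling that dropout performs can be illustrated without TensorFlow. This is a rough sketch of the idea (keep each value with probability `keep_prob` and scale the kept values by `1/keep_prob`), not TensorFlow's actual implementation; the function name `dropout` and the toy data are assumptions made for the example.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10000

dropped = dropout(activations, 0.5, rng)
mean_after = sum(dropped) / len(dropped)
print(mean_after)                 # close to 1.0: the expected value is preserved

unchanged = dropout(activations, 1.0, rng)   # keep_prob = 1.0 turns dropout off
```

Because the kept values are scaled up during training, nothing needs to be rescaled at test time; feeding `keep_prob: 1.0` is enough.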

In [41]:
W_fc4 = weight_variable([1024, 10])
b_fc4 = bias_variable([10])

y_conv = tf.matmul(h_fc3_dropout, W_fc4) + b_fc4

In [42]:
learning_rate = 1e-4
training_steps = 1000
batch_size = 64

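# Pass labels and logits by keyword: TensorFlow 1.x rejects positional arguments here
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))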
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv, axis=1), tf.argmax(y_, axis=1)) # boolean vector
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess.run(tf.global_variables_initializer())
for i in range(training_steps):
    batch = mnist.train.next_batch(batch_size)
    if i%100 == 0:
        # Tensor.eval() evaluates the tensor in the default session
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict = {x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.078125
step 100, training accuracy 0.859375
step 200, training accuracy 0.90625
step 300, training accuracy 0.953125
step 400, training accuracy 0.953125
step 500, training accuracy 0.96875
step 600, training accuracy 0.921875
step 700, training accuracy 0.96875
step 800, training accuracy 0.96875
step 900, training accuracy 0.953125
test accuracy 0.967
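
The loss and accuracy used in the training cell can be spelled out on a toy batch. The sketch below is plain Python rather than TensorFlow and is only meant to show what is being averaged: softmax cross-entropy per example, and the fraction of examples whose argmax prediction matches the one-hot label. The helper names and the logits are invented for illustration.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot labels and softmax(logits) for one example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, labels))


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Toy batch: 3 examples, 3 classes (made-up numbers).
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.2, 0.1]]
batch_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
loss = sum(losses) / len(losses)                      # analogue of tf.reduce_mean

correct = [argmax(z) == argmax(y) for z, y in zip(batch_logits, batch_labels)]
accuracy = sum(1.0 for c in correct if c) / len(correct)
print(loss, accuracy)
```

The third example is misclassified, so it contributes the largest loss term and lowers the accuracy of this toy batch.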
