## Recurrent Neural Networks (RNN)
This example comes from Aymeric Damien's [TensorFlow Examples](https://github.com/aymericdamien/TensorFlow-Examples/). The RNN cell it uses is an LSTM. If you are not familiar with LSTMs, see the blog post [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).

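In this example we treat each MNIST instance (a 28×28 image) as a sequence in which each row of the image is one time step. The sequence length (timestep) is therefore 28, and each sequence element (input dimension) is also 28-dimensional.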
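
To make the row-as-time-step idea concrete, here is a minimal pure-Python sketch that splits one flattened 784-pixel image into 28 time steps of 28 pixels each. The function name `image_to_sequence` is hypothetical; the training loop does the same thing for a whole batch with NumPy's `reshape`.

```python
def image_to_sequence(flat_pixels, n_steps=28, n_inputs=28):
    """Split a row-major flattened image into n_steps rows of n_inputs pixels.

    Row t of the image becomes the input vector for time step t.
    """
    if len(flat_pixels) != n_steps * n_inputs:
        raise ValueError("expected %d pixels, got %d"
                         % (n_steps * n_inputs, len(flat_pixels)))
    return [flat_pixels[t * n_inputs:(t + 1) * n_inputs] for t in range(n_steps)]


# A fake "image" whose pixel values are simply their flat indices 0..783.
flat = list(range(28 * 28))
seq = image_to_sequence(flat)
print(len(seq), len(seq[0]))  # 28 28
print(seq[1][:3])             # [28, 29, 30]: time step 1 is the second image row
```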

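Applied to sentences in natural language processing, the sequence length would be the length of the longest sentence (shorter sentences are padded with zeros), and the dimension of each sequence element would be the dimension of the word vectors (word embeddings).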
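
A minimal sketch of that padding step, assuming each word already has an embedding vector; the helper `pad_and_embed` and the toy 2-dimensional embeddings are hypothetical, for illustration only. It also returns the true lengths, which is the kind of information `tf.nn.dynamic_rnn` can take through its `sequence_length` argument so that padded steps are ignored.

```python
def pad_and_embed(sentences, embeddings, dim):
    """Turn tokenized sentences into a (batch, max_len, dim) nested list.

    Shorter sentences are padded with all-zero vectors up to the longest one.
    Returns the padded batch and the original sentence lengths.
    """
    max_len = max(len(s) for s in sentences)
    batch, lengths = [], []
    for sentence in sentences:
        vectors = [list(embeddings[word]) for word in sentence]
        # pad with fresh all-zero vectors up to the longest sentence
        vectors += [[0.0] * dim for _ in range(max_len - len(sentence))]
        batch.append(vectors)
        lengths.append(len(sentence))
    return batch, lengths


# Hypothetical 2-dimensional word embeddings.
embeddings = {"i": [0.1, 0.2], "like": [0.3, 0.4], "rnns": [0.5, 0.6], "hello": [0.7, 0.8]}
batch, lengths = pad_and_embed([["i", "like", "rnns"], ["hello"]], embeddings, dim=2)
print(lengths)   # [3, 1]
print(batch[1])  # [[0.7, 0.8], [0.0, 0.0], [0.0, 0.0]]
```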

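```python
# This notebook uses the TensorFlow 1.x graph API (placeholders, Session);
# tensorflow.examples.tutorials is not available in TensorFlow 2.x.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
```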

```python
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

lr = 0.001
training_iters = 100000   # total number of training samples to feed (not optimizer steps)
batch_size = 128

n_inputs = 28   # MNIST data input (pixels per image row; img shape: 28*28)
n_steps = 28    # time steps (one per image row)
n_hidden_units = 128   # neurons in hidden layer
n_classes = 10      # MNIST classes (0-9 digits)

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_classes])

# Define weights
weights = {
    # (28, 128)
    'in': tf.Variable(tf.random_normal([n_inputs, n_hidden_units])),
    # (128, 10)
    'out': tf.Variable(tf.random_normal([n_hidden_units, n_classes]))
}
biases = {
    # (128, )
    'in': tf.Variable(tf.constant(0.1, shape=[n_hidden_units, ])),
    # (10, )
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes, ]))
}
```

```
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
```
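
For a sense of model size, the number of trainable parameters implied by `weights`, `biases` and the LSTM cell can be worked out by hand. This sketch rests on one assumption: that `BasicLSTMCell` keeps a single kernel of shape `[input_size + num_units, 4 * num_units]` (one block of columns per gate) plus a bias of length `4 * num_units`, which matches my understanding of the TF 1.x implementation. Note that `n_steps` does not enter the count, because the same weights are reused at every time step.

```python
n_inputs, n_hidden_units, n_classes = 28, 128, 10

params = {
    "weights['in']": n_inputs * n_hidden_units,     # 28 x 128
    "biases['in']": n_hidden_units,                 # 128
    # LSTM input to the cell is the 128-d projection concatenated with h (128-d);
    # assumed layout: one kernel for the 4 gates, plus one bias
    "lstm kernel": (n_hidden_units + n_hidden_units) * 4 * n_hidden_units,
    "lstm bias": 4 * n_hidden_units,
    "weights['out']": n_hidden_units * n_classes,   # 128 x 10
    "biases['out']": n_classes,                     # 10
}
total = sum(params.values())
print(total)  # 136586
```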


```python
def RNN(X, weights, biases):

    # hidden layer for input to cell
    ########################################

    # reshape X from (128 batch, 28 steps, 28 inputs)
    # to (128 batch * 28 steps, 28 inputs) so one matmul covers every time step
    X = tf.reshape(X, [-1, n_inputs])

    # into hidden
    # X_in = (128 batch * 28 steps, 128 hidden)
    X_in = tf.matmul(X, weights['in']) + biases['in']
    # X_in ==> (128 batch, 28 steps, 128 hidden)
    X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])

    # cell
    ##########################################

    # basic LSTM Cell.
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)
    # with state_is_tuple=True the LSTM state is a tuple (c_state, h_state)
    # At the first step the initial cell state is usually set to all zeros.
    # Note: this fixes the batch dimension to batch_size, so the graph can only
    # be fed batches of exactly batch_size examples.
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

    # There are two options here:
    # 1: tf.nn.rnn(cell, inputs)  (later available as tf.nn.static_rnn)
    # 2: tf.nn.dynamic_rnn(cell, inputs)
    # With option 1, inputs is a Python list of length n_steps whose elements are
    # tensors of shape [batch_size, input_size]; see also
    # https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/recurrent_network.py
    # Here we use option 2: inputs is a single tensor of shape
    # (batch, steps, inputs) if time_major=False, otherwise (steps, batch, inputs).
    outputs, final_state = tf.nn.dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=False)

    # The hidden states serve as the outputs and are usually passed through a
    # fully connected layer. Here we only need the last one. It is returned
    # directly in final_state, but we can also extract it from outputs ourselves.
    #############################################
    # Option A: use the last hidden state directly (final_state is (c, h))
    # results = tf.matmul(final_state[1], weights['out']) + biases['out']

    # Option B: unstack outputs into a list [(batch, n_hidden_units)] * steps.
    # outputs is (batch, steps, hidden), so swap batch and steps first:
    # tf.unstack splits along axis 0 and we want one tensor per time step.
    # (older TensorFlow releases called this function tf.unpack)
    outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
    # outputs[-1] is the output at the last time step, equal to final_state[1] here
    results = tf.matmul(outputs[-1], weights['out']) + biases['out']

    return results
```
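
The claim that the last output equals the final hidden state can be checked with a small pure-Python LSTM. This is a sketch for a single example rather than a batch, and everything in it is illustrative: the helper names are hypothetical, and the gate ordering `(i, j, f, o)` and the way `forget_bias` is added to the forget gate follow my reading of `BasicLSTMCell`, so treat them as assumptions. The loop over `inputs` is the per-time-step iteration that `dynamic_rnn` performs internally.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(vec, mat):
    """Row vector (length m) times matrix (m x n) -> list of length n."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def lstm_step(x, c, h, kernel, bias, forget_bias=1.0):
    """One LSTM step. kernel is (len(x) + len(h)) x (4 * len(h)); its columns are
    split into gate pre-activations i, j, f, o (assumed ordering)."""
    n = len(h)
    z = [a + b for a, b in zip(matvec(x + h, kernel), bias)]
    i, j, f, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
    # new cell state: keep part of the old state, add part of the candidate
    new_c = [ck * sigmoid(fk + forget_bias) + sigmoid(ik) * math.tanh(jk)
             for ck, ik, jk, fk in zip(c, i, j, f)]
    # new hidden state, which is also this step's output
    new_h = [math.tanh(ck) * sigmoid(ok) for ck, ok in zip(new_c, o)]
    return new_c, new_h


def run_lstm(inputs, kernel, bias, n_hidden):
    """Unroll the cell over a list of time steps, starting from a zero state."""
    c = [0.0] * n_hidden
    h = [0.0] * n_hidden
    outputs = []
    for x in inputs:              # one iteration per time step
        c, h = lstm_step(x, c, h, kernel, bias)
        outputs.append(h)
    return outputs, (c, h)        # analogous to (outputs, final_state)


# Tiny random example: 3 inputs, 2 hidden units, 4 time steps.
random.seed(0)
n_in, n_hidden, n_steps = 3, 2, 4
kernel = [[random.uniform(-0.5, 0.5) for _ in range(4 * n_hidden)]
          for _ in range(n_in + n_hidden)]
bias = [0.0] * (4 * n_hidden)
inputs = [[random.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_steps)]

outputs, (c, h) = run_lstm(inputs, kernel, bias, n_hidden)
print(outputs[-1] == h)  # True: the last output is the final hidden state
```

This is why option A (`final_state[1]`) and option B (`outputs[-1]`) in the `RNN` function give the same result when no `sequence_length` is passed.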

```python
pred = RNN(x, weights, biases)
# labels and logits must be passed by keyword in TensorFlow 1.0 and later
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred))
train_op = tf.train.AdamOptimizer(lr).minimize(cost)

correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
```
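
What `cost` and `accuracy` compute for a batch can be sketched in plain Python: the mean softmax cross-entropy between the logits and one-hot labels, and the fraction of examples whose largest logit sits at the labelled class. The helper names are hypothetical, and ties in this `argmax` resolve to the first index because of how Python's `max` works; this sketch makes no claim about how `tf.argmax` breaks ties.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """-sum_k t_k * log(softmax(logits)_k), computed via a stable log-sum-exp."""
    m = max(logits)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, one_hot))


def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])


def batch_cost_and_accuracy(pred, y):
    cost = sum(softmax_cross_entropy(p, t) for p, t in zip(pred, y)) / len(pred)
    correct = [argmax(p) == argmax(t) for p, t in zip(pred, y)]
    accuracy = sum(correct) / len(correct)
    return cost, accuracy


pred = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # logits for 2 examples, 3 classes
y = [[1, 0, 0], [0, 1, 0]]                   # one-hot labels
cost, accuracy = batch_cost_and_accuracy(pred, y)
print(accuracy)         # 0.5
print(round(cost, 4))   # 0.6691
```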

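```python
# Launch the graph
with tf.Session() as sess:
    # tf.initialize_all_variables was replaced by tf.global_variables_initializer
    # in TF 0.12. Checking for the attribute avoids parsing the version string,
    # which misfires on 1.x releases (their minor version is below 12).
    if hasattr(tf, 'global_variables_initializer'):
        init = tf.global_variables_initializer()
    else:
        init = tf.initialize_all_variables()
    sess.run(init)
    step = 0
    # training_iters counts samples, so this runs ceil(100000 / 128) = 782 steps
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # (128, 784) -> (128, 28 steps, 28 inputs)
        batch_xs = batch_xs.reshape([batch_size, n_steps, n_inputs])
        sess.run([train_op], feed_dict={
            x: batch_xs,
            y: batch_ys,
        })
        if step % 20 == 0:
            # accuracy on the current training batch (not on the test set)
            print(sess.run(accuracy, feed_dict={
                x: batch_xs,
                y: batch_ys,
            }))
        step += 1
```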


```
0.195312
0.585938
0.710938
0.765625
0.859375
0.820312
0.914062
0.953125
0.90625
0.90625
0.875
0.90625
0.9375
0.914062
0.945312
0.976562
0.90625
0.945312
0.921875
0.9375
0.90625
0.953125
0.960938
0.9375
0.976562
0.960938
0.96875
0.96875
0.976562
0.953125
0.9375
0.96875
0.984375
0.929688
0.984375
0.960938
0.921875
0.960938
0.960938
0.976562
```
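
The log above contains exactly 40 values, each the accuracy on the current 128-image training batch at every 20th step. The loop condition can be replayed in plain Python to confirm that count. The epoch estimate at the end assumes the default 55,000-image `mnist.train` split produced by `read_data_sets`; that figure is an assumption, not something printed by the notebook.

```python
training_iters = 100000
batch_size = 128

step = 0
printed_at = []
while step * batch_size < training_iters:   # same condition as the training loop
    if step % 20 == 0:
        printed_at.append(step)              # steps at which accuracy is printed
    step += 1

print(step)             # 782 optimizer steps in total
print(len(printed_at))  # 40 printed accuracy values
# Assuming a 55,000-image training split:
print(round(step * batch_size / 55000, 2))  # 1.82 passes over the training set
```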
