# Recurrent Neural Network Example

Build a recurrent neural network (LSTM) with TensorFlow 2.0.

- Author: Aymeric Damien
- Project: https://github.com/aymericdamien/TensorFlow-Examples/

## RNN Overview

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" alt="nn" style="width: 600px;"/>

References:
- [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jurgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.

## MNIST Dataset Overview

This example uses the MNIST handwritten digits dataset, which contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels). The Keras loader used below returns pixel values from 0 to 255, which the code rescales to the range 0 to 1. Flattened, an image is a 1-D array of 784 features (28*28); for the recurrent model it is kept as a 28x28 array so that it can be read row by row.

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

To classify images using a recurrent neural network, we treat each image row as one timestep of a sequence of pixels. Because an MNIST image is 28*28px, every sample becomes a single sequence of 28 timesteps with 28 input values per timestep.
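
As a minimal sketch of this row-by-row view, the snippet below uses a synthetic array in place of a real digit (the values are placeholders, not MNIST data) and shows the shapes involved. The names `flat`, `image`, and `batch` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Synthetic stand-in for one flattened image: 784 values, not real MNIST data.
flat = np.arange(784, dtype=np.float32)

# Restore the 2-D layout: 28 rows (timesteps) of 28 pixels (inputs per step).
image = flat.reshape(28, 28)

# A recurrent layer would read one row per timestep, top to bottom.
first_step = image[0]    # flat positions 0..27
last_step = image[-1]    # flat positions 756..783

# A batch of samples has shape (batch_size, timesteps, num_input).
batch = np.stack([image, image])
print(image.shape, batch.shape)
```

The last two axes of `batch` are what a Keras recurrent layer interprets as (timesteps, features).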

More info: http://yann.lecun.com/exdb/mnist/

In [1]:
from __future__ import absolute_import, division, print_function

# Import TensorFlow 2.0.
import tensorflow as tf
from tensorflow.keras import Model, layers
import numpy as np

In [2]:
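# MNIST dataset parameters.
num_classes = 10 # total classes (0-9 digits).
num_features = 784 # data features (img shape: 28*28).

# Training parameters.
learning_rate = 0.001
training_steps = 1000
batch_size = 32
display_step = 100

# Network parameters.
# An MNIST image is 28*28px, so each sample is one sequence of 28 timesteps with 28 inputs per step.
num_input = 28 # number of inputs per timestep (pixels in one image row).
timesteps = 28 # number of timesteps (image rows).
num_units = 32 # number of units in the LSTM layer.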

In [3]:
# Prepare MNIST data.
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Convert to float32.
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)
# Reshape both splits to [-1, timesteps, num_input] (28 rows of 28 pixels)
# so the LSTM can read each image as a sequence.
x_train, x_test = x_train.reshape([-1, timesteps, num_input]), x_test.reshape([-1, timesteps, num_input])
# Normalize image values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.

In [4]:
# Use the tf.data API to shuffle and batch the data.
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)
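
To illustrate what a shuffle buffer followed by batching does, here is a pure-Python sketch of the idea. It is not TensorFlow's implementation; `buffered_shuffle` and `batched` are hypothetical helpers written only for this illustration. Because the buffer (5000 in the pipeline above) is smaller than the 60,000-sample training set, the shuffle is local rather than a uniform permutation of the whole dataset.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

def batched(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The final batch may be smaller than batch_size.
        yield batch

shuffled = list(buffered_shuffle(range(10), buffer_size=4))
batches = list(batched(shuffled, batch_size=4))
print(batches)
```

In this sketch an element can move earlier in the stream by at most `buffer_size - 1` positions, which is why a small buffer only mixes nearby samples.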

In [5]:
# Create the LSTM model.
class LSTM(Model):
    # Set layers.
    def __init__(self):
        super(LSTM, self).__init__()
        # RNN (LSTM) hidden layer.
        self.lstm_layer = layers.LSTM(units=num_units)
        self.out = layers.Dense(num_classes)

    # Set forward pass.
    def call(self, x, is_training=False):
        # LSTM layer.
        x = self.lstm_layer(x)
        # Output layer (num_classes).
        x = self.out(x)
        if not is_training:
            # The tf cross-entropy function expects logits without softmax,
            # so softmax is applied only when not training.
            x = tf.nn.softmax(x)
        return x

# Build the LSTM model.
lstm_net = LSTM()
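
As a rough size check, the number of trainable weights in this model can be worked out by hand. The sketch below assumes a standard LSTM cell with four gates, bias terms, and no peephole or projection weights, which is my understanding of the `layers.LSTM` defaults; the helper names are hypothetical.

```python
# Values taken from the network parameters defined earlier in the notebook.
num_input = 28     # inputs per timestep
num_units = 32     # LSTM units
num_classes = 10   # output classes

def lstm_param_count(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units

lstm_params = lstm_param_count(num_input, num_units)
dense_params = dense_param_count(num_units, num_classes)
print(lstm_params, dense_params, lstm_params + dense_params)  # 7808 330 8138
```

Under these assumptions the count does not depend on the number of timesteps, since the same weights are reused at every step.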

In [6]:
# Cross-entropy loss.
# Note that this will apply softmax to the logits.
def cross_entropy_loss(x, y):
    # Convert labels to int64 for the tf cross-entropy function.
    y = tf.cast(y, tf.int64)
    # Apply softmax to the logits and compute cross-entropy.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=x)
    # Average loss across the batch.
    return tf.reduce_mean(loss)

# Accuracy metric.
def accuracy(y_pred, y_true):
    # Predicted class is the index of the highest score in the prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32), axis=-1)

# Adam optimizer.
optimizer = tf.optimizers.Adam(learning_rate)
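
To make the loss and the logits-versus-softmax distinction concrete, here is a NumPy sketch of what a sparse softmax cross-entropy computes for integer labels. It is an illustration of the math, not TensorFlow's implementation, and the function names and sample values are made up for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_softmax_cross_entropy(logits, labels):
    # log-softmax via the log-sum-exp trick, then pick the true-class entry.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

# Illustrative logits for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

probs = softmax(logits)
losses = sparse_softmax_cross_entropy(logits, labels)
print(probs.sum(axis=1), losses, losses.mean())
```

Since softmax preserves the ordering of the scores, taking the argmax of logits or of probabilities gives the same predicted class, which is why the accuracy function works on either output of the model.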

In [7]:
# Optimization process.
def run_optimization(x, y):
    # Wrap computation inside a GradientTape for automatic differentiation.
    with tf.GradientTape() as g:
        # Forward pass.
        pred = lstm_net(x, is_training=True)
        # Compute loss.
        loss = cross_entropy_loss(pred, y)

    # Variables to update, i.e. trainable variables.
    trainable_variables = lstm_net.trainable_variables

    # Compute gradients.
    gradients = g.gradient(loss, trainable_variables)

    # Update weights following the gradients.
    optimizer.apply_gradients(zip(gradients, trainable_variables))
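
For intuition about what the weight update does with each gradient, the sketch below applies one textbook Adam step to a single scalar weight. It assumes the commonly used hyperparameters `beta1=0.9`, `beta2=0.999`, and `eps=1e-7`; TensorFlow's optimizer may differ in small details such as how epsilon enters the formula, so treat this as an approximation rather than the library's code.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to scalar weight w at step t (t starts at 1)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Scale the step by the running gradient magnitude.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Hypothetical starting weight and gradient, chosen only for illustration.
w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=0.5, m=m, v=v, t=1)
print(w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly the learning rate regardless of the gradient's scale.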

In [8]:
# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
    # Run the optimization to update the model weights.
    run_optimization(batch_x, batch_y)

    if step % display_step == 0:
        # Request logits (is_training=True) because the loss applies softmax itself.
        pred = lstm_net(batch_x, is_training=True)
        loss = cross_entropy_loss(pred, batch_y)
        acc = accuracy(pred, batch_y)
        print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))

step: 100, loss: 1.663173, accuracy: 0.531250
step: 200, loss: 1.034144, accuracy: 0.750000
step: 300, loss: 0.775579, accuracy: 0.781250
step: 400, loss: 0.840327, accuracy: 0.781250
step: 500, loss: 0.344379, accuracy: 0.937500
step: 600, loss: 0.884484, accuracy: 0.718750
step: 700, loss: 0.569674, accuracy: 0.875000
step: 800, loss: 0.401931, accuracy: 0.906250
step: 900, loss: 0.530193, accuracy: 0.812500
step: 1000, loss: 0.265871, accuracy: 0.968750
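
The accuracies printed above are measured on a single training batch of 32 samples, so they move in increments of 1/32 (0.03125) and fluctuate from step to step. A steadier estimate would come from scoring held-out data in batches. The sketch below shows one way to do that with NumPy; `batched_accuracy` and `dummy_predict` are hypothetical helpers, and the dummy predictor stands in for a trained model.

```python
import numpy as np

def batched_accuracy(predict_fn, x, y, batch_size=256):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for start in range(0, len(x), batch_size):
        end = start + batch_size
        # predict_fn returns one score vector per sample in the batch.
        scores = np.asarray(predict_fn(x[start:end]))
        correct += int(np.sum(scores.argmax(axis=1) == y[start:end]))
    return correct / len(x)

def dummy_predict(batch):
    # Stand-in model that always predicts class 0.
    scores = np.zeros((len(batch), 10), dtype=np.float32)
    scores[:, 0] = 1.0
    return scores

# Synthetic inputs and labels, not MNIST data.
x_demo = np.zeros((5, 28, 28), dtype=np.float32)
y_demo = np.array([0, 1, 0, 0, 1])
demo_acc = batched_accuracy(dummy_predict, x_demo, y_demo, batch_size=2)
print(demo_acc)  # 0.6
```

With the notebook's objects, a call along the lines of `batched_accuracy(lambda xb: lstm_net(xb).numpy(), x_test, y_test)` could score the test set, assuming `x_test` has the same `[-1, 28, 28]` shape as the training data.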
