# Modeling Sequential Data Using Recurrent Neural Networks

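- Introducing sequential data
- RNNs for modeling sequences
- Long short-term memory (LSTM)
- Truncated backpropagation through time (T-BPTT)
- Implementing a multilayer RNN for sequence modeling in TensorFlow
- Project 1 - Sentiment analysis of the IMDb movie review dataset with an RNN
- Project 2 - Character-level language modeling with an LSTM-based RNN, using text from Shakespeare's *Hamlet*
- Using gradient clipping to avoid exploding gradients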

## Introducing sequential data

### Modeling sequential data - order matters

### Representing sequences

![1](1.png)

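What sets RNNs apart from MLPs and CNNs is that an RNN can remember information about past inputs and take it into account when processing each new element of a sequence.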

### The different categories of sequence modeling

![2](2.png)

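- __Many-to-one__: the input is a sequence, but the output is a fixed-size vector. Example: sentiment analysis, where the input is text and the output is a class label.
- __One-to-many__: the input is in a standard (non-sequential) format, but the output is a sequence. Example: image captioning, where the input is an image and the output is an English phrase.
- __Many-to-many__: both the input and the output are sequences. This category is further divided by whether input and output are synchronized. A synchronized example is video classification, where every frame is labeled; a delayed example is translating one language into another.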

## RNNs for modeling sequences

### Understanding the structure and flow of an RNN

![3](3.png)

![4](4.png)

### Computing activations in an RNN

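- $W_{xh}$: the weight matrix between the input $x^{(t)}$ and the hidden layer $h$
- $W_{hh}$: the weight matrix associated with the recurrent edge
- $W_{hy}$: the weight matrix between the hidden layer and the output layer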

![5](5.png)

The net input of the hidden layer is<br>
$z_h^{(t)} = W_{xh}x^{(t)} + W_{hh}h^{(t-1)} + b_h$

The activation of the hidden layer at time step $t$ is then
\begin{equation}
\boldsymbol{h}^{(t)}=\phi_{h}\left(z_{h}^{(t)}\right)=\phi_{h}\left(\boldsymbol{W}_{x h} \boldsymbol{x}^{(t)}+\boldsymbol{W}_{h h} \boldsymbol{h}^{(t-1)}+\boldsymbol{b}_{h}\right)
\end{equation}

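Equivalently, placing the two weight matrices side by side and stacking $\boldsymbol{x}^{(t)}$ on top of $\boldsymbol{h}^{(t-1)}$:
\begin{equation}
\boldsymbol{h}^{(t)}=\phi_{h}\left(\left[\boldsymbol{W}_{x h}, \boldsymbol{W}_{h h}\right]\left[\begin{array}{c}{\boldsymbol{x}^{(t)}} \\ {\boldsymbol{h}^{(t-1)}}\end{array}\right]+\boldsymbol{b}_{h}\right)
\end{equation}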

Once the hidden units at time step $t$ are computed, the activations of the output units are
\begin{equation}
\boldsymbol{y}^{(t)}=\phi_{y}\left(\boldsymbol{W}_{h y} \boldsymbol{h}^{(t)}+\boldsymbol{b}_{y}\right)
\end{equation}
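As a sanity check of these equations, here is a minimal NumPy sketch of one RNN layer unrolled over a short sequence. It assumes $\phi_h=\tanh$ and a logistic sigmoid for $\phi_y$, with small random weights and toy sizes chosen only for illustration. It also confirms that the separate-matrix form and the concatenated-matrix form of $\boldsymbol{h}^{(t)}$ produce the same hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 3, 4, 2, 5  # toy sizes, chosen arbitrarily

W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


W_concat = np.hstack([W_xh, W_hh])  # [W_xh, W_hh], shape (n_hidden, n_in + n_hidden)
xs = rng.normal(size=(T, n_in))     # input sequence x^(1), ..., x^(T)
h = np.zeros(n_hidden)              # h^(0) = 0
h_cat = np.zeros(n_hidden)
hs, hs_cat, ys = [], [], []
for x in xs:
    # h^(t) = tanh(W_xh x^(t) + W_hh h^(t-1) + b_h)
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    # the same update written with the concatenated weight matrix
    h_cat = np.tanh(W_concat @ np.concatenate([x, h_cat]) + b_h)
    # y^(t) = sigmoid(W_hy h^(t) + b_y)
    y = sigmoid(W_hy @ h + b_y)
    hs.append(h)
    hs_cat.append(h_cat)
    ys.append(y)

hs, hs_cat, ys = np.array(hs), np.array(hs_cat), np.array(ys)
print(hs.shape, ys.shape)       # (5, 4) (5, 2)
print(np.allclose(hs, hs_cat))  # True
```

The same weights $W_{xh}$, $W_{hh}$ and $W_{hy}$ are reused at every time step; only the hidden state carries information forward.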

![6](6.png)

### The challenges of learning long-range interactions

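Training an RNN with backpropagation through time runs into the so-called vanishing or exploding gradient problem. The gradient of the loss at time step $t$ with respect to an earlier hidden state $\boldsymbol{h}^{(k)}$ contains the factor $\frac{\partial \boldsymbol{h}^{(t)}}{\partial \boldsymbol{h}^{(k)}}=\prod_{i=k+1}^{t} \frac{\partial \boldsymbol{h}^{(i)}}{\partial \boldsymbol{h}^{(i-1)}}$, a product of $t-k$ terms. In the simplified case of a single scalar recurrent weight $w_{hh}$, this amounts to multiplying by $w_{hh}$ $t-k$ times, so for large $t-k$ the gradient vanishes when $|w_{hh}|<1$ and explodes when $|w_{hh}|>1$.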

![7](7.png)
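A toy scalar illustration of this effect, under a simplifying assumption: a linear recurrence $h^{(t)} = w_{hh} h^{(t-1)}$ with no activation function, so every time step contributes exactly one factor $w_{hh}$ to the gradient. `gradient_factor` is a hypothetical helper written only for this sketch.

```python
def gradient_factor(w_hh, n_steps):
    """|dh^(t) / dh^(t - n_steps)| for the scalar linear recurrence h^(t) = w_hh * h^(t-1)."""
    factor = 1.0
    for _ in range(n_steps):
        factor *= w_hh  # chain rule: one factor of w_hh per time step
    return abs(factor)


for w in (0.5, 1.0, 1.5):
    # shrinks for |w_hh| < 1, stays constant for |w_hh| = 1, grows for |w_hh| > 1
    print(w, [gradient_factor(w, n) for n in (1, 10, 50)])
```

A real RNN also multiplies in the derivative of $\phi_h$ and uses weight matrices instead of a scalar, but the repeated multiplication across time steps is what makes the gradient decay or grow geometrically.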

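Two common solutions are:
- Truncated backpropagation through time (TBPTT), which limits the number of time steps the gradient is propagated back through
- Long short-term memory (LSTM) cells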

### LSTM cells

![8](8.png)

In the figure, $\odot$ denotes the element-wise product (multiplication) and $\oplus$ the element-wise sum (addition).

- The forget gate ($f_t$) lets the memory cell reset its state, so that the cell state does not grow indefinitely:
$$
f_t=\sigma\left(W_{xf} x^{(t)}+W_{hf} h^{(t-1)}+b_{f}\right)
$$
- The input gate ($i_t$) and the input node ($g_t$) are responsible for updating the cell state:
$$
i_{t}=\sigma\left(W_{x i} x^{(t)}+W_{h i} h^{(t-1)}+b_{i}\right)
$$
$$
g_{t}=\tanh \left(W_{x g} x^{(t)}+W_{h g} h^{(t-1)}+b_{g}\right)
$$
$$
C^{(t)}=\left(C^{(t-1)} \odot f_{t}\right) \oplus\left(i_{t} \odot g_{t}\right)
$$
- The output gate ($o_t$) decides how the values of the hidden units are updated:
$$
o_{t}=\sigma\left(W_{x o} x^{(t)}+W_{h o} h^{(t-1)}+b_{o}\right)
$$
$$
h^{(t)}=o_t \odot \tanh\left(C^{(t)}\right)
$$
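The gate equations can be traced in code with the minimal NumPy sketch below of a single LSTM time step. The function `lstm_step`, the dictionary layout of the weights, and the toy sizes are assumptions made for illustration; this is not TensorFlow's `BasicLSTMCell` implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    W maps a gate name to (W_x, W_h); b maps a gate name to its bias vector.
    Gates: 'f' (forget), 'i' (input), 'g' (input node), 'o' (output).
    """
    def net(gate):
        W_x, W_h = W[gate]
        return W_x @ x + W_h @ h_prev + b[gate]

    f = sigmoid(net('f'))
    i = sigmoid(net('i'))
    g = np.tanh(net('g'))
    o = sigmoid(net('o'))
    c = c_prev * f + i * g  # C^(t) = (C^(t-1) ⊙ f_t) ⊕ (i_t ⊙ g_t)
    h = o * np.tanh(c)      # h^(t) = o_t ⊙ tanh(C^(t))
    return h, c


rng = np.random.default_rng(1)
n_in, n_hidden = 3, 4  # toy sizes, chosen arbitrarily
W = {k: (rng.normal(scale=0.1, size=(n_hidden, n_in)),
         rng.normal(scale=0.1, size=(n_hidden, n_hidden))) for k in 'figo'}
b = {k: np.zeros(n_hidden) for k in 'figo'}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(6, n_in)):  # a sequence of 6 input vectors
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because $o_t \in (0, 1)$ and $\tanh$ lies in $(-1, 1)$, every component of $h^{(t)}$ stays strictly between -1 and 1, while the cell state $C^{(t)}$ itself is not squashed.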

## Implementing a multilayer RNN for sequence modeling in TensorFlow

We apply it to two common tasks:
- Sentiment analysis
- Language modeling

## Project 1 - Sentiment analysis of the IMDb movie review dataset with an RNN

### Preparing the data

```python
import pyprind
import pandas as pd
from string import punctuation
import re
import numpy as np

df = pd.read_csv('movie_data.csv', encoding='utf-8')
```

```python
df.head()
```

| | review | sentiment |
|---|---|---|
| 0 | In 1974, the teenager Martha Moxley (Maggie Gr... | 1 |
| 1 | OK... so... I really like Kris Kristofferson a... | 0 |
| 2 | \*\*\*SPOILER\*\*\* Do not read this, if you think a... | 0 |
| 3 | hi for all the people who have seen this wonde... | 1 |
| 4 | I recently bought the DVD, forgetting just how... | 0 |


```python
# Preprocessing the data:
# separate punctuation from words and
# count each word's occurrences
from collections import Counter

counts = Counter()
pbar = pyprind.ProgBar(len(df['review']), title='Counting words occurences')

for i, review in enumerate(df['review']):
    # surround each punctuation character with spaces so it becomes its own token
    text = ''.join([c if c not in punctuation else ' ' +
                    c + ' ' for c in review]).lower()
    df.loc[i, 'review'] = text
    pbar.update()
    counts.update(text.split())

# Create a mapping:
# map each unique word to an integer, most frequent word first;
# ids start at 1, so 0 stays free for padding
word_counts = sorted(counts, key=counts.get, reverse=True)
print(word_counts[:5])
word_to_int = {word: ii for ii, word in enumerate(word_counts, 1)}

mapped_reviews = []
pbar = pyprind.ProgBar(len(df['review']), title='Map reviews to int')

for review in df['review']:
    mapped_reviews.append([word_to_int[word] for word in review.split()])
    pbar.update()
```

```
Counting words occurences
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:02:23
Map reviews to int
['the', '.', ',', 'and', 'a']
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02
```
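To make the preprocessing concrete, here is the same tokenize, count, and map pipeline run on two made-up reviews (the texts are invented for illustration and are not from the IMDb data). It also shows why the ids start at 1: the value 0 is never assigned to a word, so it can later serve as the padding value.

```python
from collections import Counter
from string import punctuation

# two invented toy reviews standing in for df['review']
reviews = ['A great movie!', 'Not a great movie, not a good one.']

counts = Counter()
cleaned = []
for review in reviews:
    # surround each punctuation character with spaces, then lowercase
    text = ''.join(c if c not in punctuation else ' ' + c + ' '
                   for c in review).lower()
    cleaned.append(text)
    counts.update(text.split())

# most frequent token first; ties keep first-seen order (sorted is stable)
word_counts = sorted(counts, key=counts.get, reverse=True)
word_to_int = {word: ii for ii, word in enumerate(word_counts, 1)}
mapped = [[word_to_int[w] for w in text.split()] for text in cleaned]

print(word_counts)  # ['a', 'great', 'movie', 'not', '!', ',', 'good', 'one', '.']
print(mapped)       # [[1, 2, 3, 5], [4, 1, 2, 3, 6, 4, 1, 7, 8, 9]]
```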


> To generate input data that fits the RNN architecture, we need to make sure that all sequences have the same length.

![9](9.png)

```python
# Persist mapped_reviews to disk so the slow preprocessing need not be rerun
import json

# the with block closes the file automatically
with open('mapped_reviews.txt', 'w') as f:
    json.dump(mapped_reviews, f)
```

```python
import json

# Reload the persisted mapped_reviews
with open('mapped_reviews.txt', 'r') as f:
    mapped_reviews = json.load(f)
```

```python
# sequence_length is a hyperparameter that can be tuned

# Define same-length sequences:
# if a sequence is shorter than 200: left-pad it with zeros
# if a sequence is longer than 200: keep its last 200 elements
sequence_length = 200
sequences = np.zeros((len(mapped_reviews), sequence_length), dtype=int)

for i, row in enumerate(mapped_reviews):
    review_arr = np.array(row)
    # for rows longer than sequence_length, -len(row) clamps to the start of the row
    sequences[i, -len(row):] = review_arr[-sequence_length:]
```
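The padding and truncation rule can be checked on toy data. `pad_or_truncate` is a hypothetical helper that mirrors the loop above. It slices with `-len(tail)` rather than `-len(row)`, a small deviation that also keeps an empty review from raising a shape-mismatch error (with `-len(row)` equal to 0, the slice would cover the whole row).

```python
import numpy as np


def pad_or_truncate(rows, seq_len):
    """Hypothetical helper: left-pad short rows with 0, keep the last seq_len ids of long rows."""
    out = np.zeros((len(rows), seq_len), dtype=int)
    for i, row in enumerate(rows):
        if len(row) == 0:
            continue  # nothing to copy; the row stays all zeros
        tail = np.array(row)[-seq_len:]  # at most seq_len trailing ids
        out[i, -len(tail):] = tail       # right-align them, zeros on the left
    return out


print(pad_or_truncate([[7, 8], [1, 2, 3, 4, 5, 6]], seq_len=4))
# [[0 0 7 8]
#  [3 4 5 6]]
```

Keeping the *last* elements and padding on the *left* means that the final time steps, which the model reads its prediction from, always contain real words.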

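```python
# The first 25,000 reviews are used for training, the remaining ones for testing.
# Note: .loc slicing is label-inclusive, so df.loc[:25000] would return 25,001 rows;
# positional slicing keeps the labels aligned with the sequences.
y = df['sentiment'].values
X_train = sequences[:25000, :]
y_train = y[:25000]
X_test = sequences[25000:, :]
y_test = y[25000:]
```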

```python
np.random.seed(123)


# Generate mini-batches; samples that do not fill a complete
# batch at the end of the data are dropped
def create_batch_generator(x, y=None, batch_size=64):
    n_batches = len(x)//batch_size
    x = x[:n_batches*batch_size]
    if y is not None:
        y = y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        if y is not None:
            yield x[ii:ii+batch_size], y[ii:ii+batch_size]
        else:
            yield x[ii:ii+batch_size]
```

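### Embedding

Advantages of an embedding over one-hot encoding:
- It reduces the dimensionality of the feature space, which lessens the effect of the curse of dimensionality.
- The feature extraction performed by the embedding layer is learnable: the embedding vectors are trained together with the rest of the network.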

![10](10.png)

Creating an embedding layer takes two steps:
- Create a tensor of size $[n\_words \times embedding\_size]$ and initialize it with random floats drawn uniformly between -1 and 1:
```python
embedding = tf.Variable(tf.random_uniform(shape=(n_words, embedding_size),
                                         minval=-1, maxval=1))
```
- Use the `tf.nn.embedding_lookup` function to look up the row of the embedding matrix associated with each element of `tf_x`:
```python
embed_x = tf.nn.embedding_lookup(embedding, tf_x)
```
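With a single embedding matrix, `tf.nn.embedding_lookup` gathers rows by index. The NumPy sketch below (toy sizes and ids, assumed for illustration) shows that this row indexing gives the same result as multiplying one-hot vectors by the embedding matrix. This is why an embedding can be seen as a compact replacement for one-hot encoding followed by a bias-free linear layer.

```python
import numpy as np

rng = np.random.default_rng(123)
n_words, embedding_size = 10, 4  # toy sizes, chosen arbitrarily
embedding = rng.uniform(-1, 1, size=(n_words, embedding_size))

tf_x = np.array([[0, 0, 3, 7],   # a batch of 2 left-padded sequences of word ids
                 [2, 5, 5, 9]])

# embedding lookup as plain row indexing: each id selects one row
embed_x = embedding[tf_x]        # shape (2, 4, embedding_size)

# the same result via one-hot vectors times the embedding matrix
one_hot = np.eye(n_words)[tf_x]  # shape (2, 4, n_words)
print(embed_x.shape, np.allclose(one_hot @ embedding, embed_x))  # (2, 4, 4) True
```

Row indexing never materializes the `n_words`-wide one-hot vectors, which matters when the vocabulary has tens of thousands of entries.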

### Building an RNN model

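We implement the model as a `SentimentRNN` class with four parts:
- a constructor that stores the hyperparameters and builds the computation graph
- a `build` method that defines the graph
- a `train` method
- a `predict` method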

```python
import os

import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x API


class SentimentRNN(object):
    def __init__(self, n_words, seq_len=200,
                 lstm_size=256, num_layers=1, batch_size=64,
                 learning_rate=0.0001, embed_size=200):
        self.n_words = n_words
        self.seq_len = seq_len
        self.lstm_size = lstm_size  # number of hidden units
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.embed_size = embed_size

        self.g = tf.Graph()
        with self.g.as_default():
            tf.set_random_seed(123)
            self.build()
            self.saver = tf.train.Saver()
            self.init_op = tf.global_variables_initializer()

    def build(self):
        # Define the placeholders
        tf_x = tf.placeholder(tf.int32,
                              shape=(self.batch_size, self.seq_len),
                              name='tf_x')
        tf_y = tf.placeholder(tf.float32,
                              shape=(self.batch_size),
                              name='tf_y')
        tf_keepprob = tf.placeholder(tf.float32,
                                     name='tf_keepprob')
        # Create the embedding layer
        embedding = tf.Variable(
            tf.random_uniform(
                (self.n_words, self.embed_size),
                minval=-1, maxval=1),
            name='embedding')
        embed_x = tf.nn.embedding_lookup(
            embedding, tf_x,
            name='embedded_x')

        # Define LSTM cells (with dropout) and stack them together
        cells = tf.contrib.rnn.MultiRNNCell(
            [tf.contrib.rnn.DropoutWrapper(
                tf.contrib.rnn.BasicLSTMCell(self.lstm_size),
                output_keep_prob=tf_keepprob)
             for i in range(self.num_layers)])

        # Define the initial state:
        self.initial_state = cells.zero_state(
            self.batch_size, tf.float32)
        print('  << initial state >> ', self.initial_state)

        lstm_outputs, self.final_state = tf.nn.dynamic_rnn(
            cells, embed_x,
            initial_state=self.initial_state)
        # Note: lstm_outputs shape:
        ##  [batch_size, max_time, cells.output_size]
        print('\n  << lstm_output   >> ', lstm_outputs)
        print('\n  << final state   >> ', self.final_state)

        # Apply a fully connected layer on top of the
        # RNN output at the last time step (many-to-one):
        logits = tf.layers.dense(
            inputs=lstm_outputs[:, -1],
            units=1, activation=None,
            name='logits')

        logits = tf.squeeze(logits, name='logits_squeezed')
        print('\n  << logits        >> ', logits)

        y_proba = tf.nn.sigmoid(logits, name='probabilities')
        predictions = {
            'probabilities': y_proba,
            'labels': tf.cast(tf.round(y_proba), tf.int32,
                              name='labels')
        }
        print('\n  << predictions   >> ', predictions)

        # Define the cost function
        cost = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                labels=tf_y, logits=logits),
            name='cost')

        # Define the optimizer
        optimizer = tf.train.AdamOptimizer(self.learning_rate)
        train_op = optimizer.minimize(cost, name='train_op')

    def train(self, X_train, y_train, num_epochs):
        # Make sure the checkpoint directory exists before saving
        os.makedirs('model', exist_ok=True)
        with tf.Session(graph=self.g) as sess:
            sess.run(self.init_op)
            iteration = 1
            for epoch in range(num_epochs):
                state = sess.run(self.initial_state)

                for batch_x, batch_y in create_batch_generator(
                        X_train, y_train, self.batch_size):
                    # The final state of one batch is fed back
                    # as the initial state of the next batch
                    feed = {'tf_x:0': batch_x,
                            'tf_y:0': batch_y,
                            'tf_keepprob:0': 0.5,
                            self.initial_state: state}
                    loss, _, state = sess.run(
                        ['cost:0', 'train_op',
                         self.final_state],
                        feed_dict=feed)

                    if iteration % 20 == 0:
                        print("Epoch: %d/%d Iteration: %d "
                              "| Train loss: %.5f" % (
                                  epoch + 1, num_epochs,
                                  iteration, loss))

                    iteration += 1
                # Save a checkpoint every 10 epochs
                if (epoch+1) % 10 == 0:
                    self.saver.save(sess,
                                    "model/sentiment-%d.ckpt" % epoch)

    def predict(self, X_data, return_proba=False):
        preds = []
        with tf.Session(graph=self.g) as sess:
            # Restore the most recent checkpoint
            self.saver.restore(
                sess, tf.train.latest_checkpoint('model/'))
            test_state = sess.run(self.initial_state)
            for ii, batch_x in enumerate(
                create_batch_generator(
                    X_data, None, batch_size=self.batch_size), 1):
                # No dropout at prediction time
                feed = {'tf_x:0': batch_x,
                        'tf_keepprob:0': 1.0,
                        self.initial_state: test_state}
                if return_proba:
                    pred, test_state = sess.run(
                        ['probabilities:0', self.final_state],
                        feed_dict=feed)
                else:
                    pred, test_state = sess.run(
                        ['labels:0', self.final_state],
                        feed_dict=feed)

                preds.append(pred)

        return np.concatenate(preds)
```
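The cost in `build` is computed from the raw logits rather than from `y_proba`. According to the TensorFlow documentation, `tf.nn.sigmoid_cross_entropy_with_logits` evaluates the logistic loss in the numerically stable form $\max(z, 0) - z\,y + \log(1 + e^{-|z|})$. The minimal NumPy sketch below (the helpers `bce_naive` and `bce_stable` are hypothetical names for this illustration) checks that this form matches the textbook binary cross-entropy and stays finite for large logits.

```python
import numpy as np


def bce_naive(z, y):
    # textbook binary cross-entropy computed from probabilities
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def bce_stable(z, y):
    # stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))


z = np.array([-3.0, -0.5, 0.0, 2.0, 4.0])  # example logits
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])    # example labels
print(np.allclose(bce_naive(z, y), bce_stable(z, y)))  # True
print(bce_stable(np.array([100.0]), np.array([0.0])))  # [100.]
```

For a logit of 100 the sigmoid rounds to exactly 1.0 in floating point, so the naive form would evaluate $\log(0)$; the stable form avoids that.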

### Instantiating the SentimentRNN class

```python
# Word ids run from 1 to the largest id; index 0 is the padding value
n_words = max(list(word_to_int.values())) + 1
rnn = SentimentRNN(n_words=n_words, seq_len=sequence_length, embed_size=256, lstm_size=128,
                   num_layers=1, batch_size=100, learning_rate=0.001)
```

```
  << initial state >>  (LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/DropoutWrapperZeroState/BasicLSTMCellZeroState/zeros:0' shape=(100, 128) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/DropoutWrapperZeroState/BasicLSTMCellZeroState/zeros_1:0' shape=(100, 128) dtype=float32>),)

  << lstm_output   >>  Tensor("rnn/transpose_1:0", shape=(100, 200, 128), dtype=float32)

  << final state   >>  (LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_3:0' shape=(100, 128) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_4:0' shape=(100, 128) dtype=float32>),)

  << logits        >>  Tensor("logits_squeezed:0", shape=(100,), dtype=float32)

  << predictions   >>  {'probabilities': <tf.Tensor 'probabilities:0' shape=(100,) dtype=float32>, 'labels': <tf.Tensor 'labels:0' shape=(100,) dtype=int32>}
```


### Training and optimizing the sentiment analysis RNN model

```python
rnn.train(X_train, y_train, num_epochs=40)
```

```
Epoch: 1/40 Iteration: 20 | Train loss: 0.68144
Epoch: 1/40 Iteration: 40 | Train loss: 0.62962
Epoch: 1/40 Iteration: 60 | Train loss: 0.64887
Epoch: 1/40 Iteration: 80 | Train loss: 0.62561
Epoch: 1/40 Iteration: 100 | Train loss: 0.57090
Epoch: 1/40 Iteration: 120 | Train loss: 0.53748
Epoch: 1/40 Iteration: 140 | Train loss: 0.47973
Epoch: 1/40 Iteration: 160 | Train loss: 0.43372
Epoch: 1/40 Iteration: 180 | Train loss: 0.47417
Epoch: 1/40 Iteration: 200 | Train loss: 0.61672
Epoch: 1/40 Iteration: 220 | Train loss: 0.34326
Epoch: 1/40 Iteration: 240 | Train loss: 0.45520
Epoch: 2/40 Iteration: 260 | Train loss: 0.41717
Epoch: 2/40 Iteration: 280 | Train loss: 0.37721
Epoch: 2/40 Iteration: 300 | Train loss: 0.32072
Epoch: 2/40 Iteration: 320 | Train loss: 0.42077
Epoch: 2/40 Iteration: 340 | Train loss: 0.32731
Epoch: 2/40 Iteration: 360 | Train loss: 0.23789
Epoch: 2/40 Iteration: 380 | Train loss: 0.36921
Epoch: 2/40 Iteration: 400 | Train loss: 0.35867
Epoch: 2/40 Iteration: 4

Epoch: 14/40 Iteration: 3300 | Train loss: 0.00145
Epoch: 14/40 Iteration: 3320 | Train loss: 0.00062
Epoch: 14/40 Iteration: 3340 | Train loss: 0.00173
Epoch: 14/40 Iteration: 3360 | Train loss: 0.00309
Epoch: 14/40 Iteration: 3380 | Train loss: 0.03511
Epoch: 14/40 Iteration: 3400 | Train loss: 0.00164
Epoch: 14/40 Iteration: 3420 | Train loss: 0.00097
Epoch: 14/40 Iteration: 3440 | Train loss: 0.00041
Epoch: 14/40 Iteration: 3460 | Train loss: 0.00139
Epoch: 14/40 Iteration: 3480 | Train loss: 0.00182
Epoch: 14/40 Iteration: 3500 | Train loss: 0.00250
Epoch: 15/40 Iteration: 3520 | Train loss: 0.00712
Epoch: 15/40 Iteration: 3540 | Train loss: 0.00053
Epoch: 15/40 Iteration: 3560 | Train loss: 0.00096
Epoch: 15/40 Iteration: 3580 | Train loss: 0.01974
Epoch: 15/40 Iteration: 3600 | Train loss: 0.00230
Epoch: 15/40 Iteration: 3620 | Train loss: 0.01154
Epoch: 15/40 Iteration: 3640 | Train loss: 0.00033
Epoch: 15/40 Iteration: 3660 | Train loss: 0.02580
Epoch: 15/40 Iteration: 3680 | 

Epoch: 27/40 Iteration: 6520 | Train loss: 0.00016
Epoch: 27/40 Iteration: 6540 | Train loss: 0.00002
Epoch: 27/40 Iteration: 6560 | Train loss: 0.00008
Epoch: 27/40 Iteration: 6580 | Train loss: 0.00021
Epoch: 27/40 Iteration: 6600 | Train loss: 0.00007
Epoch: 27/40 Iteration: 6620 | Train loss: 0.00012
Epoch: 27/40 Iteration: 6640 | Train loss: 0.00005
Epoch: 27/40 Iteration: 6660 | Train loss: 0.00037
Epoch: 27/40 Iteration: 6680 | Train loss: 0.00004
Epoch: 27/40 Iteration: 6700 | Train loss: 0.00004
Epoch: 27/40 Iteration: 6720 | Train loss: 0.00006
Epoch: 27/40 Iteration: 6740 | Train loss: 0.00010
Epoch: 28/40 Iteration: 6760 | Train loss: 0.00005
Epoch: 28/40 Iteration: 6780 | Train loss: 0.00006
Epoch: 28/40 Iteration: 6800 | Train loss: 0.00010
Epoch: 28/40 Iteration: 6820 | Train loss: 0.00002
Epoch: 28/40 Iteration: 6840 | Train loss: 0.00018
Epoch: 28/40 Iteration: 6860 | Train loss: 0.00005
Epoch: 28/40 Iteration: 6880 | Train loss: 0.00063
Epoch: 28/40 Iteration: 6900 | 

Epoch: 39/40 Iteration: 9740 | Train loss: 0.00189
Epoch: 40/40 Iteration: 9760 | Train loss: 0.00707
Epoch: 40/40 Iteration: 9780 | Train loss: 0.06127
Epoch: 40/40 Iteration: 9800 | Train loss: 0.00416
Epoch: 40/40 Iteration: 9820 | Train loss: 0.00063
Epoch: 40/40 Iteration: 9840 | Train loss: 0.00288
Epoch: 40/40 Iteration: 9860 | Train loss: 0.00260
Epoch: 40/40 Iteration: 9880 | Train loss: 0.08514
Epoch: 40/40 Iteration: 9900 | Train loss: 0.00508
Epoch: 40/40 Iteration: 9920 | Train loss: 0.00743
Epoch: 40/40 Iteration: 9940 | Train loss: 0.00075
Epoch: 40/40 Iteration: 9960 | Train loss: 0.00038
Epoch: 40/40 Iteration: 9980 | Train loss: 0.00307
Epoch: 40/40 Iteration: 10000 | Train loss: 0.00121
```


```python
preds = rnn.predict(X_test)
# predict() drops a trailing incomplete batch (if any),
# so compare only the labels that have a prediction
y_true = y_test[:len(preds)]
print('Test Acc.: {:.3f}'.format(np.sum(preds == y_true)/len(y_true)))
```

```
INFO:tensorflow:Restoring parameters from model/sentiment-39.ckpt
Test Acc.: 0.857
```


```python
proba = rnn.predict(X_test, return_proba=True)
print(proba)
```

```
INFO:tensorflow:Restoring parameters from model/sentiment-39.ckpt
[1.7583370e-06 9.9999797e-01 3.8330173e-01 ... 3.4272671e-06 3.3229589e-04
 9.9958938e-01]
```
