# Sentiment Analysis with an RNN

In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. An RNN is usually more accurate here than a feedforward network because it can use information about the *sequence* of words. Here we'll use a dataset of movie reviews, accompanied by labels.

The architecture for this network is shown below. It uses LSTM cells and feeds them word embeddings as input.

<img src="assets/network_diagram.png" width=400px>

Here, we'll pass in words to an embedding layer. We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before from the word2vec lesson. You can actually train up an embedding with word2vec and use it here. But it's good enough to just have an embedding layer and let the network learn the embedding table on its own. In other words, each review is first turned into word embeddings, which are then fed to the LSTM cells for the recurrent computation.

From the embedding layer, the new representations will be passed to LSTM cells. These will add recurrent connections to the network so we can include information about the sequence of words in the data. Finally, the LSTM cells will go to a sigmoid output layer. We're using the sigmoid because we're trying to predict whether this text has positive or negative sentiment, and a probability between 0 and 1 is all that binary choice needs. The output layer will then be just a single unit with a sigmoid activation function.

We don't care about the sigmoid outputs except for the very last one; we can ignore the rest. The RNN is unrolled over many time steps, and we calculate the cost from the output of the last step and the training label.

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
with open(r'F:\uav+ai\神经网络test\第三课循环神经网络\6情感预测\sentiment-rnn\reviews.txt', 'r') as f:
    reviews1 = f.read()
with open(r'F:\uav+ai\神经网络test\第三课循环神经网络\6情感预测\sentiment-rnn\labels.txt', 'r') as f:
    labels = f.read()  # note the difference between read(), readline(), and readlines()

In [3]:
len(reviews1)  # total number of characters; the reviews are stored one per line, separated by '\n'

33678267

In [4]:
reviews1[:4000]

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   \nstory of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terrific example of absurd comedy . a formal orchestra audience is tu

## Data preprocessing: remove punctuation and split into words

The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.

You can see an example of the reviews data above. We'll want to get rid of those periods. Also, you might notice that the reviews are delimited with newlines `\n`. To deal with those, I'm going to split the text into reviews using `\n` as the delimiter. Then I can combine all the reviews back together into one big string.

First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.

In [9]:
from string import punctuation
punctuation  # the string of all ASCII punctuation characters

In [10]:
print('\n' in punctuation)  # the newline character is not part of the punctuation set
print(',' in punctuation)  # the comma is

False
True


In [11]:
from string import punctuation
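text = ''.join([c for c in reviews1 if c not in punctuation])  # remove punctuation
reviews = text.split('\n')  # split on newlines: one string per review
all_text = ' '.join(reviews)  # join all reviews into one long string
words = all_text.split()  # split on whitespace into individual words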
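
The character-by-character filter above can be slow on a 33-million-character string. As a minimal sketch of the same cleaning step, assuming the same "strip punctuation, split on newlines, split on whitespace" behaviour is wanted, `str.translate` does the removal in one pass. The function name `clean_and_split` and the sample text are made up for illustration.

```python
from string import punctuation

def clean_and_split(raw_text):
    """Strip punctuation, then return per-review strings and a flat word list."""
    # A translation table with a deletion set removes every punctuation character.
    table = str.maketrans('', '', punctuation)
    text = raw_text.translate(table)
    reviews = text.split('\n')           # one string per review
    words = ' '.join(reviews).split()    # flat list of all words
    return reviews, words

sample = "a great film . loved it !\nboring , slow ."
sample_reviews, sample_words = clean_and_split(sample)
print(sample_reviews)
print(sample_words)
```

The result has the same structure as `reviews` and `words` in the cell above, so the rest of the notebook would not need to change.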

In [12]:
len(reviews)  # 25001 entries in total

25001

In [13]:
reviews[:2]

['bromwell high is a cartoon comedy  it ran at the same time as some other programs about school life  such as  teachers   my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers   the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students  when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled          at           high  a classic line inspector i  m here to sack one of your teachers  student welcome to bromwell high  i expect that many adults of my age think that bromwell high is far fetched  what a pity that it isn  t   ',
 'story of a man who has unnatural feelings for a pig  starts out with a opening scene that is a terrific example of absurd comedy  a formal orchestra audience is turned into an insane  viol

In [14]:
all_text[:4000]

'bromwell high is a cartoon comedy  it ran at the same time as some other programs about school life  such as  teachers   my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers   the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students  when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled          at           high  a classic line inspector i  m here to sack one of your teachers  student welcome to bromwell high  i expect that many adults of my age think that bromwell high is far fetched  what a pity that it isn  t    story of a man who has unnatural feelings for a pig  starts out with a opening scene that is a terrific example of absurd comedy  a formal orchestra audience is turned into an insane  violent m

In [15]:
words[:10]

['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', 'it', 'ran', 'at', 'the']

### Encoding the words as integers (one list per review)

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

> **Exercise:** Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
> Also, convert the reviews to integers and store the reviews in a new list called `reviews_ints`. 

In [16]:
from collections import Counter
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)  # vocabulary sorted by word frequency, most frequent first
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}  # word -> integer index, starting at 1
reviews_ints = []
for each in reviews:
    reviews_ints.append([vocab_to_int[word] for word in each.split()])

In [17]:
len(vocab)

74072

In [18]:
(vocab_to_int['bromwell'],vocab_to_int['high'],vocab_to_int['is'])

(21641, 308, 6)
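
A quick way to sanity-check an encoding like this is to build the inverse mapping and decode a few integers back into words. The sketch below does that on a toy corpus; `build_vocab`, `toy_words`, and `toy_int_to_vocab` are illustrative names, not part of the notebook.

```python
from collections import Counter

def build_vocab(words):
    """Map each word to an integer, most frequent first, starting at 1 (0 is kept for padding)."""
    counts = Counter(words)
    vocab = sorted(counts, key=counts.get, reverse=True)
    return {word: idx for idx, word in enumerate(vocab, 1)}

toy_words = "the film was good the cast was good the end".split()
toy_vocab_to_int = build_vocab(toy_words)
# Inverse mapping: integer -> word
toy_int_to_vocab = {idx: word for word, idx in toy_vocab_to_int.items()}

encoded = [toy_vocab_to_int[w] for w in "the film was good".split()]
decoded = [toy_int_to_vocab[i] for i in encoded]
print(encoded, decoded)
```

If decoding a review does not reproduce its words, the dictionary and the encoded reviews are out of sync.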

In [19]:
reviews_ints[:2]  # every review encoded as a list of integers (a list of lists)

[[21641,
  308,
  6,
  3,
  1050,
  207,
  8,
  2138,
  32,
  1,
  171,
  57,
  15,
  49,
  81,
  5828,
  44,
  382,
  110,
  140,
  15,
  5236,
  60,
  154,
  9,
  1,
  4989,
  5902,
  475,
  71,
  5,
  260,
  12,
  21641,
  308,
  13,
  1980,
  6,
  74,
  2406,
  5,
  614,
  73,
  6,
  5236,
  1,
  24827,
  5,
  1983,
  10337,
  1,
  5827,
  1502,
  36,
  51,
  66,
  204,
  145,
  67,
  1203,
  5236,
  20600,
  1,
  39834,
  4,
  1,
  221,
  883,
  31,
  2990,
  71,
  4,
  1,
  5847,
  10,
  687,
  2,
  67,
  1502,
  54,
  10,
  216,
  1,
  384,
  9,
  62,
  3,
  1407,
  3693,
  783,
  5,
  3500,
  180,
  1,
  382,
  10,
  1212,
  13629,
  32,
  308,
  3,
  349,
  341,
  2913,
  10,
  143,
  127,
  5,
  7761,
  30,
  4,
  129,
  5236,
  1407,
  2331,
  5,
  21641,
  308,
  10,
  528,
  12,
  109,
  1452,
  4,
  60,
  543,
  102,
  12,
  21641,
  308,
  6,
  227,
  4159,
  48,
  3,
  2215,
  12,
  8,
  215,
  23],
 [63,
  4,
  3,
  125,
  36,
  47,
  7566,
  1396,
  16,
  3,
  4191,
 

In [20]:
np.shape(reviews_ints)  # 25001 lists, one per review; their lengths differ, so no second dimension is reported

(25001,)

### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.

> **Exercise:** Convert labels from `positive` and `negative` to 1 and 0, respectively.

In [21]:
labels[:100]  # labels are separated by '\n'

'positive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nn'

In [22]:
labels = labels.split('\n')
# Convert labels to 1s and 0s for 'positive' and 'negative'
labels = np.array([1 if each == 'positive' else 0 for each in labels])
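
Note that the expression above maps anything other than `'positive'` to 0, including an empty string left over from a trailing newline. A slightly stricter sketch, assuming blank lines should be skipped and unknown labels should fail loudly, could look like this (`encode_labels` is a hypothetical helper):

```python
def encode_labels(raw_labels):
    """Turn newline-separated 'positive'/'negative' text into a list of 1s and 0s."""
    encoded = []
    for line in raw_labels.split('\n'):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. the one produced by a trailing newline
        if line not in ('positive', 'negative'):
            raise ValueError('unexpected label: {!r}'.format(line))
        encoded.append(1 if line == 'positive' else 0)
    return encoded

print(encode_labels('positive\nnegative\npositive\n'))
```

If blank labels are skipped like this, the matching empty review has to be dropped as well so that reviews and labels stay aligned.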

In [23]:
print(len(labels))
print(labels[:100])

25001
[1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0]


If you built `labels` correctly, you should see the next output.

In [24]:
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])  # number of reviews for each review length
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 1
Maximum review length: 2514


In [156]:
review_lens

Counter({0: 1,
         10: 1,
         11: 1,
         12: 2,
         13: 1,
         14: 1,
         15: 2,
         16: 2,
         17: 2,
         19: 2,
         20: 3,
         22: 4,
         24: 3,
         25: 2,
         26: 2,
         27: 8,
         28: 2,
         29: 6,
         30: 9,
         31: 5,
         32: 6,
         33: 4,
         34: 17,
         35: 10,
         36: 13,
         37: 19,
         38: 11,
         39: 20,
         40: 23,
         41: 20,
         42: 25,
         43: 32,
         44: 45,
         45: 36,
         46: 35,
         47: 41,
         48: 39,
         49: 37,
         50: 51,
         51: 43,
         52: 48,
         53: 47,
         54: 45,
         55: 44,
         56: 56,
         57: 47,
         58: 41,
         59: 55,
         60: 50,
         61: 47,
         62: 42,
         63: 56,
         64: 42,
         65: 39,
         66: 36,
         67: 42,
         68: 54,
         69: 45,
         70: 47,
         71: 43,
   

Okay, a couple of issues here. We seem to have one review with zero length. And the maximum review length is way too many steps for our RNN. Let's truncate to 200 steps. For reviews shorter than 200 words, we'll left-pad with 0s. For reviews longer than 200 words, we keep the first 200 words and discard the rest.

> **Exercise:** First, remove the review with zero length from the `reviews_ints` list.

In [197]:
# Filter out that review with 0 length
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]  # indices of the reviews with non-zero length
len(non_zero_idx)

25000
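
The cell above only collects the indices of the non-empty reviews; it does not yet apply them, which is why later cells still report 25001 rows. A minimal sketch of applying the filter to both the reviews and their labels, shown on toy data with a hypothetical helper `drop_empty_reviews`:

```python
def drop_empty_reviews(reviews_ints, labels):
    """Return copies of both sequences with zero-length reviews (and their labels) removed."""
    keep = [i for i, review in enumerate(reviews_ints) if len(review) != 0]
    return [reviews_ints[i] for i in keep], [labels[i] for i in keep]

toy_reviews = [[5, 2, 9], [], [7]]
toy_labels = [1, 0, 0]
kept_reviews, kept_labels = drop_empty_reviews(toy_reviews, toy_labels)
print(kept_reviews, kept_labels)
```

Filtering both sequences with the same index list keeps every review paired with its own label.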

> **Exercise:** Now, create an array `features` that contains the data we'll pass to the network. The data should come from `reviews_ints`, since we want to feed integers to the network. Each row should be 200 elements long. For reviews shorter than 200 words, left pad with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. For reviews longer than 200, use only the first 200 words as the feature vector.

This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.



In [198]:
for i, x in enumerate(reviews_ints[:3]):
    print(x)
    print(i)

[21641, 308, 6, 3, 1050, 207, 8, 2138, 32, 1, 171, 57, 15, 49, 81, 5828, 44, 382, 110, 140, 15, 5236, 60, 154, 9, 1, 4989, 5902, 475, 71, 5, 260, 12, 21641, 308, 13, 1980, 6, 74, 2406, 5, 614, 73, 6, 5236, 1, 24827, 5, 1983, 10337, 1, 5827, 1502, 36, 51, 66, 204, 145, 67, 1203, 5236, 20600, 1, 39834, 4, 1, 221, 883, 31, 2990, 71, 4, 1, 5847, 10, 687, 2, 67, 1502, 54, 10, 216, 1, 384, 9, 62, 3, 1407, 3693, 783, 5, 3500, 180, 1, 382, 10, 1212, 13629, 32, 308, 3, 349, 341, 2913, 10, 143, 127, 5, 7761, 30, 4, 129, 5236, 1407, 2331, 5, 21641, 308, 10, 528, 12, 109, 1452, 4, 60, 543, 102, 12, 21641, 308, 6, 227, 4159, 48, 3, 2215, 12, 8, 215, 23]
0
[63, 4, 3, 125, 36, 47, 7566, 1396, 16, 3, 4191, 505, 45, 17, 3, 622, 134, 12, 6, 3, 1280, 457, 4, 1721, 207, 3, 10759, 7399, 300, 6, 667, 83, 35, 2120, 1087, 2992, 34, 1, 898, 66524, 4, 8, 13, 5117, 464, 8, 2669, 1721, 1, 221, 57, 17, 58, 794, 1300, 833, 228, 8, 43, 98, 123, 1470, 59, 147, 38, 1, 963, 142, 29, 667, 123, 1, 13657, 410, 61, 95, 178

In [233]:
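seq_len = 600  # this run uses 600 steps rather than the 200 suggested in the exercise
features = np.zeros((len(reviews_ints), seq_len), dtype=int)  # shape (25001, seq_len), all zeros to start
for i, row in enumerate(reviews_ints):
    if len(row) != 0:
        # Right-align the review; rows longer than seq_len keep only their first seq_len integers
        features[i, -len(row):] = np.array(row)[:seq_len]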
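
The same padding logic can be wrapped in a small function and checked on the example from the exercise. This is only a sketch; `pad_features` is an illustrative name and the sequence length of 5 is chosen to keep the output readable.

```python
import numpy as np

def pad_features(reviews_ints, seq_len):
    """Left-pad each review with zeros, or keep only its first seq_len integers."""
    features = np.zeros((len(reviews_ints), seq_len), dtype=int)
    for i, row in enumerate(reviews_ints):
        if len(row) == 0:
            continue  # an empty review stays all zeros; note that -0: would select the whole row
        kept = row[:seq_len]                 # truncate long reviews
        features[i, -len(kept):] = kept      # right-align, zeros on the left
    return features

toy = pad_features([[117, 18, 128], [1, 2, 3, 4, 5, 6, 7]], seq_len=5)
print(toy)
```

Truncating before computing the slice makes the two sides of the assignment obviously the same length, for short and long reviews alike.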


If you built `features` correctly, it should look like the cell output below.

In [234]:
np.shape(features) 

(25001, 600)

In [235]:
features[:10,:100]

array([[    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     

## Training, Validation, Test



With our data in nice shape, we'll split it into training, validation, and test sets.

> **Exercise:** Create the training, validation, and test sets here. You'll need to create sets for the features and the labels, `train_x` and `train_y` for example. Define a split fraction, `split_frac` as the fraction of data to keep in the training set. Usually this is set to 0.8 or 0.9. The rest of the data will be split in half to create the validation and testing data.

Since this is review data, we can simply split it sequentially, in the order it is stored.

In [236]:
split_frac = 0.8  # fraction of the data used for training
split_num=int(np.shape(features)[0]*split_frac)
train_x, val_x = features[:split_num],features[split_num:]
train_y, val_y = labels[:split_num],labels[split_num:]

num2=int(np.shape(val_x)[0]*0.5)
val_x, test_x = val_x[:num2],val_x[num2:]
val_y, test_y = val_y[:num2],val_y[num2:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(20000, 600) 
Validation set: 	(2500, 600) 
Test set: 		(2501, 600)


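With train, validation, and test fractions of 0.8, 0.1, 0.1, 25000 reviews, and `seq_len = 200`, the final shapes should look like the following. (The run above used `seq_len = 600` and kept the zero-length review, hence its 600 columns and 2501 test rows.)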
```
                    Feature Shapes:
Train set: 		 (20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		  (2500, 200)
```
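
The split described above can be written as one reusable function. The sketch below follows the same sequential scheme on toy arrays; `split_data` and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def split_data(features, labels, split_frac=0.8):
    """Split sequentially into train / validation / test; the remainder is halved."""
    split_idx = int(len(features) * split_frac)
    train_x, rest_x = features[:split_idx], features[split_idx:]
    train_y, rest_y = labels[:split_idx], labels[split_idx:]
    half = len(rest_x) // 2
    return ((train_x, train_y),
            (rest_x[:half], rest_y[:half]),
            (rest_x[half:], rest_y[half:]))

toy_x = np.arange(40).reshape(20, 2)   # 20 toy "reviews" with 2 features each
toy_y = np.arange(20) % 2              # alternating 0/1 labels
(tr_x, tr_y), (va_x, va_y), (te_x, te_y) = split_data(toy_x, toy_y)
print(tr_x.shape, va_x.shape, te_x.shape)
```

When the remainder has an odd number of rows, the extra row ends up in the test set, which matches the 2500/2501 split printed by the notebook.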

## Build the graph 创建网络

Here, we'll build the graph. First up, defining the hyperparameters.

* `lstm_size`: Number of units in the hidden layers in the LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
* `lstm_layers`: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
* `batch_size`: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
* `learning_rate`: Learning rate

In [253]:
lstm_size = 256
lstm_layers = 1
batch_size = 600
learning_rate = 0.0001

For the network itself, we'll be passing in our 200-element-long review vectors. Each batch will be `batch_size` vectors. We'll also be using dropout on the LSTM layer to reduce overfitting, so we'll make a placeholder for the keep probability.

> **Exercise:** Create the `inputs_`, `labels_`, and drop out `keep_prob` placeholders using `tf.placeholder`. `labels_` needs to be two-dimensional to work with some functions later.  Since `keep_prob` is a scalar (a 0-dimensional tensor), you shouldn't provide a size to `tf.placeholder`.

In [254]:
n_words = len(vocab_to_int)
print(n_words)
# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
    labels_ = tf.placeholder(tf.int32, [None, None], name='labels')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

74072


### Embedding
Now we'll add an embedding layer. We need to do this because there are about 74,000 words in our vocabulary, so one-hot encoding them is massively inefficient; an embedding layer reduces the input dimensionality. You should remember dealing with this problem from the word2vec lesson. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But it's fine to just make a new layer and let the network learn the weights.

> **Exercise:** Create the embedding lookup matrix as a `tf.Variable`. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup). This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 300 units and the input has shape [batch_size, seq_len], the function will return a tensor with size [batch_size, seq_len, 300].



In [255]:
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 500   

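with graph.as_default():
    # Embedding matrix (the layer's weights), initialized uniformly in [-1, 1): shape (n_words, embed_size) = (74072, 500).
    # Note: word indices run from 1 to 74072 (0 is padding), so a table with n_words + 1 rows would cover every index.
    embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1))
    # Look up each word's vector: output shape (batch_size, seq_len, embed_size)
    embed = tf.nn.embedding_lookup(embedding, inputs_)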

In [256]:
np.shape(embedding),np.shape(embed)

(TensorShape([Dimension(74072), Dimension(500)]),
 TensorShape([Dimension(None), Dimension(None), Dimension(500)]))

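The input to the embedding layer has shape batch_size $\times$ seq_len, one integer per word. With one-hot encoding, each review would instead be seq_len $\times$ 74072, so the whole batch would be batch_size $\times$ seq_len $\times$ 74072. Looking each word up in the 74072 $\times$ embed_size table gives an output of shape batch_size $\times$ seq_len $\times$ embed_size.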
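
To make the equivalence between a lookup and a one-hot product concrete, here is a small NumPy sketch (not TensorFlow). The sizes are toy values chosen for readability, not the notebook's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, embed_size = 10, 4                       # toy sizes, not the notebook's
embedding = rng.uniform(-1, 1, (n_words, embed_size))

inputs = np.array([[0, 3, 7],                     # batch of 2 reviews, 3 time steps each
                   [2, 2, 9]])

# Lookup: plain integer indexing into the table.
embed = embedding[inputs]                         # shape (2, 3, 4)

# Same result via explicit one-hot vectors and a matrix product.
one_hot = np.eye(n_words)[inputs]                 # shape (2, 3, 10)
embed_via_one_hot = one_hot @ embedding           # shape (2, 3, 4)

print(embed.shape, np.allclose(embed, embed_via_one_hot))
```

The lookup never materializes the one-hot tensor, which is what makes it practical for a vocabulary of tens of thousands of words.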

### LSTM cell

<img src="assets/network_diagram.png" width=400px>

Next, we'll create our LSTM cells to use in the recurrent network ([TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn)). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.

To create a basic LSTM cell for the graph, you'll want to use `tf.contrib.rnn.BasicLSTMCell`. Looking at the function documentation:
```
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=<function tanh at 0x109f1ef28>)
```
you can see it takes a parameter called `num_units`, the number of units in the cell, called `lstm_size` in this code. In other words, `num_units` is just `lstm_size`: the number of hidden units, which is also the width of each of the four gate layers. So then, you can write something like

```
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

to create an LSTM cell with `num_units` units. Next, you can add dropout to the cell with `tf.contrib.rnn.DropoutWrapper`. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a convenient way to regularize your network with almost no effort. So you'd do something like

```
drop = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
```

Most of the time, your network will have better performance with more layers. That's sort of the magic of deep learning: adding more layers allows the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with `tf.contrib.rnn.MultiRNNCell`:

```
cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
```

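Here, `[drop] * lstm_layers` creates a list of cells (`drop`) that is `lstm_layers` long. The `MultiRNNCell` wrapper builds this into multiple layers of RNN cells, one for each cell in the list. Note that this list repeats the *same* cell object, so the layers are not independent; depending on the TensorFlow 1.x version this either raises an error or makes every layer share one set of weights. The solution cell below therefore builds a separate cell per layer.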

So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout: each LSTM layer carries its own dropout wrapper, and the layers are stacked one on top of another. But it all works the same from an architectural viewpoint, just with a more complicated graph in the cell.

> **Exercise:** Below, use `tf.contrib.rnn.BasicLSTMCell` to create an LSTM cell. Then, add drop out to it with `tf.contrib.rnn.DropoutWrapper`. Finally, create multiple LSTM layers with `tf.contrib.rnn.MultiRNNCell`.

Here is [a tutorial on building RNNs](https://www.tensorflow.org/tutorials/recurrent) that will help you out.


In [257]:
with graph.as_default():
    def build_cell(lstm_size, keep_prob):
        # A basic LSTM cell with lstm_size hidden units
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

        # Add dropout to the cell's outputs
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop

    # Stack up multiple LSTM layers, for deep learning (a fresh cell per layer)
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(lstm_layers)])

    # Getting an initial state of all zeros, one state row per review in the batch
    initial_state = cell.zero_state(batch_size, tf.float32)


### RNN forward pass

<img src="assets/network_diagram.png" width=400px>

Now we need to actually run the data through the RNN nodes. You can use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) to do this. You'd pass in the RNN cell you created (our multiple layered LSTM `cell` for instance), and the inputs to the network.

```
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
```

Above I created an initial state, `initial_state`, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. `tf.nn.dynamic_rnn` takes care of most of the work for us: it performs the forward pass through time, turning `initial_state` into `final_state`. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the `final_state` of the hidden layer.

> **Exercise:** Use `tf.nn.dynamic_rnn` to add the forward pass through the RNN. Remember that we're actually passing in vectors from the embedding layer, `embed`.



In [259]:
with graph.as_default():
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed,
                                             initial_state=initial_state)  # inputs: the cell defined above and the embedded word vectors
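
To show what "unrolling" means, here is a NumPy sketch of a forward pass through time. It deliberately uses a plain tanh RNN rather than an LSTM to stay short, and `simple_rnn` and its weight names are illustrative, not TensorFlow API.

```python
import numpy as np

def simple_rnn(inputs, w_x, w_h, initial_state):
    """Unroll a tanh RNN over time. inputs: (batch, time_steps, input_size)."""
    state = initial_state
    outputs = []
    for t in range(inputs.shape[1]):               # one step per word position
        state = np.tanh(inputs[:, t] @ w_x + state @ w_h)
        outputs.append(state)
    # outputs: (batch, time_steps, units); final state: (batch, units)
    return np.stack(outputs, axis=1), state

rng = np.random.default_rng(0)
batch, time_steps, input_size, units = 2, 4, 3, 5  # toy sizes
x = rng.normal(size=(batch, time_steps, input_size))
outputs, final_state = simple_rnn(x,
                                  rng.normal(size=(input_size, units)),
                                  rng.normal(size=(units, units)),
                                  np.zeros((batch, units)))
print(outputs.shape, final_state.shape)
```

For this simple cell the last output and the final state are the same array of values. An LSTM's final state additionally carries the cell state, so there the two differ.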

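The tensor shapes through the network, using the example sizes batch_size = 500, time_steps = 200, embed_size = 300, and lstm_size = 256 (this run actually uses 600, 600, 500, and 256):

1. The input sequences have shape batch_size $\times$ time_steps, i.e. 500 $\times$ 200, where each review is 200 word indices.
2. Conceptually, the embedding step first expands this to one-hot vectors of shape batch_size $\times$ time_steps $\times$ vocab_size, i.e. 500 $\times$ 200 $\times$ 74072; the lookup table does this implicitly.
3. The lookup then produces embeddings of shape batch_size $\times$ time_steps $\times$ embed_size, i.e. 500 $\times$ 200 $\times$ 300.
4. `dynamic_rnn` receives that 500 $\times$ 200 $\times$ 300 tensor. At each step the LSTM maps the 300 inputs plus the 256 previous hidden values to 256 hidden units through four gate layers, so it has (300 + 256) $\times$ 256 $\times$ 4 weights plus 256 $\times$ 4 biases. Its `outputs` have shape batch_size $\times$ time_steps $\times$ lstm_size, i.e. 500 $\times$ 200 $\times$ 256.
5. Taking the output of the last time step gives batch_size $\times$ lstm_size, i.e. 500 $\times$ 256.
6. The fully connected layer outputs batch_size $\times$ 1, i.e. 500 $\times$ 1.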
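
The weight count for an LSTM layer is easier to see from a single time step written out by hand. The following NumPy sketch implements a standard LSTM step; the gate ordering inside the kernel and all names here are choices made for this sketch, and the sizes are toy values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, kernel, bias, forget_bias=1.0):
    """One LSTM time step. x: (batch, input_size); h, c: (batch, num_units)."""
    # The input and the previous hidden state are concatenated before the four gate layers.
    gates = np.concatenate([x, h], axis=1) @ kernel + bias
    i, j, f, o = np.split(gates, 4, axis=1)        # input gate, candidate, forget gate, output gate
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, new_c

batch, input_size, num_units = 2, 3, 5             # toy sizes
rng = np.random.default_rng(0)
kernel = rng.normal(size=(input_size + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)

h = c = np.zeros((batch, num_units))
h, c = lstm_step(rng.normal(size=(batch, input_size)), h, c, kernel, bias)
print(h.shape, kernel.size + bias.size)
```

With an input size of 300 and 256 units, the same formula gives (300 + 256) × 4 × 256 + 4 × 256 = 570,368 parameters for one layer.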

In [260]:
print(np.shape(embed))  # (batch_size, time_steps, embed_size)
print(np.shape(outputs))  # (batch_size, time_steps, lstm_size)

(?, ?, 500)
(600, ?, 256)


### Output

We only care about the final output; we'll be using that as our sentiment prediction. So we need to grab the last output with `outputs[:, -1]`, then calculate the cost from that and `labels_`. That is, we use the output produced after the last word of each review has been fed in. A fully connected layer then maps the batch_size $\times$ lstm_size tensor to batch_size $\times$ 1.

In [261]:
outputs[:, -1]  # the output at the last time step

<tf.Tensor 'strided_slice:0' shape=(600, 256) dtype=float32>

In [262]:
with graph.as_default():
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)  # fully connected layer with one sigmoid unit
    cost = tf.losses.mean_squared_error(labels_, predictions)
    
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
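
The cost above is the mean squared error between the sigmoid output and the 0/1 label. The NumPy sketch below computes that cost on made-up activations and, for comparison only, the binary cross-entropy, which is a common alternative for this kind of output but is not what this notebook uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(labels, predictions):
    return np.mean((labels - predictions) ** 2)

def binary_cross_entropy(labels, predictions, eps=1e-7):
    p = np.clip(predictions, eps, 1 - eps)         # avoid log(0)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

logits = np.array([[2.0], [-1.0], [0.0]])          # hypothetical last-step activations, shape (3, 1)
labels = np.array([[1.0], [0.0], [1.0]])
predictions = sigmoid(logits)

print(mse(labels, predictions), binary_cross_entropy(labels, predictions))
print(mse(labels, np.full_like(labels, 0.5)))      # a constant 0.5 guess gives 0.25
```

A network that always outputs 0.5 has an MSE of 0.25 on 0/1 labels, which is roughly where the training loss starts in the log further down.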

### Validation accuracy
Here we can add a few nodes to calculate the accuracy, which we'll use in the validation pass.

In [263]:
with graph.as_default():
    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)  # round to 0/1, then cast to match the labels' dtype
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
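
The same accuracy calculation in NumPy shows why the labels need a second dimension. The values below are made up for illustration.

```python
import numpy as np

predictions = np.array([[0.9], [0.2], [0.6], [0.4]])   # sigmoid outputs, shape (4, 1)
labels = np.array([1, 0, 0, 0])                          # shape (4,)

# Correct: give the labels the same (4, 1) shape as the predictions.
correct = np.round(predictions).astype(int) == labels[:, None]
accuracy = correct.astype(float).mean()

# Pitfall: comparing (4, 1) with (4,) broadcasts to a (4, 4) grid of comparisons.
broadcast = np.round(predictions).astype(int) == labels

print(accuracy, correct.shape, broadcast.shape)
```

This is the reason the feed dictionaries pass `y[:, None]` rather than `y`: the labels then line up one-to-one with the `(batch_size, 1)` predictions.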

### Batching with a generator

This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the `x` and `y` arrays and returns slices out of those arrays with size `[batch_size]`.

In [264]:
def get_batches(x, y, batch_size=100):
    
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]
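
A short usage sketch on toy lists makes the "full batches only" behaviour visible. The function is repeated so the snippet runs on its own; the last loop simply applies the same arithmetic to the split sizes printed earlier, with this run's `batch_size` of 600.

```python
def get_batches(x, y, batch_size=100):
    """Yield full batches only; leftover examples at the end are dropped."""
    n_batches = len(x) // batch_size
    x, y = x[:n_batches * batch_size], y[:n_batches * batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size]

xs = list(range(10))
ys = [v % 2 for v in xs]
batches = list(get_batches(xs, ys, batch_size=3))
print(len(batches), batches[-1])

# Examples dropped per pass for the split sizes printed earlier, with batch_size = 600
for name, n in [('train', 20000), ('validation', 2500), ('test', 2501)]:
    print(name, n % 600)
```

With 10 items and a batch size of 3, the tenth item is never yielded. The same thing happens to the tail of each split, so the reported validation and test accuracies cover slightly fewer examples than the splits contain.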

In [265]:
batch=get_batches(val_x,val_y,100)

In [266]:
x,y=next(batch)
print(x)
print(np.shape(x))
print(np.shape(y))

[[    0     0     0 ...  1514    45     4]
 [    0     0     0 ...    49   887  4326]
 [    0     0     0 ... 11391     7     7]
 ...
 [    0     0     0 ...  1559  2336   144]
 [    0     0     0 ...     3   383    18]
 [    0     0     0 ...   368    21  2493]]
(100, 600)
(100,)


## Training

Below is the typical training code. If you want to do this yourself, feel free to delete all this code and implement it yourself. Before you run this, make sure the `checkpoints` directory exists.
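
One way to make sure the directory exists before training is to create it from Python. This is a small sketch; `checkpoint_dir` is just a local variable holding the same relative path the training cell saves into.

```python
import os

checkpoint_dir = 'checkpoints'   # same relative directory the training cell saves into
os.makedirs(checkpoint_dir, exist_ok=True)   # no error if it already exists
print(os.path.isdir(checkpoint_dir))
```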

In [None]:
epochs = 20

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    state = sess.run(initial_state)
    for e in range(epochs):
        #state = sess.run(initial_state)
        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,  # shape (batch_size, seq_len)
                    labels_: y[:, None],  # shape (batch_size, 1)
                    keep_prob: 0.5,
                    initial_state: state}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            
            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))

            if iteration%25==0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)  # accuracy of this batch; the mean over batches is reported below
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1
    saver.save(sess, "checkpoints/sentiment.ckpt")

Epoch: 0/20 Iteration: 5 Train loss: 0.253
Epoch: 0/20 Iteration: 10 Train loss: 0.249
Epoch: 0/20 Iteration: 15 Train loss: 0.256
Epoch: 0/20 Iteration: 20 Train loss: 0.249
Epoch: 0/20 Iteration: 25 Train loss: 0.247
Val acc: 0.577
Epoch: 0/20 Iteration: 30 Train loss: 0.242
Epoch: 1/20 Iteration: 35 Train loss: 0.236
Epoch: 1/20 Iteration: 40 Train loss: 0.239
Epoch: 1/20 Iteration: 45 Train loss: 0.236
Epoch: 1/20 Iteration: 50 Train loss: 0.233
Val acc: 0.609
Epoch: 1/20 Iteration: 55 Train loss: 0.234
Epoch: 1/20 Iteration: 60 Train loss: 0.230
Epoch: 1/20 Iteration: 65 Train loss: 0.223
Epoch: 2/20 Iteration: 70 Train loss: 0.217
Epoch: 2/20 Iteration: 75 Train loss: 0.209
Val acc: 0.663
Epoch: 2/20 Iteration: 80 Train loss: 0.212
Epoch: 2/20 Iteration: 85 Train loss: 0.182


## Testing

In [270]:
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        feed = {inputs_: x,
                labels_: y[:, None],
                keep_prob: 1,
                initial_state: test_state}
        batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        test_acc.append(batch_acc)
    print("Test accuracy: {:.3f}".format(np.mean(test_acc)))

INFO:tensorflow:Restoring parameters from checkpoints\sentiment.ckpt
Test accuracy: 0.832
