# Sentiment Analysis with an RNN

In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. An RNN is a better fit here than a feedforward network because it can use information about the *sequence* of words. Here we'll use a dataset of movie reviews, accompanied by labels.

The architecture for this network is shown below.

<img src="../img/network_diagram.png" width=400px>

Here, we'll pass in words to an embedding layer. We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before from the word2vec lesson. You can actually train an embedding with word2vec and use it here. But it's good enough to just have an embedding layer and let the network learn the embedding table on its own.

From the embedding layer, the new representations will be passed to LSTM cells. These will add recurrent connections to the network so we can include information about the sequence of words in the data. Finally, the output of the LSTM cells will go to a sigmoid output layer. We're using a sigmoid because we're trying to predict whether this text has positive or negative sentiment, so the output layer will just be a single unit with a sigmoid activation function.

We only care about the sigmoid output from the very last step; we can ignore the rest. We'll calculate the cost from the output of the last step and the training label.

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
with open('../data/reviews.txt', 'r') as f:
    reviews = f.read()
with open('../data/labels.txt', 'r') as f:
    labels = f.read()

In [3]:
reviews[:2000]

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   \nstory of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terrific example of absurd comedy . a formal orchestra audience is tu

## Data preprocessing

The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.

You can see an example of the reviews data above. We'll want to get rid of those periods. Also, you might notice that the reviews are delimited with newlines `\n`. To deal with those, I'm going to split the text into individual reviews using `\n` as the delimiter. Then I can combine all the reviews back together into one big string.

First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.

In [4]:
from string import punctuation

# Remove all punctuation (string)
all_text = ''.join([c for c in reviews if c not in punctuation])

# Split text where there is '\n' (list)
reviews = all_text.split('\n')

# Create one single string with the items of the list, separating them
# with a blank space (string)
all_text = ' '.join(reviews)

# Split text where there is blank space (list)
words = all_text.split()
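
To see what each step does, here is a minimal sketch that runs the same pipeline on a made-up two-review string. The `sample` text and the `sample_*` names are hypothetical stand-ins, and the trailing newline is an assumption about how the real file ends.

```python
from string import punctuation

# Hypothetical stand-in for the contents of reviews.txt: one review per line,
# ending with a trailing newline (an assumption about the real file)
sample = "great movie . loved it !\nnot good , not bad .\n"

# Drop every punctuation character; newlines are not punctuation, so they survive
no_punct = ''.join(c for c in sample if c not in punctuation)

# One list entry per review; the trailing newline yields a final empty string
sample_reviews = no_punct.split('\n')

# Rejoin and split on whitespace to get a flat list of words
sample_words = ' '.join(sample_reviews).split()

print(sample_reviews)
print(sample_words)
```

Note the empty string at the end of `sample_reviews`: a trailing newline in the input produces a zero-length review.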

In [5]:
all_text[:2000]

'bromwell high is a cartoon comedy  it ran at the same time as some other programs about school life  such as  teachers   my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers   the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students  when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled          at           high  a classic line inspector i  m here to sack one of your teachers  student welcome to bromwell high  i expect that many adults of my age think that bromwell high is far fetched  what a pity that it isn  t    story of a man who has unnatural feelings for a pig  starts out with a opening scene that is a terrific example of absurd comedy  a formal orchestra audience is turned into an insane  violent m

In [6]:
words[:100]

['bromwell',
 'high',
 'is',
 'a',
 'cartoon',
 'comedy',
 'it',
 'ran',
 'at',
 'the',
 'same',
 'time',
 'as',
 'some',
 'other',
 'programs',
 'about',
 'school',
 'life',
 'such',
 'as',
 'teachers',
 'my',
 'years',
 'in',
 'the',
 'teaching',
 'profession',
 'lead',
 'me',
 'to',
 'believe',
 'that',
 'bromwell',
 'high',
 's',
 'satire',
 'is',
 'much',
 'closer',
 'to',
 'reality',
 'than',
 'is',
 'teachers',
 'the',
 'scramble',
 'to',
 'survive',
 'financially',
 'the',
 'insightful',
 'students',
 'who',
 'can',
 'see',
 'right',
 'through',
 'their',
 'pathetic',
 'teachers',
 'pomp',
 'the',
 'pettiness',
 'of',
 'the',
 'whole',
 'situation',
 'all',
 'remind',
 'me',
 'of',
 'the',
 'schools',
 'i',
 'knew',
 'and',
 'their',
 'students',
 'when',
 'i',
 'saw',
 'the',
 'episode',
 'in',
 'which',
 'a',
 'student',
 'repeatedly',
 'tried',
 'to',
 'burn',
 'down',
 'the',
 'school',
 'i',
 'immediately',
 'recalled',
 'at',
 'high']

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

> **Exercise:** Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
> Also, convert the reviews to integers and store the reviews in a new list called `reviews_ints`. 

In [7]:
# Create dictionary that maps vocab words to integers
from collections import Counter
words_count = Counter(words)
vocab = sorted(words_count, key=str.lower)
vocab_to_int = {word: i for i, word in enumerate(vocab, 1)}

# Convert the reviews to integers, same shape as reviews list, but with 
# integers
reviews_ints = []
for review in reviews:
    reviews_ints.append([vocab_to_int[w] for w in review.split()])
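
As a quick sanity check, the same recipe can be tried on a toy corpus. This is only a sketch: the `toy_*` names are illustrative, and the inverse dictionary `toy_int_to_vocab` is an extra that the notebook doesn't need but that can help when debugging encoded reviews.

```python
from collections import Counter

# Toy corpus standing in for `reviews`; all names here are illustrative only
toy_reviews = ["best movie ever", "worst movie ever made"]
toy_words = ' '.join(toy_reviews).split()

# Same recipe as above: alphabetical vocabulary, ids starting at 1
# so that 0 stays free for padding
toy_vocab = sorted(Counter(toy_words), key=str.lower)
toy_vocab_to_int = {word: i for i, word in enumerate(toy_vocab, 1)}

# Inverse mapping, for turning encoded reviews back into words
toy_int_to_vocab = {i: word for word, i in toy_vocab_to_int.items()}

toy_ints = [[toy_vocab_to_int[w] for w in r.split()] for r in toy_reviews]
decoded = [' '.join(toy_int_to_vocab[i] for i in row) for row in toy_ints]

print(toy_vocab_to_int)
print(toy_ints)
```

Decoding `toy_ints` with the inverse mapping gives back the original reviews, which is an easy way to confirm the encoding is consistent.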

### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.

> **Exercise:** Convert labels from `positive` and `negative` to 1 and 0, respectively.

In [8]:
print(labels)

positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
n

In [9]:
# Convert labels to 1s and 0s for 'positive' and 'negative'
labels = labels.split('\n')
labels = np.array([0 if label == 'negative' else 1 for label in labels])
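
One detail worth knowing about this conversion: if the labels text ends with a newline, `split('\n')` produces a final empty string, and the `else` branch turns it into a 1. The sketch below shows this on a made-up string (`toy_labels_text` is hypothetical, and the trailing newline is an assumption about `labels.txt`).

```python
# Hypothetical labels text with a trailing newline
toy_labels_text = "positive\nnegative\npositive\n"

toy_labels = toy_labels_text.split('\n')

# The final element is an empty string, which the `else` branch maps to 1
encoded = [0 if label == 'negative' else 1 for label in toy_labels]

print(toy_labels)
print(encoded)
```

If that happens with the real data, the extra label sits at the same index as the empty review, so filtering reviews and labels by the same indices removes both together.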

In [10]:
print(labels)

[1 0 1 ... 1 0 1]


If you built `labels` correctly, you should see the next output.

In [11]:
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 1
Maximum review length: 2514


Okay, a couple of issues here. We seem to have one review with zero length. And, the maximum review length is way too many steps for our RNN. Let's truncate to 200 steps. For reviews shorter than 200, we'll pad with 0s. For reviews longer than 200, we can truncate them to the first 200 words.

> **Exercise:** First, remove the review with zero length from the `reviews_ints` list.

In [12]:
# Filter out that review with 0 length
aux = [i for i, review in enumerate(reviews_ints) if len(review) != 0]
reviews_ints = [reviews_ints[i] for i in aux]
labels = np.array([labels[i] for i in aux])

> **Exercise:** Now, create an array `features` that contains the data we'll pass to the network. The data should come from `reviews_ints`, since we want to feed integers to the network. Each row should be 200 elements long. For reviews shorter than 200 words, left pad with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. For reviews longer than 200, use only the first 200 words as the feature vector.

This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.



In [13]:
seq_len = 200
features = np.zeros((len(reviews_ints), seq_len), dtype=int)

for i, row in enumerate(reviews_ints):
    features[i, -len(row):] = np.array(row)[:seq_len]
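
The padding and truncation logic is easier to check on a tiny example. Below is a minimal sketch with a short sequence length; `pad_features` is a hypothetical helper name, not something defined elsewhere in this notebook, and it adds an explicit guard for empty reviews.

```python
import numpy as np

def pad_features(reviews_ints, seq_len):
    """Left-pad with zeros or truncate each review to exactly seq_len ints."""
    features = np.zeros((len(reviews_ints), seq_len), dtype=int)
    for i, row in enumerate(reviews_ints):
        kept = row[:seq_len]               # keep at most the first seq_len words
        if kept:                           # an empty review stays all zeros
            features[i, -len(kept):] = kept
    return features

# A short review, a long review, and an empty one
toy = [[117, 18, 128], [1, 2, 3, 4, 5, 6, 7], []]
print(pad_features(toy, seq_len=5))
```

The short review ends up right-aligned behind two zeros, the long one keeps only its first five words, and the empty one becomes a row of zeros.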

If you built `features` correctly, it should look like the cell output below.

In [14]:
features[:10,:100]

array([[    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,  8210, 29951, 33665,
            1,  9819, 12477, 33768, 52591,  3689, 65543, 56451, 66137,
         3397, 60807, 46549, 51107,   181, 57199, 37775, 63328,  3397,
        64947, 43774, 73400, 32081, 65543, 64949, 51048, 37169, 40892,
        66340,  5656, 65523,  8210, 29951, 56131, 56749, 33665, 43370,
        11909],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     

## Training, Validation, Test



With our data in nice shape, we'll split it into training, validation, and test sets.

> **Exercise:** Create the training, validation, and test sets here. You'll need to create sets for the features and the labels, `train_x` and `train_y` for example. Define a split fraction, `split_frac` as the fraction of data to keep in the training set. Usually this is set to 0.8 or 0.9. The rest of the data will be split in half to create the validation and testing data.

In [15]:
split_frac = 0.8
split_index = int(len(features) * split_frac)

train_x, val_x = features[:split_index], features[split_index:]
train_y, val_y = labels[:split_index], labels[split_index:]

split_frac = 0.5
split_index = int(len(val_x) * split_frac)

val_x, test_x = val_x[:split_index], val_x[split_index:]
val_y, test_y = val_y[:split_index], val_y[split_index:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		(2500, 200)


With train, validation, and test fractions of 0.8, 0.1, 0.1, the final shapes should look like:
```
                    Feature Shapes:
Train set: 		 (20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		  (2500, 200)
```
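
The arithmetic behind those shapes can be sketched in a few lines. `split_sizes` is a hypothetical helper used only for illustration; it mirrors the two-stage split (training fraction first, then the remainder cut in half).

```python
def split_sizes(n_examples, split_frac=0.8):
    """Return (train, validation, test) sizes for a two-stage split."""
    n_train = int(n_examples * split_frac)
    n_rest = n_examples - n_train
    n_val = int(n_rest * 0.5)      # first half of the remainder
    n_test = n_rest - n_val        # whatever is left
    return n_train, n_val, n_test

print(split_sizes(25000))  # (20000, 2500, 2500)
```

Because the test size is computed as the remainder, the three sizes always add up to the number of examples, even when it doesn't divide evenly.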

## Build the graph

Here, we'll build the graph. First up, defining the hyperparameters.

* `lstm_size`: Number of units in the hidden layers in the LSTM cells. Larger usually performs better, at the cost of more computation. Common values are 128, 256, 512, etc.
* `lstm_layers`: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
* `batch_size`: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
* `learning_rate`: Learning rate used by the optimizer.

In [147]:
lstm_size = 512
lstm_layers = 1
batch_size = 300
learning_rate = 0.00001

For the network itself, we'll be passing in our 200 element long review vectors. Each batch will be `batch_size` vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.

> **Exercise:** Create the `inputs_`, `labels_`, and drop out `keep_prob` placeholders using `tf.placeholder`. `labels_` needs to be two-dimensional to work with some functions later.  Since `keep_prob` is a scalar (a 0-dimensional tensor), you shouldn't provide a size to `tf.placeholder`.

In [148]:
n_words = len(vocab_to_int) + 1 # Adding 1 because we use 0's for padding, dictionary started at 1

# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
    labels_ = tf.placeholder(tf.int32, [None, None], name='labels')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

### Embedding

Now we'll add an embedding layer. We need to do this because there are 74000 words in our vocabulary. It is massively inefficient to one-hot encode our classes here. You should remember dealing with this problem from the word2vec lesson. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But, it's fine to just make a new layer and let the network learn the weights.

> **Exercise:** Create the embedding lookup matrix as a `tf.Variable`. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup). This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 200 units and the input has shape [batch_size, seq_len], the function will return a tensor with size [batch_size, seq_len, 200].



In [149]:
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 300 

with graph.as_default():
    embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, inputs_)
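
Conceptually, an embedding lookup is just row indexing into a table. The sketch below shows the idea in NumPy rather than TensorFlow; the toy sizes and the `*_toy` names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration
n_words_toy, embed_size_toy = 10, 4
embedding_toy = rng.uniform(-1, 1, size=(n_words_toy, embed_size_toy))

# A batch of 2 reviews, each 3 integer-encoded words (0 is padding)
inputs_toy = np.array([[0, 7, 2],
                       [5, 5, 9]])

# Integer-array indexing picks one row of the table per word id
embed_toy = embedding_toy[inputs_toy]

print(embed_toy.shape)  # (2, 3, 4)
```

Every word id is replaced by its row of the table, so the result gains one trailing dimension of size `embed_size_toy`, and repeated ids (the two 5s) get identical vectors.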

### LSTM cell

<img src="../img/network_diagram.png" width=400px>

Next, we'll create our LSTM cells to use in the recurrent network ([TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn)). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.

To create a basic LSTM cell for the graph, you'll want to use `tf.contrib.rnn.BasicLSTMCell`. Looking at the function documentation:

```
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=tanh)
```

you can see it takes a parameter called `num_units`, the number of units in the cell, called `lstm_size` in this code. So then, you can write something like 

```
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

to create an LSTM cell with `num_units`. Next, you can add dropout to the cell with `tf.contrib.rnn.DropoutWrapper`. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a really convenient way to make your network better with almost no effort! So you'd do something like

```
drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
```
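
To make the effect of `keep_prob` concrete, here is a minimal NumPy sketch of dropout on a block of activations. It illustrates the general idea (randomly zero some values and rescale the survivors by `1 / keep_prob`), not the wrapper's actual implementation; the `dropout` function name is hypothetical.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((4, 5))

dropped = dropout(activations, keep_prob=0.5, rng=rng)    # training-style
unchanged = dropout(activations, keep_prob=1.0, rng=rng)  # evaluation-style
print(dropped)
```

With `keep_prob=1.0` nothing is dropped, which is why the keep probability is fed as 1 when measuring validation or test accuracy.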

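Often, your network will perform better with more layers. That's sort of the magic of deep learning: adding more layers allows the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with `tf.contrib.rnn.MultiRNNCell`:

```
cells = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(lstm_size), output_keep_prob=keep_prob) for _ in range(lstm_layers)]
cell = tf.contrib.rnn.MultiRNNCell(cells)
```

Here, the list comprehension creates a list of `lstm_layers` separate cells, each wrapped with dropout. The `MultiRNNCell` wrapper builds this into multiple layers of RNN cells, one for each cell in the list. Build a new cell object for every layer rather than repeating one object with `[drop] * lstm_layers`; repeating the same object makes every layer refer to a single cell, which later TensorFlow 1.x releases may reject.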

So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout. But it all works the same from an architectural viewpoint, just a more complicated graph in the cell.
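
If you're curious what happens inside a cell, here is a minimal NumPy sketch of one step of the standard LSTM update. It is not TensorFlow's implementation: the single stacked weight matrix, the gate order, and the `lstm_step` name are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM time step for a batch.

    x: (batch, input_size); h_prev, c_prev: (batch, num_units)
    W: (input_size + num_units, 4 * num_units); b: (4 * num_units,)
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    # Input gate, candidate values, forget gate, output gate
    i, g, f, o = np.split(z, 4, axis=1)
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
batch, input_size, num_units = 2, 3, 4
W = rng.normal(scale=0.1, size=(input_size + num_units, 4 * num_units))
b = np.zeros(4 * num_units)

h, c = lstm_step(rng.normal(size=(batch, input_size)),
                 np.zeros((batch, num_units)),
                 np.zeros((batch, num_units)), W, b)
print(h.shape, c.shape)
```

The hidden state `h` and cell state `c` both have `num_units` values per example, which is what `lstm_size` controls in this notebook.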

> **Exercise:** Below, use `tf.contrib.rnn.BasicLSTMCell` to create an LSTM cell. Then, add drop out to it with `tf.contrib.rnn.DropoutWrapper`. Finally, create multiple LSTM layers with `tf.contrib.rnn.MultiRNNCell`.

Here is [a tutorial on building RNNs](https://www.tensorflow.org/tutorials/recurrent) that will help you out.


In [150]:
with graph.as_default():
    # Your basic LSTM cell
    lstms = [tf.contrib.rnn.BasicLSTMCell(lstm_size) for _ in 
            range(lstm_layers)]
    
    # Add dropout to the cell
    drops = [tf.contrib.rnn.DropoutWrapper(lstm, 
                                           output_keep_prob=keep_prob) 
             for lstm in lstms]
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell(drops)
    
    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)

### RNN forward pass

<img src="../img/network_diagram.png" width=400px>

Now we need to actually run the data through the RNN nodes. You can use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) to do this. You'd pass in the RNN cell you created (our multiple layered LSTM `cell` for instance), and the inputs to the network.

```
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
```

Above I created an initial state, `initial_state`, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. `tf.nn.dynamic_rnn` takes care of most of the work for us. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the final_state of the hidden layer.
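
The unrolling described above can be sketched as a plain loop over time steps. This NumPy version uses a simple tanh cell instead of an LSTM to keep it short; `simple_cell` and `unroll` are hypothetical names, and the sketch only mirrors the idea, not the TensorFlow function itself.

```python
import numpy as np

def simple_cell(x, h_prev, W_x, W_h):
    """A plain tanh RNN cell; returns (output, new state)."""
    h = np.tanh(x @ W_x + h_prev @ W_h)
    return h, h

def unroll(cell_fn, inputs, initial_state, *params):
    """Apply cell_fn at every time step of inputs shaped (batch, steps, features)."""
    state = initial_state
    outputs = []
    for t in range(inputs.shape[1]):
        out, state = cell_fn(inputs[:, t], state, *params)
        outputs.append(out)
    # Stack to (batch, steps, units) so the time axis sits in the middle
    return np.stack(outputs, axis=1), state

rng = np.random.default_rng(0)
batch, steps, features, units = 2, 5, 3, 4
x = rng.normal(size=(batch, steps, features))
W_x = rng.normal(scale=0.1, size=(features, units))
W_h = rng.normal(scale=0.1, size=(units, units))

outputs, final_state = unroll(simple_cell, x, np.zeros((batch, units)), W_x, W_h)
print(outputs.shape, final_state.shape)
```

The loop returns one output per time step plus the state after the last step; for this simple cell the last output and the final state are the same array of values.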

> **Exercise:** Use `tf.nn.dynamic_rnn` to add the forward pass through the RNN. Remember that we're actually passing in vectors from the embedding layer, `embed`.



In [151]:
with graph.as_default():
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, 
                                             initial_state=initial_state)

### Output

We only care about the final output; we'll be using that as our sentiment prediction. So we need to grab the last output with `outputs[:, -1]`, then calculate the cost from that and `labels_`.

In [152]:
with graph.as_default():
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    cost = tf.losses.mean_squared_error(labels_, predictions)
    
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
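
Here is the same computation written out in NumPy as a rough sketch: slice the last time step, apply a single sigmoid unit, and take the mean squared error against the labels. The random `outputs` array and the weights `W` and `b` are made-up stand-ins for what the network would produce and learn.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
batch, steps, units = 3, 5, 4
outputs = rng.normal(size=(batch, steps, units))   # stand-in for the RNN outputs
labels = np.array([[1], [0], [1]])                 # shape (batch, 1)

# Hypothetical weights of the single-unit output layer
W = rng.normal(scale=0.1, size=(units, 1))
b = np.zeros(1)

last = outputs[:, -1]                  # (batch, units): only the final time step
predictions = sigmoid(last @ W + b)    # (batch, 1), each value in (0, 1)
cost = np.mean((labels - predictions) ** 2)

print(predictions.shape, cost)
```

Because the predictions lie strictly between 0 and 1 and the labels are 0 or 1, this cost is always between 0 and 1.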

### Validation accuracy

Here we can add a few nodes to calculate the accuracy which we'll use in the validation pass.

In [153]:
with graph.as_default():
    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
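
The accuracy calculation is simple enough to check by hand. This NumPy sketch uses made-up predictions and labels; it also shows why the labels should be two-dimensional, since comparing a `(batch, 1)` array with a 1-D array broadcasts to a `(batch, batch)` result in NumPy.

```python
import numpy as np

predictions = np.array([[0.91], [0.20], [0.55], [0.49]])
labels = np.array([[1], [0], [0], [1]])

# Round each probability to 0 or 1 and compare with the label
correct_pred = np.round(predictions).astype(int) == labels
accuracy = correct_pred.astype(float).mean()
print(accuracy)  # 0.5

# With 1-D labels the comparison broadcasts to (4, 4) instead of (4, 1)
wrong_shape = (np.round(predictions).astype(int) == labels.ravel()).shape
print(wrong_shape)  # (4, 4)
```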

### Batching

This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the `x` and `y` arrays and returns slices out of those arrays with size `[batch_size]`.

In [154]:
def get_batches(x, y, batch_size=100):
    
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]
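
A quick usage sketch on toy data shows what "only full batches" means in practice. The function is repeated so the snippet runs on its own; `toy_x` and `toy_y` are made up.

```python
def get_batches(x, y, batch_size=100):
    # Same generator as above, repeated so this snippet is self-contained
    n_batches = len(x) // batch_size
    x, y = x[:n_batches * batch_size], y[:n_batches * batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size]

toy_x = list(range(10))
toy_y = [v % 2 for v in toy_x]

batches = list(get_batches(toy_x, toy_y, batch_size=4))
print(len(batches))      # 2
print(batches[-1][0])    # [4, 5, 6, 7]

# Full batches per epoch for 20000 training reviews with batch_size = 300
print(20000 // 300)      # 66
```

The last two toy examples are dropped because they don't fill a batch. With the settings in this notebook, the same rule leaves 200 training reviews unused in each epoch.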

## Training

Below is the typical training code. If you'd rather write it yourself, feel free to delete all this code and implement your own. Before you run this, make sure the `checkpoints` directory exists.

In [155]:
epochs = 100

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            
            if iteration%5==0:
                print("Epoch: {}/{}".format(e+1, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))

            if iteration%25==0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1
    saver.save(sess, "checkpoints/sentiment.ckpt")

Epoch: 1/100 Iteration: 5 Train loss: 0.252
Epoch: 1/100 Iteration: 10 Train loss: 0.251
Epoch: 1/100 Iteration: 15 Train loss: 0.249
Epoch: 1/100 Iteration: 20 Train loss: 0.252
Epoch: 1/100 Iteration: 25 Train loss: 0.251
Val acc: 0.534
Epoch: 1/100 Iteration: 30 Train loss: 0.252
Epoch: 1/100 Iteration: 35 Train loss: 0.250
Epoch: 1/100 Iteration: 40 Train loss: 0.246
Epoch: 1/100 Iteration: 45 Train loss: 0.253
Epoch: 1/100 Iteration: 50 Train loss: 0.253
Val acc: 0.542
Epoch: 1/100 Iteration: 55 Train loss: 0.252
Epoch: 1/100 Iteration: 60 Train loss: 0.246
Epoch: 1/100 Iteration: 65 Train loss: 0.250
Epoch: 2/100 Iteration: 70 Train loss: 0.245
Epoch: 2/100 Iteration: 75 Train loss: 0.252
Val acc: 0.549
Epoch: 2/100 Iteration: 80 Train loss: 0.249
Epoch: 2/100 Iteration: 85 Train loss: 0.246
Epoch: 2/100 Iteration: 90 Train loss: 0.252
Epoch: 2/100 Iteration: 95 Train loss: 0.247
Epoch: 2/100 Iteration: 100 Train loss: 0.246
Val acc: 0.559
Epoch: 2/100 Iteration: 105 Train loss: 

Epoch: 13/100 Iteration: 840 Train loss: 0.068
Epoch: 13/100 Iteration: 845 Train loss: 0.070
Epoch: 13/100 Iteration: 850 Train loss: 0.069
Val acc: 0.698
Epoch: 13/100 Iteration: 855 Train loss: 0.067
Epoch: 14/100 Iteration: 860 Train loss: 0.269
Epoch: 14/100 Iteration: 865 Train loss: 0.208
Epoch: 14/100 Iteration: 870 Train loss: 0.140
Epoch: 14/100 Iteration: 875 Train loss: 0.114
Val acc: 0.795
Epoch: 14/100 Iteration: 880 Train loss: 0.099
Epoch: 14/100 Iteration: 885 Train loss: 0.087
Epoch: 14/100 Iteration: 890 Train loss: 0.086
Epoch: 14/100 Iteration: 895 Train loss: 0.073
Epoch: 14/100 Iteration: 900 Train loss: 0.056
Val acc: 0.707
Epoch: 14/100 Iteration: 905 Train loss: 0.058
Epoch: 14/100 Iteration: 910 Train loss: 0.055
Epoch: 14/100 Iteration: 915 Train loss: 0.055
Epoch: 14/100 Iteration: 920 Train loss: 0.054
Epoch: 15/100 Iteration: 925 Train loss: 0.302
Val acc: 0.641
Epoch: 15/100 Iteration: 930 Train loss: 0.233
Epoch: 15/100 Iteration: 935 Train loss: 0.158


Epoch: 25/100 Iteration: 1650 Train loss: 0.043
Val acc: 0.697
Epoch: 26/100 Iteration: 1655 Train loss: 0.198
Epoch: 26/100 Iteration: 1660 Train loss: 0.270
Epoch: 26/100 Iteration: 1665 Train loss: 0.225
Epoch: 26/100 Iteration: 1670 Train loss: 0.215
Epoch: 26/100 Iteration: 1675 Train loss: 0.182
Val acc: 0.717
Epoch: 26/100 Iteration: 1680 Train loss: 0.172
Epoch: 26/100 Iteration: 1685 Train loss: 0.200
Epoch: 26/100 Iteration: 1690 Train loss: 0.183
Epoch: 26/100 Iteration: 1695 Train loss: 0.189
Epoch: 26/100 Iteration: 1700 Train loss: 0.199
Val acc: 0.717
Epoch: 26/100 Iteration: 1705 Train loss: 0.189
Epoch: 26/100 Iteration: 1710 Train loss: 0.190
Epoch: 26/100 Iteration: 1715 Train loss: 0.171
Epoch: 27/100 Iteration: 1720 Train loss: 0.158
Epoch: 27/100 Iteration: 1725 Train loss: 0.168
Val acc: 0.720
Epoch: 27/100 Iteration: 1730 Train loss: 0.172
Epoch: 27/100 Iteration: 1735 Train loss: 0.199
Epoch: 27/100 Iteration: 1740 Train loss: 0.183
Epoch: 27/100 Iteration: 174

Val acc: 0.867
Epoch: 38/100 Iteration: 2455 Train loss: 0.128
Epoch: 38/100 Iteration: 2460 Train loss: 0.108
Epoch: 38/100 Iteration: 2465 Train loss: 0.094
Epoch: 38/100 Iteration: 2470 Train loss: 0.068
Epoch: 38/100 Iteration: 2475 Train loss: 0.063
Val acc: 0.858
Epoch: 38/100 Iteration: 2480 Train loss: 0.042
Epoch: 38/100 Iteration: 2485 Train loss: 0.035
Epoch: 38/100 Iteration: 2490 Train loss: 0.032
Epoch: 38/100 Iteration: 2495 Train loss: 0.032
Epoch: 38/100 Iteration: 2500 Train loss: 0.028
Val acc: 0.678
Epoch: 38/100 Iteration: 2505 Train loss: 0.030
Epoch: 39/100 Iteration: 2510 Train loss: 0.318
Epoch: 39/100 Iteration: 2515 Train loss: 0.224
Epoch: 39/100 Iteration: 2520 Train loss: 0.086
Epoch: 39/100 Iteration: 2525 Train loss: 0.098
Val acc: 0.803
Epoch: 39/100 Iteration: 2530 Train loss: 0.083
Epoch: 39/100 Iteration: 2535 Train loss: 0.058
Epoch: 39/100 Iteration: 2540 Train loss: 0.048
Epoch: 39/100 Iteration: 2545 Train loss: 0.035
Epoch: 39/100 Iteration: 255

Epoch: 50/100 Iteration: 3260 Train loss: 0.042
Epoch: 50/100 Iteration: 3265 Train loss: 0.029
Epoch: 50/100 Iteration: 3270 Train loss: 0.027
Epoch: 50/100 Iteration: 3275 Train loss: 0.019
Val acc: 0.786
Epoch: 50/100 Iteration: 3280 Train loss: 0.021
Epoch: 50/100 Iteration: 3285 Train loss: 0.027
Epoch: 50/100 Iteration: 3290 Train loss: 0.028
Epoch: 50/100 Iteration: 3295 Train loss: 0.028
Epoch: 50/100 Iteration: 3300 Train loss: 0.027
Val acc: 0.711
Epoch: 51/100 Iteration: 3305 Train loss: 0.229
Epoch: 51/100 Iteration: 3310 Train loss: 0.077
Epoch: 51/100 Iteration: 3315 Train loss: 0.074
Epoch: 51/100 Iteration: 3320 Train loss: 0.105
Epoch: 51/100 Iteration: 3325 Train loss: 0.073
Val acc: 0.845
Epoch: 51/100 Iteration: 3330 Train loss: 0.059
Epoch: 51/100 Iteration: 3335 Train loss: 0.065
Epoch: 51/100 Iteration: 3340 Train loss: 0.042
Epoch: 51/100 Iteration: 3345 Train loss: 0.032
Epoch: 51/100 Iteration: 3350 Train loss: 0.036
Val acc: 0.810
Epoch: 51/100 Iteration: 335

Epoch: 62/100 Iteration: 4065 Train loss: 0.027
Epoch: 62/100 Iteration: 4070 Train loss: 0.024
Epoch: 62/100 Iteration: 4075 Train loss: 0.033
Val acc: 0.793
Epoch: 62/100 Iteration: 4080 Train loss: 0.020
Epoch: 62/100 Iteration: 4085 Train loss: 0.019
Epoch: 62/100 Iteration: 4090 Train loss: 0.017
Epoch: 63/100 Iteration: 4095 Train loss: 0.204
Epoch: 63/100 Iteration: 4100 Train loss: 0.071
Val acc: 0.861
Epoch: 63/100 Iteration: 4105 Train loss: 0.034
Epoch: 63/100 Iteration: 4110 Train loss: 0.028
Epoch: 63/100 Iteration: 4115 Train loss: 0.016
Epoch: 63/100 Iteration: 4120 Train loss: 0.029
Epoch: 63/100 Iteration: 4125 Train loss: 0.018
Val acc: 0.856
Epoch: 63/100 Iteration: 4130 Train loss: 0.012
Epoch: 63/100 Iteration: 4135 Train loss: 0.014
Epoch: 63/100 Iteration: 4140 Train loss: 0.026
Epoch: 63/100 Iteration: 4145 Train loss: 0.025
Epoch: 63/100 Iteration: 4150 Train loss: 0.017
Val acc: 0.773
Epoch: 63/100 Iteration: 4155 Train loss: 0.020
Epoch: 64/100 Iteration: 416

Epoch: 74/100 Iteration: 4870 Train loss: 0.035
Epoch: 74/100 Iteration: 4875 Train loss: 0.034
Val acc: 0.787
Epoch: 74/100 Iteration: 4880 Train loss: 0.025
Epoch: 75/100 Iteration: 4885 Train loss: 0.226
Epoch: 75/100 Iteration: 4890 Train loss: 0.087
Epoch: 75/100 Iteration: 4895 Train loss: 0.026
Epoch: 75/100 Iteration: 4900 Train loss: 0.027
Val acc: 0.860
Epoch: 75/100 Iteration: 4905 Train loss: 0.014
Epoch: 75/100 Iteration: 4910 Train loss: 0.024
Epoch: 75/100 Iteration: 4915 Train loss: 0.031
Epoch: 75/100 Iteration: 4920 Train loss: 0.033
Epoch: 75/100 Iteration: 4925 Train loss: 0.022
Val acc: 0.814
Epoch: 75/100 Iteration: 4930 Train loss: 0.020
Epoch: 75/100 Iteration: 4935 Train loss: 0.027
Epoch: 75/100 Iteration: 4940 Train loss: 0.023
Epoch: 75/100 Iteration: 4945 Train loss: 0.020
Epoch: 75/100 Iteration: 4950 Train loss: 0.019
Val acc: 0.783
Epoch: 76/100 Iteration: 4955 Train loss: 0.123
Epoch: 76/100 Iteration: 4960 Train loss: 0.038
Epoch: 76/100 Iteration: 496

Epoch: 86/100 Iteration: 5675 Train loss: 0.016
Val acc: 0.787
Epoch: 87/100 Iteration: 5680 Train loss: 0.147
Epoch: 87/100 Iteration: 5685 Train loss: 0.046
Epoch: 87/100 Iteration: 5690 Train loss: 0.032
Epoch: 87/100 Iteration: 5695 Train loss: 0.018
Epoch: 87/100 Iteration: 5700 Train loss: 0.008
Val acc: 0.867
Epoch: 87/100 Iteration: 5705 Train loss: 0.029
Epoch: 87/100 Iteration: 5710 Train loss: 0.024
Epoch: 87/100 Iteration: 5715 Train loss: 0.013
Epoch: 87/100 Iteration: 5720 Train loss: 0.010
Epoch: 87/100 Iteration: 5725 Train loss: 0.014
Val acc: 0.795
Epoch: 87/100 Iteration: 5730 Train loss: 0.016
Epoch: 87/100 Iteration: 5735 Train loss: 0.009
Epoch: 87/100 Iteration: 5740 Train loss: 0.013
Epoch: 88/100 Iteration: 5745 Train loss: 0.203
Epoch: 88/100 Iteration: 5750 Train loss: 0.067
Val acc: 0.852
Epoch: 88/100 Iteration: 5755 Train loss: 0.023
Epoch: 88/100 Iteration: 5760 Train loss: 0.017
Epoch: 88/100 Iteration: 5765 Train loss: 0.010
Epoch: 88/100 Iteration: 577

Val acc: 0.850
Epoch: 99/100 Iteration: 6480 Train loss: 0.016
Epoch: 99/100 Iteration: 6485 Train loss: 0.013
Epoch: 99/100 Iteration: 6490 Train loss: 0.012
Epoch: 99/100 Iteration: 6495 Train loss: 0.014
Epoch: 99/100 Iteration: 6500 Train loss: 0.020
Val acc: 0.860
Epoch: 99/100 Iteration: 6505 Train loss: 0.023
Epoch: 99/100 Iteration: 6510 Train loss: 0.010
Epoch: 99/100 Iteration: 6515 Train loss: 0.005
Epoch: 99/100 Iteration: 6520 Train loss: 0.010
Epoch: 99/100 Iteration: 6525 Train loss: 0.009
Val acc: 0.803
Epoch: 99/100 Iteration: 6530 Train loss: 0.004
Epoch: 100/100 Iteration: 6535 Train loss: 0.220
Epoch: 100/100 Iteration: 6540 Train loss: 0.074
Epoch: 100/100 Iteration: 6545 Train loss: 0.016
Epoch: 100/100 Iteration: 6550 Train loss: 0.012
Val acc: 0.861
Epoch: 100/100 Iteration: 6555 Train loss: 0.004
Epoch: 100/100 Iteration: 6560 Train loss: 0.010
Epoch: 100/100 Iteration: 6565 Train loss: 0.014
Epoch: 100/100 Iteration: 6570 Train loss: 0.018
Epoch: 100/100 Itera

## Testing

In [156]:
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        feed = {inputs_: x,
                labels_: y[:, None],
                keep_prob: 1,
                initial_state: test_state}
        batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        test_acc.append(batch_acc)
    print("Test accuracy: {:.3f}".format(np.mean(test_acc)))

INFO:tensorflow:Restoring parameters from checkpoints/sentiment.ckpt
Test accuracy: 0.737
