# Recurrent Neural Networks with Keras

## Sentiment analysis of movie reviews (IMDB)

https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification

We're going to use an RNN to do sentiment analysis on full-text movie reviews!

We'll train an artificial neural network to "read" movie reviews and guess, from the text alone, whether the author liked the movie or not.

In [1]:
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb  # movie ratings and reviews dataset

Using TensorFlow backend.


In [2]:
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=20000)

Loading data...


In [3]:
x_train[0]  # an ordered sequence of word ids (ranked by frequency), not a bag of words

[1,
 14,
 22,
 16,
 43,
 530,
 973,
 1622,
 1385,
 65,
 458,
 4468,
 66,
 3941,
 4,
 173,
 36,
 256,
 5,
 25,
 100,
 43,
 838,
 112,
 50,
 670,
 2,
 9,
 35,
 480,
 284,
 5,
 150,
 4,
 172,
 112,
 167,
 2,
 336,
 385,
 39,
 4,
 172,
 4536,
 1111,
 17,
 546,
 38,
 13,
 447,
 4,
 192,
 50,
 16,
 6,
 147,
 2025,
 19,
 14,
 22,
 4,
 1920,
 4613,
 469,
 4,
 22,
 71,
 87,
 12,
 16,
 43,
 530,
 38,
 76,
 15,
 13,
 1247,
 4,
 22,
 17,
 515,
 17,
 12,
 16,
 626,
 18,
 19193,
 5,
 62,
 386,
 12,
 8,
 316,
 8,
 106,
 5,
 4,
 2223,
 5244,
 16,
 480,
 66,
 3785,
 33,
 4,
 130,
 12,
 16,
 38,
 619,
 5,
 25,
 124,
 51,
 36,
 135,
 48,
 25,
 1415,
 33,
 6,
 22,
 12,
 215,
 28,
 77,
 52,
 5,
 14,
 407,
 16,
 82,
 10311,
 8,
 4,
 107,
 117,
 5952,
 15,
 256,
 4,
 2,
 7,
 3766,
 5,
 723,
 36,
 71,
 43,
 530,
 476,
 26,
 400,
 317,
 46,
 7,
 4,
 12118,
 1029,
 13,
 104,
 88,
 4,
 381,
 15,
 297,
 98,
 32,
 2071,
 56,
 26,
 141,
 6,
 194,
 7486,
 18,
 4,
 226,
 22,
 21,
 134,
 476,
 26,
 480,
 5,
 144,
 30,

In [4]:
y_train[0]   # 1 = positive review (the author liked the movie), 0 = negative

1
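
The integers in `x_train[0]` are word ids, so a review can be turned back into text with a lookup table. The sketch below assumes the `imdb.load_data` defaults, where ids 0, 1 and 2 are reserved for padding, start-of-review and out-of-vocabulary words, and every real word's frequency rank is shifted up by 3. The `decode_review` helper and the toy vocabulary are hypothetical, for illustration only; in the notebook the mapping would come from `imdb.get_word_index()`.

```python
# Sketch: map Keras IMDB token ids back to words.
# Assumes the load_data defaults: ids 0, 1, 2 are reserved and
# real words are stored as (frequency rank + 3).
RESERVED = {0: "<pad>", 1: "<start>", 2: "<oov>"}
INDEX_FROM = 3


def decode_review(token_ids, word_index):
    """Turn a list of token ids into a readable string."""
    # Invert word -> rank into (rank + offset) -> word.
    id_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    id_to_word.update(RESERVED)
    return " ".join(id_to_word.get(i, "<unk>") for i in token_ids)


# Hypothetical toy vocabulary standing in for imdb.get_word_index().
toy_word_index = {"the": 1, "and": 2, "this": 11, "film": 19}

print(decode_review([1, 14, 22, 4, 2], toy_word_index))
# prints: <start> this film the <oov>
```

Any word whose rank falls outside the `num_words=20000` cutoff shows up as id 2, which is why `2` appears several times in the output above.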

In [5]:
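# pad or truncate every review to exactly 80 tokens; with the pad_sequences
# defaults, longer reviews keep their LAST 80 tokens and shorter ones are left-padded with 0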
x_train = sequence.pad_sequences(x_train, maxlen=80)
x_test = sequence.pad_sequences(x_test, maxlen=80)
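
To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does to a single review under its default settings (`padding='pre'`, `truncating='pre'`). The `pad_or_truncate` function is a hypothetical stand-in written for illustration, not the Keras implementation, and it assumes `maxlen` is at least 1.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence."""
    if len(seq) >= maxlen:
        # truncating='pre': drop tokens from the front, keep the tail
        return list(seq[-maxlen:])
    # padding='pre': fill with `value` on the left
    return [value] * (maxlen - len(seq)) + list(seq)


print(pad_or_truncate([1, 14, 22, 16, 43], maxlen=3))  # prints: [22, 16, 43]
print(pad_or_truncate([1, 14], maxlen=4))              # prints: [0, 0, 1, 14]
```

If you would rather keep the beginning of each review, `pad_sequences` accepts `truncating='post'`.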

In [6]:
model = Sequential()
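model.add(Embedding(20000, 128))            # map each of the 20,000 word ids to a 128-dim vector
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))  # 128 recurrent units, dropout on inputs and state
model.add(Dense(1, activation='sigmoid'))   # single output: probability that the review is positive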
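
To get a feel for how big this three-layer model is, the trainable parameter count can be worked out by hand. The arithmetic below is a sketch that assumes the standard LSTM layout (four gate blocks, each with input weights, recurrent weights and one bias vector); `model.summary()` is the way to check the real figures in your environment.

```python
vocab_size, embed_dim, lstm_units = 20000, 128, 128

# Embedding: one 128-dim vector per word id.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gate blocks, each with input weights, recurrent weights and a bias.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense: one weight per LSTM unit plus a single bias.
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# prints: 2560000 131584 129 2691713
```

Almost all of the parameters sit in the embedding table, not in the recurrent layer.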

In [7]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
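
The `binary_crossentropy` loss pairs naturally with the sigmoid output: the model emits a probability `p` that the review is positive, and the loss is the negative log of the probability it assigned to the true label. Below is a minimal hand-rolled sketch for a single example; the function names and the clipping constant `eps` are assumptions for illustration, not the Keras internals.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example with label y_true (0 or 1) and predicted probability p."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


print(round(binary_crossentropy(1, 0.9), 3))   # confident and right -> 0.105
print(round(binary_crossentropy(1, 0.01), 3))  # confident and wrong -> 4.605
```

A single confidently wrong prediction costs far more than a correct one saves, so the average loss can be high even when accuracy looks reasonable.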

Now we will actually train our model. RNNs, like CNNs, are very resource-heavy. Keeping the batch size relatively small is the key to getting this to run on your PC at all. In the real world, of course, you'd take advantage of GPUs installed across many computers in a cluster to make this scale a lot better.

## Warning

This will take a very long time to run, even on a fast PC! Don't execute the next block unless you're prepared to tie up your computer for an hour or more.

In [8]:
model.fit(x_train, y_train,
          batch_size=32,
          epochs=15,
          verbose=2)
# Optionally pass validation_data=(x_test, y_test) to fit() to report
# held-out loss and accuracy after every epoch.

Epoch 1/15
 - 164s - loss: 0.4581 - acc: 0.7820
Epoch 2/15
 - 168s - loss: 0.2961 - acc: 0.8800
Epoch 3/15
 - 171s - loss: 0.2130 - acc: 0.9177
Epoch 4/15
 - 172s - loss: 0.1507 - acc: 0.9442
Epoch 5/15
 - 173s - loss: 0.1069 - acc: 0.9605
Epoch 6/15
 - 175s - loss: 0.0766 - acc: 0.9737
Epoch 7/15
 - 176s - loss: 0.0550 - acc: 0.9813
Epoch 8/15
 - 199s - loss: 0.0477 - acc: 0.9839
Epoch 9/15
 - 184s - loss: 0.0303 - acc: 0.9903
Epoch 10/15
 - 190s - loss: 0.0275 - acc: 0.9919
Epoch 11/15
 - 181s - loss: 0.0193 - acc: 0.9938
Epoch 12/15
 - 174s - loss: 0.0152 - acc: 0.9948
Epoch 13/15
 - 173s - loss: 0.0135 - acc: 0.9952
Epoch 14/15
 - 172s - loss: 0.0136 - acc: 0.9955
Epoch 15/15
 - 172s - loss: 0.0114 - acc: 0.9962


<keras.callbacks.History at 0x2b5b91e3b70>

In [9]:
score, acc = model.evaluate(x_test, y_test,
                            batch_size=32,
                            verbose=2)
print('Test score:', score)
print('Test accuracy:', acc)

Test score: 1.1437684313173593
Test accuracy: 0.81012


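81%. Not too bad, considering we limited ourselves to just 80 words of each review. Note the gap, though: training accuracy reached 99.6% while test accuracy is 81% and the test loss is 1.14, a sign that 15 epochs of training overfit the training set.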

But again, stop and think about what we just made here! A neural network that can "read" reviews and deduce from the text whether the author liked the movie or not. It takes the context of each word and its position in the review into account, and setting up the model itself took just a few lines of code. It's pretty incredible what you can do with Keras.