# RNN using LSTM

## Sentiment Analysis

**Importing all the required packages**

In [1]:
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding, Flatten, LSTM
from keras.datasets import imdb
import keras
from numpy import array

Using TensorFlow backend.


**Declaring variables**

In [2]:
max_features = 10000
maxlen = 250
batch_size = 20
embedding_dims = 16
epochs = 15

**Loading data**

In [3]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = max_features)

In [4]:
x_train

array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),
       list([1, 194, 1153, 194, 8255, 78, 228,

**Padding the dataset**

In [5]:
x_train = sequence.pad_sequences(x_train, maxlen=maxlen, padding='post', truncating='post', value=0, dtype='int32')
x_test = sequence.pad_sequences(x_test, maxlen=maxlen, padding='post', truncating='post', value=0, dtype='int32')

Here `padding='post'` appends zeros to the end of any review shorter than `maxlen`, and `truncating='post'` cuts tokens off the end of any review longer than `maxlen`. As a result, every review becomes exactly 250 ids long.
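To make the two `'post'` options concrete, here is a minimal pure-Python sketch of what they do to each sequence of integer ids. It is an illustration of the behaviour, not Keras's own implementation. `pad_post` is a hypothetical helper name.

```python
from typing import List

def pad_post(seqs: List[List[int]], maxlen: int, value: int = 0) -> List[List[int]]:
    """Hypothetical helper mimicking pad_sequences(..., padding='post', truncating='post')."""
    out = []
    for s in seqs:
        s = list(s[:maxlen])               # truncating='post': keep only the first maxlen ids
        s += [value] * (maxlen - len(s))   # padding='post': append `value` at the end
        out.append(s)
    return out

print(pad_post([[1, 14, 22], [1, 2, 3, 4, 5, 6]], maxlen=5))
# [[1, 14, 22, 0, 0], [1, 2, 3, 4, 5]]
```

The short review gains two trailing zeros. The long review loses its last id.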

In [6]:
x_train

array([[   1,   14,   22, ...,    0,    0,    0],
       [   1,  194, 1153, ...,    0,    0,    0],
       [   1,   14,   47, ...,    0,    0,    0],
       ...,
       [   1,   11,    6, ...,    0,    0,    0],
       [   1, 1446, 7079, ...,    0,    0,    0],
       [   1,   17,    6, ...,    0,    0,    0]])

As we can see, every review now has the same length (`maxlen` = 250), with the padding zeros at the end.

**Peek at the reviews**

In [7]:
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+3) for k,v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
id_to_word = {value:key for key,value in word_to_id.items()}

In [8]:
print(' '.join(id_to_word[idx] for idx in x_train[0]))

<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for wha

Id 1 (`<START>`) marks the start of each review, and id 0 (`<PAD>`) marks the padded positions. Words that fall outside the `max_features` most frequent words are replaced by id 2 (`<UNK>`).
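The `+3` in the mapping above exists because `imdb.get_word_index()` ranks words starting at 1 for the most frequent word, while `imdb.load_data` shifts every index by 3 (its `index_from` default). This leaves the low ids free for the reserved markers. A minimal sketch of the same mapping and decoding, using a tiny made-up vocabulary in place of the real word index (the `toy_` names and ranks are hypothetical):

```python
INDEX_FROM = 3  # shift applied by imdb.load_data by default

# Made-up stand-in for imdb.get_word_index(): rank 1 = most frequent word.
toy_raw_index = {"the": 1, "and": 2, "film": 3}

toy_word_to_id = {w: rank + INDEX_FROM for w, rank in toy_raw_index.items()}
toy_word_to_id.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2})
toy_id_to_word = {i: w for w, i in toy_word_to_id.items()}

def decode(ids):
    # Any id without an entry (e.g. the unused id 3) is shown as <UNK>.
    return " ".join(toy_id_to_word.get(i, "<UNK>") for i in ids)

print(decode([1, 4, 6, 2, 0, 0]))
# <START> the film <UNK> <PAD> <PAD>
```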

## Applying Sequential Model

In [9]:
model = Sequential()
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
#model.add(Flatten())
model.add(LSTM(128))
model.add(Dense(128, activation='relu'))
model.add(Dense(1,activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
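The final `Dense(1, activation='sigmoid')` layer turns the network's score into a probability p that the review is positive. `binary_crossentropy` then penalises that probability as -[y log p + (1 - y) log(1 - p)], averaged over the batch. A small NumPy sketch of this loss, for intuition only. Keras also clips p away from exactly 0 and 1; the `eps` value below is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y = np.array([1.0, 0.0])
p = sigmoid(np.array([0.0, 0.0]))  # an undecided model predicts 0.5 for both
print(binary_crossentropy(y, p))   # ln 2, about 0.6931
```

A confident correct prediction (p = 0.99 for y = 1) costs only about 0.01. A confident wrong one costs much more.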

In [10]:
print(model.summary())

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 250, 16)           160000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               74240     
_________________________________________________________________
dense_1 (Dense)              (None, 128)               16512     
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129       
Total params: 250,881
Trainable params: 250,881
Non-trainable params: 0
_________________________________________________________________
None
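You can check the parameter counts in the summary by hand. An `Embedding` layer stores one `embedding_dims`-long vector per word. An LSTM has four gates, and each gate has an input kernel, a recurrent kernel and a bias. A `Dense` layer has one weight per input per unit plus one bias per unit. A quick sketch of the arithmetic with this notebook's sizes:

```python
def embedding_params(vocab, dims):
    return vocab * dims

def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each: kernel + recurrent kernel + bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    return inputs * units + units

counts = [
    embedding_params(10000, 16),  # max_features x embedding_dims
    lstm_params(16, 128),
    dense_params(128, 128),
    dense_params(128, 1),
]
print(counts, sum(counts))
# [160000, 74240, 16512, 129] 250881
```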


**Fitting the model**

In [11]:
model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size = batch_size, epochs=epochs, verbose = 1)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 25000 samples, validate on 25000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.callbacks.History at 0x26528dfe748>

**Testing the model on custom reviews**

In [12]:
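bad = "this movie was not good and acting was very bad it was a total waste of time"
good = "this is very good and amazing"
for review in [good, bad]:
    tmp = []
    for word in review.split(" "):
        idx = word_to_id.get(word, 2)                 # words not in the index -> <UNK> (id 2)
        tmp.append(idx if idx < max_features else 2)  # keep ids inside the embedding vocabulary
    # pad_sequences defaults to padding='pre', unlike the padding='post' used for x_train
    tmp_padded = sequence.pad_sequences([tmp], maxlen=maxlen)
    score = model.predict(tmp_padded)[0][0]           # sigmoid output: near 1 = positive, near 0 = negative
    print("%s.--> Sentiment: %s" % (review, score))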

this is very good and amazing.--> Sentiment: 0.9622151
this movie was not good and acting was very bad it was a total waste of time.--> Sentiment: 0.037425973
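The test cell passes the word ids to `pad_sequences` without a leading `<START>` id and relies on its default `padding='pre'`. The training reviews, by contrast, begin with id 1 and were padded with `padding='post'`. Below is a minimal sketch of an encoder that follows the training preprocessing more closely. It assumes the `word_to_id` mapping built earlier (with its +3 offset) and the same `max_features` and `maxlen`. `encode_review` is a hypothetical helper, and the scores it produces may differ from the ones printed above.

```python
import numpy as np

def encode_review(text, word_to_id, num_words=10000, maxlen=250):
    """Hypothetical helper: encode raw text the way x_train was prepared."""
    ids = [1]  # <START>, as imdb.load_data prepends
    for word in text.lower().split():
        idx = word_to_id.get(word, 2)                # unseen word -> <UNK>
        ids.append(idx if idx < num_words else 2)    # outside the top num_words -> <UNK>
    ids = ids[:maxlen]                               # truncating='post'
    ids += [0] * (maxlen - len(ids))                 # padding='post'
    return np.array([ids], dtype="int32")            # a batch of one: shape (1, maxlen)

# Usage with the notebook's objects (assumed, not run here):
# score = model.predict(encode_review(good, word_to_id, max_features, maxlen))[0][0]
# label = "positive" if score >= 0.5 else "negative"
```

Reading the sigmoid output with a 0.5 threshold is a common convention, not something the model itself enforces.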
