# **RNN**
A recurrent neural network (RNN) is a class of artificial neural networks in which connections between units form a directed cycle. This cycle gives the network an internal state, which allows it to exhibit dynamic temporal behavior.
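
To make the idea of an internal state concrete, here is a minimal sketch of a one-unit recurrence in plain Python. The function name rnn_forward and the weight values are illustrative assumptions, not part of Keras; the sketch only shows how each hidden state depends on the current input and on the previous state.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit tanh RNN over a sequence of scalars.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0.
    Returns the list of hidden states, one per time step.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The zero inputs at steps 2 and 3 still produce non-zero states,
# because the first input is carried forward through the hidden state.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

A layer such as SimpleRNN(100) applies the same kind of recurrence with weight matrices instead of scalars.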

IMDB sentiment classification task

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. IMDB provided a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided.

You can download the dataset from http://ai.stanford.edu/~amaas/data/sentiment/ or load it directly in Keras with "from keras.datasets import imdb".

A few points to note:
Modules such as SimpleRNN, LSTM, Activation, Dense, and Dropout can be used directly from Keras.
For preprocessing, you can use required 

In [1]:
#load the imdb dataset 
from keras.datasets import imdb

vocabulary_size = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)
print('Loaded dataset with {} training samples, {} test samples'.format(len(X_train), len(X_test)))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
Loaded dataset with 25000 training samples, 25000 test samples


In [2]:
# the review is stored as a sequence of integers. 
# These are word IDs that have been pre-assigned to individual words, and the label is an integer

print('---review---')
print(X_train[0])
print('---label---')
print(y_train[0])

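# to get the actual review
# load_data shifts word indices by 3 (0 = padding, 1 = start, 2 = unknown),
# so undo the shift before looking each ID up in the word index
word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}
print('---review with words---')
print([id2word.get(i - 3, '?') for i in X_train[0]])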
print('---label---')
print(y_train[0])

---review---
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]
---label---
1
Downloading data from https://storage.googleapis

In [3]:
# pad sequences (write your code here)
from keras.preprocessing import sequence

# Parameter: every review is padded or truncated to this many tokens
mxlen = 500

X_train = sequence.pad_sequences(X_train, maxlen = mxlen)
X_test = sequence.pad_sequences(X_test, maxlen = mxlen)
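
For reference, the following is a rough pure-Python sketch of what pad_sequences does with its default settings (padding and truncating at the front, fill value 0). The helper name pad_pre is hypothetical and this is not the Keras implementation; it assumes maxlen is at least 1.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)   # pad at the front
    return padded

print(pad_pre([[1, 2, 3], [1, 2, 3, 4, 5, 6]], maxlen=5))
# [[0, 0, 1, 2, 3], [2, 3, 4, 5, 6]]
```

Padding at the front keeps the real tokens at the end of the sequence, closest to the final hidden state that the classifier reads.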


In [4]:
# design an RNN model (write your code)

from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, SimpleRNN

model = Sequential()

model.add(Embedding(vocabulary_size, 32, input_length=mxlen))
model.add(SimpleRNN(100))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 500, 32)           160000    
                                                                 
 simple_rnn (SimpleRNN)      (None, 100)               13300     
                                                                 
 dense (Dense)               (None, 1)                 101       
                                                                 
Total params: 173,401
Trainable params: 173,401
Non-trainable params: 0
_________________________________________________________________
None
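
The parameter counts in the summary above can be reproduced by hand. The sketch below uses the standard formulas for each layer type; the helper names are made up for illustration. The LSTM formula is included for comparison with the second model further down, which uses the same layer sizes.

```python
def embedding_params(vocab_size, embed_dim):
    # one embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # four weight sets: input, forget and output gates plus the candidate cell
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weights + bias
    return units * (input_dim + 1)

rnn_total = embedding_params(5000, 32) + simple_rnn_params(32, 100) + dense_params(100, 1)
lstm_total = embedding_params(5000, 32) + lstm_params(32, 100) + dense_params(100, 1)
print(rnn_total, lstm_total)  # 173401 213301
```

The embedding accounts for most of the parameters in both models; the recurrent layer is the only part that differs.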


In [5]:
# train and evaluate your model
# choose your loss function and optimizer and mention the reason to choose that particular loss function and optimizer
# use accuracy as the evaluation metric

model.compile(loss='binary_crossentropy', 
             optimizer='adam', 
             metrics=['accuracy'])

In [6]:
batch_size = 64
num_epochs = 10

# hold out the first batch_size (64) reviews for validation and train on the rest
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]
model.fit(X_train2, y_train2, validation_data=(X_valid, y_valid), batch_size=batch_size, epochs=num_epochs)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f7f88f1d950>

In [7]:
#evaluate the model using model.evaluate()
scores = model.evaluate(X_test, y_test, batch_size=64)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Test loss: 0.5411338806152344
Test accuracy: 0.7382799983024597


# **LSTM**

Instead of using an RNN, now try an LSTM model and compare the two. Which one performed better, and why?


In [8]:
model2=Sequential()

model2.add(Embedding(vocabulary_size, 32, input_length=mxlen))
model2.add(LSTM(100))
model2.add(Dense(1, activation='sigmoid'))
print(model2.summary())

model2.compile(loss='binary_crossentropy', 
             optimizer='adam', 
             metrics=['accuracy'])

batch_size = 64
num_epochs = 10
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]
model2.fit(X_train2, y_train2, validation_data=(X_valid, y_valid), batch_size=batch_size, epochs=num_epochs)

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 500, 32)           160000    
                                                                 
 lstm (LSTM)                 (None, 100)               53200     
                                                                 
 dense_1 (Dense)             (None, 1)                 101       
                                                                 
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f7f8a45dfd0>

Perform error analysis and explain using a few examples.

In [9]:
scores = model2.evaluate(X_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Test loss: 0.5329552888870239
Test accuracy: 0.8600800037384033


**Reasons for choosing these parameters**   
Since this is a binary classification task (positive or negative review) and the model ends in a single sigmoid unit that outputs a probability, binary cross-entropy is the natural loss. It penalises the model according to how far the predicted probability is from the true 0/1 label.
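
As a small worked example of this loss, the sketch below computes binary cross-entropy for a single prediction. The function name and the clipping constant are assumptions chosen for illustration, not the Keras internals.

```python
import math

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Per-example loss: -(y*log(p) + (1-y)*log(1-p)), with p clipped away from 0 and 1."""
    p = min(max(p_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

confident_right = binary_crossentropy(1, 0.9)   # true label 1, predicted 0.9
confident_wrong = binary_crossentropy(1, 0.1)   # true label 1, predicted 0.1
print(confident_right, confident_wrong)
```

A confident wrong prediction is penalised far more heavily than a confident correct one, which is the behaviour wanted from a probabilistic classifier.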

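Adam is used because it adapts the step size for each parameter from running estimates of the gradient's first and second moments, which usually works well with the default learning rate, and because it is computationally efficient with low memory overhead.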
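
To illustrate the update rule, here is a minimal sketch of Adam for a single scalar parameter. The function adam_step is a hypothetical helper written for this note; its default hyperparameters are assumed to mirror the Keras defaults (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7).

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)
```

Only two extra numbers (m and v) are stored per parameter, which is why the memory overhead stays small.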

**Evaluating RNN model on 1000 random examples**

In [10]:
import random
import numpy as np
indices = random.sample(range(1, 25000), 1000)

# counters are named result<prediction><ground truth>, e.g. result10 = predicted 1, ground truth 0
result00 = 0
result01 = 0
result10 = 0
result11 = 0

for i in indices:
    scores = model.evaluate(np.array([X_test[i]]), np.array([y_test[i]]))
    
    if scores[1]==0 and y_test[i]==0: result10+=1
    if scores[1]==1 and y_test[i]==0: result00+=1
    if scores[1]==0 and y_test[i]==1: result01+=1
    if scores[1]==1 and y_test[i]==1: result11+=1

print()
print(f"Ground truth was negative, and model was correct = {result00}")
print(f"Ground truth was negative, and model was wrong = {result10}")
print(f"Ground truth was positive, and model was correct = {result11}")
print(f"Ground truth was positive, and model was wrong = {result01}")
print()
print(f"Accuracy when ground truth is negative = {100*result00/(result00+result10):.2f}")
print(f"Accuracy when ground truth is positive = {100*result11/(result11+result01):.2f}")


Ground truth was negative, and model was correct = 358
Ground truth was negative, and model was wrong = 123
Ground truth was positive, and model was correct = 384
Ground truth was positive, and model was wrong = 135

Accuracy when ground truth is negative = 74.43
Accuracy when ground truth is positive = 73.99


**Evaluating LSTM model on 1000 random examples**

In [11]:
# counters are named result<prediction><ground truth>, e.g. result10 = predicted 1, ground truth 0
result00 = 0
result01 = 0
result10 = 0
result11 = 0

for i in indices:
  scores = model2.evaluate(np.array([X_test[i]]), np.array([y_test[i]]))

  if scores[1]==0 and y_test[i]==0: result10+=1
  if scores[1]==1 and y_test[i]==0: result00+=1
  if scores[1]==0 and y_test[i]==1: result01+=1
  if scores[1]==1 and y_test[i]==1: result11+=1

print()
print(f"Ground truth was negative, and model was correct = {result00}")
print(f"Ground truth was negative, and model was wrong = {result10}")
print(f"Ground truth was positive, and model was correct = {result11}")
print(f"Ground truth was positive, and model was wrong = {result01}")
print()
print(f"Accuracy when ground truth is negative = {100*result00/(result00+result10):.2f}")
print(f"Accuracy when ground truth is positive = {100*result11/(result11+result01):.2f}")


Ground truth was negative, and model was correct = 424
Ground truth was negative, and model was wrong = 57
Ground truth was positive, and model was correct = 436
Ground truth was positive, and model was wrong = 83

Accuracy when ground truth is negative = 88.15
Accuracy when ground truth is positive = 84.01


**Inferences**  
I sampled **1000 random test instances and evaluated accuracy separately for positive and negative reviews** to check whether either model is skewed toward one class.  
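
The per-class check can also be written as a small helper that takes all labels and predictions at once. This is a sketch with a hypothetical function name; it assumes 0/1 labels and that both classes appear in the sample.

```python
def per_class_accuracy(y_true, y_pred):
    """Return (accuracy on negatives, accuracy on positives) for 0/1 labels."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return correct[0] / total[0], correct[1] / total[1]

# Toy labels and predictions, made up for illustration.
neg_acc, pos_acc = per_class_accuracy([0, 0, 0, 1, 1], [0, 0, 1, 1, 0])
print(neg_acc, pos_acc)
```

With a trained model, the predictions could be obtained in one batch by thresholding the output of model.predict at 0.5, which avoids calling evaluate once per review.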

For both the LSTM and the RNN, accuracy on negative and positive reviews is within a few percentage points, so neither model is strongly skewed toward one class. The LSTM is more accurate than the RNN in most runs. Sentiment classification requires retaining information gathered throughout the review. An LSTM's gated cell state lets it carry information across the entire review text, whereas a simple RNN has no such long-term memory. Because of the vanishing gradient problem, the RNN cannot learn dependencies that span long sequences (here up to 500 tokens), so it fails to retain sentiment information over the whole review and its accuracy is considerably lower.
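
The vanishing gradient effect can be seen in a toy setting. The sketch below uses a one-unit tanh RNN with an assumed recurrent weight of 0.9 and a constant input of 0.1; these values are arbitrary choices for illustration. It measures how strongly the final state depends on the initial state as the sequence grows.

```python
import math

def gradient_through_time(w, inputs):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t), with h_0 = 0."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)   # chain rule: one factor per time step
    return grad

for steps in (10, 100, 500):
    print(steps, gradient_through_time(0.9, [0.1] * steps))
```

Each time step multiplies the gradient by a factor smaller than 1 in magnitude, so over 500 steps the signal from the start of a review becomes negligibly small.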


Another possible reason for the low accuracies is the quality of the input text. The cell below prints an example review decoded back to words, and it reads as incoherent; a fragment such as 'this was me best says br as genre' makes little sense. Much of this, however, comes from the decoding rather than from the data: by default imdb.load_data shifts every word index by 3 (reserving 0, 1 and 2 for padding, start-of-sequence and out-of-vocabulary tokens), while imdb.get_word_index() returns unshifted indices, so looking IDs up directly yields the wrong words. Genuine noise does remain, such as 'br' tokens left over from HTML line breaks, and every word outside the 5,000-word vocabulary is collapsed into a single unknown token. Cleaning the text and using better word embeddings (for example, pretrained ones instead of a 32-dimensional embedding learned from scratch) might improve results.

In [16]:
sent_ind = X_train[0]
sent = [id2word[i] for i in sent_ind if i!=0]
print(sent)

['the', 'as', 'you', 'with', 'out', 'themselves', 'powerful', 'lets', 'loves', 'their', 'becomes', 'reaching', 'had', 'journalist', 'of', 'lot', 'from', 'anyone', 'to', 'have', 'after', 'out', 'atmosphere', 'never', 'more', 'room', 'and', 'it', 'so', 'heart', 'shows', 'to', 'years', 'of', 'every', 'never', 'going', 'and', 'help', 'moments', 'or', 'of', 'every', 'chest', 'visual', 'movie', 'except', 'her', 'was', 'several', 'of', 'enough', 'more', 'with', 'is', 'now', 'current', 'film', 'as', 'you', 'of', 'mine', 'potentially', 'unfortunately', 'of', 'you', 'than', 'him', 'that', 'with', 'out', 'themselves', 'her', 'get', 'for', 'was', 'camp', 'of', 'you', 'movie', 'sometimes', 'movie', 'that', 'with', 'scary', 'but', 'and', 'to', 'story', 'wonderful', 'that', 'in', 'seeing', 'in', 'character', 'to', 'of', '70s', 'and', 'with', 'heart', 'had', 'shadows', 'they', 'of', 'here', 'that', 'with', 'her', 'serious', 'to', 'have', 'does', 'when', 'from', 'why', 'what', 'have', 'critics', 'they'

In [15]:
import numpy as np

sentence1 = "this movie is amazing and very good"
sentence2 = "this movie is very bad and it sucks"
sent1_array = sentence1.split()
sent2_array = sentence2.split()
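# Note: word2id holds raw indices; imdb.load_data shifts them by 3 (index_from=3),
# so these IDs do not line up with the encoding the models were trained on.
sent1_indices = [word2id[i] for i in sent1_array]
sent2_indices = [word2id[i] for i in sent2_array]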
my_training_data_x = [sent1_indices, sent2_indices]
my_training_data_x = sequence.pad_sequences(my_training_data_x, maxlen=500)
my_training_data_y = np.array([1, 0])
model2.evaluate(my_training_data_x, my_training_data_y)  # LSTM
model.evaluate(my_training_data_x, my_training_data_y)   # RNN



[0.4451318085193634, 0.5]

The cells above report per-class accuracy for positive and negative reviews. In addition, two custom sentences (one positive, one negative) were evaluated, and both the RNN and the LSTM classified only one of them correctly.
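
One thing to check when feeding custom sentences to these models is that they are encoded the same way as the training data. The sketch below shows one way to do that, assuming the default imdb.load_data settings (start token 1, unknown token 2, indices shifted by 3). The function encode_review and the toy word index are hypothetical; in the notebook the real index would come from imdb.get_word_index().

```python
def encode_review(text, word_index, vocabulary_size, index_from=3, start_char=1, oov_char=2):
    """Encode text the way imdb.load_data encodes reviews (default settings assumed)."""
    ids = [start_char]
    for word in text.lower().split():
        rank = word_index.get(word)
        idx = rank + index_from if rank is not None else oov_char
        # words outside the kept vocabulary become the out-of-vocabulary token
        ids.append(idx if idx < vocabulary_size else oov_char)
    return ids

# Tiny stand-in for imdb.get_word_index(); the ranks here are made up for illustration.
toy_index = {"the": 1, "movie": 17, "good": 49, "zyzzyva": 88000}
print(encode_review("The movie is good zyzzyva", toy_index, vocabulary_size=5000))
# [1, 4, 20, 2, 52, 2]
```

The encoded lists could then be passed through sequence.pad_sequences with maxlen=500 before evaluation. Two sentences are too few to draw conclusions either way, so a larger set of hand-written examples would give a more reliable picture.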