
In [1]:

from keras.datasets import imdb

In [2]:
from keras.preprocessing.sequence import pad_sequences
vocab_size = 10000  # keep only the 10,000 most frequent words
maxlen = 30  # number of words kept from each review

In [3]:
# load the dataset as lists of integer word ids
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
# pad or truncate every review to exactly maxlen ids
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
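
With its default arguments, pad_sequences pads and truncates at the front of each sequence, so only the last maxlen word ids of a long review survive. The sketch below imitates that default behaviour in plain NumPy on toy data. The helper name pad_pre is hypothetical and is not part of Keras.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Imitate the default 'pre' padding and 'pre' truncation of pad_sequences."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for row, seq in enumerate(seqs):
        tail = list(seq)[-maxlen:]           # keep only the last maxlen ids
        if tail:
            out[row, -len(tail):] = tail     # right-align, zeros fill the front
    return out

toy_padded = pad_pre([[1, 2, 3, 4, 5], [7, 8]], maxlen=3)
print(toy_padded)
```

The long toy sequence loses its first two ids and the short one gains a leading 0, which is why 0 never stands for a real word in the padded arrays.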


In [4]:
print(x_train[1])

[ 371   78   22  625   64 1382    9    8  168  145   23    4 1690   15
   16    4 1355    5   28    6   52  154  462   33   89   78  285   16
  145   95]


In [5]:
print(x_train[1].shape)

(30,)


In [6]:
print(y_train[1])

0


In [7]:
import numpy as np

unique_elements, counts_elements = np.unique(y_train, return_counts=True)
print(np.asarray((unique_elements, counts_elements)))

[[    0     1]
 [12500 12500]]



Word Index Building    
Get the word index, then create key-value pairs linking each word and its word_id (12.5 points)

In [8]:
word_index = imdb.get_word_index()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


In [9]:
#ref :: https://stackoverflow.com/questions/41971587/how-to-convert-predicted-sequence-back-to-text-in-keras
# invert the word -> word_id mapping so that ids can be looked up
reverse_word_map = {word_id: word for word, word_id in word_index.items()}

In [10]:
# Takes a sequence of word ids and returns the corresponding words
def sequence_to_text(list_of_indices):
    # Look up each id in the reverse dictionary (None if the id is missing)
    words = [reverse_word_map.get(index) for index in list_of_indices]
    return words

In [11]:
#test
review = sequence_to_text(x_train[0])
print(review)

['but', 'when', 'from', 'one', 'bit', 'then', 'have', 'two', 'of', 'script', 'their', 'with', 'her', 'nobody', 'most', 'that', 'with', "wasn't", 'to', 'with', 'armed', 'acting', 'watch', 'an', 'for', 'with', 'heartfelt', 'film', 'want', 'an']
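
The decoded words above do not read like a sentence. One reason is that imdb.load_data shifts every word id by 3 by default (index_from=3) and reserves 1 for the start marker and 2 for out-of-vocabulary words, while pad_sequences uses 0 for padding. A direct lookup in the word index therefore returns the wrong word. The sketch below shows an offset-aware decoder on a toy word index. The function name decode_review and the placeholder tokens are illustrative choices, not part of Keras.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map Keras-IMDB-style ids back to words, honouring the id offset."""
    # shift every word id by index_from, as load_data does when encoding
    id_to_word = {idx + index_from: word for word, idx in word_index.items()}
    # reserved ids: 0 = padding, 1 = start of sequence, 2 = out of vocabulary
    id_to_word.update({0: "<pad>", 1: "<start>", 2: "<oov>"})
    return [id_to_word.get(int(i), "<oov>") for i in encoded]

toy_word_index = {"the": 1, "and": 2, "film": 3}
decoded = decode_review([0, 1, 4, 6, 2, 5], toy_word_index)
print(decoded)  # ['<pad>', '<start>', 'the', 'film', '<oov>', 'and']
```

In the notebook, the same function could be called as decode_review(x_train[0], word_index).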


Build Keras Embedding Layer Model     
We can think of the Embedding layer as a dictionary that maps the integer index assigned to a word to a word vector. This layer is flexible and can be used in a few ways:    

It can be used at the start of a larger deep learning model.    
It can be initialized with pre-trained word embeddings when we create our model.    
It can be trained together with the rest of the model to learn our own word vectors, in the spirit of word2vec.    

The Keras Embedding layer doesn't require us to one-hot encode our words; instead we give each word a unique integer id. For the IMDB dataset we've loaded this has already been done, but if it hadn't been, we could use sklearn's LabelEncoder.
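
To see why no one-hot encoding is needed, the sketch below treats an embedding as a plain weight matrix in NumPy. Indexing its rows with word ids gives the same result as multiplying one-hot vectors by the matrix. The sizes and the random weights are toy assumptions, not values from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_vocab, toy_dim = 6, 4
weights = rng.normal(size=(toy_vocab, toy_dim))   # one row per word id

word_ids = np.array([2, 0, 5])

# what an embedding lookup does: pick the rows for the given ids
looked_up = weights[word_ids]

# the equivalent but wasteful route: one-hot encode, then matrix-multiply
one_hot = np.eye(toy_vocab)[word_ids]
via_matmul = one_hot @ weights

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```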

In [12]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

### Create the model
model = Sequential()
# map each of the vocab_size word ids to a trainable 128-dimensional vector
model.add(Embedding(vocab_size, 128, trainable=True, input_length=maxlen))
model.add(LSTM(units=64, dropout=0.2))
model.add(Dense(32, activation='relu'))
# single sigmoid unit: predicted probability that the review is positive
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
### Fit the model (the test set doubles as validation data here)
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, batch_size=500, verbose=1)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 30, 128)           1280000   
_________________________________________________________________
lstm (LSTM)                  (None, 64)                49408     
_________________________________________________________________
dense (Dense)                (None, 32)                2080      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
Total params: 1,331,521
Trainable params: 1,331,521
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f2da994fc10>
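
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel and a bias. The variable names are illustrative.

```python
n_vocab, embed_dim, lstm_units, dense_units = 10000, 128, 64, 32

# Embedding: one embed_dim-sized vector per word id
embedding_params = n_vocab * embed_dim

# LSTM: 4 gates, each with weights for [input, previous hidden state] plus a bias
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense layers: weights plus one bias per unit
dense_params = lstm_units * dense_units + dense_units
output_params = dense_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params + output_params
print(embedding_params, lstm_params, dense_params, output_params, total_params)
# 1280000 49408 2080 33 1331521
```

Almost all of the 1,331,521 parameters sit in the embedding matrix.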


Model Accuracy   
Report the accuracy of the model on the test set

In [13]:
# Final evaluation of the model
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 75.45%


In [14]:
y_pred = model.predict(x_test)

In [15]:
print(y_pred)

[[9.9546659e-01]
 [9.9999595e-01]
 [9.9616325e-01]
 ...
 [5.8326125e-04]
 [6.6266179e-02]
 [9.9995315e-01]]


In [16]:
y_pred = np.round(y_pred, 0)

In [17]:
y_pred = y_pred.ravel()
y_pred.shape

(25000,)

In [18]:
y_pred = y_pred.astype('int64')
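
The rounding, flattening and casting done in cells 16 to 18 can also be written as a single thresholding step. This is a minimal sketch on toy probabilities. The 0.5 cut-off is an assumption that matches what np.round does for values away from exactly 0.5.

```python
import numpy as np

# toy sigmoid outputs with the same (n, 1) shape that model.predict returns
toy_probs = np.array([[0.995], [0.0005], [0.5], [0.66]])

# threshold, cast to integers and flatten to shape (n,)
toy_labels = (toy_probs > 0.5).astype("int64").ravel()
print(toy_labels)  # [1 0 0 1]
```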

In [19]:
y_test = y_test.ravel()  # ravel() returns a new view, so assign the result
y_test

array([0, 1, 1, ..., 0, 0, 0])

In [20]:
from sklearn.metrics import classification_report
# IMDB labels: 0 = negative, 1 = positive; target_names must follow label order
target_names = ['Sentiment_Negative', 'Sentiment_Positive']
print(classification_report(y_test, y_pred, target_names=target_names))

                    precision    recall  f1-score   support

Sentiment_Negative       0.75      0.76      0.76     12500
Sentiment_Positive       0.76      0.75      0.75     12500

          accuracy                           0.75     25000
         macro avg       0.75      0.75      0.75     25000
      weighted avg       0.75      0.75      0.75     25000
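
The precision and recall columns of the report can be reproduced by hand from true-positive, false-positive and false-negative counts. The sketch below does this for one class on toy labels. The helper name is hypothetical, and it assumes the class occurs at least once in both the true and the predicted labels, so there is no division by zero.

```python
import numpy as np

def per_class_precision_recall(y_true, y_hat, label):
    """Precision and recall for one class, computed from raw counts."""
    y_true, y_hat = np.asarray(y_true), np.asarray(y_hat)
    tp = np.sum((y_hat == label) & (y_true == label))  # predicted label, correct
    fp = np.sum((y_hat == label) & (y_true != label))  # predicted label, wrong
    fn = np.sum((y_hat != label) & (y_true == label))  # missed true members
    return tp / (tp + fp), tp / (tp + fn)

toy_true = [0, 0, 1, 1, 1, 0]
toy_hat = [0, 1, 1, 1, 0, 0]
precision_1, recall_1 = per_class_precision_recall(toy_true, toy_hat, label=1)
print(precision_1, recall_1)
```

classification_report lists its rows in sorted label order, so the first row always describes label 0.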



In [21]:
sequence_to_text(x_test[0])

['by',
 'are',
 'be',
 'favourites',
 'all',
 'family',
 'turn',
 'in',
 'does',
 'as',
 'three',
 'part',
 'in',
 'another',
 'some',
 'to',
 'be',
 'probably',
 'with',
 'world',
 'and',
 'her',
 'an',
 'have',
 'faint',
 'beginning',
 'own',
 'as',
 'is',
 'sequence']

In [22]:
sequence_to_text(x_test[1])

['good',
 '2',
 'which',
 'why',
 'super',
 'as',
 'it',
 'main',
 'of',
 'my',
 'i',
 'i',
 '\x96',
 'if',
 'time',
 'screenplay',
 'in',
 'same',
 'this',
 'remember',
 'assured',
 'have',
 'action',
 'one',
 'in',
 'realistic',
 'that',
 'better',
 'of',
 'lessons']

Retrieve the output of each layer of the trained Keras model for a single test sample

In [28]:
from tensorflow.keras import Input

In [29]:
from keras.models import Model

outputs = [layer.output for layer in model.layers]          # symbolic output of every layer
print(outputs)

# Wrap the trained layers in a multi-output Model; the older
# K.function([inp, K.learning_phase()], ...) recipe raised a ValueError here.
extractor = Model(inputs=model.input, outputs=outputs)

# Testing: a single test sample with a leading batch axis, shape (1, 30)
test = x_test[0][np.newaxis, ...]
layer_outs = extractor.predict(test)                        # one NumPy array per layer

[<KerasTensor: shape=(None, 30, 128) dtype=float32 (created by layer 'embedding')>, <KerasTensor: shape=(None, 64) dtype=float32 (created by layer 'lstm')>, <KerasTensor: shape=(None, 32) dtype=float32 (created by layer 'dense')>, <KerasTensor: shape=(None, 1) dtype=float32 (created by layer 'dense_1')>]