First, our imports. NumPy and Keras are the matrix and neural-network packages (Keras is a 'fast to develop' high-level wrapper around TensorFlow). Keras ships the IMDB dataset pre-cleaned, which lets us focus on the neural networks.

The Sequential class is just how we build the model (sequentially, adding one layer at a time). We'll import the SimpleRNN, LSTM, and GRU layers as well as a special layer known as 'Embedding'.

In [1]:
import numpy
import keras
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, SimpleRNN, GRU
from keras.layers import Embedding
from keras.preprocessing import sequence
# fix random seed for reproducibility
numpy.random.seed(7)

Using TensorFlow backend.


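Now we will keep only the top 5,000 most frequent words from the dataset; every other word is replaced with a single 'unknown' (out-of-vocabulary) code. Because the IMDB word indices are sorted by frequency, this filter is just a cutoff on the integer index, which makes the data quick to load. load_data also returns the predefined train/test split (25,000 reviews each).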
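As a rough illustration of that filter, here is a toy re-creation in plain Python. It assumes the behaviour I understand imdb.load_data to have: any word id outside [skip_top, num_words) is replaced with oov_char, which defaults to 2. keep_top_words is a hypothetical helper written for this sketch, not part of Keras, and the review ids are made up.

```python
# Toy re-creation of the num_words filter; the review ids below are made up.
OOV_CHAR = 2  # Keras's default oov_char


def keep_top_words(review, num_words, skip_top=0, oov_char=OOV_CHAR):
    """Keep ids in [skip_top, num_words); replace everything else with oov_char."""
    return [w if skip_top <= w < num_words else oov_char for w in review]


review = [1, 14, 22, 16, 6047, 530, 973]  # hypothetical encoded review
print(keep_top_words(review, num_words=5000))  # [1, 14, 22, 16, 2, 530, 973]
```

Because ids are frequency ranks, a single comparison is enough, and the largest id that can survive the filter is num_words - 1.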

In [2]:
# load the dataset but only keep the top n words; the rest become the 'unknown' index (2)
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

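Now we will make all our reviews the same length. We set the maximum review length to 500: reviews shorter than 500 tokens are padded with 0s at the beginning, and longer ones are truncated (by default pad_sequences also drops the extra tokens from the beginning). Padding goes at the beginning for RNNs because feeding a long run of 0s at the end of the sequence can wash out what the network remembered from the real words (think about how this plays out for bi-directional RNNs, though!).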
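The padding and truncation rules can be sketched in NumPy. This assumes pad_sequences' defaults (padding='pre', truncating='pre', pad value 0); pad_pre is a hypothetical helper for illustration, not the Keras implementation.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last `maxlen` tokens of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        trimmed = list(seq)[-maxlen:]  # 'pre' truncation: drop tokens from the start
        if trimmed:
            out[i, maxlen - len(trimmed):] = trimmed  # 'pre' padding: the 0s go first
    return out


print(pad_pre([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4).tolist())  # [[0, 5, 6, 7], [3, 4, 5, 6]]
```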

In [3]:
# truncate and pad input sequences
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)

Here we build a simple RNN model. The embedding layer allows us to train our own word vectors. It is, in essence, a lookup table for the word vectors: the most frequent word, "the", is encoded as 4, which means "grab row 4 of the embedding matrix as the input". Because the embedding is a weight matrix, those vectors are learned during training. (In fine detail, the 4 is equivalent to a one-hot vector with 0s everywhere except position 4, multiplied by the weight matrix, which 'selects' row 4. However, it is more efficient in implementation to index into a lookup table than to build a large one-hot matrix and multiply.)
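To see why the lookup and the one-hot multiplication give the same answer, here is a small NumPy check with a toy 10 x 3 matrix standing in for the 5000 x 100 embedding weights (the sizes and random values are made up for illustration).

```python
import numpy as np

vocab_size, embed_dim = 10, 3                 # toy sizes, not the notebook's 5000 x 100
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embed_dim))  # stand-in for the learned embedding weights

token_id = 4                                  # "the" in the Keras IMDB encoding
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

via_matmul = one_hot @ W                      # one-hot vector times the weight matrix
via_lookup = W[token_id]                      # what an embedding layer does instead

print(np.allclose(via_matmul, via_lookup))    # True
```

The matrix product touches every row of W, while the lookup touches only one, which is why embedding layers are implemented as lookups.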

Ask yourself why the parameter counts are what they are. Can you calculate how many there should be? Does it match? (Hint: it should!)

Notice that we are validating on the test data!
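If you want a number the test set never influenced, one option is to carve a validation set out of the training data and pass it to fit via validation_data. Below is a minimal sketch, assuming a simple random 80/20 split; train_val_split is a hypothetical helper and the arrays are toy stand-ins for X_train and y_train. (Keras's fit also accepts a validation_split argument, which holds out the last fraction of the training arrays instead of a random one.)

```python
import numpy as np


def train_val_split(X, y, val_fraction=0.2, seed=7):
    """Shuffle the rows, then hold out a random `val_fraction` of them for validation."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]


X = np.arange(20).reshape(10, 2)  # toy stand-in for X_train (row i is [2i, 2i+1])
y = np.arange(10) % 2             # toy stand-in for y_train
X_tr, y_tr, X_val, y_val = train_val_split(X, y)
print(X_tr.shape, X_val.shape)    # (8, 2) (2, 2)
```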

In [4]:
# create the model
embedding_vecor_length = 100
model1 = Sequential()
model1.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model1.add(SimpleRNN(100))
model1.add(Dense(1, activation='sigmoid'))
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model1.summary())
model1.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 100)          500000    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 100)               20100     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
Total params: 520,201
Trainable params: 520,201
Non-trainable params: 0
_________________________________________________________________
None
Train on 25000 samples, validate on 25000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x28b18c56438>
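Here is one way to answer the parameter question for model1, assuming the standard layer formulas: an embedding stores one vector per vocabulary entry, a SimpleRNN has an input kernel, a recurrent kernel and a bias, and a Dense layer has weights plus a bias. The helper names are made up for this sketch.

```python
def embedding_params(vocab_size, embed_dim):
    # one embed_dim-long vector per vocabulary entry
    return vocab_size * embed_dim


def simple_rnn_params(input_dim, units):
    # input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)


def dense_params(input_dim, units):
    # weight matrix + one bias per output unit
    return (input_dim + 1) * units


counts = [embedding_params(5000, 100), simple_rnn_params(100, 100), dense_params(100, 1)]
print(counts, sum(counts))  # [500000, 20100, 101] 520201
```

The three counts match the Param # column in the summary above, and their sum matches the 520,201 total.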

Now let's check how we performed on the test data overall.

In [5]:
scores = model1.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 68.72%
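Because the model was compiled with metrics=['accuracy'], evaluate returns [loss, accuracy], which is why the cell prints scores[1]. For a single sigmoid output, that accuracy is essentially the fraction of reviews whose probability lands on the correct side of 0.5. A minimal NumPy sketch with made-up predictions (binary_accuracy here is a hypothetical helper, and the exact threshold handling inside Keras may differ slightly between versions):

```python
import numpy as np


def binary_accuracy(probs, labels, threshold=0.5):
    """Fraction of reviews whose thresholded sigmoid output matches the 0/1 label."""
    preds = (np.asarray(probs).ravel() > threshold).astype(int)
    return float(np.mean(preds == np.asarray(labels)))


# toy sigmoid outputs, shaped like model.predict(...) -> (n_samples, 1)
probs = np.array([[0.9], [0.2], [0.6], [0.4]])
labels = np.array([1, 0, 0, 0])
print("Accuracy: %.2f%%" % (binary_accuracy(probs, labels) * 100))  # Accuracy: 75.00%
```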


Now we can swap out our RNN for an LSTM. Keeping everything else the same, how much does it improve (if at all)? Can you make sense of the number of parameters here?

In [7]:
embedding_vecor_length = 100
model2 = Sequential()
model2.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model2.add(LSTM(100))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model2.summary())
model2.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 500, 100)          500000    
_________________________________________________________________
lstm_2 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 101       
Total params: 580,501
Trainable params: 580,501
Non-trainable params: 0
_________________________________________________________________
None
Train on 25000 samples, validate on 25000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x28b1c65fd30>

In [8]:
scores = model2.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 85.79%
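For the LSTM, the same arithmetic applies four times over: under the standard LSTM formulation there is one input kernel, recurrent kernel and bias each for the input gate, forget gate, cell candidate and output gate. A small sketch (helper names are hypothetical):

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias for one transformation
    return units * (input_dim + units + 1)


def lstm_params(input_dim, units):
    # four such transformations: input, forget and output gates plus the cell candidate
    return 4 * simple_rnn_params(input_dim, units)


print(lstm_params(100, 100))                 # 80400
print(500000 + lstm_params(100, 100) + 101)  # 580501
```

That reproduces the 80,400 and 580,501 in the LSTM summary above.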


Now let's use a GRU cell.

In [9]:
embedding_vecor_length = 100
model3 = Sequential()
model3.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model3.add(GRU(100))
model3.add(Dense(1, activation='sigmoid'))
model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model3.summary())
model3.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 500, 100)          500000    
_________________________________________________________________
gru_1 (GRU)                  (None, 100)               60300     
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 101       
Total params: 560,401
Trainable params: 560,401
Non-trainable params: 0
_________________________________________________________________
None
Train on 25000 samples, validate on 25000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x28b238f4588>

In [10]:
scores = model3.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

KeyboardInterrupt: 
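The GRU has three such blocks (update gate, reset gate and candidate state), which gives the 60,300 in the summary above. I believe newer Keras releases default to reset_after=True, which adds a second bias per block, so the same layer would report 60,600 parameters there; the sketch below shows both, with gru_params as a hypothetical helper.

```python
def gru_params(input_dim, units, reset_after=False):
    # three transformations: update gate, reset gate and candidate state
    per_block = units * (input_dim + units + 1)
    if reset_after:
        # the reset_after variant keeps a second bias vector per transformation
        per_block += units
    return 3 * per_block


print(gru_params(100, 100))                    # 60300, as in the summary above
print(gru_params(100, 100, reset_after=True))  # 60600
```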

Now, instead of training our own vectors, let's try pre-trained GloVe vectors. First I will read them in. You must download them from the GloVe website beforehand (this notebook uses the 100-dimensional file, glove.6B/glove.6B.100d.txt).

In [14]:
import numpy as np
# map each word to its 100-d GloVe vector; each line is "word v1 v2 ... v100"
embeddings_index = dict()
with open('glove.6B/glove.6B.100d.txt', 'r', encoding='UTF-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
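Each line of a GloVe text file is a word followed by its vector components, separated by spaces. The sketch below parses that format from an in-memory file with two made-up 3-dimensional entries, and adds a dimension check so a mismatched file fails loudly; load_glove is a hypothetical helper, and with the real file you would pass the open file handle and dim=100.

```python
import io

import numpy as np


def load_glove(lines, dim):
    """Build {word: float32 vector} from lines shaped 'word v1 v2 ... v_dim'."""
    index = {}
    for line in lines:
        values = line.split()
        if len(values) != dim + 1:
            raise ValueError("expected %d values after the word, got %d" % (dim, len(values) - 1))
        index[values[0]] = np.asarray(values[1:], dtype="float32")
    return index


# two made-up 3-d entries standing in for glove.6B.100d.txt (100 values per line there)
fake_file = io.StringIO("the 0.1 -0.2 0.3\ncat 0.5 0.0 -1.5\n")
vectors = load_glove(fake_file, dim=3)
print(sorted(vectors), vectors["cat"].dtype)
```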

Now, before we continue, let's take a look at what the data really is. I like to tinker around in the data to make sure I understand what is happening. It's a good way to learn.

In [15]:
NUM_WORDS=500  # only use the top 500 words
INDEX_FROM=3   # word index offset

train,test = keras.datasets.imdb.load_data(num_words=NUM_WORDS, index_from=INDEX_FROM)
train_x,train_y = train
test_x,test_y = test

word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+INDEX_FROM) for k,v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2

id_to_word = {value:key for key,value in word_to_id.items()}
print(' '.join(id_to_word[id] for id in train_x[0] ))

<START> this film was just <UNK> <UNK> <UNK> <UNK> story direction <UNK> really <UNK> the part they played and you could just <UNK> being there <UNK> <UNK> is an amazing actor and now the same being director <UNK> father came from the same <UNK> <UNK> as <UNK> so i loved the fact there was a real <UNK> with this film the <UNK> <UNK> throughout the film were great it was just <UNK> so much that i <UNK> the film as <UNK> as it was <UNK> for <UNK> and would recommend it to everyone to watch and the <UNK> <UNK> was amazing really <UNK> at the end it was so <UNK> and you know what they say if you <UNK> at a film it must have been good and this definitely was also <UNK> to the two little <UNK> that played the <UNK> of <UNK> and <UNK> they were just <UNK> children are often left out of the <UNK> <UNK> i think because the stars that play them all <UNK> up are such a big <UNK> for the whole film but these children are amazing and should be <UNK> for what they have done don't you think the whole
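The offset logic is easier to see on a tiny vocabulary. This sketch uses a made-up subset of the word index (in the real index, 'the' has raw id 1, which is why it ends up as 4 after the offset of 3) and decodes a hand-written sequence.

```python
INDEX_FROM = 3
# made-up subset of imdb.get_word_index(): raw ids are frequency ranks starting at 1
raw_word_index = {"the": 1, "and": 2, "a": 3, "film": 19}

# shift every id by INDEX_FROM so the low ids are free for special tokens
word_to_id = {w: i + INDEX_FROM for w, i in raw_word_index.items()}
word_to_id.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2})  # id 3 stays unused
id_to_word = {i: w for w, i in word_to_id.items()}

encoded = [1, 4, 22, 2, 0]
decoded = " ".join(id_to_word.get(i, "<?>") for i in encoded)
print(decoded)  # <START> the film <UNK> <PAD>
```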

What the heck is with the 5004 vocab size? Did I go off my rocker, or is there some method to my madness? (Not crazy, my mother had me tested.)
What I do here is replace the embedding matrix with the GloVe word vectors. So when the word 'the' comes into the embedding layer, its one-hot vector (length 5004, with 0s everywhere except position 4) selects row 4 of the embedding matrix, which is where we put the GloVe vector for 'the'.

In [16]:
vocabulary_size=5004
embedding_matrix = np.zeros((vocabulary_size, 100))
for word, index in word_to_id.items():
    # skip words whose index does not fit in the embedding matrix
    if index > vocabulary_size - 1:
        continue
    embedding_vector = embeddings_index.get(word)
    # words without a GloVe vector keep an all-zero row
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector

Now we can check that the GloVe vector and the corresponding row of the embedding matrix match.

In [17]:
embedding_matrix[4]

array([-0.038194  , -0.24487001,  0.72812003, -0.39961001,  0.083172  ,
        0.043953  , -0.39140999,  0.3344    , -0.57545   ,  0.087459  ,
        0.28786999, -0.06731   ,  0.30906001, -0.26383999, -0.13231   ,
       -0.20757   ,  0.33395001, -0.33848   , -0.31742999, -0.48335999,
        0.1464    , -0.37303999,  0.34577   ,  0.052041  ,  0.44946   ,
       -0.46970999,  0.02628   , -0.54154998, -0.15518001, -0.14106999,
       -0.039722  ,  0.28277001,  0.14393   ,  0.23464   , -0.31020999,
        0.086173  ,  0.20397   ,  0.52623999,  0.17163999, -0.082378  ,
       -0.71787   , -0.41531   ,  0.20334999, -0.12763   ,  0.41367   ,
        0.55186999,  0.57907999, -0.33476999, -0.36559001, -0.54856998,
       -0.062892  ,  0.26583999,  0.30204999,  0.99774998, -0.80480999,
       -3.0243001 ,  0.01254   , -0.36941999,  2.21670008,  0.72201002,
       -0.24978   ,  0.92136002,  0.034514  ,  0.46744999,  1.10790002,
       -0.19358   , -0.074575  ,  0.23353   , -0.052062  , -0.22

In [18]:
embeddings_index['the']

array([-0.038194, -0.24487 ,  0.72812 , -0.39961 ,  0.083172,  0.043953,
       -0.39141 ,  0.3344  , -0.57545 ,  0.087459,  0.28787 , -0.06731 ,
        0.30906 , -0.26384 , -0.13231 , -0.20757 ,  0.33395 , -0.33848 ,
       -0.31743 , -0.48336 ,  0.1464  , -0.37304 ,  0.34577 ,  0.052041,
        0.44946 , -0.46971 ,  0.02628 , -0.54155 , -0.15518 , -0.14107 ,
       -0.039722,  0.28277 ,  0.14393 ,  0.23464 , -0.31021 ,  0.086173,
        0.20397 ,  0.52624 ,  0.17164 , -0.082378, -0.71787 , -0.41531 ,
        0.20335 , -0.12763 ,  0.41367 ,  0.55187 ,  0.57908 , -0.33477 ,
       -0.36559 , -0.54857 , -0.062892,  0.26584 ,  0.30205 ,  0.99775 ,
       -0.80481 , -3.0243  ,  0.01254 , -0.36942 ,  2.2167  ,  0.72201 ,
       -0.24978 ,  0.92136 ,  0.034514,  0.46745 ,  1.1079  , -0.19358 ,
       -0.074575,  0.23353 , -0.052062, -0.22044 ,  0.057162, -0.15806 ,
       -0.30798 , -0.41625 ,  0.37972 ,  0.15006 , -0.53212 , -0.2055  ,
       -1.2526  ,  0.071624,  0.70565 ,  0.49744 , 
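The two printouts differ in their trailing digits (-0.24487001 versus -0.24487) only because np.zeros creates a float64 matrix while the GloVe values were parsed as float32; copying a float32 value into a float64 array is exact, and float64 simply prints more digits of the float32 approximation. Rather than comparing by eye, np.allclose does the check. A toy version using the first three values for 'the' shown above:

```python
import numpy as np

glove_vec = np.array([-0.038194, -0.24487, 0.72812], dtype="float32")  # first 3 values for 'the'
embedding_matrix = np.zeros((5, 3))  # float64 by default
embedding_matrix[4] = glove_vec      # upcast from float32 to float64 on assignment

print(embedding_matrix.dtype, glove_vec.dtype)      # float64 float32
print(np.allclose(embedding_matrix[4], glove_vec))  # True
```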

So now I will redo the simple RNN. Notice the change to the embedding layer: I pass in the GloVe weights and freeze the layer with 'trainable=False', which means the embedding matrix is not updated during training. Notice also that I resized the Embedding layer to top_words+4 (5004 rows) so that it matches the shape of embedding_matrix.

In [None]:
# create the model
embedding_vecor_length = 100
model4 = Sequential()
model4.add(Embedding(top_words+4, embedding_vecor_length, weights=[embedding_matrix], input_length=max_review_length, trainable=False))
model4.add(SimpleRNN(100))
model4.add(Dense(1, activation='sigmoid'))
model4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model4.summary())
model4.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 500, 100)          500400    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 100)               20100     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 101       
Total params: 520,601
Trainable params: 20,201
Non-trainable params: 500,400
_________________________________________________________________
None
Train on 25000 samples, validate on 25000 samples
Epoch 1/3
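The parameter split in this summary can be checked by hand: the frozen embedding contributes (top_words + 4) x 100 non-trainable weights, and only the SimpleRNN and Dense(1) layers are trainable. A short arithmetic sketch, using the same layer formulas as for model1:

```python
vocab_rows, embed_dim, units = 5004, 100, 100  # top_words + 4, GloVe size, SimpleRNN units

frozen = vocab_rows * embed_dim                            # Embedding with trainable=False
trainable = units * (embed_dim + units + 1) + (units + 1)  # SimpleRNN + Dense(1)

print(frozen, trainable, frozen + trainable)  # 500400 20201 520601
```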

In [None]:
scores = model4.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Let's do a GRU with pre-trained vectors. The same setup applies as before: frozen GloVe weights in an embedding layer resized to top_words+4.

In [None]:
embedding_vecor_length = 100
model5 = Sequential()
model5.add(Embedding(top_words+4, embedding_vecor_length, weights=[embedding_matrix], input_length=max_review_length, trainable=False))
model5.add(GRU(100))
model5.add(Dense(1, activation='sigmoid'))
model5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model5.summary())
model5.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)

In [None]:
scores = model5.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))