# Bidirectional LSTM - IMDB sentiment classification

See **https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py**

See **https://github.com/transcranial/keras-js/blob/master/notebooks/demos/imdb_bidirectional_lstm.ipynb**

In [1]:
KERAS_MODEL_FILEPATH = 'imdb_bidirectional_lstm.h5'

## Load our libraries

In [2]:
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Input, Bidirectional
from keras.datasets import imdb
from keras.callbacks import EarlyStopping, ModelCheckpoint

import json

Using TensorFlow backend.


## Prepare our Data

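max_features - How many words from our dataset do we want to use? The 20,000 most common. Rarer words are replaced by a single "out of vocabulary" index.

maxlen - We're going to pad or truncate every review to exactly 200 words.

imdb.load_data - We're pulling from Keras's built-in 'toy' IMDB dataset and keeping only its 20,000 top words. The words are already tokenized: each one has been replaced by an integer index that reflects how common it is in the dataset, so smaller numbers mean more frequent words. (E.g. "the", the most common word, is encoded as 4, because Keras reserves the lowest indices for padding, the start-of-review marker, and out-of-vocabulary words.)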
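
To make "tokenized" concrete, here is a minimal pure-Python sketch of frequency-ranked encoding with a vocabulary cutoff. The toy corpus and the helpers build_vocab and encode are hypothetical illustrations; the constants mirror the documented defaults of imdb.load_data (start_char=1, oov_char=2, index_from=3), but this is not Keras's own preprocessing code.

```python
from collections import Counter

# Hypothetical toy corpus; the real IMDB reviews arrive already encoded.
corpus = [
    "the movie was great",
    "the movie was terrible",
    "the acting was great",
]

START_CHAR, OOV_CHAR, INDEX_FROM = 1, 2, 3  # mirrors imdb.load_data's defaults


def build_vocab(texts):
    """Map each word to its frequency rank (1 = most common), shifted by INDEX_FROM."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank + INDEX_FROM
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(text, vocab, num_words):
    """Encode a review; unknown words and indices >= num_words become OOV_CHAR.

    Index 0 is never produced here: it is left free for padding.
    """
    ids = [START_CHAR] + [vocab.get(word, OOV_CHAR) for word in text.split()]
    return [i if i < num_words else OOV_CHAR for i in ids]


vocab = build_vocab(corpus)
print(encode("the acting was great", vocab, num_words=8))  # [1, 4, 2, 5, 7]
```

Here "acting" is the least common word (index 9), so with num_words=8 it falls outside the vocabulary and is replaced by the out-of-vocabulary index 2, just as max_features=20000 does for rare words in the real data.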

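The dataset comes already split 50/50: 25,000 reviews for training and 25,000 for testing.

We now need to pad our sequences to 200 words. By default, Keras pads (and truncates) from the beginning of the sequence (padding='pre', truncating='pre'), so short reviews get leading zeros and long reviews keep only their last 200 words.

Now we have a 2-dimensional array of shape (samples, maxlen), i.e. (25000, 200).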
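
The default padding behaviour can be sketched in plain Python. The function pad_pre below is a hypothetical helper that imitates sequence.pad_sequences with padding='pre' and truncating='pre'; it is a simplified sketch, not Keras's implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pad or truncate every sequence to exactly maxlen, working from the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # truncating='pre': drop the earliest tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # padding='pre'
    return padded


print(pad_pre([[5, 6], [1, 2, 3, 4, 5]], maxlen=4))  # [[0, 0, 5, 6], [2, 3, 4, 5]]
```

Every row ends up the same length, which is what turns a ragged list of reviews into the (samples, maxlen) array printed below.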

In [3]:
max_features = 20000
maxlen = 200  # cut texts after this number of words (among top max_features most common words)

print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
y_train = np.array(y_train)
y_test = np.array(y_test)

Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 200)
X_test shape: (25000, 200)


## Define our Model Architecture

We're creating a "Sequential" model: a linear stack of layers with a single input and a single output. Data flows through each layer in order, from beginning to end.
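
A Sequential model behaves like function composition: the output of one layer is the input of the next. This pure-Python sketch shows only that idea; sequential, double and add_one are hypothetical stand-ins, not Keras internals.

```python
from functools import reduce


def sequential(*layers):
    """Build a 'model' that passes its input through each layer, first to last."""
    def model(x):
        return reduce(lambda acc, layer: layer(acc), layers, x)
    return model


# Hypothetical stand-in layers: plain functions on numbers.
def double(x):
    return 2 * x


def add_one(x):
    return x + 1


model = sequential(double, add_one)
print(model(5))  # 11: double runs first (10), then add_one (11)
```

Swapping the order gives a different result (12), which is why the order of model.add calls matters.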

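We're using a "Word Embedding" layer. It turns each word index into a dense vector of 64 numbers that are learned during training, so words used in similar ways tend to end up with similar vectors. Each review goes in as 200 integers and comes out as a 200 x 64 matrix.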
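
Conceptually, an embedding layer is a lookup table with one row per word index. The sketch below uses a hypothetical 100-word table (the notebook uses max_features = 20000) filled with random numbers that stand in for learned weights.

```python
import random

VOCAB_SIZE = 100  # toy size; the notebook uses max_features = 20000
EMBED_DIM = 64    # same output size as Embedding(max_features, 64, ...)

rng = random.Random(1337)
# One row per word index; random numbers stand in for the weights learned in training.
embedding_table = [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]


def embed(sequence, table):
    """Look up the vector for every word index in the sequence."""
    return [table[index] for index in sequence]


review = [0, 0, 4, 2, 5, 7]  # toy padded review: 6 word indices
vectors = embed(review, embedding_table)
print(len(vectors), len(vectors[0]))  # 6 64
```

A sequence of n indices becomes an n x 64 matrix, and repeated indices (like the padding zeros here) share the same row.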

Next is a Bidirectional LSTM layer. Remember, LSTMs are "smart" neurons that can remember the events that preceded them. This provides context for the use of words: "I did hate" and "I didn't hate" express very different sentiments, but both contain the word "hate". Without taking the previous word into consideration, the sentiment will be wrong. The Bidirectional wrapper runs one LSTM forward over the review and a second one backward, so each direction sees context the other misses; by default their outputs are concatenated. Each LSTM is 32 units wide, so the layer outputs 64 values. Wider layers can be smarter, but deeper architectures often provide more value. Unless you specify otherwise, the LSTM's activation function is tanh.
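
The bidirectional idea can be sketched with a much simpler recurrent cell. In this sketch a plain tanh RNN deliberately replaces the LSTM cell, and each time step carries a single toy feature instead of a 64-value embedding; run_rnn, bidirectional and the random weights are hypothetical. What it does show is reading the sequence in both directions and concatenating the two final states, as Bidirectional's default merge does.

```python
import math
import random


def run_rnn(inputs, w_x, w_h):
    """Toy tanh recurrent layer; returns the final hidden state (one value per unit).

    A plain RNN cell stands in for the LSTM cell: the point here is only the
    reading direction and how the two directions are combined.
    """
    h = [0.0] * len(w_x)
    for x in inputs:
        h = [math.tanh(wx * x + wh * hu) for wx, wh, hu in zip(w_x, w_h, h)]
    return h


def bidirectional(inputs, fwd_weights, bwd_weights):
    """Run one layer forwards and another backwards, then concatenate the results."""
    forward = run_rnn(inputs, *fwd_weights)
    backward = run_rnn(list(reversed(inputs)), *bwd_weights)
    return forward + backward


UNITS = 32
rng = random.Random(0)
fwd = ([rng.uniform(-1, 1) for _ in range(UNITS)], [rng.uniform(-1, 1) for _ in range(UNITS)])
bwd = ([rng.uniform(-1, 1) for _ in range(UNITS)], [rng.uniform(-1, 1) for _ in range(UNITS)])

sequence = [0.2, -0.7, 1.0, 0.4]  # one toy feature per time step
print(len(bidirectional(sequence, fwd, bwd)))  # 64 = 2 x 32 units
```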

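We're using a 0.5 dropout: during training, each output of the previous layer is set to zero with 50% probability, which discourages the network from relying on any single feature. Dropout is switched off when the model makes predictions. This seems a little high to me, but it's possible this dataset overfits quickly.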
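
Here is a minimal sketch of "inverted" dropout, the common formulation in which surviving values are scaled up during training so nothing needs rescaling at prediction time. The dropout function is a hypothetical re-implementation for illustration, not Keras's code.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout.

    While training, zero each value with probability `rate` and scale the
    survivors by 1 / (1 - rate), so each output keeps the input's expected
    value; at inference time, return the values untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(42)
activations = [1.0] * 10
print(dropout(activations, 0.5, training=True, rng=rng))   # a mix of 0.0 and 2.0
print(dropout(activations, 0.5, training=False, rng=rng))  # unchanged
```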

Lastly, we're adding a Dense (fully connected) layer. The number of output neurons depends on the number of "things" you're predicting. In our case the output is binary: 0 (negative) or 1 (positive). A single neuron with a sigmoid activation outputs a value between 0 and 1, and anything lower than 0.5 will be considered 0, so one neuron is all we need.
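
The sigmoid squashing and the 0.5 cut-off can be sketched as follows. The helpers sigmoid and to_label and the sample inputs are hypothetical; in Keras, model.predict returns the probability and any thresholding is up to you.

```python
import math


def sigmoid(z):
    """Map any real number into (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_label(probability, threshold=0.5):
    """Treat outputs below the threshold as 0 (negative) and the rest as 1 (positive)."""
    return 1 if probability >= threshold else 0


for z in (-3.0, 0.0, 3.0):  # hypothetical pre-activation values of the single neuron
    p = sigmoid(z)
    print(f"z={z:+.1f}  p={p:.3f}  label={to_label(p)}")
```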

Finally, we compile our model. We're using the "Adam" optimizer with the "Binary Crossentropy" loss function, the usual loss for a single sigmoid output, and we track accuracy as a metric.
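
Binary cross-entropy punishes confident wrong answers much more than hesitant ones. This is a plain re-implementation of the textbook formula for illustration (binary_crossentropy here is a hypothetical helper, and the eps clipping value is an assumption), not Keras's backend code.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over a batch.

    p is clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


labels = [1, 0, 1]
print(binary_crossentropy(labels, [0.9, 0.1, 0.8]))  # confident and right: small loss
print(binary_crossentropy(labels, [0.1, 0.9, 0.2]))  # confident and wrong: large loss
```

A prediction of 0.5 for any label costs exactly ln 2, which is the loss of a model that has learned nothing.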

In [4]:
model = Sequential()
model.add(Embedding(max_features, 64, input_length=maxlen))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
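
As a sanity check on the architecture defined above, the trainable parameters can be counted by hand. This worked calculation assumes Keras's standard LSTM layout (four gates, each with input weights, recurrent weights and one bias vector) and the default "concat" merge of Bidirectional; under those assumptions the total should match what model.summary() reports. The helper functions are hypothetical.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


emb = embedding_params(20000, 64)   # 1,280,000
bilstm = 2 * lstm_params(64, 32)    # forward + backward copies: 24,832
dense = dense_params(2 * 32, 1)     # the concatenated output is 64 wide: 65
print(emb + bilstm + dense)         # 1304897 (Dropout adds no parameters)
```

Almost all of the parameters live in the embedding table, which is why the vocabulary size (max_features) dominates the model's size.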

## Training our model

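The "checkpointer" (ModelCheckpoint) saves your model to imdb_bidirectional_lstm.h5 during training. With save_best_only=True it only overwrites the file when validation accuracy (val_acc) improves, so the file always holds the best model seen so far.

Early stopping prevents you from overtraining your model beyond the point of diminishing returns: with patience=2, training stops once validation accuracy has failed to improve for 2 epochs in a row.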
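
The interplay of the two callbacks can be simulated on a list of validation accuracies. This is a simplified sketch of "save on improvement" plus patience-based stopping, not Keras's implementation; simulate_callbacks and the accuracy values are hypothetical.

```python
def simulate_callbacks(val_acc_history, patience):
    """Return (epochs where a best-only checkpoint would save, epoch training stops).

    Epochs are 1-indexed.
    """
    best = float("-inf")
    wait = 0
    saved = []
    for epoch, acc in enumerate(val_acc_history, start=1):
        if acc > best:  # improvement: the checkpoint saves and patience resets
            best = acc
            wait = 0
            saved.append(epoch)
        else:
            wait += 1
            if wait >= patience:
                return saved, epoch
    return saved, len(val_acc_history)


# Hypothetical validation accuracies, not results from this notebook.
print(simulate_callbacks([0.86, 0.85, 0.84, 0.88], patience=2))  # ([1], 3)
```

Note that when training stops early, the saved file still holds the epoch with the best val_acc, not the last epoch.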

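Batch size is how many samples the model processes before each weight update; tweak this value a bit. Larger batches make each epoch faster but use more memory and may generalize worse; smaller batches give noisier updates and make training take much longer.

Epochs is the maximum number of times the model looks at the full training set. Here that's 10, although early stopping may end training sooner.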
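
A quick calculation shows what batch_size = 128 means for our 25,000 training reviews: each epoch is split into mini-batches, and the last one is smaller because 25,000 is not a multiple of 128.

```python
import math

num_samples = 25000  # training reviews
batch_size = 128
epochs = 10

steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch = num_samples - (steps_per_epoch - 1) * batch_size
max_updates = steps_per_epoch * epochs  # if early stopping never triggers

print(steps_per_epoch, last_batch, max_updates)  # 196 40 1960
```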

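model.fit is where we begin training our model.

X_train - our 25,000 training data points (the padded reviews)

y_train - our "ground truth" labels (the back of the flash card)

validation_data - the test set, used to compute val_acc at the end of each epoch

callbacks - the checkpointer and early stopping objects defined above, which Keras calls at the end of each epoch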

In [5]:
# Model saving callback
checkpointer = ModelCheckpoint(filepath=KERAS_MODEL_FILEPATH, monitor='val_acc', verbose=1, save_best_only=True)

# Early stopping
early_stopping = EarlyStopping(monitor='val_acc', verbose=1, patience=2)

# train
batch_size = 128
epochs = 10
model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          batch_size=batch_size, epochs=epochs, verbose=1,
          callbacks=[checkpointer, early_stopping])

Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 00003: early stopping


<keras.callbacks.History at 0x7f421566c9e8>

## Let's try out some models

**https://transcranial.github.io/keras-js/#/imdb-bidirectional-lstm**
