Permalink
Browse files

added IMDB movie rating LSTM

  • Loading branch information...
jonkrohn committed May 10, 2017
1 parent 3aed230 commit 45aa8168fdfd84e0749e5d9cdbdcdc78fa8d1c77
Showing with 203 additions and 0 deletions.
  1. +203 −0 demos-for-talks/imdb_lstm.ipynb
@@ -0,0 +1,203 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# An LSTM for IMDB Review Classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### from [Jason Brownlee](http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
}
],
"source": [
"import numpy\n",
"from keras.datasets import imdb\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense\n",
"from keras.layers import LSTM\n",
"from keras.layers.convolutional import Conv1D\n",
"from keras.layers.convolutional import MaxPooling1D\n",
"from keras.layers.embeddings import Embedding\n",
"from keras.preprocessing import sequence\n",
"numpy.random.seed(7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load data set, keeping only top *n* words, zero-ing the rest"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"top_words = 5000\n",
"(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Truncate and pad input sentences"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"max_review_length = 500\n",
"X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)\n",
"X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Start by building LSTM with three layers...\n",
"1. **embedded** with 32-length vectors to represent each word\n",
"2. **LSTM** with 100 memory units\n",
"3. **dense** output layer with single sigmoid neuron \n",
"\n",
"...and develop it to the below via the following steps:\n",
"* 3-layer LSTM: 87.7% after two epochs\n",
"* ditto but with dropout: 86.8% peak after two epochs\n",
"* ditto but with LSTM-layer specific recurrent_dropout: 84.8% peak after three epochs \n",
"* final network with convolutional layer outperforms with ~88% peak after two epochs"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"embedding_1 (Embedding) (None, 500, 32) 160000 \n",
"_________________________________________________________________\n",
"conv1d_1 (Conv1D) (None, 500, 32) 3104 \n",
"_________________________________________________________________\n",
"max_pooling1d_1 (MaxPooling1 (None, 250, 32) 0 \n",
"_________________________________________________________________\n",
"lstm_1 (LSTM) (None, 100) 53200 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 1) 101 \n",
"=================================================================\n",
"Total params: 216,405\n",
"Trainable params: 216,405\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n",
"None\n",
"Train on 25000 samples, validate on 25000 samples\n",
"Epoch 1/10\n",
"25000/25000 [==============================] - 169s - loss: 0.4265 - acc: 0.7901 - val_loss: 0.3143 - val_acc: 0.8648\n",
"Epoch 2/10\n",
"25000/25000 [==============================] - 166s - loss: 0.2553 - acc: 0.9006 - val_loss: 0.2961 - val_acc: 0.8807\n",
"Epoch 3/10\n",
"25000/25000 [==============================] - 172s - loss: 0.2133 - acc: 0.9178 - val_loss: 0.2907 - val_acc: 0.8796\n",
"Epoch 4/10\n",
"25000/25000 [==============================] - 170s - loss: 0.1833 - acc: 0.9314 - val_loss: 0.3054 - val_acc: 0.8802\n",
"Epoch 5/10\n",
"25000/25000 [==============================] - 167s - loss: 0.1554 - acc: 0.9434 - val_loss: 0.3512 - val_acc: 0.8770\n",
"Epoch 6/10\n",
"25000/25000 [==============================] - 168s - loss: 0.1347 - acc: 0.9520 - val_loss: 0.3603 - val_acc: 0.8720\n",
"Epoch 7/10\n",
"25000/25000 [==============================] - 167s - loss: 0.1195 - acc: 0.9599 - val_loss: 0.3999 - val_acc: 0.8725\n",
"Epoch 8/10\n",
"25000/25000 [==============================] - 169s - loss: 0.1018 - acc: 0.9658 - val_loss: 0.4413 - val_acc: 0.8682\n",
"Epoch 9/10\n",
"25000/25000 [==============================] - 167s - loss: 0.0873 - acc: 0.9722 - val_loss: 0.4350 - val_acc: 0.8698\n",
"Epoch 10/10\n",
"25000/25000 [==============================] - 168s - loss: 0.0753 - acc: 0.9766 - val_loss: 0.4781 - val_acc: 0.8715\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7efcc02e3a90>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embedding_vector_length = 32\n",
"model = Sequential()\n",
"model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))\n",
"model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))\n",
"model.add(MaxPooling1D(pool_size=2))\n",
"model.add(LSTM(100))\n",
"model.add(Dense(1, activation='sigmoid'))\n",
"model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
"print(model.summary())\n",
"model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=64)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

0 comments on commit 45aa816

Please sign in to comment.