# DEEP LEARNING: IMDB MOVIE DATA SET CLASSIFICATION USING RCNN

The IMDB Movies Dataset contains information about 14,762 movies. It was downloaded with wget to build a movie recommendation app, then preprocessed and cleaned for machine learning use. The code in this notebook, however, loads the IMDB review dataset bundled with Keras (`imdb.load_data`), which provides 25,000 training and 25,000 test reviews for binary sentiment classification.

For more information, please refer to:
- https://www.tensorflow.org/datasets/catalog/imdb_reviews
- https://github.com/orgesleka/filmempfehlung

### Libraries

In [1]:
from __future__ import print_function

In [2]:
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import Conv1D, MaxPooling1D
from keras.layers import LSTM
from keras.datasets import imdb

Using TensorFlow backend.


In [3]:
max_features = 20000
maxlen = 80
filters = 64
batch_size = 32
kernel_size = 5
pool_size = 4

In [4]:
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
25000 train sequences
25000 test sequences


In [5]:
x_train.shape, x_test.shape

((25000,), (25000,))

### Sequence Padding

In [6]:
print('Pad sequences (samples by time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

Pad sequences (samples by time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)
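
To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`). The helper name `pad_pre` is hypothetical and is not part of Keras; it only illustrates the behavior, assuming those defaults.

```python
def pad_pre(seqs, maxlen, value=0):
    """Illustrative stand-in for pad_sequences with its default settings.

    Short sequences get `value` added at the front; long sequences keep
    only their last `maxlen` items (truncation also happens at the front).
    """
    padded = []
    for seq in seqs:
        kept = list(seq)[-maxlen:]                # drop items from the front if too long
        fill = [value] * (maxlen - len(kept))     # zeros go in front if too short
        padded.append(fill + kept)
    return padded


example = pad_pre([[1, 2, 3], [1, 2, 3, 4, 5, 6, 7]], maxlen=5)
print(example)  # [[0, 0, 1, 2, 3], [3, 4, 5, 6, 7]]
```

With `maxlen = 80`, this means only the last 80 word indices of each review reach the model.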


In [7]:
x_train.shape, x_test.shape

((25000, 80), (25000, 80))

### Build RCNN Architecture

In this section we stack an embedding layer, three Conv1D layers (each followed by a MaxPooling1D layer), an LSTM layer, and a sigmoid output layer. For technical details about the architecture and why this family of neural networks works the way it does, please check out [my video](https://youtu.be/zhBLiMdqOdQ).

In [24]:
print('Build model ...')
model = Sequential()
model.add(Embedding(max_features, 128*2))
model.add(Conv1D(filters, kernel_size, padding='valid', activation='relu', strides=1))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, kernel_size, padding='valid', activation='relu'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, 2, padding='valid', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

Build model ...
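
The model summary reports the time dimension as `None`, so it can help to trace it by hand. The sketch below assumes the settings used above (`maxlen = 80`, `padding='valid'`, convolution stride 1, and the `MaxPooling1D` default of `strides=pool_size`). The helper names are hypothetical.

```python
def conv1d_valid_len(length, kernel_size, stride=1):
    # Output length of a 'valid' 1D convolution.
    return (length - kernel_size) // stride + 1


def maxpool1d_valid_len(length, pool_size):
    # Output length of 'valid' max pooling with stride equal to pool_size.
    return (length - pool_size) // pool_size + 1


def trace_lengths(maxlen):
    # (layer kind, kernel or pool size), in the order the model stacks them.
    layers = [("conv", 5), ("pool", 4), ("conv", 5), ("pool", 4), ("conv", 2), ("pool", 2)]
    lengths = [maxlen]
    for kind, size in layers:
        prev = lengths[-1]
        if kind == "conv":
            lengths.append(conv1d_valid_len(prev, size))
        else:
            lengths.append(maxpool1d_valid_len(prev, size))
    return lengths


print(trace_lengths(80))  # [80, 76, 19, 15, 3, 2, 1]
```

Under these assumptions, the 80-step input shrinks to a single time step before it reaches the LSTM, so the recurrent layer sees very little sequence. A larger `maxlen` or fewer pooling stages would leave it more steps to work with.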


In [25]:
print('Summarize ...')
model.summary()

Summarize ...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, None, 256)         5120000   
_________________________________________________________________
conv1d_12 (Conv1D)           (None, None, 64)          81984     
_________________________________________________________________
max_pooling1d_12 (MaxPooling (None, None, 64)          0         
_________________________________________________________________
conv1d_13 (Conv1D)           (None, None, 64)          20544     
_________________________________________________________________
max_pooling1d_13 (MaxPooling (None, None, 64)          0         
_________________________________________________________________
conv1d_14 (Conv1D)           (None, None, 64)          8256      
_________________________________________________________________
max_pooling1d_14 (MaxPooling (None, None, 64)          0      
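
The parameter counts in the summary can be reproduced with a few lines of arithmetic. This sketch uses the standard formulas for embedding, 1D convolution, LSTM, and dense layers with biases. The function names are hypothetical. The LSTM and Dense rows do not appear in the summary output above, so those two values are computed here rather than copied from it.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim


def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel, filter), plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    return input_dim * units + units


counts = {
    "embedding": embedding_params(20000, 256),  # 5120000
    "conv1d_1": conv1d_params(5, 256, 64),      # 81984
    "conv1d_2": conv1d_params(5, 64, 64),       # 20544
    "conv1d_3": conv1d_params(2, 64, 64),       # 8256
    "lstm": lstm_params(64, 128),               # 98816 (computed, not shown in the summary)
    "dense": dense_params(128, 1),              # 129 (computed, not shown in the summary)
}
print(counts)
print("total:", sum(counts.values()))  # total: 5329729
```

The embedding layer accounts for almost all of the parameters, which is typical when the vocabulary is large.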

In [26]:
print('Compile ...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Compile ...
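
For readers unfamiliar with the `binary_crossentropy` loss, here is a minimal pure-Python sketch of the quantity it averages over a batch. It is an illustration rather than the Keras implementation. The function name and the clipping constant `eps` are assumptions made for this example.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# An undecided prediction of 0.5 costs log(2), about 0.693, whatever the label.
print(binary_crossentropy([1], [0.5]))

# A confident wrong prediction costs far more than a confident correct one.
print(binary_crossentropy([1], [0.01]) > binary_crossentropy([1], [0.99]))  # True
```

Because confident mistakes are penalized heavily, the loss can rise well above log(2) even while accuracy stays above 50%.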


### Train, Test, and Performance

In [27]:
print('Train ...')
model.fit(x_train, y_train, batch_size=batch_size, epochs=30, validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Train ...
Train on 25000 samples, validate on 25000 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Test score: 1.78482596172
Test accuracy: 0.74336


Investigation ends here.