# DEEP LEARNING: IMDB MOVIE DATA SET CLASSIFICATION USING RCNN

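The IMDB Movies Dataset described in the second link below contains information about 14,762 movies. Information about these movies was downloaded with wget for the purpose of creating a movie recommendation app, and the data was preprocessed and cleaned to be ready for machine learning applications. The code in this notebook, however, loads the IMDB reviews dataset bundled with Keras (`keras.datasets.imdb`): 25,000 training and 25,000 test reviews, each labeled with a binary sentiment, with words already encoded as integer indices.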

For more information, please refer to:
- https://www.tensorflow.org/datasets/catalog/imdb_reviews
- https://github.com/orgesleka/filmempfehlung

### Libraries

In [20]:
from __future__ import print_function

In [21]:
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import Conv1D, MaxPooling1D
from keras.layers import LSTM
from keras.datasets import imdb

In [27]:
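max_features = 20000  # vocabulary size: keep only the 20,000 most frequent words
maxlen = 80           # pad or truncate every review to 80 tokens
filters = 64          # number of Conv1D output channels
batch_size = 32       # samples per gradient update
kernel_size = 5       # Conv1D window length, in tokens
pool_size = 4         # MaxPooling1D window length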

In [15]:
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

Loading data...
25000 train sequences
25000 test sequences


In [16]:
x_train.shape, x_test.shape

((25000,), (25000,))

### Sequence Padding

In [17]:
print('Pad sequences (samples by time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

Pad sequences (samples by time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)


In [18]:
x_train.shape, x_test.shape

((25000, 80), (25000, 80))
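To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`). The helper `pad_pre` is a hypothetical name used only for illustration; it is not part of Keras and handles a single sequence rather than a batch.

```python
def pad_pre(seq, maxlen, value=0):
    """Illustrative stand-in for the default behavior of pad_sequences on one sequence."""
    truncated = seq[-maxlen:]                        # 'pre' truncation: drop items from the front
    padding = [value] * (maxlen - len(truncated))    # fill values needed to reach maxlen
    return padding + truncated                       # 'pre' padding: fill values go in front


short_review = [11, 22, 33]            # shorter than maxlen
long_review = list(range(1, 11))       # ten word indices: 1..10

print(pad_pre(short_review, maxlen=5))  # [0, 0, 11, 22, 33]
print(pad_pre(long_review, maxlen=5))   # [6, 7, 8, 9, 10]
```

With `maxlen = 80`, this means reviews longer than 80 tokens keep only their last 80 tokens, and shorter reviews are left-padded with zeros.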

### Build RCNN Architecture

In this section we build the model: an embedding layer, a 1D convolutional layer, a max-pooling layer, and an LSTM layer, followed by a single sigmoid output unit. For technical details about the architecture and why this family of neural networks works the way it does, please check out [my video](https://youtu.be/zhBLiMdqOdQ).

In [29]:
print('Build model ...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(Conv1D(filters,
                 kernel_size,
                 padding='valid',
                 activation='relu',
                 strides=1))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

Build model ...
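Because the `Embedding` layer is created without a fixed input length, Keras reports the time dimension as `None`. The sketch below works out how many time steps each layer would produce for the padded length used here. It assumes the standard output-length formulas for `padding='valid'` and the Keras default of `strides=pool_size` for `MaxPooling1D`; the helper names are hypothetical.

```python
maxlen = 80       # padded review length
kernel_size = 5   # Conv1D window
pool_size = 4     # MaxPooling1D window


def conv1d_valid_length(steps, kernel, stride=1):
    """Output length of a 1D convolution with padding='valid'."""
    return (steps - kernel) // stride + 1


def max_pool_length(steps, pool):
    """Output length of 1D max pooling with stride equal to the pool size."""
    return (steps - pool) // pool + 1


after_conv = conv1d_valid_length(maxlen, kernel_size)
after_pool = max_pool_length(after_conv, pool_size)

print('time steps after Conv1D:', after_conv)        # 76
print('time steps after MaxPooling1D:', after_pool)  # 19
```

Under these assumptions, the LSTM reads a sequence of 19 steps with 64 features each instead of 80 embedded tokens, which is the main reason for placing the convolution and pooling in front of it.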


In [32]:
print('Summarize ...')
model.summary()

Summarize ...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, None, 128)         2560000   
_________________________________________________________________
conv1d_3 (Conv1D)            (None, None, 64)          41024     
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, None, 64)          0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 128)               98816     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 129       
Total params: 2,699,969
Trainable params: 2,699,969
Non-trainable params: 0
_________________________________________________________________
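The parameter counts in the summary can be checked by hand. The following sketch recomputes them from the hyperparameters, using the usual formulas: weights plus biases for the convolution and dense layers, and four gates for the LSTM. The variable names `embedding_dim` and `lstm_units` are introduced here for readability; in the notebook both values are the literal `128`.

```python
max_features = 20000   # vocabulary size
embedding_dim = 128    # Embedding output size
filters = 64           # Conv1D output channels
kernel_size = 5        # Conv1D window
lstm_units = 128       # LSTM hidden size

# Embedding: one vector of size embedding_dim per vocabulary entry.
embedding_params = max_features * embedding_dim

# Conv1D: kernel_size * input_channels weights per filter, plus one bias per filter.
conv_params = kernel_size * embedding_dim * filters + filters

# LSTM: four gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((filters + lstm_units) * lstm_units + lstm_units)

# Dense: one weight per LSTM unit plus a bias, for a single output.
dense_params = lstm_units * 1 + 1

total = embedding_params + conv_params + lstm_params + dense_params
print(embedding_params, conv_params, lstm_params, dense_params, total)
# 2560000 41024 98816 129 2699969
```

Almost 95% of the parameters sit in the embedding layer, so the vocabulary size `max_features` dominates the model size.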


In [30]:
print('Compile ...')
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Compile ...


### Train, Test, and Performance

In [31]:
print('Train ...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Train ...
Train on 25000 samples, validate on 25000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Test score: 1.0776984479618072
Test accuracy: 0.81656
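A test loss above 1.0 alongside roughly 82% accuracy may look contradictory. The sketch below is a pure-Python version of binary cross-entropy and accuracy on made-up predictions; it shows how a few confident wrong predictions can push the average loss up while accuracy stays the same. Whether this is what happened in the run above is an assumption, since the per-epoch metrics are not shown. The toy data and the clipping constant `eps` are illustrative choices.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [1, 1, 1, 1, 0]
cautious = [0.7, 0.7, 0.7, 0.7, 0.6]             # one mild mistake on the last sample
overconfident = [0.99, 0.99, 0.99, 0.99, 0.999]  # one very confident mistake

for name, preds in [('cautious', cautious), ('overconfident', overconfident)]:
    print(name, 'accuracy:', accuracy(y_true, preds),
          'loss:', round(binary_crossentropy(y_true, preds), 3))
```

Both toy models classify four out of five samples correctly, yet the overconfident one has a much higher loss. A model trained for many epochs can drift in this direction, so tracking validation loss per epoch is a reasonable next step.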


Investigation ends here.