# Instructions

1. Go to https://colab.research.google.com and choose the "Upload" option to upload this notebook file.
1. In the Edit menu, choose "Notebook Settings" and then set the "Hardware Accelerator" dropdown to GPU.
1. Read through the code in the following sections:
  * [IMDB Dataset](#scrollTo=mXcb24B6a03_)
  * [Define model](#scrollTo=kAz68ipVa05_)
  * [Train model](#scrollTo=kIynp1v_a06Y)
  * [Assess model](#scrollTo=ALyNCqx4a06r)
1. Complete at least one of these exercises. Remember to keep notes about what you do!
  * [Exercise Option #1 - Standard Difficulty](#scrollTo=_9dsjJwya06_)
  * [Exercise Option #2 - Advanced Difficulty](#scrollTo=nyZbljLAa09z)

## Documentation/Sources
* [Class Notes](https://jennselby.github.io/MachineLearningCourseNotes/#recurrent-neural-networks)
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

In [1]:
# upgrade tensorflow to tensorflow 2
%tensorflow_version 2.x
# display matplotlib plots
%matplotlib inline
from tensorflow import test
from tensorflow import device

# IMDB Dataset
The [IMDB dataset](https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification) consists of movie reviews (x_train) that have been marked as positive or negative (y_train). See the [Word Vectors Tutorial](https://github.com/jennselby/MachineLearningTutorials/blob/master/WordVectors.ipynb) for more details on the IMDB dataset.

In [2]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

In [3]:
(imdb_x_train, imdb_y_train), (imdb_x_test, imdb_y_test) = imdb.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz




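For a standard Keras model, every input has to be the same length, so we need to choose a maximum length and cut longer reviews down to it. (We also need to pad the shorter reviews with zeros to make them the same length.) With its default settings, pad_sequences drops words from the start of a long review and adds the zeros at the front of a short one.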

In [4]:
cutoff = 500
imdb_x_train_padded = sequence.pad_sequences(imdb_x_train, maxlen=cutoff)
imdb_x_test_padded = sequence.pad_sequences(imdb_x_test, maxlen=cutoff)

# see https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset
imdb_index_offset = 3
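
To make the padding and truncation concrete, here is a minimal pure-Python sketch that mimics the default behavior of `pad_sequences` (pad at the front, truncate from the front). The helper name `pad_or_truncate` and the toy reviews are made up for illustration; the notebook itself uses the Keras function.

```python
def pad_or_truncate(seq, maxlen, pad_value=0):
    """Mimic the pad_sequences defaults for one sequence (assumes maxlen > 0)."""
    trimmed = seq[-maxlen:]  # keep only the last `maxlen` word indices
    return [pad_value] * (maxlen - len(trimmed)) + trimmed  # zeros go in front

short_review = [1, 14, 22]                   # toy word indices, not real IMDB data
long_review = [1, 14, 22, 16, 43, 530, 973]

padded_short = pad_or_truncate(short_review, maxlen=5)
padded_long = pad_or_truncate(long_review, maxlen=5)

print(padded_short)  # [0, 0, 1, 14, 22]
print(padded_long)   # [22, 16, 43, 530, 973]
```

Both results have length 5, which is what lets them be stacked into a single input array.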

In [5]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Define model

Unlike last time, when we used convolutional layers, we're going to use an LSTM, a special type of recurrent network.

Using recurrent networks means that rather than seeing these reviews as one input happening all at once, with the convolutional layers taking into account which words are next to each other, we are going to see them as a sequence of inputs, with one word occurring at each timestep.

In [6]:
imdb_lstm_model = Sequential()
imdb_lstm_model.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
# return_sequences=True tells the LSTM to output the full sequence (one output per timestep), for use by the
# next LSTM layer. The final LSTM layer should return only its last output, for use in the Dense output layer
imdb_lstm_model.add(LSTM(units=32, return_sequences=True))
imdb_lstm_model.add(LSTM(units=32))
imdb_lstm_model.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
imdb_lstm_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
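
As a rough picture of what "one word per timestep" and `return_sequences` mean, the sketch below runs a tiny scalar recurrent layer in plain Python. It is a simple tanh recurrence, not an LSTM, and the function name `simple_rnn` and the weights are assumptions chosen only for illustration.

```python
import math

def simple_rnn(inputs, w_x=0.5, w_h=0.8, return_sequences=False):
    """Scalar recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1}), starting from h = 0."""
    h = 0.0
    outputs = []
    for x in inputs:  # one input per timestep, processed in order
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    # Full sequence for stacking another recurrent layer, else only the last state
    return outputs if return_sequences else h

sequence_in = [1.0, 0.0, -1.0]
all_steps = simple_rnn(sequence_in, return_sequences=True)
last_step = simple_rnn(sequence_in)
reversed_last = simple_rnn(sequence_in[::-1])

print(len(all_steps))              # one output per timestep
print(last_step == all_steps[-1])  # the final state is the last element
print(last_step != reversed_last)  # the order of the inputs matters
```

The same inputs in a different order give a different final state, which is the property that separates a recurrent layer from a bag-of-words view of the review.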


# Train model

In [7]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_lstm_model.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



# Assess model

In [8]:
with device('/device:GPU:0'):
  imdb_lstm_scores = imdb_lstm_model.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_lstm_scores))

loss: 0.3066539764404297 accuracy: 0.8712800145149231


# Exercise Option #1 - Standard Difficulty

Experiment with model configurations different from the one above. Try other recurrent layers, different numbers of layers, or changes to some of the defaults. See [Keras Recurrent Layers](https://keras.io/layers/recurrent/).

__Keep notes on what you try and what results you get.__

87% is already a high bar to beat, but I'll begin by adding another LSTM layer.

In [9]:
imdb_model2 = Sequential()
imdb_model2.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
imdb_model2.add(LSTM(units=32, return_sequences=True))
imdb_model2.add(LSTM(units=32, return_sequences=True))
imdb_model2.add(LSTM(units=32))
imdb_model2.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
imdb_model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [10]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_model2.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



In [12]:
with device('/device:GPU:0'):
  imdb_scores2 = imdb_model2.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_scores2))

loss: 0.3453221023082733 accuracy: 0.8535199761390686


This gives around the same accuracy, if not slightly worse. For the next model, I tried adding a dense layer after the LSTM layers to see whether it would help process the LSTM output better than a single output layer does.

In [13]:
imdb_model3 = Sequential()
imdb_model3.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
imdb_model3.add(LSTM(units=32, return_sequences=True))
imdb_model3.add(LSTM(units=32))
imdb_model3.add(Dense(units=64))
imdb_model3.add(Dense(units=1, activation='sigmoid'))
imdb_model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [14]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_model3.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



In [15]:
with device('/device:GPU:0'):
  imdb_scores3 = imdb_model3.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_scores3))

loss: 0.3084729015827179 accuracy: 0.8751199841499329


This gave a very similar accuracy. Then I tried a wider model with more LSTM units per layer, thinking the extra capacity might help.

In [16]:
imdb_model4 = Sequential()
imdb_model4.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
imdb_model4.add(LSTM(units=64, return_sequences=True))
imdb_model4.add(LSTM(units=64))
imdb_model4.add(Dense(units=1, activation='sigmoid'))
imdb_model4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [17]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_model4.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



In [18]:
with device('/device:GPU:0'):
  imdb_scores4 = imdb_model4.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_scores4))

loss: 0.3413776457309723 accuracy: 0.8623999953269958


This did not help much either. Lastly, I thought the model might just be overcomplicated, so I lowered the number of units and added a small dense layer.

In [19]:
imdb_model5 = Sequential()
imdb_model5.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
imdb_model5.add(LSTM(units=16, return_sequences=True))
imdb_model5.add(LSTM(units=16))
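# The recorded run called imdb_model3.add(...) here by mistake, so the results and summary below
# are for a model without this layer.
imdb_model5.add(Dense(units=16))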
imdb_model5.add(Dense(units=1, activation='sigmoid'))
imdb_model5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [20]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_model5.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



In [21]:
with device('/device:GPU:0'):
  imdb_scores5 = imdb_model5.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_scores5))

loss: 0.3122839331626892 accuracy: 0.8696399927139282


Again, this did not improve the model much. I've tried several changes, but each has had minimal effect on accuracy, which makes me think that this dataset may have something of an upper bound on accuracy, at least for LSTMs trained for a single epoch.
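
To put "minimal effect" into numbers, the sketch below collects the test accuracies printed by the evaluation cells above and reports the spread. The configuration labels are shorthand added here, and the last one follows the printed model summary rather than the intended design.

```python
# Test accuracies copied from the evaluation outputs above (one epoch each)
recorded_accuracy = {
    "2x LSTM(32) (original)": 0.8712800145149231,
    "3x LSTM(32)": 0.8535199761390686,
    "2x LSTM(32) + Dense(64)": 0.8751199841499329,
    "2x LSTM(64)": 0.8623999953269958,
    "2x LSTM(16)": 0.8696399927139282,
}

best = max(recorded_accuracy, key=recorded_accuracy.get)
worst = min(recorded_accuracy, key=recorded_accuracy.get)
spread = recorded_accuracy[best] - recorded_accuracy[worst]

for name, accuracy in sorted(recorded_accuracy.items(), key=lambda item: -item[1]):
    print(f"{name:26s} {accuracy:.4f}")
print(f"best: {best}, worst: {worst}, spread: {spread:.4f}")
```

The best and worst configurations differ by about two percentage points. Since each model was trained once for a single epoch, some of that gap may simply be run-to-run variation.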

In [39]:
imdb_model5.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 500, 100)          8858700   
_________________________________________________________________
lstm_9 (LSTM)                (None, 500, 16)           7488      
_________________________________________________________________
lstm_10 (LSTM)               (None, 16)                2112      
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 17        
Total params: 8,868,317
Trainable params: 8,868,317
Non-trainable params: 0
_________________________________________________________________
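
The parameter counts in this summary can be checked by hand. The sketch below assumes the standard LSTM layout (four weight blocks, each with input weights, recurrent weights, and a bias). The vocabulary size 88,587 is inferred from the embedding row (8,858,700 / 100), and the helper names are made up for this example.

```python
def embedding_params(vocab_size, output_dim):
    # One vector of length output_dim per vocabulary entry
    return vocab_size * output_dim

def lstm_params(input_dim, units):
    # Four blocks (input, forget, and output gates plus the cell candidate),
    # each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

layer_counts = [
    embedding_params(88587, 100),  # embedding_4
    lstm_params(100, 16),          # lstm_9
    lstm_params(16, 16),           # lstm_10
    dense_params(16, 1),           # dense_6
]
total_params = sum(layer_counts)

print(layer_counts)
print(total_params)
```

The four counts reproduce the rows of the summary (8858700, 7488, 2112, 17) and add up to its total of 8,868,317. Nearly all of the parameters sit in the embedding layer, which may be part of why resizing the LSTM layers changes so little.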


# Exercise Option #2 - Advanced Difficulty

Set up your own RNN model for the Reuters Classification Problem

Take the model from exercise 1 (imdb_lstm_model) and modify it to classify the [Reuters data](https://keras.io/datasets/#reuters-newswire-topics-classification).

Think about what you are trying to predict in this case, and how you will have to change your model to deal with this.

In [22]:
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [23]:
(reuters_x_train, reuters_y_train), (reuters_x_test, reuters_y_test) = reuters.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz




In [24]:
from tensorflow.keras.utils import to_categorical

In [26]:
y_onehot_train = to_categorical(reuters_y_train)
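
As a rough picture of what the one-hot encoding looks like, here is a plain-Python sketch. The helper `one_hot` is a hypothetical stand-in for `to_categorical`, and the three labels are toy values; the Reuters dataset has 46 topic classes.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

encoded = one_hot([3, 0, 45], num_classes=46)

print(len(encoded), len(encoded[0]))  # 3 46
print(encoded[0][:5])                 # [0.0, 0.0, 0.0, 1.0, 0.0]
```

One design note: `to_categorical` infers the number of classes from the largest label it sees unless `num_classes` is passed, so passing `num_classes=46` explicitly is a safe way to keep the train and test encodings the same width.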

In [27]:
imdb_offset = 3
reuters_map = dict((index + imdb_offset, word) for (word, index) in reuters.get_word_index().items())
reuters_map[0] = 'PADDING'
reuters_map[1] = 'START'
reuters_map[2] = 'UNKNOWN'

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters_word_index.json


In [29]:
' '.join([reuters_map[word_index] for word_index in reuters_x_train[0]])

'START mcgrath rentcorp said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3'
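
The offset of 3 exists because the dataset loader reserves indices 0, 1, and 2 for padding, the start marker, and unknown words, and shifts every real word index up by 3. A minimal sketch with a made-up three-word vocabulary (the real one comes from `reuters.get_word_index()`):

```python
INDEX_OFFSET = 3  # default shift applied to word indices by load_data()

# Hypothetical toy vocabulary in the same word -> index form as get_word_index()
toy_word_index = {"the": 1, "said": 2, "company": 3}

index_to_word = {index + INDEX_OFFSET: word for word, index in toy_word_index.items()}
index_to_word[0] = "PADDING"
index_to_word[1] = "START"
index_to_word[2] = "UNKNOWN"

def decode(encoded):
    """Map a list of encoded indices back to words."""
    return " ".join(index_to_word.get(i, "UNKNOWN") for i in encoded)

decoded = decode([1, 4, 6, 5, 2, 0])
print(decoded)  # START the company said UNKNOWN PADDING
```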

In [32]:
x_train_padded = sequence.pad_sequences(reuters_x_train, maxlen=cutoff)
x_test_padded = sequence.pad_sequences(reuters_x_test, maxlen=cutoff)

In [33]:
reuters_model = Sequential()
reuters_model.add(Embedding(input_dim=len(reuters_map), output_dim=100, input_length=cutoff))
reuters_model.add(LSTM(units=32, return_sequences=True))
reuters_model.add(LSTM(units=32))
reuters_model.add(Dense(units=64, activation='relu'))
reuters_model.add(Dense(units=46, activation='softmax'))
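# A 46-way softmax with one-hot labels calls for the categorical loss and metric.
# The recorded output below came from loss='binary_crossentropy' with metrics=['binary_accuracy'].
reuters_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])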

In [34]:
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  reuters_model.fit(x_train_padded, y_onehot_train, epochs=1, batch_size=64)



In [36]:
y_onehot_test = to_categorical(reuters_y_test)

In [37]:
with device('/device:GPU:0'):
  reuters_scores = reuters_model.evaluate(x_test_padded, y_onehot_test)
print('loss: {} accuracy: {}'.format(*reuters_scores))

loss: 0.07233385741710663 accuracy: 0.9782605171203613


In [38]:
reuters_model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 500, 100)          3098200   
_________________________________________________________________
lstm_11 (LSTM)               (None, 500, 32)           17024     
_________________________________________________________________
lstm_12 (LSTM)               (None, 32)                8320      
_________________________________________________________________
dense_7 (Dense)              (None, 64)                2112      
_________________________________________________________________
dense_8 (Dense)              (None, 46)                2990      
Total params: 3,128,646
Trainable params: 3,128,646
Non-trainable params: 0
_________________________________________________________________


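This is nearly exactly the same accuracy as what I got for the CNN word embedding models in the previous project (0.9782605171203613), and a high one, so I wondered whether there is a strict upper bound on how accurately these Reuters articles can be classified, and whether we've hit it. However, 0.97826 is almost exactly 45/46, which is the binary accuracy a 46-class model gets whenever all of its outputs stay below 0.5: 45 of the 46 entries in each one-hot label are zeros and count as correct. The match therefore says more about the metric than about the data, and categorical accuracy would be the number to compare.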
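
One way to sanity-check the 0.978 figure is to see what binary accuracy rewards on one-hot labels. The sketch below uses plain Python and a hypothetical model that always emits the same low-confidence softmax vector. The helper functions are illustrative stand-ins, not the Keras metrics themselves, although they follow the same definitions (a 0.5 threshold for binary accuracy, argmax for categorical accuracy).

```python
NUM_CLASSES = 46

def one_hot(label):
    return [1.0 if i == label else 0.0 for i in range(NUM_CLASSES)]

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs that match after thresholding."""
    hits = [
        (p > threshold) == (t == 1.0)
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    ]
    return sum(hits) / len(hits)

def categorical_accuracy(y_true, y_pred):
    """Fraction of examples whose highest-scoring class is the true class."""
    hits = [
        pred_row.index(max(pred_row)) == true_row.index(1.0)
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(hits) / len(hits)

# Hypothetical model: always predicts class 3 with probability 0.4
constant_prediction = [0.4 if i == 3 else 0.6 / 45 for i in range(NUM_CLASSES)]
labels = [3, 4, 3]  # toy labels, not real Reuters data
y_true = [one_hot(label) for label in labels]
y_pred = [constant_prediction for _ in labels]

binary_score = binary_accuracy(y_true, y_pred)
categorical_score = categorical_accuracy(y_true, y_pred)

print(f"binary accuracy:      {binary_score:.4f}")
print(f"categorical accuracy: {categorical_score:.4f}")
```

Here binary accuracy comes out at 45/46 (about 0.9783) even though only two of the three toy articles get the right topic, so categorical accuracy is probably the more informative metric for this problem.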