# Introduction to LSTMs

In this lesson we implement a simple LSTM model that classifies a short sentence as being about cats or about dogs.

## Defining the dataset

Here we define a small toy dataset. `x` is a list of sentences and `y` holds the label for each sentence (0 means cat, 1 means dog). The rows are then shuffled with a fixed random seed.

In [1]:
import pandas as pd

x = [
    'i like cats', 'cats are the best', 'felines are evil', 'there is a cat in the patio', 'i hate cats',
    'dogs are loyal', 'my dog is my best friend', 'the puppy ate my homework', 'my dog barks at the moon', 'the puppy barked at the mailman'
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

df = pd.DataFrame({'text': x, 'label': y})
df = df.sample(frac=1.0, random_state=100)
df.head(n=10)

index,label,text
7,1,the puppy ate my homework
6,1,my dog is my best friend
1,0,cats are the best
5,1,dogs are loyal
4,0,i hate cats
2,0,felines are evil
0,0,i like cats
3,0,there is a cat in the patio
9,1,the puppy barked at the mailman
8,1,my dog barks at the moon


In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
import tensorflow.keras.backend as K

K.clear_session()
# Without these lines I get 
# > UnknownError:  [_Derived_]  Fail to find the dnn implementation.
# >  [[{{node CudnnRNN}}]]
# >  [[model/lstm/StatefulPartitionedCall]] [Op:__inference_distributed_function_5819]
gpus = tf.config.experimental.list_physical_devices('GPU')
# Loop instead of indexing gpus[0], which raises IndexError on CPU-only machines
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

## Preprocessing text

Here we fit a tokenizer to the text, convert the words to integer IDs, and pad (or truncate) every sentence to a fixed length of 6 tokens. Finally, each ID is one-hot encoded, so every sentence becomes a 6 × 28 matrix.

In [4]:
x = df['text'].tolist()
y = df['label'].to_numpy()

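n_vocab = len(set(' '.join(x).split(' ')))  # number of distinct words
# Tokenizer IDs start at 1 (0 is reserved for padding), so num_words=n_vocab
# keeps IDs 1..n_vocab-1 and drops the least frequent word.
tok = Tokenizer(num_words=n_vocab)
tok.fit_on_texts(x)

n_pad = 6  # fixed sentence length, in tokens
seq_x = tok.texts_to_sequences(x)           # words -> integer IDs
pad_x = pad_sequences(seq_x, maxlen=n_pad)  # left-pad with 0, or truncate from the front
oh_x = to_categorical(pad_x, num_classes=n_vocab)  # one-hot encode each ID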
print(oh_x.shape)

(10, 6, 28)
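To make the three preprocessing steps concrete, here is a minimal sketch in plain Python. It imitates the default behaviour of the Keras utilities (IDs start at 1, padding and truncation happen at the front) but is not their implementation. The helper names `build_word_index`, `pad_pre` and `one_hot` are made up for illustration, and the exact IDs Keras assigns may differ.

```python
from collections import Counter

def build_word_index(sentences):
    """Map each word to an integer ID starting at 1, most frequent first (0 is kept for padding)."""
    counts = Counter(word for s in sentences for word in s.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def pad_pre(ids, maxlen):
    """Keep at most the last `maxlen` IDs, then left-pad with zeros up to `maxlen`."""
    ids = ids[-maxlen:]
    return [0] * (maxlen - len(ids)) + ids

def one_hot(ids, num_classes):
    """Turn each ID into a 0/1 vector of length `num_classes`."""
    return [[1 if j == i else 0 for j in range(num_classes)] for i in ids]

sentences = ['i like cats', 'there is a cat in the patio']
word_index = build_word_index(sentences)

# A 3-word sentence is padded on the left; a 7-word sentence loses its first word.
short_ids = pad_pre([word_index[w] for w in sentences[0].split()], maxlen=6)
long_ids = pad_pre([word_index[w] for w in sentences[1].split()], maxlen=6)
print(short_ids)  # [0, 0, 0, 1, 2, 3]
print(long_ids)   # [5, 6, 7, 8, 9, 10]

# One row per token, one column per ID (including the padding ID 0)
encoded = one_hot(short_ids, num_classes=len(word_index) + 1)
print(len(encoded), len(encoded[0]))  # 6 11
```

Note that the sketch uses `len(word_index) + 1` columns so that every word ID, plus the padding ID 0, has its own column.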


## Defining the model

We define a very simple model: a single LSTM layer with 50 units, followed by a dense layer with one sigmoid unit that outputs the probability that the sentence is about dogs (label 1).
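As a rough picture of what the LSTM layer computes, the sketch below runs one sentence through a hand-written LSTM cell in NumPy. It is a simplified illustration, not the Keras implementation: the weights are random rather than trained, the input is a random one-hot sentence, and the names `lstm_step`, `W`, `U` and `b` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)          # input gate, forget gate, candidate, output gate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g               # new cell state: keep part of the old, add part of the new
    h = o * np.tanh(c)                   # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_pad, n_vocab, units = 6, 28, 50
W = rng.normal(scale=0.1, size=(n_vocab, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

# A random one-hot "sentence" of shape (n_pad, n_vocab), standing in for one row of oh_x
sentence = np.eye(n_vocab)[rng.integers(0, n_vocab, size=n_pad)]

h = np.zeros(units)
c = np.zeros(units)
for x_t in sentence:                     # process the tokens one at a time
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape)  # (50,)
```

The final hidden state `h` is the 50-dimensional vector that the dense layer turns into a single probability.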

In [7]:
model = models.Sequential()
model.add(layers.LSTM(50, input_shape=(n_pad, n_vocab)))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 50)                15800     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 51        
Total params: 15,851
Trainable params: 15,851
Non-trainable params: 0
_________________________________________________________________
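The parameter counts in the summary can be checked by hand. A standard LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, and a dense layer has one weight per input plus a bias. The function names below are made up for this sketch.

```python
def lstm_param_count(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    # one weight per (input, unit) pair, plus one bias per unit
    return input_dim * units + units

lstm_params = lstm_param_count(input_dim=28, units=50)
dense_params = dense_param_count(input_dim=50, units=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 15800 51 15851
```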


## Training the model

In [8]:
# The number of epochs is an assumed value chosen for illustration
model.fit(oh_x, y, batch_size=10, epochs=50)

Train on 10 samples


<tensorflow.python.keras.callbacks.History at 0x20db373ee80>
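For reference, the `binary_crossentropy` loss passed to `model.compile` can be written out by hand. The following is a minimal sketch in plain Python; the function name and the clipping constant `eps` are illustrative assumptions, not Keras internals.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# An undecided model (p = 0.5 everywhere) scores log(2); confident, correct predictions score lower.
undecided = binary_crossentropy([0, 1], [0.5, 0.5])
confident = binary_crossentropy([0, 1], [0.1, 0.9])
print(undecided > confident)  # True
```

Training lowers this value by pushing the predicted probability toward 0 for cat sentences and toward 1 for dog sentences.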