# Introduction: Recurrent Neural Network Quickstart

This notebook is a rapid introduction to recurrent neural networks. The full details are in `Deep Dive into Recurrent Neural Networks`; this notebook focuses on using the pre-trained network.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from IPython.core.interactiveshell import InteractiveShell
from IPython.display import HTML

InteractiveShell.ast_node_interactivity = 'all'

import warnings
warnings.filterwarnings('ignore', category = RuntimeWarning)
warnings.filterwarnings('ignore', category = UserWarning)

import pandas as pd
import numpy as np
from utils import get_data, generate_output, guess_human, seed_sequence, get_embeddings, find_closest

Using TensorFlow backend.


# Fetch Training Data

* Patent abstracts returned by a patent search for "neural network"
* 3000+ patents total


In [3]:
data = pd.read_csv('../data/neural_network_patent_query.csv')
data.head()

| Unnamed: 0 | patent_abstract | patent_date | patent_number | patent_title |
|---|---|---|---|---|
| 0 | " A ""Barometer"" Neuron enhances stability in... | 1996-07-09 | 5535303 | """Barometer"" neuron for a neural network" |
| 1 | " This invention is a novel high-speed neural ... | 1993-10-19 | 5255349 | "Electronic neural network for solving ""trave... |
| 2 | An optical information processor for use as a ... | 1995-01-17 | 5383042 | 3 layer liquid crystal neural network with out... |
| 3 | A method and system for intelligent control of... | 2001-01-02 | 6169981 | 3-brain architecture for an intelligent decisi... |
| 4 | A method and system for intelligent control of... | 2003-06-17 | 6581048 | 3-brain architecture for an intelligent decisi... |


In [4]:
training_dict, word_idx, idx_word, sequences = get_data('../data/neural_network_patent_query.csv', training_len = 50)

There are 16192 unique words.
There are 318563 sequences.


* Sequences of text are represented as integers
    * `word_idx` maps words to integers
    * `idx_word` maps integers to words
* Features are integer sequences of length 50
* Label is next word in sequence
* Labels are one-hot encoded
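
The encoding described in the list above can be sketched on a toy example. This is only an illustration, not the code inside `get_data`: the corpus, the window length of 3, and the choice to start word indices at 1 (leaving 0 unused) are all assumptions made for the sketch.

```python
import numpy as np

# Toy corpus standing in for the patent abstracts (hypothetical data).
tokens = "the network learns the weights of the network".split()

# Map each unique word to an integer, starting at 1 (assumption: 0 stays unused).
word_idx = {}
for word in tokens:
    if word not in word_idx:
        word_idx[word] = len(word_idx) + 1
idx_word = {idx: word for word, idx in word_idx.items()}

encoded = [word_idx[word] for word in tokens]

# Slide a window over the text: `training_len` words in, the next word out.
training_len = 3
features, labels = [], []
for i in range(training_len, len(encoded)):
    features.append(encoded[i - training_len:i])
    labels.append(encoded[i])

X = np.array(features)
num_words = len(word_idx) + 1

# One-hot encode the labels: a 1 in the column of the next word's index.
y = np.zeros((len(labels), num_words), dtype=np.int8)
y[np.arange(len(labels)), labels] = 1

print(X.shape, y.shape)
```

Each row of `X` is a window of word indices and the matching row of `y` has a single 1 marking the word that follows, which is why `np.argmax` on a label row recovers the word index.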

In [5]:
training_dict['X_train'][:2]
training_dict['y_train'][:2]

array([[  117,     7,   141,   277,     4,    18,    81,   110,    10,
          219,    29,     1,   952,  2453,    19,     5,     6,     1,
          117,    10,   182,  2166,    21,     1,    81,   178,     4,
           13,   117,   894,    14,  6163,     7,   302,     1,     9,
            8,    29,    33,    23,    74,   428,     7,   692,     1,
           81,   183,     4,    13,   117],
       [    6,    41,     2,    87,     3,  1340,    79,     7,     1,
          409,   543,    22,   484,     6,     2,  2113,   728,    24,
            1,   178,     3,     1,  1820,    55,    14, 13942,  7240,
          244,     5,    14, 13943,  7240,   244,     5,     2,  2113,
         7240,   244,     5,     2,    38,  9292,   244,     2,    49,
         9292,   244,    14,    22, 13944]])

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int8)

In [6]:
for i, sequence in enumerate(training_dict['X_train'][:2]):
    # Map each integer index back to its word
    text = [idx_word[idx] for idx in sequence]

    print('Features: ' + ' '.join(text) + '\n')
    # The label is one-hot encoded, so argmax recovers the word index
    print('Label: ' + idx_word[np.argmax(training_dict['y_train'][i])] + '\n')

Features: user to provide samples . A recognition operation is performed on the user's handwritten input , and the user is not satisfied with the recognition result . The user selects an option to train the neural network on one or more characters to improve the recognition results . The user

Label: is

Features: and includes a number of amplifiers corresponding to the N bit output sum and a carry generation from the result of the adding process an augend input-synapse group , an addend input-synapse group , a carry input-synapse group , a first bias-synapse group a second bias-synapse group an output feedback-synapse

Label: group



# Make Recurrent Neural Network

* Embedding dimension = 100
* 64 LSTM cells in one layer
    * Dropout and recurrent dropout for regularization
* Fully connected layer with 64 units on top of LSTM
    * 'relu' activation
* Dropout for regularization
* Output layer produces prediction for each word
    * 'softmax' activation
* Adam optimizer with defaults
* Categorical cross entropy loss
* Monitor accuracy

In [7]:
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Dropout, Embedding, Masking, Bidirectional
from keras.optimizers import Adam

from keras.utils import plot_model

In [8]:
model = Sequential()

# Embedding layer
model.add(
    Embedding(
        input_dim=len(word_idx) + 1,
        output_dim=100,
        weights=None,
        trainable=True))

# Recurrent layer
model.add(
    LSTM(
        64, return_sequences=False, dropout=0.1,
        recurrent_dropout=0.1))

# Fully connected layer
model.add(Dense(64, activation='relu'))

# Dropout for regularization
model.add(Dropout(0.5))

# Output layer
model.add(Dense(len(word_idx) + 1, activation='softmax'))

# Compile the model
model.compile(
    optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 100)         1619200   
_________________________________________________________________
lstm_1 (LSTM)                (None, 64)                42240     
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 16192)             534336    
Total params: 2,197,856
Trainable params: 2,197,856
Non-trainable params: 0
_________________________________________________________________
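
As a sanity check, the parameter counts in the summary can be reproduced by hand. The sketch below reads its sizes off the printed summary (a vocabulary dimension of 16192 and the 32 units it lists for `dense_1`) and uses the standard formulas for embedding, LSTM, and dense layers; the variable names are illustrative only.

```python
# Sizes read off the printed summary.
vocab_size = 16192    # rows of the embedding matrix and width of the output layer
embedding_dim = 100
lstm_units = 64
dense_units = 32      # width the summary reports for dense_1

# One 100-dimensional vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# Four weight blocks (input, forget, and output gates plus the candidate cell
# state), each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (embedding_dim + lstm_units + 1) * lstm_units

# Dense layers: weights plus one bias per unit.
dense_params = lstm_units * dense_units + dense_units
output_params = dense_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params + output_params
print(embedding_params, lstm_params, dense_params, output_params, total)
```

Most of the parameters sit in the embedding and output layers, both of which scale with the vocabulary size.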


## Load in Pre-Trained Model

Rather than waiting several hours to train the model, we can load a model that was already trained for 150 epochs. We'll demonstrate how to train this model for another 5 epochs, which shouldn't take too long, although the exact time depends on your hardware.

In [10]:
from keras.models import load_model

# Load in model and demonstrate training
model = load_model('../models/train-embeddings-rnn.h5')
h = model.fit(training_dict['X_train'], training_dict['y_train'], epochs = 5, batch_size = 2048, 
          validation_data = (training_dict['X_valid'], training_dict['y_valid']), 
          verbose = 1)

Train on 222994 samples, validate on 95569 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [11]:
# Reload the saved model so the evaluation uses the weights stored on disk
model = load_model('../models/train-embeddings-rnn.h5')
print('Model Performance: Log Loss and Accuracy on training data')
model.evaluate(training_dict['X_train'], training_dict['y_train'], batch_size = 2048)

print('\nModel Performance: Log Loss and Accuracy on validation data')
model.evaluate(training_dict['X_valid'], training_dict['y_valid'], batch_size = 2048)

Model Performance: Log Loss and Accuracy on training data


[3.282083551313851, 0.33844408377189383]


Model Performance: Log Loss and Accuracy on validation data


[4.737925765920241, 0.2671891513580688]

There is some overfitting on the training data (about 34% training accuracy versus about 27% validation accuracy), but it is not severe. Applying dropout in both the LSTM layer and after the fully connected layer helps to combat overfitting.

# Generate Output

We can use the fully trained model to generate output by starting it off with a seed sequence. The `diversity` parameter controls the amount of stochasticity in the predictions: the next word is sampled according to the predicted probabilities.

In [13]:
for i in generate_output(model, sequences, idx_word, seed_length = 50, new_words = 30, diversity = 0.75):
    HTML(i)

In [15]:
for i in generate_output(model, sequences, idx_word, seed_length = 30, new_words = 30, diversity = 1.5):
    HTML(i)

If the diversity is too high, the output will be nearly random. If it is too low, the model can get stuck outputting loops of text.
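
One common way to implement a diversity setting is temperature sampling: divide the log-probabilities by the diversity, renormalize, and draw from the result. The sketch below illustrates that idea under the assumption that `generate_output` does something similar; the helper `sample_with_diversity` and the three-word probability vector are hypothetical and not taken from `utils`.

```python
import numpy as np

def sample_with_diversity(probabilities, diversity, rng):
    """Rescale a probability vector by `diversity`, then draw one index."""
    probabilities = np.asarray(probabilities, dtype=np.float64)
    # Dividing log-probabilities by the diversity sharpens the distribution
    # when diversity < 1 and flattens it when diversity > 1.
    logits = np.log(probabilities + 1e-12) / diversity
    exp_logits = np.exp(logits - logits.max())
    rescaled = exp_logits / exp_logits.sum()
    return rng.choice(len(rescaled), p=rescaled), rescaled

rng = np.random.default_rng(0)
preds = [0.6, 0.3, 0.1]  # hypothetical softmax output over three words

for diversity in (0.25, 0.75, 1.5):
    index, rescaled = sample_with_diversity(preds, diversity, rng)
    print(diversity, np.round(rescaled, 3), index)
```

With a low diversity almost all of the probability mass lands on the most likely word, which is how repeated loops can arise; with a high diversity the distribution flattens toward uniform and the choices look closer to random.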

## Start the Network with Your Own Input

Here you can input your own starting sequence for the network. The network will produce `num_words` of text.

In [16]:
s = 'This patent provides a basis for using a recurrent neural network to '
HTML(seed_sequence(model, s, word_idx, idx_word, diversity = 0.75, num_words = 20))

In [17]:
s = 'The cell state is passed along from one time step to another allowing the '
HTML(seed_sequence(model, s, word_idx, idx_word, diversity = 0.75, num_words = 20))

# Guess if Output is from Network or Human

The next function plays a simple game: is the output from a human or the network? Two of the choices are computer generated and the third is the actual ending, with the order randomized. See if you can tell the difference!

In [18]:
guess_human(model, sequences, idx_word)

Seed Sequence: unit includes a capacitor for storing a synapse load value information in a form of electric charges, and a refresh control circuit for remedying the change in the amount of the electric charges stored in the capacitor. The refresh control circuit includes a comparator for comparing a potential


Option 1 < --- > at the neuron. A signal in the weight values is calculated during a first insulating voltage signal, and a first phototransistor on the liquid crystal sandwiched. The processing unit of the transistor is a threshold value. The

Option 2 < --- > on a random amount of the temperature and a circuit. The charging the output voltage is applied so that the electrical output of the photosensitive body is obtained by the potential of the floating gate, and the circuit for

Option 3 < --- > at an electrode of the capacitor and a reference potential, and a drive circuit responsive to the comparator for recovering the electric charges of the capacitor through charge pumpin

In [19]:
guess_human(model, sequences, idx_word)

Seed Sequence: ##EQU1## where.alpha. is the coefficient, u.sub.(i) is the value which results from converting the input analog video image into the binary value, and P.sub.(i,j) is the value obtained from the function and g.sub.(i) is the value which is obtained


Option 1 < --- > from the function and the input analog video image, it is possible to convert the video image into the binary value

Option 2 < --- > between the premise of the premise of the neural network to generate a gain of the input data. Additionally, an

Option 3 < --- > of the output of the neural network together, performing a logic unit processing device. The output for the control region


Enter option you think is human (1-3): 2


***Incorrect***

------------------------------------------------------------
Correct Ordering:  ['human', 'computer0', 'computer1']
Diversity 0.95


In [20]:
guess_human(model, sequences, idx_word)

Seed Sequence: field effect transistors of different widths, by subtracting a plurality of differential signal components from an opposite most significant component, or by subtracting one half of a differential signal component from the opposite next most significant component. The neuron may provide binary sign selection and digit selection


Option 1 < --- > signal using a discrete learning period of distinguishing signals.

Option 2 < --- > of a drain to active discharge, such as a

Option 3 < --- > by switching input and reference signals at each synapse.


Enter option you think is human (1-3): 3


***Correct***

------------------------------------------------------------
Ordering:  ['computer0', 'computer1', 'human']
Diversity 0.94


# Inspect Embeddings

As a final piece of model inspection, we can look at the embeddings and find the words closest to a query word in the embedding space. This gives us an idea of what the network has learned.

In [21]:
embeddings = get_embeddings(model)
embeddings.shape

(16192, 100)

Each word in the vocabulary is now represented as a 100-dimensional vector. These vectors could be reduced to 2 or 3 dimensions for visualization. They can also be used to find the closest words to a query word.

In [22]:
find_closest('network', embeddings, word_idx, idx_word)

Query: network

Word: network         Cosine Similarity: 1.0
Word: channel         Cosine Similarity: 0.7754999995231628
Word: networks        Cosine Similarity: 0.7745000123977661
Word: system          Cosine Similarity: 0.7559999823570251
Word: program         Cosine Similarity: 0.7541999816894531
Word: cable           Cosine Similarity: 0.7419999837875366
Word: now             Cosine Similarity: 0.7297999858856201
Word: programming     Cosine Similarity: 0.7179999947547913
Word: web             Cosine Similarity: 0.7138000130653381
Word: line            Cosine Similarity: 0.6915000081062317


As expected, a word has a cosine similarity of 1.0 with itself. The embeddings are learned for a specific task, so the nearest words may only make sense in the context of the patents on which we trained the network.
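
A nearest-word lookup of this kind can be sketched in a few lines of NumPy. This is an illustration, not the `find_closest` implementation from `utils`: the helper `closest_words` and the tiny 2-dimensional embeddings are hypothetical, and the sketch assumes similarity is the cosine between embedding rows.

```python
import numpy as np

def closest_words(query, embeddings, word_idx, idx_word, n=3):
    """Return the `n` words with the highest cosine similarity to `query`."""
    # Normalize rows to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarities = unit @ unit[word_idx[query]]
    # Sort from most to least similar and keep the top `n`.
    best = np.argsort(similarities)[::-1][:n]
    return [(idx_word[int(i)], float(similarities[i])) for i in best]

# Hypothetical 2-dimensional embeddings for a four-word vocabulary.
word_idx = {"network": 0, "networks": 1, "system": 2, "cable": 3}
idx_word = {i: w for w, i in word_idx.items()}
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [-1.0, 0.2]])

for word, similarity in closest_words("network", embeddings, word_idx, idx_word):
    print(f"Word: {word:10} Cosine Similarity: {similarity:.4f}")
```

Because every row is scaled to unit length first, the query word always scores 1.0 against itself and appears at the top of the list.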

In [23]:
find_closest('data', embeddings, word_idx, idx_word)

Query: data

Word: data            Cosine Similarity: 1.0
Word: information     Cosine Similarity: 0.8185999989509583
Word: numbers         Cosine Similarity: 0.683899998664856
Word: database        Cosine Similarity: 0.6776000261306763
Word: account         Cosine Similarity: 0.6575999855995178
Word: report          Cosine Similarity: 0.6575999855995178
Word: signals         Cosine Similarity: 0.6399999856948853
Word: system          Cosine Similarity: 0.6377000212669373
Word: statistics      Cosine Similarity: 0.6371999979019165
Word: web             Cosine Similarity: 0.6359000205993652


It seems the network has learned some basic relationships between words! 

# Conclusions

In this notebook, we saw a rapid introduction to recurrent neural networks. The full details can be found in `Deep Dive into Recurrent Neural Networks`. Recurrent neural networks are a powerful tool for natural language processing because of their ability to keep an entire input sequence in mind as they process one word at a time. This makes them applicable to sequence learning tasks where the order of the inputs matters and there can be long-term dependencies in the input sequences.