# Preface

In this notebook, we introduce a simple application of recurrent neural networks for sequence generation.

The implementation here is based on https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py

In [None]:
import numpy as np
import tensorflow as tf

# Load Dataset

We are going to load the Shakespeare dataset.

In [None]:
import io

In [None]:
path = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read()
print('corpus length:', len(text))

Let us take a look at the data

In [None]:
print(text[:500])  # show only the beginning; printing the whole corpus floods the output

Now we need to encode this text data into a form that a neural network can process.
To do this, we collect the distinct characters in the corpus and assign each one an integer. The `char_indices` dictionary defined below does this. Conversely, `indices_char` converts the integers back to characters.

In [None]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

In [None]:
print(chars)
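
As a quick illustration of how the two lookup tables work together, here is a minimal sketch on a toy string. The names `toy_text`, `encode`, and `decode` are hypothetical helpers introduced only for this example; they are not part of the notebook.

```python
# Toy corpus standing in for the Shakespeare text (illustrative only).
toy_text = "to be or not to be"

# Same construction as above: sorted unique characters, then two lookup tables.
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}


def encode(s, table):
    """Map each character of s to its integer index (hypothetical helper)."""
    return [table[c] for c in s]


def decode(ids, table):
    """Map a list of integer indices back to a string (hypothetical helper)."""
    return ''.join(table[i] for i in ids)


encoded = encode("to be", toy_char_indices)
decoded = decode(encoded, toy_indices_char)
print(toy_chars)
print(encoded)
print(decoded)
```

Encoding followed by decoding should return the original string, which is a cheap sanity check for the real `char_indices` and `indices_char` tables as well.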

Let us now prepare the training data.

We will need the following format:
  * `x` should have shape [NUM_SAMPLES, SEQUENCE_LENGTH, NUM_VOCAB], with every character of each sequence one-hot encoded
  * `y` should have shape [NUM_SAMPLES, NUM_VOCAB], the one-hot encoding of the single character that follows each sequence

In [None]:
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print(f'num sequences: {len(sentences)}')

In [None]:
sentences[:10]

Now, vectorize them into one-hot inputs and outputs.

In [None]:
# Use the builtin bool: the np.bool alias is not available in every NumPy version
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Check that each `x` contains 40 one-hot encoded characters and that each `y` is the character that follows them.

In [None]:
''.join([indices_char[u] for u in x[0, :, :].argmax(-1)])

In [None]:
indices_char[y[0, :].argmax()]
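
To make the shapes concrete, the following sketch repeats the windowing and one-hot steps on a tiny string. The corpus `"hello world"` and the values `toy_maxlen = 4` and `toy_step = 2` are assumptions chosen only to keep the arrays small enough to inspect by eye.

```python
import numpy as np

toy_text = "hello world"
toy_maxlen, toy_step = 4, 2  # illustrative values, far smaller than maxlen=40, step=3

toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}

# Slide a window over the text; the target is the character right after the window.
windows, targets = [], []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])
    targets.append(toy_text[i + toy_maxlen])

# One-hot encode the windows and the targets.
toy_x = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        toy_x[i, t, toy_char_indices[char]] = True
    toy_y[i, toy_char_indices[targets[i]]] = True

print(windows, targets)
print(toy_x.shape, toy_y.shape)
```

Each time step in `toy_x` and each row in `toy_y` has exactly one `True` entry, which is what the `argmax` calls above rely on when decoding.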

# Building an RNN model (LSTM) for Text Generation

We will use an LSTM model as the main building block. The LSTM (Long Short-Term Memory) is a widely used variant of the classical RNN that is especially suited to natural language processing. You can read more about the LSTM and related architectures [here](https://en.wikipedia.org/wiki/Long_short-term_memory).

We will use the `keras` [sequential model API](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential).

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import RMSprop

In [None]:
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars)))) 
model.add(Dense(len(chars), activation='softmax'))
optimizer = RMSprop(learning_rate=0.01)  # `lr` is a deprecated name for this argument
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
model.summary()
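
The parameter counts reported by `model.summary()` can be cross-checked by hand. The sketch below assumes a standard LSTM layer with biases (four gates, each with an input kernel, a recurrent kernel, and a bias) and a vocabulary of 65 characters; the value 65 is an assumption for illustration, so substitute `len(chars)` for the real count. The function names are hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters of a standard LSTM layer with biases.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias vector (units).
    """
    return 4 * (input_dim * units + units * units + units)


def dense_param_count(input_dim, units):
    """Trainable parameters of a fully connected layer with biases."""
    return input_dim * units + units


vocab_size = 65  # assumed for illustration; use len(chars) in the notebook
hidden = 128     # number of LSTM units used in the model above

print('LSTM parameters: ', lstm_param_count(vocab_size, hidden))
print('Dense parameters:', dense_param_count(hidden, vocab_size))
```

Note that the sequence length `maxlen` does not appear in either formula: the same LSTM weights are reused at every time step.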

## Train Model

Just like CNNs, the RNN model can be trained by simply calling the `fit` method. In this case we are going to have a very long training process, so we will use the `ModelCheckpoint` as a *callback*. This allows us to periodically save the weights we obtained from training to perform analysis (say, on a separate notebook) or to restore to an earlier result. 

For large training tasks, this is always good practice. You can even monitor training graphically as it proceeds using the [TensorBoard callback](https://keras.io/callbacks/#tensorboard).

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint, LambdaCallback
import random
import sys

In [None]:
savepath = 'saved_weights-{epoch:02d}-{loss:.4f}.h5'
checkpoint = ModelCheckpoint(savepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
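
The placeholders in the checkpoint path are ordinary Python format fields that get filled in with the epoch number and the logged loss. The sketch below shows what the resulting file names look like and how the lowest-loss file could be picked out afterwards. The (epoch, loss) pairs are made up, and `loss_from_filename` is a hypothetical helper that assumes a non-negative loss and this exact naming pattern.

```python
savepath_template = 'saved_weights-{epoch:02d}-{loss:.4f}.h5'

# Made-up (epoch, loss) pairs, purely to illustrate the formatting.
fake_logs = [(1, 2.51234), (2, 2.10456), (5, 1.98761)]
filenames = [savepath_template.format(epoch=e, loss=l) for e, l in fake_logs]
print(filenames)


def loss_from_filename(name):
    """Recover the loss encoded at the end of a checkpoint file name."""
    loss_part = name.rsplit('-', 1)[1]       # e.g. '1.9876.h5'
    return float(loss_part[:-len('.h5')])    # e.g. 1.9876


best = min(filenames, key=loss_from_filename)
print(best)
```

Because `save_best_only=True` only writes a file when the monitored loss improves, the most recently written checkpoint is also the one with the lowest loss.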

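To see what text the model produces while it trains, one could add another callback (for example a `LambdaCallback`) that prints sample text at the end of each epoch. Here we train with the checkpoint callback only.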

In [None]:
rnn_history = model.fit(x, y, epochs=100, batch_size=128, callbacks=[checkpoint])

We can also plot the training history...

In [None]:
import matplotlib.pyplot as plt
plt.plot(rnn_history.history['loss'])  # training loss per epoch

# Generate Text Using our Trained Model

We now see how we can generate text using our trained model. 

First, we load the saved weights (with best performance) into our model.

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
# Assumes the best checkpoint is available under this file name
model.load_weights('rnn_shakespeare_weights.h5')

We give a starting text, and ask the trained model to start writing from here!

In [None]:
def sample(preds, temperature=1.0):
    """
    Helper function to sample an index from a probability array
    
    Parameters:
        preds: numpy array of predicted probabilities
        temperature: controls the diversity when picking from preds
    """
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_text(diversity=0.2, sentence=None, start_index=None, length=400):
    """
    Generate text using trained model
    
    Parameters:
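        diversity: controls randomness of texts, higher = more variety
        sentence: starting sentence as seed (only the first maxlen characters are used)
        start_index: starting index in text as seed (used when sentence is None)
        length: number of characters to generate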
    """
    print(f'----- diversity: {diversity}')

    generated = ''
    if start_index is None:
        start_index = np.random.randint(0, len(text)-maxlen)
    if sentence is None:
        sentence = text[start_index: start_index + maxlen]
    else:
        assert len(sentence) >= maxlen, f'Need at least {maxlen} characters to start'
        sentence = sentence[:maxlen]
    generated += sentence
    print(f'----- Generating with seed: \n  "{sentence}" \n')
    sys.stdout.write(generated)

    for i in range(length):
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.

        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]

        sentence = sentence[1:] + next_char

        sys.stdout.write(next_char)
        sys.stdout.flush()
    print()

In [None]:
generate_text(
    diversity=0.9,
    sentence=' If music be the food of love, play on.  ',
    length=500,
)
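
The effect of the `diversity` argument, which is passed to `sample` as `temperature`, can be seen by applying the same rescaling to a small probability vector. This is a minimal sketch: `apply_temperature` is a hypothetical helper that mirrors the log, divide, exponentiate, and normalize steps of `sample` without the random draw, and the three-class distribution is made up.

```python
import numpy as np


def apply_temperature(probs, temperature):
    """Rescale a probability vector the way `sample` does before drawing."""
    logits = np.log(np.asarray(probs, dtype='float64')) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()


probs = np.array([0.5, 0.3, 0.2])  # made-up distribution over three characters
for temperature in (0.2, 1.0, 2.0):
    print(temperature, np.round(apply_temperature(probs, temperature), 3))
```

A temperature below 1 concentrates probability on the most likely character, a temperature of 1 leaves the distribution unchanged, and a temperature above 1 flattens it, which is why higher `diversity` values give more varied and more error-prone text.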

## GPT-3, 175 Billion Parameters

Try it at https://openai.com/api/