Stateless RNN:
    1. at each training iteration the model starts with a hidden state full of zeros
    2. it then updates this state at each time step
    3. after the last time step it throws the state away, as it is not needed anymore
Stateful RNN:
    1. what if we told the RNN to preserve this final state after processing one training batch and use it as the initial state for the next training batch?
    2. this way the model can learn long-term patterns despite only backpropagating through short sequences
    3. this is called a stateful RNN
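
The difference between the two modes can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical scalar recurrent cell (run_rnn and its weights w_x and w_h are made up for illustration and are not part of Keras): carrying the final state of one chunk into the next reproduces the result of processing the whole sequence, while resetting the state to zero does not.

```python
import math

def run_rnn(inputs, h0=0.0, w_x=0.5, w_h=0.9):
    """Run a toy scalar RNN cell over `inputs` and return the final hidden state."""
    h = h0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
    return h

sequence = [0.1, -0.3, 0.7, 0.2, -0.5, 0.4]
first, second = sequence[:3], sequence[3:]

# Reference: process the whole sequence in one go
full = run_rnn(sequence)

# Stateful: the final state of the first chunk becomes the initial state of the second
stateful = run_rnn(second, h0=run_rnn(first))

# Stateless: the second chunk starts again from a state of zero
stateless = run_rnn(second)

print(full, stateful, stateless)
```

The stateful result matches the full-sequence result, which is why each sequence in a batch has to continue exactly where the corresponding sequence in the previous batch stopped.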

First:
    1. note that a stateful RNN only makes sense if each input sequence in a batch starts exactly where the corresponding sequence in the previous batch left off
    2. so the first thing we need to do to build a stateful RNN is to use sequential and nonoverlapping input sequences
    3. (rather than the shuffled and overlapping sequences we used to train the stateless RNN)
    4. when creating the Dataset we must therefore use shift=n_steps (instead of shift=1) when calling the window() method
    5. moreover, we must obviously not call the shuffle() method
    6. batching is harder when preparing a dataset for a stateful RNN than it is for a stateless RNN
    7. if we called batch(32), the first batch would contain windows 1 to 32 and the second windows 33 to 64, so window 1 would be followed by window 33 rather than by window 2: the first windows of consecutive batches would not be consecutive
    8. the simplest solution to this problem is to use 'batches' containing a single window
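
The effect of the shift argument can be checked on a toy sequence. This is a minimal sketch in plain Python, assuming a hypothetical helper named windows that imitates window(..., drop_remainder=True); it is not the tf.data implementation.

```python
def windows(sequence, window_length, shift):
    """Return every full window of `window_length` items, advancing by `shift`.

    Incomplete windows at the end are dropped, as with drop_remainder=True.
    """
    last_start = len(sequence) - window_length
    return [sequence[start:start + window_length]
            for start in range(0, last_start + 1, shift)]

text_ids = list(range(10))
n_steps = 3
window_length = n_steps + 1  # one extra item so the target can be shifted by one

overlapping = windows(text_ids, window_length, shift=1)        # stateless setup
sequential = windows(text_ids, window_length, shift=n_steps)   # stateful setup

# The inputs are the first n_steps items of each window
sequential_inputs = [w[:-1] for w in sequential]
print(sequential_inputs)
```

With shift=n_steps the inputs of consecutive windows do not overlap, and each one starts right after the previous one ends.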

In [5]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

In [6]:
# Andrej Karpathy showed how to train an RNN to predict the next character in a sentence
# this Char-RNN then can be used to generate novel text, one character at a time

# Stateless RNN
np.random.seed(42)
tf.random.set_seed(42)

# Creating the Training Dataset
shakespeare_url = 'https://homl.info/shakespeare' #shortcut URL
filepath = keras.utils.get_file('shakespeare.txt', shakespeare_url)
with open(filepath) as f:
    shakespeare_text = f.read()
    
"".join(sorted(set(shakespeare_text.lower())))

tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts(shakespeare_text)

max_id = len(tokenizer.word_index) # number of distinct characters
dataset_size = tokenizer.document_count # total number of characters

[encoded] = np.array(tokenizer.texts_to_sequences([shakespeare_text])) - 1
train_size = dataset_size * 90 // 100
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])

n_steps = 100
window_length = n_steps + 1 # target = input shifted 1 character ahead
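
The character-level encoding used above can be approximated without Keras. This is a rough sketch, assuming a hypothetical helper named fit_char_index that lowercases the text and assigns IDs by decreasing frequency, starting at 0 (which mirrors subtracting 1 from the tokenizer's IDs); ties are broken alphabetically here, which may differ from the real Tokenizer.

```python
from collections import Counter

def fit_char_index(text):
    """Map each distinct lowercase character to a 0-based ID, most frequent first."""
    counts = Counter(text.lower())
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {char: idx for idx, char in enumerate(ordered)}

sample = "to be or not to be"
char_index = fit_char_index(sample)
index_char = {idx: char for char, idx in char_index.items()}

sample_max_id = len(char_index)                     # number of distinct characters
encoded_sample = [char_index[c] for c in sample]    # text -> IDs
decoded_sample = "".join(index_char[i] for i in encoded_sample)  # IDs -> text

print(encoded_sample)
print(decoded_sample)
```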

In [9]:
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])
dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))
dataset = dataset.batch(1)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)

In [10]:
batch_size = 32
encoded_parts = np.array_split(encoded[:train_size], batch_size)
datasets = []
for encoded_part in encoded_parts:
    dataset = tf.data.Dataset.from_tensor_slices(encoded_part)
    dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_length))
    datasets.append(dataset)
dataset = tf.data.Dataset.zip(tuple(datasets)).map(lambda *windows: tf.stack(windows))
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map( lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)
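
The batching scheme of the previous cell can be verified on a toy sequence. This is a minimal sketch in plain Python, assuming a hypothetical helper named make_stateful_batches; unlike np.array_split it simply drops the leftover items so that all parts have the same length.

```python
def make_stateful_batches(sequence, batch_size, n_steps):
    """Build (X, Y) batches in which row i of each batch continues row i of the previous one."""
    part_len = len(sequence) // batch_size
    # Split the sequence into batch_size contiguous parts of equal length
    parts = [sequence[i * part_len:(i + 1) * part_len] for i in range(batch_size)]
    window_length = n_steps + 1
    # Number of full windows per part when advancing by n_steps
    n_windows = (part_len - window_length) // n_steps + 1
    batches = []
    for k in range(n_windows):
        # The k-th window of every part forms batch k
        rows = [part[k * n_steps:k * n_steps + window_length] for part in parts]
        X = [row[:-1] for row in rows]  # inputs
        Y = [row[1:] for row in rows]   # targets, shifted one item ahead
        batches.append((X, Y))
    return batches

toy_sequence = list(range(40))
batches = make_stateful_batches(toy_sequence, batch_size=2, n_steps=4)

for X, Y in batches[:2]:
    print(X, Y)
```

Row 0 of the second batch starts at item 4, right after row 0 of the first batch ended at item 3, so the state preserved for that row stays meaningful.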

In [11]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     #dropout=0.2, recurrent_dropout=0.2,
                     dropout=0.2,
                     batch_input_shape=[batch_size, None, max_id]),
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     #dropout=0.2, recurrent_dropout=0.2),
                     dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])

At the end of each epoch, we need to reset the states before we go back to the beginning of the text; for this we can use a small callback:

In [12]:
class ResetStatesCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.model.reset_states()

and there is only one instance per batch

In [13]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
history = model.fit(dataset, epochs=50, callbacks=[ResetStatesCallback()])

Epoch 1/50
     62/Unknown - 16s 213ms/step - loss: 3.1263

KeyboardInterrupt: 

To use the model with different batch sizes, we need to create a stateless copy. We can get rid of dropout since it is only used during training:

In [None]:
stateless_model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])

To set the weights, we first need to build the model (so the weights get created):

In [None]:
stateless_model.build(tf.TensorShape([None, None, max_id]))

stateless_model.set_weights(model.get_weights())
model = stateless_model

tf.random.set_seed(42)

print(complete_text("t"))
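
complete_text() is not defined in this part of the notebook. A minimal sketch of what such a helper might do, assuming it extends the text one character at a time by sampling from the predicted next-character probabilities: the names complete_text_sketch, next_char_proba and toy_proba are hypothetical, and a toy function stands in for the trained model.

```python
import random

def complete_text_sketch(text, next_char_proba, n_chars=50, rng=None):
    """Extend `text` by n_chars characters, sampling each one in turn.

    next_char_proba(text) must return a dict mapping each candidate
    character to its probability (a stand-in for the model's softmax output).
    """
    rng = rng or random.Random()
    for _ in range(n_chars):
        proba = next_char_proba(text)
        chars = list(proba)
        weights = [proba[c] for c in chars]
        text += rng.choices(chars, weights=weights, k=1)[0]
    return text

def toy_proba(text):
    """Toy stand-in for a model: always predicts 'o' after 't', and 't' otherwise."""
    if text.endswith("t"):
        return {"o": 1.0, "t": 0.0}
    return {"o": 0.0, "t": 1.0}

completed = complete_text_sketch("t", toy_proba, n_chars=5, rng=random.Random(42))
print(completed)
```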

Now that we have built a character-level model:
    1. it's time to look at word-level models and tackle a common natural language processing task
    2. sentiment analysis
    3. in the process we will learn how to handle sequences of variable length using masking

In [4]:
import re
import requests
the_idiot_url = 'https://www.gutenberg.org/files/2638/2638-0.txt'

def get_book(url):
    # Sends an HTTP request to get the text from Project Gutenberg
    raw = requests.get(url).text
    # Discards the metadata from the beginning of the book; the marker wording
    # varies between files ("THIS" or "THE"), so both are accepted
    start_match = re.search(
        r"\*\*\* ?START OF (?:THIS|THE) PROJECT GUTENBERG EBOOK.*?\*\*\*", raw)
    if start_match is None:
        raise ValueError("Project Gutenberg start marker not found")
    start = start_match.end()
    # Discards the text starting at Part 2 of the book (assumes a "PART II"
    # heading; keeps everything after the marker if it is not found)
    stop_match = re.search(r"\bPART II\b", raw[start:])
    stop = start + stop_match.start() if stop_match else len(raw)
    # Keeps the relevant text
    return raw[start:stop]

def preprocess(sentence):
    return re.sub('[^A-Za-z0-9.]+' , ' ', sentence).lower()

book = get_book(the_idiot_url)
processed_book = preprocess(book)
print(processed_book)