## **Project: Generating Shakespearean Text Using a Character RNN**

In a famous 2015 blog post titled “The Unreasonable Effectiveness of Recurrent Neural
Networks,” Andrej Karpathy showed how to train an RNN to predict the next
character in a sentence. This **Char-RNN** can then be used to generate novel text, one
character at a time. Here is a small sample of the text generated by a Char-RNN
model after it was trained on all of Shakespeare’s work:

    PANDARUS:
    Alas, I think he shall be come approached and the day
    when little rain would be attain’d into being never fed,
    and who is but a chain and subjects of his death,
    I should not sleep.

## **Natural Language Processing (NLP)**
A common approach to **NLP** tasks is to use a recurrent neural network (**RNN**). Here we build a *character* RNN, trained to predict the next character in a sentence. This will allow us to generate some original text, and in the process we will see how to build a TensorFlow dataset on a very long sequence.



---



### **Building a Char-RNN**
- **Char-RNN** (character RNN): *trained to predict the next character in a sentence*
- *Once trained, it generates text one character at a time.*


### **Loading data and preparing dataset**
- import libraries
- load dataset with filepath
- print the first 100 characters

In [None]:
# import libraries
import os
import sys
import numpy as np
import sklearn
import tensorflow as tf
from tensorflow import keras

# plot figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt


In [None]:
# load dataset with filepath
shakespeare_url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"


In [None]:
filepath = keras.utils.get_file("shakespeare.txt", shakespeare_url)
with open(filepath) as f:
    shakespeare_text = f.read()

Downloading data from https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt


In [None]:
# first 100 characters
print(shakespeare_text[:100])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You


Next we encode every character as an integer using Keras's Tokenizer class. First we need to **fit a tokenizer to the text**: it will find all the characters used in the text and map each of them to a different character ID, **from 1 to the number of distinct characters** (the IDs do not start at 0).
- **fit_on_texts() method**: builds the vocabulary from the text; with char_level=True every character is treated as a token (**character IDs start at 1**)

#### **Encode every character as an integer**
- create a Keras Tokenizer with character-level encoding
- fit the tokenizer on the text

In [None]:
# character-level tokenizer (it lowercases the text by default)
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
# build the character vocabulary from the text
tokenizer.fit_on_texts(shakespeare_text)
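
To make the mapping concrete, here is a minimal plain-Python sketch of what a character-level vocabulary looks like. It only approximates the behavior described above (lowercasing, IDs starting at 1, most frequent character first); `build_char_index` and the toy `sample` string are hypothetical names used for illustration and are not part of Keras.

```python
from collections import Counter

def build_char_index(text):
    """Map each distinct lowercased character to an ID starting at 1,
    most frequent character first (ties keep first-seen order)."""
    counts = Counter(text.lower())
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}

sample = "First Citizen: speak, speak."   # hypothetical toy text
char_index = build_char_index(sample)
index_char = {i: ch for ch, i in char_index.items()}

# Encode a word as IDs, then decode the IDs back to characters
encoded = [char_index[ch] for ch in "first"]
decoded = "".join(index_char[i] for i in encoded)
print(len(char_index), encoded, decoded)
```

Because the IDs run from 1 to the number of distinct characters, subtracting 1 from every ID yields a range that starts at 0, which is what the one-hot encoding used later expects.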

**Tokenize the word "First"**
1. texts to sequences
2. sequences to texts
3. show the number of distinct characters (max_id)
4. show the total number of characters (dataset_size, read from document_count: the tokenizer was fitted on a single string, so each character was counted as one document)
5. encode the full text so each character is represented by its ID (we subtract 1 to get IDs from 0 to 38, rather than from 1 to 39)

In [None]:
# text to sequences
tokenizer.texts_to_sequences(["First"])

[[20, 6, 9, 8, 3]]

In [None]:
# sequences to text
tokenizer.sequences_to_texts([[20,6,9,8,3]])

['f i r s t']

In [None]:
# number of distinct chars ID
max_id = len(tokenizer.word_index)
print(max_id)

39


In [None]:
# dataset_size (total number of characters)
dataset_size = tokenizer.document_count
print(dataset_size)

1115394


In [None]:
# encode full text
[encoded] = np.array(tokenizer.texts_to_sequences([shakespeare_text])) - 1
print([encoded])

[array([19,  5,  8, ..., 20, 26, 10])]


### **Split Dataset into Training set and Validation set**
Let’s take the first 90% of the text for the training set
(keeping the rest for the validation set and the test set), and create a tf.data.Dataset
that will return each character one by one from this set:
- use the first 90% of the text for training


In [None]:
# use the first 90% of the text for training
train_size = dataset_size * 90 // 100
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])

### **Split Dataset into Multiple windows**
The training set is now a single sequence of over a million characters, so we cannot train the network on it directly. Instead, we use the **dataset's window() method** to convert this long sequence of characters into many smaller windows of text. Every instance in the dataset
will be a fairly short substring of the whole text, and the RNN will be unrolled
only over the length of these substrings. This is called **truncated backpropagation through time**.
- Call the window() method to create a dataset of short text windows
- Use shift=1 so that consecutive windows overlap: the first window contains characters 0 to 100, the second contains characters 1 to 101, and so on
- Use drop_remainder=True to ensure all windows are exactly 101 characters long (this allows us to create batches without padding)
- Shuffle and batch the windows
- Separate the inputs (the first 100 characters) from the targets (the last 100 characters, i.e. the inputs shifted by one)

In [None]:
# create a dataset of short text windows
n_steps = 100
window_length = n_steps + 1 # target = input shifted 1 character ahead
dataset = dataset.window(window_length, shift=1, drop_remainder=True)
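
The windowing logic can be sketched in plain Python on a tiny sequence. This is only an emulation with lists, not tf.data itself; `sliding_windows` and the toy `ids` list are hypothetical names chosen for illustration.

```python
def sliding_windows(sequence, window_length, shift=1):
    """Return every full window of `window_length` items, moving `shift`
    items each time. Incomplete trailing windows are dropped, which is
    the effect of drop_remainder=True."""
    last_start = len(sequence) - window_length
    return [sequence[start:start + window_length]
            for start in range(0, last_start + 1, shift)]

ids = list(range(10))   # hypothetical stand-in for the encoded text
windows = sliding_windows(ids, window_length=4, shift=1)

# Inputs are all but the last item; targets are the same items shifted by one
inputs = [w[:-1] for w in windows]
targets = [w[1:] for w in windows]

print(len(windows))            # number of full windows: 10 - 4 + 1
print(windows[0], inputs[0], targets[0])
```

With shift=1 a sequence of length n yields n - window_length + 1 windows, so almost every character starts its own training window.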

**The window() method** creates a dataset that contains windows, each of which is also represented as a dataset: it is a **nested dataset**. We cannot use a nested dataset directly for
training, as our model will expect **tensors (vectors and matrices)** as input, not datasets.

So we call the **flat_map()** method: it converts a nested dataset into a flat dataset (one that does not contain datasets). It takes
a function as an argument, which allows you to transform each dataset in the nested
dataset before flattening. For example, if you pass the function lambda ds:
ds.batch(2) to flat_map(), then it will transform the nested dataset {{1, 2}, {3, 4, 5,
6}} into the flat dataset {[1, 2], [3, 4], [5, 6]}: it’s a dataset of tensors of size 2. With that
in mind, we are ready to flatten our dataset:
- pass a lambda to the **flat_map() method**
- in the lambda, call the **batch(window_length) method** on each window; since all windows have exactly that length, we get a single tensor for each of them

In [None]:
dataset = dataset.flat_map(lambda window: window.batch(window_length))
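
The flattening example from the text, {{1, 2}, {3, 4, 5, 6}} with batch(2), can be reproduced with a small plain-Python emulation. The helpers `batch` and `flat_map` below are hypothetical list-based stand-ins for illustration, not the tf.data methods.

```python
def batch(items, size):
    """Group items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def flat_map(nested, fn):
    """Apply `fn` to each inner collection and concatenate the results."""
    flat = []
    for inner in nested:
        flat.extend(fn(inner))
    return flat

nested = [[1, 2], [3, 4, 5, 6]]
flat = flat_map(nested, lambda ds: batch(ds, 2))
print(flat)  # [[1, 2], [3, 4], [5, 6]]

# When every window has exactly the batch size, each window becomes one item
as_tensors = flat_map([[0, 1, 2], [1, 2, 3]], lambda w: batch(w, 3))
print(as_tensors)  # [[0, 1, 2], [1, 2, 3]]
```

The second call mirrors the notebook's usage: batching each window by its own length turns it into a single element of the flat result.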

### **Linear Regression Review**
**Linear Regression**: a supervised machine learning algorithm whose predicted output is continuous and has a constant slope. It’s used to predict values within a continuous range (e.g. sales, price) rather than to classify them into categories (e.g. cat, dog).

**Cost function:** the average of the loss values over the training instances.

**Loss function** (how well the network is doing): a value calculated for every instance. It is one term of the cost function.

**Loss:** neural networks are trained using an optimization process that requires a loss function to measure the model's error.

    loss = 'categorical_crossentropy': targets are one-hot encoded and are compared with the output probabilities.
    loss = 'sparse_categorical_crossentropy': targets are integer class IDs.
    loss = 'binary_crossentropy': targets are binary labels (two classes).
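
As a small worked example, the sketch below computes the cross-entropy loss by hand for one instance, once with a one-hot target and once with an integer target, and then averages two per-instance losses into a cost. The function names mirror the Keras loss names only for readability; these are hypothetical plain-Python helpers, and the probabilities are made-up values.

```python
import math

def categorical_crossentropy(one_hot_target, probs):
    """Loss for a one-hot target: minus the sum of target * log(probability)."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

def sparse_categorical_crossentropy(class_id, probs):
    """Loss for an integer target: minus the log-probability of that class."""
    return -math.log(probs[class_id])

probs = [0.7, 0.2, 0.1]   # hypothetical predicted class probabilities

loss_one_hot = categorical_crossentropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)
print(loss_one_hot, loss_sparse)   # the two values agree

# The cost is the average of the per-instance losses
losses = [sparse_categorical_crossentropy(0, probs),
          sparse_categorical_crossentropy(1, probs)]
cost = sum(losses) / len(losses)
print(cost)
```

Both forms give the same number for the same target; the sparse form simply avoids building the one-hot vector, which is convenient when the targets are already integer character IDs.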



#### **Shuffle and batch windows**
- make the output stable across runs
- batch size = 32
- shuffle the windows (the code below uses a shuffle buffer of 1,000 windows)
- separate the inputs from the targets
- encode each character using a one-hot vector
- add **prefetching**: a mechanism used to fetch data in advance of its use
- print the batch shapes



In [None]:
# make the output stable across runs
np.random.seed(42)
tf.random.set_seed(42)


In [None]:
# batch and shuffle windows
batch_size = 32
dataset = dataset.shuffle(1000).batch(batch_size)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))

In [None]:
# encode each character
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
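
For readers unfamiliar with one-hot encoding, here is a minimal plain-Python sketch of the idea. The `one_hot` helper below is a hypothetical list-based illustration, not tf.one_hot, and the IDs are made up.

```python
def one_hot(ids, depth):
    """Return one row per ID, with 1.0 at the ID's position and 0.0 elsewhere."""
    return [[1.0 if i == idx else 0.0 for i in range(depth)] for idx in ids]

rows = one_hot([2, 0, 3], depth=4)   # hypothetical character IDs
for row in rows:
    print(row)
```

Since the encoded text uses IDs from 0 to max_id - 1, a depth of max_id gives every character its own position, which is why the inputs end up with a last dimension of 39.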

In [None]:
# add prefetching (prepare the next batch ahead of time)
dataset = dataset.prefetch(1)

In [None]:
# batch shape
for X_batch, Y_batch in dataset.take(1):
  print(X_batch.shape, Y_batch.shape)

(32, 100, 39) (32, 100)


### **Build (layers) & Compile neural network with history**
- **Sequential**: a linear stack (sequence) of layers in the neural network.
- **Input shape**: the shape of one input sequence, given on the first layer. Here it is [None, max_id]: any number of time steps, each a one-hot vector of size max_id.
- **Dropout**: randomly drops a fraction of the units at each training step, which helps reduce overfitting.
- **dropout** = 0.2 and **recurrent_dropout** = 0.2: the dropout rates applied to the inputs and to the recurrent state of each GRU layer.
- **Dense layer**: feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer.

- **GRU** (Gated Recurrent Unit): a recurrent cell with a reset gate and an update gate.
- **LSTM** (Long Short-Term Memory): a recurrent cell with input, forget, and output gates.
- **TimeDistributed**: a wrapper that applies the same Dense (fully connected) operation to every time step of a 3D tensor.


**Compile**
- **Loss**: measures how good the predictions are (here, how far the output probabilities are from the targets).
- **Optimizer**: updates the model's weights to reduce the loss.
- **loss** = **sparse_categorical_crossentropy**: the targets are integer class IDs.

## **Creating and Training the Model**
- 2 keras.layers.GRU layers with 128 units each, return_sequences=True, dropout 0.2, and recurrent dropout 0.2
- TimeDistributed Dense output layer with max_id units and softmax activation
- loss: sparse_categorical_crossentropy
- optimizer: adam, epochs: 5
    
    

In [None]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id],
                     dropout=0.2, recurrent_dropout=0.2),
    keras.layers.GRU(128, return_sequences=True,
                     dropout=0.2, recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# the dataset is finite (no repeat()), so let Keras iterate over all of it
# each epoch instead of fixing steps_per_epoch
history = model.fit(dataset, epochs=5)
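
As a rough check on how long one pass over the training data is, the arithmetic below counts the windows and batches produced by the pipeline above. It is a sketch that assumes the values printed earlier (a dataset_size of 1,115,394 characters) and assumes that batch() keeps the final partial batch, which is its default.

```python
import math

dataset_size = 1_115_394               # total characters, as printed earlier
train_size = dataset_size * 90 // 100  # first 90% of the text
n_steps = 100
window_length = n_steps + 1
batch_size = 32

# With shift=1 and drop_remainder=True, each full window starts one character later
n_windows = train_size - window_length + 1
# The last batch may be partial, hence the ceiling
batches_per_pass = math.ceil(n_windows / batch_size)

print(train_size)                # 1003854
print(n_windows)                 # 1003754
print(batches_per_pass)          # 31368
print(train_size // batch_size)  # 31370
```

One pass therefore yields slightly fewer batches than train_size // batch_size, so a steps_per_epoch of that size exceeds a single pass unless the dataset is repeated.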

### **Using Model to Generate Text**
Now we have a model that can predict the next character in text written by Shakespeare.
To feed it some text, we first need to preprocess it like we did earlier, so let’s
create a little function for this:
- create preprocess texts function
- predict the next letter in some text


In [None]:
# preprocess texts
def preprocess(texts):
  X = np.array(tokenizer.texts_to_sequences(texts)) - 1
  return tf.one_hot(X, max_id)

In [None]:
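# predict next letter
X_new = preprocess(["How are yo"])
# predict_classes() was removed from recent Keras versions; taking the argmax
# of the predicted class probabilities gives the same result
Y_pred = np.argmax(model.predict(X_new), axis=-1)
tokenizer.sequences_to_texts(Y_pred + 1)[0][-1] # 1st sentence, last char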

'u'

Rather than always taking the most likely character, we can
pick the next character randomly, with a probability equal to the estimated probability,
using TensorFlow’s tf.random.categorical() function. This will generate more
diverse and interesting text. The categorical() function samples random class indices,
given the class log probabilities (logits). To have more control over the diversity
of the generated text, we can divide the logits by a number called the temperature,
which we can tweak as we wish: a temperature close to 0 will favor the high-probability
characters, while a very high temperature will give all characters an equal
probability. The following next_char() function uses this approach to pick the next
character to add to the input text:
- use the next_char() function to pick the next character to add to the input text
- pick the next character randomly using the tf.random.categorical() function
- write a function that repeatedly calls next_char() to get the next character and appends it to the given text

In [None]:
# function to pick next character
def next_char(text, temperature=1):
  X_new = preprocess([text])
  y_proba = model.predict(X_new)[0, -1:, :]
  rescaled_logits = tf.math.log(y_proba) / temperature
  char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
  return tokenizer.sequences_to_texts(char_id.numpy())[0]
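
The effect of the temperature can be seen on a tiny distribution. The sketch below rescales log-probabilities by 1/temperature and renormalizes them with a softmax, which is the distribution that sampling from the rescaled logits draws from. `apply_temperature` and the example probabilities are hypothetical, chosen only for illustration, and all probabilities must be strictly positive for the logarithm to be defined.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale log-probabilities by 1/temperature and renormalize (softmax)."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.6, 0.3, 0.1]   # hypothetical next-character probabilities

sharp = apply_temperature(probs, 0.2)      # low temperature: favors the top character
same = apply_temperature(probs, 1.0)       # temperature 1: distribution unchanged
flat = apply_temperature(probs, 100.0)     # high temperature: close to uniform

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in same])
print([round(p, 3) for p in flat])
```

A temperature of 1 leaves the model's probabilities as they are, lower values concentrate the mass on the most likely character, and higher values spread it almost evenly.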

In [None]:
# repeatedly call next_char() and append the result to the text
def complete_text(text, n_chars=50, temperature=1):
  for _ in range(n_chars):
    text += next_char(text, temperature)
  return text

**We are now ready to generate some text**
- print complete_text("t", temperature=0.2)
- print complete_text("w", temperature=1)
- print complete_text("w", temperature=2)

In [None]:
print(complete_text("t", temperature=0.2))

the death,
and you mayor me with the consure of the


In [None]:
print(complete_text("w", temperature=1))

ward seart.

rovell
of traitorous his drazentpent m


In [None]:
print(complete_text("w", temperature=2))

wel hase?

?gloucester:
valg deain rulk-your hisnda


#### **Summary**
From the results above, it appears that our model works best with temperatures close to 1. To generate
more convincing text, you could try using more GRU layers and more neurons per
layer, train for longer, and add some regularization (for example, you could set
recurrent_dropout=0.3 in the GRU layers). Moreover, the model is currently incapable of
learning patterns longer than n_steps, which is just 100 characters.