<a href="https://colab.research.google.com/github/unniths/Text-Generator-RNN/blob/master/Text_generation_with_an_RNN_LSTM_with_comments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Shiva Unnithan
# Text generation with an RNN

In [0]:
import tensorflow as tf
import numpy as np
import os
import time

In [0]:
# Instead of using the Shakespeare source, I'm using a cleaned-up version of
# Alice in Wonderland that had everything but its text removed.
path_to_file = tf.keras.utils.get_file('wonderland.txt', 'https://raw.githubusercontent.com/unniths/Text-Generator-RNN/master/wonderland.txt')

In [3]:
# Read, then decode for py2 compat
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 143552 characters


In [4]:
print(text[:1000]) # printing the first 1000 characters

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, 'and what is the use of a book,' thought Alice 'without pictures or
conversations?'

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure
of making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.

There was nothing so VERY remarkable in that; nor did Alice think it so
VERY much out of the way to hear the Rabbit say to itself, 'Oh dear!
Oh dear! I shall be late!' (when she thought it over afterwards, it
occurred to her that she ought to have wondered at this, but at the time
it all seemed quite natural); but when the Rabbit actually TOOK A WATCH
OUT OF ITS WAISTCOAT-POCKET, and looked at it, and 

In [5]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))
#print(len(vocab))

67 unique characters


In [0]:
# Creating a mapping from unique characters to indices 
char2idx = {u:i for i, u in enumerate(vocab)} # look up table to map characters to numbers
idx2char = np.array(vocab) 
text_as_int = np.array([char2idx[c] for c in text])
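
The mapping above is meant to be lossless: encoding a string to integers and decoding it again should return the original text. A minimal pure-Python sketch of that round trip, using a short stand-in string and hypothetical `sample_*` names rather than the notebook's variables:

```python
# Toy corpus standing in for the Alice in Wonderland text (an assumption for illustration)
sample_text = "Alice was beginning"

# Same construction as in the cell above: sorted unique characters define the vocabulary
sample_vocab = sorted(set(sample_text))
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}  # character -> integer ID
sample_idx2char = sample_vocab                                  # integer ID -> character

# Encode the text as integers, then decode it back to a string
encoded = [sample_char2idx[ch] for ch in sample_text]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(encoded[:5])
print(decoded == sample_text)
```

Because the vocabulary is sorted, the IDs follow character code order, which is why punctuation and uppercase letters get the lowest numbers in the table printed below.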

In [7]:
print('{')
# Print every unique character next to the integer that represents it
for char, _ in zip(char2idx, range(len(vocab))):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '\n':   0,
  ' ' :   1,
  '!' :   2,
  '"' :   3,
  "'" :   4,
  '(' :   5,
  ')' :   6,
  ',' :   7,
  '-' :   8,
  '.' :   9,
  ':' :  10,
  ';' :  11,
  '?' :  12,
  'A' :  13,
  'B' :  14,
  'C' :  15,
  'D' :  16,
  'E' :  17,
  'F' :  18,
  'G' :  19,
  'H' :  20,
  'I' :  21,
  'J' :  22,
  'K' :  23,
  'L' :  24,
  'M' :  25,
  'N' :  26,
  'O' :  27,
  'P' :  28,
  'Q' :  29,
  'R' :  30,
  'S' :  31,
  'T' :  32,
  'U' :  33,
  'V' :  34,
  'W' :  35,
  'Y' :  36,
  'Z' :  37,
  '[' :  38,
  ']' :  39,
  '_' :  40,
  'a' :  41,
  'b' :  42,
  'c' :  43,
  'd' :  44,
  'e' :  45,
  'f' :  46,
  'g' :  47,
  'h' :  48,
  'i' :  49,
  'j' :  50,
  'k' :  51,
  'l' :  52,
  'm' :  53,
  'n' :  54,
  'o' :  55,
  'p' :  56,
  'q' :  57,
  'r' :  58,
  's' :  59,
  't' :  60,
  'u' :  61,
  'v' :  62,
  'w' :  63,
  'x' :  64,
  'y' :  65,
  'z' :  66,
  ...
}


In [8]:
print('{} ---- characters mapped to int ----> {}'.format(repr(text[:13]), text_as_int[:13])) # Quick example of using the first 13 characters turned into 13 corresponding integers

'Alice was beg' ---- characters mapped to int ----> [13 52 49 43 45  1 63 41 59  1 42 45 47]


In [9]:
# The maximum length, in characters, of a single input sequence
seq_length = 100
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int) # converts the text vector into a stream of character indices

for i in char_dataset.take(5): # take the first 5 elements (character indices) from the dataset
    print(idx2char[i.numpy()]) # map each index back to its character using idx2char

A
l
i
c
e


In [10]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True) # the batch method groups individual characters into chunks of seq_length+1 characters

for item in sequences.take(5): # take the first 5 sequences
    print(repr(''.join(idx2char[item.numpy()]))) 

'Alice was beginning to get very tired of sitting by her sister on the\nbank, and of having nothing to '
'do: once or twice she had peeped into the\nbook her sister was reading, but it had no pictures or conv'
"ersations in\nit, 'and what is the use of a book,' thought Alice 'without pictures or\nconversations?'\n"
'\nSo she was considering in her own mind (as well as she could, for the\nhot day made her feel very sle'
'epy and stupid), whether the pleasure\nof making a daisy-chain would be worth the trouble of getting u'
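
As a rough check on how much data this produces, the number of sequences follows directly from the text length printed earlier. A small arithmetic sketch; the batch size of 64 is the value set later in this notebook, and the variable names here are only illustrative:

```python
text_length = 143552   # length of the text, as printed above
seq_length = 100       # input length in characters
batch_size = 64        # BATCH_SIZE, set further down in this notebook

# Each chunk holds seq_length + 1 characters; drop_remainder=True discards the leftover
num_sequences = text_length // (seq_length + 1)
dropped_chars = text_length % (seq_length + 1)

# Batching with drop_remainder=True again discards any incomplete batch
steps_per_epoch = num_sequences // batch_size

print(num_sequences, dropped_chars, steps_per_epoch)
```

This works out to 1421 sequences (31 trailing characters dropped) and 22 full batches per epoch.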


In [0]:
def split_input_target(chunk): 
    input_text = chunk[:-1] # every character except the last -> input (seq_length characters)
    target_text = chunk[1:] # every character except the first -> target (also seq_length characters)
    return input_text, target_text # -1 and 1 are just slice indices: the target is the input shifted one character ahead

dataset = sequences.map(split_input_target) # map applies this function to each sequence: the chunk is duplicated and shifted by one character to create the input/target pair
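
The same slicing can be tried on an ordinary string to see the shift. A minimal sketch with a hypothetical helper name and a six-character chunk instead of the notebook's 101-character ones:

```python
def split_input_target_demo(chunk):
    """Return (input, target) where the target is the input shifted one step ahead."""
    return chunk[:-1], chunk[1:]

chunk = "Alice "  # seq_length + 1 = 6 characters, so seq_length = 5 here
input_text, target_text = split_input_target_demo(chunk)

# At each position the target holds the character that follows the input character
for inp, tgt in zip(input_text, target_text):
    print(repr(inp), "->", repr(tgt))
```

Both halves have the same length, and position `t` of the target is position `t + 1` of the original chunk.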

In [12]:
# This prints an example input and its target, which shows how the data is laid out. The input starts with the 'A' and stops at "to", while the target drops the 'A' and picks up the space after "to", since it is shifted by one character.
for input_example, target_example in dataset.take(1): 
    print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print('# of Characters: {} '.format(len(idx2char[input_example.numpy()])))
    print('Target data: ', repr(''.join(idx2char[target_example.numpy()]))) 
    print('# of Characters: {} '.format(len(idx2char[target_example.numpy()])))
# The extra print statements confirm that both strings contain the number of characters defined earlier in seq_length

Input data:  'Alice was beginning to get very tired of sitting by her sister on the\nbank, and of having nothing to'
# of Characters: 100 
Target data:  'lice was beginning to get very tired of sitting by her sister on the\nbank, and of having nothing to '
# of Characters: 100 


In [13]:
# This shows how the model is trained to predict the next character: it first receives index 13 ('A') and is expected to output index 52 ('l').
# At the next step it receives 'l'; because the network is recurrent, it also carries over what it saw at the previous steps when making its prediction.
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 13 ('A')
  expected output: 52 ('l')
Step    1
  input: 52 ('l')
  expected output: 49 ('i')
Step    2
  input: 49 ('i')
  expected output: 43 ('c')
Step    3
  input: 43 ('c')
  expected output: 45 ('e')
Step    4
  input: 45 ('e')
  expected output: 1 (' ')


## Create Training Batches
https://www.tensorflow.org/tutorials/text/text_generation#create_training_batches

In [14]:
# Batch size: the number of sequences grouped into each training batch
BATCH_SIZE = 64

# Buffer size to shuffle the dataset 
# (TF data is designed to work with possibly infinite sequences, so it doesn't attempt to shuffle the entire sequence in memory.
# Instead, it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
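
The idea of shuffling through a fixed-size buffer can be illustrated in plain Python. This is only a sketch of the general technique with a hypothetical `buffered_shuffle` helper; it is not `tf.data`'s actual implementation:

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a partially shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)        # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]              # emit a random buffered element
        buffer[idx] = item             # and put the new item in its slot
    rng.shuffle(buffer)                # input exhausted: drain what is left
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
print(shuffled)
```

With a small buffer, elements can only move a limited distance from their original position. In this notebook the dataset has far fewer than 10000 sequences, so a buffer of `BUFFER_SIZE = 10000` holds all of them at once.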

## Build The Model
https://www.tensorflow.org/tutorials/text/text_generation#build_the_model

In [15]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension 
embedding_dim = 256

# Number of RNN units 
rnn_units = 1024
print(vocab_size)

67


In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([ # Sequential groups a linear stack of layers; it fits here because every layer has a single input and a single output
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]), # Embedding is the input layer: a trainable lookup table that maps each character's index to a vector with embedding_dim dimensions
        tf.keras.layers.LSTM(rnn_units, # Switched to an LSTM layer here instead of GRU; both are gated types of RNN
                           return_sequences=True,
                           stateful=True,
                           recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size) # Output layer with vocab_size outputs, one logit per character
    ])
    return model

## **Why I switched from GRU to LSTM**
While a GRU has two gates (reset and update), an LSTM has three (input, output, and forget). LSTMs train more slowly, but they are often the better fit for longer training data: they can retain information across longer sequences, which can lead to more accurate predictions in the long run. Both units exist to address a weakness of plain RNN models, namely their short-term memory. Without either of them, the model would produce sentences that do not make sense in the context of the language it is writing in.

As an example, take the sentence "Shiva saw Zhao." Without LSTM or GRU, the model might produce sentences that look grammatical but don't make sense in the context of the text, such as "Shiva saw Doug." or "Zhao saw." or even "Zhao.".

What an LSTM does is explained better in the [YouTube tutorial by Brandon Rohrer](https://www.youtube.com/watch?v=WCUNPb-5EYI), which I used to understand LSTMs. The basic idea is that each word (or character, in our model's case) passes through a memory of what has come before, to check whether the candidate output makes sense. The possible choices are narrowed down until an output remains that works as a prediction. Part of that prediction is then fed back in with the NEXT input and passes through each gate as memory, which helps avoid repeated words and words that don't fit the context of what came before.
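
To make the three gates concrete, here is a minimal sketch of a single LSTM time step using scalar values. The function name, the weight dictionary, and the numbers are all hypothetical; a real Keras LSTM layer does the same kind of computation with weight matrices and vectors:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input, hidden state h, and cell state c.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # the new information itself

    c = f * c_prev + i * g   # updated cell state (the long-term memory)
    h = o * math.tanh(c)     # updated hidden state (the step's output)
    return h, c

# Illustrative weights: forget gate wide open, input gate almost closed
weights = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=0.5, h_prev=0.0, c_prev=1.0, w=weights)
print(h, c)
```

With these weights the cell state barely changes from its previous value, which is the mechanism that lets an LSTM carry information across many steps.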

In [0]:
model = build_model(
    vocab_size = len(vocab), # 67
    embedding_dim=embedding_dim, # embedding_dim, declared above as 256
    rnn_units=rnn_units, # rnn_units, declared above as 1024
    batch_size=BATCH_SIZE) # BATCH_SIZE, declared above as 64

## Try the Model
https://www.tensorflow.org/tutorials/text/text_generation#try_the_model

In [18]:
#Testing the model
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 67) # (batch_size, sequence_length, vocab_size)


In [19]:
model.summary() # summary to show every individual layer of the model

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           17152     
_________________________________________________________________
lstm (LSTM)                  (64, None, 1024)          5246976   
_________________________________________________________________
dense (Dense)                (64, None, 67)            68675     
Total params: 5,332,803
Trainable params: 5,332,803
Non-trainable params: 0
_________________________________________________________________
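
The parameter counts in this summary can be reproduced by hand from the layer sizes. A small arithmetic sketch; the GRU figure at the end is only for comparison and assumes the default Keras GRU layout (`reset_after=True`, which gives two bias vectors per gate), so treat it as an assumption rather than something measured in this notebook:

```python
vocab_size, embedding_dim, rnn_units = 67, 256, 1024

# Embedding: one embedding_dim-sized vector per character
embedding_params = vocab_size * embedding_dim

# LSTM: 4 weight sets (input, forget, output gates + candidate),
# each with input weights, recurrent weights, and a bias
lstm_params = 4 * ((embedding_dim + rnn_units) * rnn_units + rnn_units)

# Dense: weights from every LSTM unit to every character, plus biases
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)

# For comparison only: a GRU of the same size has 3 weight sets
# (assuming two bias vectors per set, as with reset_after=True)
gru_params = 3 * ((embedding_dim + rnn_units) * rnn_units + 2 * rnn_units)
print(gru_params)
```

The LSTM layer accounts for almost all of the model's parameters, and under the stated assumption it has roughly a third more parameters than a GRU of the same width, which is consistent with it training more slowly.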


In [0]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1) # first example in the batch
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy() # drop the last axis (size 1, from num_samples=1) to get a flat array of indices

In [21]:
sampled_indices

array([32, 60,  7, 31, 55, 30, 11, 14, 49, 30, 51, 47, 39, 10, 57, 55, 18,
       44,  0, 59, 58, 24, 59, 20, 30, 55,  5, 38,  6, 35, 13, 39, 50, 32,
       14, 32, 11,  2, 43, 26,  8,  9,  0, 51, 22, 49,  9, 10, 27, 59, 22,
       18,  5, 46, 28,  5, 60, 59, 10,  4, 27, 25, 59, 12,  7, 57, 52, 15,
       54, 16, 40, 24, 29,  1, 54, 18, 19, 18, 19, 43, 62, 43, 48, 10, 10,
       16, 48, 16, 38, 40,  1, 59,  0, 13, 37, 25, 48, 62, 33, 17])

In [22]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]]))) # The regular input batch 
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ]))) # The predictions made by the model before any training

Input: 
 "They can't have anything to put\ndown yet, before the trial's begun.'\n\n'They're putting down their na"

Next Char Predictions: 
 "Tt,SoR;BiRkg]:qoFd\nsrLsHRo([)WA]jTBT;!cN-.\nkJi.:OsJF(fP(ts:'OMs?,qlCnD_LQ nFGFGcvch::DhD[_ s\nAZMhvUE"


## Configure Checkpoints
https://www.tensorflow.org/tutorials/text/text_generation#configure_checkpoints

In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoints files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}") # checkpoints save the model's weights at points during training

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint( # Save checkpoints at a specific frequency
    filepath=checkpoint_prefix,
    save_weights_only=True)
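
Judging by the `ckpt_50` path that `tf.train.latest_checkpoint` returns later in this notebook, the `{epoch}` placeholder is filled in with the 1-based epoch number. A plain string-formatting sketch of how the names expand; this only mimics the naming and does not call Keras:

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Expand the template for a few epoch numbers, as the callback would for each epoch
example_names = [checkpoint_prefix.format(epoch=e) for e in (1, 2, 50)]
for name in example_names:
    print(name)
```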

## Train the Model
https://www.tensorflow.org/tutorials/text/text_generation#train_the_model

In [24]:
def loss(labels, logits): # takes labels and logits; logits are the model's raw predictions before any normalization (no softmax applied)
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True) # computes sparse categorical cross-entropy loss; from_logits=True because the model returns logits

example_batch_loss = loss(target_example_batch, example_batch_predictions) 
print("Prediction shape: ", example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
print("scalar_loss:       ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 67) # (batch_size, sequence_length, vocab_size)
scalar_loss:        4.2050767
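
A loss of about 4.2 is what one would expect from an untrained model. If every one of the 67 characters is treated as equally likely, the cross-entropy is `ln(67)`. A minimal pure-Python sketch of the same loss for a single prediction; the helper name is hypothetical and this is not the Keras implementation:

```python
import math

def sparse_cross_entropy_from_logits(label, logits):
    """Cross-entropy of one integer label against unnormalized logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

vocab_size = 67
uniform_logits = [0.0] * vocab_size  # all characters equally likely
baseline_loss = sparse_cross_entropy_from_logits(13, uniform_logits)
print(baseline_loss)
```

The result is roughly 4.2047, very close to the `scalar_loss` printed above, which suggests the freshly initialized model is guessing almost uniformly.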


In [0]:
model.compile(optimizer='adam', loss=loss) # use the Adam optimizer with the loss function defined above

## Execute the Training
The tutorial this notebook follows uses 10 epochs to keep training time reasonable; here I train for 50 epochs instead. In Colab, set the runtime to GPU for faster training.

In [0]:
# An epoch is one full pass over the training dataset. Training for more epochs generally lowers the loss and gives better results, but takes longer to run.
EPOCHS=50

In [27]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback]) # fit the model on the dataset for the number of epochs set above (50), saving checkpoints through the callback defined earlier

Epoch 1/50
...
Epoch 50/50


## Generate Text
### Restore the Latest Checkpoint
https://www.tensorflow.org/tutorials/text/text_generation#generate_text

In [28]:
tf.train.latest_checkpoint(checkpoint_dir) # Find the latest checkpoint; the trained model only accepts the batch size it was built with, so its weights are loaded into a new model

'./training_checkpoints/ckpt_50'

In [0]:
model = build_model(vocab_size, embedding_dim, rnn_units,batch_size=1) # to run the model with a different batch size, rebuild it and restore the weights from the checkpoint
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)) # load the weights from the most recent checkpoint
model.build(tf.TensorShape([1, None]))

In [30]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            17152     
_________________________________________________________________
lstm_1 (LSTM)                (1, None, 1024)           5246976   
_________________________________________________________________
dense_1 (Dense)              (1, None, 67)             68675     
Total params: 5,332,803
Trainable params: 5,332,803
Non-trainable params: 0
_________________________________________________________________


## The Prediction Loop

In [0]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)
  
  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty list to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 0.6

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
    predictions = model(input_eval)
    # remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # using a categorical distribution to predict the character returned by the model
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

    # We pass the predicted character as the next input to the model
    # along with previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)

    text_generated.append(idx2char[predicted_id])
  
  return (start_string +''.join(text_generated))

In [32]:
print(generate_text(model, start_string=u'Alice')) # The starting string affects the result that you are going to get.
                                                   # With the tutorial's starting string "ROMEO: ", my results would have been wildly different because this model was trained on a different text

Alice 'WhAnke the whiting kindly twon the end of the accident, Oh, there were all quarrel so dream,
    "Hat you ters a voice of the end--' the Hatter was the first to break the silence. 'What day of the month
is it?' he said, turning to the jury, in a low, trembling voice.

'There's more evidence Yourself a little
queer in such a nice soft thing to nurse--and she's
such a capital one for catching mice--ohe
was going to begin at the end of it, and on both sides of it, and
begun to replaid that save after the
right house before.

'Of course,' the Dodo solemnly partoould herself falling down a very deep
well.

Either the well was very deep, or she fell very slowly,
but she heard one of
them say, 'Look out now, Five! Don't go splashing paint over me like
that!'

'I could see if she came, near the end of the bat,
     And welk triem, and seemed to quiveralle hed that they could not taste
theirs, and the small ones choked and had to be patted on the back.
How suppose it the made of this rep

# Conclusion

All in all, the most important changes I made to this code are the following:

1) Changing the source from Shakespeare to Alice in Wonderland, as I have seen in many different examples online.
  - This was mostly due to the formatting of Shakespearean writing and the play format it is written in. I think with a bit more training I could have overcome that, but I only realized this after working with the new source and playing around with the settings in the model. 

2) Changing the recurrent layer of the RNN from GRU to LSTM.
  - The model is definitely a lot more complicated with its nuances, but as I explained in [this cell](https://colab.research.google.com/drive/1Qzi8KNLwdecCY299pjvpE-dHigtD7a2Y#scrollTo=YB1UGKXI-XpI), the neural network would benefit from a larger data source and longer sequences in order to make more accurate predictions.

3) EPOCHS
  - An epoch is one full pass over the training dataset. In this model, EPOCHS sets how many of these training cycles the model runs. The larger the value, the more training the model gets; here that made its predictions more accurate and generally "cleaner" looking.
  - I increased EPOCHS to 50 to see how much more accurate the output looked, and the difference is definitely visible in the sentence structures. The most noticeable part is the decreased number of typos, with the generated text looking much more like an excerpt from a book than a string of conjoined sentences. 

4) Temperature
  - Temperature is a sampling hyperparameter applied to the model's output, and it works the same way for GRU and LSTM. It affects the text generator's predictions because it either excites or calms the raw predictions: the logits are divided by the temperature before a character is sampled. For example, the TensorFlow guide used *temperature = 1.0*, which is just *logits/1.0*, meaning the raw prediction data (logits) are used as they are. With a higher temperature we get more diversity but more mistakes, while a lower temperature (this notebook uses 0.6) gives more conservative results with fewer mistakes.
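
The effect of temperature can be seen by turning a few logits into probabilities at different settings. A minimal pure-Python sketch with made-up logits and a hypothetical helper name; the notebook itself samples with `tf.random.categorical` on the scaled logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.1]  # made-up scores for three characters

for t in (0.6, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.6 the most likely character takes a larger share of the probability than at 1.0, and at 2.0 the distribution flattens out, which is where the extra diversity and the extra mistakes come from.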



