# Perth Machine Learning Group Poem Generator

## Introduction

The following code uses a GRU (Gated Recurrent Unit) network to generate rhymes.

In short, it observes sequences of rhymes, and infers the next rhyme.

## The code

### Data exploration

In [None]:
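import tensorflow as tf  # TensorFlow 1.x, version 1.9 or above (the code relies on tf.contrib and tf.enable_eager_execution)
tf.enable_eager_execution()  # runs operations immediately, as the notebook executes them. By default, TensorFlow 1.x first builds the whole graph and only runs it afterwards, for efficiency.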

from tensorflow.keras.layers import Embedding, Dense, GRU
from tensorflow.keras import Model
#from tensorflow.data.Dataset import from_tensor_slice
from tensorflow.train import AdamOptimizer
from tensorflow.losses import sparse_softmax_cross_entropy
import numpy as np
import re
import time

In [None]:
path = 'rhymes.txt'

In [None]:
with open(path, encoding='utf-8') as f:
    text = f.read().lower()

text = re.sub('[^a-z\n]', ' ', text)
text = text.split('\n')
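To see what this cleaning step does, here is a small sketch on a made-up string (the `sample` text is an assumption and is not taken from `rhymes.txt`). Every character other than `a-z` and the newline becomes a space, then the result is split into one entry per line.

```python
import re

# Hypothetical sample standing in for the content of rhymes.txt
sample = "Fell\nVain!\nwell-being\n"

cleaned = re.sub('[^a-z\n]', ' ', sample.lower())
entries = cleaned.split('\n')

print(entries)
```

Note that punctuation is replaced by a space rather than removed, so `'vain!'` becomes `'vain '` with a trailing space, and a final newline leaves an empty last entry.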

### Dataset creation

In [None]:
unique = sorted(set(text))  # contains all the unique words in the corpus of rhymes

word2idx = {u:i for i, u in enumerate(unique)}  # maps words to indexes
idx2word = {i:u for i, u in enumerate(unique)}  # maps indexes to words
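As a quick check of the two mappings, the sketch below builds them for a made-up mini-corpus (`toy_text` and the `toy_` names are assumptions used only for illustration) and verifies that encoding followed by decoding gives back the original words.

```python
# Hypothetical mini-corpus: one entry per element, like `text`
toy_text = ['tree', 'me', 'fell', 'me', 'tree']

toy_unique = sorted(set(toy_text))                       # ['fell', 'me', 'tree']
toy_word2idx = {u: i for i, u in enumerate(toy_unique)}  # word -> index
toy_idx2word = {i: u for i, u in enumerate(toy_unique)}  # index -> word

encoded = [toy_word2idx[w] for w in toy_text]            # words -> integers
decoded = [toy_idx2word[i] for i in encoded]             # integers -> words

print(encoded)
print(decoded == toy_text)
```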

In [None]:
max_length = 100  # maximum length of a sequence given as input to the network
vocab_size = len(unique)
embedding_dim = 128  # number of 'meaningful' features to learn. Ex: ['queen', 'king', 'man', 'woman'] needs at least 2 embedding dimensions: royalty and gender.
units = 512  # In Keras: dimensionality of the GRU output, i.e. the size of the hidden state the GRU carries from one step to the next
BATCH_SIZE = 64
BUFFER_SIZE = 10000

In [None]:
input_text = []
target_text = []

for f in range(0, len(text) - max_length, max_length):
    inps = text[f : f + max_length]
    targ = text[f + 1 : f + 1 + max_length]
    input_text.append([word2idx[i] for i in inps])
    target_text.append([word2idx[t] for t in targ])

In [None]:
dataset = tf.data.Dataset.from_tensor_slices((input_text, target_text)).shuffle(BUFFER_SIZE)
dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(BATCH_SIZE))
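The effect of shuffling and then batching while dropping the remainder can be sketched in plain Python. This is only a stand-in under simplifying assumptions, not how `tf.data` is implemented: the hypothetical helper `shuffle_and_batch` shuffles the whole list at once, whereas `shuffle(BUFFER_SIZE)` shuffles through a buffer.

```python
import random

def shuffle_and_batch(pairs, batch_size, seed=0):
    """Shuffle (input, target) pairs and group them into full batches only."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_full = len(pairs) // batch_size  # incomplete last batch is dropped
    return [pairs[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]

# 10 made-up (input, target) pairs, batches of 4: 2 pairs are left out
toy_pairs = [([i], [i + 1]) for i in range(10)]
batches = shuffle_and_batch(toy_pairs, batch_size=4)

print(len(batches), [len(b) for b in batches])
```

Dropping the remainder guarantees that every batch holds exactly `batch_size` examples.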

### Explanation

In fact, the algorithm does not only learn which character comes next. It takes sequences of characters as inputs (e.g. 'abcd') and predicts the same sequences shifted by one position as outputs (e.g. 'bcde'). The explanation uses characters for readability; in this notebook, each element of a sequence is a word.

Why?

During the training phase, it learns more than just the next character. Every position of the input sequence has a target in the output sequence, so the weights are updated for each character of the input, not only for the last one.

> Consider the sequences 'abcd', 'bcde', 'cdef', 'defg': the letter "d" is treated differently depending on the characters that precede it.

These updates help the network predict the following sequences better. So it learns the next character, but also how everything seen so far should weigh in that prediction.
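The shifted input/target pairs can be reproduced on a toy sequence. The sketch below reuses the slicing of the dataset-creation cell; the helper name `make_windows` and the toy sequence are assumptions made for illustration.

```python
def make_windows(tokens, max_length):
    """Cut `tokens` into (input, target) pairs; each target is its input shifted by one."""
    inputs, targets = [], []
    for start in range(0, len(tokens) - max_length, max_length):
        inputs.append(tokens[start:start + max_length])
        targets.append(tokens[start + 1:start + 1 + max_length])
    return inputs, targets

inputs, targets = make_windows(list('abcdefghi'), max_length=4)
for inp, targ in zip(inputs, targets):
    print(''.join(inp), '->', ''.join(targ))
```

With nine characters and `max_length=4`, this yields the pairs `abcd -> bcde` and `efgh -> fghi`.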

In our dataset, an example of an input and its target is:

In [None]:
# example of input:
print('Given the following sequence: \n\n')
print([idx2word[i] for i in input_text[15]])
print('\n\n')
print('the network has to learn that a correct continuation is: \n')
# example of output the algorithm has to learn
print([idx2word[t] for t in target_text[15]])

### Model

We build a model with:
  * an embedding layer that turns word indexes into dense vectors to feed the GRU layer
  * a GRU (Gated Recurrent Unit) layer
  * a regular fully connected (dense) layer that outputs one score per word of the vocabulary
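To give an idea of what the GRU layer computes at each step, here is a minimal sketch reduced to a single unit with a scalar input. The real layer does the same with weight matrices over `units` dimensions. The weights in `params` are made up, and Keras offers variants that differ in details such as where the reset gate is applied, so treat this as an illustration rather than the exact layer used here.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU step for a single unit with scalar input x and scalar state h.

    p holds hypothetical (input weight, recurrent weight, bias) triples for
    the update gate 'z', the reset gate 'r' and the candidate state 'c'.
    """
    z = sigmoid(p['z'][0] * x + p['z'][1] * h + p['z'][2])          # update gate
    r = sigmoid(p['r'][0] * x + p['r'][1] * h + p['r'][2])          # reset gate
    c = math.tanh(p['c'][0] * x + p['c'][1] * (r * h) + p['c'][2])  # candidate state
    return z * h + (1.0 - z) * c                                    # new hidden state

params = {'z': (0.5, -0.3, 0.0), 'r': (0.8, 0.1, 0.0), 'c': (1.2, 0.7, 0.0)}

h = 0.0  # the state starts at zero
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h, params)
    print(round(h, 4))
```

The update gate `z` decides how much of the previous state is kept, which is what lets the network carry information along a sequence.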

In [None]:
class Model(Model):
  def __init__(self, vocab_size, embedding_dim, units, batch_size):
    super(Model, self).__init__()
    self.units = units
    self.batch_sz = batch_size
    self.embedding = Embedding(vocab_size, embedding_dim)
    self.gru = GRU(self.units, 
                   return_sequences=True, 
                   return_state=True, 
                   recurrent_activation='sigmoid', 
                   recurrent_initializer='glorot_uniform')
    self.fc = Dense(vocab_size)
        
  def call(self, x, hidden):
    '''
    Predicts an output given x
    This function will be used for gradient descent during the training phase
    '''
    x = self.embedding(x)
    output, states = self.gru(x, initial_state=hidden)
    output = tf.reshape(output, (-1, output.shape[2]))
    x = self.fc(output)
    return x, states

In [None]:
model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)

Then we choose a regular Adam optimizer and a cross-entropy loss function.

In [None]:
optimizer = AdamOptimizer()

In [None]:
def loss_function(real, preds):
    return sparse_softmax_cross_entropy(labels=real, logits=preds)
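What this loss measures can be sketched in plain Python: for each position, it is the negative log-probability that the softmax of the scores assigns to the correct word. The function name and the toy scores below are assumptions, and the result is averaged over positions on the assumption that this mirrors the default reduction of `tf.losses.sparse_softmax_cross_entropy`.

```python
import math

def sparse_softmax_cross_entropy_sketch(labels, logits):
    """Mean of -log(softmax(logits)[label]) over all positions."""
    losses = []
    for label, row in zip(labels, logits):
        m = max(row)                                          # for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])               # -log p(label)
    return sum(losses) / len(losses)

toy_logits = [[2.0, 0.5, -1.0],   # position 0: the model favors word 0
              [0.0, 0.0, 0.0]]    # position 1: the model has no preference

print(sparse_softmax_cross_entropy_sketch([0, 2], toy_logits))
```

The loss is small when the correct word gets a high score and grows as the model puts its probability elsewhere.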

### Training

We train the model for `n_epochs` epochs. It is set to 1 below; raise it (e.g. to 100) to train longer.

In [None]:
n_epochs = 1

for epoch in range(n_epochs):
    start = time.time()
    hidden = model.reset_states()  # returns None, so the GRU starts every epoch from its default (all-zero) state

    for (batch, (inp, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            predictions, hidden = model(inp, hidden)  # predicts the next word at every position of the input
            target = tf.reshape(target, (-1, ))  # flattens the targets to match the flattened predictions
            loss = loss_function(target, predictions)  # compares the predictions with the real output

        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())  # gradient descent step

        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, loss))

    print('Epoch {} Loss {:.4f}'.format(epoch + 1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

### Save our model

We often want to save a model and use it later. Here is a way to do it. (some improvements can be made)

First, put all hyperparameters in a dictionary

In [None]:
hyperparameters = {}
hyperparameters['max_length'] = max_length
hyperparameters['vocab_size'] = vocab_size
hyperparameters['embedding_dim'] = embedding_dim
hyperparameters['units'] = units
hyperparameters['BATCH_SIZE'] = BATCH_SIZE
hyperparameters['BUFFER_SIZE'] = BUFFER_SIZE

Second, save the hyperparameters, the weights of every layer we have trained (embedding, gru, fc) and the two vocabulary mappings

In [None]:
np.save('hyperparameters_rhymes', hyperparameters)
np.save('embedding_weights_rhymes', model.embedding.get_weights())
np.save('gru_weights_rhymes', model.gru.get_weights())
np.save('fc_weights_rhymes', model.fc.get_weights())
np.save('word2idx_rhymes', word2idx)
np.save('idx2word_rhymes', idx2word) 
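Reading the dictionaries back later could look like the sketch below. It is self-contained and uses made-up stand-ins written to a temporary directory (the `demo` names and values are assumptions) instead of the files saved above.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-ins for the dictionaries saved above
demo_hyperparameters = {'units': 512, 'embedding_dim': 128}
demo_word2idx = {'fell': 0, 'me': 1, 'tree': 2}

with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, 'hyperparameters_demo'), demo_hyperparameters)
    np.save(os.path.join(tmp, 'word2idx_demo'), demo_word2idx)

    # np.save appends '.npy'. A dict is stored as a pickled 0-d object array,
    # so loading needs allow_pickle=True, and .item() returns the dict itself.
    loaded_hyperparameters = np.load(
        os.path.join(tmp, 'hyperparameters_demo.npy'), allow_pickle=True).item()
    loaded_word2idx = np.load(
        os.path.join(tmp, 'word2idx_demo.npy'), allow_pickle=True).item()

print(loaded_hyperparameters['units'], loaded_word2idx['tree'])
```

The layer weights would presumably go back through each layer's `set_weights`, once the layer has been built by a first call of the model.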

### Text Generation

We can now see how the model performs. 

A value we want to tune is the temperature. It controls how 'random' the predictions are: the scores produced by the model are divided by the temperature before a word is sampled. The lower the value, the more the most probable word is favored, which creates a lot of repetitions. The greater the value, the more random the predictions become, up to giving nonsense.

Feel free to change the rhymes and the temperatures.
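The effect of dividing the scores by a temperature before sampling can be checked on made-up numbers. The sketch below is plain Python with a hypothetical helper `softmax_with_temperature` and toy scores, and it prints the resulting probabilities for three temperatures.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Probabilities obtained from softmax(logits / temperature)."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)                                  # for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]   # hypothetical scores for three words

for temperature in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Running it shows how the probability mass moves between the most likely word and the others as the temperature changes, while the ranking of the words stays the same.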

In [None]:
num_generate = 100  # number of words to generate
start_string = ['fell', 'vain', 'well', 'tree', 'fell', 'leave', 'me', 'above', 'melody']  # beginning of the generated text. TODO: try start_string = ' '

input_eval = [word2idx[s] for s in start_string]  # converts start_string to numbers the model understands
input_eval = tf.expand_dims(input_eval, 0)  # adds the batch dimension: shape (1, len(start_string))

text_generated = []

temperature = 0.0001  # sampling temperature: the scores are divided by this value

hidden = [tf.zeros((1, units))]
for i in range(num_generate):
    predictions, hidden = model(input_eval, hidden)  # predictions holds one row of scores (logits) per input position

    predictions = predictions / temperature  # rescales the scores (but keeps their order)
    # tf.multinomial expects logits, so the scores are passed without tf.exp;
    # the last row is the prediction that follows the last input word
    predicted_id = tf.multinomial(predictions, num_samples=1)[-1][0].numpy()  # picks the next word for the generated text
    input_eval = tf.expand_dims([predicted_id], 0)
    text_generated += [idx2word[predicted_id]]

print(start_string + text_generated)

## Conclusion

That's promising. The output has some interesting properties depending on the temperature (what counts as 'low' or 'high' differs from one model to another). The scores are divided by the temperature before sampling, so a low value favors the most probable word:
* A very high temperature may lead to prose
* A high temperature can come up with weak rhymes by chance
* A low temperature leads to loops, with some words repeated. That could be an effect somebody looks for

Hard-to-fix issue: rare words are really rare

Possible improvements:
* train at a character level?