# Generating Beyonce Lyrics using an RNN text generation model

(Adapted from the [TensorFlow text generation tutorial](https://www.tensorflow.org/tutorials/sequences/text_generation) to run on [datahub.ucsd.edu](datahub.ucsd.edu).)

In [1]:
import tensorflow as tf
tf.enable_eager_execution()

import numpy as np
import os
import time
from IPython.display import Image


## Opening the txt file and examining the contents

In [2]:
path_to_file = "lyrics_text.txt"

In [3]:
#open the file and read it 
text = open(path_to_file, 'rb').read().decode(encoding = "ISO-8859-1")
# length of text
print ('Length of text: {} characters'.format(len(text)))

Length of text: 272892 characters


In [4]:
# Number of unique characters 
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

78 unique characters


## Process the text

## Vectorize the text

Map each character to an integer index, and keep the reverse lookup so that indices can be turned back into characters.

In [5]:
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

In [6]:
# Example of the character mapping
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'head down as ' ---- characters mapped to int ---- > [34 31 27 30  0 30 41 49 40  0 27 45  0]
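
The round trip between characters and integers can be sketched without TensorFlow. This is a minimal illustration on a toy string; the names `sample_text`, `sample_vocab`, `sample_char2idx` and `sample_idx2char` are hypothetical stand-ins for the notebook's variables.

```python
# Toy corpus standing in for the lyrics file (an assumption for illustration).
sample_text = "head down as i watch my feet"

# Same construction as the notebook: the sorted unique characters form the vocabulary.
sample_vocab = sorted(set(sample_text))
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}
sample_idx2char = list(sample_vocab)

# Encode every character as its index, then decode the indices back to text.
encoded = [sample_char2idx[ch] for ch in sample_text]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(encoded[:4])
print(decoded == sample_text)
```

Because the vocabulary is sorted, the space character receives index 0 here, just as it does in the notebook output above.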


## Prediction

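Create the training examples and targets. Each input sequence of `seq_length` characters is paired with a target sequence of the same length, shifted one character to the right, so the model learns to predict the next character at every step.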

In [7]:
# Maximum length, in characters, of a single input sequence
seq_length = 100
examples_per_epoch = len(text)//seq_length

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
    print(idx2char[i.numpy()])

h
e
a
d
 


In [8]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
    print(repr(''.join(idx2char[item.numpy()])))

'head down as i watch my feet take turns hitting the ground eyes shut i find myself in love racing the'
' earth and im soaked in your love and love was right in my path, in my grasp and me and you belong  i'
' wanna run (run) smash into you i wanna run (run) and smash into you  ears closed what i hear no one '
'else has to know cause i know that what we have is worth first place in gold and im soaked in your lo'
've and love is right in my path, in my grasp and me and you belong, oh...  i wanna run (run) smash in'


In [9]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

In [10]:
for input_example, target_example in  dataset.take(1):
    print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'head down as i watch my feet take turns hitting the ground eyes shut i find myself in love racing th'
Target data: 'ead down as i watch my feet take turns hitting the ground eyes shut i find myself in love racing the'
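
The chunking and shifting done by `batch(seq_length+1, drop_remainder=True)` and `split_input_target` can be mimicked with plain lists. This is a sketch under assumptions: `make_chunks` is a hypothetical helper, not part of `tf.data`, and the toy string replaces the real corpus.

```python
def make_chunks(ids, seq_length):
    """Cut ids into consecutive chunks of seq_length + 1 items, dropping any remainder."""
    size = seq_length + 1
    return [ids[i:i + size] for i in range(0, len(ids) - size + 1, size)]


def split_input_target(chunk):
    """Input is everything but the last item; target is everything but the first."""
    return chunk[:-1], chunk[1:]


toy = list("head down as i watch")  # 20 characters
chunks = make_chunks(toy, seq_length=4)
pairs = [split_input_target(c) for c in chunks]

for input_part, target_part in pairs:
    print(repr("".join(input_part)), "->", repr("".join(target_part)))
```

Each chunk holds one character more than `seq_length`, so that the input and the target both end up with exactly `seq_length` characters.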


In [11]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 34 ('h')
  expected output: 31 ('e')
Step    1
  input: 31 ('e')
  expected output: 27 ('a')
Step    2
  input: 27 ('a')
  expected output: 30 ('d')
Step    3
  input: 30 ('d')
  expected output: 0 (' ')
Step    4
  input: 0 (' ')
  expected output: 30 ('d')


## Training batches

Shuffle the sequences and group them into manageable batches.

In [12]:
BATCH_SIZE = 64
steps_per_epoch = examples_per_epoch//BATCH_SIZE
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
dataset

<DatasetV1Adapter shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
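
The batch arithmetic can be checked by hand. The sketch below uses only the numbers reported in this notebook; `num_sequences` and `full_batches` are hypothetical names introduced for the comparison.

```python
text_length = 272892  # character count reported by the notebook
seq_length = 100
batch_size = 64

# Values as computed in the notebook.
examples_per_epoch = text_length // seq_length
steps_per_epoch = examples_per_epoch // batch_size

# Each chunk actually consumes seq_length + 1 characters, so the number of
# sequences produced by the dataset is slightly smaller.
num_sequences = text_length // (seq_length + 1)
full_batches = num_sequences // batch_size

print(examples_per_epoch, steps_per_epoch)
print(num_sequences, full_batches)
```

For this corpus both calculations give the same number of steps per epoch, so the small mismatch between `seq_length` and `seq_length + 1` has no practical effect.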

## Building the model

In [13]:
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024

In [14]:
if tf.test.is_gpu_available():
    # CuDNN-backed GRU, available only when a GPU is present
    rnn = tf.keras.layers.CuDNNGRU
else:
    import functools
    # Plain GRU with a sigmoid recurrent activation for CPU-only runs
    rnn = functools.partial(
        tf.keras.layers.GRU, recurrent_activation='sigmoid')

In [15]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, 
                              batch_input_shape=[batch_size, None]),
    rnn(rnn_units,
        return_sequences=True, 
        recurrent_initializer='glorot_uniform',
        stateful=True),

    tf.keras.layers.Dense(vocab_size)
    ])
    return model

In [16]:
model = build_model(
  vocab_size = len(vocab), 
  embedding_dim=embedding_dim, 
  rnn_units=rnn_units, 
  batch_size=BATCH_SIZE)

## Running the model

In [17]:
for input_example_batch, target_example_batch in dataset.take(1): 
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 78) # (batch_size, sequence_length, vocab_size)


In [18]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           19968     
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3935232   
_________________________________________________________________
dense (Dense)                (64, None, 78)            79950     
Total params: 4,035,150
Trainable params: 4,035,150
Non-trainable params: 0
_________________________________________________________________
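
The parameter counts in the summary can be reproduced by hand. This sketch assumes a GRU formulation with one bias vector per gate, which is the form that matches the count reported above; other GRU variants carry a second bias and would give a different number.

```python
vocab_size, embedding_dim, rnn_units = 78, 256, 1024

# Embedding: one vector of size embedding_dim per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GRU: three weight sets (update gate, reset gate, candidate state), each with an
# input kernel, a recurrent kernel and one bias vector (assumed single-bias form).
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)

# Dense: a weight matrix from rnn_units to vocab_size plus one bias per output.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```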


### Sampling from the output distribution

In [19]:
# sampled_indices = tf.random.multinomial(example_batch_predictions[0], num_samples=1) # TF 1.12
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

In [20]:
sampled_indices

array([17, 26, 45, 28,  8,  3, 19, 59, 61,  2, 26, 48, 76, 65, 60, 52, 76,
       69, 61, 36, 59, 31, 20, 61,  8, 73, 30, 34, 44, 54, 65,  2, 25, 57,
       12, 24, 62,  0, 65, 56, 58, 19,  3, 49, 11,  1,  1, 11, 57, 70, 31,
        4, 10,  3, 59, 41, 55, 10, 66, 60, 73, 10, 70, 75, 46,  2, 68, 48,
       68, 19,  6,  4, 36, 28, 27, 37, 27, 36,  9, 56, 75, 12, 21, 73,  8,
       66, 47, 57, 55, 50, 51, 44, 57, 50, 59, 13, 68,  5, 50, 51])

In [21]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 ' everything im asking for but you... stop making a big deal out of the little things, lets get carri'

Next Char Predictions: 
 '6`sb-&8\x98\x9c"`vÂ£\x99zÂ©\x9cj\x98e9\x9c-³dhr}£"]\x931[\x9d £\x89\x948&w0!!0\x93\xade(/&\x98o\x80/¦\x99³/\xad¿t"¨v¨8+(jbakaj.\x89¿1:³-¦u\x93\x80xyr\x93x\x982¨)xy'
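
Sampling from logits, as opposed to always taking the most likely character, can be sketched with the standard library. This is an illustration of the idea only, not the implementation of `tf.random.categorical`; `softmax`, `sample_categorical` and the toy logits are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, rng):
    """Draw one index with probability proportional to softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy_logits = [2.0, 1.0, 0.1]
draws = [sample_categorical(toy_logits, rng) for _ in range(1000)]
counts = [draws.count(i) for i in range(len(toy_logits))]
print(counts)
```

Every index can be drawn, but higher logits are drawn more often. With an untrained model the logits are close to uniform, which is why the predictions above look like random characters.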


## Training the model

### Use an optimizer and a loss function to improve the model

In [22]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)") 
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 78)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.357435
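
A useful sanity check: a model that assigns equal probability to all 78 characters has a cross-entropy of ln(78), and the untrained loss above sits very close to that value. The sketch below computes the baseline with a hand-written cross-entropy; the function is a hypothetical single-example stand-in, not the Keras implementation.

```python
import math


def sparse_categorical_crossentropy(label, logits):
    """Cross-entropy of one example from unnormalized logits: logsumexp minus the true logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


vocab_size = 78
uniform_logits = [0.0] * vocab_size  # every character equally likely
baseline = sparse_categorical_crossentropy(label=5, logits=uniform_logits)

reported_loss = 4.357435  # value printed by the notebook before training
print(baseline, reported_loss - baseline)
```

A starting loss far above this baseline would suggest a problem with initialization or with the label encoding.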


In [23]:
model.compile(
    optimizer = tf.train.AdamOptimizer(),
    loss = loss)

### Configure checkpoints

In [24]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training'

# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
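
The `{epoch}` placeholder in the file name is filled in with the epoch number each time a checkpoint is written. The sketch below uses `str.format` as a stand-in for that substitution, to show which names to expect.

```python
import os

checkpoint_dir = "./training"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# str.format stands in here for the substitution performed by the callback.
names = [checkpoint_prefix.format(epoch=e) for e in (1, 2, 30)]
for name in names:
    print(name)
```

After 30 epochs, the most recent checkpoint is therefore the one whose name ends in `ckpt_30`.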

### Execute the training

In [26]:
EPOCHS=30

In [27]:
history = model.fit(dataset.repeat(), epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


## Generate text

### Restore the latest checkpoint

In [28]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training/ckpt_30'

In [29]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [30]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            19968     
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1024)           3935232   
_________________________________________________________________
dense_1 (Dense)              (1, None, 78)             79950     
Total params: 4,035,150
Trainable params: 4,035,150
Non-trainable params: 0
_________________________________________________________________


### Function that generates the text with a prediction loop

In [31]:
def generate_text(model, start_string):
    
    # Average number of characters in a Beyonce song
    num_generate = 2139 

    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    text_generated = []

    # Low temperatures give more predictable text, higher temperatures more surprising text.
    # A value of 1.0 leaves the logits unscaled; lower it to stay closer to the original lyrics.
    temperature = 1.0 
 
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)

        predictions = tf.squeeze(predictions, 0)

        predictions = predictions / temperature
        predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()

        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))
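
The effect of dividing the logits by `temperature` can be seen on a toy example. This is a minimal sketch with hypothetical logits; `softmax_with_temperature` is an illustrative helper, not a TensorFlow function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize them into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1.0 concentrates probability on the most likely character, while a temperature above 1.0 flattens the distribution and makes unusual characters more likely.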

In [32]:
# "love" is one of the most common words in Beyonce's songs. A peer reviewer suggested
# using a seed word that is less arbitrary and is not the title of one of her songs.
print(generate_text(model, start_string=u"love")) 

Instructions for updating:
Use `tf.random.categorical` instead.
love? why dont you neeved moliethe way   divation elves too much he wanna got the green light (whoa oh oh!) (uh uh huh uh uh uh huh uh uh) you got the green light (whoa oh oh oh oh oh oh oh oh oh!) oh ooh whoa whoa oh ooh whoa whoa oh ooh whoa whoa oh oh oh oh oh, oh oh oh oh oh ix awd thing i tolk my own keep tigns (i cant rung waiting doison my love in the urms a pecirs but i will complete and i thought it would be i will love you  i love to leave ito so scared of lonely and im scared of being the only sharmingood)  for you all in your equilld places got me looking so crazy right now your love got me looking so crazy youre through like a baby   free you thought wanna play it we the ones thats hey you reverse, and i impregnatea love and mine  (oh) cause we like a little too far...  with me better halo, halo, halo i can feel your halo, halo, halo i can see your heast  [verse 2:] you and me were standin on the sun feel ever

In [33]:
print(generate_text(model, start_string=u"beyonce")) 


beyonce] we be all night, and everything a trut if you wanna know its your secret creole  baby) boys about not a dag, sen fighting through musions are again me, baby, nowill i wanna be for you like youre there for me you cunt me looking so crazy youre schoolin life?  theres nothing not to love about me how to fight temptation im not sereatent just we can prottless over second til we rendezvous  all i got comescare on your video, video if you wanta mouth like a birn this beats so damn quick when you say my day you know what sidgle lame and ime a do but just the temperature pon my dreams  baby boy not a day goes by without my find somebody else if i begced nobody heart wanna party wanna be, im good on any mlk boulevard (i go off), i gotta hold on youve got a hold on youve got the best of mustands  out to all the why im surrigas mama, all you taust  everything i see is you and i dont want no substitune bay i together its coorigged around what you love me from you yeah, yeah, yeah, yeah, y

In [34]:
# The seed can also be a phrase; this one appears in her lyrics
print(generate_text(model, start_string=u"drunk in love"))

drunk in love)  you know it costs to be  your souce i love you  never can love all night long... (all night yeah)  sweet love all night long... (all night yeah)  good love all night long...  they sare asse you saving that we thank, thank) i cant believe we made it (this is what we made, made) this is what were thankful for (they never let me go) (uh uh huh shake up lookin corfidention i got a hot say forther way im not the peak, babe, the peak, babe, the peak, babe, the peak, babe, the peaks and beyour  off to catch myself but im out on your money, ive got a hold on, a hold on to me, brown like an eyes siftle all my ladies get it aint right we can chince to show we dont uve feel your halo, halo, halo i can see your halo, halo, halo i can feel your halo, halo, halo i can see your halo, halo, halo i can see your hurts, we shinike round whe hey  sit runcer oh, ladies, you put my love to the deal thats okay  well no, they id hadiama now fliest no seems to be the boss of me i just me so cra

In [35]:
# Seed with a word typical of country songs, which rarely appears in her lyrics,
# to see what the model produces
print(generate_text(model, start_string=u"tractor"))  

tractor, cause youre nocanor you back up, this is my dream then i am let as lo que es amar a una mujer sabrÃ£Â­a escuchar pues conozco el dolor ves me  my bedyone you at the time you know that shed befter my toming, keep me home oh oh, oh ah it could be far worse  closer lating wime ill be your neast goes around comes back my time is something that ive got the bresk my histrese dont work out that way, woust you dont need nobody else thats why youre the one that (i love all yÃ¢ÂÂall capricorn aquarius pisces aries taurus gemini cancer oh! leo virgo libra scorpio sagittarius i love all yÃ¢ÂÂall  i wike cause you his mind take you there world-wide woman www you know im the typh of you scream mus and a friend you reverse that cowgirl i trust you i keep my fingers off 25 huntre, as here i know its like i hove you  look at all than pincy the next do the thind i found a wock the ver thoughey no i see is oh, oh yeah  oh what do you do whatever i hear thit i moke away, cause youre no angel 