# Lyrics Generation
**Veronica Bruno (230904), Cristina Galvez (230260) and Rafael Bardisa (231142)**

In this notebook, we prepare a dataset of lyrics from many famous songs and use it to train three deep learning models:
- RNN
- LSTM
- Bidirectional LSTM

Once a model is trained, it can generate new song lyrics from a seed (an initial string of words). The generated text follows the patterns the model learned during training.

In [None]:
# import Keras library
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM, SimpleRNN, Input, Bidirectional
from keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.metrics import categorical_accuracy

# import spaCy and load its small English model
# spaCy is used to tokenize the text
import spacy
nlp = spacy.load("en_core_web_sm")

#import other libraries
import numpy as np
import pandas as pd
import random
import sys
import os
import time
import codecs
import collections
from six.moves import cPickle

from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')
data_path = '/content/drive/Shareddrives/Deep Learning/DeepLearning_2022/Final Project/Data/'
results_path = '/content/drive/Shareddrives/Deep Learning/DeepLearning_2022/Final Project/Results/'

df = pd.read_csv(data_path + 'songdata.csv')



## Data Preparation

Prepare the data from the dataset *'songdata.csv'* so that it can be fed to the three models.

In [None]:
# join all song lyrics from the dataset in a long string
data = ', '.join(df['text'])

In [None]:
# function to create a wordlist
def create_wordlist(doc):
    wl = []
    for word in doc:
        if word.text not in ("\n","\n\n",'\u2009','\xa0'):
            wl.append(word.text.lower())
    return wl

In [None]:
# create array of words (in order)
wordlist = []
word_limit = 100000 # number of characters of the joined text to process (limited by RAM)

doc = nlp(data[0:word_limit])
wl = create_wordlist(doc)
wordlist = wordlist + wl

In [None]:
# count the number of words
word_counts = collections.Counter(wordlist)

# Mapping from index to word : that's the vocabulary
vocabulary_inv = [x[0] for x in word_counts.most_common()]
vocabulary_inv = list(sorted(vocabulary_inv))

# Mapping from word to index
vocab = {x: i for i, x in enumerate(vocabulary_inv)}
words = [x[0] for x in word_counts.most_common()]

# size of the vocabulary
vocab_size = len(words)
print("Vocabulary size:", vocab_size)

# save the words and vocabulary
with open(results_path + "vocab_file.pkl", 'w+b') as f:
    cPickle.dump((words, vocab, vocabulary_inv), f)

Vocabulary size: 1832
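
To make the two mappings concrete, here is a minimal sketch of the same idea on a toy word list. The `toy_*` names are hypothetical and not part of the notebook. Note that only the alphabetically sorted list is consistent with the word-to-index dictionary; a frequency-ordered list like `words` would not be.

```python
import collections

# Hypothetical toy word list standing in for the notebook's `wordlist`.
toy_wordlist = ["love", "me", "or", "leave", "me", "love"]

toy_counts = collections.Counter(toy_wordlist)

# Index -> word: the distinct words in alphabetical order.
toy_vocabulary_inv = sorted(toy_counts)

# Word -> index: the inverse lookup of the list above.
toy_vocab = {word: i for i, word in enumerate(toy_vocabulary_inv)}

print(toy_vocabulary_inv)
print(toy_vocab)
```

Looking up `toy_vocabulary_inv[toy_vocab[w]]` returns `w` for every word, which is the round trip the generation step relies on.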


In [None]:
# create sequences of fixed length
sequences = []
next_words = []
seq_length = 30  # define sequence length
sequences_step = 1

for i in range(0, len(wordlist) - seq_length, sequences_step):
    sequences.append(wordlist[i: i + seq_length])
    next_words.append(wordlist[i + seq_length])

print('Number of sequences:', len(sequences))

Number of sequences: 24326
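
The sliding-window loop can be checked on a small example. This is only a sketch with hypothetical `toy_*` names and a window of 3 words instead of 30; it shows that each window is paired with the word that immediately follows it.

```python
# Hypothetical toy data: 8 tokens, window of 3, step of 1.
toy_words = ["i", "do", ",", "i", "do", ",", "i", "do"]
toy_seq_length = 3
toy_step = 1

toy_sequences = []
toy_next_words = []
for i in range(0, len(toy_words) - toy_seq_length, toy_step):
    toy_sequences.append(toy_words[i: i + toy_seq_length])  # input window
    toy_next_words.append(toy_words[i + toy_seq_length])    # word to predict

for seq, nxt in zip(toy_sequences, toy_next_words):
    print(seq, "->", nxt)
```

With a step of 1 the loop yields `len(words) - seq_length` pairs, so the 24326 sequences reported above correspond to a word list of 24356 tokens.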


In [None]:
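# one-hot encode the sequences and their next words as boolean matrices
# (the np.bool alias is deprecated since NumPy 1.20; the builtin bool is equivalent)
X = np.zeros((len(sequences), seq_length, vocab_size), dtype=bool)
y = np.zeros((len(sequences), vocab_size), dtype=bool)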
for i, sentence in enumerate(sequences):
    for t, word in enumerate(sentence):
        X[i, t, vocab[word]] = 1
    y[i, vocab[next_words[i]]] = 1
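
The one-hot arrays grow with the product of sequence count, sequence length and vocabulary size, which is presumably why the amount of text has to be capped. A rough estimate, assuming NumPy stores one byte per boolean element and using the counts printed by the cells above:

```python
# Counts taken from the notebook's printed outputs.
n_sequences = 24326
seq_len = 30
vocabulary_size = 1832

# Assumption: one byte per element for a boolean NumPy array.
x_bytes = n_sequences * seq_len * vocabulary_size
y_bytes = n_sequences * vocabulary_size

print(f"X: {x_bytes / 2**30:.2f} GiB")
print(f"y: {y_bytes / 2**20:.2f} MiB")
```

Because the cost is multiplicative, doubling the amount of text roughly doubles the number of sequences and also enlarges the vocabulary, so memory use grows faster than linearly.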



## Model Definition

The first model we are going to define will be the **RNN** model:

In [None]:
def rnn_model(seq_length, vocab_size):
    print('Build RNN model.')
    model = Sequential()
    model.add(SimpleRNN(units, activation="relu",input_shape=(seq_length, vocab_size))) # add RNN layer
    model.add(Dropout(0.6))
    model.add(Dense(vocab_size))
    model.add(Activation('softmax'))
    
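    optimizer = Adam(learning_rate=learning_rate)  # `lr` is a deprecated alias
    # NOTE: this list is never returned or passed to fit(), so early stopping is not applied
    callbacks=[EarlyStopping(patience=2, monitor='val_loss')]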
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=[categorical_accuracy])
    print("model built!")
    return model

The following is the **LSTM** model:

In [None]:
def lstm_model(seq_length, vocab_size):
    print('Build LSTM model.')
    model = Sequential()
    model.add(LSTM(units, activation="relu",input_shape=(seq_length, vocab_size))) # add LSTM layer
    model.add(Dropout(0.6))
    model.add(Dense(vocab_size))
    model.add(Activation('softmax'))
    
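    optimizer = Adam(learning_rate=learning_rate)  # `lr` is a deprecated alias
    # NOTE: this list is never returned or passed to fit(), so early stopping is not applied
    callbacks=[EarlyStopping(patience=2, monitor='val_loss')]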
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=[categorical_accuracy])
    print("model built!")
    return model

The following function defines a **Bidirectional LSTM** model. It wraps two LSTM layers, one reading the sequence forward and one reading it backward, and concatenates their outputs so that context from both directions is taken into account.

In [None]:
def bidirectional_lstm_model(seq_length, vocab_size):
    print('Build LSTM model.')
    model = Sequential()
    model.add(Bidirectional(LSTM(units, activation="relu"),input_shape=(seq_length, vocab_size))) # add BiLSTM layer
    model.add(Dropout(0.6))
    model.add(Dense(vocab_size))
    model.add(Activation('softmax'))
    
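    optimizer = Adam(learning_rate=learning_rate)  # `lr` is a deprecated alias
    # NOTE: this list is never returned or passed to fit(), so early stopping is not applied
    callbacks=[EarlyStopping(patience=2, monitor='val_loss')]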
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=[categorical_accuracy])
    print("model built!")
    return model

Once the models are defined, we initialize them. First we set the learning rate and the number of units in each recurrent layer.

In [None]:
units = 256 # units in the model
learning_rate = 0.001 #learning rate

Initialize models:

In [None]:
# RNN
md_rnn = rnn_model(seq_length, vocab_size)
md_rnn.summary()

Build RNN model.
model built!
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (None, 256)               534784    
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense (Dense)               (None, 1832)              470824    
                                                                 
 activation (Activation)     (None, 1832)              0         
                                                                 
Total params: 1,005,608
Trainable params: 1,005,608
Non-trainable params: 0
_________________________________________________________________




In [None]:
# LSTM
md_lstm = lstm_model(seq_length, vocab_size)
md_lstm.summary()

Build LSTM model.
model built!
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 256)               2139136   
                                                                 
 dropout_1 (Dropout)         (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 1832)              470824    
                                                                 
 activation_1 (Activation)   (None, 1832)              0         
                                                                 
Total params: 2,609,960
Trainable params: 2,609,960
Non-trainable params: 0
_________________________________________________________________




In [None]:
# Bidirectional LSTM
md_bilstm = bidirectional_lstm_model(seq_length, vocab_size)
md_bilstm.summary()

Build LSTM model.
model built!
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bidirectional (Bidirectiona  (None, 512)              4278272   
 l)                                                              
                                                                 
 dropout_2 (Dropout)         (None, 512)               0         
                                                                 
 dense_2 (Dense)             (None, 1832)              939816    
                                                                 
 activation_2 (Activation)   (None, 1832)              0         
                                                                 
Total params: 5,218,088
Trainable params: 5,218,088
Non-trainable params: 0
_________________________________________________________________
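
The parameter counts in the three summaries can be reproduced by hand. The helper functions below are hypothetical and only encode the standard formulas: a simple recurrent layer has an input kernel, a recurrent kernel and a bias; an LSTM has four such weight sets (input, forget and output gates plus the cell candidate); a bidirectional wrapper doubles the recurrent layer and the width of the following dense layer's input.

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # four weight sets, each shaped like a simple recurrent layer
    return 4 * simple_rnn_params(input_dim, units)

def dense_params(input_dim, output_dim):
    # weight matrix + bias
    return input_dim * output_dim + output_dim

vocabulary_size, n_units = 1832, 256  # values used in this notebook

rnn_total = simple_rnn_params(vocabulary_size, n_units) + dense_params(n_units, vocabulary_size)
lstm_total = lstm_params(vocabulary_size, n_units) + dense_params(n_units, vocabulary_size)
bilstm_total = 2 * lstm_params(vocabulary_size, n_units) + dense_params(2 * n_units, vocabulary_size)

print(rnn_total, lstm_total, bilstm_total)
```

The totals match the summaries, which confirms that the BiLSTM has roughly five times as many trainable parameters as the plain RNN for the same number of units.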




## Training the Model

**DO NOT RUN IF NOT NECESSARY**

**The training of the models can take up to an hour, and previously trained models (same data) can be loaded in the next section.**

In [None]:
batch_size = 32 # minibatch size
num_epochs = 50 # number of epochs

Train RNN with the prepared data:

In [None]:
# train the RNN model
history = md_rnn.fit(X, y,
                 batch_size=batch_size,
                 shuffle=True,
                 epochs=num_epochs,
                 validation_split=0.1)

# save the model
md_rnn.save(results_path + 'my_model_generate_sentences_rnn.h5')

Train LSTM with the prepared data:

In [None]:
# train the LSTM model
history = md_lstm.fit(X, y,
                 batch_size=batch_size,
                 shuffle=True,
                 epochs=num_epochs,
                 validation_split=0.1)

# save the model
md_lstm.save(results_path + 'my_model_generate_sentences_lstm.h5')

Train BiLSTM with the prepared data:

In [None]:
# train the Bidirectional LSTM model
history = md_bilstm.fit(X, y,
                 batch_size=batch_size,
                 shuffle=True,
                 epochs=num_epochs,
                 validation_split=0.1)

# save the model
md_bilstm.save(results_path + 'my_model_generate_sentences_bilstm.h5')

## Load Generated Data

To load a previously saved vocabulary:

In [None]:
# load vocabulary
print("loading vocabulary...")
vocab_file = os.path.join(results_path, "vocab_file.pkl")

with open(vocab_file, 'rb') as f:
    words, vocab, vocabulary_inv = cPickle.load(f)

vocab_size = len(words)

loading vocabulary...


To load a trained model:

In [None]:
from keras.models import load_model

# load the RNN model
print("loading RNN model...")
model_rnn = load_model(results_path + 'my_model_generate_sentences_rnn.h5')

loading RNN model...


In [None]:
# load the LSTM model
print("loading LSTM model...")
model_lstm = load_model(results_path + 'my_model_generate_sentences_lstm.h5')

loading LSTM model...


In [None]:
# load the BiLSTM model
print("loading BiLSTM model...")
model_bilstm = load_model(results_path + 'my_model_generate_sentences_bilstm.h5')

loading BiLSTM model...


## Lyrics Generation
Define functions to generate lyrics, given a model, a length and a seed sentence.

In [None]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
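
The effect of the `temperature` argument is easier to see on a small distribution. The sketch below uses only the standard library and a hypothetical helper, `apply_temperature`, that repeats the rescaling step of `sample` without the random draw. It assumes all probabilities are strictly positive, since the logarithm of zero is undefined.

```python
import math

def apply_temperature(probs, temperature):
    # divide the log-probabilities by the temperature, then renormalize
    scaled = [math.log(p) / temperature for p in probs]
    exps = [math.exp(v) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_probs = [0.5, 0.3, 0.2]  # hypothetical model output over three words
for t in (0.33, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(toy_probs, t)])
```

A temperature below 1 sharpens the distribution toward the most likely word, a temperature of 1 leaves it unchanged, and a temperature above 1 flattens it, which makes the generated text more varied but less coherent.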

In [None]:
def generate_lyrics(model, words_to_generate, seed):
  # length of the input window; must match the sequence length used for training
  seq_length = 30

  # shape the seed according to the network's needs: start from a window
  # filled with the padding word "oh" and place the seed at its end
  sentence = ["oh"] * seq_length

  seed_words = seed.split()

  for i in range(len(seed_words)):
      sentence[seq_length-i-1] = seed_words[len(seed_words)-i-1]

  generated = ' '.join(sentence)

  # then, we generate the text
  for i in range(words_to_generate):
      # one-hot encode the current window
      x = np.zeros((1, seq_length, vocab_size))
      for t, word in enumerate(sentence):
        x[0, t, vocab[word]] = 1.

      # predict the distribution of the next word and sample from it
      preds = model.predict(x, verbose=0)[0]
      next_index = sample(preds, 0.33)
      next_word = vocabulary_inv[next_index]

      # add the next word to the text
      generated += " " + next_word
      # shift the window by one and add the next word at its end
      sentence = sentence[1:] + [next_word]

  # return the whole text (seed followed by the generated words)
  return generated
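
The way the seed is fitted to the input window can be shown in isolation. The helper `pad_seed` below is hypothetical and not part of the notebook: it pads short seeds on the left with the word "oh", as the function above does, and additionally keeps only the last `seq_length` words of a seed that is too long, which is an assumption about the desired behaviour rather than something the notebook does.

```python
def pad_seed(seed_text, seq_length, pad_word="oh"):
    # keep at most the last `seq_length` words of the seed (assumed behaviour)
    seed_words = seed_text.split()[-seq_length:]
    # fill the remaining positions on the left with the padding word
    padding = [pad_word] * (seq_length - len(seed_words))
    return padding + seed_words

print(pad_seed("i love you", 5))
print(pad_seed("so come on now", 2))
```

Every word of the seed must also exist in the vocabulary, because the generation loop looks each word up in `vocab` without a fallback.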

#### Generate Lyrics

In [None]:
words_number = 200 # number of words to generate

# seed sentence used to start the generation
seed_sentences = '''do you call her , almost say my name ? 'cause let 's say , we kinda do sound the same i hate to think that i was just your''' 

Generate RNN:

In [None]:
# generate from RNN model
rnn_gen_lyrics = generate_lyrics(model_rnn, words_number, seed_sentences)
print('\n RNN\n', rnn_gen_lyrics)




 RNN
 do you call her , almost say my name ? 'cause let 's say , we kinda do sound the same i hate to think that i was just your  
 but i am the city  
 you let me be  
  
 somewhere in the middle of the never ending noise  
 there is a constant steady rhythm of a heart that beats  
  
 somewhere in the crowd  
 the first , you do  
 i 'm not a coward  
 oh no , i 'll be strong  
 one chance in a lifetime  
 yes i will take it , it ca n't go wrong  
  
 i 've been waiting for you  
 oh , i 'm riding higher than the sky and there is fire in every kiss  
 kisses of fire  
 kisses of fire  
  
 kisses of fire , burning , burning  
 i 'm at the point of no returning  
 kisses of fire , sweet devotions  
 caught in a landslide of emotions  
 i 've had my share of love affairs but they were nothing compared to this  
 oh , i 've been waiting for you  
 oh , i 've been waiting for you  
 oh , i 'm riding higher than the sky and there is nothing we can do  
 knowing me , knowing you ( ah - ha

Generate LSTM:

In [None]:
# generate from LSTM model
lstm_gen_lyrics = generate_lyrics(model_lstm, words_number, seed_sentences)
print('\n LSTM\n', lstm_gen_lyrics)




 LSTM
 do you call her , almost say my name ? 'cause let 's say , we kinda do sound the same i hate to think that i was just your  
 and i know what he 's gon na sing you make it all gon na sing it all comes back to me break  
 but who of the morning without you  
  
 would  
 to see you little longer , yeah  
 i can see that you must be our  
 to just a dream  
 we were always has to love for me  
 he 's too on  
 just a bell ring  
 one more the and we can hear the night  
 touch my my life is so sad ,  
 i 've been waiting for a night  
 i 'm not a movie  
 you know i was n't know what a mean  
 when you 're all alone  
 so dance while the music still goes on  
 it 's a crying of your mind  
 'cause it 's gon na make it )  
 but i can imagine the night i want to be  
  
 i 'm gon na sing it my love song , gon na bring you some light  
 gon na make you feel happy every day of your life  
 gon na sing you my love song , gon


Generate Bidirectional LSTM:

In [None]:
# generate from BiLSTM model
bilstm_gen_lyrics = generate_lyrics(model_bilstm, words_number, seed_sentences)
print('\n BiLSTM\n', bilstm_gen_lyrics)




 BiLSTM
 do you call her , almost say my name ? 'cause let 's say , we kinda do sound the same i hate to think that i was just your love  
 ca n't deny it  
 'cause it 's true  
 i do , i do , i do  
  
 i do , i do  
  
 oh , no hard feelings between you and me  
 if we ca n't make it  
 but just wait and see  
  
 so come on now lets try it  
 i love you , ca n't deny it  
 'cause it 's true , i do , i do , i do , i do , i do  
  
 so love me or leave me  
 make your choice but believe me  
 i love you , i do , i do , i do , i do , i do  
  
 i ca n't conceal it , do n't you see ?  
 ca n't you feel it ?  
 do n't you too ?  
 i do , i do , i do , i do , i do  
  
 oh , i 've been dreaming through my lonely past  
 now i 've just made it  
 i found you at last  
 so come on  
 now you let 's try it  
 i love you  
 ca
