## SCS_3546_006 Term Project - Yang Sui
# **Music Generation using RNN**
---
Inspired by [this Towards Data Science article by Sigurður Skúli](https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5).

Code borrowed and modified from [this GitHub project](https://github.com/unmonoqueteclea/DeepLearning-Notebooks/tree/master/LSTM-Music-Generation).

Improvements over the referenced code are discussed throughout this notebook.

# Music generation with LSTM in Keras
In this notebook, I **generate some piano compositions** using a Long Short-Term Memory (LSTM) network. The training data comes from piano compositions by Beethoven and Mozart: I feed the network a long sequence of notes parsed from **MIDI files** of their piano pieces. After training, the network can generate new sequences of notes, which are then written out as new MIDI files.

## LSTM networks
Long Short-Term Memory networks are one type of **Recurrent Neural Network (RNN)**.
RNNs are networks whose output at each step depends on the previous steps as well as on the current input. This loop behaviour makes them well suited to sequences and lists. LSTMs in particular are designed to capture long-term dependencies, i.e. to remember information over long periods.
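
To make the idea of "output depends on the previous ones" concrete, here is a minimal sketch of a one-unit recurrent cell in plain Python. It is a simple RNN step, not an LSTM (it has no gates), and the weights `w_x` and `w_h` are arbitrary values chosen for illustration rather than trained ones.

```python
import math

def run_simple_rnn(inputs, w_x=0.5, w_h=0.8):
    """Run a one-unit recurrent cell over a sequence and return all hidden states.

    w_x and w_h are arbitrary illustrative weights, not trained values.
    """
    h = 0.0  # the hidden state starts at zero
    states = []
    for x in inputs:
        # the new state mixes the current input with the previous state
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

# Both sequences end with the same input (1.0) but have different histories.
states_a = run_simple_rnn([0.0, 0.0, 1.0])
states_b = run_simple_rnn([1.0, 1.0, 1.0])
print(states_a[-1], states_b[-1])
```

The two final states differ even though the last input is identical, because the hidden state carries information forward from earlier steps.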

## Music learning and generation
The training dataset is parsed from MIDI files of piano music. A MIDI file contains information about the music (such as which notes are played and when), rather than the audio signal of a recording of the music being played. It is easy to pull information from a MIDI file. For this model, I read the MIDI files to find out which notes on the piano are being played. Those notes are encoded into the training data array.

After training, the network is asked to generate new sequences of notes. That sequence is easily encoded into a MIDI file which can then be played by media player software (e.g. Windows Media Player).
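
As a rough illustration of how a note name relates to what a MIDI file stores, the sketch below converts a pitch name to a MIDI note number. The helper `pitch_to_midi` is hypothetical and is not part of this notebook or of music21. It assumes the convention in which middle C (C4) is MIDI note 60, and it handles only `#` for sharps and `-` for flats (the spelling music21 uses), not negative octaves.

```python
# Semitone offsets of the natural note names within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_to_midi(name):
    """Convert a pitch name such as 'C4' or 'G#3' to a MIDI note number.

    Assumes C4 (middle C) is MIDI note 60. Negative octaves are not handled.
    """
    letter, rest = name[0], name[1:]
    shift = 0
    # consume accidentals: '#' raises by a semitone, '-' lowers by a semitone
    while rest and rest[0] in "#-":
        shift += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + SEMITONES[letter] + shift

print(pitch_to_midi("C4"), pitch_to_midi("A0"), pitch_to_midi("C8"))
```

Under this convention the lowest and highest keys of a standard piano, A0 and C8, map to 21 and 108.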


## Google Drive configuration for Colab
Mount my **Google Drive** as a drive to store files generated from the model.

In [0]:
from google.colab import drive
# Mount my google drive as a drive on Colab
drive.mount('/content/drive')

#use variable wd to store the working directory path as the "Input" folder provided with this exercise
wd = "/content/drive/My Drive/UofT Deep Learning Course/Project/Temp"

Mounted at /content/drive


## Packages and data
Use [**music21 package**](http://web.mit.edu/music21/) to process MIDI files to get only the information needed and discard the rest.

Music21 creates its own representation of a MIDI file, with **Note** and **Chord** objects representing all the music inside the file. This representation is easier to read than raw MIDI, so it is easier to extract the sequence of notes and chords that the network learns from and later uses to create new compositions.

In [0]:
!pip install music21;



### Original MIDI files
The **MIDI files** used for training come from [piano-midi.de](http://www.piano-midi.de/midis/format0/).

This training set contains 29 MIDI files from Beethoven and 21 files from Mozart.

In [0]:
# list the files on the Colab drive
!ls

Beethoven  drive  midi_files_BeethovenMozart.zip  Mozart  notes  sample_data


In [0]:
# Clean up the Colab drive in case of mistakes.
#!rm -rf midi_files/

In [0]:
from google.colab import files
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving midi_files_BeethovenMozart.zip to midi_files_BeethovenMozart.zip
User uploaded file "midi_files_BeethovenMozart.zip" with length 507600 bytes


In [0]:
!unzip midi_files_BeethovenMozart.zip;

Archive:  midi_files_BeethovenMozart.zip
   creating: Beethoven/
  inflating: Beethoven/appass_1.mid  
  inflating: Beethoven/appass_2.mid  
  inflating: Beethoven/appass_3.mid  
  inflating: Beethoven/beethoven_hammerklavier_1.mid  
  inflating: Beethoven/beethoven_hammerklavier_2.mid  
  inflating: Beethoven/beethoven_hammerklavier_3.mid  
  inflating: Beethoven/beethoven_hammerklavier_4.mid  
  inflating: Beethoven/beethoven_les_adieux_1.mid  
  inflating: Beethoven/beethoven_les_adieux_2.mid  
  inflating: Beethoven/beethoven_les_adieux_3.mid  
  inflating: Beethoven/beethoven_opus10_1.mid  
  inflating: Beethoven/beethoven_opus10_2.mid  
  inflating: Beethoven/beethoven_opus10_3.mid  
  inflating: Beethoven/beethoven_opus22_1.mid  
  inflating: Beethoven/beethoven_opus22_2.mid  
  inflating: Beethoven/beethoven_opus22_3.mid  
  inflating: Beethoven/beethoven_opus22_4.mid  
  inflating: Beethoven/beethoven_opus90_1.mid  
  inflating: Beethoven/beethoven_opus90_2.mid  
  inflating: 

## Processing data

Let's process the files and load them into **music21**.

In [0]:
# Importing dependencies
import glob
import pickle
import numpy
from music21 import converter, instrument, note, chord, stream
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint

Using TensorFlow backend.


**music21** represents music with two elements:
- **Notes**
- **Chords**

Each element also has a time offset indicating when the note or chord is played.

In [0]:
# check that music21 can parse an example MIDI file
file = "Beethoven/appass_1.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.note.Note C> 4.5
<music21.note.Note C> 4.5
<music21.note.Note G#> 5.75
<music21.note.Note G#> 5.75
<music21.note.Note F> 6.0
<music21.note.Note F> 6.0
<music21.note.Note G#> 10.5
<music21.note.Note G#> 10.5
<music21.note.Note C> 11.75
<music21.note.Note C> 11.75


- A **note** is stored in the list as a string representing the pitch (the note name) and the octave.

- A **chord** (set of notes played at the same time) will be stored as a string of numbers separated by dots. Each number is the pitch class (0 to 11, with 0 being C) of a note within the chord, as returned by music21's `normalOrder`.

**This approach does not consider time offsets of each element**. I.e. the note offset is not captured. This only encodes the order of notes/chords in a piece of music. The duration of each note/chord is also not captured. This sequence of notes/chords is stored in the 'notes' array, which serves as the training data for this model.
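
The encoding described above can be sketched without music21 by using plain tuples as stand-ins for the Note and Chord objects. The `encode_element` helper and the sample data are hypothetical and exist only to show the token format and the fact that offsets are dropped.

```python
def encode_element(kind, value):
    """Encode a simplified (kind, value) element as a training token.

    kind is 'note' (value is a pitch name with octave) or 'chord'
    (value is a list of integers, as music21's normalOrder would return).
    These tuples are stand-ins for music21 objects, for illustration only.
    """
    if kind == "note":
        return str(value)
    if kind == "chord":
        return ".".join(str(n) for n in value)
    raise ValueError("unknown element kind: " + kind)

# (offset, kind, value) triples; the offsets are discarded during encoding.
elements = [(0.0, "note", "C4"), (0.5, "chord", [0, 4, 7]), (2.0, "note", "G#3")]
tokens = [encode_element(kind, value) for _, kind, value in elements]
print(tokens)  # ['C4', '0.4.7', 'G#3']
```

Only the order of the elements survives: the gap of 0.5 between the first two elements and the gap of 1.5 before the third are both lost.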

In [0]:
notes = []
# loop through all files for first composer
for i,file in enumerate(glob.glob("Beethoven/*.mid")):
  midi = converter.parse(file)
  print('\r', 'Parsing file ', i, " ",file, end='')
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.parts[0].recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))

# loop through all files for second composer
for i,file in enumerate(glob.glob("Mozart/*.mid")):
  midi = converter.parse(file)
  print('\r', 'Parsing file ', i, " ",file, end='')
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.parts[0].recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))

with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

# save notes array for reloading later if I lose Colab connection
with open(wd + '/mashup1_notes', 'wb') as filepath2:
  pickle.dump(notes, filepath2)

 Parsing file  20   Mozart/mz_332_3.mid

In [0]:
# or re-load previously saved 'notes' variable values instead of rebuilding from the zip file again.
with open(wd + '/mashup1_notes', 'rb') as filepath:
  notes = pickle.load(filepath)

Obtain the number of different notes in the dataset. This will be the **number of possible output classes**  of the model.

In [0]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

285
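
Because every distinct token becomes its own output class, it can help to look at how often each token occurs, not only at how many there are. The sketch below does this on a small made-up list; in the notebook the same calls could be applied to the `notes` list.

```python
from collections import Counter

# A toy token list; in the notebook this would be the 'notes' list.
toy_notes = ["C4", "E4", "0.4.7", "C4", "G4", "C4", "0.4.7"]

counts = Counter(toy_notes)
toy_vocab_size = len(counts)          # same value as len(set(toy_notes))
most_common = counts.most_common(2)   # the two most frequent tokens
print(toy_vocab_size, most_common)
```

A vocabulary dominated by a few very frequent tokens can pull predictions toward those tokens, so the frequency table is worth checking before training.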

Now some **data processing** is required:

- map each pitch or chord to an integer
- create a dictionary pairing input sequences to their corresponding output note (output note being the note/chord immediately after the last note/chord in the input sequence).

The choice of **sequence_length** affects the results, because it sets how much musical context the network sees before each prediction. The original tutorial I borrowed this model from used a sequence length of 100. In this model, I'm trying a sequence_length of 50.

The network will make its prediction of the next note/chord, based on the previous *sequence_length* notes/chords. 

![alt text](https://raw.githubusercontent.com/yangsui05/Music-generation-LSTM/master/inputoutputsequences_corrected.png)
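
The pairing of input sequences with output notes can be shown on a toy list with a short window. This is a minimal sketch: `make_windows` is a hypothetical helper name, and a `sequence_length` of 3 is used only to keep the printout small.

```python
def make_windows(tokens, sequence_length):
    """Return (inputs, outputs): each input is sequence_length tokens and
    each output is the single token that immediately follows it."""
    inputs, outputs = [], []
    for i in range(len(tokens) - sequence_length):
        inputs.append(tokens[i:i + sequence_length])
        outputs.append(tokens[i + sequence_length])
    return inputs, outputs

toy_tokens = ["C4", "E4", "G4", "0.4.7", "C5"]
toy_inputs, toy_outputs = make_windows(toy_tokens, sequence_length=3)
for seq_in, seq_out in zip(toy_inputs, toy_outputs):
    print(seq_in, "->", seq_out)
```

Five tokens with a window of three produce two training pairs, and consecutive windows overlap in all but one position.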

In [0]:
sequence_length = 50
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
network_output = []
# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length] # Size sequence_length
  sequence_out = notes[i + sequence_length]  # Size 1
  # Map pitches of sequence_in to integers
  network_input.append([note_to_int[char] for char in sequence_in])
  # Map pitch of sequence_out to an integer
  network_output.append(note_to_int[sequence_out])
n_patterns = len(network_input)
# reshape the input into a format compatible with LSTM layers
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
# normalize input
network_input = network_input / float(n_vocab)
network_output = np_utils.to_categorical(network_output)

Checking the network_input size

In [0]:
network_input.shape

(146004, 50, 1)
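
The first dimension of this shape follows from the sliding-window loop, so the total number of parsed tokens can be recovered from it. The short check below works through that arithmetic and also shows the range of the normalized inputs, taking `n_vocab = 285` from the count above.

```python
sequence_length = 50
n_patterns = 146004          # first dimension of network_input.shape above
n_vocab = 285                # number of distinct tokens counted earlier

# The loop runs over range(len(notes) - sequence_length), so the token
# count is the pattern count plus the sequence length.
n_tokens = n_patterns + sequence_length
print(n_tokens)  # 146054

# Sanity check on a small case: 7 tokens and windows of 3 give 4 patterns.
assert len(range(7 - 3)) == 4

# Integer codes run from 0 to n_vocab - 1, so dividing by n_vocab
# keeps every input value in the interval [0, 1).
largest_input = (n_vocab - 1) / float(n_vocab)
print(largest_input < 1.0)  # True
```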

## Creating model

Create a network with 9 layers (3 of them **LSTM layers**).

For regularization, add 3 **Dropout** layers in between.

In [0]:
def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        return_sequences=True
    ))
    model.add(Dropout(0.3))
    model.add(LSTM(512, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512))
    model.add(Dense(256))
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    return model

In [0]:
model = create_network(network_input,n_vocab)
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_7 (LSTM)                (None, 50, 512)           1052672   
_________________________________________________________________
dropout_7 (Dropout)          (None, 50, 512)           0         
_________________________________________________________________
lstm_8 (LSTM)                (None, 50, 512)           2099200   
_________________________________________________________________
dropout_8 (Dropout)          (None, 50, 512)           0         
_________________________________________________________________
lstm_9 (LSTM)                (None, 512)               2099200   
_________________________________________________________________
dense_5 (Dense)              (None, 256)               131328    
_________________________________________________________________
dropout_9 (Dropout)          (None, 256)              
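
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for LSTM and fully connected layers; the helper names are hypothetical. The last value is computed here under the assumption `n_vocab = 285` and is not read from the summary output.

```python
def lstm_params(units, input_dim):
    """Parameters of a standard LSTM layer: four weight sets (input, forget,
    and output gates plus the cell candidate), each with an input kernel,
    a recurrent kernel, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Parameters of a fully connected layer: weights plus biases."""
    return units * input_dim + units

print(lstm_params(512, 1))     # first LSTM: one feature per time step
print(lstm_params(512, 512))   # second and third LSTM
print(dense_params(256, 512))  # first Dense layer
print(dense_params(285, 256))  # output layer, assuming n_vocab = 285
```

The first three values match the 1052672, 2099200, and 131328 shown in the summary. Almost all of the model's parameters sit in the LSTM layers.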

(Optional) Load previously trained weights, either to restart after Colab drops the connection or to continue training from a specific epoch.

In [0]:
# In case we want to use previously trained weights
weights = ""
if(len(weights)>0): model.load_weights(weights)

In [0]:
# In case we want to use previously trained weights
weights = wd + "/mashup1_best.h5"
if(len(weights)>0): model.load_weights(weights)

# Train the model
Use **ModelCheckpoint** to save the best weights during training. Saved weights can be reloaded later to continue training if the Colab connection is lost.

In [0]:
#filepath = wd + "/mashup1_best.h5"
filepath = wd + "/mashup1_best.h5"

checkpoint = ModelCheckpoint(filepath, monitor='loss',verbose=0,
                             save_best_only=True,mode='min')

callbacks_list = [checkpoint]
model.fit(network_input, network_output, epochs=5, batch_size=64, 
          callbacks=callbacks_list)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9b1a9d1550>
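
With `save_best_only=True` and `mode='min'`, the checkpoint file is overwritten only when the monitored loss improves on the best value seen so far. The sketch below imitates that rule in plain Python. It illustrates the behaviour and is not Keras code, and the loss values are made up.

```python
def epochs_that_save(losses):
    """Return the 1-based epochs at which a 'save best only' rule with
    mode='min' would overwrite the checkpoint file."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:       # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved

# Hypothetical per-epoch training losses, made up for illustration.
example_losses = [4.9, 4.6, 4.7, 4.3, 4.3]
print(epochs_that_save(example_losses))  # [1, 2, 4]
```

Note that the monitored quantity here is the training loss, so "best" means best fit to the training data rather than best performance on held-out music.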

## Music generation

After the model is trained, it can be used to generate music (sequence of notes/chords to turn into a MIDI file).

![Music Generation](https://raw.githubusercontent.com/yangsui05/Music-generation-LSTM/master/Generative%20workflow_corrected.png)

In [0]:
# In case I want to use other previously trained weights
weights = wd + "/mashup1_best.h5"
if(len(weights)>0): model.load_weights(weights)

In [0]:
# Generate network input again
network_input = []
output = []
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length]
  sequence_out = notes[i + sequence_length]
  network_input.append([note_to_int[char] for char in sequence_in])
  output.append(note_to_int[sequence_out])
n_patterns = len(network_input)

The workflow now is:


1.   Pick a **seed sequence** randomly from the list of inputs (*pattern* variable)
2.   Pass it as input to the model to generate a new element (note or chord)
3.   Add the new element to the final song (prediction_output variable) and the *pattern* list
4.   Remove the first item from *pattern*.
5.   Go to step 2, repeat for the number of elements desired.
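
The steps above can be sketched with a stand-in for the trained model. In this minimal example `toy_predictor` is a hypothetical function that replaces `model.predict` followed by `argmax`, so the loop structure can be run and inspected without Keras.

```python
def generate(seed, predict_next, num_elements):
    """Generate num_elements items by repeatedly predicting the next item
    and sliding the window forward by one position."""
    pattern = list(seed)          # copy so the caller's seed is unchanged
    generated = []
    for _ in range(num_elements):
        next_item = predict_next(pattern)   # step 2: predict a new element
        generated.append(next_item)         # step 3: add it to the song
        pattern.append(next_item)           # step 3: add it to the window
        pattern = pattern[1:]               # step 4: drop the oldest item
    return generated

def toy_predictor(pattern):
    """Hypothetical stand-in for the trained model: returns the sum of the
    window modulo 5 instead of an argmax over predicted probabilities."""
    return sum(pattern) % 5

seed = [1, 2, 3]
song = generate(seed, toy_predictor, num_elements=4)
print(song)  # [1, 1, 0, 2]
```

The window always keeps the same length, and after `sequence_length` steps it consists entirely of generated elements.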




In [0]:
""" Generate notes from the neural network based on a sequence of notes """
# number of notes to generate
numNotes = 50

# pick a random sequence from the input as a starting point for the prediction
# (the upper bound of randint is exclusive, so any input sequence can be chosen)
start = numpy.random.randint(0, len(network_input))
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
# copy the seed so that appending to it does not modify network_input
pattern = list(network_input[start])
prediction_output = []
# generate notes
for i in range(numNotes):
  prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
  prediction_input = prediction_input / float(n_vocab)
  prediction = model.predict(prediction_input, verbose=0)
  # pick the most likely class and convert it to a plain Python int
  index = int(numpy.argmax(prediction))
  result = int_to_note[index]
  print('\r', 'Predicted ', i, " ",result, end='')
  prediction_output.append(result)
  # slide the window: add the new element and drop the oldest one
  pattern.append(index)
  pattern = pattern[1:]

 Predicted  49   E4

The last step is creating a MIDI file from the predictions.

**music21** will once again be used for this task. Create a **Stream** and add to it the predicted notes and chords (the elements of the prediction_output array).

Add an offset of 0.5 between elements. This is required because the sequence generated by the model carries no timing information, so this step manually places the generated notes and chords on a timeline at equal intervals. music21 measures offsets in quarter-note lengths, so a new element starts every half quarter note (an eighth note). Real music varies note duration, so the result sounds mechanical, but it's a workable compromise for now. Updating the model to account for note duration is reserved for future work.
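
As a quick check on the resulting timeline, the sketch below lists the offsets produced by a fixed step of 0.5 for 50 elements. music21 expresses offsets in quarter-note lengths. The tempo of 120 beats per minute is a hypothetical value used only to convert to seconds, since the notebook does not set a tempo.

```python
step = 0.5            # offset increment, in quarter-note lengths
num_elements = 50     # matches numNotes in the generation cell

offsets = [i * step for i in range(num_elements)]
print(offsets[:4], offsets[-1])   # [0.0, 0.5, 1.0, 1.5] 24.5

# Hypothetical tempo for illustration; the notebook does not set one.
beats_per_minute = 120
seconds_per_quarter = 60 / beats_per_minute
last_onset_seconds = offsets[-1] * seconds_per_quarter
print(last_onset_seconds)         # 12.25
```

Generating more elements or increasing the step lengthens the piece proportionally.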

In [0]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        # use a separate name so the training list 'notes' is not overwritten
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp=wd + '/mashup1_output.mid')

'/content/drive/My Drive/UofT Deep Learning Course/Project/Temp/mashup1_output.mid'

In [0]:
from google.colab import files
files.download(wd + '/mashup1_output.mid')

# Sample output
The link below leads to an mp3 of sample output from this model.
[Sample Output](https://github.com/yangsui05/Music-generation-LSTM/blob/master/mashup1_seqSize50_epochs10_20_26_46_72_90.mp3)

This sample mp3 demonstrates the evolution of the training of this model. It consists of 10 seconds of the model output at 10 epochs, 20 epochs, 26 epochs, 46 epochs, 72 epochs, and 90 epochs, with short silences in between.

# Next steps
There are many things I would like to continue to explore and improve:
- Account for note/chord duration so that output sounds more like intentionally composed music.
- Try mashing up musical styles that are very different, e.g. video game music with jazz, ragtime piano with classical, etc.
- Try mashing up many more composers and musical styles.
- Explore different sequence lengths and other hyperparameter values.
- Explore how to add an element of creativity. Perhaps supplement or augment the model's musical vocabulary.