# Jazz melody generation using LSTMs

This notebook uses data from the Weimar Jazz Database and is based on Jason Brownlee's LSTM text generation tutorial.

Currently this only takes in a single MIDI file containing the melody track; further notebooks will explore multiple MIDI files, harmony mappings, and who knows what else!

Audio links are at the very bottom.

### Imports

In [1]:
import datetime
import re

import h5py
import keras
import mido
import numpy as np

Using TensorFlow backend.


### User-defined variables

In [2]:
# Name of the tune we're loading in; should match MIDI file name
# tune_name = "ArtPepper_Anthropology_FINAL"
tune_name = "ColemanHawkins_BodyAndSoul_FINAL"
# tune_name = "JohnColtrane_Oleo_FINAL"
# tune_name = "MilesDavis_Oleo-1_FINAL"
# tune_name = "RedGarland_Oleo_FINAL"

# Whether or not to fit the model (or to use existing weights)
should_fit_model = False # Change this to True if we want to fit the model, False to skip

### Load the data

In [3]:
# Load the data
# midi_file = mido.MidiFile("../data/midi/{}.mid".format(tune_name)) # Unquantized
midi_file = mido.MidiFile("../data/midi_quantized/{}.mid".format(tune_name)) # Quantized
midi_track = midi_file.tracks[0]

### Clean the data

In [4]:
# Get notes only
midi_notes = [msg for msg in midi_track if msg.type=="note_on" or msg.type=="note_off"]
len(midi_notes)
midi_notes[:10]

[<message note_on channel=0 note=51 velocity=100 time=76230>,
 <message note_off channel=0 note=51 velocity=100 time=16170>,
 <message note_on channel=0 note=51 velocity=96 time=0>,
 <message note_off channel=0 note=51 velocity=96 time=9240>,
 <message note_on channel=0 note=51 velocity=94 time=0>,
 <message note_off channel=0 note=51 velocity=94 time=43890>,
 <message note_on channel=0 note=51 velocity=104 time=0>,
 <message note_off channel=0 note=51 velocity=104 time=6930>,
 <message note_on channel=0 note=53 velocity=99 time=0>,
 <message note_off channel=0 note=53 velocity=99 time=6930>]

In [5]:
# len([msg for msg in midi_track if msg.type=="note_on" and msg.time>0])

In [6]:
# Create note on/off pairs
# Keep a note_on only when the very next note message is a note_off for the same pitch
midi_note_pairs = [(msg_on, msg_off) for msg_on, msg_off in zip(midi_notes, midi_notes[1:])
                   if msg_on.type == "note_on" and msg_off.type == "note_off"
                   and msg_on.note == msg_off.note]
len(midi_note_pairs)

635
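The pairing rule keeps a `note_on` only when the very next note message is a `note_off` for the same pitch. Overlapping notes (a new note starting before the previous one ends) therefore never satisfy the rule and are silently dropped. Below is a minimal sketch of the same rule. It uses hypothetical `(type, pitch, delta_time)` tuples in place of mido messages, so it runs without a MIDI file:

```python
def pair_note_events(events):
    """Pair each note_on with an immediately following note_off of the same pitch.

    `events` is a list of hypothetical (type, pitch, delta_time) tuples standing in
    for mido messages. Overlapping notes fail the adjacency rule and are dropped.
    """
    return [
        (curr, nxt)
        for curr, nxt in zip(events, events[1:])
        if curr[0] == "note_on" and nxt[0] == "note_off" and curr[1] == nxt[1]
    ]

events = [
    ("note_on", 51, 100), ("note_off", 51, 50),   # simple note: kept
    ("note_on", 53, 0), ("note_on", 55, 0),       # 53 is still sounding when 55 starts
    ("note_off", 53, 40), ("note_off", 55, 10),   # no adjacent matching pair: both dropped
    ("note_on", 57, 0), ("note_off", 57, 30),     # simple note: kept
]
pairs = pair_note_events(events)
print([on[1] for on, _ in pairs])  # [51, 57]
```

For polyphonic or legato passages, a matcher that tracks open notes per pitch would keep more notes. That is a different design choice from the one the notebook makes.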

In [7]:
# Quantize note-on velocities by flooring them to a multiple of 10 (shrinks the set of unique note events)
# TODO: Play with normalizing other parameters
for note_on, note_off in midi_note_pairs:
    note_on.velocity = note_on.velocity - (note_on.velocity % 10)
set([note_on.velocity for note_on, note_off in midi_note_pairs])

{0, 60, 70, 80, 90, 100, 110, 120}
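Subtracting `velocity % 10` floors each velocity to the multiple of 10 below it. This is the same as integer-dividing by 10 and multiplying back. The NumPy sketch below applies that bucketing to a few velocities from the message dump above, plus a made-up `5`. Any velocity under 10 becomes 0, and by MIDI convention a `note_on` with velocity 0 is treated as a note off, so such notes may go silent on playback:

```python
import numpy as np

velocities = np.array([100, 96, 94, 104, 99, 111, 5])
# Floor each velocity to the nearest lower multiple of 10
bucketed = (velocities // 10) * 10
print(bucketed.tolist())  # [100, 90, 90, 100, 90, 110, 0]

# Same result as the notebook's v - (v % 10)
assert (bucketed == velocities - velocities % 10).all()
```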

In [8]:
midi_note_pairs[:10]

[(<message note_on channel=0 note=51 velocity=100 time=76230>,
  <message note_off channel=0 note=51 velocity=100 time=16170>),
 (<message note_on channel=0 note=51 velocity=90 time=0>,
  <message note_off channel=0 note=51 velocity=96 time=9240>),
 (<message note_on channel=0 note=51 velocity=90 time=0>,
  <message note_off channel=0 note=51 velocity=94 time=43890>),
 (<message note_on channel=0 note=51 velocity=100 time=0>,
  <message note_off channel=0 note=51 velocity=104 time=6930>),
 (<message note_on channel=0 note=53 velocity=90 time=0>,
  <message note_off channel=0 note=53 velocity=99 time=6930>),
 (<message note_on channel=0 note=54 velocity=110 time=0>,
  <message note_off channel=0 note=54 velocity=111 time=6930>),
 (<message note_on channel=0 note=53 velocity=100 time=0>,
  <message note_off channel=0 note=53 velocity=108 time=4620>),
 (<message note_on channel=0 note=54 velocity=100 time=0>,
  <message note_off channel=0 note=54 velocity=108 time=4620>),
 (<message note_

In [9]:
# Create note set
# note_events_keys = ("type", "pitch", "velocity", "duration")
# note_events = [(note.type, note.note, note.velocity, note.time) for note in midi_notes]

# Don't use note-off velocity (shrinks the number of possible events), and don't use
# note-off pitch (it's always the same as the note-on pitch)
note_events_keys = ("noteon_pitch", "noteon_velocity", "noteon_time", "noteoff_time")
note_events = [(note_on.note, note_on.velocity, note_on.time, note_off.time)
               for note_on, note_off in midi_note_pairs]

note_set = sorted(set(note_events))
num_note_events = len(note_events)
num_unique_notes = len(note_set)
print("{} unique notes in note set (vs. {} note events in MIDI file)".format(num_unique_notes, num_note_events))
note_set[:10]

365 unique notes in note set (vs. 635 note events in MIDI file)


[(46, 100, 0, 27720),
 (48, 80, 0, 6930),
 (48, 90, 0, 6930),
 (48, 90, 0, 13860),
 (48, 100, 0, 6930),
 (48, 100, 0, 9240),
 (48, 100, 0, 16170),
 (48, 100, 0, 23100),
 (49, 80, 0, 6930),
 (49, 80, 0, 7919)]

In [10]:
# len([note for note in note_set if note[0] == "note_off"])

In [11]:
# Make map for note to integer
note_to_int = dict((n, i) for i, n in enumerate(note_set))
first_key = next(iter(note_to_int))  # Peek at one entry of the map
{first_key: note_to_int[first_key]}

{(65, 100, 0, 18480): 321}

In [12]:
# Make map for integer back to note (we'll need this in the generation phase)
int_to_note = dict((i, n) for i, n in enumerate(note_set))
first_key = next(iter(int_to_note))  # Peek at one entry of the map
{first_key: int_to_note[first_key]}

{0: (46, 100, 0, 27720)}
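`note_to_int` and `int_to_note` are built from the same sorted list, so they are exact inverses. Encoding an event and then decoding it returns the original tuple. A compact sketch with dict comprehensions, using a few event tuples taken from the note set above:

```python
note_events = [
    (48, 90, 0, 6930), (46, 100, 0, 27720), (48, 90, 0, 6930), (49, 80, 0, 6930),
]
note_set = sorted(set(note_events))                  # unique events in a stable order
note_to_int = {n: i for i, n in enumerate(note_set)}
int_to_note = {i: n for i, n in enumerate(note_set)}

encoded = [note_to_int[n] for n in note_events]
decoded = [int_to_note[i] for i in encoded]
print(encoded)  # [1, 0, 1, 2]
assert decoded == note_events                        # the round trip is lossless
```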

In [13]:
# Split into subsequences
# TODO: Play with sequence lengths (for both input and outputs)
seq_length = 10
data_input = [] # "X"
data_output = [] # "y"
for i in range(num_note_events-seq_length):
    seq_input = note_events[i:i+seq_length]
    seq_output = note_events[i+seq_length]
    data_input.append([note_to_int[note] for note in seq_input])
    data_output.append(note_to_int[seq_output])
num_seqs = len(data_input)
print("{} sequences".format(num_seqs))
print("{} ==> {}".format(data_input[0], data_output[0]))
data_input[:5]

625 sequences
[47, 41, 43, 45, 74, 109, 80, 98, 78, 42] ==> 193


[[47, 41, 43, 45, 74, 109, 80, 98, 78, 42],
 [41, 43, 45, 74, 109, 80, 98, 78, 42, 193],
 [43, 45, 74, 109, 80, 98, 78, 42, 193, 194],
 [45, 74, 109, 80, 98, 78, 42, 193, 194, 185],
 [74, 109, 80, 98, 78, 42, 193, 194, 185, 26]]
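Each training example is a window of `seq_length` consecutive note indices, and its target is the index that comes right after the window. So `n` events yield `n - seq_length` examples (635 - 10 = 625 here). The sketch below does the same windowing with NumPy's `sliding_window_view` (available from NumPy 1.20). It is an alternative to the notebook's explicit loop, not what the notebook uses:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(encoded, seq_length):
    """Split a 1-D array of note indices into (inputs, targets) for next-note prediction."""
    encoded = np.asarray(encoded)
    # Each row holds seq_length inputs followed by the target that comes right after them
    windows = sliding_window_view(encoded, seq_length + 1)
    return windows[:, :-1], windows[:, -1]

# 635 placeholder indices standing in for the encoded note events
inputs, targets = make_windows(np.arange(635), seq_length=10)
print(inputs.shape, targets.shape)  # (625, 10) (625,)
print(targets[:3].tolist())         # [10, 11, 12]
```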

In [14]:
# Reshape input sequences into form [samples, time steps, features]
X = np.reshape(data_input, (num_seqs, seq_length, 1))

# Normalize to 0-1 range
X = X / float(num_unique_notes)

# Convert output to one-hot encoding; passing num_classes keeps the width equal to the vocabulary size
y = keras.utils.to_categorical(data_output, num_classes=num_unique_notes)
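When no class count is given, `to_categorical` infers the number of classes from the largest label it sees. If the highest note index never appears as a target, the one-hot width would then be smaller than the vocabulary, and the model's output layer (sized from `y.shape[1]`) would no longer line up with `int_to_note`. Below is a NumPy sketch of the same reshape, scaling, and one-hot steps with the class count fixed explicitly. The tiny vocabulary is made up for illustration:

```python
import numpy as np

def preprocess(data_input, data_output, num_classes):
    """Reshape to [samples, time steps, features], scale to [0, 1), and one-hot encode."""
    X = np.reshape(data_input, (len(data_input), len(data_input[0]), 1))
    X = X / float(num_classes)                        # scale indices into [0, 1)
    y = np.eye(num_classes)[np.asarray(data_output)]  # one row per target, width fixed to the vocabulary
    return X, y

# Hypothetical toy data: vocabulary of 6 notes, but index 5 is never a target
X, y = preprocess([[0, 1, 2], [1, 2, 3]], [3, 4], num_classes=6)
print(X.shape, y.shape)  # (2, 3, 1) (2, 6)
```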

In [15]:
print(X[0])
print("==>")
print(y[0])

[[ 0.12876712]
 [ 0.11232877]
 [ 0.11780822]
 [ 0.12328767]
 [ 0.20273973]
 [ 0.29863014]
 [ 0.21917808]
 [ 0.26849315]
 [ 0.21369863]
 [ 0.11506849]]
==>
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  

### Define the LSTM model

In [16]:
# Remembering what our shape is
"X.shape = {}, y.shape = {}".format(X.shape, y.shape)

'X.shape = (625, 10, 1), y.shape = (625, 365)'

In [17]:
# Define the model
model = keras.models.Sequential()
model.add(keras.layers.LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.LSTM(256))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(y.shape[1], activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")
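The layer sizes fully determine the parameter counts that `model.summary()` reports. A Keras LSTM layer (with biases) has four gates. Each gate has input weights, recurrent weights, and a bias, which gives `4 * (units * (input_dim + units) + units)`. A dense layer has `units * input_dim + units`, and dropout layers add nothing. The arithmetic for this model:

```python
def lstm_params(units, input_dim):
    """Parameters in an LSTM layer with biases: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights plus one bias per output unit."""
    return units * input_dim + units

vocab_size = 365  # number of unique note events for Body and Soul
layers = {
    "lstm_1": lstm_params(256, 1),        # one feature per time step
    "lstm_2": lstm_params(256, 256),      # fed by lstm_1's 256-wide output
    "dense_1": dense_params(vocab_size, 256),
}
print(layers, sum(layers.values()))
# {'lstm_1': 264192, 'lstm_2': 525312, 'dense_1': 93805} 883309
```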

In [18]:
# Setup checkpoints
curr_datetime = str(datetime.datetime.now())
curr_datetime = re.sub(r"\W+", "", curr_datetime) # Strip punctuation and spaces to get a filename-safe timestamp
checkpoint_path = "weights_" + str(tune_name) + "_" + str(curr_datetime) + "_{epoch:02d}_{loss:.4f}.hdf5"
checkpoint = keras.callbacks.ModelCheckpoint(checkpoint_path, monitor="loss", verbose=1, save_best_only=True, mode="min")
callbacks = [checkpoint]
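The weights filenames loaded further down come from this checkpoint pattern. The regex strips every non-word character from the timestamp, and `ModelCheckpoint` fills in `{epoch}` and `{loss}` through `str.format`. The sketch below rebuilds one of those filenames from a fixed datetime. The file saved after the first epoch has `_00_` in its name, which suggests the Keras version used here numbered epochs from zero:

```python
import datetime
import re

# Raw string: "\W" in a normal string literal is an invalid escape sequence (warns in newer Pythons)
stamp = re.sub(r"\W+", "", str(datetime.datetime(2017, 7, 2, 12, 46, 47, 869274)))
print(stamp)  # 20170702124647869274

template = "weights_" + "ColemanHawkins_BodyAndSoul_FINAL" + "_" + stamp + "_{epoch:02d}_{loss:.4f}.hdf5"
print(template.format(epoch=99, loss=0.8517))
# weights_ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_99_0.8517.hdf5
```

`datetime.datetime.now().strftime("%Y%m%d%H%M%S%f")` would produce the same compact timestamp without a regex.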

In [19]:
# Set up model fit parameters
# TODO: Play with these parameters, of course
num_epochs = 200
batch_size = 32

In [20]:
# Fit the model (i.e. train the network)!
if should_fit_model:
    model.fit(X, y, epochs=num_epochs, batch_size=batch_size, callbacks=callbacks)
else:
    print("Not fitting model; we'll use existing weights for {} instead".format(tune_name))

Not fitting model; we'll use existing weights for ColemanHawkins_BodyAndSoul_FINAL instead


### Generate output notes

In [21]:
# Load network weights and recompile (if we didn't already fit the model)
# NOTE: Make sure these weights fit the model that has been defined

if not should_fit_model:
    # Anthropology - Art Pepper
    # weights_filename = "weights_99_0.9724.hdf5" # Using only note ons
    # weights_filename = "weights_99_1.3571.hdf5" # Using both note ons and note offs
    # weights_filename = "weights_95_1.4241.hdf5" # Using note on/off pairs
    # weights_filename = "weights_97_1.4300.hdf5" # Using note on/off pairs without note off velocity
    
    # Body and Soul - Coleman Hawkins
    weights_filename = "weights_ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_00_5.8874.hdf5" # 1 epoch
    weights_filename = "weights_ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_09_5.4606.hdf5" # 10 epochs
    weights_filename = "weights_ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_49_2.3032.hdf5" # 50 epochs
    weights_filename = "weights_ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_99_0.8517.hdf5" # 100 epochs
    #weights_filename = "weights_ColemanHawkins_BodyAndSoul_FINAL_20170702132955060412_189_0.2080.hdf5" # 200 epochs
    #weights_filename = "weights_ColemanHawkins_BodyAndSoul_FINAL_20170702132955060412_452_0.0212.hdf5" # 500 epochs

#     # Oleo - John Coltrane
#     weights_filename = "weights_JohnColtrane_Oleo_FINAL_20170702100647144230_00_5.5314.hdf5" # 1 epoch
#     weights_filename = "weights_JohnColtrane_Oleo_FINAL_20170702100647144230_09_5.0914.hdf5" # 10 epochs
#     weights_filename = "weights_JohnColtrane_Oleo_FINAL_20170702100647144230_49_3.0890.hdf5" # 50 epochs
#     weights_filename = "weights_JohnColtrane_Oleo_FINAL_20170702100647144230_99_1.0946.hdf5" # 100 epochs
#     # weights_filename = "weights_JohnColtrane_Oleo_FINAL_20170702100647144230_195_0.2916.hdf5" # 200 epochs
#     # weights_filename = "weights_JohnColtrane_Oleo_FINAL_20170702100647144230_347_0.1016.hdf5" # 350 epochs

#     # Oleo - Miles Davis
#     weights_filename = "weights_MilesDavis_Oleo-1_FINAL_20170702105943022387_00_4.9016.hdf5" # 1 epoch
#     weights_filename = "weights_MilesDavis_Oleo-1_FINAL_20170702105943022387_09_4.6219.hdf5" # 10 epochs
#     weights_filename = "weights_MilesDavis_Oleo-1_FINAL_20170702105943022387_48_2.3996.hdf5" # 50 epochs
# #     weights_filename = "weights_MilesDavis_Oleo-1_FINAL_20170702105943022387_96_1.1572.hdf5" # 100 epochs
    
#     # Oleo - Red Garland
# #     weights_filename = "weights_RedGarland_Oleo_FINAL_20170702110744825363_00_5.2685.hdf5" # 1 epoch
# #     weights_filename = "weights_RedGarland_Oleo_FINAL_20170702110744825363_09_4.8339.hdf5" # 10 epochs
# #     weights_filename = "weights_RedGarland_Oleo_FINAL_20170702110744825363_48_2.6569.hdf5" # 50 epochs
# #     weights_filename = "weights_RedGarland_Oleo_FINAL_20170702110744825363_98_1.1915.hdf5" # 100 epochs


    # Update tune name
    tune_name = weights_filename.replace("weights_", "").replace(".hdf5", "")
    #re.sub("_\d+_\d+_\d+\.\d+\.hdf5", "", tune_name)
    #tune_name = re.sub("FINAL_\d+", "FINAL", tune_name)

    model.load_weights(weights_filename)
    model.compile(loss="categorical_crossentropy", optimizer="adam")

# Print out a summary of the model
print(tune_name)
model.summary()

ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_99_0.8517
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 10, 256)           264192    
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 256)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 256)               525312    
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 365)               93805     
Total params: 883,309
Trainable params: 883,309
Non-trainable params: 0
_________________________________________________________________
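The cell above sets `tune_name` to the whole weights-file stem, including the timestamp, epoch, and loss. The commented-out `re.sub` lines hint at stripping that suffix. Below is a sketch of one way to recover the bare tune name. It assumes every filename follows the checkpoint pattern `weights_<tune>_<timestamp>_<epoch>_<loss>.hdf5`; the older Art Pepper files (`weights_99_0.9724.hdf5` and so on) don't, and would pass through mostly unchanged:

```python
import re

def tune_from_weights(filename):
    """Strip the 'weights_' prefix and the _<timestamp>_<epoch>_<loss>.hdf5 suffix."""
    stem = re.sub(r"^weights_", "", filename)
    return re.sub(r"_\d+_\d+_\d+\.\d+\.hdf5$", "", stem)

print(tune_from_weights("weights_ColemanHawkins_BodyAndSoul_FINAL_20170702124647869274_99_0.8517.hdf5"))
# ColemanHawkins_BodyAndSoul_FINAL
print(tune_from_weights("weights_MilesDavis_Oleo-1_FINAL_20170702105943022387_48_2.3996.hdf5"))
# MilesDavis_Oleo-1_FINAL
```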


In [22]:
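# Seed generation with a randomly chosen training sequence (copied, so that appending
# to seq_in in the generation loop doesn't modify data_input in place)
seq_in = list(data_input[np.random.randint(num_seqs)])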
# seq_in = data_input[0][:10] # Force seed to first seq
# seq_in = [18,18,18,18,18,18,18,18,18,18]
[int_to_note[i] for i in seq_in]

[(57, 90, 0, 6930),
 (54, 100, 0, 6930),
 (49, 80, 0, 6930),
 (50, 80, 0, 6930),
 (54, 100, 0, 6930),
 (56, 90, 0, 6930),
 (53, 70, 0, 34650),
 (56, 110, 0, 6930),
 (59, 100, 0, 6930),
 (62, 110, 0, 6930)]

In [23]:
seq_in_notes = [int_to_note[i] for i in seq_in]
dict(zip(note_events_keys, seq_in_notes[0])) # First seed note, labeled by field

{'noteoff_time': 6930,
 'noteon_pitch': 57,
 'noteon_time': 0,
 'noteon_velocity': 90}

In [24]:
# Generate the notes!
num_notes_to_generate = 100
notes_out = []
notes_out.extend(seq_in_notes) # Add first sequence to output

for i in range(num_notes_to_generate):
    # Reshape and normalize
    x = np.reshape(seq_in, (1, len(seq_in), 1)) # Reshape
    x = x / float(num_unique_notes) # Normalize
    
    # Make the prediction
    pred = model.predict(x, batch_size=batch_size, verbose=0)
    
    # Get output note
    note_idx = np.argmax(pred)
    note = int_to_note[note_idx]
    
    # Add output note to list
    notes_out.append(note)
    
    # Add output note to input sequence, and move forward by one note
    seq_in.append(note_idx) 
    seq_in = seq_in[1:len(seq_in)]

notes_out[:20]

[(57, 90, 0, 6930),
 (54, 100, 0, 6930),
 (49, 80, 0, 6930),
 (50, 80, 0, 6930),
 (54, 100, 0, 6930),
 (56, 90, 0, 6930),
 (53, 70, 0, 34650),
 (56, 110, 0, 6930),
 (59, 100, 0, 6930),
 (62, 110, 0, 6930),
 (65, 100, 0, 6930),
 (62, 110, 0, 6930),
 (59, 90, 0, 6930),
 (56, 100, 0, 6930),
 (53, 90, 0, 6930),
 (50, 90, 0, 6930),
 (53, 80, 0, 6930),
 (63, 100, 0, 6930),
 (66, 90, 0, 6930),
 (63, 100, 0, 6930)]
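The generation loop is greedy: `np.argmax` always picks the single most likely next event, which can make the output settle into repeated figures. A common alternative, not used in this notebook, is temperature sampling. It rescales the predicted probabilities and then draws from them, so lower temperatures stay close to argmax and higher ones add variety. The sketch below shows the same sliding-window loop with a pluggable `pick` function. `fake_predict` is a hypothetical stand-in for `model.predict` so the block runs on its own:

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw an index from `probs` after temperature scaling (low temperature -> close to argmax)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    weights = np.exp(logits - logits.max())   # subtract the max for numerical stability
    weights /= weights.sum()
    return int(rng.choice(len(weights), p=weights))

def generate(predict_fn, seed, num_steps, pick=np.argmax):
    """Slide a fixed-length window forward, appending one predicted index per step."""
    window = list(seed)                       # copy so the caller's seed is not mutated
    generated = []
    for _ in range(num_steps):
        idx = int(pick(predict_fn(window)))
        generated.append(idx)
        window = window[1:] + [idx]           # drop the oldest index, append the newest
    return generated

def fake_predict(window):
    """Hypothetical model stand-in: strongly favours (last index + 1) mod 5."""
    probs = np.full(5, 0.05)
    probs[(window[-1] + 1) % 5] = 0.8
    return probs

seed = [0, 1, 2]
print(generate(fake_predict, seed, 4))        # [3, 4, 0, 1]
rng = np.random.default_rng(0)
print(generate(fake_predict, seed, 4, pick=lambda p: sample_with_temperature(p, 0.8, rng)))
```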

In [25]:
# Convert the sequence of note tuples into a sequence of MIDI notes, and then write to MIDI file

# Create MIDI file and track
midi_file_out = mido.MidiFile()
midi_track_out = mido.MidiTrack()
midi_file_out.tracks.append(midi_track_out)

# Append "headers" (track name, tempo, key, time signature)
for message in midi_track[:4]:
    midi_track_out.append(message)

# Add notes
prev_time = 0
prev_note = 0

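# Note times don't come out at a sensible speed (likely because the output file uses mido's default
# resolution of 480 ticks per beat rather than the input file's), so we rescale them by a hand-tuned,
# per-tune factor here; the last assignment wins
time_multiplier = 2 # Art Pepper - Anthropology
time_multiplier = 0.02 # Coleman Hawkins - Body and Soul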

for note in notes_out:
    ## Note ons only
    #curr_time = prev_time + note[2]
    #prev_note = note[0]
    #prev_time = curr_time
    #message_noteoff = mido.Message("note_off", note=prev_note, velocity=0, time=curr_time) # Prev note off
    #message_noteon = mido.Message("note_on", note=note[0], velocity=note[1], time=curr_time) # Curr note on
    #midi_track_out.append(message_noteoff)
    #midi_track_out.append(message_noteon)
    
    ## Note ons and note offs 
    #curr_time = prev_time + note[3] if note[0]=="note_on" else prev_time
    #curr_time = prev_time + note[3]
    #prev_time = curr_time
    #message = mido.Message(note[0], note=note[1], velocity=note[2], time=curr_time)
    #midi_track_out.append(message)
    
    # Note on/off pairs
    # mido message times are deltas from the previous message, so each time is just the
    # stored delta rescaled; no running total (prev_time) is needed
    note_dict = dict(zip(note_events_keys, note))
    delta_noteon = int(note_dict["noteon_time"] * time_multiplier)
    delta_noteoff = int(note_dict["noteoff_time"] * time_multiplier)
    message_noteon = mido.Message("note_on", note=note_dict["noteon_pitch"], velocity=note_dict["noteon_velocity"], time=delta_noteon)
    message_noteoff = mido.Message("note_off", note=note_dict["noteon_pitch"], velocity=note_dict["noteon_velocity"], time=delta_noteoff)
    midi_track_out.append(message_noteon)
    midi_track_out.append(message_noteoff)
    
# Save file to disk
curr_datetime = str(datetime.datetime.now())
curr_datetime = re.sub(r"\W+", "", curr_datetime) # Filename-safe timestamp
filename_out = "../data/out_{}_{}.mid".format(tune_name, curr_datetime)
midi_file_out.save(filename_out)

for message in midi_track_out[4:20]:
    print(message)

note_on channel=0 note=57 velocity=90 time=0
note_off channel=0 note=57 velocity=90 time=138
note_on channel=0 note=54 velocity=100 time=0
note_off channel=0 note=54 velocity=100 time=138
note_on channel=0 note=49 velocity=80 time=0
note_off channel=0 note=49 velocity=80 time=138
note_on channel=0 note=50 velocity=80 time=0
note_off channel=0 note=50 velocity=80 time=138
note_on channel=0 note=54 velocity=100 time=0
note_off channel=0 note=54 velocity=100 time=138
note_on channel=0 note=56 velocity=90 time=0
note_off channel=0 note=56 velocity=90 time=138
note_on channel=0 note=53 velocity=70 time=0
note_off channel=0 note=53 velocity=70 time=693
note_on channel=0 note=56 velocity=110 time=0
note_off channel=0 note=56 velocity=110 time=138
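Each stored note tuple becomes two MIDI events whose `time` values are deltas from the previous event. The dump above shows this: every `note_on` follows its preceding `note_off` with delta 0, and 6930 ticks times 0.02 gives 138. The sketch below reproduces that conversion with hypothetical `(type, pitch, velocity, delta_ticks)` tuples instead of mido messages. It also includes a proportional `rescale_ticks` helper, a possible replacement for the hand-tuned multiplier if the source file's resolution is known. That resolution isn't shown in this notebook; reading `midi_file.ticks_per_beat` would give it, or passing `ticks_per_beat=midi_file.ticks_per_beat` to `mido.MidiFile` would keep the original tick values as they are.

```python
def pairs_to_events(notes, time_multiplier):
    """Turn (pitch, velocity, noteon_delta, noteoff_delta) tuples into MIDI-style events.

    Each event is a hypothetical (type, pitch, velocity, delta_ticks) tuple. Deltas are
    relative to the previous event, as in a MIDI track, so no running total is needed.
    """
    events = []
    for pitch, velocity, on_delta, off_delta in notes:
        events.append(("note_on", pitch, velocity, int(on_delta * time_multiplier)))
        events.append(("note_off", pitch, velocity, int(off_delta * time_multiplier)))
    return events

def rescale_ticks(ticks, src_ticks_per_beat, dst_ticks_per_beat):
    """Convert a tick count between files with different resolutions."""
    return round(ticks * dst_ticks_per_beat / src_ticks_per_beat)

events = pairs_to_events([(57, 90, 0, 6930), (53, 70, 0, 34650)], 0.02)
print(events)
# [('note_on', 57, 90, 0), ('note_off', 57, 90, 138), ('note_on', 53, 70, 0), ('note_off', 53, 70, 693)]
print(rescale_ticks(960, src_ticks_per_beat=960, dst_ticks_per_beat=480))  # 480
```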


### Audio links

Using Art Pepper - Anthropology as the input melody, with random seed:
- 100 epochs: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-100-epochs

Using Coleman Hawkins - Body and Soul as the input melody, seeded with the first 10 notes of the original sequence:
- 1 epoch: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-coleman-hawkins-body-and-soul-1-epoch
- 10 epochs: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-coleman-hawkins-body-and-soul-10-epochs
- 50 epochs: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-coleman-hawkins-body-and-soul-50-epochs
- 100 epochs: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-coleman-hawkins-body-and-soul-100-epochs
- 200 epochs: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-coleman-hawkins-body-and-soul-200-epochs
- 500 epochs: https://soundcloud.com/usdivad/jazz-ai-experiments-lstm-single-melody-coleman-hawkins-body-and-soul-500-epochs


As the number of epochs increases, you can hear a clear improvement in how well the model captures the melodic character of the solo, up until around 200 epochs, where it begins to overfit.