# Generating music

To generate music, I will use the **transformer** architecture, the one behind OpenAI's GPT-2. In earlier iterations I had considered RNNs with a seq2seq model, but after further research I found that transformers can achieve better performance and accuracy and can be parallelized, albeit at the cost of higher memory use.

In [1]:
import mido
import os
import math
import csv
import matplotlib.pyplot as plt
from itertools import chain, islice
from functools import cmp_to_key
from copy import deepcopy
from fractions import Fraction
import json
import pprint
import numpy as np

## Data Preparation

In [2]:
DATA_DIR = '../../data'

The Test Score: <br>
<img height=800 width=600 src="https://imslp.org/images/8/8b/TN-Schumann%2C_Robert_Werke_Breitkopf_Gregg_Serie_7_Band_2_RS_51_Op_13_scan.jpg"></img> <br>
Robert Schumann, Symphonic Etudes Op. 13 (with Posthumous variations)


In [3]:
# define function to play music (os.startfile is only available on Windows)
TEST_FILE = f'{DATA_DIR}/maestro-v3.0.0/2018/MIDI-Unprocessed_Recital20_MID--AUDIO_20_R1_2018_wav--4.midi'
def play_midi(file_name=TEST_FILE):
    os.startfile(os.path.abspath(file_name))

In [4]:
relevant_msgs = { 'note_on', 'note_off', 'control_change'}

is_note_on = lambda msg: msg.type == 'note_on' and msg.velocity > 0
is_note_off = lambda msg: (msg.type == 'note_on' and msg.velocity == 0) or msg.type == 'note_off'
is_relevant_msg = lambda msg: msg.type in relevant_msgs

In [5]:
def get_pitch(note: int):
    notes = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    return f"{notes[note % 12]}{note // 12 - 1}"

In [6]:
# Iterate over a track (not the MidiFile itself) to get deltatime in ticks;
# iterating over the MidiFile yields deltatime in seconds. This is also faster.
def print_messages(track, limit: int=0, *, transnote=None, transvelocity=None, transtime=None):
    transnote = transnote or (lambda note: note) 
    transvelocity = transvelocity or (lambda velocity: velocity)
    transtime = transtime or (lambda time: time)
    track = islice(track, limit) if limit > 0 else track
    for event in track:
        if is_note_on(event):
            note, velocity, time = transnote(event.note), transvelocity(event.velocity), transtime(event.time)
            print(f'note_on channel={event.channel} note={note} velocity={velocity} time={time}')
        elif is_note_off(event):
            note, velocity, time = transnote(event.note), transvelocity(event.velocity), transtime(event.time)
            print(f'note_off channel={event.channel} note={note} time={time}')
        else:
            print(event)

In [7]:
schumann = mido.MidiFile(TEST_FILE)
for event in islice(schumann.tracks[1], 50):
    print(event)

<meta message track_name name='contestant 20' time=0>
program_change channel=0 program=0 time=0
control_change channel=0 control=64 value=127 time=0
control_change channel=0 control=67 value=0 time=0
note_on channel=0 note=73 velocity=64 time=775
note_on channel=0 note=64 velocity=41 time=41
note_on channel=0 note=68 velocity=46 time=4
note_on channel=0 note=37 velocity=39 time=8
note_on channel=0 note=61 velocity=35 time=9
note_on channel=0 note=44 velocity=30 time=8
note_on channel=0 note=73 velocity=0 time=222
note_on channel=0 note=68 velocity=0 time=45
note_on channel=0 note=64 velocity=0 time=31
note_on channel=0 note=61 velocity=0 time=113
note_on channel=0 note=68 velocity=67 time=365
note_on channel=0 note=56 velocity=45 time=28
note_on channel=0 note=64 velocity=44 time=1
note_on channel=0 note=61 velocity=44 time=14
note_on channel=0 note=68 velocity=0 time=252
note_on channel=0 note=64 velocity=0 time=50
note_on channel=0 note=61 velocity=0 time=129
note_on channel=0 note=5

In [8]:
schumann.ticks_per_beat

384

In [9]:
print_messages(schumann.tracks[1], 20, transnote=get_pitch)

<meta message track_name name='contestant 20' time=0>
program_change channel=0 program=0 time=0
control_change channel=0 control=64 value=127 time=0
control_change channel=0 control=67 value=0 time=0
note_on channel=0 note=C#5 velocity=64 time=775
note_on channel=0 note=E4 velocity=41 time=41
note_on channel=0 note=G#4 velocity=46 time=4
note_on channel=0 note=C#2 velocity=39 time=8
note_on channel=0 note=C#4 velocity=35 time=9
note_on channel=0 note=G#2 velocity=30 time=8
note_off channel=0 note=C#5 time=222
note_off channel=0 note=G#4 time=45
note_off channel=0 note=E4 time=31
note_off channel=0 note=C#4 time=113
note_on channel=0 note=G#4 velocity=67 time=365
note_on channel=0 note=G#3 velocity=45 time=28
note_on channel=0 note=E4 velocity=44 time=1
note_on channel=0 note=C#4 velocity=44 time=14
note_off channel=0 note=G#4 time=252
note_off channel=0 note=E4 time=50


### Converting Ticks to Beats

The `time` attribute represents <deltatime\>: the number of ticks to wait before playing the message. The number of ticks per beat is defined in the MThd chunk as <division\> (e.g. <division\> = 96 means 96 ticks per beat). The tempo, in microseconds per beat, defaults to $500,000 \frac{\mu s}{beat}$ and can be changed with the meta message 'set_tempo'.

So $time = 288$, $division = 384$, $tempo = 500,000$ equates to $\frac{500,000}{384} * 288 = 375,000 \mu s$

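This is roughly the number of ticks between the first note_on (C#5) and its corresponding note off: the raw delta times in between sum to $41 + 4 + 8 + 9 + 8 + 222 = 292$.

This is equivalent to 0.375 seconds, which is how the deltatime is reported when using `midi.play()` or `iter(midi)`.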
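The conversion above can be checked with a few lines of Python. This is a minimal sketch: the helper name `ticks_to_microseconds` and the constant `DEFAULT_TEMPO` are my own, not part of the notebook. (mido also ships `mido.tick2second`, which does the same conversion and returns seconds.)

```python
# Tempo assumed when no set_tempo message is present (microseconds per beat).
DEFAULT_TEMPO = 500_000


def ticks_to_microseconds(ticks, ticks_per_beat, tempo=DEFAULT_TEMPO):
    """Convert a tick count to microseconds at a fixed tempo."""
    return ticks * tempo / ticks_per_beat


micros = ticks_to_microseconds(288, 384)
seconds = micros / 1_000_000
print(micros, seconds)
```

With $time = 288$ and $division = 384$ this gives 375,000 microseconds, or 0.375 seconds.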



With $500,000 \frac{\mu s}{beat}$, BPM = 120. The denominator of the time signature tells what kind of note (quarter, eighth) counts as a beat, and the numerator tells how many beats are in a bar. With a time signature of 4/4, a beat is a quarter note.

With $time = 288$, and $division = 384$, $288 \ \text{ticks} * \frac{1}{384} \frac{beat}{tick} = .75 \ \text{beats}$

This is equal to $375,000 \mu s * \frac{1}{500000} \frac{beat}{\mu s} = .75 \ \text{beats}$.
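The same arithmetic in code, as a small sketch. The function names are hypothetical helpers introduced here for illustration; they show that both routes (ticks to beats, and microseconds to beats) agree, and where the 120 BPM figure comes from.

```python
def tempo_to_bpm(tempo):
    """Beats per minute for a tempo given in microseconds per beat."""
    return 60_000_000 / tempo


def ticks_to_beats(ticks, ticks_per_beat):
    """Number of beats spanned by a tick count."""
    return ticks / ticks_per_beat


def microseconds_to_beats(micros, tempo):
    """Number of beats spanned by a duration in microseconds."""
    return micros / tempo


bpm = tempo_to_bpm(500_000)
beats_from_ticks = ticks_to_beats(288, 384)
beats_from_micros = microseconds_to_beats(375_000, 500_000)
print(bpm, beats_from_ticks, beats_from_micros)
```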

With a 4/4 time signature, $time = 288$ is 0.75 of a quarter note. However, in the score above the first notes are quarter notes, not fractions of quarter notes. This is probably because the piece was performed with a different BPM in mind (Andante, maybe 90), and not the one given in the MIDI file.


### Quantization

Quantize notes so that each deltatime is snapped to the nearest multiple of $\epsilon$. A smaller $\epsilon$ gives a finer grid, which preserves more timing detail but also keeps more off-beat notes. A larger $\epsilon$ gives a coarser grid and more synchronization between notes.
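To get a feel for the trade-off, here is a minimal sketch that snaps one delta time from the track above (41 ticks) to grids of different sizes. It assumes $\epsilon = \frac{\text{ticks per beat}}{\text{resolution}}$; the helper name `snap` is hypothetical.

```python
def snap(ticks, ticks_per_beat, resolution):
    """Snap a tick value to the nearest multiple of ticks_per_beat / resolution."""
    step = ticks_per_beat / resolution
    shifted = ticks + step / 2
    return int(shifted - shifted % step)


# Same delta time, three grid sizes (steps per beat).
results = {resolution: snap(41, 384, resolution) for resolution in (4, 16, 64)}
print(results)
```

With 4 steps per beat the 41-tick gap collapses to 0, with 16 it becomes 48, and with 64 it becomes 42, which is closest to the original value.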

In [10]:
BEAT_RESOLUTION = 16

In [11]:
def _nearest_mult(val, multiple):
    temp = val + multiple / 2
    return temp - temp % multiple

def quantize(sequence, ticks_per_beat: int, resolution: int, seqlen: int=None) -> np.ndarray:
    """
    takes in a sequence of tick values and snaps each one to the nearest multiple
    of ticks_per_beat / resolution (the length of one quantization step in ticks)
    params:
      sequence: sequence of tick values (e.g. message delta times) to quantize
      ticks_per_beat: ticks per beat as given in midi metadata
      resolution: desired quantization resolution, in steps per beat
      seqlen: optional number of elements in the sequence, defaults to len(sequence)
    returns:
      array of integers with original values snapped to nearest multiple
    """
    seqlen = len(sequence) if seqlen is None else seqlen
    tick_res = ticks_per_beat / resolution

    quantized = (int(_nearest_mult(val, tick_res)) for val in sequence)
    return np.fromiter(quantized, dtype=np.int32, count=seqlen)

In [12]:
relevant_schumann = [msg for msg in schumann.tracks[1] if not msg.is_meta]
quantized = quantize([msg.time for msg in relevant_schumann],
                     resolution=BEAT_RESOLUTION, 
                     ticks_per_beat=schumann.ticks_per_beat)
quantized_schumann = [msg.copy(time=new_time) for msg, new_time in zip(relevant_schumann, quantized)]
print_messages(quantized_schumann, 20, transnote=get_pitch)

program_change channel=0 program=0 time=0
control_change channel=0 control=64 value=127 time=0
control_change channel=0 control=67 value=0 time=0
note_on channel=0 note=C#5 velocity=64 time=768
note_on channel=0 note=E4 velocity=41 time=48
note_on channel=0 note=G#4 velocity=46 time=0
note_on channel=0 note=C#2 velocity=39 time=0
note_on channel=0 note=C#4 velocity=35 time=0
note_on channel=0 note=G#2 velocity=30 time=0
note_off channel=0 note=C#5 time=216
note_off channel=0 note=G#4 time=48
note_off channel=0 note=E4 time=24
note_off channel=0 note=C#4 time=120
note_on channel=0 note=G#4 velocity=67 time=360
note_on channel=0 note=G#3 velocity=45 time=24
note_on channel=0 note=E4 velocity=44 time=0
note_on channel=0 note=C#4 velocity=44 time=24
note_off channel=0 note=G#4 time=264
note_off channel=0 note=E4 time=48
note_off channel=0 note=C#4 time=120
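One thing to watch when snapping each delta time on its own is that rounding errors can add up along the track. The sketch below compares that approach with an alternative that snaps absolute (cumulative) times and then converts back to deltas. The alternative is not what this notebook does; it is shown only for comparison, and the function names are hypothetical. The input is the first seven raw delta times of the track printed earlier.

```python
from itertools import accumulate


def nearest_multiple(value, step):
    """Round value to the nearest multiple of step (halves round up)."""
    shifted = value + step / 2
    return int(shifted - shifted % step)


def quantize_deltas(deltas, step):
    """Snap every delta time independently."""
    return [nearest_multiple(delta, step) for delta in deltas]


def quantize_absolute(deltas, step):
    """Snap cumulative times to the grid, then convert back to deltas."""
    snapped = [nearest_multiple(t, step) for t in accumulate(deltas)]
    return [after - before for before, after in zip([0] + snapped, snapped)]


raw = [775, 41, 4, 8, 9, 8, 222]
step = 384 // 16  # 24 ticks per quantization step

by_delta = quantize_deltas(raw, step)
by_absolute = quantize_absolute(raw, step)
print(sum(raw), sum(by_delta), sum(by_absolute))
```

The raw deltas end at tick 1067. Delta-wise snapping ends at 1032, while absolute snapping ends at 1056, the grid point nearest to 1067. The price is that notes of a rolled chord can land on different grid points (here one gap of 24 ticks appears inside the chord), so which behavior is preferable depends on the goal.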


In [13]:
schumann_copy = deepcopy(schumann)
schumann_copy.tracks[1] = mido.MidiTrack(quantized_schumann)
schumann_copy

<midi file '../../data/maestro-v3.0.0/2018/MIDI-Unprocessed_Recital20_MID--AUDIO_20_R1_2018_wav--4.midi' type 1, 2 tracks, 108941 messages>

In [14]:
schumann_copy.save('schumann.midi')

## Moving Away from MIDI

In [15]:
def rest(beats: int):
    return f'rest_b:{beats}'

def note(pitch: int, velocity: int, instrument: str='piano'):
    return f'note_p:{pitch}_v:{velocity}_i:{instrument}'

def control(control: int, value: int):
    return f'control_c:{control}_v:{value}'

MESSAGE_REF = {
  'note_on': note,
  'note_off': note,
  'control_change': control
}

def _get_message(msg):
    """
    constructs a token for simple messages with two data bytes (Control and Note).
    Doesn't handle instruments. Note offs are encoded as notes with zero
    velocity, so a non zero release velocity is discarded
    """
    make_token = MESSAGE_REF.get(msg.type)
    if make_token is None:
        return None
    msg_data = msg.bytes()[1:]  # drop the status byte, keep the two data bytes
    if msg.type == 'note_off':
        msg_data[-1] = 0
    return make_token(*msg_data)

### Converting to Notes

In [37]:
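def gen_messages(track, tick_per_beat, resolution):
    """
    yields rest, note and control tokens for a track. Time elapsed since the
    last emitted token is written out as rests before the next token
    """
    accum_time = 0
    for msg in track:
        accum_time += msg.time
        msg_obj = _get_message(msg)
        if msg_obj is not None:
            # elapsed time in quantization steps (1 / resolution of a beat)
            accum_beats = int(accum_time * resolution / tick_per_beat) # this eq can be factored into quantize method
            excess_beats = accum_beats % resolution
            # one full-beat rest per whole beat, then the remainder
            for i in range(accum_beats // resolution):
                yield rest(resolution)
            if excess_beats != 0:
                yield rest(excess_beats)
            accum_time = 0
            yield msg_obj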

def to_beats(track, tick_per_beat, resolution):
    return list(gen_messages(track, tick_per_beat, resolution))

In [38]:
relevant_schumann_copy = [msg for msg in schumann_copy.tracks[1] if is_relevant_msg(msg)]
schumann_seq = to_beats(track=relevant_schumann_copy, resolution=16, tick_per_beat=schumann_copy.ticks_per_beat)
schumann_seq[:30]

['control_c:64_v:127',
 'control_c:67_v:0',
 'rest_b:16',
 'rest_b:16',
 'note_p:73_v:64_i:piano',
 'rest_b:2',
 'note_p:64_v:41_i:piano',
 'note_p:68_v:46_i:piano',
 'note_p:37_v:39_i:piano',
 'note_p:61_v:35_i:piano',
 'note_p:44_v:30_i:piano',
 'rest_b:9',
 'note_p:73_v:0_i:piano',
 'rest_b:2',
 'note_p:68_v:0_i:piano',
 'rest_b:1',
 'note_p:64_v:0_i:piano',
 'rest_b:5',
 'note_p:61_v:0_i:piano',
 'rest_b:15',
 'note_p:68_v:67_i:piano',
 'rest_b:1',
 'note_p:56_v:45_i:piano',
 'note_p:64_v:44_i:piano',
 'rest_b:1',
 'note_p:61_v:44_i:piano',
 'rest_b:11',
 'note_p:68_v:0_i:piano',
 'rest_b:2',
 'note_p:64_v:0_i:piano']

Each `rest_b:n` token represents a wait of $n$ quantization steps before the next message. Because notes were quantized with a 64th note resolution (16 steps per beat) and beats are set to quarter notes, every wait is some multiple of $\frac{1}{16}$ of a beat.
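To inspect a sequence like the one above, the tokens can be split back into their fields. This is a rough sketch, assuming the token layout produced by `rest`, `note` and `control` (a kind followed by `key:value` pairs joined by underscores) and assuming instrument names contain no underscores. `parse_token` is a hypothetical helper.

```python
def parse_token(token):
    """Split a token such as 'note_p:73_v:64_i:piano' into a dict of fields."""
    kind, *fields = token.split('_')
    parsed = {'kind': kind}
    for field in fields:
        key, value = field.split(':', 1)
        # numeric fields become ints, anything else stays a string
        parsed[key] = int(value) if value.isdigit() else value
    return parsed


note_fields = parse_token('note_p:73_v:64_i:piano')
rest_fields = parse_token('rest_b:2')
rest_in_beats = rest_fields['b'] / 16  # 16 quantization steps per beat
print(note_fields, rest_fields, rest_in_beats)
```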

> When quantizing, each tick value was adjusted so that $$\text{ticks}' = x * \frac{\text{ticks_per_beat}}{\text{resolution}}$$When converting to beats, $\text{ticks}'$ is divided by ticks_per_beat. $$\text{beats} = \frac{\text{ticks}'}{\text{ticks_per_beat}} = x * \frac{\text{ticks_per_beat}}{\text{resolution} * \text{ticks_per_beat}} = x * \frac{1}{\text{resolution}}$$

Using this approach, we will have 16 waits defined in the vocabulary (`rest_b:1` through `rest_b:16`). Waits longer than 1 beat will be recorded as a run of tokens, e.g. (`rest_b:16`, `rest_b:n`). However, a constant resolution needs to be set for all songs, so I still need to test quantizing a faster song.
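The splitting rule for long waits can be sketched on its own. This assumes the `rest_b:n` token format used above; `wait_tokens` is a hypothetical helper that takes a wait measured in quantization steps.

```python
def wait_tokens(steps, resolution=16):
    """Split a wait of `steps` quantization steps into rest tokens of at most one beat."""
    full_beats, remainder = divmod(steps, resolution)
    tokens = [f'rest_b:{resolution}'] * full_beats
    if remainder:
        tokens.append(f'rest_b:{remainder}')
    return tokens


print(wait_tokens(32))
print(wait_tokens(37))
print(wait_tokens(9))
```

A wait of 32 steps (two beats) becomes two `rest_b:16` tokens, matching the start of the sequence above, and 37 steps becomes two `rest_b:16` tokens followed by `rest_b:5`.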

### Binning velocities

In [56]:
def _get_bin(value, bin_size: float, num_bins: int, max_val: int, min_val: int) -> int:
    if value >= max_val:
        return num_bins + 1
    elif value < min_val:
        return 0
    else:
        return int((value - min_val) // bin_size) + 1

def bin_seq(sequence, num_bins: int, max_val: int, min_val: int, seqlen: int=None) -> np.ndarray:
    """
    takes in a sequence of numbers and assigns an integer label corresponding to the bin it falls
    in. Values greater than or equal to max_val are given a bin label of num_bins + 1, and values
    below min_val are given bin label 0.
    """
    bin_size = (max_val - min_val) / num_bins
    seqlen = len(sequence) if seqlen is None else seqlen

    binned = (_get_bin(val, bin_size, num_bins, max_val, min_val) for val in sequence)
    return np.fromiter(binned, dtype=np.int32, count=seqlen)

In [57]:
NUM_BINS = 32
MAX_VAL = 128
MIN_VAL = 1
BIN_SIZE = (MAX_VAL - MIN_VAL) / NUM_BINS
all_bins = [f'[{(i * BIN_SIZE) + MIN_VAL}, {(i * BIN_SIZE) + MIN_VAL + BIN_SIZE})' for i in range(NUM_BINS)]
schumann_to_bin = [msg for msg in relevant_schumann_copy if msg.type == 'note_on']
binned_schumann = bin_seq([msg.velocity for msg in schumann_to_bin], NUM_BINS, MAX_VAL, MIN_VAL)
zipped = zip(binned_schumann[:30], schumann_to_bin[:30])
print(f'max_val: {MAX_VAL}')
print(f'min_val: {MIN_VAL}')
print(f'num_bins: {NUM_BINS}')
print(f'bin_size: {BIN_SIZE}')
print('all_bins:')
for idx, row in enumerate(all_bins):
    print(f'\t{idx + 1}: {row}')
for row in zipped:
    print(row)

max_val: 128
min_val: 1
num_bins: 32
bin_size: 3.96875
all_bins:
	1: [1.0, 4.96875)
	2: [4.96875, 8.9375)
	3: [8.9375, 12.90625)
	4: [12.90625, 16.875)
	5: [16.875, 20.84375)
	6: [20.84375, 24.8125)
	7: [24.8125, 28.78125)
	8: [28.78125, 32.75)
	9: [32.75, 36.71875)
	10: [36.71875, 40.6875)
	11: [40.6875, 44.65625)
	12: [44.65625, 48.625)
	13: [48.625, 52.59375)
	14: [52.59375, 56.5625)
	15: [56.5625, 60.53125)
	16: [60.53125, 64.5)
	17: [64.5, 68.46875)
	18: [68.46875, 72.4375)
	19: [72.4375, 76.40625)
	20: [76.40625, 80.375)
	21: [80.375, 84.34375)
	22: [84.34375, 88.3125)
	23: [88.3125, 92.28125)
	24: [92.28125, 96.25)
	25: [96.25, 100.21875)
	26: [100.21875, 104.1875)
	27: [104.1875, 108.15625)
	28: [108.15625, 112.125)
	29: [112.125, 116.09375)
	30: [116.09375, 120.0625)
	31: [120.0625, 124.03125)
	32: [124.03125, 128.0)
(16, <message note_on channel=0 note=73 velocity=64 time=768>)
(11, <message note_on channel=0 note=64 velocity=41 time=48>)
(12, <message note_on channel=0 note=
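As a quick check of the bin table, and as a sketch of how a bin label might later be turned back into a playable velocity, here is a small pure-Python version. The midpoint decoding is my own assumption about how generated tokens could be decoded, not something the notebook defines, and both function names are hypothetical.

```python
NUM_BINS, MIN_VAL, MAX_VAL = 32, 1, 128
BIN_SIZE = (MAX_VAL - MIN_VAL) / NUM_BINS


def velocity_bin(velocity):
    """Bin label for a velocity: 0 below MIN_VAL, NUM_BINS + 1 at or above MAX_VAL."""
    if velocity >= MAX_VAL:
        return NUM_BINS + 1
    if velocity < MIN_VAL:
        return 0
    return int((velocity - MIN_VAL) // BIN_SIZE) + 1


def bin_midpoint(label):
    """Representative velocity for a bin label in 1..NUM_BINS (assumed decoding rule)."""
    return round(MIN_VAL + (label - 0.5) * BIN_SIZE)


labels = [velocity_bin(v) for v in (64, 41, 46)]
print(labels, bin_midpoint(16))
```

Velocities 64, 41 and 46 fall in bins 16, 11 and 12, in line with the printed rows, and decoding a midpoint always lands back in the same bin.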

## Vocabulary

The full vocabulary will consist of 16 waits (or another power of 2, depending on what resolution I go with) and 128 notes, each with 32 volume levels and 6 instruments, for a total size of $128 \times 32 \times 6 + 16 = 24,592$.

Volumes will be binned for less noise, and to account for variability in performances.

For now I'll just stick with piano, so 88 notes (A0-C8), 32 volumes, 1 instrument and 16 waits, for a total size of $88 \times 32 \times 1 + 16 = 2832$.
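The two totals follow from the same formula, sketched below with a hypothetical helper `vocabulary_size`: every (note, volume, instrument) combination is one token, plus one token per wait length.

```python
def vocabulary_size(num_notes, num_volumes, num_instruments, num_waits):
    """Number of distinct note tokens plus wait tokens."""
    return num_notes * num_volumes * num_instruments + num_waits


full_vocab = vocabulary_size(128, 32, 6, 16)
piano_vocab = vocabulary_size(88, 32, 1, 16)
print(full_vocab, piano_vocab)
```

These counts cover note and wait tokens only. The sequence shown earlier also contains `control_c` tokens and notes with velocity 0, which would add to the total if they are kept.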

## Metadata + Features

### Additional Features

Aside from the vocabulary, I will train the model with the following features:

- [x] composer
- ~~key signature~~ (data not available)
- [x] tempo
- [x] time period/style


In [18]:
with open(f'{DATA_DIR}/maestro-v3.0.0/maestro-v3.0.0.csv', encoding='utf-8') as csvfile:
    reader = list(csv.DictReader(csvfile))

In [19]:
with open(f'{DATA_DIR}/metadata/composers.json') as composer_file:
    composers = json.load(composer_file)

For composer searching, I will need to clean up the csv file to make sure that composer names match up with the names given in the composers file.

### Epoch

In [20]:
def epoch(complete_name: str):
    result = [composer for composer in composers if composer['complete_name'].lower() == complete_name.lower()]
    if len(result) > 0:
        return result[0]['epoch']
    return None

In [21]:
epoch('Leoš Janáček')

'Late Romantic'

### Composers

In [22]:
# Unique Composers
unique_composers = set()
for row in reader:
    unique_composers.add(row['canonical_composer'])
unique_composers

{'Alban Berg',
 'Alexander Scriabin',
 'Antonio Soler',
 'Carl Maria von Weber',
 'Charles Gounod / Franz Liszt',
 'Claude Debussy',
 'César Franck',
 'Domenico Scarlatti',
 'Edvard Grieg',
 'Felix Mendelssohn',
 'Felix Mendelssohn / Sergei Rachmaninoff',
 'Franz Liszt',
 'Franz Liszt / Camille Saint-Saëns',
 'Franz Liszt / Vladimir Horowitz',
 'Franz Schubert',
 'Franz Schubert / Franz Liszt',
 'Franz Schubert / Leopold Godowsky',
 'Fritz Kreisler / Sergei Rachmaninoff',
 'Frédéric Chopin',
 'George Enescu',
 'George Frideric Handel',
 'Georges Bizet / Ferruccio Busoni',
 'Georges Bizet / Moritz Moszkowski',
 'Georges Bizet / Vladimir Horowitz',
 'Giuseppe Verdi / Franz Liszt',
 'Henry Purcell',
 'Isaac Albéniz',
 'Isaac Albéniz / Leopold Godowsky',
 'Jean-Philippe Rameau',
 'Johann Christian Fischer / Wolfgang Amadeus Mozart',
 'Johann Pachelbel',
 'Johann Sebastian Bach',
 'Johann Sebastian Bach / Egon Petri',
 'Johann Sebastian Bach / Ferruccio Busoni',
 'Johann Sebastian Bach / Fr

Some entries list multiple composers, separated by ' / '. When this happens, choose the more common composer.
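One way to do this is sketched below, assuming "more common" means the name that appears most often across the dataset. The rows are made-up sample data in the shape of the CSV, and `primary_composer` is a hypothetical helper; ties go to the name listed first.

```python
from collections import Counter

# Hypothetical sample rows in the shape of the MAESTRO CSV.
rows = [
    {'canonical_composer': 'Franz Liszt'},
    {'canonical_composer': 'Franz Liszt'},
    {'canonical_composer': 'Franz Schubert'},
    {'canonical_composer': 'Franz Schubert / Franz Liszt'},
    {'canonical_composer': 'Georges Bizet / Vladimir Horowitz'},
]

# Count how often each individual name appears, alone or in a pair.
counts = Counter(
    name
    for row in rows
    for name in row['canonical_composer'].split(' / ')
)


def primary_composer(canonical_composer, counts):
    """Pick the most frequent name; max() keeps the first name on ties."""
    names = canonical_composer.split(' / ')
    return max(names, key=lambda name: counts[name])


print(primary_composer('Franz Schubert / Franz Liszt', counts))
print(primary_composer('Georges Bizet / Vladimir Horowitz', counts))
```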

### CSV and Composer Cross Reference

In [23]:
composers_both_present = dict()
for u_composer in unique_composers:
    split_composers = u_composer.split(' / ')
    for comp_name in split_composers:
        comp_entry = [entry for entry in composers if entry['complete_name'] == comp_name]
        comp_entry = comp_entry[0] if len(comp_entry) > 0 else None
        composers_both_present[comp_name] = comp_entry
pprint.pprint(composers_both_present)

{'Alban Berg': {'birth': '1885-01-01',
                'complete_name': 'Alban Berg',
                'death': '1935-01-01',
                'epoch': '20th Century',
                'id': '210',
                'name': 'Berg',
                'portrait': 'https://assets.openopus.org/portraits/48656640-1568084861.jpg'},
 'Alexander Scriabin': {'birth': '1872-01-01',
                        'complete_name': 'Alexander Scriabin',
                        'death': '1915-01-01',
                        'epoch': 'Late Romantic',
                        'id': '18',
                        'name': 'Scriabin',
                        'portrait': 'https://assets.openopus.org/portraits/33736318-1568084946.jpg'},
 'Alfred Grünfeld': None,
 'Antonio Soler': None,
 'Camille Saint-Saëns': {'birth': '1835-01-01',
                         'complete_name': 'Camille Saint-Saëns',
                         'death': '1921-01-01',
                         'epoch': 'Romantic',
                         'id': '4

In [24]:
composers_not_present = [entry for entry in composers_both_present if composers_both_present[entry] is None]
print(f"Number not found in composer file: {len(composers_not_present)}")
composers_not_present

Number not found in composer file: 18


['Vladimir Horowitz',
 'Leopold Godowsky',
 'Johann Strauss',
 'Alfred Grünfeld',
 'György Cziffra',
 'Fritz Kreisler',
 'Egon Petri',
 'Mikhail Pletnev',
 'Moritz Moszkowski',
 'Myra Hess',
 'Joseph Haydn',
 'Mikhail Glinka',
 'Antonio Soler',
 'Johann Christian Fischer',
 'Muzio Clementi',
 'Orlando Gibbons',
 'Nikolai Medtner',
 'Vyacheslav Gryaznov']

Some of these are surprising. For example, Joseph Haydn is a very famous Classical composer. Looking at the composers.json file, we can see that the names differ: Joseph Haydn is listed as Franz Joseph Haydn.

In [25]:
approximate_composer_matches = dict()
for not_present in composers_not_present:
    last_name = not_present.split()[-1]
    comp_entry = [entry for entry in composers if last_name in entry['complete_name']]
    comp_entry = comp_entry[0] if len(comp_entry) > 0 else None
    approximate_composer_matches[not_present] = comp_entry
pprint.pprint(approximate_composer_matches)
print('\n')
print(f'composers still not present: {sorted([comp_name for comp_name, entry in approximate_composer_matches.items() if entry is None])}')

{'Alfred Grünfeld': None,
 'Antonio Soler': None,
 'Egon Petri': None,
 'Fritz Kreisler': None,
 'György Cziffra': None,
 'Johann Christian Fischer': None,
 'Johann Strauss': {'birth': '1825-01-01',
                    'complete_name': 'Johann Strauss Jr',
                    'death': '1899-01-01',
                    'epoch': 'Romantic',
                    'id': '165',
                    'name': 'Strauss Jr',
                    'portrait': 'https://assets.openopus.org/portraits/93853123-1568084951.jpg'},
 'Joseph Haydn': {'birth': '1732-01-01',
                  'complete_name': 'Franz Joseph Haydn',
                  'death': '1809-01-01',
                  'epoch': 'Classical',
                  'id': '208',
                  'name': 'Haydn',
                  'portrait': 'https://assets.openopus.org/portraits/21056059-1568084909.jpg'},
 'Leopold Godowsky': None,
 'Mikhail Glinka': {'birth': '1804-01-01',
                    'complete_name': 'Mikhail Ivanovich Glinka',
          

At the end of it all, there are still some missing composers, which we'll just label as 'Other'.
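Putting the lookup steps together, a minimal sketch could look like this: try an exact name match, fall back to a last-name match, and otherwise return 'Other'. The function names are hypothetical and `sample_composers` is a made-up two-entry stand-in for `composers.json` that keeps only the fields used here.

```python
# Hypothetical stand-in for the contents of composers.json.
sample_composers = [
    {'complete_name': 'Franz Joseph Haydn', 'epoch': 'Classical'},
    {'complete_name': 'Alban Berg', 'epoch': '20th Century'},
]


def find_composer(name, composers):
    """Exact match on the full name first, then a match on the last name."""
    exact = [c for c in composers if c['complete_name'].lower() == name.lower()]
    if exact:
        return exact[0]
    last_name = name.split()[-1]
    approximate = [c for c in composers if last_name in c['complete_name']]
    return approximate[0] if approximate else None


def epoch_or_other(name, composers):
    """Epoch of the matched composer, or 'Other' when nothing matches."""
    entry = find_composer(name, composers)
    return entry['epoch'] if entry is not None else 'Other'


print(epoch_or_other('Joseph Haydn', sample_composers))
print(epoch_or_other('Egon Petri', sample_composers))
```

Note that the last-name fallback takes the first match, so a last name shared by several composers resolves to whichever entry comes first in the file. It may be worth reviewing those matches by hand.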

Let's also see some more metrics about composer **last names** in the `composers.json` file.

In [26]:
unique_composers_in_json = set(comp['name'] for comp in composers)
all_lastnames_single_word = all(len(comp['name'].split()) == 1 for comp in composers)
print(f'All unique last names: {len(unique_composers_in_json) == len(composers)}')
print(f'All last names single word: {all_lastnames_single_word}')

All unique last names: True
All last names single word: False


Which last names aren't a single word?

In [27]:
lastnames_multi_words = [comp['name'] for comp in composers if comp['name'].count(' ') != 0]
print(f'last names not single word: {lastnames_multi_words}')

last names not single word: ['Marcello, A.', 'Scarlatti, A.', 'Bach, C.P.E.', 'Bach, J.C.', 'Strauss Jr', 'Vaughan Williams', 'Braga Santos', 'Camargo Guarnieri']


## Test playback

In [31]:
# Test playback original
play_midi()

In [32]:
# Test playback quantized
play_midi('schumann.midi')