**Assignment 1: Generating "Human-Like" Rhythms from a given corpus**

**Part 1: Preprocessing - Parsing the dataset**

**Goal:** Parse the abc music files and keep only the elements related to rhythm. We use music21 to obtain, from each piece's abc file, a list of bars, each containing the beats on which a note starts. We then build a DataFrame of all the pieces encoded in this list form, which we split into a training set and a test set.

In [24]:
# Imports
import pandas as pd
import music21
import os
import textract
import pathlib

In [26]:
# Function that, from an abc file, gets the beat distribution of every bar of the piece
# Returns a list (piece) of lists (bars)
def extractBeatArray(path):

    # getting the parsable score
    sample = music21.converter.parse(path)
    part = sample.parts[0]
    length_in_bars = int(sample.duration.quarterLength/sample.getTimeSignatures()[0].numerator)
    # in general a bar lasts numerator*(4/denominator) quarter notes; dividing by the numerator alone
    # works here because all pieces are in 3/4

    bars = []

    # adding the beats of every bar. Each bar is a list and the piece is a list of bars, i.e. a list of lists
    for i in range(0, length_in_bars):
        measure = part.measure(i)
        if measure is not None:
            # measure numbers that do not exist in the part are skipped
            bars.append([n.beat for n in measure.notes])

    return bars
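
The bar count above divides the piece's length in quarter notes by the numerator, which is only valid when the denominator is 4. A minimal sketch of the general arithmetic follows; `bar_length_in_quarters` and `count_bars` are hypothetical helper names used for illustration, not part of music21.

```python
def bar_length_in_quarters(numerator, denominator):
    """Length of one bar in quarter notes for a numerator/denominator time signature."""
    return numerator * 4 / denominator


def count_bars(total_quarter_length, numerator, denominator):
    """Number of complete bars that fit in a piece of the given length."""
    return int(total_quarter_length / bar_length_in_quarters(numerator, denominator))


print(bar_length_in_quarters(3, 4))  # 3/4: three quarter notes per bar
print(bar_length_in_quarters(6, 8))  # 6/8: six eighth notes per bar
print(count_bars(48, 3, 4))          # bars in a 48-quarter-note piece in 3/4
```

For the 3/4 pieces in this corpus both formulas agree, so the shortcut in `extractBeatArray` is safe as long as no other time signature is added.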

In [3]:
# Parsing the dataset with the above function

data = pd.DataFrame(columns=['piece', 'notes'])

path = "./abc/_1814_698087.abc"

directory = './abc'

pieces = 0 # check every piece in the directory (599) is encoded

for name in os.listdir(directory):
    pieces += 1
    path = directory + '/' + name 
    bars = []
    try:
        bars = extractBeatArray(path)
    except Exception: # parsing failed: treat the piece as unparsed
        bars = []

    if bars: # not adding unparsed pieces to dataframe
        data.loc[len(data)] = [path, bars]


print(f"{pieces} pieces were analysed.")    

print(f"{len(data)} pieces were correctly parsed.")

599 pieces were analysed.
585 pieces were correctly parsed.


In [4]:
# List of pieces that throw an exception (9)
exceptions = ['./abc/Näckapolska_efter_Anders_Bredal_c55b3d.abc', './abc/Pollonesse_ur_Andreas_Dahlgrens_notbok_no_84_f2a90d.abc',
'./abc/Polonäs_a39d56.abc', './abc/Polonäs_efter_Pehr_Andersson_Bild_30_nr_90_d06dcf.abc', './abc/Slängpolska_efter_Olof_Larsson_2f914f.abc',
'./abc/_Polonesse_in_G_Moll_no_1_ur_Anders_Larssons_notbok_61aa5e.abc', './abc/_Polonesse_in_G_Måll_no_2_769438.abc', 
'./abc/_Polonäs_sexdregasamlingen_del_2_nr_70_1b9662.abc', './abc/_Polska_Kringellåt_efter_Snickar_Erik_985b63.abc']

# Testing: 14 of the 599 pieces (599 - 585) are left out of the dataframe. The 9 listed above threw an exception;
# 5 others, not yet identified, came back with an empty list of bars. We choose to omit them from the dataset.

In [5]:
# Function that maps the beat distribution of a bar to a series of numbers between 1 and 12 (position of each sixteenth-note in the bar)
def map_beats(notes):
    notes_mapped = []
    for bar in notes:
        bar_mapped = []
        for x in bar:
            bar_mapped.append(int(4*(x-1)+1))
        notes_mapped.append(bar_mapped)
    return notes_mapped

In [6]:
# Function that maps a series of numbers between 1 and 12 (position of each sixteenth-note in the bar) to a music21 beat distribution
def map_beats_reverse(notes):
    notes_mapped = []
    for bar in notes:
        bar_mapped = []
        for y in bar:
            bar_mapped.append((y+3)/4)
        notes_mapped.append(bar_mapped)
    return notes_mapped
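
The two mappings are inverses on the sixteenth-note grid: a beat b maps to position 4(b - 1) + 1, and a position p maps back to (p + 3)/4. The sketch below checks this for all 12 positions and shows what `int()` does to an onset that falls off the grid. The triplet beat 4/3 is an assumed illustration, not a value taken from the dataset, and `beat_to_position` and `position_to_beat` are hypothetical single-value versions of the functions above.

```python
from fractions import Fraction


def beat_to_position(beat):
    """Map a beat (1.0 <= beat < 4.0) to a sixteenth-note position (1..12)."""
    return int(4 * (beat - 1) + 1)


def position_to_beat(position):
    """Map a sixteenth-note position (1..12) back to a beat."""
    return (position + 3) / 4


# Every grid position survives the round trip.
for position in range(1, 13):
    assert beat_to_position(position_to_beat(position)) == position

# An off-grid onset (second note of an eighth-note triplet, assumed example)
# is truncated down to the previous sixteenth position.
triplet_beat = Fraction(4, 3)
print(beat_to_position(triplet_beat))
```

If the corpus contains triplets, several distinct onsets can therefore collapse onto the same position, which is one possible source of repeated values inside a bar.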

In [7]:
# Testing the function
example_notes = data[data['piece'] == './abc/_1814_698087.abc']['notes'].tolist()[0]
# filtering returns a Series with one entry per matching row, so tolist() gives a list of pieces; [0] selects the single match
# (iterrows yields each row's value directly, so no indexing is needed there)
print(example_notes)
map_12 = map_beats(example_notes)
print(map_12)
print(map_beats_reverse(map_12))

[[1.0, 1.5, 2.0, 3.0], [1.0, 1.5, 2.0, 3.0], [1.0, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 2.0, 3.0], [1.0, 1.5, 2.0, 3.0], [1.0, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 2.0], [1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [1.0, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5], [1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [1.0, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 2.0]]
[[1, 3, 5, 9], [1, 3, 5, 9], [1, 4, 5, 7, 9, 11], [1, 3, 4, 5, 7, 9, 11], [1, 3, 5, 9], [1, 3, 5, 9], [1, 4, 5, 7, 9, 11], [1, 3, 5], [1, 3, 5, 7, 9, 11], [1, 3, 5, 7, 9, 11], [1, 4, 5, 7, 9, 11], [1, 5, 6, 7, 8, 9, 11], [1, 3, 5, 7, 9, 11], [1, 3, 5, 7, 9, 11], [1, 4, 5, 7, 9, 11], [1, 5]]
[[1.0, 1.5, 2.0, 3.0], [1.0, 1.5, 2.0, 3.0], [1.0, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 2.0, 3.0], [1.0, 1.5, 2.0, 3.0], [1.0, 1.75, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 2.0], [1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [1.0, 1.5, 

In [8]:
# Using the mapping function on every piece
data_mapped = data.copy()

# iterrows yields copies of the rows, so assigning to them is not guaranteed to update the dataframe
data_mapped['notes'] = data_mapped['notes'].apply(map_beats)

print("Done")

data_mapped.head() # our dataframe for the model!

Done


| | piece | notes |
|---|---|---|
| 0 | ./abc/1b651a94a8f8680da3cef1e60705194d.abc | [[1, 3, 5, 6, 7, 8, 9, 11], [1, 3, 5, 6, 7, 8,... |
| 1 | ./abc/3679c385a618c89f00d159f418022b39.abc | [[1, 3, 4, 5, 7, 8, 9, 11], [1, 4, 5, 6, 7, 8,... |
| 2 | ./abc/60ae81952628613e03d563b0dae8bbc2.abc | [[1, 4, 5, 7, 9, 10, 11, 12], [1, 3, 5, 9, 11]] |
| 3 | ./abc/6a7d307d1f0712090e4043a6b4637efa.abc | [[1, 5, 9], [1, 1, 3, 4, 5, 7, 9, 11], [1, 4, ... |
| 4 | ./abc/8c6965e3c831e8cb769e578423c12d1e.abc | [[1, 5, 7, 8, 9], [1, 3, 4, 5, 6, 7, 8, 9, 10,... |


In [9]:
# Splitting the data into train and test
from sklearn.model_selection import train_test_split

x_train, x_test = train_test_split(data_mapped) # split 75% train 25% test

print(len(x_train))
print(len(x_test))

438
147
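
With no size arguments, `train_test_split` holds out 25% of the rows for testing and rounds the test count up, which is why 585 pieces split into 438 and 147. A seeded, standard-library sketch of the same arithmetic is shown here; `split_indices` is a hypothetical helper and the seed value is an arbitrary assumption. It does not reproduce scikit-learn's exact shuffle.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle row indices reproducibly and split them into train and test."""
    n_test = math.ceil(test_fraction * n_samples)  # test size is rounded up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local generator, global state untouched
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(585)
print(len(train_idx), len(test_idx))
```

In the notebook itself, passing a fixed `random_state` to `train_test_split` would make the split repeatable across runs.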


In [10]:
# Writing train and test to csv
x_train.to_csv("train.csv", index=False)
x_test.to_csv("test.csv", index=False)
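
Writing nested lists to CSV stores each `notes` cell as its string representation, so the cells come back as text when the file is read. Below is a minimal sketch of restoring them with `ast.literal_eval`, using the standard `csv` module and an in-memory file in place of `train.csv`; the sample row is made up for illustration.

```python
import ast
import csv
import io

# Stand-in for train.csv: one header row and one piece with two bars.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["piece", "notes"])
writer.writerow(["./abc/example.abc", str([[1, 3, 5, 9], [1, 5]])])

buffer.seek(0)
rows = list(csv.DictReader(buffer))

raw = rows[0]["notes"]
print(type(raw).__name__)      # the cell is read back as a string

notes = ast.literal_eval(raw)  # safely parse the literal back into nested lists
print(notes[0])
```

With pandas, the same conversion can be applied while loading, for example `pd.read_csv("train.csv", converters={"notes": ast.literal_eval})`.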

**Part 2: The Random Model**

**Goal:** generate random 8-bar rhythms. The only constraint is that the minimal note duration is a sixteenth note.

In [12]:
import random

random.seed(1113)

In [15]:
# function to fill a bar with random notes
def random_bar():
    # set first = True to force a note on beat 1 of the bar
    first = False
    # the time signature is always 3/4
    bar = []
    # i indexes sixteenth-note slots; range(11) covers beats 1.0 to 3.5,
    # so the last slot of the bar (beat 3.75) is never filled
    for i in range(11):
        if first:
            bar.append(1.0)
            first = False
        # each slot gets a note with probability 0.5
        elif random.random() < 0.5:
            bar.append((i/4)+1)
    # redraw if the bar came out empty
    if len(bar) == 0:
        bar = random_bar()
    return bar

# function to create a random song
def create_random_song():
    # make a list with 8 lists (bars) inside
    n_bars = 8
    random_song = [[] for j in range(n_bars)]
    # for each list in random_song fill it with random notes
    for i in range(n_bars):
        random_song[i] = random_bar()
    return random_song
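
Because each of the 11 candidate positions is kept independently with probability 0.5, a bar holds about 5.5 onsets on average (very slightly more once empty bars are redrawn). The sketch below checks this empirically with a standalone copy of the sampling rule; `sample_bar` is a hypothetical name and the seed is arbitrary.

```python
import random


def sample_bar(rng, n_positions=11, p=0.5):
    """Keep each sixteenth-note position independently with probability p."""
    while True:
        bar = [i / 4 + 1 for i in range(n_positions) if rng.random() < p]
        if bar:  # redraw empty bars
            return bar


rng = random.Random(0)  # local generator so the global seed is left alone
bars = [sample_bar(rng) for _ in range(10_000)]
mean_onsets = sum(len(bar) for bar in bars) / len(bars)
latest_onset = max(max(bar) for bar in bars)
print(f"mean onsets per bar: {mean_onsets:.2f}")
print(f"latest onset seen: {latest_onset}")
```

Comparing these figures with the same statistics on the training set gives a quick first impression of how far the random baseline is from the corpus.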

In [18]:
# example
random_song = create_random_song()
print('Random song example: ')
print(random_song)

Random song example: 
[[1.0, 1.75, 2.0, 3.0, 3.25, 3.5], [1.25, 2.5, 3.25, 3.5], [1.25, 1.5, 2.25, 2.75, 3.0, 3.25], [1.0, 1.75, 2.0, 2.5, 2.75, 3.25, 3.5], [1.0, 1.25, 2.0, 2.75, 3.0, 3.25, 3.5], [1.25, 2.0, 2.75, 3.0, 3.25, 3.5], [1.0, 1.25, 2.25, 2.5, 2.75, 3.0, 3.25], [1.25, 1.5, 1.75, 2.0, 2.25, 2.75, 3.0, 3.25]]


In [19]:
# function to convert from beats into durations
def convert_to_duration(random_song_list):
    # beat position at which a 3/4 bar ends
    last = 4.0
    # create a list with the duration of each note, reading the song backwards
    # so that each note lasts until the onset of the note that follows it
    duration_list = []
    for bar in reversed(random_song_list):
        for note in reversed(bar):
            duration_list.append(last - note)
            last = note
        # seen from the previous bar, the next onset lies one bar (3 quarter notes) later
        last += 3.0
    # put the durations back in playing order
    duration_list.reverse()
    return duration_list

In [21]:
# example
duration_list = convert_to_duration(random_song)
print('Duration list example: ')
print(duration_list)

Duration list example: 
[0.75, 0.25, 1.0, 0.25, 0.25, 0.75, 1.25, 0.75, 0.25, 0.75, 0.25, 0.75, 0.5, 0.25, 0.25, 0.75, 0.75, 0.25, 0.5, 0.25, 0.5, 0.25, 0.5, 0.25, 0.75, 0.75, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.5, 0.25, 1.0, 0.25, 0.25, 0.25, 0.25, 1.0, 0.25, 0.25, 0.25, 0.25, 0.5, 0.25, 0.25, 0.75]
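
One way to sanity-check a duration list is to compute it forwards from absolute onset times: in 3/4 a bar spans 3 quarter notes, so a note at beat b of bar k (counting from 0) starts 3k + (b - 1) quarter notes into the piece. This is a sketch under that assumption; `onsets_to_durations` is a hypothetical helper and the two-bar input is made up.

```python
def onsets_to_durations(song, bar_length=3.0):
    """Durations in quarter notes: each note lasts until the next onset,
    and the final note lasts until the end of the last bar."""
    offsets = [
        bar_index * bar_length + (beat - 1.0)
        for bar_index, bar in enumerate(song)
        for beat in bar
    ]
    end_of_piece = len(song) * bar_length
    return [
        later - earlier
        for earlier, later in zip(offsets, offsets[1:] + [end_of_piece])
    ]


song = [[1.0, 2.0, 3.5], [1.25, 3.0]]
durations = onsets_to_durations(song)
print(durations)
# When the song starts on beat 1, the durations add up to bars * bar_length.
print(sum(durations))
```

For an 8-bar song that starts on beat 1, the durations should therefore sum to 24 quarter notes; any other total means notes will drift across the barlines.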


In [27]:
# function to convert the duration list into a music21 stream
def convert_to_stream(duration_list):
    # create an empty stream
    stream = music21.stream.Stream()
    # set the time signature to 3/4
    stream.append(music21.meter.TimeSignature('3/4'))
    # for each duration, create a note with a fixed pitch (C4) and that duration
    # (in quarter lengths), then append it to the stream
    for duration in duration_list:
        note = music21.note.Note(pitch='C4', quarterLength=duration)
        stream.append(note)
    return stream

In [30]:
# example
stream = convert_to_stream(duration_list)

# print with map_beats conversion (map_beats function from Part 1)
print('Random song example with map_beats: ')
print(map_beats(random_song))

# play the stream
stream.show('midi')

Random song example with map_beats: 
[[1, 4, 5, 9, 10, 11], [2, 7, 10, 11], [2, 3, 6, 8, 9, 10], [1, 4, 5, 7, 8, 10, 11], [1, 2, 5, 8, 9, 10, 11], [2, 5, 8, 9, 10, 11], [1, 2, 6, 7, 8, 9, 10], [2, 3, 4, 5, 6, 8, 9, 10]]


In [33]:
# produce a dataframe of 500 random songs and save it as csv
def produce_dataframe():
    df = pd.DataFrame(columns=['id', 'notes'])
    for i in range(500):
        random_song = create_random_song()
        df.loc[len(df)] = [i, map_beats(random_song)]
    df.to_csv('random_songs.csv', index=False)
    return df

df = produce_dataframe()
print(f"{len(df)} random songs generated.")
df.head()

500 random songs generated.


| | id | notes |
|---|---|---|
| 0 | 0 | [[2, 3, 6, 7, 9, 11], [1, 2, 3, 5, 6, 7, 8, 9,... |
| 1 | 1 | [[1, 2, 3, 5, 7, 8, 10, 11], [5, 7, 8, 11], [1... |
| 2 | 2 | [[1, 3, 5, 8, 9], [2, 3, 4, 7, 8, 11], [2, 5, ... |
| 3 | 3 | [[1, 6, 7, 8, 11], [1, 3, 4, 5, 8, 10, 11], [1... |
| 4 | 4 | [[3, 4, 5, 6, 7, 9, 10], [1, 2, 5, 6, 8, 9, 11... |


**Part 3: The Generation Model**

**Goal:** generate "human-like" 8-bar rhythms in the style of Slängpolska. We will use a bigram-based Markov chain model, trained on 75% of the dataset as per the train-test split done in part 1.
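
As a rough illustration of the idea, the sketch below trains a bigram model in which each state is a whole bar (a tuple of sixteenth-note positions) and the next bar is sampled from the bars that followed the current one in training. Treating bars as states is an assumption made for this example, as are the names `train_bigrams` and `generate`, the toy corpus, and the seed; the notebook's actual model may be defined differently.

```python
import random
from collections import defaultdict


def train_bigrams(pieces):
    """Collect, for every bar, the list of bars that followed it in the corpus."""
    transitions = defaultdict(list)
    starts = []
    for piece in pieces:
        bars = [tuple(bar) for bar in piece]
        if not bars:
            continue
        starts.append(bars[0])
        for current, following in zip(bars, bars[1:]):
            transitions[current].append(following)
    return starts, transitions


def generate(starts, transitions, n_bars=8, rng=random):
    """Sample n_bars bars; restart from a random opening bar at dead ends."""
    bar = rng.choice(starts)
    song = [list(bar)]
    while len(song) < n_bars:
        followers = transitions.get(bar)
        bar = rng.choice(followers) if followers else rng.choice(starts)
        song.append(list(bar))
    return song


# Toy corpus in the 1..12 encoding (made up, not taken from the dataset).
corpus = [
    [[1, 3, 5, 9], [1, 3, 5, 9], [1, 4, 5, 7, 9, 11], [1, 5]],
    [[1, 5, 9], [1, 3, 5, 9], [1, 5]],
]
starts, transitions = train_bigrams(corpus)
print(generate(starts, transitions, rng=random.Random(0)))
```

Keeping repeated followers in the list means that `choice` samples each transition in proportion to how often it occurred, so no explicit probability table is needed.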

**Part 4: Results, Metric, Evaluation**

**Goal:** evaluate the results of the random model, the original dataset, and our generation model, and compare the three. We will also highlight one good sample and one bad sample from our generation model.