# Bachelor project notebook

### Musical patterns for prediction
Before continuing, make sure to download and install <a href="https://abjad.github.io/">abjad</a>, <a href="http://lilypond.org">LilyPond</a> (needed by abjad) and <a href="https://github.com/craffel/pretty-midi">pretty_midi</a>, as well as common Python libraries such as numpy.
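
A quick way to confirm the setup before running the cells is to check that each package can be found. The snippet below is only a sketch: the helper name `missing_modules` is hypothetical, the import names (`abjad`, `pretty_midi`, `numpy`) are assumed to match the packages listed above, and LilyPond is assumed to be reachable as the `lilypond` executable on the PATH.

```python
import shutil
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed for the packages mentioned in this notebook.
REQUIRED_MODULES = ["abjad", "pretty_midi", "numpy"]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))

# abjad calls the LilyPond executable, so it should be on the PATH.
if shutil.which("lilypond") is None:
    print("LilyPond executable not found on PATH")
```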

We first need to import the functions used for the generation process. All three methods take a path to a MIDI (.mid) file and produce a continuation in the same folder.

In [1]:
from simple_first_order_mm import generate_prediction_with_simple_markov
from string_based import generate_prediction_with_string_based
from translation_based import generate_prediction_with_translation_based

Then we'll set two global variables that are used throughout this notebook. The first is the path to a MIDI (.mid) file for which a continuation has to be produced. The second is the path to a dataset from <a href="https://www.music-ir.org/mirex/wiki/2019:Patterns_for_Prediction#Data">MIREX 2019: Patterns for Prediction</a>. Both paths can be relative or absolute.

In [2]:
midi_filename = "midi_sample_c_major.mid"

dataset_filepath = "../Datasets/PPDD-Sep2018_sym_mono_small/"

Let's generate a continuation with the simple Markov model algorithm. Warnings from pretty_midi or abjad can be ignored.

In [3]:
generate_prediction_with_simple_markov(midi_filename)
# Optionally, we can tweak some parameters for the generation, default is:
# generate_prediction_with_simple_markov(midi_filename, notes_to_generate = 4, with_smoothing=False, probability_known_states=0.9)



NOTE: The Pärt demo requires abjad-ext-tonality
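
To give an idea of what a first-order Markov model does, here is a minimal sketch working on a plain list of MIDI pitch numbers. It is not the code in `simple_first_order_mm`: the names `build_transitions` and `continue_sequence` are hypothetical, durations are ignored, and the `with_smoothing` and `probability_known_states` options are not modelled. The only idea it illustrates is that the next note is sampled from the notes that followed the current note in the input.

```python
import random
from collections import Counter, defaultdict


def build_transitions(pitches):
    """Count how often each pitch is directly followed by each other pitch."""
    transitions = defaultdict(Counter)
    for current, following in zip(pitches, pitches[1:]):
        transitions[current][following] += 1
    return transitions


def continue_sequence(pitches, notes_to_generate=4, rng=None):
    """Sample a continuation note by note from the transition counts."""
    rng = rng or random.Random()
    transitions = build_transitions(pitches)
    current = pitches[-1]
    generated = []
    for _ in range(notes_to_generate):
        followers = transitions.get(current)
        if not followers:
            # Unseen state: fall back to a uniform draw over observed pitches.
            current = rng.choice(sorted(set(pitches)))
        else:
            choices, weights = zip(*followers.items())
            current = rng.choices(choices, weights=weights, k=1)[0]
        generated.append(current)
    return generated


# In this input every 60 is followed by 62 and every 62 by 60,
# so the continuation is fully determined.
print(continue_sequence([60, 62, 60, 62, 60], notes_to_generate=4))
```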


Now, try to generate with the string-based approach.

In [4]:
generate_prediction_with_string_based(midi_filename)
# Same options can be applied, default is:
# generate_prediction_with_string_based(midi_filename, patterns_to_generate = 4, with_smoothing=False, probability_known_patterns=0.9)



Finally, we can generate with the translation-based algorithm.

In [5]:
generate_prediction_with_translation_based(midi_filename)
# Default is:
# generate_prediction_with_translation_based(midi_filename, patterns_to_generate = 4, with_smoothing=False, probability_known_patterns=0.9)

### Generation for datasets
All of this was for individual files. However, for the evaluation, we need to do this procedure for a whole dataset. First import the needed functions.

In [6]:
from simple_first_order_mm import generate_prediction_with_simple_markov_for_dataset
from string_based import generate_prediction_with_string_based_for_dataset
from translation_based import generate_prediction_with_translation_based_for_dataset

Before going any further, note that I assume the dataset is from <a href="https://www.music-ir.org/mirex/wiki/2019:Patterns_for_Prediction#Data">MIREX 2019: Patterns for Prediction</a> (it must have a prime_csv subfolder). The functions read CSV files and produce both CSV and MIDI files, so that the results can be listened to with <a href="https://musescore.org/en">MuseScore</a>, for example. For all three methods to work, you first need to create the following folders at the same location as the "prime_csv" folder:
- For the simple first order Markov model: "markov_without_prediction_midi" and "markov_without_prediction_csv".
- For the string-based approach: "markov_with_prediction_midi" and "markov_with_prediction_csv".
- For the translation-based approach: "markov_with_non_exact_prediction_midi" and "markov_with_non_exact_prediction_csv".
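
The folders listed above can also be created from Python instead of by hand. This is a small sketch under the assumption that the dataset root contains "prime_csv"; the helper name `create_output_folders` is hypothetical and not part of the project code.

```python
from pathlib import Path

# Folder names taken from the list above.
OUTPUT_FOLDERS = [
    "markov_without_prediction_midi",
    "markov_without_prediction_csv",
    "markov_with_prediction_midi",
    "markov_with_prediction_csv",
    "markov_with_non_exact_prediction_midi",
    "markov_with_non_exact_prediction_csv",
]


def create_output_folders(dataset_path):
    """Create the six output folders next to "prime_csv" and return their paths."""
    root = Path(dataset_path)
    if not (root / "prime_csv").is_dir():
        raise FileNotFoundError(f'No "prime_csv" folder found in {root}')
    created = []
    for name in OUTPUT_FOLDERS:
        folder = root / name
        folder.mkdir(exist_ok=True)  # no error if the folder already exists
        created.append(folder)
    return created


# Example call, using the variable defined earlier in this notebook:
# create_output_folders(dataset_filepath)
```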

Once done, we can safely run the three functions below.

In [7]:
generate_prediction_with_simple_markov_for_dataset(dataset_filepath)
# Default is generate_prediction_with_simple_markov_for_dataset(dataset_filepath, notes_to_generate = 30,with_smoothing=True,probability_known_states=0.9)


Progress: 1.0%
Progress: 2.0%
Progress: 3.0%
...
Progress: 43.0%
Progress: 44

In [8]:
generate_prediction_with_string_based_for_dataset(dataset_filepath)
# Default is generate_prediction_with_string_based_for_dataset(dataset_filepath, patterns_to_generate = 20,with_smoothing=True,probability_known_states=0.9)


Progress: 1.0%
Progress: 2.0%
Progress: 3.0%
...
Progress: 43.0%
Progress: 44

In [9]:
generate_prediction_with_translation_based_for_dataset(dataset_filepath)
# Default is generate_prediction_with_translation_based_for_dataset(dataset_filepath, patterns_to_generate = 20,with_smoothing=True,probability_known_states=0.9)


Progress: 1.0%
Progress: 2.0%
Progress: 3.0%
...
Progress: 43.0%
Progress: 44

Note that each method is much slower than the previous one.
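
To put numbers on this for your own machine and dataset, the calls can be wrapped in a small timing helper. The helper name `timed` is hypothetical; it only relies on `time.perf_counter` from the standard library.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Example with the functions imported above:
# _, seconds = timed(generate_prediction_with_simple_markov_for_dataset, dataset_filepath)
# print(f"Simple Markov model: {seconds:.1f} s")
```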

### Evaluation
With the three outputs generated, we can now run the evaluation code. Please follow the instructions <a href="https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019">here</a>.

Since my project focuses on the first MIREX subtask, there is no need to change anything in "DISCRIM_MONO_FILES" and "DISCRIM_POLY_FILES" in config.py. An example config.py would look as follows (replace the paths marked "CHANGE THIS"):

In [10]:
# Location to write output to
OUTPUT_FOLDER = 'PATH/TO/OUTPUT/FOLDER' # no "/" at the end. ### CHANGE THIS ###
# point the dataset path to the appropriate path on your file system
DATASET_PATH = "/PATH/TO/PPDD-Sep2018_sym_mono_small" ### CHANGE THIS ###

MODEL_DIRS = { ### CHANGE THIS ###
    'Translation-based': '/PATH/TO/PPDD-Sep2018_sym_mono_small/markov_with_non_exact_prediction_csv',
    'Baseline': '/PATH/TO/PPDD-Sep2018_sym_mono_small/cont_foil_csv',
    'String-based': '/PATH/TO/PPDD-Sep2018_sym_mono_small/markov_with_prediction_csv',
    'Simple': '/PATH/TO/PPDD-Sep2018_sym_mono_small/markov_without_prediction_csv'
}
MODEL_KEYS = { ### NO NEED TO CHANGE THIS ###
    'Translation-based': ['onset', 'pitch', 'morph', 'dur', 'ch'],
    'Baseline': ['onset', 'pitch', 'morph', 'dur', 'ch'],
    'String-based': ['onset', 'pitch', 'morph', 'dur', 'ch'],
    'Simple': ['onset', 'pitch', 'morph', 'dur', 'ch']
}
DISCRIM_MONO_FILES = {
    'mdl1': 'path/to/mono1.csv',
    'mdl2': 'path/to/mono2.csv'
}
DISCRIM_POLY_FILES = {
    'mdl1': 'path/to/poly1.csv',
    'mdl2': 'path/to/poly2.csv'
}
### CHANGE THIS ###
FILENAME_FRAGMENT = "FILENAME"
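
Since a wrong path in MODEL_DIRS is an easy mistake to make, it can help to check the entries before launching the evaluation. The following is only a sketch: `find_missing_dirs` is a hypothetical helper, not part of the evaluation code, and it only tests that each path is an existing directory.

```python
from pathlib import Path


def find_missing_dirs(model_dirs):
    """Return {model name: path} for every entry that is not an existing directory."""
    return {
        name: path
        for name, path in model_dirs.items()
        if not Path(path).is_dir()
    }


# Example call, assuming config.py is importable from the working directory:
# import config
# print(find_missing_dirs(config.MODEL_DIRS))
```

An empty dictionary means that every configured folder was found.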


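You might need to update the function score_cs in cs.py to the version below. The only change is the block between the START MODIFICATION and END MODIFICATION comments, which casts the 'Onset' and 'Score' columns to float before plotting: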

In [11]:
def score_cs(fn_list, alg_names, files_dict, cont_true, prime):
    card_scores = []
    for alg in alg_names:
        print(f'Scoring {alg} with cardinality score')
        for fn in tqdm(fn_list):
            # the generated file name may have additions to original file name
            generated_fn = next(
                (alg_fn for alg_fn in files_dict[alg].keys()
                 if re.search(fn, alg_fn)),
                None
            )
            true_df = cont_true[fn]
            gen_df = files_dict[alg][generated_fn]
            prime_final_onset = prime[fn].iloc[-1]['onset']
            cs_score = evaluate_continuation(
                true_df,
                gen_df,
                prime_final_onset,
                0.5, 2.0, 10.0
            )
            cs_score['fn'] = fn
            cs_score['Model'] = alg
            card_scores.append(cs_score)
    card_df = pd.concat(card_scores, axis=0)
    data = card_df.melt(
        id_vars=['fn', 'Onset', 'Model'], 
        value_vars=['Precision', 'Recall', 'F1'], 
        var_name="measure",
        value_name="Score"
    )
    
    ### START MODIFICATION
    data['Onset'] = data['Onset'].astype(float) 
    data['Score'] = data['Score'].astype(float)
    ### END MODIFICATION
    
    plt.figure()
    sns.set_style("whitegrid")
    g = sns.FacetGrid(
        data,
        col='measure',
        hue='Model',
        hue_order=config.MODEL_DIRS.keys(),
        hue_kws={
            'marker': ['o', 'v', 's', 'D'],
            'linestyle' : [":","--","-", "-."]
        }
        )
    g = g.map(
        sns.lineplot,
        'Onset',
        'Score',
        # style='Model',
        # style_order=config.MODEL_DIRS.keys(),
        # markers=['o', 'v', 's']
    ).add_legend()
    filename = op.join(config.OUTPUT_FOLDER, '{}_cs_scores.png'.format(config.FILENAME_FRAGMENT))
    plt.savefig(filename, dpi=300)


Then execute evaluate_prediction.py to generate the plots.