# Musical Feature Mining for Codetta using Music21

The aim of this notebook is to demonstrate how to extract multiple features from a corpus of MIDI files into CSV format, so that standard data science techniques can be applied using libraries such as pandas or scikit-learn.

#### Created by Corey Ford (c.j.ford@qmul.ac.uk) on the 20th of April 2021

---

## List of Features
---
Below is a list of the jSymbolic features that have been implemented in the music21 library and that I think could be useful for detecting patterns in Codetta:
* Timbre: 
    * ViolinFractionFeature
    * BrassFractionFeature
    * **Note: fractions for piano, glockenspiel, cello, viola, and pad are not available.**

* Melody:
    * MelodicIntervalHistogramFeature
    * AverageMelodicIntervalFeature
    * MostCommonMelodicIntervalFeature
    * DistanceBetweenMostCommonMelodicIntervalsFeature
    * MostCommonMelodicIntervalPrevalenceFeature
    * RelativeStrengthOfMostCommonIntervalsFeature
    * NumberOfCommonMelodicIntervalsFeature
    * AmountOfArpeggiationFeature
    * RepeatedNotesFeature
    * ChromaticMotionFeature
    * StepwiseMotionFeature
    * MelodicThirdsFeature
    * MelodicFifthsFeature
    * MelodicTritonesFeature
    * MelodicOctavesFeature
    * DirectionOfMotionFeature
    * DurationOfMelodicArcsFeature
    * SizeOfMelodicArcsFeature

* Pitch:
    * MostCommonPitchPrevalenceFeature
    * MostCommonPitchClassPrevalenceFeature
    * NumberOfCommonPitchesFeature
    * PitchVarietyFeature
    * PitchClassVarietyFeature
    * RangeFeature
    * MostCommonPitchFeature
    * MostCommonPitchClassFeature

* Rhythm:
    * NoteDensityFeature
    * AverageNoteDurationFeature
    * VariabilityOfNoteDurationFeature
    * MaximumNoteDurationFeature
    * MinimumNoteDurationFeature

    * AverageTimeBetweenAttacksFeature
    * VariabilityOfTimeBetweenAttacksFeature

    * DurationFeature
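
As a rough sketch of how several of the features listed above could be collected in one pass, the helper below looks up each extractor class by name. The function `extract_named_features` and its `feature_module` parameter are hypothetical names introduced here for illustration and are not part of music21. The sketch assumes every extractor follows the `Feature(score).extract().vector` pattern.

```python
def extract_named_features(score, feature_names, feature_module=None):
    """Build one row of features for a single parsed score.

    feature_module defaults to music21.features.jSymbolic. It is a parameter
    only so that the helper can be exercised without music21 installed.
    """
    if feature_module is None:
        # Imported lazily so that defining the helper does not require music21.
        from music21.features import jSymbolic as feature_module

    row = {}
    for name in feature_names:
        extractor_class = getattr(feature_module, name)
        vector = extractor_class(score).extract().vector
        # Single-valued features become scalars; histograms stay as lists.
        row[name] = vector[0] if len(vector) == 1 else list(vector)
    return row
```

With music21 installed, calling `extract_named_features(score, ["RangeFeature", "NoteDensityFeature"])` on a stream returned by `m21.converter.parse` should give a dictionary with one entry per feature name.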
    
A feature of Codetta is that it has a floating timeline. Unfortunately, this means that data on tempo and time signature changes is best collected from the Codetta XML files rather than from the MIDI files.

In [6]:
import music21 as m21
import os 
import pandas as pd

In [16]:

# Dictionary mapping each feature name to a list with one entry per file
features = {
    "usefulFeature1" : [],
    "usefulFeature2" : []
}


# Folder containing the Codetta MIDI files
directory = '/Users/coreyford/Documents/UWE/Codetta Examples/Jingles/midi-files/'

for filename in os.listdir(directory):
    if filename.endswith(".midi"):  # only process the MIDI files
        
        # Load the MIDI file
        score = m21.converter.parse(os.path.join(directory, filename), format="midi")
        
        # Store the features
        # .extract().vector is a list; the [0] takes the single value of a scalar feature.
        # For histogram features, remove the [0] to keep the whole vector.
        features["usefulFeature1"].append(m21.features.jSymbolic.AcousticGuitarFractionFeature(score).extract().vector[0])
        features["usefulFeature2"].append(m21.features.jSymbolic.DirectionOfMotionFeature(score).extract().vector[0])
        # More features are listed here: https://web.mit.edu/music21/doc/moduleReference/moduleFeaturesJSymbolic.html
        

In [17]:
# Create a dataframe of the saved features 
df = pd.DataFrame.from_dict(features)
df

|   | usefulFeature1 | usefulFeature2 |
|---|----------------|----------------|
| 0 | [0.0] | [0.75] |
| 1 | [0.0] | [0.65625] |
| 2 | [0.0] | [0.5289256198347108] |
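
The output above shows each cell holding a one-element list. Assuming the values were stored that way, the sketch below unwraps them into plain numbers and serialises the result as CSV, which is the format this notebook is aiming for. The helper `unwrap` is a hypothetical name introduced here, and the rows are copied from the example output rather than computed.

```python
import io

import pandas as pd

# Example rows copied from the output above; each cell is a one-element list.
features = {
    "usefulFeature1": [[0.0], [0.0], [0.0]],
    "usefulFeature2": [[0.75], [0.65625], [0.5289256198347108]],
}
df = pd.DataFrame.from_dict(features)


def unwrap(cell):
    """Return the sole element of a one-element list, otherwise the cell unchanged."""
    if isinstance(cell, (list, tuple)) and len(cell) == 1:
        return cell[0]
    return cell


# Apply the unwrapping column by column.
flat_df = df.apply(lambda column: column.map(unwrap))

# Write the CSV to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
flat_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

To write a file instead, pass a path to `to_csv`, for example `flat_df.to_csv("features.csv", index=False)`, where `features.csv` is only a placeholder filename.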


In [None]:
#Do some analysis on it 
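
One possible first step for that analysis, offered only as a sketch, is to summarise each feature column and flag columns that never vary. The values below are the scalars from the example output above. The names `df_example` and `constant_columns` are introduced here for illustration.

```python
import pandas as pd

# Scalar values taken from the example output above (unwrapped from their lists).
df_example = pd.DataFrame({
    "usefulFeature1": [0.0, 0.0, 0.0],
    "usefulFeature2": [0.75, 0.65625, 0.5289256198347108],
})

# Count, mean, standard deviation, min, quartiles and max for each feature column.
summary = df_example.describe()

# Columns with a single unique value carry no information for later modelling.
constant_columns = [
    name for name in df_example.columns if df_example[name].nunique() == 1
]
```

In this small example `usefulFeature1` is constant across all three files, so it would probably be dropped before applying any scikit-learn model.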