# Assignment : Feature extraction and classification - MIR 2022/2023

The goal of this assignment is to reproduce the classification experiment presented in a Jupyter Notebook and try to adapt it to another classification task, trying to obtain the best possible accuracy.

For this, I chose the Beatport EDM Key Dataset (https://zenodo.org/record/1101082), loaded through the mirdata library. This dataset is composed of:
* 1486 two-minute audio excerpts from various EDM subgenres.
* The excerpts are taken from Beatport, an online store mainly focussed on electronic music.
* Out of these 1486 excerpts, 785 have metadata in an associated metadata file (the filter in section 1.3 keeps 792 tracks with a metadata file path).

This dataset is mainly intended to assess the performance of computational key estimation algorithms in electronic dance music subgenres.

Here, we instead try to build a classifier for electronic music genres.

We are going to split the work in several subtasks :
1. Setting up ; understanding & balancing the dataset
2. Feature Extraction
3. Genre classification

## 1 - Setting up ; understanding & balancing the dataset

This task is done within this notebook.
The goals are to:
* Import the libraries and download the dataset
* Understand how to use the dataset
* Balance the dataset


### 1.1 - Setup
In this part, we import the relevant libraries and download the dataset.

In [1]:
# If not installed, install Essentia.
# This cell is for running the notebook in Colab.
import importlib.util
if importlib.util.find_spec('essentia') is None:
    !pip install essentia

!pip install mirdata
!pip install pandas
!pip install pydub

#Basic imports
import os
import matplotlib.pyplot as plt
import numpy as np
import IPython.display as ipd
import random
import time
from pathlib import Path
from pydub import AudioSegment

# Imports to support MIR
import mirdata
import essentia.standard as ess
import pandas as pd

# To specify that we want to test things
testing = False

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting essentia
  Downloading essentia-2.1b6.dev858-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.6 MB)
Installing collected packages: essentia
Successfully installed essentia-2.1b6.dev858
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting mirdata
  Downloading mirdata-0.3.7-py3-none-any.whl (14.9 MB)
Collecting jams
  Downloading jams-0.3.4.tar.gz (51 kB)
  Preparing metadata (setup.py) ... done
Collecting Deprecated>=1.2.13
  Downloading

In [2]:
# Download the dataset, then validate the local files against the checksums stored in the dataset index.
beatport_key = mirdata.initialize("beatport_key")
beatport_key.download()
beatport_key.validate()

376kB [00:00, 523kB/s]                             
904kB [00:00, 935kB/s]                             
1.98GB [00:16, 128MB/s]                            
100%|██████████| 1486/1486 [00:06<00:00, 244.02it/s]


({'tracks': {}}, {'tracks': {}})
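
The tuple printed by `validate()` above holds two dictionaries, which mirdata documents as the missing files and the files with invalid checksums. Both being empty means the local copy is complete. Below is a minimal sketch of that check. It uses plain dictionaries as stand-ins for the real return value, and the helper name `is_clean` is hypothetical, not part of mirdata.

```python
def is_clean(validation_result):
    """Return True when neither dictionary reports any problem.

    `validation_result` is assumed to be a (missing_files, invalid_checksums)
    tuple shaped like the output printed above.
    """
    missing_files, invalid_checksums = validation_result
    # A non-empty inner dictionary means at least one file is listed.
    return not any(missing_files.values()) and not any(invalid_checksums.values())


# Stand-in values: the first mirrors the output above, the second is made up.
clean_result = ({'tracks': {}}, {'tracks': {}})
broken_result = ({'tracks': {'1': ['some/missing/file.mp3']}}, {'tracks': {}})

print(is_clean(clean_result))
print(is_clean(broken_result))
```

In the notebook, this would be called as `is_clean(beatport_key.validate())` before going any further.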

In [3]:
from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive


### 1.2 - Understanding the dataset
We explore the dataset to understand how its elements are linked together, using the basic API that mirdata provides for it:

https://mirdata.readthedocs.io/en/stable/source/mirdata.html#module-mirdata.datasets.beatport_key 

In [4]:
# Loading a random track 
track = beatport_key.choice_track()  # load a random track
x, sr = track.audio

print(track.audio_path)
# Play the file directly from its path (the sample rate is read from the file itself)
ipd.display(ipd.Audio(filename=track.audio_path))
print(track)  # see what data a track contains

beatport_ids = beatport_key.track_ids
beatport_data = beatport_key.load_tracks()
print(f'\nWe have {len(beatport_data)} in the dataset\n')

# Inspect the track whose track_id is '488' (a dictionary key, not a position in the dataset)
print("-----\nInfos of the 488'th track in the dataset :")
print(beatport_key.load_genre(beatport_data['488'].metadata_path))
print(beatport_data['488'])
print(beatport_data['488'].key)
print(beatport_data['488'].genres)
print(beatport_data['488'].tempo)



/root/mir_datasets/beatport_key/audio/3607700 Chewie - Survival (Riskotheque Remix).mp3


Track(
  audio_path="/root/mir_datasets/beatport_key/audio/3607700 Chewie - Survival (Riskotheque Remix).mp3",
  keys_path="/root/mir_datasets/beatport_key/keys/3607700 Chewie - Survival (Riskotheque Remix).txt",
  metadata_path="/root/mir_datasets/beatport_key/meta/3607700 Chewie - Survival (Riskotheque Remix).json",
  title="3607700 Chewie - Survival (Riskotheque Remix)",
  track_id="801",
  artists: ,
  audio: The track's audio

        Returns,
  genres: ,
  key: ,
  tempo: ,
)

We have 1486 in the dataset

-----
Infos of the 488'th track in the dataset :
{'genres': ['House'], 'sub_genres': []}
Track(
  audio_path="...eatport_key/audio/298193 Marschmellows, Sugar B. - Swoundosophy feat. Sugar B. (Moodorama Remix).mp3",
  keys_path="...beatport_key/keys/298193 Marschmellows, Sugar B. - Swoundosophy feat. Sugar B. (Moodorama Remix).txt",
  metadata_path="...eatport_key/meta/298193 Marschmellows, Sugar B. - Swoundosophy feat. Sugar B. (Moodorama Remix).json",
  title="298193 Marschmel



### 1.3 - Analyzing the dataset

In this subtask, we build a list of all the tracks that have metadata in the .json files shipped with the dataset. These are the only tracks we work with from here on.

We then gather the distinct genres present in the dataset and balance it so that every genre has the same number of tracks.

In [5]:
# Build the list of tracks to consider: tracks without an associated metadata file are skipped.

list_tracks = []
for track_id in beatport_data:
  if beatport_data[track_id].metadata_path is not None:
    list_tracks.append(beatport_data[track_id])


print(f'We have {len(list_tracks)} tracks that we are going to consider.')
print("----- Information for the 1st track in the list :---")
print(list_tracks[0])
print(list_tracks[0].audio_path)
print(list_tracks[0].genres['genres'][0])


We have 792 tracks that we are going to consider.
----- Information for the 1st track in the list :---
Track(
  audio_path="/root/mir_datasets/beatport_key/audio/100066 Lindstrom - Monsteer (Original Mix).mp3",
  keys_path="/root/mir_datasets/beatport_key/keys/100066 Lindstrom - Monsteer (Original Mix).txt",
  metadata_path="/root/mir_datasets/beatport_key/meta/100066 Lindstrom - Monsteer (Original Mix).json",
  title="100066 Lindstrom - Monsteer (Original Mix)",
  track_id="1",
  artists: ,
  audio: The track's audio

        Returns,
  genres: ,
  key: ,
  tempo: ,
)
/root/mir_datasets/beatport_key/audio/100066 Lindstrom - Monsteer (Original Mix).mp3
Electronica / Downtempo


In [6]:
# We then compute the list of unique genres (each track is expected to carry exactly one genre)

genre_names = []
for i in list_tracks:
  tmp = i.genres['genres']
  if len(tmp) > 1:
    print("--- Error ; shouldn't happen")
    break                        
  genre_names.append(tmp[0])

genre_names_unique = np.unique(genre_names)
print("Here's the unique genres that we have in the dataset :")
print(genre_names_unique)

Here's the unique genres that we have in the dataset :
['Big Room' 'Breaks' 'Deep House' 'Drum & Bass' 'Dubstep' 'Electro House'
 'Electronica / Downtempo' "Funky / Groove / Jackin' House" 'Hard Dance'
 'Hip-Hop / R&B' 'House' 'Minimal / Deep Tech' 'Progressive House'
 'Psy-Trance' 'Tech House' 'Techno' 'Trance']


In [7]:
# Create a dictionary : key = genre name and value = list of tracks of this genre.
genre_dict = {item: [] for item in genre_names_unique}

# Adding every track to the right list
for i in list_tracks:
  genre_dict[i.genres['genres'][0]].append(i)

for i in genre_dict:
  print(f'Number of elements for {i} : {len(genre_dict.get(i))}')

Number of elements for Big Room : 1
Number of elements for Breaks : 49
Number of elements for Deep House : 48
Number of elements for Drum & Bass : 41
Number of elements for Dubstep : 41
Number of elements for Electro House : 54
Number of elements for Electronica / Downtempo : 99
Number of elements for Funky / Groove / Jackin' House : 8
Number of elements for Hard Dance : 54
Number of elements for Hip-Hop / R&B : 51
Number of elements for House : 44
Number of elements for Minimal / Deep Tech : 54
Number of elements for Progressive House : 50
Number of elements for Psy-Trance : 43
Number of elements for Tech House : 49
Number of elements for Techno : 52
Number of elements for Trance : 54
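
The per-genre counts above can also be obtained with `collections.Counter`, which makes it easy to list the genres that are too small to keep. This is a minimal sketch on a toy list of labels. The function name `rare_genres` and the threshold are illustrative assumptions, not values taken from the notebook.

```python
from collections import Counter


def rare_genres(genre_labels, min_count):
    """Return the genres that appear fewer than `min_count` times."""
    counts = Counter(genre_labels)
    return sorted(genre for genre, n in counts.items() if n < min_count)


# Toy labels standing in for [track.genres['genres'][0] for track in list_tracks]
toy_labels = ['House', 'Techno', 'House', 'Big Room', 'Techno', 'House']

counts = Counter(toy_labels)
for genre, n in counts.most_common():
    print(f'Number of elements for {genre} : {n}')

print(rare_genres(toy_labels, min_count=2))
```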


In [8]:
# The classes are not balanced, so we balance them by undersampling.
# We drop the two genres that don't have enough excerpts : 'Big Room' & "Funky / Groove / Jackin' House".
# We then randomly draw 41 tracks (the size of the smallest remaining genre) from each remaining genre.
# Note : no random seed is set, so the selected tracks change from one run to the next.

balanced_genre_dict = {}
files_list = []

for i in genre_dict:
  if i in {'Big Room', "Funky / Groove / Jackin' House"}:
    print(f"This genre : {i} is discarded.")
  else:
    balanced_genre_dict[i] = random.sample(genre_dict[i], 41)

genre_dict = balanced_genre_dict

for i in genre_dict:
  print(f'Number of elements for {i} : {len(genre_dict[i])}')
  files_list.extend(genre_dict[i])

print(f'\nWe have {len(files_list)} files for {len(genre_dict)} different classes')

This genre : Big Room is discarded.
This genre : Funky / Groove / Jackin' House is discarded.
Number of elements for Breaks : 41
Number of elements for Deep House : 41
Number of elements for Drum & Bass : 41
Number of elements for Dubstep : 41
Number of elements for Electro House : 41
Number of elements for Electronica / Downtempo : 41
Number of elements for Hard Dance : 41
Number of elements for Hip-Hop / R&B : 41
Number of elements for House : 41
Number of elements for Minimal / Deep Tech : 41
Number of elements for Progressive House : 41
Number of elements for Psy-Trance : 41
Number of elements for Tech House : 41
Number of elements for Techno : 41
Number of elements for Trance : 41

We have 615 files for 15 different classes
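
The balancing step above draws a fixed number of tracks per genre without a random seed. Here is a sketch of a reproducible variant, assuming the target size should be the size of the smallest genre that is kept. The function name, the `min_size` threshold and the seed are hypothetical choices, and the toy data stands in for `genre_dict`.

```python
import random


def balance_by_undersampling(items_by_class, min_size, seed=0):
    """Drop classes smaller than `min_size`, then sample every remaining
    class down to the size of the smallest one, using a seeded generator."""
    rng = random.Random(seed)  # local generator: the global random state is untouched
    kept = {name: items for name, items in items_by_class.items()
            if len(items) >= min_size}
    if not kept:
        return {}
    target = min(len(items) for items in kept.values())
    return {name: rng.sample(items, target) for name, items in kept.items()}


# Toy data: integers stand in for Track objects
toy = {'House': list(range(44)), 'Big Room': [0], 'Dubstep': list(range(41))}
balanced = balance_by_undersampling(toy, min_size=10)

for name, items in balanced.items():
    print(f'Number of elements for {name} : {len(items)}')
```

With a fixed seed, re-running the notebook selects the same tracks, so later accuracy figures can be compared across runs.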


## 2 - Feature Extraction

This task is done within this notebook. The goals of this task are to:

* Select the set of features to use
* Extract the features for each sound excerpt and store them in a .csv file

For this, we use Essentia.

### 2.1 - Feature choice

With only the low-level features (84 descriptors), we obtained an accuracy between 20% and 30%.
We therefore add tonal and rhythm features, for a total of 124 features (keeping only the descriptors that can be stored as floats).
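
The selection rule described above (keep float-valued descriptors, ignore metadata) can be illustrated without running Essentia. In this sketch a plain dictionary stands in for the pool returned by the extractor. The descriptor names and values in the toy dictionary are illustrative, and `select_scalar_descriptors` is a hypothetical helper name.

```python
def select_scalar_descriptors(pool):
    """Keep the names of descriptors whose value is a single float,
    skipping anything under the 'metadata' namespace."""
    return [name for name, value in pool.items()
            if 'metadata' not in name and isinstance(value, float)]


# Toy stand-in for the extractor output (values are made up)
toy_pool = {
    'lowlevel.average_loudness': 0.91,        # scalar float: kept
    'lowlevel.mfcc.mean': [1.0, 2.0, 3.0],    # vector: dropped
    'rhythm.bpm': 128.0,                      # scalar float: kept
    'tonal.key.key': 'A',                     # string: dropped
    'metadata.audio_properties.length': 20.0  # metadata: dropped
}

selected = select_scalar_descriptors(toy_pool)
print(selected)
print(f'This subset contains {len(selected)} descriptors')
```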

In [9]:
features, features_frames = ess.MusicExtractor(lowlevelSilentFrames='drop',
                                                      lowlevelFrameSize = 2048,
                                                      lowlevelHopSize = 1024,
                                                      lowlevelStats = ['mean', 'stdev'],
                                                      mfccStats = ["mean", "cov"],
                                                      rhythmStats = ["mean", "var", "stdev", "median"],
                                                      tonalFrameSize = 2048,
                                                      tonalHopSize = 1024,
                                                      tonalSilentFrames = "drop",
                                                      tonalStats = ["mean", "var", "stdev", "median", "min", "max"])(beatport_data[beatport_ids[10]].audio_path)
scalar_lowlevel_descriptors = [descriptor for descriptor in features.descriptorNames() if 'metadata' not in descriptor and isinstance(features[descriptor], float)]
print("Subset of features to be considered:\n",scalar_lowlevel_descriptors)
print(f'This subset contains {len(scalar_lowlevel_descriptors)} descriptors')

Subset of features to be considered:
 ['lowlevel.average_loudness', 'lowlevel.barkbands_crest.mean', 'lowlevel.barkbands_crest.stdev', 'lowlevel.barkbands_flatness_db.mean', 'lowlevel.barkbands_flatness_db.stdev', 'lowlevel.barkbands_kurtosis.mean', 'lowlevel.barkbands_kurtosis.stdev', 'lowlevel.barkbands_skewness.mean', 'lowlevel.barkbands_skewness.stdev', 'lowlevel.barkbands_spread.mean', 'lowlevel.barkbands_spread.stdev', 'lowlevel.dissonance.mean', 'lowlevel.dissonance.stdev', 'lowlevel.dynamic_complexity', 'lowlevel.erbbands_crest.mean', 'lowlevel.erbbands_crest.stdev', 'lowlevel.erbbands_flatness_db.mean', 'lowlevel.erbbands_flatness_db.stdev', 'lowlevel.erbbands_kurtosis.mean', 'lowlevel.erbbands_kurtosis.stdev', 'lowlevel.erbbands_skewness.mean', 'lowlevel.erbbands_skewness.stdev', 'lowlevel.erbbands_spread.mean', 'lowlevel.erbbands_spread.stdev', 'lowlevel.hfc.mean', 'lowlevel.hfc.stdev', 'lowlevel.loudness_ebu128.integrated', 'lowlevel.loudness_ebu128.loudness_range', 'lowlev
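
The extractor parameters used above are passed again in the extraction cells of section 2.2, where the main extraction cell omits `lowlevelSilentFrames`. One way to keep the calls consistent is to store the parameters once, as sketched below. The values are copied from the cell above, while the names `EXTRACTOR_PARAMS` and `make_extractor` are hypothetical. `dict` is used as a stand-in factory so the sketch runs without Essentia.

```python
# Parameters copied from the MusicExtractor call above
EXTRACTOR_PARAMS = {
    'lowlevelSilentFrames': 'drop',
    'lowlevelFrameSize': 2048,
    'lowlevelHopSize': 1024,
    'lowlevelStats': ['mean', 'stdev'],
    'mfccStats': ['mean', 'cov'],
    'rhythmStats': ['mean', 'var', 'stdev', 'median'],
    'tonalFrameSize': 2048,
    'tonalHopSize': 1024,
    'tonalSilentFrames': 'drop',
    'tonalStats': ['mean', 'var', 'stdev', 'median', 'min', 'max'],
}


def make_extractor(factory, params=EXTRACTOR_PARAMS):
    """Build an extractor by passing the shared parameters as keyword arguments."""
    return factory(**params)


# Stand-in factory: dict(**params) simply echoes the keyword arguments.
# In the notebook this would be: extractor = make_extractor(ess.MusicExtractor)
echoed = make_extractor(dict)
print(sorted(echoed))
```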

### 2.2 - Actual feature extraction

In this part, we run the extraction process.

The algorithm is the following:
```
For every file in the list that we created:
    We divide the audio (120s long) into 6 smaller excerpts of 20s each (creating mp3 files)
    For every small excerpt:
        We run the Essentia analysis on the excerpt
        We append the results to a .csv file
        We delete the small mp3 file once it has been processed
```

The whole process should take nearly 3 hours.
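
The splitting rule from the algorithm above, including the special case of excerpts shorter than 120 s, can be isolated in a small pure function. This is a sketch only. The function name `segment_bounds` and its parameter names are hypothetical, and the 5 s minimum is the threshold used in the extraction cell below.

```python
def segment_bounds(total_ms, segment_ms=20_000, n_segments=6, min_ms=5_000):
    """Return (start, end) pairs in milliseconds for consecutive segments.

    A segment is skipped when less than `min_ms` of audio remains from
    its start, which happens for excerpts shorter than the nominal 120 s.
    """
    bounds = []
    for i in range(n_segments):
        start = i * segment_ms
        if total_ms - start < min_ms:
            continue  # not enough audio left for a meaningful segment
        bounds.append((start, min(start + segment_ms, total_ms)))
    return bounds


print(segment_bounds(120_000))  # full-length excerpt
print(segment_bounds(103_000))  # short excerpt: the last 3 s are not used
```

Each pair can then be used to slice a pydub `AudioSegment`, for example `sound[start:end]`.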

In [11]:
# Preparation
data_file = 'gdrive/MyDrive/Term2/MIR/analysis_moredescrip_split_final.csv'
start = time.monotonic()
file_count = 0
iter = 0

with open(data_file, 'w') as writer:
    # Preparing first line of the CSV file
    line2write = ','.join(scalar_lowlevel_descriptors + ['tempo'] + ['key'] + ['genre'] + ['name']).replace('lowlevel.','') + '\n'
    writer.write(line2write)

    # For every element in our list containing the music 
    for iter in range(len(files_list)):
        filename = Path(files_list[iter].audio_path)
        file_count +=1
        if file_count % 20 == 0: # print name of a file every 20 files
          print(file_count, "files processed, current file: ", filename)

        # Dividing in smaller mp3's
        sound = AudioSegment.from_mp3(filename)
        files_divided_list = []
        for i in range(1,7):
          segment_start = (i-1)*20000 # 20000 ms = 20 s, because we divide 120 s by 6
          # Particular case : skip the segment if less than 5 s of audio remains from its start
          if len(sound[segment_start:]) < 5000:
            continue
          part_sound = sound[segment_start:i*20000]
          part_sound_path = f'{filename.parents[0]}/{filename.stem}_{i}{filename.suffix}'
          part_sound.export(part_sound_path, format="mp3")
          files_divided_list.append(part_sound_path)

        # For every smaller mp3
        for part_file in files_divided_list:
          # Compute and write features for file
          features, features_frames = ess.MusicExtractor(lowlevelFrameSize = 2048,
                                                      lowlevelHopSize = 1024,
                                                      lowlevelStats = ['mean', 'stdev'],
                                                      mfccStats = ["mean", "cov"],
                                                      rhythmStats = ["mean", "var", "stdev", "median"],
                                                      tonalFrameSize = 2048,
                                                      tonalHopSize = 1024,
                                                      tonalSilentFrames = "drop",
                                                      tonalStats = ["mean", "var", "stdev", "median", "min", "max"])(part_file)
          selected_features = [features[descriptor] for descriptor in scalar_lowlevel_descriptors]
          
          # Adding elements like tempo ; key ; genres ... to the .csv file
          if files_list[iter].tempo is not None:
            bpm_from_json = str(int(files_list[iter].tempo))
          else:
            bpm_from_json = ''
          # Commas are replaced or removed so that these values do not break the CSV columns
          key_from_json = str(files_list[iter].key).replace(',','-')
          genre_from_json = str(files_list[iter].genres['genres'][0])
          name = Path(filename).name.replace(',','')
          line2write = f'{str(selected_features)[1:-1]},{bpm_from_json},{key_from_json},{genre_from_json},{name}\n'
          writer.write(line2write)
          
          # Erasing the smaller file after it being processed
          Path(part_file).unlink()

print("A total of ", file_count, "files processed")
end = time.monotonic()
total = end - start
print('This operation took {:.2f} seconds'.format(total))

20 files processed, current file:  /root/mir_datasets/beatport_key/audio/1398692 Phantom Lord - Stay Puft (XUL Mix).mp3
40 files processed, current file:  /root/mir_datasets/beatport_key/audio/108844 Jason Sparks - Gangsters (Si Begg Remix).mp3
60 files processed, current file:  /root/mir_datasets/beatport_key/audio/6002570 Dalem Osuno - Door's Secret (Original Mix).mp3
80 files processed, current file:  /root/mir_datasets/beatport_key/audio/852917 Anton Lanski - Not Enough (Original Mix).mp3
100 files processed, current file:  /root/mir_datasets/beatport_key/audio/497517 Rufige Kru - Vip Riders Chost (The Origin) (Original Mix).mp3
120 files processed, current file:  /root/mir_datasets/beatport_key/audio/299497 DJ Rilla - Jah Hear Me (Original Mix).mp3
140 files processed, current file:  /root/mir_datasets/beatport_key/audio/2423336 Syndaesia - Bukkake (Hulk Remix).mp3
160 files processed, current file:  /root/mir_datasets/beatport_key/audio/4411864 Zhink - The River (Original Mix).mp
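
The extraction cell above builds each CSV line by hand and strips commas from the key and file name so that they do not shift the columns. As an alternative sketch, Python's `csv` module quotes such values automatically. The helper name `write_feature_rows`, the header and the row below are made-up placeholders, and an in-memory buffer replaces the file on Google Drive.

```python
import csv
import io


def write_feature_rows(handle, header, rows):
    """Write a header and feature rows, letting csv handle quoting."""
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)


# Placeholder data: one feature value plus the four extra columns
header = ['average_loudness', 'tempo', 'key', 'genre', 'name']
rows = [[0.91, 128, 'A minor', 'House', 'Artist, Other - Title (Original Mix).mp3']]

buffer = io.StringIO()
write_feature_rows(buffer, header, rows)

# Reading the data back shows that the comma in the name survived
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(parsed[1][-1])
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=''`.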

### 2.3 - Testing

In [57]:
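# Same extraction as above, run on a short list of tracks (files_test) and written to testing.csv.
#  Each line in the .csv file represents one 20 s segment : its features, then tempo, key, genre and file name.
#  files_test is expected to be defined beforehand ; this cell only runs when testing is True.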

if testing:
  data_file = 'gdrive/MyDrive/Term2/MIR/testing.csv'
  start = time.monotonic()
  file_count = 0
  iter = 0

  with open(data_file, 'w') as writer:
      #adding column names as the first line in csv
      line2write = ','.join(scalar_lowlevel_descriptors + ['tempo'] + ['key'] + ['genre'] + ['name']).replace('lowlevel.','') + '\n'
      writer.write(line2write)
      for iter in range(len(files_test)):
          file_count +=1
        
          filename = Path(files_test[iter].audio_path)
          
          #if file_count % 20 == 0: #print name of a file every 20 files
          print(file_count, "files processed, current file: ", filename)

          # Dividing in smaller mp3's
          sound = AudioSegment.from_mp3(filename)
          files_divided_list = []
          for i in range(1,7):
            part_sound = sound[(i-1)*20000:i*20000] #20000ms = 20s because we divide 120s by 6.
            part_sound_path = f'{filename.parents[0]}/{filename.stem}_{i}{filename.suffix}'
            part_sound.export(part_sound_path, format="mp3")
            #ipd.display(ipd.Audio(part_sound_path, rate=44100))
            files_divided_list.append(part_sound_path)
          
          print(files_divided_list)

          for part_file in files_divided_list:
            #Compute and write features for file
            features, features_frames = ess.MusicExtractor(lowlevelSilentFrames='drop',
                                                        lowlevelFrameSize = 2048,
                                                        lowlevelHopSize = 1024,
                                                        lowlevelStats = ['mean', 'stdev'],
                                                        mfccStats = ["mean", "cov"],
                                                        rhythmStats = ["mean", "var", "stdev", "median"],
                                                        tonalFrameSize = 2048,
                                                        tonalHopSize = 1024,
                                                        tonalSilentFrames = "drop",
                                                        tonalStats = ["mean", "var", "stdev", "median", "min", "max"])(part_file)
            selected_features = [features[descriptor] for descriptor in scalar_lowlevel_descriptors]
            
            if files_test[iter].tempo is not None:
              bpm_from_json = str(int(files_test[iter].tempo))
            else:
              bpm_from_json = ''
            key_from_json = str(files_test[iter].key).replace(',','-')
            genre_from_json = str(files_test[iter].genres['genres'][0])
            name = Path(filename).name.replace(',','')
            line2write = f'{str(selected_features)[1:-1]},{bpm_from_json},{key_from_json},{genre_from_json},{name}\n'
            writer.write(line2write)
            print(f'Unlinking {part_file}')
            Path(part_file).unlink()


  print("A total of ", file_count, "files processed")
  end = time.monotonic()
  total = end - start
  print('This operation took {:.2f} seconds'.format(total))

1 files processed, current file:  /root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix).mp3
['/root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_1.mp3', '/root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_2.mp3', '/root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_3.mp3', '/root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_4.mp3', '/root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_5.mp3', '/root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_6.mp3']
Unlinking /root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is Mine (Original Mix)_1.mp3
Unlinking /root/mir_datasets/beatport_key/audio/299465 Plastic Operator - The Pleasure Is M