# FeatureExtractionClassifier

Classify genres using various features extracted from the WAV files.
The architecture is based on the features derived in: https://www.kaggle.com/code/dramirdatascience/gtzan-music-classification-using-ml-acc-93-24

In [None]:
!pip install "ray[tune]"
import torch
import torch.nn as nn
import numpy as np
from utils import *
import torchvision.transforms as transforms
import torch.utils.data as Data
from scipy.io import wavfile
from ray import air
import os
from ray.tune.schedulers import ASHAScheduler

## Mount drive
Mount Google Drive if running on Google Colab.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Constant parameters used in training

Run `setup.sh` to mount the Google Drive folder containing GTZAN.

In [None]:
GTZAN_WAV = "/content/drive/MyDrive/GTZAN/Data/genres_original/"

GENRES = {'blues': 0, 'classical': 1, 'country': 2, 'disco': 3,
          'hiphop': 4, 'jazz': 5, 'metal': 6, 'pop': 7, 'reggae': 8,
          'rock': 9}

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device", DEVICE)

## Training

Create a `Dataset` for the audio files.

Obtain the features we are interested in from the WAV file. The CSVs that accompany GTZAN provide a total of 57 distinct features. Based on my understanding of [this analysis by the dataset author](https://www.kaggle.com/code/dramirdatascience/gtzan-music-classification-using-ml-acc-93-24), I briefly describe what each feature captures and how it was obtained. For a more in-depth understanding of how the features were computed, the author has also provided source code.

We split each song into 10 parts because the features we extract, such as those describing pitch and frequency, are only meaningful within a short time frame.
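A minimal sketch of the splitting step, assuming the clip is already loaded as a 1-D NumPy array (for example via `scipy.io.wavfile.read`, which the notebook imports). The helper name `split_into_segments` is hypothetical, and trailing samples that do not fill a whole segment are simply dropped.

```python
import numpy as np


def split_into_segments(y: np.ndarray, n_segments: int = 10) -> list:
    """Split a 1-D waveform into `n_segments` equal, non-overlapping pieces.

    Hypothetical helper: trailing samples that do not fill a whole
    segment are discarded.
    """
    seg_len = len(y) // n_segments
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


# Usage: a 30-second clip at 22050 Hz becomes ten 3-second segments.
sr = 22050
clip = np.random.default_rng(0).standard_normal(30 * sr)
segments = split_into_segments(clip)
print(len(segments), len(segments[0]))  # 10 66150
```

Each segment would then go through the feature extractors below, so one clip yields ten feature vectors.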

### Chroma STFT

This uses the same principle as `mel_spec_classifier`: a short-time Fourier transform (STFT) extracts frequency magnitudes across time intervals. Here, however, the frequency bins are grouped into the 12 pitch classes of Western music. In effect, we look at far less data, concentrated on the 12 pitch classes that summarise a song's harmonic content, which is useful when trying to distinguish genres.

We take the mean and variance of the chroma magnitudes to summarise which pitches dominate the song.
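A NumPy-only sketch of a chroma feature, assuming a mono waveform `y` sampled at `sr` Hz. It folds each STFT bin onto its nearest equal-tempered pitch class (A4 = 440 Hz, C = 0) and normalises each frame by its strongest class. This is a simplified stand-in, not the exact computation behind the GTZAN CSVs (librosa's `librosa.feature.chroma_stft` uses a proper chroma filter bank); the function name `simple_chroma`, the frame sizes and the `fmin` cutoff are illustrative assumptions.

```python
import numpy as np


def simple_chroma(y, sr, n_fft=2048, hop=512, fmin=27.5):
    """Return a (12, n_frames) chroma matrix, rows ordered C, C#, ..., B."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2  # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    valid = freqs >= fmin  # skip DC and sub-audio bins
    # Semitones relative to A4 = 440 Hz, folded onto pitch classes with C = 0 (so A = 9).
    semitones = np.round(12 * np.log2(freqs[valid] / 440.0)).astype(int)
    pitch_class = (semitones + 9) % 12
    power = power[:, valid]
    chroma = np.zeros((12, n_frames))
    for pc in range(12):
        chroma[pc] = power[:, pitch_class == pc].sum(axis=1)
    # Normalise each frame so its strongest pitch class is (almost exactly) 1.
    chroma /= chroma.max(axis=0, keepdims=True) + 1e-10
    return chroma


sr = 22050
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)  # an A4 sine wave
chroma = simple_chroma(tone, sr)
print(chroma.mean(axis=1).argmax())  # 9 (pitch class A)
chroma_stft_mean, chroma_stft_var = chroma.mean(), chroma.var()
```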

### MFCC

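MFCCs (mel-frequency cepstral coefficients) also start from an STFT of the signal mapped onto the mel scale, as in `mel_spec_classifier`. Rather than analysing the spectrogram directly, we take the logarithm of the mel-band energies and apply a discrete cosine transform, which compresses each frame's spectral envelope (roughly, its timbre) into a small number of coefficients. We then take the mean and variance of each coefficient over time; the GTZAN CSVs use 20 coefficients, giving 40 features.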
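A NumPy-only sketch of the MFCC pipeline (power spectrum, triangular mel filter bank, log, orthonormal DCT-II), assuming an HTK-style mel formula, 40 mel bands and 20 coefficients. It differs in details (mel scale variant, log scaling) from `librosa.feature.mfcc`, so the numbers will not match the CSVs; all function names here are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, centre, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis


def simple_mfcc(y, sr, n_mfcc=20, n_mels=40, n_fft=2048, hop=512):
    """Return an (n_mfcc, n_frames) matrix of mel-frequency cepstral coefficients."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2  # (n_frames, n_bins)
    mel_energy = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    log_mel = np.log(mel_energy + 1e-10)  # log compresses the dynamic range
    return dct_ii_matrix(n_mfcc, n_mels) @ log_mel  # decorrelate into cepstral coefficients


sr = 22050
y = np.random.default_rng(0).standard_normal(2 * sr)
mfcc = simple_mfcc(y, sr)
# Summarise each coefficient over time: 20 means followed by 20 variances.
mfcc_features = np.concatenate([mfcc.mean(axis=1), mfcc.var(axis=1)])
```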

### RMS
Root-mean-square amplitude of the signal in each frame, a measure of its average energy (loudness). It is not unreasonable to theorize that metal and rock would be 'louder' on average.
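A minimal sketch of frame-wise RMS, assuming 2048-sample frames with a 512-sample hop (librosa's defaults, not necessarily those used for the CSVs). `frame_rms` is a hypothetical name; the mean and variance over frames give the two RMS features.

```python
import numpy as np


def frame_rms(y, frame_length=2048, hop=512):
    """Root-mean-square amplitude of each (overlapping) frame."""
    n_frames = 1 + (len(y) - frame_length) // hop
    frames = np.stack([y[i * hop:i * hop + frame_length] for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))


sr = 22050
t = np.arange(2 * sr) / sr
quiet = frame_rms(0.1 * np.sin(2 * np.pi * 1000.0 * t))
loud = frame_rms(0.5 * np.sin(2 * np.pi * 1000.0 * t))
rms_mean, rms_var = loud.mean(), loud.var()
```

For a sine of amplitude A the RMS is close to A/sqrt(2), so the louder signal scores higher.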

### Spectral Centroid Mean
The magnitude-weighted 'centre of mass' of the frequencies in each time interval. It indicates the perceived 'brightness' of a song: a higher centre of mass means more energy at high frequencies, which is perceived as brighter.
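A sketch of the centroid as the magnitude-weighted mean frequency of each STFT frame, assuming a Hann window, `n_fft=2048` and `hop=512`. `spectral_centroid_hz` is a hypothetical name; `librosa.feature.spectral_centroid` is the ready-made equivalent.

```python
import numpy as np


def spectral_centroid_hz(y, sr, n_fft=2048, hop=512):
    """Per-frame magnitude-weighted mean frequency, in Hz."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))  # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Small epsilon avoids division by zero on silent frames.
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)


sr = 22050
t = np.arange(2 * sr) / sr
dull = spectral_centroid_hz(np.sin(2 * np.pi * 500.0 * t), sr).mean()
bright = spectral_centroid_hz(np.sin(2 * np.pi * 2000.0 * t), sr).mean()
```

For a pure tone the centroid sits at the tone's frequency, so the 2000 Hz tone is "brighter" than the 500 Hz one.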

### Spectral Bandwidth Mean
The spread of frequencies around the spectral centroid, averaged over time intervals.
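A sketch of the bandwidth as the p-th order spread of the normalised magnitude spectrum around its centroid, which is how `librosa.feature.spectral_bandwidth` defines it (default `p=2`). The frame parameters and the name `spectral_bandwidth_hz` are assumptions.

```python
import numpy as np


def spectral_bandwidth_hz(y, sr, n_fft=2048, hop=512, p=2):
    """Per-frame p-th order spread of frequencies around the spectral centroid, in Hz."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))  # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    weights = mag / (mag.sum(axis=1, keepdims=True) + 1e-10)  # each frame sums to ~1
    centroid = (weights * freqs).sum(axis=1, keepdims=True)
    return ((weights * np.abs(freqs - centroid) ** p).sum(axis=1)) ** (1.0 / p)


sr = 22050
t = np.arange(2 * sr) / sr
rng = np.random.default_rng(0)
tone_bw = spectral_bandwidth_hz(np.sin(2 * np.pi * 1000.0 * t), sr).mean()
noise_bw = spectral_bandwidth_hz(rng.standard_normal(2 * sr), sr).mean()
```

A pure tone concentrates its energy in a few bins and has a narrow bandwidth, while white noise spreads across the whole spectrum.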

### Spectral Rolloff Mean
Another measure of the signal's bandwidth: the frequency bin below which a fixed percentage of the total spectral energy lies. `librosa.feature.spectral_rolloff` uses 85% by default.
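A sketch of roll-off: per frame, the first frequency at which the cumulative magnitude reaches `roll_percent` of the frame total. The default of 0.85 mirrors librosa's, which is an assumption about how the GTZAN feature was computed; `spectral_rolloff_hz` and the frame sizes are illustrative.

```python
import numpy as np


def spectral_rolloff_hz(y, sr, roll_percent=0.85, n_fft=2048, hop=512):
    """Per-frame frequency below which `roll_percent` of the spectral magnitude lies."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))  # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    cumulative = np.cumsum(mag, axis=1)
    threshold = roll_percent * cumulative[:, -1:]
    # argmax on a boolean array returns the index of the first True value.
    idx = np.argmax(cumulative >= threshold, axis=1)
    return freqs[idx]


sr = 22050
t = np.arange(2 * sr) / sr
tone_roll = spectral_rolloff_hz(np.sin(2 * np.pi * 1000.0 * t), sr).mean()
noise_roll = spectral_rolloff_hz(np.random.default_rng(0).standard_normal(2 * sr), sr).mean()
```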

### Tempo Mean

Tempo Mean estimates the average perceived tempo via beat tracking, which essentially measures how the signal's power changes over time and determines the BPM from the periodicity of those changes.
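A deliberately crude sketch of the idea: build an onset envelope from increases in frame power, autocorrelate it, and convert the strongest lag in a plausible range to BPM. This is not a real beat tracker (`librosa.beat.beat_track` uses a more robust onset strength and dynamic programming); the function name and the 70 to 180 BPM search range are assumptions.

```python
import numpy as np


def estimate_tempo_bpm(y, sr, hop=512, min_bpm=70.0, max_bpm=180.0):
    """Crude global tempo estimate from the periodicity of an energy-based onset envelope."""
    n_frames = len(y) // hop
    frames = y[:n_frames * hop].reshape(n_frames, hop)  # non-overlapping frames
    energy = (frames ** 2).sum(axis=1)  # power per frame
    onset = np.maximum(0.0, np.diff(energy))  # keep only increases in power
    onset = onset - onset.mean()
    # Autocorrelation peaks at lags (in frames) where the onset pattern repeats.
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    frame_rate = sr / hop  # onset-envelope frames per second
    min_lag = int(np.ceil(60.0 * frame_rate / max_bpm))
    max_lag = int(np.floor(60.0 * frame_rate / min_bpm))
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / best_lag


# Usage: a click every 0.5 s, i.e. 120 BPM.
sr = 22050
clicks = np.zeros(30 * sr)
clicks[::sr // 2] = 1.0
bpm = estimate_tempo_bpm(clicks, sr)
```

The search range is restricted because the autocorrelation also peaks at multiples of the beat period, a common source of half- or double-tempo errors. With a 512-sample hop the lag grid only resolves tempo to within a few BPM.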

### Zero Crossing Rate Mean

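The average rate at which the audio signal crosses the zero axis (changes sign) over time. It tells us about rapid changes in the signal; noisy and percussive sounds tend to have a higher zero-crossing rate.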
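A sketch of the frame-wise zero-crossing rate as the fraction of adjacent sample pairs whose signs differ, assuming 2048-sample frames with a 512-sample hop. `zero_crossing_rate` is a hypothetical helper (librosa's version normalises slightly differently).

```python
import numpy as np


def zero_crossing_rate(y, frame_length=2048, hop=512):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    n_frames = 1 + (len(y) - frame_length) // hop
    frames = np.stack([y[i * hop:i * hop + frame_length] for i in range(n_frames)])
    signs = np.signbit(frames).astype(np.int8)  # 1 where the sample is negative
    crossings = np.abs(np.diff(signs, axis=1))  # 1 where consecutive samples differ in sign
    return crossings.mean(axis=1)


sr = 22050
t = np.arange(2 * sr) / sr
tone_zcr = zero_crossing_rate(np.sin(2 * np.pi * 1000.0 * t)).mean()
noise_zcr = zero_crossing_rate(np.random.default_rng(0).standard_normal(2 * sr)).mean()
```

A sine at f Hz crosses zero about 2f times per second, so its rate per sample is roughly 2f/sr, whereas white noise changes sign on about half of all sample pairs.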

### MFCC

The Mel Spec

