# Composer Classification

## Abstract (Vy)

The aim of this project was to develop a machine learning model capable of recognizing the compositions of a particular classical music composer. Specifically, we focused on determining whether or not a given piece of classical music was composed by Beethoven. The MusicNet data set, consisting of 330 .wav files, served as the primary data source for training and evaluation. We developed Convolutional Neural Network (CNN) models trained on mel spectrograms and Recurrent Neural Network (RNN) models trained on mel-frequency cepstral coefficients (MFCCs). The RNN models achieved an average accuracy rate of 76%, while the CNN models achieved an average accuracy rate of [insert rate here]. While the achieved accuracies fell below the desired results, this project provided valuable insights into the challenges of attributing classical music compositions.

GitHub Repository: [MLProject](https://github.com/vydiep/MLProject)

## Introduction

When listening to music, it's common to encounter an unfamiliar song that one wishes to quickly identify. As a small step toward this challenge, our project uses the MusicNet dataset to classify whether a given piece of classical music was composed by Beethoven. We aimed to tackle this seemingly small but persistent issue, with the hope of contributing to the larger problem of identifying unknown music, which is beyond our current capabilities.

Since composers may experiment and deviate from their established patterns, there is no guarantee that a composer's audio files share consistent patterns, and there aren't many resources on how others in the machine learning field have addressed this problem. On the other hand, genre classification using music audio files is a widely explored problem with numerous approaches available. @costa2017evaluation used the ISMIR 2004 Database, a Western music collection, the LMD database, a collection of Latin American music, and a collection of field recordings of ethnic African music to train a model to classify music genres. They found that audio spectrograms used with a convolutional neural network performed just as well as, if not better than, individual classifiers trained with visual representations on all datasets. @khamees2021classifying used the GTZAN dataset for music classification. In their study, they created and compared the performance of CNN and RNN models. They found that a CNN with max-pooling outperformed an RNN with LSTM, yielding a testing accuracy of 74%, although it took longer. @kakarla2022recurrent also used the GTZAN dataset, worked with the MFCCs of audio files, and found that a 5-layer stacked independent RNN was able to achieve 84% accuracy.

While our project may not be as complex as music genre classification, we used this research as guidance and created a variety of simple CNNs and RNNs with LSTM to work towards this problem of classifying the composer of classical music. 

## Values Statement

As avid music listeners, we were drawn to the idea of creating a project focused on analyzing music using its audio files. Our project, which works to classify a piece of classical music as Beethoven or not, could potentially be used by music theory researchers, classical music listeners, and anyone interested in exploring this genre. If we kept the labels indicating the original composer rather than changing them to "Other," we could potentially identify patterns and influences among different composers. It's also important to note that our models were trained only on classical music from Western composers, which excludes non-Western composers and could perpetuate inequalities in the music industry. Misclassifications could also result in the improper attribution of credit to certain composers. While our project is still small in scale, we recognize that as it grows, it may require additional computational resources, which could introduce environmental concerns.

While our project could do a better job of incorporating classical music from non-Western composers, we still believe that it could bring joy and value to the world of classical music.

## Materials and Methods 


### Data (Vy)

We utilized the MusicNet data set, which encompassed a collection of 330 audio files of classical music, each with varying durations. Approximately 48% of the data set consisted of pieces composed by Beethoven, while the remaining files were by other composers. The audio files exhibited a diverse range of instrumentation.

### Approach (Vy)

To ensure a fair comparison of the models' ability to distinguish Beethoven's music, we standardized the data set by using only the first 45 seconds of each audio file. Furthermore, we divided this 45-second segment into 15 smaller pieces, each lasting 3 seconds. This subdivision allowed us to generate additional data points, which was beneficial for training and evaluating our neural network models.
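
The segmentation step can be sketched roughly as follows. This is a minimal illustration, not the code from our repository: the function name `segment_clip`, the 44.1 kHz sample rate, and the random array standing in for a loaded .wav file are all assumptions made for the example.

```python
import numpy as np

def segment_clip(samples, sample_rate, total_seconds=45, piece_seconds=3):
    """Keep the first `total_seconds` of a mono recording and cut it into equal pieces."""
    n_total = total_seconds * sample_rate
    n_piece = piece_seconds * sample_rate
    if len(samples) < n_total:
        raise ValueError("recording is shorter than the requested window")
    head = samples[:n_total]
    # One row per piece: (number of pieces, samples per piece)
    return head.reshape(total_seconds // piece_seconds, n_piece)

sample_rate = 44_100  # assumed sample rate, for illustration only
# Random noise standing in for 60 seconds of loaded audio
fake_audio = np.random.default_rng(0).standard_normal(60 * sample_rate).astype(np.float32)

pieces = segment_clip(fake_audio, sample_rate)
print(pieces.shape)  # (15, 132300)
```

Each row of `pieces` is one 3-second data point that inherits the composer label of the recording it came from.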

After segmenting the audio files, we obtained a total of 4950 pieces. To ensure an effective training process, we split the data using an 80-20 ratio. Specifically, 80% of the data was allocated for training purposes, while the remaining 20% was reserved for testing the models' performance. Within the training data, we further divided it into an 80-20 ratio, designating 80% for training and 20% for validation.
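
To make the resulting set sizes concrete, here is a hedged sketch of the two-stage 80-20 split. The helper `split_indices` is hypothetical, and shuffling individual 3-second pieces (rather than whole recordings) is an assumption of this example, not a statement about how our repository performs the split.

```python
import numpy as np

def split_indices(n_items, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Shuffle item indices, hold out a test set, then a validation set from the rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_test = int(round(n_items * test_fraction))
    test, rest = order[:n_test], order[n_test:]
    n_val = int(round(len(rest) * val_fraction))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

n_pieces = 330 * 15  # 330 recordings, 15 pieces each
train_idx, val_idx, test_idx = split_indices(n_pieces)
print(len(train_idx), len(val_idx), len(test_idx))  # 3168 792 990
```

In other words, about 64% of all pieces are used for training, 16% for validation, and 20% for testing.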

To evaluate the performance of our models during training and validation phases, we tracked the history of loss and accuracy metrics. Given the large storage requirements of audio files, we conducted our work primarily on Google Colab.

#### CNN
The CNN models we developed utilized mel spectrograms, which are visual representations of audio data, as the input features. Mel spectrograms, also known as mel-frequency spectrograms, share a similar structure to regular spectrograms, featuring a two-dimensional image format where the x-axis represents time and the y-axis represents frequency. The intensity of each point in the mel spectrogram image corresponds to the magnitude of the associated frequency component. Notably, mel spectrograms differ from regular spectrograms in that they apply a frequency warping known as the mel scale, aligning the frequency representation to better match human auditory perception. For our targets, we used our composer labels. Given the straightforward nature of our task, we proceeded to train two versions of our model, which consisted of four convolutional layers followed by a max-pooling layer. The two variations included one model with a dropout layer and another model without it.
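
To give a feel for the mel-scale warping, the sketch below uses one common closed-form conversion, m = 2595 · log10(1 + f / 700). Audio libraries differ in which mel variant they use by default, so this formula, the helper names, and the 8000 Hz upper limit are assumptions for illustration rather than the exact settings behind our spectrograms.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels using the 2595 * log10(1 + f/700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Eight bands that are equally wide on the mel scale, from 0 Hz to 8000 Hz
mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 9)
band_edges_hz = mel_to_hz(mel_edges)
band_widths_hz = np.diff(band_edges_hz)

print(np.round(band_edges_hz))
print(np.round(band_widths_hz))
```

The printed band widths grow with frequency: bands that are equal in mels are narrow at low frequencies and wide at high frequencies, mirroring how the ear resolves low pitches more finely than high ones.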

#### RNN 
Another model we used to analyze our clips of music was a recurrent neural network (RNN) with long short-term memory (LSTM) architecture, which is good at understanding order and analyzing sequential data. We utilized the mel-frequency cepstral coefficients (MFCCs) of our audio clips as the input features. MFCCs are commonly used frequency domain features that represent the frequencies perceived by the human ear and capture the "brightness" of sound. For this model, we also used our composer labels as targets. Since our task was fairly simple, we trained four variations of simple LSTM models. We first experimented with 1 LSTM layer and 2 LSTM layers. Then, to prevent overfitting, we experimented with 1 LSTM layer with a dropout layer and 2 LSTM layers with a dropout layer, both with a dropout probability of 0.2.
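
For readers unfamiliar with dropout, the following NumPy sketch shows the usual "inverted" formulation with a probability of 0.2: during training each activation is zeroed with probability 0.2 and the survivors are scaled by 1 / (1 - 0.2) so the expected value is unchanged. The function `dropout` and the toy array are hypothetical and only illustrate the idea; our models relied on a deep learning framework's built-in dropout layer.

```python
import numpy as np

def dropout(activations, rate=0.2, training=True, rng=None):
    """Inverted dropout: zero a random subset of activations and rescale the rest."""
    if not training or rate == 0.0:
        # At evaluation time the layer passes its input through unchanged
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones((4, 8))  # toy stand-in for LSTM outputs
dropped = dropout(hidden, rate=0.2, rng=rng)
print(dropped)
```

Every entry of `dropped` is either 0 (dropped) or 1.25 (kept and rescaled), so the network cannot rely on any single unit being present during training.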

## Results 

### CNN

### RNN
Below is the history of training and validation accuracy for our RNN models with 1 LSTM layer, 2 LSTM layers, 1 LSTM layer and a dropout layer, and 2 LSTM layers and a dropout layer.  
![Graph of training and validation accuracy and loss for RNN model with 1 LSTM layer](https://github.com/vydiep/MLProject/blob/main/RNN/Models/LSTM/LSTM-1-Layer-graph.png?raw=true)  
![Graph of training and validation accuracy and loss for RNN model with 2 LSTM layers](https://github.com/vydiep/MLProject/blob/main/RNN/Models/LSTM/LSTM-2-Layers-graph.png?raw=true)  
![Graph of training and validation accuracy and loss for RNN model with 1 LSTM layer and Dropout layer](https://github.com/vydiep/MLProject/blob/main/RNN/Models/LSTM-Dropout/LSTM-1-Layer-Dropout-graph.png?raw=true)  
![Graph of training and validation accuracy and loss for RNN model with 2 LSTM layers and Dropout layer](https://github.com/vydiep/MLProject/blob/main/RNN/Models/LSTM-Dropout/LSTM-2-Layers-Dropout-graph.png?raw=true)

As conveyed in the graphs, we trained for about 100 epochs, since that is when the validation accuracy appeared to plateau, and we were never able to surpass a validation accuracy of 80%. For our model with 1 LSTM layer, we achieved a testing accuracy of 74%. For our model with 2 LSTM layers, we achieved a testing accuracy of 78%. For our model with 1 LSTM layer and a dropout layer, we achieved a testing accuracy of 76%. Lastly, for our model with 2 LSTM layers and a dropout layer, we achieved a testing accuracy of 76%. While these accuracies are not the best, they are better than guessing: since 48% of our data is Beethoven, always predicting "Other" would be correct only about 52% of the time.
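
The arithmetic behind these comparisons is small enough to check directly. The snippet below uses only the four reported testing accuracies and the 48% Beethoven share; treating that overall share as if it also held in the test split is an assumption.

```python
# Reported testing accuracy for each RNN variant
test_accuracy = {
    "1 LSTM layer": 0.74,
    "2 LSTM layers": 0.78,
    "1 LSTM layer + dropout": 0.76,
    "2 LSTM layers + dropout": 0.76,
}

mean_accuracy = sum(test_accuracy.values()) / len(test_accuracy)

# Accuracy of always predicting the more common class ("Other")
beethoven_share = 0.48
majority_baseline = max(beethoven_share, 1 - beethoven_share)

print(f"mean RNN test accuracy: {mean_accuracy:.0%}")
print(f"majority-class baseline: {majority_baseline:.0%}")
print(f"margin over baseline: {(mean_accuracy - majority_baseline) * 100:.0f} percentage points")
```

The mean matches the 76% average reported in the abstract, which is roughly 24 percentage points above the majority-class baseline.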

We would also like to acknowledge that our results are based on a single round of training on our data. Training each model multiple times and averaging the results would give a more reliable picture of how our models perform.

## Concluding Discussion

In what ways did our project work?
Did we meet the goals that we set at the beginning of the project?
How do our results compare to the results of others who have also studied similar problems?
If we had more time, data, or computational resources, what might we do differently in order to improve further?

## Group Contribution Statement 

The work for this project was fairly evenly distributed.  

Vy Diep wrote the code for the data preparation and the CNN model training/experimentation. For our presentation, she led the discussion on our approach and CNN results. For the blog post, she led the writing for the Abstract, Data, Approach, and CNN results sections.

Katie Macalintal wrote the code for the RNN model training/experimentation. For our presentation, she led the discussion on the overview of our project and RNN results. For the blog post, she led the writing for the Introduction, Values Statement, and RNN results sections. 

There were also a handful of tasks that we'd do together, but only one person would commit. We would often clean up our GitHub repository together and for the blog post, we worked together to craft the concluding discussion. 

## Personal Reflections

What did you learn from the process of researching, implementing, and communicating about your project?
How do you feel about what you achieved? Did you meet your initial goals? Did you exceed them or fall short? In what ways?
In what ways will you carry the experience of working on this project into your next courses, career stages, or personal life?