# MoodWave: Voice-Driven Emotion Detection

## Data Guidelines

Your dataset must be:

- Appropriate for classification. It should have a categorical outcome or the data needed to engineer one.

- Usable to solve a specific business problem. This solution must rely on your classification model.

- Somewhat complex. It should contain a minimum of 1000 rows and 10 features.

- Unfamiliar. It can't be one we've already worked with during the course or that is commonly used for demonstration purposes (e.g. Titanic).

- Manageable. Stick to datasets that you can model using the techniques introduced in Phase 3.


### Phase 3 Concepts used in this project:

- Logistic Regression:

> Logistic regression is a fundamental classification algorithm that's well-suited for binary and multiclass classification tasks. It's a good choice when the classes can be separated by roughly linear decision boundaries.

- Decision Trees:

> Decision trees are versatile and interpretable models that can handle both categorical and continuous data. They are particularly useful when you want to understand the decision-making process of your model.

- Evaluation Metrics (Confusion Matrices, ROC Curves, AUC):

> These metrics are essential for assessing the performance of your classification model. They will help you understand how well your model distinguishes between different emotional states.

- Hyperparameter Tuning and Pruning:

> When using decision trees, tuning hyperparameters and pruning are important to avoid overfitting and to ensure your model generalizes well to new data.

## Data Preparation

We use four of the most popular English-language speech emotion datasets: Crema, Ravdess, Savee and Tess. Each contains audio in .wav format, with the emotion label encoded in the file or folder name.

Because our data isn't inherently in a CSV / DataFrame format, we have to build the table from scratch!

First, we will load each dataset into its own DataFrame, making note of *where* each file is, so we can later pull the following features from each audio file:

- Mel-frequency cepstral coefficients (MFCCs)
- Spectral centroid
- Chroma features
- Zero-crossing rate
- RMS energy
- Pitch

And then, of course, our target variable: **Emotion**

In [25]:
import pandas as pd
import numpy as np
import warnings
import zipfile
import librosa
import os


In [2]:
# data is zipped, and stored in folders for which dataset they came from:

# Define the path to the zipped dataset
zip_file_path = 'dataset.zip'
extracted_folder_path = 'dataset'

# Unzip the dataset
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extracted_folder_path)

# Crema
# Ravdess
# Savee
# Tess

In [3]:
# Define the path to the Crema folder
crema_folder_path = os.path.join(extracted_folder_path, 'Crema')

# Map each Crema emotion code to its label (defined once, outside the loop)
emotion_map = {
    'SAD': 'sadness',
    'ANG': 'angry',
    'DIS': 'disgust',
    'FEA': 'fear',
    'HAP': 'happy',
    'NEU': 'neutral'
}

# Verify that we can access the files and extract emotion labels
data = []

# Loop through each file in the Crema folder
for file_name in os.listdir(crema_folder_path):
    if file_name.endswith('.wav'):
        # The emotion code is the third underscore-separated field of the filename
        parts = file_name.split('_')
        emotion_code = parts[2]
        
        # Map the emotion code to the actual emotion label
        emotion_label = emotion_map.get(emotion_code, 'unknown')
        
        # Store the filename, its label, and the directory that contains it
        data.append({'filename': file_name, 'emotion': emotion_label, 'path': crema_folder_path})

# Convert the data into a DataFrame for easy access
df_crema = pd.DataFrame(data)

# Display the first few rows to verify
print(df_crema.head())

              filename  emotion           path
0  1001_DFA_ANG_XX.wav    angry  dataset\Crema
1  1001_DFA_DIS_XX.wav  disgust  dataset\Crema
2  1001_DFA_FEA_XX.wav     fear  dataset\Crema
3  1001_DFA_HAP_XX.wav    happy  dataset\Crema
4  1001_DFA_NEU_XX.wav  neutral  dataset\Crema


In [4]:
# Define the path to the Tess folder
tess_folder_path = os.path.join(extracted_folder_path, 'Tess')

# Prepare to store the data
data = []

# Loop through each emotion folder in the Tess directory
for emotion_folder in os.listdir(tess_folder_path):
    # Get the full path to the emotion folder
    emotion_folder_path = os.path.join(tess_folder_path, emotion_folder)
    
    # Extract the emotion from the folder name (e.g., "OAF_angry" -> "angry")
    emotion_label = emotion_folder.split('_')[1]
    
    # Loop through each file in the emotion folder
    for file_name in os.listdir(emotion_folder_path):
        if file_name.endswith('.wav'):
            # Store the data with the directory path minus the filename
            data.append({
                'filename': file_name, 
                'emotion': emotion_label, 
                'path': emotion_folder_path
            })

# Convert the data into a DataFrame for easy access
df_tess = pd.DataFrame(data)

# Display the first few rows to verify
print(df_tess.head())

             filename emotion                    path
0  OAF_back_angry.wav   angry  dataset\Tess\OAF_angry
1   OAF_bar_angry.wav   angry  dataset\Tess\OAF_angry
2  OAF_base_angry.wav   angry  dataset\Tess\OAF_angry
3  OAF_bath_angry.wav   angry  dataset\Tess\OAF_angry
4  OAF_bean_angry.wav   angry  dataset\Tess\OAF_angry


In [5]:
# Define the path to the Savee folder
savee_folder_path = os.path.join(extracted_folder_path, 'Savee')

# Prepare to store the data
data = []

# Define the emotion mapping based on the prefixes
emotion_map = {
    'a': 'anger',
    'd': 'disgust',
    'f': 'fear',
    'h': 'happiness',
    'n': 'neutral',
    'sa': 'sadness',
    'su': 'surprise'
}

# Loop through each file in the Savee folder
for file_name in os.listdir(savee_folder_path):
    if file_name.endswith('.wav'):
        # Extract the letter prefix from the filename to determine the emotion
        # (e.g. "DC_a01.wav" -> "a"); slicing two characters would give "a0" and miss the map
        stem = os.path.splitext(file_name)[0]
        prefix = stem.split('_')[1].rstrip('0123456789')
        
        # Map the prefix to the corresponding emotion
        emotion_label = emotion_map.get(prefix, 'unknown')
        
        # Store the data with the directory path minus the filename
        data.append({
            'filename': file_name, 
            'emotion': emotion_label, 
            'path': savee_folder_path
        })

# Convert the data into a DataFrame for easy access
df_savee = pd.DataFrame(data)

# Display the first few rows to verify
print(df_savee.head())

     filename  emotion           path
0  DC_a01.wav    anger  dataset\Savee
1  DC_a02.wav    anger  dataset\Savee
2  DC_a03.wav    anger  dataset\Savee
3  DC_a04.wav    anger  dataset\Savee
4  DC_a05.wav    anger  dataset\Savee


In [6]:
# Define the path to the Ravdess folder
ravdess_folder_path = os.path.join(extracted_folder_path, 'Ravdess', 'audio_speech_actors_01-24')

# Prepare to store the data
data = []

# Define the emotion mapping based on the third component in the filename
emotion_map = {
    '01': 'neutral',
    '02': 'calm',
    '03': 'happy',
    '04': 'sad',
    '05': 'angry',
    '06': 'fearful',
    '07': 'disgust',
    '08': 'surprised'
}

# Loop through each actor's folder in the Ravdess directory
for actor_folder in os.listdir(ravdess_folder_path):
    actor_folder_path = os.path.join(ravdess_folder_path, actor_folder)
    
    # Loop through each file in the actor's folder
    for file_name in os.listdir(actor_folder_path):
        if file_name.endswith('.wav'):
            # Extract the third component from the filename to determine the emotion
            emotion_code = file_name.split('-')[2]
            
            # Map the emotion code to the corresponding emotion label
            emotion_label = emotion_map.get(emotion_code, 'unknown')
            
            # Store the data with the directory path minus the filename
            data.append({
                'filename': file_name, 
                'emotion': emotion_label, 
                'path': actor_folder_path
            })

# Convert the data into a DataFrame for easy access
df_ravdess = pd.DataFrame(data)

# Display the first few rows to verify
print(df_ravdess.head())

                   filename  emotion  \
0  03-01-01-01-01-01-01.wav  neutral   
1  03-01-01-01-01-02-01.wav  neutral   
2  03-01-01-01-02-01-01.wav  neutral   
3  03-01-01-01-02-02-01.wav  neutral   
4  03-01-02-01-01-01-01.wav     calm   

                                                path  
0  dataset\Ravdess\audio_speech_actors_01-24\Acto...  
1  dataset\Ravdess\audio_speech_actors_01-24\Acto...  
2  dataset\Ravdess\audio_speech_actors_01-24\Acto...  
3  dataset\Ravdess\audio_speech_actors_01-24\Acto...  
4  dataset\Ravdess\audio_speech_actors_01-24\Acto...  
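
The Ravdess cell above reads only the third filename component. As a minimal sketch, the helper below decodes all seven hyphen-separated components in one pass. The function name `parse_ravdess_filename` and the field names in `FIELDS` are hypothetical labels chosen for illustration, based on the naming convention published with the dataset; only the emotion codes come from the cell above.

```python
import os

# Hypothetical field names for the seven hyphen-separated components
FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor")

# Same emotion codes as the emotion_map used for Ravdess above
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(file_name):
    """Split a Ravdess filename into named fields and decode the emotion."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    parts = stem.split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected Ravdess filename: {file_name!r}")
    record = dict(zip(FIELDS, parts))
    record["emotion"] = EMOTIONS.get(record["emotion"], "unknown")
    record["actor"] = int(record["actor"])
    return record

info = parse_ravdess_filename("03-01-05-01-02-01-12.wav")
print(info["emotion"], info["actor"])
```

Keeping the actor number alongside the emotion would make it possible to split train and test sets by speaker later, if that turns out to matter.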


### Combining Datasets

We will merge the datasets into one DataFrame and assign unique identifiers:
- Concatenate the DataFrames for each dataset.
- Assign a unique ID to each entry based on the dataset.

In [7]:
# Add a unique ID column to each dataset
df_crema['id'] = ['c_{:04d}'.format(i + 1) for i in range(len(df_crema))]
df_tess['id'] = ['t_{:04d}'.format(i + 1) for i in range(len(df_tess))]
df_savee['id'] = ['s_{:04d}'.format(i + 1) for i in range(len(df_savee))]
df_ravdess['id'] = ['r_{:04d}'.format(i + 1) for i in range(len(df_ravdess))]

# Merge the datasets into a single DataFrame
merged_data = pd.concat([df_crema, df_tess, df_savee, df_ravdess], ignore_index=True)

# Reorder columns to have 'id' as the first column
merged_data = merged_data[['id', 'filename', 'emotion', 'path']]

# Display the first few rows of the combined DataFrame
print(merged_data.head())

       id             filename  emotion           path
0  c_0001  1001_DFA_ANG_XX.wav    angry  dataset\Crema
1  c_0002  1001_DFA_DIS_XX.wav  disgust  dataset\Crema
2  c_0003  1001_DFA_FEA_XX.wav     fear  dataset\Crema
3  c_0004  1001_DFA_HAP_XX.wav    happy  dataset\Crema
4  c_0005  1001_DFA_NEU_XX.wav  neutral  dataset\Crema


In [8]:
# remember, we need at least 1000 rows to meet our requirements. 
print(f"Total rows in dataset: {merged_data.shape[0]}")

Total rows in dataset: 12162
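
The four emotion maps above spell the same emotion differently (for example `sadness` vs `sad`, `anger` vs `angry`, `fearful` vs `fear`), so the merged `emotion` column would treat them as separate classes. The sketch below shows one way to collapse them. The `CANONICAL` table and the `normalize_emotion` helper are assumptions for illustration, not part of the original notebook, and the choice of canonical spelling is arbitrary.

```python
# Assumed synonym table: maps alternative spellings onto one name per emotion
CANONICAL = {
    "sadness": "sad",
    "anger": "angry",
    "happiness": "happy",
    "fearful": "fear",
    "surprised": "surprise",
}

def normalize_emotion(label):
    """Lower-case a label and replace known synonyms with one canonical name."""
    label = str(label).strip().lower()
    return CANONICAL.get(label, label)

raw_labels = ["sadness", "sad", "anger", "angry", "Fear", "fearful", "calm"]
clean_labels = [normalize_emotion(label) for label in raw_labels]
print(clean_labels)
```

On the merged DataFrame, the same helper could be applied with `merged_data['emotion'] = merged_data['emotion'].map(normalize_emotion)`, followed by `merged_data['emotion'].value_counts()` to check the resulting classes.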


### Extracting Features

Again, these are the features we will extract:

- **Mel-frequency cepstral coefficients (MFCCs):** Represent the short-term power spectrum of sound, commonly used in speech and audio processing to capture the timbral texture of audio.
- **Spectral centroid:** Indicates the "center of mass" of the spectrum and is often associated with the perceived brightness of a sound.
- **Chroma features:** Represent the 12 different pitch classes and capture harmonic and melodic characteristics of music / voice.
- **Zero-crossing rate:** Measures the rate at which the signal changes sign, giving insight into the noisiness or percussiveness of the sound.
- **RMS energy:** Reflects the root mean square of the audio signal and indicates the energy or loudness of the sound.
- **Pitch:** Refers to the perceived frequency of a sound, determining how high or low a sound is.
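
To make two of these definitions concrete, here is a minimal NumPy sketch that computes the zero-crossing rate and RMS energy by hand on a synthetic tone. The 440 Hz sine and the 22050 Hz sample rate are assumptions chosen for illustration. `librosa` computes both features per frame, whereas this sketch uses the whole signal at once.

```python
import numpy as np

sr = 22050                        # assumed sample rate for this toy signal
t = np.arange(sr) / sr            # one second of time stamps
y = np.sin(2 * np.pi * 440 * t)   # 440 Hz sine with amplitude 1

# Zero-crossing rate: fraction of consecutive sample pairs whose sign differs
signs = np.signbit(y)
zcr = np.mean(signs[1:] != signs[:-1])

# RMS energy: square root of the mean squared amplitude
rms = np.sqrt(np.mean(y ** 2))

print(f"ZCR: {zcr:.4f}  RMS: {rms:.4f}")
```

A sine crosses zero twice per cycle, so the rate should land near 2 × 440 / 22050 ≈ 0.04, and the RMS of a unit-amplitude sine is close to 1/√2 ≈ 0.707.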

We will be using the `librosa` package to process these audio features. [Here](https://librosa.org/doc/latest/index.html) is a link to the librosa documentation.

**Note**: *We suppress the `UserWarning: Trying to estimate tuning from empty frequency set`. It is likely due to either* **silence / low energy** *(too quiet to perform reliable pitch estimation) or a file with* **too short a duration**. *The warning shows up even when the pitch is set to 0 in these cases, which suggests it is raised during tuning estimation elsewhere in the pipeline (plausibly inside `chroma_stft`) rather than by the pitch step itself.*
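
The suppression itself can be done with the standard-library `warnings` module. Below is a minimal sketch, assuming the warning is matched by the start of its message text. `noisy_step` is a hypothetical stand-in for the feature extraction call, and `record=True` is used here only so the demo can show which warnings still get through.

```python
import warnings

def noisy_step():
    # Hypothetical stand-in for a call that emits the tuning warning plus another one
    warnings.warn("Trying to estimate tuning from empty frequency set.", UserWarning)
    warnings.warn("some other problem", UserWarning)
    return 0.0

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # known starting state so the demo is deterministic
    # Ignore only the tuning warning; `message` is a regex matched at the start of the text
    warnings.filterwarnings(
        "ignore",
        message="Trying to estimate tuning from empty frequency set",
        category=UserWarning,
    )
    value = noisy_step()

print([str(w.message) for w in caught])
```

In the notebook, the same `filterwarnings` call inside a plain `with warnings.catch_warnings():` block around `extract_features(file_path)` would hide this one message while leaving other warnings visible.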

In [32]:
def extract_features(file_path):
    y, sr = librosa.load(file_path, sr=None)
    
    # Extract MFCCs
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfccs_mean = np.mean(mfccs, axis=1)

    # Extract Spectral Centroid
    spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    spectral_centroid_mean = np.mean(spectral_centroid)

    # Extract Chroma Features
    chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
    chroma_stft_mean = np.mean(chroma_stft, axis=1)

    # Extract Zero-Crossing Rate
    zero_crossing_rate = librosa.feature.zero_crossing_rate(y)
    zero_crossing_rate_mean = np.mean(zero_crossing_rate)

    # Extract RMS Energy
    rms = librosa.feature.rms(y=y)
    rms_mean = np.mean(rms)

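    # Extract Pitch (piptrack returns per-bin pitch candidates and their magnitudes)
    pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
    
    # Average the non-zero pitch candidates; fall back to 0 when none were detected
    if pitches.size > 0 and np.sum(pitches) > 0:
        pitch = np.mean(pitches[pitches > 0])
    else:
        pitch = 0  # too quiet or too short to estimate pitch reliably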

    return mfccs_mean, spectral_centroid_mean, chroma_stft_mean, zero_crossing_rate_mean, rms_mean, pitch

In [33]:
# testing our extract_features function:

first_row = merged_data.iloc[0]
file_path = os.path.join(first_row['path'], first_row['filename'])

# Extract features
features = extract_features(file_path)

# Print out each feature with its corresponding values
print("MFCCs:", features[0])
print("Spectral Centroid:", features[1])
print("Chroma Features:", features[2])
print("Zero-Crossing Rate:", features[3])
print("RMS Energy:", features[4])
print("Pitch:", features[5])

MFCCs: [-306.0274       92.670235      8.491312     23.965403      7.4779935
   -5.759455    -11.883088     -9.676736     -3.9967465   -13.352565
    0.40819725   -9.709486     -6.1271243 ]
Spectral Centroid: 1584.9930703294388
Chroma Features: [0.37491405 0.37949282 0.41722107 0.39018238 0.4148401  0.2977837
 0.28898865 0.3575554  0.35190624 0.42918485 0.6879576  0.5454907 ]
Zero-Crossing Rate: 0.10186767578125
RMS Energy: 0.041986194
Pitch: 1211.9507


### Validating the Values:

- **MFCCs:** Typically, MFCC values range from -400 to 400, depending on the scale of the input signal.
> All values: **pass**

- **Spectral Centroid:** This value represents the "center of mass" of the spectrum and typically ranges between 0 and the Nyquist frequency (half the sampling rate).
> 1584.99: **pass**

- **Chroma Features:** These represent the energy distribution across 12 pitch classes. They are normalized, so values between 0 and 1 are expected.
> All values: **pass**

- **Zero-Crossing Rate:** This rate indicates how frequently the signal changes sign. It ranges from 0 to 1. 
> 0.1018: **pass**

- **RMS Energy:** This value should be within the range of 0 to 1 for normalized signals.
> 0.0419: **pass**

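- **Pitch:** Pitch values are measured in Hz and depend on the type of audio. The value here is the mean of every non-zero `piptrack` candidate, which can include harmonics as well as the fundamental, so it may sit well above a typical speaking pitch; it is best read as a relative feature rather than a true fundamental frequency.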
> 1211.95: **pass**
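
The manual checks above can also be written as a small function, which makes it easy to re-run them on other rows. This is a sketch under stated assumptions: `validate_features` is a hypothetical helper, the 16000 Hz sample rate is assumed rather than read from the file, and the upper bound on pitch (the Nyquist frequency) is an added assumption because the list above gives no range for it.

```python
import numpy as np

def validate_features(mfccs, centroid, chroma, zcr, rms, pitch, sr):
    """Return feature name -> bool for the range checks described above."""
    nyquist = sr / 2
    chroma = np.asarray(chroma)
    return {
        "mfccs": bool(np.all(np.abs(np.asarray(mfccs)) <= 400)),
        "spectral_centroid": bool(0 <= centroid <= nyquist),
        "chroma": bool(np.all((chroma >= 0) & (chroma <= 1))),
        "zero_crossing_rate": bool(0 <= zcr <= 1),
        "rms": bool(0 <= rms <= 1),
        "pitch": bool(0 <= pitch <= nyquist),  # assumed bound, not from the text
    }

# Values printed for the first file above
checks = validate_features(
    mfccs=[-306.0274, 92.670235, 8.491312, 23.965403, 7.4779935, -5.759455,
           -11.883088, -9.676736, -3.9967465, -13.352565, 0.40819725,
           -9.709486, -6.1271243],
    centroid=1584.9930703294388,
    chroma=[0.37491405, 0.37949282, 0.41722107, 0.39018238, 0.4148401, 0.2977837,
            0.28898865, 0.3575554, 0.35190624, 0.42918485, 0.6879576, 0.5454907],
    zcr=0.10186767578125,
    rms=0.041986194,
    pitch=1211.9507,
    sr=16000,  # assumed sample rate for illustration
)
print(checks)
```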

Now that we've validated our extract_features function, we can apply it to the rest of our dataframe.

**Notes**: 
- This cell can take a while to run, about 5 minutes.
- The `UserWarning: Trying to estimate tuning from empty frequency set` is suppressed.

In [None]:
# empty lists to store features
mfccs_list = []
spectral_centroid_list = []
chroma_list = []
zero_crossing_rate_list = []
rms_list = []
pitch_list = []

# Iterate over each row in the DataFrame
for index, row in merged_data.iterrows():
    file_path = os.path.join(row['path'], row['filename'])
    mfccs, spectral_centroid, chroma, zcr, rms, pitch = extract_features(file_path)
    
    mfccs_list.append(mfccs)
    spectral_centroid_list.append(spectral_centroid)
    chroma_list.append(chroma)
    zero_crossing_rate_list.append(zcr)
    rms_list.append(rms)
    pitch_list.append(pitch)

# Add the features to the DataFrame
merged_data['mfccs'] = mfccs_list
merged_data['spectral_centroid'] = spectral_centroid_list
merged_data['chroma'] = chroma_list
merged_data['zero_crossing_rate'] = zero_crossing_rate_list
merged_data['rms'] = rms_list
merged_data['pitch'] = pitch_list


In [15]:
merged_data.head()

| | id | filename | emotion | path | mfccs | spectral_centroid | chroma | zero_crossing_rate | rms | pitch |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | c_0001 | 1001_DFA_ANG_XX.wav | angry | dataset\Crema | [-306.0274, 92.670235, 8.491312, 23.965403, 7.... | 1584.99307 | [0.37491405, 0.37949282, 0.41722107, 0.3901823... | 0.101868 | 0.041986 | 1211.950684 |
| 1 | c_0002 | 1001_DFA_DIS_XX.wav | disgust | dataset\Crema | [-346.39963, 95.83912, 10.516282, 31.619215, 1... | 1531.650487 | [0.47289878, 0.4768195, 0.33598945, 0.34610763... | 0.093061 | 0.015996 | 1256.617188 |
| 2 | c_0003 | 1001_DFA_FEA_XX.wav | fear | dataset\Crema | [-321.42026, 94.76091, 8.155397, 23.323242, 11... | 1489.088839 | [0.3272673, 0.39935032, 0.35215598, 0.38248017... | 0.084286 | 0.045776 | 992.574402 |
| 3 | c_0004 | 1001_DFA_HAP_XX.wav | happy | dataset\Crema | [-303.30374, 92.52889, 4.231231, 27.970133, 10... | 1555.376035 | [0.3150873, 0.31478375, 0.30918238, 0.3423785,... | 0.084878 | 0.0423 | 1102.953003 |
| 4 | c_0005 | 1001_DFA_NEU_XX.wav | neutral | dataset\Crema | [-335.4959, 100.39331, 9.384935, 30.160904, 11... | 1495.394997 | [0.4112704, 0.36269408, 0.3349767, 0.32547352,... | 0.082031 | 0.02045 | 1041.093628 |
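
The `mfccs` and `chroma` columns each hold an array per row, while scikit-learn estimators such as logistic regression and decision trees expect a 2-D numeric matrix. A possible next step, sketched here on a toy frame, is to expand each array column into one scalar column per coefficient. `expand_array_column` and the `mfcc_1`-style column names are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def expand_array_column(df, column, prefix):
    """Replace an array-valued column with one numeric column per element."""
    values = np.vstack(df[column].tolist())           # shape: (rows, elements)
    names = [f"{prefix}_{i + 1}" for i in range(values.shape[1])]
    expanded = pd.DataFrame(values, columns=names, index=df.index)
    return pd.concat([df.drop(columns=[column]), expanded], axis=1)

# Toy frame with short arrays standing in for the 13 MFCCs and 12 chroma values
toy = pd.DataFrame({
    "id": ["c_0001", "c_0002"],
    "mfccs": [np.array([-306.0, 92.7, 8.5]), np.array([-346.4, 95.8, 10.5])],
    "chroma": [np.array([0.37, 0.38]), np.array([0.47, 0.48])],
    "rms": [0.042, 0.016],
})

flat = expand_array_column(toy, "mfccs", "mfcc")
flat = expand_array_column(flat, "chroma", "chroma")
print(flat.columns.tolist())
```

Applied to `merged_data`, this would yield 13 MFCC columns, 12 chroma columns and the 4 scalar features, for 29 numeric features in total, comfortably above the 10-feature minimum in the guidelines.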
