<a href="https://colab.research.google.com/github/tejask-42/Speech-Emotion-Recognition-Project/blob/main/Week_4/WiDS_Final_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
-------------- Real Project Time -------------
</div>

Use the `tensorflow.keras` library for building and training the models.

In [1]:
# Import libraries
import os
import zipfile

import librosa
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import models, layers
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Mount Google Drive so the dataset archives are accessible
drive.mount("/content/drive")

Mounted at /content/drive


Define a function to extract features from the given audio file using the `librosa` library. You can vary the features however you like, but the preferred features are MFCC, chroma, and mel spectrogram.

In [2]:
def extract_features(wav_file, features_list):
    y, sr = librosa.load(wav_file, sr=None)
    features = []
    if 'mfcc' in features_list:
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        features.append(np.mean(mfcc, axis=1))  # Taking mean over time axis
    if 'chroma_stft' in features_list:
        chroma = librosa.feature.chroma_stft(y=y, sr=sr)
        features.append(np.mean(chroma, axis=1))
    if 'melspectrogram' in features_list:
        mel = librosa.feature.melspectrogram(y=y, sr=sr)
        features.append(np.mean(mel, axis=1))

    return np.concatenate(features)
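To see what shape `extract_features` produces, here is a minimal sketch that uses random stand-in matrices instead of real audio, so it runs without `librosa`. It assumes librosa's default sizes for chroma (12 bins) and the mel spectrogram (128 bands); the frame count of 200 is an arbitrary placeholder.

```python
import numpy as np

# Stand-in feature matrices shaped (n_features, n_frames), the layout librosa returns.
# The frame count is an arbitrary assumption; in practice it depends on clip length.
n_frames = 200
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, n_frames))     # n_mfcc=13, as in extract_features
chroma = rng.normal(size=(12, n_frames))   # assumed librosa default: 12 chroma bins
mel = rng.normal(size=(128, n_frames))     # assumed librosa default: 128 mel bands

# Averaging over axis=1 (time) leaves one value per feature row,
# so clips of any length map to vectors of the same size.
parts = [np.mean(m, axis=1) for m in (mfcc, chroma, mel)]
feature_vector = np.concatenate(parts)

print(feature_vector.shape)  # (153,)
```

With all three features enabled, the vector has 13 + 12 + 128 = 153 entries, which is the input size the model uses later in this notebook.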

Now, create a function to load the data from the downloaded audio files. Ensure that you handle the file paths and formats properly to enable smooth and efficient data loading.

In [3]:
# Function to extract label from the filename based on the given pattern
def extract_label(wav_file):
    filename = os.path.basename(wav_file).split('.')[0]
    emotion_map = {
        '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
        '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'
    }

    filename_parts = filename.split('-')
    emotion_code = filename_parts[2]
    return emotion_map.get(emotion_code, 'unknown')
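As a quick check of the filename handling, the sketch below parses two made-up file names. `label_from_filename` is a hypothetical variant of `extract_label` that also returns `'unknown'` when a name has fewer than three dash-separated fields, and the paths are illustrative only.

```python
import os

# Same code-to-emotion table as in extract_label.
EMOTION_MAP = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'
}

def label_from_filename(wav_file):
    """Return the emotion named by the third dash-separated field of the file name."""
    stem = os.path.splitext(os.path.basename(wav_file))[0]
    parts = stem.split('-')
    if len(parts) < 3:
        # Names that do not follow the pattern get a placeholder label
        return 'unknown'
    return EMOTION_MAP.get(parts[2], 'unknown')

# Hypothetical paths, used only to illustrate the parsing
print(label_from_filename("/content/folder2/Actor_01/03-01-05-01-01-01-01.wav"))  # angry
print(label_from_filename("notes.wav"))                                           # unknown
```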

In [4]:
folder1_path = "/content/drive/MyDrive/Audio_Data/Audio_Song_Actors_01-24.zip"
folder2_path = "/content/drive/MyDrive/Audio_Data/Audio_Speech_Actors_01-24.zip"
with zipfile.ZipFile(folder1_path, "r") as zip_ref:
  zip_ref.extractall("/content/folder1")
with zipfile.ZipFile(folder2_path, "r") as zip_ref:
  zip_ref.extractall("/content/folder2")

In [5]:
# Function to load and split data
def load_data(test_size, features_list):
    wav_files = []
    labels = []

    for root, dirs, files in os.walk("/content/folder1"):
        for file in files:
            if file.endswith(".wav"):
                wav_files.append(os.path.join(root, file))
                labels.append(extract_label(os.path.join(root, file)))

    for root, dirs, files in os.walk("/content/folder2"):
        for file in files:
            if file.endswith(".wav"):
                wav_files.append(os.path.join(root, file))
                labels.append(extract_label(os.path.join(root, file)))

    train_files, test_files, train_labels, test_labels = train_test_split(wav_files, labels, test_size=test_size, random_state=42)

    train_data = []
    test_data = []

    for wav_file in train_files:
        features = extract_features(wav_file, features_list)
        train_data.append(features)

    for wav_file in test_files:
        features = extract_features(wav_file, features_list)
        test_data.append(features)

    train_data = np.array(train_data)
    test_data = np.array(test_data)

    # Label Encoding: Convert string labels to numeric
    label_encoder = LabelEncoder()
    train_labels = label_encoder.fit_transform(train_labels)
    test_labels = label_encoder.transform(test_labels)

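    # One-hot encoding the labels for categorical cross-entropy.
    # Use the encoder's class count for both sets so their widths always match,
    # even if some class happens to be missing from the test split.
    num_classes = len(label_encoder.classes_)
    train_labels = to_categorical(train_labels, num_classes=num_classes)
    test_labels = to_categorical(test_labels, num_classes=num_classes)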

    return (train_data, train_labels), (test_data, test_labels)
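`load_data` uses a plain random split. If you want every emotion represented in the same proportion in both sets, one option (a suggestion, not something the function above does) is the `stratify` argument of `train_test_split`. The sketch below uses a toy, deliberately imbalanced label list with made-up file names.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy, deliberately imbalanced labels (hypothetical counts, not the real dataset)
labels = ['happy'] * 80 + ['neutral'] * 20
files = [f"clip_{i}.wav" for i in range(len(labels))]

# stratify=labels keeps the 80/20 class ratio in both the train and test sets
train_files, test_files, train_labels, test_labels = train_test_split(
    files, labels, test_size=0.2, random_state=42, stratify=labels
)

print(Counter(train_labels))
print(Counter(test_labels))
```

Here the test set ends up with 16 `happy` and 4 `neutral` clips, mirroring the overall ratio.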

Now, define the model using a simple neural network.
- The model should have a hidden layer with 300 nodes and an output layer with nodes corresponding to the number of emotions.
- Use ReLU for the hidden layer activation and Softmax for the output layer (feel free to experiment with other activation functions as well).
- Set the loss function to categorical cross-entropy, the optimizer to Adam, and the metric to accuracy.
- You can choose a batch size of 256 and 300 epochs, but these parameters are flexible and can be adjusted based on your needs.
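To make the softmax and categorical cross-entropy pairing concrete, here is a small NumPy sketch, independent of Keras. The helper names `softmax` and `categorical_cross_entropy` are written here for illustration and are not Keras functions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob):
    """Mean over the batch of -sum(y_true * log(y_prob))."""
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

n_classes = 8
y_true = np.eye(n_classes)[[0, 3, 5]]   # three one-hot labels

# Equal logits mean the model is maximally unsure: every class gets probability 1/8
uniform_probs = softmax(np.zeros((3, n_classes)))
uniform_loss = categorical_cross_entropy(y_true, uniform_probs)
print(uniform_loss)  # ln(8), about 2.079
```

A loss near ln(8) ≈ 2.08 is what pure guessing over eight emotions gives, which is a handy baseline when reading the training logs.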

Use the `load_data` function to load the audio data. It already splits the files into training and testing sets with `train_test_split`, so no separate split is needed.

In [6]:
# Split the dataset into training and testing data with testing data = 0.2 of total data
train_data, test_data = load_data(test_size=0.2, features_list=['mfcc', 'chroma_stft', 'melspectrogram'])
train_data, train_labels = train_data
test_data, test_labels = test_data

Now, everything’s easy-peasy!🎉 \
All you have to do is fit the model to the training data just like you always do, and then predict the results for the testing data. Once you've done that, print the accuracy and see how well your model performs!😎 It’s going to be awesome, I promise!

In [17]:
model = models.Sequential()
model.add(layers.BatchNormalization())
model.add(layers.Dense(300, activation='leaky_relu', input_shape=(153,)))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(128, activation='leaky_relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(8, activation='softmax'))  # 8 classes (for 8 emotions)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=600, batch_size=32, validation_split=0.2)
test_loss, test_acc = model.evaluate(test_data, test_labels)
print('Test accuracy:', test_acc)

Epoch 1/600

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

49/49 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.2735 - loss: 1.8852 - val_accuracy: 0.2366 - val_loss: 2.4340
Epoch 2/600
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.4018 - loss: 1.5977 - val_accuracy: 0.2545 - val_loss: 2.2546
Epoch 3/600
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.4724 - loss: 1.4354 - val_accuracy: 0.3206 - val_loss: 1.8252
Epoch 4/600
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.4928 - loss: 1.3519 - val_accuracy: 0.4097 - val_loss: 1.6256
Epoch 5/600
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.5300 - loss: 1.2619 - val_accuracy: 0.4427 - val_loss: 1.5726
Epoch 6/600
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.5406 - loss: 1.2255 - val_accuracy: 0.4453 - val_loss: 1.4306
Epoch 7/600
49/49 ━━━━━━━━━━━━━━━

In [None]:
# Tune the batch size
best_acc = 0
best_batch = None
for batch in [16, 32]:
  # Rebuild the model each time so every batch size starts from fresh weights
  model = models.Sequential()
  model.add(layers.Dense(300, activation='relu', input_shape=(153,)))
  model.add(layers.Dense(8, activation='softmax'))
  model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
  model.fit(train_data, train_labels, epochs=1000, batch_size=batch, validation_split=0.2)
  test_loss, test_acc = model.evaluate(test_data, test_labels)
  if best_acc < test_acc:
    best_acc = test_acc
    best_batch = batch
print(f"Best model achieved with batch size {best_batch} with test accuracy: {best_acc}")

Epoch 1/1000

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

98/98 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.1736 - loss: 13.3571 - val_accuracy: 0.2646 - val_loss: 2.5043
Epoch 2/1000
98/98 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.2814 - loss: 2.6380 - val_accuracy: 0.3130 - val_loss: 3.1744
Epoch 3/1000
98/98 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3186 - loss: 2.5653 - val_accuracy: 0.3079 - val_loss: 2.6919
Epoch 4/1000
98/98 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3218 - loss: 2.5066 - val_accuracy: 0.2494 - val_loss: 2.4925
Epoch 5/1000
98/98 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3535 - loss: 2.1340 - val_accuracy: 0.3893 - val_loss: 2.0811
Epoch 6/1000
98/98 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3783 - loss: 2.4033 - val_accuracy: 0.3257 - val_loss: 2.3861
Epoch 7/1000
98/98 ━━━━━━━━

In [18]:
# Evaluate on the testing data and print out the accuracy
test_loss, test_acc = model.evaluate(test_data, test_labels)
print('Test accuracy:', test_acc)

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7436 - loss: 0.9147
Test accuracy: 0.7352342009544373


Now comes the fun learning part!😊 \
Here's a cool new step for you: after making your predictions, go ahead and print the classification report. It gives you a deeper insight into how well your model performs on each class by listing the precision, recall, F1-score, and support for every emotion, so you can see which emotions your model handles well and which ones it confuses!

In [21]:
# Print the classification report for the model
from sklearn.metrics import classification_report
y_pred = model.predict(test_data)
y_pred = np.argmax(y_pred, axis=1)
y_true = np.argmax(test_labels, axis=1)
print(classification_report(y_true, y_pred))

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
              precision    recall  f1-score   support

           0       0.83      0.93      0.88        69
           1       0.84      0.87      0.85        67
           2       0.68      0.54      0.60        39
           3       0.68      0.70      0.69        81
           4       0.73      0.73      0.73        84
           5       0.63      0.71      0.67        34
           6       0.71      0.71      0.71        77
           7       0.68      0.53      0.59        40

    accuracy                           0.74       491
   macro avg       0.72      0.71      0.71       491
weighted avg       0.73      0.74      0.73       491
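The report above labels the classes 0 to 7. `LabelEncoder` assigns those indices in sorted (alphabetical) order, so the emotion names can be recovered and passed to `classification_report` via `target_names`. Since `load_data` does not return its encoder, this sketch re-fits one on the same eight emotion names; the `y_true` and `y_pred` values are toy data, not the model's predictions.

```python
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# The eight emotion names from extract_label's emotion_map
emotions = ['neutral', 'calm', 'happy', 'sad',
            'angry', 'fearful', 'disgust', 'surprised']

# LabelEncoder stores its classes sorted, which fixes the index-to-name mapping
encoder = LabelEncoder().fit(emotions)
class_names = list(encoder.classes_)
print(class_names)
# ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

# Toy labels and predictions, only to show the call
y_true = encoder.transform(['angry', 'calm', 'sad', 'sad'])
y_pred = encoder.transform(['angry', 'calm', 'sad', 'happy'])
print(classification_report(
    y_true, y_pred,
    labels=list(range(len(class_names))),  # report all eight classes
    target_names=class_names,
    zero_division=0,                       # classes with no samples score 0 silently
))
```

Under this mapping, class 0 in the report is `angry` and class 5 is `neutral`.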



Just for fun, why not explore some cool callbacks you can use in `model.fit()`? \
For resources, check out [Keras Callbacks Documentation](https://keras.io/api/callbacks/) and dive in on your own, or just ask ChatGPT for help! :)

Now, try enhancing your model by using the *EarlyStopping*, *LearningRateScheduler*, and *ModelCheckpoint* callbacks. These will help you control training better, avoid overfitting, and save your model at the best checkpoints. Have fun experimenting!✨
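As a starting point, here is one possible way to wire the three callbacks together. Everything specific in it is an assumption to tune for yourself: the `step_decay` schedule (halving every 20 epochs), the patience of 15, and the `best_model.keras` file name are illustrative choices, and `build_callbacks` is a hypothetical helper, not part of Keras.

```python
def step_decay(epoch, lr):
    """Halve the learning rate every 20 epochs (an illustrative schedule)."""
    if epoch > 0 and epoch % 20 == 0:
        return lr * 0.5
    return lr

def build_callbacks(checkpoint_path="best_model.keras"):
    """Return the three callbacks as a list ready for model.fit(callbacks=...)."""
    # Imported here so step_decay can be tried without TensorFlow installed
    from tensorflow.keras.callbacks import (
        EarlyStopping, LearningRateScheduler, ModelCheckpoint)
    return [
        # Stop when val_loss has not improved for 15 epochs and restore the best weights
        EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True),
        # Called at the start of each epoch with (epoch index, current learning rate)
        LearningRateScheduler(step_decay),
        # Overwrite the saved model only when val_loss improves
        ModelCheckpoint(checkpoint_path, monitor="val_loss", save_best_only=True),
    ]

# Usage with the model and data defined earlier in this notebook:
# model.fit(train_data, train_labels, epochs=600, batch_size=32,
#           validation_split=0.2, callbacks=build_callbacks())
```

With early stopping in place, the large epoch count acts only as an upper bound, because training ends once the validation loss stops improving.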

#### Have fun – you just completed the project!🎉 Great job and keep rocking!