# Speech Emotion Detection
In this project, I implemented a Speech Emotion Recognition (SER) system that classifies emotions from human speech. The system processes audio files by extracting important features such as MFCCs (Mel Frequency Cepstral Coefficients), which are commonly used for analyzing human speech.

Using these features, I trained a Support Vector Machine (SVM) classifier to identify four emotions in speech: happiness, sadness, anger, and neutral. The project uses the Python libraries librosa for feature extraction and scikit-learn for training and evaluating the model.



---
The following cell mounts Google Drive in Colab so that the notebook can read the dataset stored on the drive.


In [6]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Set the path to the Google Drive folder that contains the .wav audio files.

In [10]:
# Set dataset path to the 'wav' folder
DATASET_PATH = '/content/drive/My Drive/Datasets/Speech/emoDB/wav'


In [11]:
import os

# Check if the dataset path contains files
print("Files in dataset directory:", os.listdir(DATASET_PATH))


Files in dataset directory: ['03a07Nc.wav', '03b01Fa.wav', '03a01Fa.wav', '03a05Nd.wav', '03a05Tc.wav', '03a07Fb.wav', '03a07Wc.wav', '03a07Fa.wav', '03a02Wb.wav', '03a01Nc.wav', '03a02Wc.wav', '03a04Ad.wav', '03a01Wa.wav', '03a02Nc.wav', '03a04Fd.wav', '03a05Aa.wav', '03a02Ta.wav', '03a04Ta.wav', '03b01Lb.wav', '03a04Nc.wav', '03a05Wb.wav', '03a02Fc.wav', '03a05Wa.wav', '03a04Lc.wav', '03a04Wc.wav', '03a07La.wav', '03a05Fc.wav', '03b01Nb.wav', '03b03Tc.wav', '03b10Na.wav', '08a01Na.wav', '03b10Wc.wav', '08a04Nc.wav', '08a04Tb.wav', '08a02Ab.wav', '03b09Nc.wav', '03b02Wb.wav', '08a02Wc.wav', '08a01Fd.wav', '03b09La.wav', '08a04La.wav', '03b02La.wav', '08a02La.wav', '08a01Wa.wav', '08a04Wc.wav', '03b10Wb.wav', '08a01Ab.wav', '08a02Ac.wav', '03b03Wc.wav', '03b10Ab.wav', '03b01Wa.wav', '08a02Fe.wav', '08a02Na.wav', '08a01Lc.wav', '08a02Tb.wav', '08a01Wc.wav', '03b02Aa.wav', '03b01Wc.wav', '03b09Tc.wav', '03b10Nc.wav', '03b02Tb.wav', '03b09Wa.wav', '03b03Nb.wav', '08a04Ff.wav', '03b10Ec.wa

In [12]:
!pip install librosa soundfile numpy scikit-learn




# Import Libraries

In [13]:
import os
import glob
import numpy as np
import librosa
import soundfile as sf
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report


# Feature Extraction Function
This function reads an audio file and extracts three kinds of features: MFCCs (Mel Frequency Cepstral Coefficients), chroma, and the mel spectrogram. Each feature is averaged over time, and the averages are concatenated into a single fixed-length vector per file, which is the representation the classifier is trained on.

In [14]:
def extract_feature(file_name, mfcc=True, chroma=True, mel=True):
    """Return one feature vector per file, or None if the file cannot be processed."""
    try:
        with sf.SoundFile(file_name) as sound_file:
            X = sound_file.read(dtype="float32")
            sample_rate = sound_file.samplerate
            result = np.array([])

            if mfcc:
                # 40 MFCCs per frame, averaged over all frames
                mfccs = librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40)
                result = np.hstack((result, np.mean(mfccs.T, axis=0)))
            if chroma:
                # Chroma is computed from the magnitude of the STFT
                stft = np.abs(librosa.stft(X))
                chroma_feat = librosa.feature.chroma_stft(S=stft, sr=sample_rate)
                result = np.hstack((result, np.mean(chroma_feat.T, axis=0)))
            if mel:
                # Mel spectrogram bands, averaged over all frames
                mel_spec = librosa.feature.melspectrogram(y=X, sr=sample_rate)
                result = np.hstack((result, np.mean(mel_spec.T, axis=0)))
            return result
    except Exception as e:
        # Report the failure and let the caller skip this file
        print(f"Error processing {file_name}: {e}")
        return None
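
To make the averaging step concrete, here is a minimal sketch that uses random arrays in place of real librosa output. The shapes assume the settings used above: 40 MFCCs (from n_mfcc=40) plus librosa's default 12 chroma bins and 128 mel bands. The frame count of 93 and the helper name pool_and_stack are arbitrary and purely illustrative.

```python
import numpy as np

def pool_and_stack(feature_matrices):
    """Average each (n_features, n_frames) matrix over time and concatenate."""
    return np.hstack([np.mean(m.T, axis=0) for m in feature_matrices])

rng = np.random.default_rng(0)
n_frames = 93  # arbitrary; in practice this depends on the clip length

# Stand-ins for the three feature matrices (rows = features, columns = frames)
mfccs = rng.normal(size=(40, n_frames))
chroma = rng.random(size=(12, n_frames))
mel = rng.random(size=(128, n_frames))

vector = pool_and_stack([mfccs, chroma, mel])
print(vector.shape)  # (180,)
```

Under these assumptions every file, whatever its duration, is reduced to a vector of 40 + 12 + 128 = 180 values, which is what lets clips of different lengths share one feature matrix.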


# Define Emotion Mapping and Load Data

The load_data() function finds the audio files, reads each file's emotion label from the sixth character of its filename, keeps only the emotions listed in observed_emotions, and extracts features with extract_feature(). It then splits the data into training and testing sets.

In [15]:
# Define emotions based on the filename convention
emotions = {
    'W': 'anger',
    'L': 'boredom',
    'E': 'disgust',
    'F': 'happiness',
    'N': 'neutral',
    'T': 'sadness',
    'A': 'anxiety'
}

# Only load specific emotions for simplicity
observed_emotions = ['happiness', 'sadness', 'neutral', 'anger']

# Function to load data
def load_data(test_size=0.2):
    x, y = [], []
    files = glob.glob(os.path.join(DATASET_PATH, "*.wav"))
    print(f"Number of audio files found: {len(files)}")  # Debugging output

    for file in files:
        file_name = os.path.basename(file)
        emotion = emotions.get(file_name[5], None)  # Extract emotion from the filename
        if emotion not in observed_emotions:
            continue
        feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
        if feature is not None:
            x.append(feature)
            y.append(emotion)

    print(f"Number of samples loaded: {len(x)}")  # Debugging output
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)

# Load the dataset
x_train, x_test, y_train, y_test = load_data(test_size=0.25)


Number of audio files found: 535
Number of samples loaded: 339
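
The filename-to-label step can be looked at in isolation. The helper label_from_filename below is hypothetical (it is not part of the notebook) and assumes the same convention as the code above, in which the sixth character of the filename is the emotion code. It is applied to a few filenames taken from the directory listing printed earlier.

```python
from collections import Counter

EMOTIONS = {
    'W': 'anger', 'L': 'boredom', 'E': 'disgust', 'F': 'happiness',
    'N': 'neutral', 'T': 'sadness', 'A': 'anxiety',
}
OBSERVED = {'happiness', 'sadness', 'neutral', 'anger'}

def label_from_filename(file_name):
    """Return the emotion label, or None if the code is unknown or not observed."""
    if len(file_name) < 6:
        return None
    emotion = EMOTIONS.get(file_name[5])
    return emotion if emotion in OBSERVED else None

sample_files = ['03a01Wa.wav', '03a01Fa.wav', '03a07Nc.wav',
                '03a05Tc.wav', '03a04Ad.wav', '03b01Lb.wav']
labels = [label_from_filename(f) for f in sample_files]
print(labels)
# ['anger', 'happiness', 'neutral', 'sadness', None, None]

# Class counts among the files that were kept
counts = Counter(label for label in labels if label is not None)
print(counts)
```

Files labeled boredom, disgust, or anxiety are skipped in this way, which accounts for the gap between the 535 files found and the 339 samples loaded, assuming no file failed feature extraction. Counting labels like this is also a quick way to see how balanced the remaining classes are.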


# Train SVM Classifier

This section initializes and trains a Support Vector Machine (SVM) model on the training data, then evaluates its performance on the test data. It calculates the accuracy and prints a classification report.

In [16]:
# Initialize SVM model
model = SVC(kernel='linear', probability=True)

# Train the model
model.fit(x_train, y_train)

# Predict on the test set
y_pred = model.predict(x_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

# Print the detailed classification report
print(classification_report(y_test, y_pred))


Model Accuracy: 76.47%
              precision    recall  f1-score   support

       anger       0.75      0.84      0.79        32
   happiness       0.65      0.55      0.59        20
     neutral       0.75      0.94      0.83        16
     sadness       1.00      0.71      0.83        17

    accuracy                           0.76        85
   macro avg       0.79      0.76      0.76        85
weighted avg       0.78      0.76      0.76        85
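
As a consistency check on the report, accuracy equals the support-weighted average of the per-class recall. The sketch below recomputes it from the recall and support values printed above. Because the printed recalls are rounded to two decimals, the number of correct predictions per class is recovered by rounding to the nearest integer, which is an assumption of this check.

```python
# (recall, support) per class, copied from the classification report
report = {
    'anger': (0.84, 32),
    'happiness': (0.55, 20),
    'neutral': (0.94, 16),
    'sadness': (0.71, 17),
}

# Recall = correct / support, so correct is approximately recall * support
correct = {label: round(recall * support)
           for label, (recall, support) in report.items()}
total = sum(support for _, support in report.values())
accuracy = sum(correct.values()) / total

print(correct)  # {'anger': 27, 'happiness': 11, 'neutral': 15, 'sadness': 12}
print(f"{sum(correct.values())}/{total} = {accuracy * 100:.2f}%")  # 65/85 = 76.47%
```

The breakdown shows where the errors concentrate: happiness has the lowest recall, with roughly 9 of its 20 test samples misclassified.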



# Test with a New Audio File

In [18]:
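# Path to an audio file to test. This example comes from the dataset itself,
# so the model may already have seen it during training.
new_audio_file = '/content/drive/My Drive/Datasets/Speech/emoDB/wav/03a01Wa.wav'  # Example file

# Extract features from the audio file (extract_feature returns None on failure)
features = extract_feature(new_audio_file, mfcc=True, chroma=True, mel=True)
if features is None:
    raise ValueError(f"Could not extract features from {new_audio_file}")
features = features.reshape(1, -1)  # One sample, shaped (1, n_features)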

# Predict the emotion for the new audio file
predicted_emotion = model.predict(features)

# Output the predicted emotion
print(f"The predicted emotion for the test audio is: {predicted_emotion[0]}")


The predicted emotion for the test audio is: anger
