# Speech Emotion Recognition with Python

The objective of this project is to build an automated emotion recognition system for audio and speech files. Possible uses include:
1. Customer care call centres can analyse their calls and record the emotion in each one, so that customers' level of satisfaction can be predicted from those emotions.
2. Order lines can identify whether an order is urgent based on the emotion in the caller's voice.
3. Police can automatically differentiate between hoax callers and genuine callers who need urgent help.

In this project, we will use the libraries librosa, soundfile, and sklearn (among others) to build a model using an MLPClassifier. This will be able to recognize emotion from sound files. We will load the data, extract features from it, then split the dataset into training and testing sets. Then, we’ll initialize an MLPClassifier and train the model. Finally, we’ll calculate the accuracy of our model.

### The Dataset
For this Python mini project, we’ll use the RAVDESS dataset, the Ryerson Audio-Visual Database of Emotional Speech and Song. It contains 7356 files, each rated 10 times on emotional validity, intensity, and genuineness, with ratings provided by 247 individuals. The entire dataset is 24.8GB from 24 actors, but we’ve lowered the sample rate on all the files.

In [1]:
!pip install librosa soundfile

pyaudio is not needed for this project: none of the code below imports it, and on Windows building it from source requires Microsoft Visual C++ 14.0.


In [2]:
import librosa
import soundfile
import os, glob, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

In [3]:
#Extract features (mfcc, chroma, mel) from a sound file
def extract_feature(file_name, mfcc, chroma, mel):
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
    result = np.array([])
    if mfcc:
        #40 MFCCs, averaged over all time frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        #Chroma features from the magnitude spectrogram, averaged over time
        stft = np.abs(librosa.stft(X))
        chroma_mean = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_mean))
    if mel:
        #Mel spectrogram bands, averaged over time (recent librosa versions require y= as a keyword)
        mel_mean = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_mean))
    return result

1) Extract as many features as possible with librosa. The more features you extract, the more information is available to build a stronger feature set.
2) Look for tutorials on how to extract features with librosa.

In [4]:
#Create a dictionary mapping RAVDESS emotion codes to emotion names
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}

#Create a list of Emotions to observe
observed_emotions=['calm', 'happy', 'fearful', 'disgust']

The emotion is encoded in the file name. The file name consists of elements separated by '-', and the third element is the emotion code. In the next cell, we split the file name and extract that code.
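
As a minimal sketch of that lookup, the snippet below splits one file name and maps its third element to an emotion. The file name is only an illustrative example of the RAVDESS naming pattern, and `emotion_from_filename` is a hypothetical helper, not part of the notebook.

```python
import os

# Same code-to-emotion mapping as the notebook's `emotions` dictionary
emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}

def emotion_from_filename(path):
    """Return the emotion named by the third '-'-separated element."""
    file_name = os.path.basename(path)       # drop the Actor_* folder part
    emotion_code = file_name.split("-")[2]   # third element, e.g. '06'
    return emotions[emotion_code]

example = "Actor_12/03-01-06-01-02-01-12.wav"  # illustrative file name (assumption)
print(emotion_from_filename(example))  # fearful
```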

In [5]:
def load_data(test_size=0.2):
    x,y=[],[]
    for file in glob.glob("C:\\Users\\oadepoju003\\Desktop\\Data science projects\\SER project\\Actor_*\\*.wav"):
        file_name=os.path.basename(file)
        emotion=emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature=extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)

Time to split the dataset into training and testing sets! Let’s hold out 25% of the data as the test set, using the load_data function.

In [6]:
#Split the dataset
x_train,x_test,y_train,y_test=load_data(test_size=0.25)


In [7]:
#Get the number of samples in the training and testing sets
print((x_train.shape[0], x_test.shape[0]))

(576, 192)
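
The two counts follow directly from the 25% split of the 768 clips that passed the emotion filter. A quick check is sketched below, assuming scikit-learn rounds the test share up and gives the remainder to training; `split_counts` is a hypothetical helper used only for this illustration.

```python
import math

def split_counts(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # assumed rounding rule
    n_train = n_samples - n_test
    return n_train, n_test

n_samples = 576 + 192  # total clips for the four observed emotions
print(split_counts(n_samples, 0.25))  # (576, 192)
```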


In [8]:
#Get the number of features extracted
print(f'Features extracted: {x_train.shape[1]}')

Features extracted: 180
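
The 180 features are the three time-averaged vectors stacked end to end. The rough tally below assumes librosa's default sizes of 12 chroma bins and 128 mel bands; the 40 MFCCs come from `n_mfcc=40` in `extract_feature`. `mean_over_time` is a hypothetical pure-Python stand-in for the `np.mean(..., axis=0)` step, shown on made-up numbers.

```python
# Length of each time-averaged feature vector
feature_sizes = {
    "mfcc": 40,    # n_mfcc=40 in extract_feature
    "chroma": 12,  # assumed librosa default (n_chroma=12)
    "mel": 128,    # assumed librosa default (n_mels=128)
}

def mean_over_time(frames):
    """Average a list of per-frame feature lists into one vector."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]

# Two made-up frames of three values each, just to show the averaging step
print(mean_over_time([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))  # [2.0, 3.0, 4.0]

total = sum(feature_sizes.values())
print(f"Features extracted: {total}")  # Features extracted: 180
```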


In [9]:
#Initialize the Multi Layer Perceptron Classifier
model=MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08, 
                    hidden_layer_sizes=(300,), learning_rate='adaptive', max_iter=500)

In [10]:
#Train the model
model.fit(x_train,y_train)

MLPClassifier(alpha=0.01, batch_size=256, hidden_layer_sizes=(300,),
              learning_rate='adaptive', max_iter=500)

Let’s predict the emotions for the test set. This gives us y_pred, the predicted emotion for each feature vector in the test set.

In [11]:
#Predict for the test set
y_pred=model.predict(x_test)

In [12]:
#Calculate the accuracy of our model
accuracy=accuracy_score(y_true=y_test, y_pred=y_pred)

#Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))

Accuracy: 75.52%
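
Accuracy here is the share of test clips whose predicted emotion matches the true label. Below is a minimal pure-Python sketch of the same calculation, using made-up labels rather than the notebook's data; `manual_accuracy` is a hypothetical helper.

```python
def manual_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration only
y_true_demo = ["calm", "happy", "fearful", "disgust"]
y_pred_demo = ["calm", "happy", "disgust", "disgust"]
print("Accuracy: {:.2f}%".format(manual_accuracy(y_true_demo, y_pred_demo) * 100))  # Accuracy: 75.00%

# With 192 test clips, 145 correct predictions give the figure reported above
print("Accuracy: {:.2f}%".format(145 / 192 * 100))  # Accuracy: 75.52%
```

A single accuracy figure hides which emotions are being confused with each other, so it is worth inspecting per-emotion results as well.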


In this Python mini project, we learned to recognize emotions from speech. We used an MLPClassifier for this, with the soundfile library to read the sound files and the librosa library to extract features from them. As shown above, the model delivered an accuracy of 75.52% on the test set. That’s good enough for us for now.