"Incorporating emotions into computer programs is crucial for creating human-like behaviors and appearances. Speech Emotion Recognition (SER) is the task of identifying emotions from the human voice. This research compares five different SER models for automated emotion recognition in natural spoken communication."
DATASET
We used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS).
The audio-only speech portion used here contains 1440 files: 60 trials per actor × 24 actors = 1440 trials. The recordings come from 24 professional actors (12 female, 12 male).
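The figure of 60 trials per actor can be reconstructed from the recording design. This derivation assumes the design described in the public RAVDESS documentation (not stated in this report): neutral speech is recorded at normal intensity only, each of the other seven emotions at both normal and strong intensity, and every combination is spoken with two statements and two repetitions:

```latex
\underbrace{1 \cdot 2 \cdot 2}_{\text{neutral}}
+ \underbrace{7 \cdot 2 \cdot 2 \cdot 2}_{\text{other seven emotions}}
= 4 + 56 = 60 \ \text{trials/actor},
\qquad 60 \times 24 = 1440 \ \text{files}.
```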
The speech recordings cover seven emotional expressions (happy, sad, angry, fearful, calm, disgust, and surprise) plus a neutral expression.
Each of the 1440 files has a unique filename consisting of a 7-part numerical identifier (e.g., 03-02-05-01-02-02-11.wav).
The third field of the identifier encodes the emotion: 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised. In the example above, 05 marks an angry recording.
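Because the emotion is encoded in the filename, training labels can be read straight from the file list. The sketch below is a minimal illustration, assuming the field order given in the public RAVDESS documentation (modality, vocal channel, emotion, intensity, statement, repetition, actor); the function name `parse_ravdess_filename` is hypothetical.

```python
from pathlib import Path

# Emotion codes from the third field of the RAVDESS identifier.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

# Field order assumed from the public RAVDESS documentation.
FIELDS = ["modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor"]


def parse_ravdess_filename(filename: str) -> dict:
    """Split a RAVDESS filename into its 7 numeric fields and decode the emotion."""
    parts = Path(filename).stem.split("-")  # drop directories and the extension
    if len(parts) != 7 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a 7-part RAVDESS identifier: {filename!r}")
    record = dict(zip(FIELDS, parts))
    record["emotion_label"] = EMOTIONS[record["emotion"]]
    record["actor"] = int(record["actor"])  # actors are numbered 01 to 24
    return record


print(parse_ravdess_filename("03-02-05-01-02-02-11.wav"))
```

Returning the raw two-digit codes alongside the decoded label keeps the other fields available for filtering, for example selecting speech-only files or grouping by actor for a speaker-independent train/test split.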
MODEL ARCHITECTURE
Speech signal → Feature extraction → Feature selection → CNN model → Classification
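The five pipeline stages can be sketched end to end in NumPy. This is a minimal, untrained illustration under stated assumptions, not the model evaluated in this report: the features (log power in 20 equal-width FFT bands), the variance-based selection rule, the 400-sample frames with a 160-sample hop (25 ms / 10 ms at an assumed 16 kHz sampling rate), and the single-layer CNN with 16 filters of width 5 are all hypothetical choices. The weights are random, so the predicted label carries no meaning until the network is trained.

```python
import numpy as np

# Emotion labels in RAVDESS code order (01 = neutral ... 08 = surprised).
EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]


def extract_features(signal, frame_len=400, hop=160, n_bands=20):
    """Feature extraction (hypothetical): log power in n_bands equal-width FFT bands per frame."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)  # group FFT bins into bands
    band_power = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(band_power + 1e-10)  # shape (n_frames, n_bands)


def fit_selector(feature_matrices, k=12):
    """Feature selection (hypothetical rule): indices of the k highest-variance features.
    Fit on training utterances only, then reuse the same indices for every utterance."""
    stacked = np.concatenate(feature_matrices, axis=0)
    return np.sort(np.argsort(stacked.var(axis=0))[-k:])


def select_features(features, keep):
    return features[:, keep]


def init_params(n_features, n_filters=16, width=5, n_classes=len(EMOTIONS), seed=0):
    """Random (untrained) weights for a single-conv-layer 1-D CNN."""
    rng = np.random.default_rng(seed)
    return (rng.normal(0.0, 0.1, (width, n_features, n_filters)), np.zeros(n_filters),
            rng.normal(0.0, 0.1, (n_filters, n_classes)), np.zeros(n_classes))


def cnn_forward(x, params):
    """1-D convolution over time -> ReLU -> global average pooling -> dense -> softmax."""
    w_conv, b_conv, w_dense, b_dense = params
    width = w_conv.shape[0]
    windows = np.stack([x[t:t + width] for t in range(len(x) - width + 1)])  # (T', width, F)
    conv = np.einsum("twf,wfc->tc", windows, w_conv) + b_conv  # (T', n_filters)
    pooled = np.maximum(conv, 0.0).mean(axis=0)  # ReLU, then average over time
    logits = pooled @ w_dense + b_dense
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


def classify(probs):
    """Classification: pick the most probable emotion."""
    return EMOTIONS[int(np.argmax(probs))]


# Usage on a synthetic stand-in for a 1 s clip at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
signal = rng.normal(size=16000)
feats = extract_features(signal)      # (98, 20)
keep = fit_selector([feats], k=12)    # in practice: fit on the training set
x = select_features(feats, keep)      # (98, 12)
probs = cnn_forward(x, init_params(x.shape[1]))
print(classify(probs), probs.round(3))
```

Global average pooling makes the classifier independent of clip length, which matters because RAVDESS recordings vary in duration. Swapping in MFCCs (for example via librosa) and a deeper, trained CNN would bring the sketch closer to a practical SER model while keeping the same stage boundaries.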