# Exploratory Data Analysis for Speech Emotion Recognition

In this notebook, we perform exploratory data analysis (EDA) on the speech emotion recognition dataset. We check how the emotion labels are distributed across recordings and inspect the waveform of a sample audio file.

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import librosa
import librosa.display

# Set the style for seaborn
sns.set(style='whitegrid')

# Define the path to the dataset
data_path = '../data/raw/'  # Adjust the path as necessary

# Load the metadata (assumes metadata.csv exists, with one row per recording and an 'emotion' column)
metadata_file = os.path.join(data_path, 'metadata.csv')
metadata = pd.read_csv(metadata_file)

# Display the first few rows of the metadata
metadata.head()
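Before plotting, it can be worth confirming that the metadata is usable. The sketch below assumes the CSV has a `filename` column with paths relative to `data_path`. That column name is hypothetical, since only `emotion` is used elsewhere in this notebook. The helper `check_metadata` (also a hypothetical name) counts missing labels, duplicated file entries, and referenced files that are not on disk.

```python
import os
import pandas as pd


def check_metadata(df, audio_dir, file_col="filename", label_col="emotion"):
    """Return basic sanity-check counts for a metadata table.

    `file_col` is an assumed column name; adjust it (and `label_col`)
    to match the actual metadata.csv.
    """
    missing_cols = [c for c in (file_col, label_col) if c not in df.columns]
    if missing_cols:
        raise KeyError(f"metadata is missing columns: {missing_cols}")

    # Rows with an empty/NaN emotion label cannot be used for training.
    missing_labels = int(df[label_col].isna().sum())
    # A file listed twice would be double-counted in the distribution plot.
    duplicate_files = int(df[file_col].duplicated().sum())
    # Files referenced in the table but absent from audio_dir.
    missing_files = [
        f for f in df[file_col].dropna()
        if not os.path.isfile(os.path.join(audio_dir, f))
    ]
    return {
        "rows": len(df),
        "missing_labels": missing_labels,
        "duplicate_files": duplicate_files,
        "missing_files": missing_files,
    }
```

In this notebook it would be called as `check_metadata(metadata, data_path)`, with `file_col` set to whatever the real path column is called.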

In [None]:
# Visualize the distribution of emotions
plt.figure(figsize=(10, 6))
sns.countplot(data=metadata, x='emotion')
plt.title('Distribution of Emotions')
plt.xlabel('Emotion')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
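The count plot is easier to act on when it is paired with numbers. The helper `class_balance` below (a hypothetical name, not part of any library) reports each emotion's count and share, together with the ratio of the largest class to the smallest. The toy labels are made up for illustration and are not from the real dataset.

```python
import pandas as pd


def class_balance(labels):
    """Summarise label counts, shares, and the max/min imbalance ratio."""
    counts = pd.Series(labels).value_counts()  # most frequent first; NaN dropped
    summary = pd.DataFrame({
        "count": counts,
        "share": counts / counts.sum(),
    })
    # Largest class divided by smallest class; 1.0 means perfectly balanced.
    ratio = counts.max() / counts.min()
    return summary, float(ratio)


# Toy labels for illustration only.
toy = ["neutral"] * 6 + ["happy"] * 3 + ["angry"] * 3 + ["sad"] * 2
summary, ratio = class_balance(toy)
print(summary)
print(f"imbalance ratio: {ratio:.1f}")
```

On the real data this would be `class_balance(metadata['emotion'])`. A ratio well above 1 suggests considering stratified train/test splits or class weighting later on.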

In [None]:
# Function to visualize a sample audio waveform
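def plot_waveform(audio_file):
    """Plot the waveform of a single audio file."""
    # sr=None keeps the file's native sampling rate instead of resampling to 22050 Hz
    y, sr = librosa.load(audio_file, sr=None)
    plt.figure(figsize=(12, 4))
    librosa.display.waveshow(y, sr=sr)  # requires librosa >= 0.8.1
    plt.title(f'Waveform of {os.path.basename(audio_file)}')
    plt.xlabel('Time (s)')
    plt.ylabel('Amplitude')
    plt.show()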

# Example: Visualize a sample audio file
sample_audio_file = os.path.join(data_path, 'sample.wav')  # Replace with an actual file
plot_waveform(sample_audio_file)
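A waveform plot is a visual check of one file. The same signal can also be summarised numerically, which scales to the whole dataset. The sketch below is a hypothetical helper that uses NumPy only. It reports duration, peak amplitude, and the fraction of frames more than 40 dB below the loudest frame, a rough silence measure. The frame sizes and the threshold are assumptions, not values from this project.

```python
import numpy as np


def waveform_stats(y, sr, frame_length=2048, hop_length=512, silence_db=-40.0):
    """Summarise a mono waveform `y` sampled at `sr` Hz.

    frame_length, hop_length and the -40 dB silence threshold are
    illustrative defaults, not values taken from this project.
    """
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        raise ValueError("empty signal")

    # Frame start positions; a signal shorter than one frame gives one frame.
    starts = range(0, max(len(y) - frame_length, 0) + 1, hop_length)
    rms = np.array([np.sqrt(np.mean(y[s:s + frame_length] ** 2)) for s in starts])

    # A frame counts as silent if its RMS is more than `silence_db`
    # below the loudest frame; an all-zero signal is entirely silent.
    ref = rms.max()
    if ref == 0:
        silent_fraction = 1.0
    else:
        rms_db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)
        silent_fraction = float(np.mean(rms_db < silence_db))

    return {
        "duration_s": len(y) / sr,
        "peak": float(np.max(np.abs(y))),
        "silent_fraction": silent_fraction,
    }


# Synthetic check: 1 s of a 440 Hz tone followed by 1 s of silence.
sr = 16000
t = np.arange(sr) / sr
y = np.concatenate([0.5 * np.sin(2 * np.pi * 440 * t), np.zeros(sr)])
print(waveform_stats(y, sr))
```

For a real file, load it first with `y, sr = librosa.load(sample_audio_file, sr=None)` and then call `waveform_stats(y, sr)`.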

## Conclusion

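In this notebook, we examined how the emotion labels are distributed across the dataset and plotted the waveform of a sample recording. Any class imbalance visible in the count plot should be taken into account when splitting the data and training models. The waveform gives a first look at recording length, loudness, and leading or trailing silence, which can guide our feature extraction and modeling choices.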