# Fine-tuning Whisper on Speech Pathology Dataset

## Goal

The goal of the Cleft Palate project (name TBD) at Vanderbilt DSI is to classify audio clips of patients' voices as containing hypernasality (a speech impediment) or not. The patients with hypernasality can then be recommended for speech pathology intervention. This is currently evaluated by human speech pathologists, which requires access to these medical providers. Our hope is to train a model that can classify this speech impediment for expedited patient access to a speech pathologist.

## Model

This notebook trains a Support Vector Machine (SMV) and Random Forest (RF) model to act as a baseline for the LLM training.

## Data

The data in this notebook is publicly available voice recordings featuring hypernasality and control groups. In the future we hope to train our model on private patient data from Vanderbilt University Medical Center (VUMC).

In [None]:
!pip install torch
!pip install datasets
!pip install librosa
!pip install transformers

In [2]:
# import libraries
import datasets
from datasets import load_dataset, DatasetDict,  Audio, load_from_disk
import pandas as pd
import os
import glob
import librosa
import io
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report, accuracy_score
from transformers import WhisperModel, WhisperFeatureExtractor, AdamW
import torch
import torch.nn as nn
import torch.utils.data
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import f1_score, classification_report, accuracy_score

In [3]:
# prompt: mount google drive

from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


### Load data

In [None]:
# load data from disk
train_audio_dataset = load_from_disk('../data/public_samples/train_dataset')
test_audio_dataset = load_from_disk('../data/public_samples/test_dataset')
val_audio_dataset = load_from_disk('../data/public_samples/val_dataset')

### SVM

Support Vector Machine

In [86]:
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler


# Define a function to extract MFCCs from an audio file
def extract_mfcc_features(file_path, n_mfcc=13):
    audio, sample_rate = librosa.load(file_path, sr=None)
    mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=n_mfcc)
    mfccs_scaled = np.mean(mfccs.T, axis=0)  # Taking the average across time
    return mfccs_scaled

# Paths to your audio files (replace these with your actual file paths)
audio_files = train_full_paths + test_full_paths  # Add more paths as needed
labels = train_labels + test_labels  # Corresponding labels for your audio files

# Extract features from each audio file
features = [extract_mfcc_features(file) for file in audio_files]

# Split the dataset into training and testing sets
X_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)

# Standardize features by removing the mean and scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(x_train)
X_test = scaler.transform(x_test)

# Initialize and train the SVM classifier
svm_model = SVC(kernel='linear')  # You can experiment with different kernels
svm_model.fit(x_train, y_train)

# Predictions
y_pred = svm_model.predict(x_val)

# Evaluate the model
print("Accuracy:", accuracy_score(y_val, y_pred))
print("Classification Report:", classification_report(y_val, y_pred))


Accuracy: 0.8717948717948718
Classification Report:               precision    recall  f1-score   support

         0.0       0.89      0.84      0.86        19
         1.0       0.86      0.90      0.88        20

    accuracy                           0.87        39
   macro avg       0.87      0.87      0.87        39
weighted avg       0.87      0.87      0.87        39



### Random Forest


In [88]:
from sklearn.ensemble import RandomForestClassifier
# Initialize and train the Random Forest classifier
rf_model = RandomForestClassifier(n_estimators=100)  # You can adjust the number of trees
rf_model.fit(x_train, y_train)

# Make predictions - VAL
y_pred = rf_model.predict(x_val)

# Evaluate the classifier
print("Accuracy:", accuracy_score(y_val, y_pred))
print("Classification Report:", classification_report(y_val, y_pred))

Accuracy: 0.9230769230769231
Classification Report:               precision    recall  f1-score   support

         0.0       0.94      0.89      0.92        19
         1.0       0.90      0.95      0.93        20

    accuracy                           0.92        39
   macro avg       0.92      0.92      0.92        39
weighted avg       0.92      0.92      0.92        39

