## Transcribing Dysarthric Speakers with Regular ASR Models

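This notebook shows the results of using a regular ASR model, Whisper `small.en`, to transcribe the speech of a dysarthric speaker (M02 in the TORGO data loaded below). Each transcription is printed next to the ground-truth prompt for that recording.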

In [1]:
import torch
import whisper
import json
from pathlib import Path
import numpy as np
import os
from model_loader import load_model_simple, WhisperModelLoader
import wave

In [2]:
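def get_wav_length(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, 'r') as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        # Duration = number of frames / frames per second.
        return frames / float(rate)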

In [3]:
model = whisper.load_model("small.en")


In [4]:
audio_dir_path = os.path.expanduser("~/torgo_data/M02/Session1/wav_headMic")
second_path = os.path.expanduser("~/torgo_data/M02/Session1/wav_headMic")
file_lists = [f for f in os.listdir(audio_dir_path) if os.path.isfile(os.path.join(audio_dir_path, f))]
second_list = [f for f in os.listdir(second_path) if os.path.isfile(os.path.join(second_path, f))]

In [26]:
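def infer(audio_dir_path, audio_file, prompt_path):
    """Transcribe one recording and print it next to its ground-truth prompt."""
    # Without a prompt directory there is no ground truth to compare against.
    if prompt_path is None:
        return

    print(audio_file)
    # Each prompt file shares the recording's stem, e.g. 0013.wav -> 0013.txt.
    txt_name = Path(audio_file).with_suffix(".txt")

    result = model.transcribe(os.path.join(audio_dir_path, audio_file))
    # Read the ground-truth prompt for this recording.
    with open(os.path.join(prompt_path, txt_name)) as prompt_file:
        contents = prompt_file.read()
    print(f"Transcribed: {result['text']}, Ground Truth: {contents}")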
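
Not every prompt file read by `infer` holds text that the speaker reads aloud: in this notebook's output, one ground truth is an image path (`input/images/kitchen.jpg`) and another is a bracketed instruction. A minimal sketch of a filter for such prompts is shown here. The helper name `is_scorable_prompt` and both of its rules are assumptions inferred only from those two examples, not a documented convention of the dataset.

```python
def is_scorable_prompt(text):
    """Return True if the prompt looks like text the speaker reads aloud.

    Hypothetical helper: the rules are inferred from two prompts seen in
    this notebook's output and may not cover every case in the data.
    """
    stripped = text.strip()
    if not stripped:
        return False
    # Bracketed prompts appear to be instructions, not spoken text.
    if stripped.startswith("[") and stripped.endswith("]"):
        return False
    # Picture-description prompts store an image path instead of a sentence.
    if stripped.lower().endswith((".jpg", ".jpeg", ".png")):
        return False
    return True
```

Inside `infer`, a check such as `if not is_scorable_prompt(contents): return` would skip these recordings before any comparison is made.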

In [27]:
prompt_path = os.path.expanduser("~/torgo_data/M02/Session1/prompts")

for index, f in enumerate(file_lists):
    print(f"index: {index}")
    infer(audio_dir_path, f, prompt_path)

index: 0
0013.wav
Transcribed:  I like the watch and dishes and the guy putting away dishes and falling off his stool., Ground Truth: input/images/kitchen.jpg
index: 1
0137.wav
Transcribed:  Everything went real smooth, the sheriff said., Ground Truth: Everything went real smooth, the sheriff said. 
index: 2
0220.wav
Transcribed:  feel, Ground Truth: fear
index: 3
0130.wav
Transcribed:  I try to tell people in the community., Ground Truth: I tried to tell people in the community.
index: 4
0227.wav
Transcribed:  much, Ground Truth: much
index: 5
0068.wav
Transcribed:  Immortal, Ground Truth: floor
index: 6
0014.wav
Transcribed:  very good, Ground Truth: dagger
index: 7
0229.wav
Transcribed: , Ground Truth: [relax your mouth in its normal position]
index: 8
0066.wav
Transcribed:  Yet he still thinks that's as ugly as ever., Ground Truth: yet he still thinks as swiftly as ever.
index: 9
0142.wav
Transcribed:  above, Ground Truth: whoop
index: 10
0193.wav
Transcribed:  Go!, Ground Truth: t
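
The printed pairs above can also be compared numerically. The following is a minimal sketch of a word error rate (WER) computation, assuming a simple normalization (lowercasing and dropping punctuation except apostrophes) that this notebook does not itself specify. The names `normalize` and `word_error_rate` are hypothetical and not part of Whisper. Prompts that are not spoken text, such as the image path at index 0, would need to be skipped before scoring.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference contains no words")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Two pairs taken from the output above (indices 3 and 2).
print(word_error_rate("I tried to tell people in the community.",
                      " I try to tell people in the community."))  # 0.125
print(word_error_rate("fear", " feel"))  # 1.0
```

For single-word prompts the score is either 0 or at least 1, so averaging WER across isolated words and full sentences mixes two quite different tasks; reporting them separately may be more informative.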