Problem running diarization and returning csv transcriptions #1

RicardoGrayson · 2021-12-09T12:36:39Z

Hi, I'm trying to follow the video on YouTube, and I keep running into this issue when I start processing my WAV files (which I converted to mono). I'm running Python 3.7 and DearPyGui v0.6.415 on Windows, using Google Cloud services:

Uploading C:\Users\Robin\TTS-dataset-tools\sultansupreme-source\22050/sultan_18.wav to google cloud storage bucket
C:\Users\Robin\PycharmProjects\pythonProject\venv\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
  warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "C:/Users/Robin/TTS-dataset-tools/tools.py", line 70, in run_google_speech_call
    builder.diarization(get_value("label_wav_file_transcribe"), get_value("input_storage_bucket"), get_value("input_project_name"))
  File "C:\Users\Robin\TTS-dataset-tools\dataset_builder.py", line 397, in diarization
    info = mediainfo(wavfile)
  File "C:\Users\Robin\PycharmProjects\pythonProject\venv\lib\site-packages\pydub\utils.py", line 334, in mediainfo
    res = Popen(command, stdout=PIPE)
  File "C:\Users\Robin\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Robin\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1155, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Any help would be appreciated. Thanks!
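
Note on the traceback above: the RuntimeWarning says pydub could not find ffprobe or avprobe, and the FileNotFoundError comes from the Popen call inside mediainfo, which is consistent with the probe executable not being on PATH. A minimal sketch for checking this from the same Python environment follows; the helper name find_probe_tool is hypothetical, and the sketch only reports what is on PATH without changing anything:

```python
import shutil


def find_probe_tool():
    """Return the full path of ffprobe or avprobe if either is on PATH, else None.

    Hypothetical helper for diagnosis only; it is not part of pydub or
    TTS-dataset-tools.
    """
    for name in ("ffprobe", "avprobe"):
        path = shutil.which(name)  # searches PATH (and PATHEXT on Windows)
        if path:
            return path
    return None


if __name__ == "__main__":
    probe = find_probe_tool()
    if probe is None:
        print("Neither ffprobe nor avprobe was found on PATH.")
    else:
        print("Found probe tool at:", probe)
```

If this reports that nothing was found, installing FFmpeg and adding its bin folder to PATH (then restarting the terminal or IDE) is the usual remedy, though whether that was the cause here is an assumption based on the warning text.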


RicardoGrayson · 2021-12-10T19:03:04Z

So I got the diarization to work, but after it splits all the audio files and starts the transcription process, it crashes with:
Traceback (most recent call last):
  File "tools.py", line 79, in run_dataset_builder_call
    builder.build_dataset()
  File "C:\Users\Robin\TTS-dataset-tools\dataset_builder.py", line 203, in build_dataset
    text = text.replace("%", " percent")
UnboundLocalError: local variable 'text' referenced before assignment

I don't know how to assign the 'text' local variable in dataset_builder.py without conflicting with Google Cloud Speech-to-Text.
Any help is appreciated!
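
Note on the UnboundLocalError above: this error typically means the name was only assigned inside a loop or conditional that never ran, for example if a clip came back with no transcription results. The actual code around line 203 of dataset_builder.py is not shown in this thread, so the following is only a sketch of the general pattern; clean_transcript and its results argument are hypothetical stand-ins, not the tool's real code:

```python
def clean_transcript(results):
    """Hypothetical stand-in for the per-clip logic in build_dataset().

    `results` is assumed to be a list of transcript strings for one clip;
    it may be empty if nothing was recognized.
    """
    text = ""  # bind the name up front so it exists even if the loop never runs
    for result in results:
        text = result  # keep the last transcript, as an illustration

    if not text:
        return None  # nothing recognized; the caller can skip this clip

    return text.replace("%", " percent")


if __name__ == "__main__":
    print(clean_transcript(["sales rose 5%"]))
    print(clean_transcript([]))
```

Initializing the variable before the loop and skipping empty results avoids the crash without touching the Speech-to-Text call itself.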
