
Whisper fine-tuning event 2022 - script modifications

This is the final setup used to train the best fine-tuned Whisper model in the HuggingFace fine-tuning event 2022.

DeepSpeed

The first modification was to use DeepSpeed to fit a larger batch_size without relying on gradient_accumulation_steps.
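
A minimal sketch of how DeepSpeed can be enabled through the HuggingFace Trainer (the config path, output directory, and batch size below are placeholder assumptions, not the exact values from this repository):

```python
# Minimal sketch: enabling DeepSpeed via HuggingFace training arguments.
# "ds_config.json" and the numbers below are placeholders, not the repo's values.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",  # placeholder output path
    per_device_train_batch_size=64,    # larger batch made possible by DeepSpeed
    gradient_accumulation_steps=1,     # no gradient accumulation needed
    fp16=True,
    deepspeed="ds_config.json",        # path to a DeepSpeed ZeRO config file
)
```

Training is then started with the `deepspeed` launcher instead of plain `python`, e.g. `deepspeed train.py` (script name hypothetical).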

To make it run inside Docker, I used the guide from Zihao's blog post.

Concatenation of input dataset

This idea came from Bayar. Whisper expects 30-second inputs, but Common Voice samples contain only around 3-5 seconds of audio each. By concatenating consecutive audio clips and their transcripts into fewer, denser samples, training should run faster and the model should learn more from each sample. A minimal sketch is shown below.
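
A rough sketch of this packing with `datasets.map` (the field names follow Common Voice conventions and a uniform sampling rate is assumed; the exact event script may differ):

```python
# Sketch only: pack consecutive short clips into up-to-30-second samples.
# Assumes Common Voice-style rows with an "audio" dict ("array",
# "sampling_rate") and a "sentence" transcript, all at the same sampling rate.
import numpy as np

MAX_SECONDS = 30  # Whisper's fixed input window

def concatenate_samples(batch):
    out_audio, out_text = [], []
    cur_audio, cur_text, cur_len = [], [], 0.0
    for audio, text in zip(batch["audio"], batch["sentence"]):
        sr = audio["sampling_rate"]
        seconds = len(audio["array"]) / sr
        # Flush the current pack if adding this clip would exceed 30 seconds.
        if cur_audio and cur_len + seconds > MAX_SECONDS:
            out_audio.append({"array": np.concatenate(cur_audio), "sampling_rate": sr})
            out_text.append(" ".join(cur_text))
            cur_audio, cur_text, cur_len = [], [], 0.0
        cur_audio.append(audio["array"])
        cur_text.append(text)
        cur_len += seconds
    if cur_audio:  # flush the last partial pack
        out_audio.append({"array": np.concatenate(cur_audio), "sampling_rate": sr})
        out_text.append(" ".join(cur_text))
    return {"audio": out_audio, "sentence": out_text}

# dataset = dataset.map(concatenate_samples, batched=True,
#                       remove_columns=dataset.column_names)
```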

Other ideas

Based on details of how the Large v2 model was trained in the Whisper paper, I have some ideas to try in the next steps.

Thanks for the Whisper fine-tuning event 2022!
