Whisper Speedbench

This Python script uses ffmpeg and OpenAI's Whisper API to benchmark how transcription accuracy changes with playback speed. It speeds up an audio file, transcribes each sped-up version, and calculates the Word Error Rate (WER) at speeds from 1.1x to 4x. The transcript at normal 1.0x speed serves as the baseline for comparison.
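For readers unfamiliar with WER, the following is a minimal sketch of the metric, not necessarily how benchmark.py computes it. It counts word-level substitutions, insertions, and deletions between the 1.0x baseline transcript and a sped-up transcript, then divides by the number of baseline words. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Rolling-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0.0 means the sped-up transcript matches the baseline word for word. Values can exceed 1.0 when the hypothesis contains many inserted words.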

Setup

Environment Setup:
- Copy example.env to .env in the root directory and update it with your OpenAI API key:
```
cp example.env .env
# Open .env and replace your_openai_api_key_here with your actual OpenAI API key
```
- Ensure the .env file is not tracked by Git (included in .gitignore).

Install Dependencies:
```
pip install -r requirements.txt
```
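To confirm that the .env file is readable before running the benchmark, a small standard-library check like the one below may help. It is only a sketch: the helper `read_env` is hypothetical, the variable name `OPENAI_API_KEY` is an assumption (use whatever name example.env defines), and benchmark.py may load the file differently.

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env = read_env()
    # OPENAI_API_KEY is an assumed variable name, see the note above.
    print("API key found" if env.get("OPENAI_API_KEY") else "API key missing")
```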

Usage

Execute the script by running:
```
python benchmark.py path_to_your_audio_file
```

Optionally, specify the language code if the audio is not in English:
```
python benchmark.py path_to_your_audio_file --language es
```

This script converts the specified audio file to MP3 format, processes it at speeds ranging from 1.1x to 4x, transcribes the audio using the Whisper API, computes the WER for each speed, and outputs the results to a CSV file named benchmark_results.csv.
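The speed-up step can be sketched with ffmpeg's `atempo` audio filter. This is an illustration under stated assumptions, not a copy of benchmark.py. `atempo` blends samples cleanly only for factors up to 2.0 per instance, so higher speeds are commonly reached by chaining several instances whose factors multiply to the target. The helper names `atempo_chain` and `build_ffmpeg_command` are hypothetical, and the sketch only handles speeds of 0.5 or more.

```python
import subprocess


def atempo_chain(speed: float) -> str:
    """Build an atempo filter chain whose factors multiply to `speed`."""
    if speed < 0.5:
        raise ValueError("this sketch only handles speeds >= 0.5")
    factors = []
    remaining = speed
    # Peel off factors of 2.0 until the remainder fits in one instance.
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    factors.append(remaining)
    return ",".join(f"atempo={f:g}" for f in factors)


def build_ffmpeg_command(src: str, dst: str, speed: float) -> list:
    """Return the ffmpeg argument list for writing a sped-up copy of src."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(speed), dst]


if __name__ == "__main__":
    # Example file names are placeholders.
    subprocess.run(build_ffmpeg_command("input.mp3", "input_3x.mp3", 3.0), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.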

License

This project is open source and licensed under the MIT License.
