An audio transcription toolkit that processes audio files, transcribes them with OpenAI's Whisper model, and formats the resulting text to produce readable transcriptions.
This repository contains tools for automated transcription of audio recordings, with specific optimizations for meeting audio. It solves several common challenges:
- Processing large audio files that exceed API size limits
- Handling long recordings efficiently through parallel processing
- Correcting technical terms and framework names in transcriptions
- Maintaining natural language flow while improving text readability
- Audio compression and format conversion
- Automatic bitrate optimization to meet size constraints
- Concurrent audio transcription for faster processing
- Intelligent correction of technical terms and framework names
- Comprehensive error handling and logging
- transcription.py: Core transcription tool for general audio files
- meet_transcription.py: Specialized tool for meeting recordings with advanced features
- config.ini: Configuration file for API keys and settings (not tracked in git)
- Compression & Conversion: Audio files are compressed and converted to compatible formats (MP3/MP4)
- Size Optimization: Files are automatically optimized to meet API size constraints by adjusting:
  - Bitrate
  - Sample rate
  - Audio channels
- Audio Chunking: Larger files are split into manageable chunks
- Parallel Processing: Chunks are processed concurrently for improved speed
- Text Assembly: Transcribed chunks are assembled in the correct order
- Text Correction: Framework and library names are automatically corrected
- Formatting: Grammar and coherence are improved for better readability
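The size optimization step above comes down to simple arithmetic: encoded size is roughly bitrate times duration. The sketch below illustrates that estimate only. The function name, the 16 to 128 kbps clamp, and the 25 MB figure in the usage note are assumptions for illustration, not values taken from the scripts.

```python
import math


def target_bitrate_kbps(duration_s: float, max_bytes: int,
                        floor_kbps: int = 16, ceiling_kbps: int = 128) -> int:
    """Estimate a bitrate (kbps) aimed at keeping the encoded file under max_bytes.

    The result is clamped to [floor_kbps, ceiling_kbps]. When the floor is
    returned, the file may still be too large and needs to be chunked instead.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # size_bytes ~= bits_per_second * duration_s / 8 (container overhead ignored)
    raw_kbps = (max_bytes * 8) / duration_s / 1000
    return max(floor_kbps, min(ceiling_kbps, math.floor(raw_kbps)))
```

For example, `target_bitrate_kbps(3600, 25 * 1024 * 1024)` estimates the bitrate for a one-hour recording under an assumed 25 MB limit. Reducing the sample rate and channel count, as listed above, lowers the bitrate needed for intelligible speech.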
- Python 3.7+
- OpenAI API key
- Required Python packages (see Installation)
- Clone this repository:
  git clone https://github.com/yourusername/speech-to-text.git
  cd speech-to-text
- Install dependencies:
  pip install openai pydub
- Create a config.ini file with your OpenAI API key:
  [OPENAI]
  api_key = your_openai_api_key_here
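A config.ini in this shape can be read with the standard library's `configparser`. The following is a minimal sketch; the function name `load_api_key` is hypothetical, and the scripts may load the key differently.

```python
import configparser


def load_api_key(path: str = "config.ini") -> str:
    """Read api_key from the [OPENAI] section of an INI file."""
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"could not read {path}")
    try:
        key = parser["OPENAI"]["api_key"].strip()
    except KeyError as exc:
        raise KeyError("expected an [OPENAI] section with an api_key entry") from exc
    if not key:
        raise ValueError("api_key is empty")
    return key
```

Failing early with a clear message is useful here because a missing or empty key would otherwise only surface as an authentication error on the first API call.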
For simple audio file transcription:
python transcription.py
This will:
- Load the audio file specified in the script (default: audio.m4a)
- Compress and convert it if needed
- Transcribe the audio using OpenAI's Whisper model
- Format the transcription for readability
- Save the result to transcricao_formatada.txt
For meeting recordings with technical discussions:
python meet_transcription.py
This advanced script will:
- Process the audio file with optimized settings for speech
- Split the audio into multiple chunks for parallel processing
- Transcribe all chunks concurrently
- Correct technical terms, framework names, and libraries
- Save both the raw and corrected transcriptions
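This README does not say how the term correction step is implemented. As a simple offline stand-in, the sketch below replaces known mis-transcriptions using a glossary and a regular expression. The glossary entries and the function name are hypothetical, and this is not necessarily what meet_transcription.py does.

```python
import re

# Hypothetical glossary: lowercase mis-transcription -> canonical spelling.
GLOSSARY = {
    "react js": "React",
    "node js": "Node.js",
    "pie torch": "PyTorch",
    "fast api": "FastAPI",
}


def correct_terms(text: str, glossary: dict = GLOSSARY) -> str:
    """Replace whole-word glossary matches, ignoring case."""
    # Longest keys first so multi-word entries win over shorter overlaps.
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: glossary[m.group(0).lower()], text)
```

A glossary is deterministic and cheap, but it only catches the variants that were listed in advance; a prompt-based correction pass can generalize further at the cost of an extra API call.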
You can customize the behavior by modifying:
- Audio file paths in the main functions
- Maximum file size limits (default: 20-25MB)
- Number of processing chunks and workers for parallel processing
- Language settings for transcription (default: Portuguese)
- Formatting prompts for text correction
Both scripts include comprehensive logging that records:
- Process steps and completion status
- File sizes and compression details
- Errors and exceptions
- Processing times and results
Logs are saved to transcription.log and also output to the console.
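Logging to both a file and the console can be set up with two handlers on one logger. This is a minimal sketch using the standard `logging` module; the logger name and message format are assumptions, not taken from the scripts.

```python
import logging
import sys


def setup_logging(log_path: str = "transcription.log") -> logging.Logger:
    """Return a logger that writes to log_path and to the console."""
    logger = logging.getLogger("transcription")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.FileHandler(log_path, encoding="utf-8"),
        logging.StreamHandler(sys.stdout),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Usage would look like `log = setup_logging()` followed by `log.info("compressed to %d bytes", size)`.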
The project uses pydub for audio manipulation, which supports:
- Reading various audio formats
- Bitrate adjustment
- Sample rate conversion
- Audio chunking
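The pydub operations listed above can be combined roughly as follows. This is a sketch under stated assumptions: the function names, the 16 kHz mono settings, the 64k bitrate, and the output naming are all illustrative, and pydub needs ffmpeg installed to read and write compressed formats.

```python
def chunk_bounds(duration_ms: int, n_chunks: int) -> list:
    """Split [0, duration_ms) into at most n_chunks contiguous (start, end) ranges."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if duration_ms <= 0:
        return []
    step = -(-duration_ms // n_chunks)  # ceiling division
    return [(start, min(start + step, duration_ms))
            for start in range(0, duration_ms, step)]


def compress_and_split(src: str, out_prefix: str,
                       n_chunks: int = 4, bitrate: str = "64k") -> list:
    """Downsample to mono speech audio and export MP3 chunks; returns their paths."""
    from pydub import AudioSegment  # imported lazily; requires pydub and ffmpeg

    audio = AudioSegment.from_file(src)
    audio = audio.set_frame_rate(16000).set_channels(1)
    paths = []
    # len(audio) is the duration in milliseconds; slicing is also in milliseconds.
    for i, (start, end) in enumerate(chunk_bounds(len(audio), n_chunks)):
        path = f"{out_prefix}_{i:03d}.mp3"
        audio[start:end].export(path, format="mp3", bitrate=bitrate)
        paths.append(path)
    return paths
```

Zero-padded chunk indices keep the files in order when sorted by name, which helps when reassembling the transcript.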
Transcription is performed using OpenAI's Whisper model through the OpenAI API, with optimized parameters for accuracy.
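A single transcription request might look like the sketch below. It assumes the `openai` package at version 1.0 or later, where `client` is an `openai.OpenAI` instance; the repository may use a different client version or different parameters. The function name is hypothetical, and `"pt"` reflects the Portuguese default mentioned earlier.

```python
def transcribe_file(client, path: str, language: str = "pt") -> str:
    """Send one audio file to the Whisper endpoint and return the transcript text.

    `client` is assumed to be an openai.OpenAI instance (openai>=1.0).
    """
    with open(path, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language=language,  # ISO-639-1 code
        )
    return response.text
```

With the real client this would be called as `transcribe_file(OpenAI(api_key=key), "audio.mp3")`. Passing the client in as an argument also makes the function easy to exercise with a stub in tests.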
The concurrent.futures module enables efficient parallel processing of audio chunks, significantly reducing processing time for large files.
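One way to transcribe chunks concurrently while keeping the text in order is `ThreadPoolExecutor.map`. This is a minimal sketch, not the scripts' actual code: `transcribe_chunks` and the `transcribe_one` callable are hypothetical names, and a thread pool is assumed because the work is dominated by waiting on network requests.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_chunks(chunk_paths, transcribe_one, max_workers: int = 4) -> str:
    """Run transcribe_one over all chunks in parallel and join the text in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when later
        # chunks finish before earlier ones.
        texts = list(pool.map(transcribe_one, chunk_paths))
    return " ".join(t.strip() for t in texts if t.strip())
```

Because `map` preserves input order, no explicit sorting is needed during assembly. An exception raised while transcribing any chunk propagates when its result is consumed, so the caller can log it and retry.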
[Add your license information here]
Contributions, issues, and feature requests are welcome!
- OpenAI for providing the Whisper transcription model
- Contributors and developers of the pydub library