A simple, lightweight command-line tool to transcribe audio files to SRT subtitle format using OpenAI Whisper. Perfect for converting meetings, lectures, podcasts, and videos into searchable, timestamped subtitles.
Features:

- 🎙️ Transcribe audio to SRT subtitles with accurate timestamps
- 🌍 Auto-detect language and optionally translate to any language
- 🔊 Smart audio chunking at natural silence points for better accuracy
- 🚀 Simple one-command usage
- ⚡ Progress tracking with real-time updates
Requirements:

- Python 3.10+ (download from python.org)
- FFmpeg (required for audio processing)
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt install ffmpeg`
  - Windows: download from ffmpeg.org

Verify the installation:

```bash
python3 --version   # Should be 3.10+
ffmpeg -version     # Should show version info
```
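The same two checks can be run from Python if you prefer; `prereqs_ok` is a hypothetical helper for illustration, not part of this project:

```python
import shutil
import sys

def prereqs_ok(py_version=sys.version_info, ffmpeg=shutil.which("ffmpeg")):
    """True when Python is 3.10+ and an ffmpeg binary is on PATH."""
    return py_version >= (3, 10) and ffmpeg is not None

print("Ready!" if prereqs_ok() else "Missing prerequisites")
```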
Option 1: Quick Setup (Recommended)

```bash
# 1. Clone the repository
git clone https://github.com/tableaprogramming-rgb/transcriptor-lite.git
cd transcriptor-lite

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
python3 -m pip install openai-whisper pydub audioop-lts

# 4. Verify installation
python3 -c "import whisper; print('✅ Ready to use!')"
```

Option 2: Using requirements.txt
```bash
git clone https://github.com/tableaprogramming-rgb/transcriptor-lite.git
cd transcriptor-lite
python3 -m venv venv
source venv/bin/activate

# Create requirements file if needed
echo "openai-whisper" > requirements.txt
echo "pydub" >> requirements.txt
echo "audioop-lts" >> requirements.txt

python3 -m pip install -r requirements.txt
```

Basic usage:

```bash
python3 console.py your_audio.mp3
```

Examples:
```bash
python3 console.py recording.m4a
python3 console.py /path/to/meeting.mp3
python3 console.py ~/Videos/lecture.wav
```

Output: Creates `your_audio.srt` in the same directory.

Translation examples:

```bash
python3 console.py your_audio.mp3 -t english
python3 console.py spanish_audio.m4a -t english
python3 console.py french_audio.wav --translate english
```

Usage:

```bash
python3 console.py <audio_file> [OPTIONS]
```

| Option | Long Form | Description | Example |
|---|---|---|---|
| `-t` | `--translate` | Translate to specified language | `-t english` |
| `-h` | `--help` | Show help message | `--help` |
```bash
python3 console.py --help
```

Supported formats:

| Format | Extension | Notes |
|---|---|---|
| MP3 | `.mp3` | Most common |
| M4A | `.m4a` | iPhone/iTunes audio |
| WAV | `.wav` | Uncompressed audio |
| FLAC | `.flac` | Lossless compressed |
| OGG | `.ogg` | Vorbis codec |
| Any FFmpeg format | `.mov`, `.avi`, `.mkv`, etc. | Video files work too |
Example 1: Transcribe a meeting recording

```bash
python3 console.py Q1-budget-meeting.mp3
# Creates: Q1-budget-meeting.srt
```

Example 2: Transcribe and translate multilingual content

```bash
python3 console.py conference-with-spanish-speakers.wav -t english
# Creates: conference-with-spanish-speakers.srt (all in English)
```

Example 3: Extract audio from a video and transcribe

```bash
# First extract audio from the video
ffmpeg -i presentation.mp4 -q:a 0 -map a presentation.mp3

# Then transcribe
python3 console.py presentation.mp3
```

First Run: Whisper downloads the language model (~1.4 GB) on first use. This is a one-time download.
Processing Speed: Varies with hardware (CPU vs. GPU). Below are realistic estimates for the medium model on typical hardware:
| Audio Duration | CPU (Intel i7/Apple Silicon) | GPU (NVIDIA GTX 1080+) | Notes |
|---|---|---|---|
| 5 minutes | 1-2 min | 20-30 sec | Fast turnaround |
| 10 minutes | 2-4 min | 40 sec - 1 min | Typical podcast episode |
| 15 minutes | 3-5 min | 1-1.5 min | Quick meeting |
| 30 minutes | 5-10 min | 2-3 min | Standard meeting |
| 1 hour | 10-20 min | 4-6 min | Long lecture |
| 2 hours | 20-40 min | 8-12 min | Conference day |
- First run (one-time): 5-10 minutes extra for model download
- Audio analysis: 2-5 seconds to analyze duration and chunks
- Transcription: Varies by length (see table above)
- Conversion to SRT: Usually < 1 second
For faster processing:

- Use a GPU if available (NVIDIA CUDA recommended)
- Use a smaller model size (if accuracy allows): `--model tiny` or `--model small`
- Break very long files into smaller segments

For better accuracy:

- Use the "medium" or "large" model (the default is medium)
- Ensure good audio quality
- Record clear speech without too much background noise
Without the `-t` option, the tool transcribes in the original audio language:

```bash
python3 console.py spanish_podcast.mp3
# Output: Spanish subtitles
```

With `-t`, content is transcribed AND translated to your specified language:

```bash
python3 console.py spanish_podcast.mp3 -t english
# Output: English subtitles (translated from Spanish)
```

Whisper supports 99+ languages. Common ones:

- European: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Greek, Czech
- Asian: Chinese (Mandarin), Japanese, Korean, Thai, Vietnamese, Filipino
- Middle Eastern: Arabic, Hebrew, Persian, Turkish
- African: Swahili, Amharic, Yoruba
See the full list in the Whisper documentation.
If your audio contains multiple languages:
- Without translation: Each language preserved as-is
- With translation: All languages translated to target language
Example:

```bash
# Audio with Spanish + English mix
python3 console.py mixed_audio.wav -t spanish
# Output: Entire transcript in Spanish (English parts translated)
```

The tool uses a multi-step pipeline for accurate transcription:
1. Audio Analysis - Analyzes the file to determine duration and the number of chunks needed
2. Smart Chunking - Splits audio into 5-minute segments at natural silence points (not mid-sentence)
3. Language Detection - Detects the language of each chunk automatically
4. Transcription - Transcribes each chunk using Whisper's `medium` model
5. Optional Translation - Translates to the target language if the `-t` flag is used
6. Merging - Combines all chunks with correct timestamps
7. SRT Conversion - Creates standard SRT subtitle format
8. File Output - Saves the `.srt` file in the same directory as the audio
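The chunking step can be sketched as pure logic: given silence intervals in milliseconds (as pydub's `detect_silence` would report them), pick a cut point near each 5-minute target. `pick_split_points` is an illustrative helper, not the actual code in console.py:

```python
def pick_split_points(duration_ms, silences, target_ms=5 * 60 * 1000):
    """Choose chunk boundaries near each 5-minute mark, snapping to
    the midpoint of the closest detected silence when one exists."""
    points = []
    target = target_ms
    while target < duration_ms:
        # Midpoints of silence intervals are natural cut candidates
        candidates = [(start + end) // 2 for start, end in silences]
        if candidates:
            best = min(candidates, key=lambda c: abs(c - target))
        else:
            best = target  # No silence detected: cut at the target anyway
        points.append(best)
        target = best + target_ms
    return points

# Example: a 12-minute file with silences near 5:02 and 10:10
silences = [[301_000, 303_000], [609_000, 611_000]]
print(pick_split_points(12 * 60 * 1000, silences))  # -> [302000, 610000]
```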
- Better Accuracy: Processing shorter segments individually improves accuracy
- Memory Efficient: Prevents memory issues with very long files
- Smart Boundaries: Silence detection finds natural pauses instead of cutting mid-word
- Language Handling: Multi-language audio can be detected and translated per chunk
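The merging step then offsets each chunk's segment times by that chunk's start time so timestamps stay continuous across the whole file. A minimal sketch, assuming Whisper-style segment dicts with `start`/`end`/`text` keys (`merge_chunks` is hypothetical, not the project's code):

```python
def merge_chunks(chunk_results):
    """chunk_results: list of (offset_seconds, segments), where each
    segment's 'start'/'end' are relative to its own chunk."""
    merged = []
    for offset, segments in chunk_results:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,  # shift into whole-file time
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged

chunks = [
    (0.0, [{"start": 0.0, "end": 4.0, "text": "hello"}]),
    (300.0, [{"start": 1.5, "end": 6.0, "text": "world"}]),
]
print(merge_chunks(chunks)[1]["start"])  # -> 301.5
```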
The .srt file is created in the same directory as your audio file:
Input: /path/to/recording.mp3
Output: /path/to/recording.srt
Standard SubRip format with timestamps:

```
1
00:00:00,000 --> 00:00:05,230
First subtitle text

2
00:00:05,230 --> 00:00:12,450
Second subtitle text

3
00:00:12,450 --> 00:00:18,920
Third subtitle text
```
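Producing those `HH:MM:SS,mmm` timestamps from Whisper's per-segment times (floating-point seconds) takes only a few lines; this is an illustrative sketch, not the code in console.py:

```python
def srt_timestamp(seconds):
    """Convert seconds (float) to SRT's HH:MM:SS,mmm format."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        for i, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"

print(srt_timestamp(5.23))  # -> 00:00:05,230
```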
```
$ python3 console.py spanish_meeting.mp3
============================================================
🎙️ AUDIO TRANSCRIPTION TO SRT
============================================================
📁 Input: spanish_meeting.mp3
📊 Size: 23.5 MB
🗣️ Language: spanish
🔍 Analyzing audio...
   Duration: 00:45:30
   Chunks: 9
⏳ Transcribing...
[████████████████████████████] 100% - Transcribing chunk 9/9...
✅ Transcription complete!
📄 Output: spanish_meeting.srt
⏱️ Duration: 2730 seconds (45.5 min)
🗣️ Language: spanish
📝 Subtitles: 412 entries
```
Output file content (spanish_meeting.srt):

```
1
00:00:00,000 --> 00:00:05,230
Buenos días a todos, gracias por asistir

2
00:00:05,230 --> 00:00:12,450
Hoy vamos a discutir los resultados trimestrales

3
00:00:12,450 --> 00:00:18,920
Comencemos con la revisión de ingresos
```
```
$ python3 console.py spanish_meeting.mp3 -t english
============================================================
🎙️ AUDIO TRANSCRIPTION TO SRT
============================================================
📁 Input: spanish_meeting.mp3
📊 Size: 23.5 MB
🗣️ Translate: english
🔍 Analyzing audio...
   Duration: 00:45:30
   Chunks: 9
⏳ Transcribing...
[████████████████████████████] 100% - Transcribing chunk 9/9...
✅ Transcription complete!
📄 Output: spanish_meeting.srt
⏱️ Duration: 2730 seconds (45.5 min)
🗣️ Language: english
📝 Subtitles: 412 entries

Preview (first 3 subtitles):
------------------------------------------------------------
1
00:00:00,000 --> 00:00:05,230
Good morning everyone, thank you for attending

2
00:00:05,230 --> 00:00:12,450
Today we're going to discuss the quarterly results

3
00:00:12,450 --> 00:00:18,920
Let's start with the revenue review
```
Output file content (spanish_meeting.srt):

```
1
00:00:00,000 --> 00:00:05,230
Good morning everyone, thank you for attending

2
00:00:05,230 --> 00:00:12,450
Today we're going to discuss the quarterly results

3
00:00:12,450 --> 00:00:18,920
Let's start with the revenue review
```
Solution:

```bash
# Verify FFmpeg is installed
ffmpeg -version

# If not installed, install it:
# macOS:
brew install ffmpeg

# Ubuntu/Debian:
sudo apt-get update && sudo apt-get install ffmpeg

# Windows:
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg (with Chocolatey)
```

Solution:
```bash
# Make sure the virtual environment is activated
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
python3 -m pip install --upgrade openai-whisper
```

Solution: Convert the audio to MP3 first
```bash
# Convert any audio format to MP3
ffmpeg -i your_audio.wav -q:a 0 -map a your_audio.mp3

# Then transcribe
python3 console.py your_audio.mp3
```

Solution:
```bash
# Make the script executable (Mac/Linux)
chmod +x console.py

# Then run with python3
python3 console.py your_audio.mp3
```

Possible causes and solutions:
- Using CPU instead of GPU
  - Install CUDA for GPU support: https://developer.nvidia.com/cuda-downloads
  - Whisper will use the GPU automatically if available
- Large audio file
  - Split into smaller files (see examples below)
  - Processing time scales with audio length
- Low RAM available
  - Close other applications
  - Process shorter files (< 1 hour at a time)
Solution:

```bash
# Fall back to CPU
python3 console.py your_audio.mp3
```

Tips for better accuracy:
- Use clear audio
  - Minimize background noise
  - Use good-quality microphone recordings
- Use a larger model
  - The current setup uses the "medium" model
  - For higher accuracy, use "large" (slower, ~1.4 GB more to download)
- Check the audio format
  - MP3 or WAV recommended
  - Some formats compress audio quality
Solution: Use the full path or verify the file exists

```bash
# Full path (works from anywhere)
python3 console.py /full/path/to/audio.mp3

# Or run from the same directory
ls audio.mp3  # Verify the file exists first
python3 console.py audio.mp3
```

Q: Do I need the full Transcriptor project installed?

A: No, Transcriptor Lite is standalone. It only uses the OpenAI Whisper library. You don't need the full Transcriptor project installed.
Q: Can it handle long audio files?

A: Yes, the audio is automatically chunked into 5-minute pieces. Processing takes longer, but it works.
Q: Does it work offline?

A: Yes. Once the Whisper model is downloaded (on the first run), everything works offline.
Q: Can I edit the generated subtitles?

A: Yes! SRT files are plain text. Open one with any text editor to manually adjust timestamps or text if needed.
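Because SRT is plain text, small programmatic fixes are easy too; here is a minimal parser sketch (a hypothetical helper, not part of this project):

```python
import re

def parse_srt(text):
    """Parse SRT text into (index, start, end, text) tuples."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = lines[1].split(" --> ")
        entries.append((int(lines[0]), start.strip(), end.strip(),
                        "\n".join(lines[2:])))
    return entries

sample = "1\n00:00:00,000 --> 00:00:05,230\nHello\n\n2\n00:00:05,230 --> 00:00:12,450\nWorld\n"
print(len(parse_srt(sample)))  # -> 2
```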
Q: Which players support SRT files?

A: Most players do:
- VLC Media Player (desktop)
- FFmpeg/ffprobe (command line)
- Subtitle editors (Subtitle Edit, Aegisub)
- Web players (with subtitle plugins)
Q: How accurate is the transcription?

A: Whisper's medium model is ~94% accurate with clear audio. Accuracy depends on:
- Audio quality (background noise, clarity)
- Speaker accent (English accents trained well)
- Specialized vocabulary (medical/technical terms may need correction)
Q: Can I use a different Whisper model?

A: The current setup uses the "medium" model. You could modify console.py to use `tiny`, `base`, `small`, or `large` for different speed/accuracy tradeoffs.
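If you do wire a model option into console.py yourself, the standard Whisper call is `whisper.load_model(name)`; the sketch below only validates the size name up front (`resolve_model_name` and the surrounding wiring are assumptions, not existing features of this tool):

```python
# Whisper's built-in model sizes, fastest/least accurate first
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")

def resolve_model_name(name, default="medium"):
    """Fall back to the default when an unknown size is requested."""
    return name if name in MODEL_SIZES else default

# In console.py you would then do (requires the openai-whisper package):
#   import whisper
#   model = whisper.load_model(resolve_model_name(args.model))
#   result = model.transcribe(audio_path)
print(resolve_model_name("small"))  # -> small
print(resolve_model_name("bogus"))  # -> medium
```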
```
transcriptor-lite/
├── console.py    # Main transcription script
├── README.md     # This file
└── .gitignore    # Git configuration
```
- Transcriptor - Full Django web application with a UI
  - GitHub: transcriptor
  - Includes SharePoint integration, job history, and more features
- OpenAI Whisper - The underlying transcription engine
  - GitHub: openai/whisper
MIT License - Feel free to use this project for personal or commercial purposes.
- 🐛 Found a bug? Open an issue on GitHub
- 💡 Have a suggestion? Let us know!
- ⭐ Like this project? Give it a star!
Last Updated: April 2026

Tested with: Python 3.14, Whisper 1.0+, FFmpeg 6.0+