A simple, lightweight command-line tool to transcribe audio files to SRT subtitle format using OpenAI Whisper. Perfect for converting meetings, lectures, podcasts, and videos into searchable, timestamped subtitles.
Features:

- 🎙️ Transcribe audio to SRT subtitles with accurate timestamps
- 🌍 Auto-detect language and optionally translate to any language
- 🔊 Smart audio chunking at natural silence points for better accuracy
- 🚀 Simple one-command usage
- ⚡ Progress tracking with real-time updates
Requirements:

- Python 3.10+ (download from python.org)
- FFmpeg (required for audio processing)
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt install ffmpeg`
  - Windows: download from ffmpeg.org

Verify the installation:

```bash
python3 --version   # Should be 3.10+
ffmpeg -version     # Should show version info
```
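The same two checks can be run from Python if you prefer; `prereqs_ok` is a hypothetical helper for illustration, not part of this project:

```python
import shutil
import sys

def prereqs_ok(py_version=sys.version_info, ffmpeg=shutil.which("ffmpeg")):
    """True when Python is 3.10+ and an ffmpeg binary is on PATH."""
    return py_version >= (3, 10) and ffmpeg is not None

print("Ready!" if prereqs_ok() else "Missing prerequisites")
```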
Option 1: Quick Setup (Recommended)

```bash
# 1. Clone the repository
git clone https://github.com/tableaprogramming-rgb/transcriptor-lite.git
cd transcriptor-lite

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
python3 -m pip install openai-whisper pydub audioop-lts

# 4. Verify installation
python3 -c "import whisper; print('✅ Ready to use!')"
```

Option 2: Using requirements.txt
```bash
git clone https://github.com/tableaprogramming-rgb/transcriptor-lite.git
cd transcriptor-lite
python3 -m venv venv
source venv/bin/activate

# Create requirements file if needed
echo "openai-whisper" > requirements.txt
echo "pydub" >> requirements.txt
echo "audioop-lts" >> requirements.txt

python3 -m pip install -r requirements.txt
```

Basic usage:

```bash
python3 console.py your_audio.mp3
```

Examples:
```bash
python3 console.py recording.m4a
python3 console.py /path/to/meeting.mp3
python3 console.py ~/Videos/lecture.wav
```

Output: Creates `your_audio.srt` in the same directory.

Translation examples:

```bash
python3 console.py your_audio.mp3 -t english
python3 console.py spanish_audio.m4a -t english
python3 console.py french_audio.wav --translate english
```

Usage:

```bash
python3 console.py <audio_file> [OPTIONS]
```

| Option | Long Form | Description | Example |
|---|---|---|---|
| `-t` | `--translate` | Translate to specified language | `-t english` |
| `-h` | `--help` | Show help message | `--help` |
```bash
python3 console.py --help
```

Supported formats:

| Format | Extension | Notes |
|---|---|---|
| MP3 | `.mp3` | Most common |
| M4A | `.m4a` | iPhone/iTunes audio |
| WAV | `.wav` | Uncompressed audio |
| FLAC | `.flac` | Lossless compressed |
| OGG | `.ogg` | Vorbis codec |
| Any FFmpeg format | `.mov`, `.avi`, `.mkv`, etc. | Video files work too |
Example 1: Transcribe a meeting recording

```bash
python3 console.py Q1-budget-meeting.mp3
# Creates: Q1-budget-meeting.srt
```

Example 2: Transcribe and translate multilingual content

```bash
python3 console.py conference-with-spanish-speakers.wav -t english
# Creates: conference-with-spanish-speakers.srt (all in English)
```

Example 3: Extract audio from a video and transcribe

```bash
# First extract audio from the video
ffmpeg -i presentation.mp4 -q:a 0 -map a presentation.mp3

# Then transcribe
python3 console.py presentation.mp3
```

First Run: Whisper downloads the language model (~1.4 GB) on first use. This is a one-time download.
Processing Speed: Varies with hardware (CPU vs. GPU). Below are realistic estimates for the medium model on typical hardware:
| Audio Duration | CPU (Intel i7/Apple Silicon) | GPU (NVIDIA GTX 1080+) | Notes |
|---|---|---|---|
| 5 minutes | 1-2 min | 20-30 sec | Fast turnaround |
| 10 minutes | 2-4 min | 40 sec - 1 min | Typical podcast episode |
| 15 minutes | 3-5 min | 1-1.5 min | Quick meeting |
| 30 minutes | 5-10 min | 2-3 min | Standard meeting |
| 1 hour | 10-20 min | 4-6 min | Long lecture |
| 2 hours | 20-40 min | 8-12 min | Conference day |
- First run (one-time): 5-10 minutes extra for model download
- Audio analysis: 2-5 seconds to analyze duration and chunks
- Transcription: Varies by length (see table above)
- Conversion to SRT: Usually < 1 second
For faster processing:

- Use a GPU if available (NVIDIA CUDA recommended)
- Use a smaller model size (if accuracy allows): `--model tiny` or `--model small`
- Break very long files into smaller segments

For better accuracy:

- Use the "medium" or "large" model (the default is medium)
- Ensure good audio quality
- Record clear speech without too much background noise
Without the `-t` option, the tool transcribes in the original audio language:

```bash
python3 console.py spanish_podcast.mp3
# Output: Spanish subtitles
```

With `-t`, content is transcribed AND translated to your specified language:

```bash
python3 console.py spanish_podcast.mp3 -t english
# Output: English subtitles (translated from Spanish)
```

Whisper supports 99+ languages. Common ones:

- European: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Greek, Czech
- Asian: Chinese (Mandarin), Japanese, Korean, Thai, Vietnamese, Filipino
- Middle Eastern: Arabic, Hebrew, Persian, Turkish
- African: Swahili, Amharic, Yoruba
See the full list in the Whisper documentation.
If your audio contains multiple languages:
- Without translation: Each language preserved as-is
- With translation: All languages translated to target language
Example:

```bash
# Audio with Spanish + English mix
python3 console.py mixed_audio.wav -t spanish
# Output: Entire transcript in Spanish (English parts translated)
```

The tool uses a multi-step pipeline for accurate transcription:
1. Audio Analysis - Analyzes the file to determine duration and the number of chunks needed
2. Smart Chunking - Splits audio into 5-minute segments at natural silence points (not mid-sentence)
3. Language Detection - Detects the language of each chunk automatically
4. Transcription - Transcribes each chunk using Whisper's `medium` model
5. Optional Translation - Translates to the target language if the `-t` flag is used
6. Merging - Combines all chunks with correct timestamps
7. SRT Conversion - Creates standard SRT subtitle format
8. File Output - Saves the `.srt` file in the same directory as the audio
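The chunking step can be sketched as pure logic: given silence intervals in milliseconds (as pydub's `detect_silence` would report them), pick a cut point near each 5-minute target. `pick_split_points` is an illustrative helper, not the actual code in console.py:

```python
def pick_split_points(duration_ms, silences, target_ms=5 * 60 * 1000):
    """Choose chunk boundaries near each 5-minute mark, snapping to
    the midpoint of the closest detected silence when one exists."""
    points = []
    target = target_ms
    while target < duration_ms:
        # Midpoints of silence intervals are natural cut candidates
        candidates = [(start + end) // 2 for start, end in silences]
        if candidates:
            best = min(candidates, key=lambda c: abs(c - target))
        else:
            best = target  # No silence detected: cut at the target anyway
        points.append(best)
        target = best + target_ms
    return points

# Example: a 12-minute file with silences near 5:02 and 10:10
silences = [[301_000, 303_000], [609_000, 611_000]]
print(pick_split_points(12 * 60 * 1000, silences))  # -> [302000, 610000]
```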
- Better Accuracy: Processing shorter segments individually improves accuracy
- Memory Efficient: Prevents memory issues with very long files
- Smart Boundaries: Silence detection finds natural pauses instead of cutting mid-word
- Language Handling: Multi-language audio can be detected and translated per chunk
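The merging step then offsets each chunk's segment times by that chunk's start time so timestamps stay continuous across the whole file. A minimal sketch, assuming Whisper-style segment dicts with `start`/`end`/`text` keys (`merge_chunks` is hypothetical, not the project's code):

```python
def merge_chunks(chunk_results):
    """chunk_results: list of (offset_seconds, segments), where each
    segment's 'start'/'end' are relative to its own chunk."""
    merged = []
    for offset, segments in chunk_results:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,  # shift into whole-file time
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged

chunks = [
    (0.0, [{"start": 0.0, "end": 4.0, "text": "hello"}]),
    (300.0, [{"start": 1.5, "end": 6.0, "text": "world"}]),
]
print(merge_chunks(chunks)[1]["start"])  # -> 301.5
```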
The .srt file is created in the same directory as your audio file:
Input: /path/to/recording.mp3
Output: /path/to/recording.srt
Standard SubRip format with timestamps:

```
1
00:00:00,000 --> 00:00:05,230
First subtitle text

2
00:00:05,230 --> 00:00:12,450
Second subtitle text

3
00:00:12,450 --> 00:00:18,920
Third subtitle text
```
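Producing those `HH:MM:SS,mmm` timestamps from Whisper's per-segment times (floating-point seconds) takes only a few lines; this is an illustrative sketch, not the code in console.py:

```python
def srt_timestamp(seconds):
    """Convert seconds (float) to SRT's HH:MM:SS,mmm format."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        for i, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"

print(srt_timestamp(5.23))  # -> 00:00:05,230
```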
```
$ python3 console.py spanish_meeting.mp3
============================================================
🎙️ AUDIO TRANSCRIPTION TO SRT
============================================================
📁 Input: spanish_meeting.mp3
📊 Size: 23.5 MB
🗣️ Language: spanish
🔍 Analyzing audio...
   Duration: 00:45:30
   Chunks: 9
⏳ Transcribing...
[████████████████████████████] 100% - Transcribing chunk 9/9...
✅ Transcription complete!
📄 Output: spanish_meeting.srt
⏱️ Duration: 2730 seconds (45.5 min)
🗣️ Language: spanish
📝 Subtitles: 412 entries
```
Output file content (spanish_meeting.srt):

```
1
00:00:00,000 --> 00:00:05,230
Buenos días a todos, gracias por asistir

2
00:00:05,230 --> 00:00:12,450
Hoy vamos a discutir los resultados trimestrales

3
00:00:12,450 --> 00:00:18,920
Comencemos con la revisión de ingresos
```
```
$ python3 console.py spanish_meeting.mp3 -t english
============================================================
🎙️ AUDIO TRANSCRIPTION TO SRT
============================================================
📁 Input: spanish_meeting.mp3
📊 Size: 23.5 MB
🗣️ Translate: english
🔍 Analyzing audio...
   Duration: 00:45:30
   Chunks: 9
⏳ Transcribing...
[████████████████████████████] 100% - Transcribing chunk 9/9...
✅ Transcription complete!
📄 Output: spanish_meeting.srt
⏱️ Duration: 2730 seconds (45.5 min)
🗣️ Language: english
📝 Subtitles: 412 entries

Preview (first 3 subtitles):
------------------------------------------------------------
1
00:00:00,000 --> 00:00:05,230
Good morning everyone, thank you for attending

2
00:00:05,230 --> 00:00:12,450
Today we're going to discuss the quarterly results

3
00:00:12,450 --> 00:00:18,920
Let's start with the revenue review
```
Output file content (spanish_meeting.srt):

```
1
00:00:00,000 --> 00:00:05,230
Good morning everyone, thank you for attending

2
00:00:05,230 --> 00:00:12,450
Today we're going to discuss the quarterly results

3
00:00:12,450 --> 00:00:18,920
Let's start with the revenue review
```
Solution:

```bash
# Verify FFmpeg is installed
ffmpeg -version

# If not installed, install it:
# macOS:
brew install ffmpeg

# Ubuntu/Debian:
sudo apt-get update && sudo apt-get install ffmpeg

# Windows:
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg (with Chocolatey)
```

Solution:
```bash
# Make sure the virtual environment is activated
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
python3 -m pip install --upgrade openai-whisper
```

Solution: Convert the audio to MP3 first
```bash
# Convert any audio format to MP3
ffmpeg -i your_audio.wav -q:a 0 -map a your_audio.mp3

# Then transcribe
python3 console.py your_audio.mp3
```

Solution:
```bash
# Make the script executable (Mac/Linux)
chmod +x console.py

# Then run with python3
python3 console.py your_audio.mp3
```

Possible causes and solutions:
- Using CPU instead of GPU
  - Install CUDA for GPU support: https://developer.nvidia.com/cuda-downloads
  - Whisper will use the GPU automatically if available
- Large audio file
  - Split into smaller files (see examples below)
  - Processing time scales with audio length
- Low RAM available
  - Close other applications
  - Process shorter files (< 1 hour at a time)
Solution:

```bash
# Fall back to CPU
python3 console.py your_audio.mp3
```

Tips for better accuracy:
- Use clear audio
  - Minimize background noise
  - Use good-quality microphone recordings
- Use a larger model
  - The current setup uses the "medium" model
  - For higher accuracy, use "large" (slower, ~1.4 GB more to download)
- Check the audio format
  - MP3 or WAV recommended
  - Some formats compress audio quality
Solution: Use the full path or verify the file exists

```bash
# Full path (works from anywhere)
python3 console.py /full/path/to/audio.mp3

# Or run from the same directory
ls audio.mp3  # Verify the file exists first
python3 console.py audio.mp3
```

Q: Do I need the full Transcriptor project installed?

A: No, Transcriptor Lite is standalone. It only uses the OpenAI Whisper library. You don't need the full Transcriptor project installed.
Q: Can it handle long audio files?

A: Yes, the audio is automatically chunked into 5-minute pieces. Processing takes longer, but it works.
Q: Does it work offline?

A: Yes. Once the Whisper model is downloaded (on the first run), everything works offline.
Q: Can I edit the generated subtitles?

A: Yes! SRT files are plain text. Open one with any text editor to manually adjust timestamps or text if needed.
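Because SRT is plain text, small programmatic fixes are easy too; here is a minimal parser sketch (a hypothetical helper, not part of this project):

```python
import re

def parse_srt(text):
    """Parse SRT text into (index, start, end, text) tuples."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = lines[1].split(" --> ")
        entries.append((int(lines[0]), start.strip(), end.strip(),
                        "\n".join(lines[2:])))
    return entries

sample = "1\n00:00:00,000 --> 00:00:05,230\nHello\n\n2\n00:00:05,230 --> 00:00:12,450\nWorld\n"
print(len(parse_srt(sample)))  # -> 2
```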
Q: Which players support SRT files?

A: Most players do:
- VLC Media Player (desktop)
- FFmpeg/ffprobe (command line)
- Subtitle editors (Subtitle Edit, Aegisub)
- Web players (with subtitle plugins)
Q: How accurate is the transcription?

A: Whisper's medium model is ~94% accurate with clear audio. Accuracy depends on:
- Audio quality (background noise, clarity)
- Speaker accent (English accents trained well)
- Specialized vocabulary (medical/technical terms may need correction)
Q: Can I use a different Whisper model?

A: The current setup uses the "medium" model. You could modify console.py to use `tiny`, `base`, `small`, or `large` for different speed/accuracy tradeoffs.
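If you do wire a model option into console.py yourself, the standard Whisper call is `whisper.load_model(name)`; the sketch below only validates the size name up front (`resolve_model_name` and the surrounding wiring are assumptions, not existing features of this tool):

```python
# Whisper's built-in model sizes, fastest/least accurate first
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")

def resolve_model_name(name, default="medium"):
    """Fall back to the default when an unknown size is requested."""
    return name if name in MODEL_SIZES else default

# In console.py you would then do (requires the openai-whisper package):
#   import whisper
#   model = whisper.load_model(resolve_model_name(args.model))
#   result = model.transcribe(audio_path)
print(resolve_model_name("small"))  # -> small
print(resolve_model_name("bogus"))  # -> medium
```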
```
transcriptor-lite/
├── console.py    # Main transcription script
├── README.md     # This file
└── .gitignore    # Git configuration
```
- Transcriptor - Full Django web application with a UI
  - GitHub: transcriptor
  - Includes SharePoint integration, job history, and more features
- OpenAI Whisper - The underlying transcription engine
  - GitHub: openai/whisper
MIT License - Feel free to use this project for personal or commercial purposes.
- 🐛 Found a bug? Open an issue on GitHub
- 💡 Have a suggestion? Let us know!
- ⭐ Like this project? Give it a star!
Last Updated: April 2026

Tested with: Python 3.14, Whisper 1.0+, FFmpeg 6.0+