Transcriptor Lite - Console Edition

A simple, lightweight command-line tool to transcribe audio files to SRT subtitle format using OpenAI Whisper. Perfect for converting meetings, lectures, podcasts, and videos into searchable, timestamped subtitles.

Features:

  • πŸŽ™οΈ Transcribe audio to SRT subtitles with accurate timestamps
  • 🌍 Auto-detect language and optionally translate to any language
  • πŸ“Š Smart audio chunking at natural silence points for better accuracy
  • πŸš€ Simple one-command usage
  • ⚑ Progress tracking with real-time updates

Table of Contents

  1. Installation
  2. Quick Start
  3. Usage Guide
  4. Processing Time Estimates
  5. Translation & Language Support
  6. How It Works
  7. Output Format
  8. Troubleshooting
  9. FAQs

Installation

Prerequisites

  1. Python 3.10+ - download from https://www.python.org/downloads/

  2. FFmpeg - Required for audio processing

  3. Verify installation:

    python3 --version   # Should be 3.10+
    ffmpeg -version     # Should show version info

Setup Steps

Option 1: Quick Setup (Recommended)

# 1. Clone the repository
git clone https://github.com/tableaprogramming-rgb/transcriptor-lite.git
cd transcriptor-lite

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
python3 -m pip install openai-whisper pydub audioop-lts

# 4. Verify installation
python3 -c "import whisper; print('✅ Ready to use!')"

Option 2: Using requirements.txt

git clone https://github.com/tableaprogramming-rgb/transcriptor-lite.git
cd transcriptor-lite
python3 -m venv venv
source venv/bin/activate

# Create requirements file if needed
echo "openai-whisper" > requirements.txt
echo "pydub" >> requirements.txt
echo "audioop-lts" >> requirements.txt

python3 -m pip install -r requirements.txt

Quick Start

Basic Usage (Keep Original Language)

python3 console.py your_audio.mp3

Examples:

python3 console.py recording.m4a
python3 console.py /path/to/meeting.mp3
python3 console.py ~/Videos/lecture.wav

Output: Creates your_audio.srt in the same directory
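
The output path follows directly from the input path: same directory, same stem, .srt extension. A small sketch with pathlib (srt_path_for is an illustrative name, not a function from console.py):

```python
from pathlib import Path

def srt_path_for(audio_path: str) -> Path:
    """The .srt lands next to the audio file, sharing its stem."""
    return Path(audio_path).with_suffix(".srt")

print(srt_path_for("/path/to/meeting.mp3"))  # /path/to/meeting.srt (macOS/Linux)
```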

Translate to English

python3 console.py your_audio.mp3 -t english

Translate From Another Source Language

python3 console.py spanish_audio.m4a -t english
python3 console.py french_audio.wav --translate english

Usage Guide

Basic Syntax

python3 console.py <audio_file> [OPTIONS]

Options

Option   Long Form     Description                       Example
-t       --translate   Translate to specified language   -t english
-h       --help        Show help message                 --help
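
A minimal sketch of how a CLI with these options could be wired up with argparse; the actual internals of console.py may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the options documented above; illustrative only.
    parser = argparse.ArgumentParser(
        description="Transcribe an audio file to SRT subtitles."
    )
    parser.add_argument("audio_file", help="Path to the audio file")
    parser.add_argument(
        "-t", "--translate",
        metavar="LANGUAGE",
        help="Translate the transcript to the specified language",
    )
    return parser

args = build_parser().parse_args(["meeting.mp3", "-t", "english"])
print(args.audio_file, args.translate)  # meeting.mp3 english
```

(`-h`/`--help` comes for free with argparse.)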

Help Command

python3 console.py --help

Supported Audio Formats

Format              Extension                Notes
MP3                 .mp3                     Most common
M4A                 .m4a                     iPhone/iTunes audio
WAV                 .wav                     Uncompressed audio
FLAC                .flac                    Lossless compressed
OGG                 .ogg                     Vorbis codec
Any FFmpeg format   .mov, .avi, .mkv, etc.   Video files work too

Real-World Examples

Example 1: Transcribe a meeting recording

python3 console.py Q1-budget-meeting.mp3
# Creates: Q1-budget-meeting.srt

Example 2: Transcribe and translate multilingual content

python3 console.py conference-with-spanish-speakers.wav -t english
# Creates: conference-with-spanish-speakers.srt (all in English)

Example 3: Extract audio from video and transcribe

# First extract audio from video
ffmpeg -i presentation.mp4 -q:a 0 -map a presentation.mp3

# Then transcribe
python3 console.py presentation.mp3

Processing Time Estimates

First Run: Whisper downloads the language model (~1.4GB) on first use. This is a one-time download.

Processing Speed: Varies based on hardware (CPU vs GPU). Below are realistic estimates for the medium model on typical hardware:

Time Estimates by Audio Length

Audio Duration   CPU (Intel i7/Apple Silicon)   GPU (NVIDIA GTX 1080+)   Notes
5 minutes        1-2 min                        20-30 sec                Fast turnaround
10 minutes       2-4 min                        40 sec - 1 min           Typical podcast episode
15 minutes       3-5 min                        1-1.5 min                Quick meeting
30 minutes       5-10 min                       2-3 min                  Standard meeting
1 hour           10-20 min                      4-6 min                  Long lecture
2 hours          20-40 min                      8-12 min                 Conference day
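
As a rough rule of thumb, the CPU column above works out to roughly one sixth to one third of the audio's duration. That heuristic as a one-liner (estimate_cpu_minutes is illustrative, not part of the tool):

```python
def estimate_cpu_minutes(audio_minutes: float) -> tuple[float, float]:
    """Rough (low, high) CPU-processing estimate derived from the
    table above: about 1/6 to 1/3 of the audio's duration."""
    return (audio_minutes / 6, audio_minutes / 3)

print(estimate_cpu_minutes(60))  # (10.0, 20.0) - matches the 1-hour row
```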

What to Expect

  1. First run (one-time): 5-10 minutes extra for model download
  2. Audio analysis: 2-5 seconds to analyze duration and chunks
  3. Transcription: Varies by length (see table above)
  4. Conversion to SRT: Usually < 1 second

Optimization Tips

For faster processing:

  • Use GPU if available (NVIDIA CUDA recommended)
  • Use a smaller model size if accuracy allows (modify console.py to load "tiny" or "small")
  • Break very long files into smaller segments

For better accuracy:

  • Use "medium" or "large" model (default is medium)
  • Ensure good audio quality
  • Clear speech (not too much background noise)

Translation & Language Support

Default Behavior (No Translation)

Without the -t option, the tool transcribes in the original audio language:

python3 console.py spanish_podcast.mp3
# Output: Spanish subtitles

Translation Feature

With -t, content is transcribed AND translated to your specified language:

python3 console.py spanish_podcast.mp3 -t english
# Output: English subtitles (translated from Spanish)

Supported Languages

Whisper supports 99+ languages. Common ones:

  • European: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Greek, Czech
  • Asian: Chinese (Mandarin), Japanese, Korean, Thai, Vietnamese, Filipino
  • Middle Eastern: Arabic, Hebrew, Persian, Turkish
  • African: Swahili, Amharic, Yoruba

See the full list in the Whisper documentation: https://github.com/openai/whisper

Multi-Language Audio

If your audio contains multiple languages:

  • Without translation: Each language preserved as-is
  • With translation: All languages translated to target language

Example:

# Audio with Spanish + English mix
python3 console.py mixed_audio.wav -t spanish
# Output: Entire transcript in Spanish (English parts translated)

How It Works

The tool uses a multi-step pipeline for accurate transcription:

  1. Audio Analysis - Analyzes file to determine duration and number of chunks needed
  2. Smart Chunking - Splits audio into 5-minute segments at natural silence points (not mid-sentence)
  3. Language Detection - Detects language for each chunk automatically
  4. Transcription - Transcribes each chunk using Whisper's medium model
  5. Optional Translation - Translates to target language if -t flag is used
  6. Merging - Combines all chunks with correct timestamps
  7. SRT Conversion - Creates standard SRT subtitle format
  8. File Output - Saves .srt file in same directory as audio
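
Step 6 (merging) hinges on one detail: each chunk is transcribed in isolation, so its segments come back with chunk-local timestamps, and the chunk's offset into the full recording must be added back. A minimal sketch (the segment dicts mimic Whisper's output shape; this is not console.py's actual code):

```python
def merge_chunks(chunks):
    """chunks: list of (offset_seconds, segments) pairs, where each
    segment is a {'start', 'end', 'text'} dict with chunk-local times."""
    merged = []
    for offset, segments in chunks:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged

# Two chunks: the second starts 300 s (5 min) into the recording.
chunks = [
    (0.0, [{"start": 0.0, "end": 4.2, "text": "hello"}]),
    (300.0, [{"start": 1.0, "end": 3.5, "text": "world"}]),
]
print(merge_chunks(chunks)[1]["start"])  # 301.0
```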

Why Chunking Matters

  • Better Accuracy: Processing shorter segments individually improves accuracy
  • Memory Efficient: Prevents memory issues with very long files
  • Smart Boundaries: Silence detection finds natural pauses instead of cutting mid-word
  • Language Handling: Multi-language audio can be detected and translated per chunk
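
The "smart boundary" idea can be sketched independently of the audio layer: given detected silence intervals (e.g. from pydub.silence.detect_silence), cut at the silence nearest each 5-minute target rather than blindly at the target. A hypothetical helper, not taken from console.py:

```python
def pick_split_point(silences_ms, target_ms):
    """silences_ms: [(start, end)] silence intervals in milliseconds.
    Returns the midpoint of the silence closest to target_ms, or
    target_ms itself if no silence was detected."""
    if not silences_ms:
        return target_ms
    midpoints = [(s + e) // 2 for s, e in silences_ms]
    return min(midpoints, key=lambda m: abs(m - target_ms))

# Target a cut at 5 minutes (300,000 ms); pauses near 299.6 s and 310.1 s.
print(pick_split_point([(299_000, 300_200), (309_800, 310_400)], 300_000))  # 299600
```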

Output Format

Output File Location

The .srt file is created in the same directory as your audio file:

Input:  /path/to/recording.mp3
Output: /path/to/recording.srt

SRT Subtitle Format

Standard SubRip format with timestamps:

1
00:00:00,000 --> 00:00:05,230
First subtitle text

2
00:00:05,230 --> 00:00:12,450
Second subtitle text

3
00:00:12,450 --> 00:00:18,920
Third subtitle text
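
Converting a segment's start/end seconds into this HH:MM:SS,mmm format is mostly integer arithmetic. A small sketch (srt_timestamp and srt_entry are illustrative names, not functions from console.py):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm style shown above."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_entry(index: int, start: float, end: float, text: str) -> str:
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_entry(1, 0.0, 5.23, "First subtitle text"))
```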

Example: Original Language

$ python3 console.py spanish_meeting.mp3

============================================================
🎙️  AUDIO TRANSCRIPTION TO SRT
============================================================

📁 Input:     spanish_meeting.mp3
📏 Size:      23.5 MB
🗣️  Language:  spanish

🔍 Analyzing audio...
   Duration:   00:45:30
   Chunks:     9

⏳ Transcribing...

[████████████████████████████] 100% - Transcribing chunk 9/9...

✅ Transcription complete!

📄 Output:    spanish_meeting.srt
⏱️  Duration:  2730 seconds (45.5 min)
🗣️  Language:  spanish
📊 Subtitles: 412 entries

Output file content (spanish_meeting.srt):

1
00:00:00,000 --> 00:00:05,230
Buenos dΓ­as a todos, gracias por asistir

2
00:00:05,230 --> 00:00:12,450
Hoy vamos a discutir los resultados trimestrales

3
00:00:12,450 --> 00:00:18,920
Comencemos con la revisiΓ³n de ingresos

Example: Translation to English

$ python3 console.py spanish_meeting.mp3 -t english

============================================================
🎙️  AUDIO TRANSCRIPTION TO SRT
============================================================

📁 Input:     spanish_meeting.mp3
📏 Size:      23.5 MB
🗣️  Translate: english

🔍 Analyzing audio...
   Duration:   00:45:30
   Chunks:     9

⏳ Transcribing...

[████████████████████████████] 100% - Transcribing chunk 9/9...

✅ Transcription complete!

📄 Output:    spanish_meeting.srt
⏱️  Duration:  2730 seconds (45.5 min)
🗣️  Language:  english
📊 Subtitles: 412 entries

Preview (first 3 subtitles):
------------------------------------------------------------
1
00:00:00,000 --> 00:00:05,230
Good morning everyone, thank you for attending

2
00:00:05,230 --> 00:00:12,450
Today we're going to discuss the quarterly results

3
00:00:12,450 --> 00:00:18,920
Let's start with the revenue review

The output file (spanish_meeting.srt) contains the same translated subtitles.

Troubleshooting

Issues & Solutions

❌ "FFmpeg not found" or "ffmpeg: command not found"

Solution:

# Verify FFmpeg is installed
ffmpeg -version

# If not installed, install it:
# macOS:
brew install ffmpeg

# Ubuntu/Debian:
sudo apt-get update && sudo apt-get install ffmpeg

# Windows:
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg (with Chocolatey)

❌ "ModuleNotFoundError: No module named 'whisper'"

Solution:

# Make sure virtual environment is activated
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
python3 -m pip install --upgrade openai-whisper

❌ "Audio file not supported" or decode error

Solution: Convert to MP3 first

# Convert any audio format to MP3
ffmpeg -i your_audio.wav -q:a 0 -map a your_audio.mp3

# Then transcribe
python3 console.py your_audio.mp3

❌ "Permission denied" when running script

Solution:

# Make script executable (Mac/Linux)
chmod +x console.py

# Then run with python3
python3 console.py your_audio.mp3

❌ Processing is very slow

Possible causes and solutions:

  1. Using CPU instead of GPU

    • GPU processing (NVIDIA CUDA) is typically several times faster than CPU
  2. Large audio file

    • Split into smaller files (e.g. with ffmpeg)
    • Processing time scales with audio length
  3. Low RAM available

    • Close other applications
    • Process shorter files (< 1 hour at a time)

❌ "CUDA out of memory" (GPU processing)

Solution:

# Use a smaller model, or modify console.py to load the
# model on the CPU (whisper.load_model(..., device="cpu"))
python3 console.py your_audio.mp3

❌ Poor transcription quality

Tips for better accuracy:

  1. Use clear audio

    • Minimize background noise
    • Use good quality microphone recordings
  2. Use a larger model

    • The current setup uses the "medium" model
    • For higher accuracy, you could use "large" (slower, ~1.4GB more to download)
  3. Check audio format

    • MP3 or WAV recommended
    • Some formats compress audio quality

❌ File not found error

Solution: Use full path or verify file exists

# Full path (works from anywhere)
python3 console.py /full/path/to/audio.mp3

# Or from same directory
ls audio.mp3  # Verify file exists first
python3 console.py audio.mp3

FAQs

Q: Do I need to use the Transcriptor project?

A: No, Transcriptor Lite is standalone. It only uses the OpenAI Whisper library. You don't need the full Transcriptor project installed.

Q: Can I use this with very long files (3+ hours)?

A: Yes, the audio is automatically chunked into 5-minute pieces. Processing will take longer but will work.

Q: Does it work offline?

A: Yes, once the Whisper model is downloaded (first run), everything works offline.

Q: Can I edit the SRT file after?

A: Yes! SRT files are plain text. Open with any text editor to manually adjust timestamps or text if needed.
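
For example, if the subtitles drift against the video, every timestamp can be shifted programmatically rather than by hand. A sketch (shift_srt is a hypothetical helper, not part of this project):

```python
import re

TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text: str, offset_s: float) -> str:
    """Shift every timestamp in an SRT document by offset_s seconds
    (negative values shift earlier, clamped at 00:00:00,000)."""
    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = max(0, h * 3_600_000 + m * 60_000 + s * 1000 + ms
                    + round(offset_s * 1000))
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    return TS.sub(shift, srt_text)

entry = "1\n00:00:05,230 --> 00:00:12,450\nHello\n"
print(shift_srt(entry, 1.5))  # timestamps become 00:00:06,730 --> 00:00:13,950
```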

Q: What video players support SRT files?

A: Most players support SRT:

  • VLC Media Player (desktop)
  • FFmpeg/ffprobe (command line)
  • Subtitle editors (Subtitle Edit, Aegisub)
  • Web players (with subtitle plugins)

Q: How accurate is the transcription?

A: Whisper's medium model is ~94% accurate with clear audio. Accuracy depends on:

  • Audio quality (background noise, clarity)
  • Speaker accent (common accents are handled well; strong regional accents may reduce accuracy)
  • Specialized vocabulary (medical/technical terms may need correction)

Q: Can I use a different Whisper model?

A: The current setup uses the "medium" model. You could modify console.py to use tiny, base, small, large for different speed/accuracy tradeoffs.


File Structure

transcriptor-lite/
├── console.py          # Main transcription script
├── README.md           # This file
└── .gitignore          # Git configuration

Related Projects

  • Transcriptor - Full Django web application with UI

    • GitHub: transcriptor
    • Includes SharePoint integration, job history, more features
  • OpenAI Whisper - The underlying transcription engine


License

MIT License - Feel free to use this project for personal or commercial purposes.


Support & Feedback

  • πŸ› Found a bug? Open an issue on GitHub
  • πŸ’‘ Have a suggestion? Let us know!
  • ⭐ Like this project? Give it a star!

Last Updated: April 2026
Tested with: Python 3.14, Whisper 1.0+, FFmpeg 6.0+

About

A lightweight command-line transcription tool built on the OpenAI Whisper model (also available on Hugging Face).
