Karaoke Maker

A user-friendly GUI application for creating karaoke videos from MP3 files. Select an MP3, let AI transcribe the lyrics, and generate a karaoke video with animated lyrics.

Example generated video

Features

AI-Powered Lyrics Transcription:
- Automatic transcription using OpenAI's Whisper
- Word-level timestamps keep the lyrics in sync with the audio
- Editable lyrics with preserved timing: fix transcription errors without changing the word timestamps
High-Quality Audio Processing:
- Vocal separation using Meta's Demucs (optional)
- Clean instrumental tracks for a true karaoke experience
Customizable Visuals:
- Multiple gradient background presets
- Custom background image support
- Smooth animated lyrics with word highlighting
Real-Time Progress Tracking:
- Dual progress bars showing overall and detailed progress
- Live progress from AI models (Demucs, Whisper, FFmpeg)
Cross-Platform:
- Windows (.exe coming soon)
- Linux (Flatpak coming soon)
- macOS (source installation)
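
This README does not say how video_renderer.py draws the word highlighting listed above. One common way to get per-word highlighting out of FFmpeg (through its ass/subtitles filter, when built with libass) is an ASS subtitle file with \k karaoke tags, whose durations are given in centiseconds. The sketch below turns word-level timestamps into one ASS Dialogue line. The helper names ass_time and karaoke_dialogue and the sample words are hypothetical, and nothing here is taken from the project's code.

```python
def ass_time(seconds):
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    hours, rem = divmod(cs, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, centis = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"


def karaoke_dialogue(words):
    """Build one ASS Dialogue line from (start, end, text) tuples.

    Each word gets a {\\k<cs>} tag lasting until the next word starts,
    so the highlight advances word by word without gaps.
    """
    line_start = words[0][0]
    line_end = words[-1][1]
    parts = []
    for i, (start, end, text) in enumerate(words):
        # The last word is highlighted until its own end time.
        next_start = words[i + 1][0] if i + 1 < len(words) else end
        centis = round((next_start - start) * 100)
        parts.append(f"{{\\k{centis}}}{text}")
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return (
        f"Dialogue: 0,{ass_time(line_start)},{ass_time(line_end)},"
        f"Default,,0,0,0,,{' '.join(parts)}"
    )


# Hypothetical sample: two words with word-level timestamps in seconds.
sample = [(0.0, 0.5, "Hello"), (0.5, 1.2, "world")]
print(karaoke_dialogue(sample))
```

A complete .ass file would also need the [Script Info], [V4+ Styles], and [Events] headers; only the per-line timing logic is shown here.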

Requirements

System Requirements

- Python 3.8 or higher
- FFmpeg (must be installed and in PATH)
- CUDA-compatible GPU (optional, for faster processing)

Installing FFmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Linux (Fedora):

sudo dnf install ffmpeg

macOS:

brew install ffmpeg

Windows: Download from ffmpeg.org and add to PATH.

Installation

From Source

Clone the repository:

git clone https://github.com/usr-wwelsh/karaoke-maker.git
cd karaoke-maker

Run the setup script (recommended):

./setup.sh  # Linux/macOS
# or
setup.bat   # Windows

OR Manual installation:

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Run the application:

python main.py

Optional: High-Quality Vocal Separation

By default, the app works without vocal separation (uses original audio for transcription). For better karaoke quality with isolated vocals, you can optionally install Demucs.

See INSTALL_DEMUCS.md for detailed installation instructions.

Quick install (Linux/macOS):

# Install system dependencies (Ubuntu/Debian)
sudo apt install lame liblame-dev

# Activate venv and install
source venv/bin/activate
pip install lameenc
pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs

First Run

On first run, the application will download AI models:

- Whisper transcription model (~140MB for base model)
- Demucs vocal separation model (~300MB, only if installed)

This only happens once. Models are cached for future use.

Note: Without Demucs, the app transcribes from the original audio, background music included. This works, but may be slightly less accurate than transcribing isolated vocals. The generated video also keeps the original vocals in its audio track, so it is not true karaoke. For best results, install Demucs to get instrumental-only tracks.
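
The fallback described in the note can be sketched with the standard library alone. This is a minimal illustration, not the project's actual logic: has_module and plan_audio are hypothetical names, and the only assumption about Demucs is that it is importable as demucs once installed.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def plan_audio(demucs_available):
    """Describe which audio feeds transcription and which goes into the video."""
    if demucs_available:
        return {"transcribe_from": "isolated vocals", "video_audio": "instrumental only"}
    # Without Demucs, both steps fall back to the untouched original mix.
    return {"transcribe_from": "original mix", "video_audio": "original mix"}


print(plan_audio(has_module("demucs")))
```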

Usage

1. Select MP3 File: Click "Browse" to select your MP3 file.
2. Auto-Transcribe: The app automatically transcribes the lyrics with AI.
   - Lyrics appear with timestamps: [0.00-0.50] word
   - Edit any words if the transcription isn't perfect
   - Timing is preserved when you edit words
   - Click "Reload Lyrics" to re-transcribe if needed
3. Select Background: Choose a gradient preset or upload a custom image.
4. Generate: Click "Generate Karaoke Video" and choose a save location.
5. Watch Progress: Two progress bars show the overall step and detailed AI progress.
6. Enjoy: Your karaoke video is ready!
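
The [0.00-0.50] word format from the transcription step makes "edit the word, keep the timing" easy to reason about. The sketch below shows one way it could work, assuming one word per line and two decimal places as in the example above. The function names are hypothetical and the real parsing in lyrics_handler.py may differ.

```python
import re

# Matches lines such as "[0.00-0.50] word".
LINE_RE = re.compile(r"^\[(\d+\.\d+)-(\d+\.\d+)\]\s*(.*)$")


def format_word(start, end, text):
    """Render one timed word in the editor format."""
    return f"[{start:.2f}-{end:.2f}] {text}"


def parse_word(line):
    """Parse an editor line back into (start, end, text)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a timed lyric line: {line!r}")
    return float(match.group(1)), float(match.group(2)), match.group(3)


def apply_edits(original, edited_lines):
    """Take the text from the edited lines but keep the original timestamps."""
    if len(original) != len(edited_lines):
        raise ValueError("edited lyrics must keep one line per original word")
    return [
        (start, end, parse_word(line)[2])
        for (start, end, _), line in zip(original, edited_lines)
    ]


# Hypothetical transcription with one misheard word.
words = [(0.0, 0.5, "helo"), (0.5, 1.1, "world")]
lines = [format_word(*w) for w in words]
lines[0] = "[0.00-0.50] hello"  # the user fixes the typo in the editor
print(apply_edits(words, lines))
```

Taking the timestamps from the original transcription rather than from the edited text means an accidental change to the bracketed numbers cannot shift the sync.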

Processing Time

Typical processing time for a 4-minute song:

- With GPU (CUDA): 2-3 minutes
- CPU only: 4-6 minutes

Breakdown:

- Vocal separation: 1-3 minutes
- Transcription/alignment: 30 seconds - 2 minutes
- Video rendering: 30 seconds - 1 minute
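
As a quick check of the figures above, summing the per-stage ranges gives a total of 2 to 6 minutes, which spans the fastest GPU estimate and the slowest CPU estimate. The snippet only does arithmetic on the numbers quoted in this section; it does not measure anything.

```python
# Per-stage (min, max) durations in seconds, copied from the breakdown above.
STAGES = {
    "vocal separation": (60, 180),
    "transcription/alignment": (30, 120),
    "video rendering": (30, 60),
}

total_min = sum(low for low, _ in STAGES.values())
total_max = sum(high for _, high in STAGES.values())

print(f"{total_min / 60:.0f}-{total_max / 60:.0f} minutes")  # prints "2-6 minutes"
```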

Troubleshooting

"FFmpeg not found"

Make sure FFmpeg is installed and in your system PATH. Test with:

ffmpeg -version
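
The same check can be done from Python, which is useful if the error only appears inside the virtual environment. This is a small sketch using only the standard library. find_executable and version_line are hypothetical helper names, not functions from this project.

```python
import shutil
import subprocess


def find_executable(name="ffmpeg"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def version_line(path):
    """Return the first line printed by '<path> -version', or '' if there is none."""
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


ffmpeg_path = find_executable("ffmpeg")
if ffmpeg_path is None:
    print("FFmpeg not found on PATH")
else:
    print(ffmpeg_path)
    print(version_line(ffmpeg_path))
```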

Out of memory errors

- Try closing other applications
- Use a smaller Whisper model (edit lyrics_handler.py, change 'base' to 'tiny')
- Process shorter songs first
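
When choosing a model name to put in lyrics_handler.py, it can help to compare against the approximate GPU memory figures published in the upstream Whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium). The sketch below picks the largest model that fits a memory budget. The figures are rough, CPU RAM use will differ, and largest_model_within is a hypothetical helper rather than part of this app.

```python
# (model name, approximate GPU memory in GB), smallest to largest.
# Figures are the rough values from the upstream Whisper README.
WHISPER_MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5)]


def largest_model_within(budget_gb):
    """Return the largest model whose approximate need fits the budget.

    Falls back to 'tiny' when even the smallest estimate exceeds the budget.
    """
    choice = "tiny"
    for name, need_gb in WHISPER_MODELS:
        if need_gb <= budget_gb:
            choice = name
    return choice


for budget in (0.5, 1, 4, 8):
    print(budget, "GB ->", largest_model_within(budget))
```

Larger models trade memory and speed for accuracy, so the same table is a reasonable starting point when transcription quality is the problem rather than memory.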

GPU errors (CUDA kernel image, compatibility issues)

- Your GPU may be too old for PyTorch 2.0+
- The app automatically detects this and uses CPU mode
- See GPU_TROUBLESHOOTING.md for details
- Quick fix (Linux/macOS shell): run export CUDA_VISIBLE_DEVICES=-1 before starting the app to hide the GPU and force CPU mode
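
The same effect can be had from inside Python by setting the variable before PyTorch is imported. The sketch below is an illustration of that idea and not the app's own detection code. force_cpu and pick_device are hypothetical names, and the torch import is wrapped so the snippet also runs where PyTorch is not installed.

```python
import os


def force_cpu():
    """Hide all CUDA devices. Call this before torch is first imported."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def pick_device():
    """Return 'cuda' when PyTorch can see a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


force_cpu()
print(pick_device())
```

The order matters: once PyTorch has initialized CUDA, changing CUDA_VISIBLE_DEVICES in the same process no longer has any effect.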

Poor transcription quality

- Ensure the audio has clear vocals
- Try a larger Whisper model ('small' or 'medium' in lyrics_handler.py)
- Edit the transcribed lyrics to fix any errors; timing will be preserved
- Songs with heavy background music may transcribe better with Demucs installed

Development

Project Structure

karaoke-maker/
├── main.py                 # Main GUI application
├── audio_processor.py      # Demucs vocal separation
├── lyrics_handler.py       # Whisper AI transcription
├── alignment.py            # Forced alignment for lyrics
├── video_renderer.py       # FFmpeg video generation
├── utils.py                # Metadata extraction, helpers
├── requirements.txt        # Python dependencies
├── LICENSE                 # MIT License
└── README.md              # This file

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

- Whisper: OpenAI - Speech recognition and transcription - github.com/openai/whisper
- Demucs: Meta AI - Music source separation - github.com/facebookresearch/demucs
- CustomTkinter: Modern UI library - github.com/TomSchimansky/CustomTkinter
- FFmpeg: Video encoding and processing - ffmpeg.org

Author

Created by usr-wwelsh
