Open-Lyrics

Open-Lyrics is a Python library that transcribes voice files using faster-whisper, then translates and polishes the resulting text into .lrc files in the desired language using an LLM, e.g. OpenAI GPT or Anthropic Claude.

Key Features:

- Well-preprocessed audio to reduce hallucination (loudness normalization and optional noise suppression).
- Context-aware translation to improve translation quality. Check the prompt for details.

New 🚨

2024.3.29: Claude models are now available for translation. In our testing, Claude 3 Sonnet performs considerably better than GPT-3.5 Turbo. We recommend Claude 3 Sonnet for translating non-English audio (source language). For now, the default model is still GPT-3.5 Turbo:
```
lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
```
2024.4.4: Added basic Streamlit GUI support. Run `openlrc gui` to start the GUI.

2024.5.7:

Add custom endpoint (base_url) support for OpenAI & Anthropic:

lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',
                               'anthropic': 'https://api.g4f.icu'})

Generate bilingual subtitles:

lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)

Installation ⚙️

Please install CUDA 11.x and cuDNN 8 for CUDA 11 first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

faster-whisper also needs cuBLAS for CUDA 11 installed.

For Windows users:

Windows users can download the libraries from Purfview's repository:

Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.
Add LLM API keys. You can either:
- Add your OpenAI API key to the environment variable `OPENAI_API_KEY`.
- Add your Anthropic API key to the environment variable `ANTHROPIC_API_KEY`.
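
Before running a long transcription job, it can help to confirm that at least one key is visible to Python. The following is a minimal sketch, not part of openlrc: the helper `missing_api_keys` and the `KEY_VARS` mapping are hypothetical names, and only the two environment variable names come from the list above.

```python
import os

# Environment variable names are taken from the list above.
# The mapping and the helper are illustrative, not openlrc API.
KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def missing_api_keys(env=None):
    """Return the providers whose API key variable is unset or empty."""
    env = os.environ if env is None else env
    return [name for name, var in KEY_VARS.items() if not env.get(var)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if len(missing) == len(KEY_VARS):
        print("No LLM API key found. Set OPENAI_API_KEY or ANTHROPIC_API_KEY.")
    else:
        configured = [name for name in KEY_VARS if name not in missing]
        print("Configured providers:", ", ".join(configured))
```

Only the provider you intend to use for translation needs a key, so a non-empty result is not necessarily an error.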

Install PyTorch:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Install the latest faster-whisper:

pip install git+https://github.com/guillaumekln/faster-whisper

Install ffmpeg and add its bin directory to your PATH.
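
To verify that ffmpeg is reachable from PATH, a small standard-library check is enough. This sketch is an illustration only: `find_executable` is a hypothetical helper, not something openlrc provides.

```python
import shutil


def find_executable(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_executable()
    if location is None:
        print("ffmpeg was not found on PATH.")
    else:
        print("ffmpeg found at", location)
```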
This project can be installed from PyPI:
```
pip install openlrc
```
or install directly from GitHub:
```
pip install git+https://github.com/zh-plus/openlrc
```

Usage 🐍

GUI

Note

We are migrating the GUI from streamlit to Gradio. The GUI is still under development.

openlrc gui

Python code

from openlrc import LRCer

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file
    lrcer.run('./data/test.mp3',
              target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note we run the transcription sequentially, but run the translation concurrently for each file.

    # Path can contain video
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generate translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use context.yaml to improve translation
    lrcer.run('./data/test.mp3', target_lang='zh-cn', context_path='./data/context.yaml')

    # To skip translation process
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options, check openlrc.defaults for details
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(vad_options=vad_options)
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (takes more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

    # Change the LLM model for translation
    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Clear the temp folder after processing is done
    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp_folder=True)

    # Change base_url
    lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',
                                   'anthropic': 'https://api.g4f.icu'})

    # Bilingual subtitle
    lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)

See the Documentation for more details.
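
The usage code above notes that audio inputs produce a `.lrc` file and video inputs a `.srt` file next to the source. The sketch below predicts that output path so a script can check whether a file has already been processed. It is an assumption-laden illustration: `expected_output` is a hypothetical helper, and the `VIDEO_EXTENSIONS` set is a guess, not the list openlrc actually uses.

```python
from pathlib import Path

# Assumption: a guessed set of video extensions, not openlrc's own list.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi"}


def expected_output(path):
    """Guess the subtitle path: .srt for video inputs, .lrc otherwise."""
    source = Path(path)
    suffix = ".srt" if source.suffix.lower() in VIDEO_EXTENSIONS else ".lrc"
    return source.with_suffix(suffix)


if __name__ == "__main__":
    for item in ["./data/test_audio.mp3", "./data/test_video.mp4"]:
        print(item, "->", expected_output(item))
```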

Context

Use any available context to improve translation quality. Save it as context.yaml in the same directory as your audio file.

Note

The improvement of translation quality from Context is NOT guaranteed.

background: "This is a multi-line background.
This is a basic example."
audio_type: Movie
description_map: {
  movie_name1 (without extension): "This
  is a multi-line description for movie1.",
  movie_name2 (without extension): "This
  is a multi-line description for movie2.",
  movie_name3 (without extension): "This is a single-line description for movie 3.",
}
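
If you prefer to generate context.yaml from a script, the following standard-library sketch writes the three keys shown above (`background`, `audio_type`, `description_map`). The helpers `render_context` and `write_context` are hypothetical. Values are quoted with `json.dumps`, on the assumption that JSON-style double-quoted strings are read as YAML double-quoted scalars by the loader.

```python
import json
from pathlib import Path


def _quote(value):
    # JSON double-quoted strings are assumed to parse as YAML scalars.
    return json.dumps(value, ensure_ascii=False)


def render_context(background, audio_type, descriptions):
    """Build context.yaml text from the three keys shown above."""
    lines = [
        f"background: {_quote(background)}",
        f"audio_type: {_quote(audio_type)}",
    ]
    if descriptions:
        lines.append("description_map:")
        for stem, text in descriptions.items():
            # Keys are file names without extension, as in the example above.
            lines.append(f"  {_quote(stem)}: {_quote(text)}")
    else:
        lines.append("description_map: {}")
    return "\n".join(lines) + "\n"


def write_context(audio_path, background, audio_type, descriptions):
    """Write context.yaml into the same directory as the audio file."""
    target = Path(audio_path).with_name("context.yaml")
    target.write_text(
        render_context(background, audio_type, descriptions), encoding="utf-8"
    )
    return target
```

For example, `write_context('./data/test.mp3', 'A basic example.', 'Movie', {'test': 'Opening scene.'})` would create `./data/context.yaml`, which can then be passed as `context_path`.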

Pricing 💰

Pricing data from OpenAI and Anthropic.

| Model Name | Pricing for 1M Tokens (Input/Output) (USD) | Cost for 1 Hour Audio (USD) |
|---|---|---|
| `gpt-3.5-turbo-0125` | 0.5, 1.5 | 0.01 |
| `gpt-3.5-turbo` | 0.5, 1.5 | 0.01 |
| `gpt-4-0125-preview` | 10, 30 | 0.5 |
| `gpt-4-turbo-preview` | 10, 30 | 0.5 |
| `claude-3-haiku-20240307` | 0.25, 1.25 | 0.015 |
| `claude-3-sonnet-20240229` | 3, 15 | 0.2 |
| `claude-3-opus-20240229` | 15, 75 | 1 |

Note that the cost is estimated from the token count of the input and output text. The actual cost may vary with the language and audio speed.
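
The per-token arithmetic behind such an estimate can be sketched as follows. The prices are copied from the table above. The function `estimate_cost` and the token counts in the example are hypothetical and are not the figures used to produce the "1 Hour Audio" column.

```python
# (input, output) price in USD per 1M tokens, copied from the table above.
PRICING = {
    "gpt-3.5-turbo": (0.5, 1.5),
    "gpt-4-turbo-preview": (10, 30),
    "claude-3-haiku-20240307": (0.25, 1.25),
    "claude-3-sonnet-20240229": (3, 15),
    "claude-3-opus-20240229": (15, 75),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the translation cost in USD for the given token counts."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical token counts, for illustration only.
    cost = estimate_cost("claude-3-sonnet-20240229", 20_000, 10_000)
    print(f"Estimated cost: ${cost:.4f}")
```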

Recommended translation model

For English audio, we recommend using gpt-3.5-turbo.

For non-English audio, we recommend using claude-3-sonnet-20240229.
