A unified, extensible, and future-proof Python toolkit for locally running state-of-the-art LLM-based Text-to-Speech (TTS) synthesis models.
- 🧩 25+ Models — MeloTTS, ChatTTS, CosyVoice, Fish Speech, Parler-TTS, XTTS, GPT-SoVITS, F5-TTS, Qwen3-TTS, GLM-TTS, Index-TTS, MaskGCT, and more
- 🔌 Plugin Architecture — Add new models with the `@register_model` decorator
- 🚀 Hot-Swap — Switch models at runtime without restarting
- 🌍 Multi-Language — Chinese, English, Japanese, Korean, and more
- 🎯 Multi-Task — Speech synthesis, voice cloning, emotion control, style transfer, streaming
- 💻 Local-First — All inference on-device. No APIs. No data leaves your machine.
- 🐍 Modern Python — uv-native packaging, Pydantic configs, rich CLI
- 📦 Zero-Config for select models — GLM-TTS and Index-TTS automatically download their official code repositories on first use
```bash
# Clone the repository
git clone https://github.com/vra/modern-tts.git
cd modern-tts

# Sync all dependencies (recommended)
uv sync --all-extras

# Or install specific extras only
uv sync --extra melotts --extra chattts --extra glm --extra index

# Or just core dependencies
uv sync
```

Python 3.10+ is recommended. Some models (e.g. Index-TTS) require specific PyTorch / transformers versions; see the per-model notes below.
```python
from modern_tts import TTSPipeline

# Synthesize with MeloTTS
pipe = TTSPipeline("melotts-zh")
result = pipe("你好世界,这是语音合成测试。")
result.save("output.wav")

# Switch to ChatTTS for emotional speech
pipe.switch_model("chattts")
result = pipe("这是一个带有情感的语音合成。")
result.save("output_emotion.wav")

# Voice cloning with CosyVoice
pipe.switch_model("cosyvoice-300m")
result = pipe("这是克隆的声音。", task="clone", reference_audio="reference.wav")
result.save("cloned.wav")

# Zero-config voice cloning with GLM-TTS (auto-downloads code)
pipe.switch_model("glm-tts")
result = pipe("你好,这是 GLM-TTS 的语音克隆测试。", task="clone", reference_audio="ref.wav")
result.save("glm_cloned.wav")

# Zero-config voice cloning with Index-TTS (auto-downloads code)
pipe.switch_model("index-tts")
result = pipe("你好,这是 Index-TTS 的语音克隆测试。", task="clone", reference_audio="ref.wav")
result.save("index_cloned.wav")
```

| Model ID | Type | Languages | Modes | Install Extra | Notes |
|---|---|---|---|---|---|
| `melotts-zh` | TTS | zh, en | speak, emotion | `--extra melotts` | Many text-processing deps (pypinyin, jieba, etc.) |
| `melotts-en` | TTS | zh, en | speak, emotion | `--extra melotts` | English variant |
| `chattts` | TTS | zh, en | speak, clone, emotion | `--extra chattts` | Emotional prosody control |
| `f5-tts` | ZS-VC | zh, en, ja, ko | speak, clone, emotion | `--extra f5` | Requires reference audio for synthesis |
| `glm-tts` | ZS-VC | zh, en | speak, clone | `--extra glm` | Auto-downloads official repo. Heavy deps (transformers, onnxruntime, peft). |
| `index-tts` | ZS-VC | zh, en, ja, ko, yue | speak, clone, emotion, style | `--extra index` | Auto-downloads official repo. Requires Python ≥ 3.10. |
| `moss-tts` | TTS | zh, en, ja, ko | speak, emotion | `--extra moss` | MOSS-TTS-Nano (0.1B), CPU-friendly |
| `piper-tts` | TTS | 15+ | speak | `--extra piper` | ONNX-based, edge-optimized |
| `qwen3-tts-0.6b` | ZS-VC | 11+ | speak, clone | `--extra qwen3-tts` | Requires `qwen-tts` package |
| `qwen3-tts-1.7b` | ZS-VC | 11+ | speak, clone | `--extra qwen3-tts` | Larger Qwen3-TTS variant |
| `xtts-v1` | ZS-VC | 13+ | speak, clone | `--extra xtts` | Requires `coqui-tts` |
| `xtts-v2` | ZS-VC | 13+ | speak, clone | `--extra xtts` | Adds Chinese support |
| `xtts-v2.1` | ZS-VC | 13+ | speak, clone, streaming | `--extra xtts` | Adds streaming mode |
ZS-VC = Zero-Shot Voice Cloning (requires a `reference_audio` sample).
These models require you to manually clone their official repositories and/or download weights before use. Until that setup is done, calling `load()` raises a `RuntimeError` with setup instructions.
| Model ID | Type | Languages | Modes | Install Extra | Setup Notes |
|---|---|---|---|---|---|
| `bertvits2-zh` | TTS | zh, en | speak, emotion | `--extra bertvits2` | Clone repo + download weights |
| `bertvits2-en` | TTS | en | speak, emotion | `--extra bertvits2` | Clone repo + download weights |
| `bertvits2-jp` | TTS | ja, en | speak, emotion | `--extra bertvits2` | Clone repo + download weights |
| `cosyvoice-300m` | ZS-VC | zh, en, yue, ja, ko | speak, clone, emotion, style | `--extra cosyvoice` | Clone repo + download weights |
| `cosyvoice-300m-sft` | ZS-VC | zh, en, yue, ja, ko | speak, clone, emotion, style | `--extra cosyvoice` | SFT variant |
| `cosyvoice-300m-instruct` | ZS-VC | zh, en, yue, ja, ko | speak, clone, emotion, style | `--extra cosyvoice` | Instruct variant |
| `fishspeech-1.5` | ZS-VC | zh, en, ja, ko | speak, clone, emotion | `--extra fishspeech` | Clone repo + weights; pyaudio needs system headers |
| `gptsovits` | ZS-VC | zh, en, ja, yue | speak, clone | `--extra gptsovits` | Clone repo + download weights |
| `redfire-tts` | ZS-VC | zh, en, yue | speak, clone, emotion | `--extra redfire` | fairseq needs C++ build headers |
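The failure mode for these models is explicit rather than silent. A minimal sketch of the check (the helper name `load_manual_model` and the message text are hypothetical; the real adapters differ):

```python
from pathlib import Path

def load_manual_model(repo_path: str, model_name: str) -> Path:
    """Sketch: a manual-setup adapter verifies the cloned repo exists
    and raises RuntimeError with setup instructions when it does not."""
    path = Path(repo_path)
    if not path.exists():
        raise RuntimeError(
            f"{model_name} requires manual setup:\n"
            f"  1. clone the official repository to {repo_path}\n"
            "  2. download the pretrained weights into it\n"
            "then retry loading."
        )
    return path
```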
| Model ID | Reason |
|---|---|
| `maskgct` | Custom tokenizer incompatible with generic `TextToAudioLLMModel` loader |
| `parler-tts-mini` | `parler-tts` package incompatible with transformers >= 4.50 |
| `parler-tts-large` | Same compatibility issue as `parler-tts-mini` |
| `pocket-tts` | No public repository or weights found (reserved for future implementation) |
- `glm-tts` — LLM + Flow Matching zero-shot TTS (Zhipu AI). Merged the previous `glm-tts-nano-2512` and `glm-tts-2512` into a single `glm-tts` model ID.
- `index-tts` — Industrial-level multilingual zero-shot voice cloning (IndexTeam).
- GLM-TTS and Index-TTS no longer require manual environment variables (`GLM_TTS_REPO_PATH`, `INDEX_TTS_REPO_PATH`) or `PYTHONPATH` manipulation.
- On first use, the framework automatically:
  - Clones the official repository to `~/.cache/modern-tts/repos/`
  - Injects the path into `sys.path`
  - Proceeds with model loading
- You can still override the auto-download path via `config.extra["glm_tts_repo_path"]` / `config.extra["index_tts_repo_path"]` or the corresponding environment variables.
- `modern_tts.core.hf_hub` — HuggingFace Hub download helpers (`download_hf_model`, `get_hf_model_path`) so custom-code adapters don't re-implement caching logic.
- `modern_tts.core.repo_manager` — Generic git repository auto-downloader (`ensure_repo`, `inject_repo_path`) used by adapters that depend on upstream code not on PyPI.
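The auto-download flow can be sketched as below. The function names mirror `ensure_repo` / `inject_repo_path`, but the bodies are an illustrative reconstruction, not the library's actual implementation:

```python
import subprocess
import sys
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "modern-tts" / "repos"

def ensure_repo(url: str, name: str) -> Path:
    """Clone `url` into the cache directory on first use; no-op afterwards."""
    dest = CACHE_DIR / name
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    return dest

def inject_repo_path(path: Path) -> None:
    """Prepend the repo to sys.path so upstream modules import normally."""
    if str(path) not in sys.path:
        sys.path.insert(0, str(path))
```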
`TextToAudioLLMModel.load()` now raises a clear `NotImplementedError` when a subclass has not set `PROCESSOR_CLS` / `MODEL_CLS`, signaling that the subclass must override `load()` for custom loading logic.
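The guard can be illustrated with a simplified stand-in class (not the real `TextToAudioLLMModel`):

```python
class TextToAudioLLMModel:
    """Simplified stand-in showing the PROCESSOR_CLS/MODEL_CLS guard."""
    PROCESSOR_CLS = None  # e.g. "transformers.AutoTokenizer"
    MODEL_CLS = None      # e.g. "transformers.AutoModelForTextToWaveform"

    def load(self):
        if self.PROCESSOR_CLS is None or self.MODEL_CLS is None:
            raise NotImplementedError(
                f"{type(self).__name__} must set PROCESSOR_CLS and MODEL_CLS, "
                "or override load() with custom loading logic."
            )
        # Generic processor/model loading would follow here.
```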
| Old ID | New ID | Note |
|---|---|---|
| `glm-tts-nano-2512` | `glm-tts` | Merged into unified `glm-tts` |
| `glm-tts-2512` | `glm-tts` | Merged into unified `glm-tts` |
Modern TTS is built on three layers:
- TTSPipeline — Unified user API. Handles text normalization, task dispatch, and model lifecycle.
- TTSModel / TextToAudioLLMModel — Adapter layer. New models often need only ~8 lines of config via `TextToAudioLLMModel`.
- Backends — Transformers, vLLM, ONNX Runtime.
```python
from modern_tts.core.audio_llm import TextToAudioLLMModel
from modern_tts.core.registry import register_model


@register_model("my-tts-1b")
class MyTTS1B(TextToAudioLLMModel):
    HF_PATH = "org/MyTTS-1B"
    PROCESSOR_CLS = "transformers.AutoTokenizer"
    MODEL_CLS = "transformers.AutoModelForTextToWaveform"
    SUPPORTED_LANGUAGES = {"zh", "en"}
    DEFAULT_SAMPLE_RATE = 24000

    @property
    def model_id(self) -> str:
        return "my-tts-1b"
```

That's it. The registry auto-discovers it at runtime.
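Under the hood, a decorator-based registry like this amounts to a dict from model ID to class. An illustrative sketch (the real `modern_tts.core.registry` may differ, and `resolve` is a hypothetical lookup helper):

```python
_REGISTRY: dict[str, type] = {}

def register_model(model_id: str):
    """Record the decorated class under `model_id` so a pipeline
    can look it up and instantiate it by name."""
    def wrap(cls: type) -> type:
        _REGISTRY[model_id] = cls
        return cls
    return wrap

def resolve(model_id: str) -> type:
    """Look up a registered model class, e.g. from TTSPipeline."""
    return _REGISTRY[model_id]
```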
See Contributing Guide for development setup, code style, and PR checklist.
Apache-2.0