narrate

Turn a script into a multi-voice audio file with one command. Runs locally, costs nothing.

Chatterbox TTS can clone a voice from a 10-second clip and produce audio comparable to ElevenLabs. But using it directly means writing Python, managing per-speaker generation, chunking long text so the model doesn't choke, stitching segments together with silence gaps, normalizing loudness, and handling a dozen upstream library quirks. narrate does all of that for you.

What you get:

Script in, audio out. Pass a JSONL file with speaker labels and text, get a single MP3. No Python required.
Multi-speaker. Each line can use a different voice. Voices are just WAV files in a directory.
Smart chunking. Long text is split at sentence boundaries to stay within model limits, then stitched seamlessly.
Expression tags. [laugh], [sigh], [whisper] — inline in your text, passed through to the engine.
Voice library. 28 pre-made voices downloadable with one command, or clone your own from any audio clip.
Engine-agnostic. Ships with Chatterbox (local, free) and ElevenLabs (cloud, paid) backends. Same script format for both — swap engines with a flag.
Clean output. No upstream library warnings, no progress bar spam. Just your audio.

Powered by Chatterbox TTS (Resemble AI, MIT license).

Install

curl -sSL https://raw.githubusercontent.com/zackham/narrate/master/install.sh | bash

Or if you already have uv:

uv tool install git+https://github.com/zackham/narrate.git

Quick Start

# Download pre-made voices (~28 voices)
narrate install-voices

# Create a script
cat > script.jsonl << 'EOF'
{"voice": "jordan", "text": "Wait, so I can just write a script and get a multi-voice podcast out of it?"}
{"voice": "adrian", "text": "One command. It chunks your text, generates each speaker, stitches it together, normalizes the audio."}
{"voice": "olivia", "text": "And it runs locally. No API keys, no billing, no rate limits. [chuckle] I was mass-producing podcasts on ElevenLabs before this. My credit card is relieved."}
{"voice": "jordan", "text": "[gasp] OK so what's the catch?"}
{"voice": "adrian", "text": "There isn't one. It's open source. You just need a ten second voice clip and you're cloning."}
EOF

# Generate
narrate script.jsonl -o podcast.mp3

Voice Setup

Option 1: Pre-made Voices

Download ~28 pre-made voices with a single command:

narrate install-voices

This downloads voices from devnen/Chatterbox-TTS-Server (MIT license) into ~/.narrate/voices/.

Option 2: Your Own Voices

Drop WAV files (10-30 second reference clips) into ~/.narrate/voices/:

~/.narrate/voices/
  zack.wav
  adrian.wav
  olivia.wav

The voice field in your script maps to the filename (e.g. "voice": "zack" → ~/.narrate/voices/zack.wav). Voice matching is case-insensitive. Clips must be at least 5 seconds long.

Override the voices directory with --voices-dir or in narrate.toml.

List Available Voices

narrate voices

Script Formats

JSONL (primary)

{"voice": "adrian", "text": "So here's the thing. [laugh] It's obvious in hindsight."}
{"voice": "olivia", "text": "Yeah, I had the same reaction. [sigh] But nobody talks about it."}

ElevenLabs-Compatible JSON

{
  "inputs": [
    {"voice": "adrian", "text": "So here's the thing."},
    {"voice": "olivia", "text": "Yeah, I agree."}
  ]
}

Plain Text

narrate essay.txt --voice adrian -o output.mp3

Inline Text

narrate --text "Hello world" --voice adrian -o hello.mp3

Expression Tags

Chatterbox supports inline expression tags that control vocal delivery:

Tag	Effect
`[laugh]`	Laughter
`[chuckle]`	Chuckle
`[sigh]`	Sigh
`[gasp]`	Gasp
`[cough]`	Cough
`[groan]`	Groan
`[sniff]`	Sniff
`[shush]`	Shush
`[clear throat]`	Throat clearing
`[yawn]`	Yawn
`[whisper]`	Whispered speech

Use them inline: "So here's the thing. [laugh] It's obvious in hindsight."

Engine Configuration

Chatterbox (default, local)

No configuration needed. Runs locally on CPU or CUDA. First run downloads the model (~1.4GB) from HuggingFace.

# Use GPU
narrate script.jsonl -o output.mp3 --device cuda

# Force CPU
narrate script.jsonl -o output.mp3 --device cpu

ElevenLabs (cloud, requires API key)

export ELEVENLABS_API_KEY="your-key-here"
narrate script.jsonl --engine elevenlabs -o output.wav

Configure voice ID mapping in ~/.narrate/narrate.toml:

[elevenlabs]
api_key = ""  # or use ELEVENLABS_API_KEY env var
voice_map = { adrian = "21m00Tcm4TlvDq8ikWAM", olivia = "voice_id_here" }

Configuration

narrate looks for config in two places (working directory overrides global):

./narrate.toml (project-local)
~/.narrate/narrate.toml (global default)

[voices]
dir = "~/.narrate/voices"
default = "adrian"

[output]
format = "mp3"
turn_gap = 0.4

[engine]
default = "chatterbox"
device = "auto"  # auto | cpu | cuda

[elevenlabs]
api_key = ""
voice_map = { adrian = "voice_id_here" }

CLI flags override config file values.

CLI Reference

narrate [FILE] [OPTIONS]
narrate install-voices [--voices-dir DIR]
narrate voices [--voices-dir DIR]

Options:
  FILE                    Script file (JSONL, JSON, or plain text)
  --text TEXT             Inline text to synthesize
  --voice NAME            Default voice name
  --voices-dir DIR        Voice WAV files directory (default: ~/.narrate/voices)
  --engine ENGINE         TTS engine: chatterbox, elevenlabs
  --device DEVICE         Device: auto, cpu, cuda
  -o, --output PATH       Output file path (default: INPUT.mp3 or output.mp3)
  --format FORMAT         Output format: wav, mp3
  --turn-gap SECONDS      Silence between speaker turns (default: 0.4)
  --no-normalize          Disable LUFS audio normalization
  --help                  Show help

Troubleshooting

"voice clip 'X' is N.Ns -- must be at least 5 seconds" Your voice reference WAV is too short. Chatterbox needs at least 5 seconds of speech. Record or find a longer clip (10-30 seconds with varied intonation is ideal).

"ffmpeg not found" MP3 output requires ffmpeg. Install it via your package manager (brew install ffmpeg, apt install ffmpeg, etc.) or use --format wav.

"no voice file found for 'X'" The voice WAV file is missing from your voices directory. Run narrate install-voices to download pre-made voices, or add your own WAV file.

"CUDA device requested but no GPU available" Use --device cpu or --device auto (default) to run on CPU.

First run is slow Chatterbox downloads the model (~1.4GB) from HuggingFace on first use. Subsequent runs use the cached model.

ElevenLabs errors

401: Check your API key (ELEVENLABS_API_KEY env var or narrate.toml)
429: Rate limit exceeded. Wait and retry, or check your plan quota.
Voice not configured: Add the voice ID to [elevenlabs.voice_map] in narrate.toml

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
narrate		narrate
tests		tests
voices		voices
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh
narrate.toml.example		narrate.toml.example
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

narrate

Install

Quick Start

Voice Setup

Option 1: Pre-made Voices

Option 2: Your Own Voices

List Available Voices

Script Formats

JSONL (primary)

ElevenLabs-Compatible JSON

Plain Text

Inline Text

Expression Tags

Engine Configuration

Chatterbox (default, local)

ElevenLabs (cloud, requires API key)

Configuration

CLI Reference

Troubleshooting

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

narrate

Install

Quick Start

Voice Setup

Option 1: Pre-made Voices

Option 2: Your Own Voices

List Available Voices

Script Formats

JSONL (primary)

ElevenLabs-Compatible JSON

Plain Text

Inline Text

Expression Tags

Engine Configuration

Chatterbox (default, local)

ElevenLabs (cloud, requires API key)

Configuration

CLI Reference

Troubleshooting

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages