tts

A Claude Code skill that turns any content (a file, URL, raw text, image, or PDF) into spoken audio narration. The default backend is local Kokoro: free, no API key, no watermark, and fully offline after the first run. Pass --gemini to use the Gemini 2.5 Flash TTS API instead.

You say "read this to me" and Claude reads it aloud. The skill first rewrites the source into a spoken-style script, then streams the audio chunk by chunk, playing each chunk while later ones are still synthesizing, so playback typically starts within about 5 seconds.

Quick install

User-scoped (recommended):

git clone https://github.com/wickdninja/tts.git ~/.claude/skills/tts

Project-scoped:

git clone https://github.com/wickdninja/tts.git <project>/.claude/skills/tts

Then in Claude Code:

/tts                              # ask what to read
/tts ./README.md                  # narrate a file
/tts https://example.com/post     # narrate a URL
/tts --gemini ./README.md         # use the Gemini API instead of Kokoro
/tts --voices                     # list Kokoro voices
/tts --voices --gemini            # list Gemini voices

Backends

Kokoro (default, local, free)

Requirements:
macOS or Linux
Node.js (any modern version)
uv (Astral's Python package manager)

brew install uv node

The first run downloads ~340MB in total: the kokoro and torch wheels into ~/.cache/uv/, and the Kokoro-82M model weights into ~/.cache/huggingface/. Subsequent runs are fast and fully offline.

Output is 24kHz mono WAV. The model weights are Apache-2.0 licensed, and the audio carries no watermark and no audible AI disclaimer.
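For a concrete picture of what "24kHz mono WAV" means on disk, here is a minimal sketch using Python's standard wave module. It concatenates raw PCM chunks into a single file (the same idea as the skill's combined output file) and reads the header back. The 16-bit sample width and the helper names write_combined_wav and describe_wav are assumptions for illustration; tts_kokoro.py may write its output differently.

```python
import io
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as stated above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # assumption: 16-bit signed PCM samples


def write_combined_wav(chunks, dest):
    """Concatenate raw 16-bit PCM chunks into one WAV (hypothetical helper)."""
    with wave.open(dest, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wf.writeframes(chunk)


def describe_wav(src):
    """Return (sample rate, channels, sample width in bytes, duration in s)."""
    with wave.open(src, "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnchannels(), wf.getsampwidth(), wf.getnframes() / rate


if __name__ == "__main__":
    # Two half-second chunks of silence stand in for synthesized speech.
    silence = b"\x00\x00" * (SAMPLE_RATE // 2)
    buf = io.BytesIO()
    write_combined_wav([silence, silence], buf)
    buf.seek(0)
    print(describe_wav(buf))  # (24000, 1, 2, 1.0)
```

Because the header records the real format, any WAV-aware player or tool will report it correctly, which is also why the file should keep a .wav extension.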

Gemini 2.5 Flash TTS (cloud)

Requirements:
a GOOGLE_API_KEY (create one in Google AI Studio)
network access at synthesis time

export GOOGLE_API_KEY="..."

Pass --gemini to route synthesis through Google. Costs are usually small at typical volumes; the Gemini API bills per token (text input plus audio output) rather than per character.

Voices

Set TTS_VOICE=<name> to pick a voice. Voice names are not interchangeable across backends.

Backend	Default	Popular alternates
Kokoro	`af_heart` (American female, warm)	`am_adam`, `bf_emma`, `bm_george`, `am_puck`, `af_bella`
Gemini	`Kore` (female, warm)	`Orus`, `Charon`, `Puck`, `Leda`

Full catalog (including Japanese, Mandarin, Spanish, French, Hindi, Italian, Brazilian Portuguese for Kokoro): see voices.md.

How it works

Phase 1, extract. Read the file, fetch the URL, describe the image, etc.
Phase 2, rewrite. Convert the source into a spoken narrative: no markdown, no bullets, no code blocks. Use spoken transitions ("here's the thing"), expand jargon, and pace the script for roughly 150 words per minute.
Phase 3, synthesize. Stream audio chunk-by-chunk with --play, or batch-render to a single WAV without it.
Phase 4, replay. The combined WAV is always written to /tmp/tts-output.wav so you can replay with ! afplay /tmp/tts-output.wav (macOS).
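Phase 3's streaming mode (play one chunk while the next is still synthesizing) is a classic producer/consumer pipeline. Below is a minimal Python sketch with a worker thread and a bounded queue, assuming hypothetical synthesize and play callables; the skill's actual helpers are tts.js and tts_kokoro.py and may be structured differently.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream(chunks, synthesize, play, max_ahead=2):
    """Synthesize text chunks on a worker thread and play them in order.

    synthesize(text) -> audio and play(audio) are hypothetical callables.
    max_ahead bounds how many finished chunks may wait unplayed.
    """
    q = queue.Queue(maxsize=max_ahead)
    errors = []

    def producer():
        try:
            for text in chunks:
                q.put(synthesize(text))  # blocks if playback falls behind
        except Exception as exc:  # surface synthesis failures to the caller
            errors.append(exc)
        finally:
            q.put(_DONE)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        audio = q.get()
        if audio is _DONE:
            break
        play(audio)  # chunk N plays while chunk N+1 is being synthesized
    worker.join()
    if errors:
        raise errors[0]


if __name__ == "__main__":
    played = []
    stream(["Hello.", "World."], synthesize=str.upper, play=played.append)
    print(played)  # ['HELLO.', 'WORLD.']
```

The bounded queue keeps synthesis from racing far ahead of playback, and the first chunk plays as soon as it is ready, which is what lets audio start before the whole script has been rendered.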

Audio chunking is automatic. Kokoro chunks on sentence boundaries; Gemini pre-chunks at ~1500 chars (streaming) or ~8000 chars (batch).
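Sentence-boundary chunking can be sketched as: split on sentence-ending punctuation, then pack whole sentences into chunks no longer than a character budget (for example ~1500 for streaming or ~8000 for batch, per the figures above). The regex splitter and the pack_sentences name are assumptions for illustration; Kokoro's own pipeline does its own text splitting, and in this sketch a single sentence longer than the budget is kept whole rather than cut.

```python
import re

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def pack_sentences(text, max_chars=1500):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)  # flush the full chunk
            current = sentence          # start a new one
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(pack_sentences("One. Two! Three?", max_chars=10))
    # ['One. Two!', 'Three?']
```

Packing whole sentences keeps prosody natural at chunk seams, while the character budget bounds how long each synthesis request takes.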

When to use `--gemini`

uv isn't installed and you don't want to install it.
You want a specific Gemini voice (Kore, Orus, etc.).
Long-form content (>30 min) where Gemini's faster end-to-end synthesis matters more than privacy or cost.
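To judge whether a piece crosses the ~30-minute mark, the ~150 wpm pace from Phase 2 gives a quick estimate: 30 minutes × 150 words per minute ≈ 4,500 words of rewritten script. A sketch; the function names are hypothetical, and treating TTS_SPEED as a linear divisor on speaking time is an assumption (that variable applies only to Kokoro).

```python
WORDS_PER_MINUTE = 150  # target pace of the Phase 2 rewrite


def estimated_minutes(script, speed=1.0):
    """Rough listening time for a narration script.

    speed mirrors TTS_SPEED; assumption: it scales speaking rate linearly.
    """
    words = len(script.split())
    return words / (WORDS_PER_MINUTE * speed)


def is_long_form(script, speed=1.0, threshold_minutes=30):
    """True when the estimated narration exceeds the threshold."""
    return estimated_minutes(script, speed) > threshold_minutes


if __name__ == "__main__":
    print(estimated_minutes("word " * 1500))  # 10.0
    print(is_long_form("word " * 4501))       # True
```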

Otherwise stay on Kokoro: free, fully offline, no watermark.

Optional environment variables

Var	Backend	Default	Notes
`TTS_VOICE`	both	`af_heart` / `Kore`	voice selection
`TTS_LANG`	Kokoro	inferred from voice prefix	override (e.g. `a`, `b`, `j`, `z`)
`TTS_SPEED`	Kokoro	`1.0`	speaking-rate multiplier
`TTS_MODEL`	Gemini	`gemini-2.5-flash-preview-tts`	rarely needs changing
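TTS_LANG can be inferred because Kokoro voice names begin with a one-letter language code followed by f or m for gender (af_heart is American English, bm_george is British English). A minimal sketch of that lookup with the override applied first; resolve_lang is hypothetical, and the code table mirrors the lang_code letters the kokoro package accepts, so treat voices.md as the authoritative list.

```python
import os

# Language letter -> language, for Kokoro voice names like "af_heart".
KOKORO_LANGS = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def resolve_lang(voice, env=os.environ):
    """Return the Kokoro lang code: TTS_LANG if set, else the voice prefix."""
    code = env.get("TTS_LANG") or voice[:1].lower()
    if code not in KOKORO_LANGS:
        raise ValueError(f"unknown Kokoro language code {code!r} for voice {voice!r}")
    return code


if __name__ == "__main__":
    print(resolve_lang("bm_george", env={}))                  # b
    print(resolve_lang("af_heart", env={"TTS_LANG": "j"}))    # j
```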

Layout

SKILL.md         main skill (read first)
setup.md         install + verification steps
voices.md        full voice catalog
tts.js           dispatcher (Node); handles the --gemini path directly
tts_kokoro.py    local Kokoro helper (uv self-bootstrapping)
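The routing rules from the Backends section can be summarized as a small decision function. This is a Python sketch of those rules only, not a port of tts.js (which is Node); choose_backend and its error messages are hypothetical, and the real dispatcher may check other things, such as whether --voices was passed.

```python
import shutil


def choose_backend(args, env):
    """Pick a backend per the rules above (hypothetical logic).

    --gemini selects the cloud API and needs GOOGLE_API_KEY;
    anything else runs local Kokoro, which needs uv on PATH.
    """
    if "--gemini" in args:
        if not env.get("GOOGLE_API_KEY"):
            raise RuntimeError("--gemini requires GOOGLE_API_KEY")
        return "gemini"
    if shutil.which("uv") is None:
        raise RuntimeError("Kokoro needs uv on PATH; install it or pass --gemini")
    return "kokoro"


if __name__ == "__main__":
    print(choose_backend(["--gemini", "README.md"], {"GOOGLE_API_KEY": "..."}))
    # gemini
```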

Notes

--play uses afplay (macOS). On Linux, the WAV is still written; play it with aplay, paplay, mpv, etc.
Never use .mp3 as the output extension. The file is PCM WAV.
Kokoro's weights are Apache-2.0 licensed, and it runs entirely on your machine. The Gemini backend sends your text to Google for synthesis.
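On Linux, a small script can replay the output with whichever player is installed. A sketch using shutil.which; the preference order and the find_player and replay helpers are assumptions for illustration, not what --play does (--play uses afplay).

```python
import shutil
import subprocess
import sys

# afplay ships with macOS; the Linux players are the ones listed above.
_PLAYERS = ["afplay"] if sys.platform == "darwin" else ["paplay", "aplay", "mpv"]


def find_player(candidates=None):
    """Return the first player found on PATH, or None."""
    for name in candidates if candidates is not None else _PLAYERS:
        if shutil.which(name):
            return name
    return None


def replay(path="/tmp/tts-output.wav"):
    """Play the combined WAV with the first available player."""
    player = find_player()
    if player is None:
        raise RuntimeError(f"no audio player found; open {path} manually")
    subprocess.run([player, path], check=True)
```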

License

MIT. See LICENSE.
