tts

A Claude Code skill that turns any content (file, URL, raw text, image, PDF) into spoken audio narration. Default backend is local Kokoro: free, no API key, no watermark, fully offline. Pass --gemini to use the Gemini 2.5 Flash TTS API instead.

You say "read this to me" and Claude does. The skill rewrites the source into a spoken-style script first, then streams audio chunk-by-chunk while later chunks synthesize, so audio starts within ~5s.

Quick install

User-scoped (recommended):

git clone https://github.com/wickdninja/tts.git ~/.claude/skills/tts

Project-scoped:

git clone https://github.com/wickdninja/tts.git <project>/.claude/skills/tts

Then in Claude Code:

/tts                              # ask what to read
/tts ./README.md                  # narrate a file
/tts https://example.com/post     # narrate a URL
/tts --gemini ./README.md         # use the Gemini API instead of Kokoro
/tts --voices                     # list Kokoro voices
/tts --voices --gemini            # list Gemini voices

Backends

Kokoro (default, local, free)

  • macOS or Linux
  • node (any modern version)
  • uv (install)
brew install uv node

The first run downloads ~340MB: the kokoro and torch wheels go to ~/.cache/uv/, and the 82M-parameter Kokoro model weights go to ~/.cache/huggingface/. Subsequent runs are fast and fully offline.

Output is 24kHz mono WAV. MIT-licensed model, no watermark, no audible AI disclaimer.
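As an illustration of the output format, here is a minimal sketch that writes several audio chunks into one 24kHz mono WAV using Python's standard `wave` module. The 16-bit sample width and the helper names (`write_wav`, `tone_chunk`) are assumptions for the example, not the skill's actual code; the sine tone only stands in for synthesized speech.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24kHz mono, as stated above


def write_wav(path_or_file, pcm_chunks, sample_rate=SAMPLE_RATE):
    """Concatenate raw 16-bit mono PCM chunks into a single WAV file."""
    with wave.open(path_or_file, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample (16-bit is an assumption)
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)


def tone_chunk(freq_hz, duration_ms, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for synthesized audio: a quiet sine tone."""
    n = sample_rate * duration_ms // 1000
    samples = [
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)  # little-endian signed 16-bit
```

Calling `write_wav("/tmp/example.wav", [tone_chunk(440, 200), tone_chunk(660, 200)])` produces a 0.4 second file that any WAV player can open.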

Gemini 2.5 Flash TTS (cloud)

  • GOOGLE_API_KEY (get one)
  • network connectivity at synthesis time
export GOOGLE_API_KEY="..."

Pass --gemini to route through Google. Costs are negligible at typical volumes; the Gemini API bills by token (text input and audio output) rather than by character.

Voices

Set TTS_VOICE=<name> to pick a voice. Voice names are not interchangeable across backends.

| Backend | Default | Popular alternates |
| --- | --- | --- |
| Kokoro | af_heart (American female, warm) | am_adam, bf_emma, bm_george, am_puck, af_bella |
| Gemini | Kore (female, warm) | Orus, Charon, Puck, Leda |

Full catalog (including Japanese, Mandarin, Spanish, French, Hindi, Italian, Brazilian Portuguese for Kokoro): see voices.md.

How it works

  1. Phase 1, extract. Read the file, fetch the URL, describe the image, etc.
  2. Phase 2, rewrite. Convert the source into a spoken narrative: no markdown, no bullets, no code blocks. Add spoken transitions ("here's the thing"), expand jargon, and write for a pace of about 150 words per minute.
  3. Phase 3, synthesize. Stream audio chunk-by-chunk with --play, or batch-render to a single WAV without it.
  4. Phase 4, replay. The combined WAV is always written to /tmp/tts-output.wav so you can replay with ! afplay /tmp/tts-output.wav (macOS).
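The overlap between synthesis and playback in Phase 3 can be pictured as a producer/consumer pipeline. The sketch below is one way to get that behavior, not the skill's actual implementation: `stream`, `synthesize`, and `play` are hypothetical names, and the bounded queue size is an arbitrary choice.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream(chunks, synthesize, play, max_ahead=2):
    """Play chunk N while later chunks are synthesized in the background.

    Returns the audio segments in playback order so the caller can also
    write them out as one combined file.
    """
    ready = queue.Queue(maxsize=max_ahead)  # bounded: limit how far ahead we synthesize

    def producer():
        try:
            for text in chunks:
                ready.put(synthesize(text))
        finally:
            # Always unblock the consumer. Note that an exception raised by
            # synthesize() is not propagated to the caller in this sketch.
            ready.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    played = []
    while (audio := ready.get()) is not _DONE:
        play(audio)  # blocks for the duration of the chunk
        played.append(audio)
    return played
```

Because playback starts as soon as the first chunk is ready, time-to-first-audio depends on the length of the first chunk rather than on the whole script.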

Audio chunking is automatic. Kokoro chunks on sentence boundaries; Gemini pre-chunks at ~1500 chars (streaming) or ~8000 chars (batch).
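To make the chunking idea concrete, here is a rough sketch that splits on sentence boundaries and then greedily packs sentences up to a character budget (1500 by default, matching the streaming figure above). The regex-based splitter and the function names are assumptions for illustration; the real helpers may split differently.

```python
import re

# Naive boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences on terminal punctuation."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def pack_chunks(text, max_chars=1500):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized
    chunk instead of being cut mid-sentence.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, `pack_chunks("One. Two! Three? Four.", max_chars=10)` returns `["One. Two!", "Three?", "Four."]`.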

When to use --gemini

  • uv isn't installed and you don't want to install it.
  • You want a specific Gemini voice (Kore, Orus, etc.).
  • Long-form content (>30 min) where Gemini's faster end-to-end synthesis matters more than privacy or cost.

Otherwise stay on Kokoro: free, fully offline, no watermark.

Optional environment variables

| Var | Backend | Default | Notes |
| --- | --- | --- | --- |
| TTS_VOICE | both | af_heart / Kore | voice selection |
| TTS_LANG | Kokoro | inferred from voice prefix | override (e.g. a, b, j, z) |
| TTS_SPEED | Kokoro | 1.0 | playback rate multiplier |
| TTS_MODEL | Gemini | gemini-2.5-flash-preview-tts | rarely needs changing |
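The defaults in the table could be resolved along these lines. This is a sketch under stated assumptions: `TtsConfig` and `load_config` are hypothetical names, and "inferred from voice prefix" is taken to mean the first letter of the Kokoro voice name (for example `a` from `af_heart`).

```python
import os
from dataclasses import dataclass
from typing import Optional

DEFAULT_VOICE = {"kokoro": "af_heart", "gemini": "Kore"}
DEFAULT_GEMINI_MODEL = "gemini-2.5-flash-preview-tts"


@dataclass(frozen=True)
class TtsConfig:
    backend: str
    voice: str
    lang: Optional[str]     # Kokoro only
    speed: Optional[float]  # Kokoro only
    model: Optional[str]    # Gemini only


def load_config(use_gemini=False, env=None):
    """Resolve settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    backend = "gemini" if use_gemini else "kokoro"
    voice = env.get("TTS_VOICE") or DEFAULT_VOICE[backend]
    if backend == "gemini":
        model = env.get("TTS_MODEL") or DEFAULT_GEMINI_MODEL
        return TtsConfig(backend, voice, None, None, model)
    # Assumption: the language code is the first letter of the voice name.
    lang = env.get("TTS_LANG") or voice[0]
    speed = float(env.get("TTS_SPEED") or 1.0)
    return TtsConfig(backend, voice, lang, speed, None)
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real environment.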

Layout

SKILL.md         main skill (read first)
setup.md         install + verification steps
voices.md        full voice catalog
tts.js           dispatcher (Node); handles the --gemini path directly
tts_kokoro.py    local Kokoro helper (uv self-bootstrapping)

Notes

  • --play uses afplay (macOS). On Linux, the WAV is still written; play it with aplay, paplay, mpv, etc.
  • Never use .mp3 as the output extension. The file is PCM WAV.
  • Kokoro is MIT-licensed and runs entirely on your machine. Gemini routes audio through Google.
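If you want to script playback across platforms, one option is to probe for the first available player. The sketch below is illustrative only: `player_command` is a hypothetical helper, the preference order is arbitrary, and the `--no-video` argument for mpv is an added assumption.

```python
import shutil
import sys

# Candidate players per platform, in preference order.
PLAYERS = {
    "darwin": [["afplay"]],
    "linux": [["aplay"], ["paplay"], ["mpv", "--no-video"]],
}


def player_command(wav_path, platform=None, which=shutil.which):
    """Return an argv list for the first installed player, or None."""
    platform = sys.platform if platform is None else platform
    for cmd in PLAYERS.get(platform, []):
        if which(cmd[0]):  # executable found on PATH
            return cmd + [wav_path]
    return None
```

The result can be handed to `subprocess.run(cmd, check=True)`; a `None` return means no known player was found and the WAV should be opened manually.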

License

MIT. See LICENSE.
