ActVoice

Accessible-first audio drama production service for humans and AI agents.

Domain target: actvoice.xyz

ActVoice is designed as a server-side audio drama studio:

  • humans use a web/API interface;
  • AI agents use MCP/API tools;
  • the server stores project manifests;
  • the server renders voices, ambience, cues, and final MP3 artifacts;
  • no built-in LLM dependency is required for the MVP.

MVP Modes

TTS provider modes:

  1. edge — free/default neural Microsoft voices.
  2. rhvoice — local/offline fallback and explicit open-source mode.
  3. openai_byo_key — planned mode with a user-provided OpenAI key; fast, paid by the user.

The current implementation includes:

  • EdgeTTSProvider and RHVoiceProvider with an Edge→RHVoice fallback chain;
  • SQLite-backed project/agent/job storage;
  • an in-process RenderQueue;
  • Openverse CC0 SFX lookup;
  • timing anchors;
  • downloadable artifact endpoints.
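The fallback behaviour can be sketched as follows. This is an illustrative model, not the actual implementation: the class and method names (`FallbackChain`, `synthesize`, `TTSProviderError`) are assumptions.

```python
class TTSProviderError(Exception):
    """Raised when a provider cannot synthesize a line."""


class FallbackChain:
    """Try providers in order; record which one actually rendered the line.

    Providers here are any objects with a .name attribute and a
    .synthesize(text, voice) method -- an assumed interface for this sketch.
    """

    def __init__(self, providers):
        self.providers = providers

    def synthesize(self, text, voice):
        errors = []
        for provider in self.providers:
            try:
                audio = provider.synthesize(text, voice)
                # fallback_used lets the render manifest record whether
                # the primary provider failed for this particular line
                return {
                    "audio": audio,
                    "provider": provider.name,
                    "fallback_used": provider is not self.providers[0],
                }
            except TTSProviderError as exc:
                errors.append(f"{provider.name}: {exc}")
        raise TTSProviderError("; ".join(errors))
```

With `[edge, rhvoice]` this yields the default free mode: Edge is tried first, and the manifest can note when RHVoice had to step in.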

Requirements

Ubuntu packages:

apt-get install -y rhvoice rhvoice-russian rhvoice-english ffmpeg

Python dependencies include edge-tts. If Edge is unavailable at runtime, ActVoice falls back to RHVoice.

Python:

python3 -m pip install -e .[dev]

Run API

uvicorn app.main:app --host 0.0.0.0 --port 8080

With aaPanel Python Manager, configure the same command as the managed app's start command. aaPanel can replace a separate systemd service for process supervision, provided it is configured for auto-start after reboot, restart on crash, environment variables, and log access.

Runtime data is stored under data/ by default:

  • data/actvoice.sqlite3 — SQLite database for projects, render jobs, and agent key hashes.
  • data/projects/PROJECT_ID/ — rendered WAV/MP3/manifest artifacts and caches.

Back up both the SQLite file and data/projects/.
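A backup can be scripted with the standard library; the sketch below assumes the default `data/` layout described above and uses sqlite3's online backup API so the database copy stays consistent even while the API server is running.

```python
import shutil
import sqlite3
from pathlib import Path


def backup_actvoice(data_dir: str, backup_dir: str) -> None:
    """Copy the SQLite database and rendered artifacts to backup_dir.

    Paths mirror the README defaults (data/actvoice.sqlite3 and
    data/projects/); adjust if your deployment relocates them.
    """
    data, dest = Path(data_dir), Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # consistent snapshot of the database, safe against concurrent writes
    src = sqlite3.connect(data / "actvoice.sqlite3")
    dst = sqlite3.connect(dest / "actvoice.sqlite3")
    with dst:
        src.backup(dst)
    src.close()
    dst.close()

    # rendered WAV/MP3/manifest artifacts and caches
    shutil.copytree(data / "projects", dest / "projects", dirs_exist_ok=True)
```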

Run MCP Server

Stdio MCP server for local clients such as Hermes:

python -m app.mcp_server

Hermes config example:

mcp_servers:
  actvoice:
    command: "python"
    args: ["-m", "app.mcp_server"]
    env:
      ACTVOICE_API_KEY: "[REDACTED]"

Run from the project directory or set PYTHONPATH to the repo root. The MCP server exposes the same service layer as the REST API.

HTTP/streamable transport is available for later deployment:

ACTVOICE_MCP_TRANSPORT=streamable-http python -m app.mcp_server

For MVP, write tools still require an ActVoice key, either through ACTVOICE_API_KEY or an api_key tool argument.
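The key-resolution rule for write tools can be sketched as below; the function name and error type are illustrative, but the precedence (explicit `api_key` argument first, then the `ACTVOICE_API_KEY` environment variable) follows the description above.

```python
import os


def resolve_api_key(api_key=None):
    """Resolve the ActVoice key for an MCP write-tool call.

    An explicit api_key tool argument wins; otherwise fall back to the
    ACTVOICE_API_KEY environment variable. Raises if neither is set.
    """
    key = api_key or os.environ.get("ACTVOICE_API_KEY")
    if not key:
        raise PermissionError(
            "write tools need an ActVoice key: pass api_key or set ACTVOICE_API_KEY"
        )
    return key
```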

TTS Provider Chain

Default free mode is Edge first with RHVoice fallback:

edge -> rhvoice

Useful runtime modes:

# Default quality-first free mode
export ACTVOICE_TTS_MODE=free
export ACTVOICE_TTS_DEFAULT=edge

# Fully local/offline open-source mode
export ACTVOICE_TTS_MODE=rhvoice

# Optional RHVoice fallback voice when Edge fails
export ACTVOICE_RHVOICE_FALLBACK_VOICE=aleksandr

Characters default to provider=edge and voice=ru-RU-DmitryNeural. Set provider=rhvoice and voice=aleksandr when a project must use local RHVoice only. The render manifest records the actual provider/voice for each line plus whether fallback was used.
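The defaulting rule is simple enough to state as code. This is a sketch only; the field names are assumptions based on this README, not the real schema.

```python
EDGE_DEFAULT = {"provider": "edge", "voice": "ru-RU-DmitryNeural"}


def effective_voice(character: dict) -> dict:
    """Resolve the provider/voice a render will request for a character.

    Unset fields fall back to the Edge defaults; a project that must stay
    local sets provider=rhvoice and an RHVoice voice such as aleksandr.
    """
    return {
        "provider": character.get("provider") or EDGE_DEFAULT["provider"],
        "voice": character.get("voice") or EDGE_DEFAULT["voice"],
    }
```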

Agent Registration / Minimal Anti-Spam

ActVoice does not implement full login/password auth in the MVP. Instead, agents register and receive a bearer key.

Optional invite protection:

export ACTVOICE_REGISTRATION_CODE='change-me'

Register an agent:

curl -X POST http://localhost:8080/api/agents/register \
  -H 'content-type: application/json' \
  -d '{"agent_name":"Hermes","purpose":"demo","registration_code":"change-me"}'

The response includes the api_key exactly once; only a hash is stored server-side, so save the key immediately. Write/render endpoints require:

Authorization: Bearer [REDACTED]

This is intentionally simpler than full MCP/OAuth auth, but it prevents casual write/render spam from anyone who merely knows the endpoint.
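The store-only-a-hash pattern behind this can be sketched with the standard library. Function names are illustrative; the actual implementation may hash differently, but the shape — plaintext shown once, hash persisted, constant-time comparison on every request — is the standard one.

```python
import hashlib
import hmac
import secrets


def issue_key():
    """Create a bearer key; return (plaintext_key, stored_hash).

    Only the hash is persisted (cf. 'agent key hashes' in the storage
    section); the plaintext is returned to the agent exactly once.
    """
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Compare a presented bearer key against the stored hash.

    hmac.compare_digest avoids timing side channels in the comparison.
    """
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```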

Basic API Flow

ACTVOICE_KEY='paste-the-registration-response-key-here'

curl -X POST http://localhost:8080/api/projects \
  -H "authorization: Bearer $ACTVOICE_KEY" \
  -H 'content-type: application/json' \
  -d '{"title":"Demo","language":"ru"}'

Then add characters, scenes, lines, sound cues, and render.

Render requests are queued and return 202 Accepted with a durable SQLite-backed job id:

curl -X POST http://localhost:8080/api/projects/PROJECT_ID/render \
  -H "authorization: Bearer $ACTVOICE_KEY"

Poll job status:

curl http://localhost:8080/api/jobs/JOB_ID

When done, fetch metadata and files:

GET /api/projects/PROJECT_ID/artifact
GET /api/projects/PROJECT_ID/artifact.mp3
GET /api/projects/PROJECT_ID/artifact.wav
GET /api/projects/PROJECT_ID/render-manifest.json

Render jobs are persisted in SQLite, so completed and failed jobs survive aaPanel or process restarts. If the process restarts while a job is rendering, ActVoice moves the job back to queued and can resume it on startup.
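The startup recovery step amounts to a single state transition in SQLite. A minimal sketch, assuming an illustrative `render_jobs(status)` table and status names; the real schema lives in the app.

```python
import sqlite3


def recover_interrupted_jobs(conn: sqlite3.Connection) -> int:
    """Move jobs stuck in 'rendering' back to 'queued' after a restart.

    Completed and failed rows are untouched, so their history survives.
    Returns the number of jobs requeued.
    """
    cur = conn.execute(
        "UPDATE render_jobs SET status = 'queued' WHERE status = 'rendering'"
    )
    conn.commit()
    return cur.rowcount
```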

Production Sound Timing

ActVoice does not need a built-in AI to place sounds. External agents or human tools send deterministic timing instructions; the server resolves them during render.

Sound cues can use absolute timing:

{
  "type": "footsteps",
  "start_ms": 31000,
  "duration_ms": 9000,
  "level": 0.2
}

Or production-friendly relative anchors:

{
  "type": "laptop_close",
  "duration_ms": 1200,
  "level": 0.45,
  "anchor": {
    "type": "after_line",
    "line_id": "wife_asks_walk",
    "offset_ms": 500
  }
}

Supported anchor types:

  • scene_start with offset_ms
  • scene_end with offset_ms
  • before_line with line_id and offset_ms
  • after_line with line_id and offset_ms

During render, ActVoice measures each generated speech line, writes a timing_map into the render manifest, resolves anchors into final start_ms, and mixes SFX from those resolved positions. This keeps the core AI-free while letting any MCP agent build sound design from text.

Development

python3 -m pytest -q

If pytest is unavailable, install dev dependencies first.

Notes

  • data/ is local runtime storage and is gitignored.
  • MCP should call the same service layer as REST; do not duplicate business logic.
  • Sound cues are semantic (footsteps, water_drip) instead of raw filenames.
  • SFX rendering prefers Openverse CC0 audio for known cue types (brook, birds, footsteps, laptop_close) and falls back to synthetic generation only when search/download fails or no mapping exists.
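The Openverse-then-synthetic decision can be sketched as follows. The cue-to-search-term mapping and function names are hypothetical, and the Openverse search/download machinery is abstracted behind a callable so the decision logic stands alone.

```python
# Illustrative mapping from semantic cue types to search terms;
# the real mapping lives server-side.
CUE_SEARCH_TERMS = {
    "brook": "brook stream water",
    "birds": "birdsong ambience",
    "footsteps": "footsteps walking",
    "laptop_close": "laptop lid close",
}


def sfx_plan(cue_type, openverse_lookup):
    """Decide how a semantic cue will be rendered.

    openverse_lookup is any callable taking a search term and returning
    an audio URL or None. Unknown cue types and failed lookups fall back
    to synthetic generation, as described above.
    """
    term = CUE_SEARCH_TERMS.get(cue_type)
    if term:
        url = openverse_lookup(term)
        if url:
            return ("openverse", url)
    return ("synthetic", cue_type)
```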
