ActVoice

Accessible-first audio drama production service for humans and AI agents.

Domain target: actvoice.xyz

ActVoice is designed as a server-side audio drama studio:

  • humans use a web/API interface;
  • AI agents use MCP/API tools;
  • the server stores project manifests;
  • the server renders voices, ambience, cues, and final MP3 artifacts;
  • no built-in LLM dependency is required for the MVP.

MVP Modes

TTS provider modes:

  1. edge — free/default neural Microsoft voices.
  2. rhvoice — local/offline fallback and explicit open-source mode.
  3. openai_byo_key — planned mode with a user-provided OpenAI key; fast, paid by the user.

The current implementation includes:

  • EdgeTTSProvider and RHVoiceProvider with an Edge→RHVoice fallback chain;
  • SQLite-backed project/agent/job storage;
  • an in-process RenderQueue;
  • Openverse CC0 SFX lookup;
  • timing anchors;
  • downloadable artifact endpoints.
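The fallback behaviour can be sketched as follows. This is an illustrative model, not the actual implementation: the class and method names (`FallbackChain`, `synthesize`, `TTSProviderError`) are assumptions.

```python
class TTSProviderError(Exception):
    """Raised when a provider cannot synthesize a line."""


class FallbackChain:
    """Try providers in order; record which one actually rendered the line.

    Providers here are any objects with a .name attribute and a
    .synthesize(text, voice) method -- an assumed interface for this sketch.
    """

    def __init__(self, providers):
        self.providers = providers

    def synthesize(self, text, voice):
        errors = []
        for provider in self.providers:
            try:
                audio = provider.synthesize(text, voice)
                # fallback_used lets the render manifest record whether
                # the primary provider failed for this particular line
                return {
                    "audio": audio,
                    "provider": provider.name,
                    "fallback_used": provider is not self.providers[0],
                }
            except TTSProviderError as exc:
                errors.append(f"{provider.name}: {exc}")
        raise TTSProviderError("; ".join(errors))
```

With `[edge, rhvoice]` this yields the default free mode: Edge is tried first, and the manifest can note when RHVoice had to step in.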

Requirements

Ubuntu packages:

apt-get install -y rhvoice rhvoice-russian rhvoice-english ffmpeg

Python dependencies include edge-tts. If Edge is unavailable at runtime, ActVoice falls back to RHVoice.

Python:

python3 -m pip install -e .[dev]

Run API

uvicorn app.main:app --host 0.0.0.0 --port 8080

With aaPanel Python Manager, configure the same command as the managed app's start command. aaPanel can replace a separate systemd service for process supervision, provided it is configured for auto-start after reboot, restart on crash, environment variables, and log access.

Runtime data is stored under data/ by default:

  • data/actvoice.sqlite3 — SQLite database for projects, render jobs, and agent key hashes.
  • data/projects/PROJECT_ID/ — rendered WAV/MP3/manifest artifacts and caches.

Back up both the SQLite file and data/projects/.
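A backup can be scripted with the standard library; the sketch below assumes the default `data/` layout described above and uses sqlite3's online backup API so the database copy stays consistent even while the API server is running.

```python
import shutil
import sqlite3
from pathlib import Path


def backup_actvoice(data_dir: str, backup_dir: str) -> None:
    """Copy the SQLite database and rendered artifacts to backup_dir.

    Paths mirror the README defaults (data/actvoice.sqlite3 and
    data/projects/); adjust if your deployment relocates them.
    """
    data, dest = Path(data_dir), Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # consistent snapshot of the database, safe against concurrent writes
    src = sqlite3.connect(data / "actvoice.sqlite3")
    dst = sqlite3.connect(dest / "actvoice.sqlite3")
    with dst:
        src.backup(dst)
    src.close()
    dst.close()

    # rendered WAV/MP3/manifest artifacts and caches
    shutil.copytree(data / "projects", dest / "projects", dirs_exist_ok=True)
```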

Run MCP Server

Stdio MCP server for local clients such as Hermes:

python -m app.mcp_server

Hermes config example:

mcp_servers:
  actvoice:
    command: "python"
    args: ["-m", "app.mcp_server"]
    env:
      ACTVOICE_API_KEY: "[REDACTED]"

Run from the project directory or set PYTHONPATH to the repo root. The MCP server exposes the same service layer as the REST API.

HTTP/streamable transport is available for later deployment:

ACTVOICE_MCP_TRANSPORT=streamable-http python -m app.mcp_server

For MVP, write tools still require an ActVoice key, either through ACTVOICE_API_KEY or an api_key tool argument.
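The key-resolution rule for write tools can be sketched as below; the function name and error type are illustrative, but the precedence (explicit `api_key` argument first, then the `ACTVOICE_API_KEY` environment variable) follows the description above.

```python
import os


def resolve_api_key(api_key=None):
    """Resolve the ActVoice key for an MCP write-tool call.

    An explicit api_key tool argument wins; otherwise fall back to the
    ACTVOICE_API_KEY environment variable. Raises if neither is set.
    """
    key = api_key or os.environ.get("ACTVOICE_API_KEY")
    if not key:
        raise PermissionError(
            "write tools need an ActVoice key: pass api_key or set ACTVOICE_API_KEY"
        )
    return key
```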

TTS Provider Chain

Default free mode is Edge first with RHVoice fallback:

edge -> rhvoice

Useful runtime modes:

# Default quality-first free mode
export ACTVOICE_TTS_MODE=free
export ACTVOICE_TTS_DEFAULT=edge

# Fully local/offline open-source mode
export ACTVOICE_TTS_MODE=rhvoice

# Optional RHVoice fallback voice when Edge fails
export ACTVOICE_RHVOICE_FALLBACK_VOICE=aleksandr

Characters default to provider=edge and voice=ru-RU-DmitryNeural. Set provider=rhvoice and voice=aleksandr when a project must use local RHVoice only. The render manifest records the actual provider/voice for each line plus whether fallback was used.
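The defaulting rule is simple enough to state as code. This is a sketch only; the field names are assumptions based on this README, not the real schema.

```python
EDGE_DEFAULT = {"provider": "edge", "voice": "ru-RU-DmitryNeural"}


def effective_voice(character: dict) -> dict:
    """Resolve the provider/voice a render will request for a character.

    Unset fields fall back to the Edge defaults; a project that must stay
    local sets provider=rhvoice and an RHVoice voice such as aleksandr.
    """
    return {
        "provider": character.get("provider") or EDGE_DEFAULT["provider"],
        "voice": character.get("voice") or EDGE_DEFAULT["voice"],
    }
```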

Agent Registration / Minimal Anti-Spam

ActVoice does not implement full login/password auth in the MVP. Instead, agents register and receive a bearer key.

Optional invite protection:

export ACTVOICE_REGISTRATION_CODE='change-me'

Register an agent:

curl -X POST http://localhost:8080/api/agents/register \
  -H 'content-type: application/json' \
  -d '{"agent_name":"Hermes","purpose":"demo","registration_code":"change-me"}'

The response includes the api_key exactly once; only a hash is stored server-side, so save the key immediately. Write/render endpoints require:

Authorization: Bearer [REDACTED]

This is intentionally simpler than full MCP/OAuth auth, but it prevents casual write/render spam from anyone who merely knows the endpoint.
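The store-only-a-hash pattern behind this can be sketched with the standard library. Function names are illustrative; the actual implementation may hash differently, but the shape — plaintext shown once, hash persisted, constant-time comparison on every request — is the standard one.

```python
import hashlib
import hmac
import secrets


def issue_key():
    """Create a bearer key; return (plaintext_key, stored_hash).

    Only the hash is persisted (cf. 'agent key hashes' in the storage
    section); the plaintext is returned to the agent exactly once.
    """
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Compare a presented bearer key against the stored hash.

    hmac.compare_digest avoids timing side channels in the comparison.
    """
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```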

Basic API Flow

ACTVOICE_KEY='paste-the-registration-response-key-here'

curl -X POST http://localhost:8080/api/projects \
  -H "authorization: Bearer $ACTVOICE_KEY" \
  -H 'content-type: application/json' \
  -d '{"title":"Demo","language":"ru"}'

Then add characters, scenes, lines, sound cues, and render.

Render requests are queued and return 202 Accepted with a durable SQLite-backed job id:

curl -X POST http://localhost:8080/api/projects/PROJECT_ID/render \
  -H "authorization: Bearer $ACTVOICE_KEY"

Poll job status:

curl http://localhost:8080/api/jobs/JOB_ID

When done, fetch metadata and files:

GET /api/projects/PROJECT_ID/artifact
GET /api/projects/PROJECT_ID/artifact.mp3
GET /api/projects/PROJECT_ID/artifact.wav
GET /api/projects/PROJECT_ID/render-manifest.json

Render jobs are persisted in SQLite, so completed and failed jobs survive aaPanel or process restarts. If the process restarts while a job is rendering, ActVoice moves the job back to queued and can resume it on startup.
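The startup recovery step amounts to a single state transition in SQLite. A minimal sketch, assuming an illustrative `render_jobs(status)` table and status names; the real schema lives in the app.

```python
import sqlite3


def recover_interrupted_jobs(conn: sqlite3.Connection) -> int:
    """Move jobs stuck in 'rendering' back to 'queued' after a restart.

    Completed and failed rows are untouched, so their history survives.
    Returns the number of jobs requeued.
    """
    cur = conn.execute(
        "UPDATE render_jobs SET status = 'queued' WHERE status = 'rendering'"
    )
    conn.commit()
    return cur.rowcount
```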

Production Sound Timing

ActVoice does not need a built-in AI to place sounds. External agents or human tools send deterministic timing instructions; the server resolves them during render.

Sound cues can use absolute timing:

{
  "type": "footsteps",
  "start_ms": 31000,
  "duration_ms": 9000,
  "level": 0.2
}

Or production-friendly relative anchors:

{
  "type": "laptop_close",
  "duration_ms": 1200,
  "level": 0.45,
  "anchor": {
    "type": "after_line",
    "line_id": "wife_asks_walk",
    "offset_ms": 500
  }
}

Supported anchor types:

  • scene_start with offset_ms
  • scene_end with offset_ms
  • before_line with line_id and offset_ms
  • after_line with line_id and offset_ms

During render, ActVoice measures each generated speech line, writes a timing_map into the render manifest, resolves anchors into final start_ms, and mixes SFX from those resolved positions. This keeps the core AI-free while letting any MCP agent build sound design from text.

Development

python3 -m pytest -q

If pytest is unavailable, install dev dependencies first.

Notes

  • data/ is local runtime storage and is gitignored.
  • MCP should call the same service layer as REST; do not duplicate business logic.
  • Sound cues are semantic (footsteps, water_drip) instead of raw filenames.
  • SFX rendering prefers Openverse CC0 audio for known cue types (brook, birds, footsteps, laptop_close) and falls back to synthetic generation only when search/download fails or no mapping exists.
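The Openverse-then-synthetic decision can be sketched as follows. The cue-to-search-term mapping and function names are hypothetical, and the Openverse search/download machinery is abstracted behind a callable so the decision logic stands alone.

```python
# Illustrative mapping from semantic cue types to search terms;
# the real mapping lives server-side.
CUE_SEARCH_TERMS = {
    "brook": "brook stream water",
    "birds": "birdsong ambience",
    "footsteps": "footsteps walking",
    "laptop_close": "laptop lid close",
}


def sfx_plan(cue_type, openverse_lookup):
    """Decide how a semantic cue will be rendered.

    openverse_lookup is any callable taking a search term and returning
    an audio URL or None. Unknown cue types and failed lookups fall back
    to synthetic generation, as described above.
    """
    term = CUE_SEARCH_TERMS.get(cue_type)
    if term:
        url = openverse_lookup(term)
        if url:
            return ("openverse", url)
    return ("synthetic", cue_type)
```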
