A local, multimodal conversational AI assistant. Single Go binary with an embedded React frontend, talking to local inference servers for speech, vision, and LLM.
The interactive surface is at / (Immersive — camera + talking head),
/onboard (first-run wizard), and /admin (configuration panel).
- Speech-to-text via sherpa-onnx (Whisper / Moonshine models)
- LLM inference via llama.cpp server (OpenAI-compatible endpoint)
- Text-to-speech via sherpa-onnx (Kokoro voices)
- Face recognition via dlib / go-face (128-dim embeddings, per-person memory across sessions)
- Tool calling — Home Assistant control, Obsidian vault read/write, Spotify playback, web search (SearXNG), local Wikipedia semantic search, per-person memory (Qdrant), timers, generic MCP servers
- Autonomous background tasks with named profiles:
- researcher — multi-step web/wiki research with Obsidian write-up
- coder — bash + file editing in a workspace, autonomous code changes
- default — general-purpose loops, scheduled or ad-hoc, with findings persisted back to Qdrant
- Ambient sensors — periodic observers (Home Assistant state, time-of-day, Spotify now-playing, MCP pushes) that feed the assistant without the user asking
- Skills — markdown capability guides (e.g. "weekly review", "research note organization") that get semantically routed into the prompt only on turns where they apply (see the sketch after this list)
- Self-improvement loop — the LLM can propose new tools, sensors, prompt changes, and skills via dedicated proposal queues in /admin
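
To illustrate the skills idea: a capability guide is just a markdown file. The file below is a sketch only, not the repo's actual skill schema:

```md
# Weekly review

Use this skill when the user asks to review their week.

1. Read the past seven days of daily notes from the Obsidian vault.
2. Summarize completed work, open tasks, and recurring themes.
3. Offer to save the summary back to the vault as a new note.
```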
Quick start:

```sh
# 1. Install model files under deploy/models/ — see docs/running.md.
# Always: whisper-small-en/, kokoro-en-v0_19/, dlib/.
# Plus a GGUF only if you'll run the bundled llama-server (step 5);
# skip if you're pointing at a hosted endpoint or Ollama instead.
# 2. Configuration
cp .env.example .env
$EDITOR .env  # set CHAT_URL, CHAT_MODEL at minimum (sketch below)
# 3. Bootstrap (npm install + buf generate)
task setup
# 4. Backing services (Dolt :3307, Qdrant :6333, SearXNG :8888)
task up
# 5. Optional: spawn the llama-server container if you have an NVIDIA GPU
task up:llm
# 6. Build + run
task build
task run      # ./zarl, with .env loaded by Taskfile
```

For frontend hot-reload during development, run the binary in one terminal and `task frontend:dev` in another (Vite on :5173, proxies RPC to :8080).
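
For step 2, a minimal `.env` might look like the sketch below, assuming the bundled llama-server from step 5; both values are illustrative, and the exact form is defined by .env.example:

```sh
CHAT_URL=http://localhost:8081/v1   # bundled llama-server; any OpenAI-compatible endpoint works
CHAT_MODEL=my-local-gguf            # placeholder: must match whatever your endpoint serves
```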
The Dockerfile produces a slim Debian image (~150 MB) with the Go
binary, dlib, cblas, and libjpeg already wired up.
```sh
docker build -t zarl:local .
task up   # backing services
docker run --rm --network host \
  --env-file .env \
  -v "$MODELS_DIR:/models:ro" \
  zarl:local
```

CI builds and tests this image on every PR.
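
One gotcha worth noting: `--env-file .env` only sets variables inside the container, while `$MODELS_DIR` in the `-v` flag is expanded by your host shell, so export it on the host first. The path shown assumes the default layout:

```sh
export MODELS_DIR="$PWD/deploy/models"
```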
```sh
task doctor   # preflight: toolchain, .env, models, services
```

Open http://localhost:8080. On a fresh database go to http://localhost:8080/onboard first to enrol your face, voice, and agent settings via the wizard. After that, / is the conversational view, /admin is the admin panel.
Smoke test:
```sh
curl -fsS http://localhost:8080/            # 200 = SPA served
curl -fsS http://localhost:8081/v1/models   # llama-server (only with task up:llm)
curl -fsS http://localhost:6333/healthz     # qdrant
```

The default config targets a single 24 GB NVIDIA GPU running Qwen3.6-35B locally. You don't need that — zarl talks to any OpenAI-compatible endpoint.
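
To exercise the chat path itself, any OpenAI-style completion request against your endpoint works. A sketch against the bundled llama-server, with a placeholder model name:

```sh
curl -fsS http://localhost:8081/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "my-local-gguf", "messages": [{"role": "user", "content": "ping"}]}'
```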
- Mac / Linux without an NVIDIA GPU: Ollama replaces the llama-server container (sketch below).
- Smaller NVIDIA GPU (8–16 GB): swap in a smaller GGUF and adjust deploy/docker-compose.yml.
- Hosted endpoint (zero local compute): point at OpenRouter, Groq, Together, etc.
Recipes for each tier in docs/running.md.
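
As a taste of the Ollama tier: Ollama exposes an OpenAI-compatible API on :11434, so the `.env` essentials become (model tag hypothetical):

```sh
CHAT_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
CHAT_MODEL=qwen3:8b                  # hypothetical tag; use any model you've pulled
```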
Single binary. Protobuf is the source of truth for the API (ConnectRPC, gRPC-Web compatible). Go backend serves both the RPC API and the embedded React SPA.
```
cmd/zarl/        entry point, wires everything
service/         business logic (LLM, STT, TTS, face, session, tools)
repository/      data access (sqlc against Dolt / MySQL)
qdrant/          vector store client (memory, wiki, task findings)
transport/grpc/  ConnectRPC handlers
proto/zarl/v1/   API contract (.proto files)
migrations/      DB schema (consumed by sqlc + mounted into Dolt)
frontend/        React 19 + Vite + Tailwind v4
taskrunner/      autonomous background task loops
sensor/          periodic ambient observers
subscribers/     event-bus subscribers (session lifecycle, memory, etc.)
events/          in-process bus that sensors and subscribers ride
tools/           tool implementations (homeassistant/, memory/, mcp/, ...)
deploy/          docker-compose.yml + searxng config + models/ (gitignored)
```
For request flow, the tool system, taskrunner internals, and where state lives, see docs/architecture.md.
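
Since ConnectRPC speaks plain HTTP/JSON in addition to gRPC-Web, any handler can be poked with curl. The service and method names here are hypothetical; the real ones live in proto/zarl/v1/:

```sh
# Connect protocol: POST /<package>.<Service>/<Method> with a JSON body.
curl -fsS -X POST http://localhost:8080/zarl.v1.AdminService/ListTools \
  -H 'Content-Type: application/json' \
  -d '{}'
```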
Environment variable essentials (full table in docs/running.md):
| Variable | Default | Purpose |
|---|---|---|
| `CHAT_URL` | — | OpenAI-compatible chat endpoint |
| `CHAT_MODEL` | — | Model name on that endpoint |
| `MODELS_DIR` | `./deploy/models` | STT/TTS/dlib/GGUF root (fixed subpaths) |
| `DOLT_DSN` | `root:@tcp(localhost:3307)/zarl?parseTime=true` | Database DSN |
| `EMBED_URL` | `http://localhost:11434/v1` | OpenAI-compatible /v1 embeddings endpoint |
.env.example is the canonical reference; copy it.
See CONTRIBUTING.md for branch/style/commit conventions.
MIT — see LICENSE.