studio-worker

A single self-contained Rust binary that pulls image, LLM, audio (STT/TTS), and video jobs from the minis.gg studio API, runs them locally, and posts the results back.

Install the worker on any PC, register once, and it will hold a hibernatable WebSocket session to the studio API's WorkerConnections Durable Object. The studio pushes job offers over the socket as soon as they're queued; the worker accepts, runs the engine, and posts the result back the same way (or via a single HTTP multipart route for image / audio / video bytes). The worker also auto-updates itself between jobs.

  studio-worker binary <----- WebSocket -----> WorkerConnections DO <-> D1
         ^                                          ^
         |     HTTP multipart /complete             |
         +------------------------------------------+ (binary outputs only)

Replaces the previous push-based studio-proxy + cloudflared topology and the intermediate pull-based polling pipeline. All five legacy worker HTTP routes (heartbeat, claim, complete-json, fail, logs) are now WS frame types.
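
As a rough illustration of that route-to-frame mapping, here is a minimal sketch. Only `accept`, `completeJson`, and `logBatch` appear as frame names elsewhere in this README. The enum, the function, the `heartbeat` and `fail` tags, and the pairing of the old claim route with `accept` are assumptions, not the actual wire contract.

```rust
/// The five legacy worker HTTP routes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LegacyRoute {
    Heartbeat,
    Claim,
    CompleteJson,
    Fail,
    Logs,
}

/// Hypothetical WS frame-type tag that stands in for each legacy route.
pub fn frame_type(route: LegacyRoute) -> &'static str {
    match route {
        LegacyRoute::Heartbeat => "heartbeat", // assumed tag
        LegacyRoute::Claim => "accept",        // assumed pairing
        LegacyRoute::CompleteJson => "completeJson",
        LegacyRoute::Fail => "fail",           // assumed tag
        LegacyRoute::Logs => "logBatch",
    }
}
```

The real frame definitions are the `WorkerInbound` / `WorkerOutbound` types exercised by tests/ws_wire.rs.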

Tasks supported

| Kind | Wire kind | Synthetic engine (default) | Real engine (planned) |
|---|---|---|---|
| Image | image | real WEBP / PNG via the image crate | image-candle / sd-cpp |
| LLM | llm | OpenAI-shape JSON (chat.completion) | llama (llama.cpp) |
| Audio STT | audio_stt | Whisper-shape JSON | whisper (whisper.cpp) |
| Audio TTS | audio_tts | real WAV (sine wave keyed by hash(text)) | tts-piper |
| Video | video | real WebP image (single-frame stand-in) | video-ffmpeg |

The synthetic engine is the default and exercises the full pipeline end-to-end with no GPU, no model downloads, and ~0 ms per task — exactly what the unattended CI suite uses. Real high-performance backends (llama.cpp, whisper.cpp, candle, Piper, ffmpeg) are wired in via feature flags and are deferred to a follow-up iteration (the trait, contract, and dispatch are already in place).
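
To make "deterministic output keyed by a hash of the input" concrete, here is a minimal sketch of a synthetic TTS result: a mono 16-bit PCM WAV whose pitch depends only on the text. It is not the worker's code. The function names, the frequency range, and the use of FNV-1a as a std-only stand-in for the real hash are all assumptions.

```rust
/// FNV-1a, used here only as a std-only stand-in for the real hash.
fn fnv1a(text: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in text.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Build a mono 16-bit PCM WAV whose pitch is derived from the text.
pub fn synthetic_wav(text: &str, sample_rate: u32, millis: u32) -> Vec<u8> {
    // Map the hash onto 220..=879 Hz so different texts get different tones.
    let freq = 220.0 + (fnv1a(text) % 660) as f64;
    let samples = (sample_rate as u64 * millis as u64 / 1000) as u32;
    let data_len = samples * 2; // 2 bytes per 16-bit sample

    let mut wav = Vec::with_capacity(44 + data_len as usize);
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes());
    wav.extend_from_slice(b"WAVEfmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    wav.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    wav.extend_from_slice(&2u16.to_le_bytes()); // block align
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());

    for n in 0..samples {
        let t = n as f64 / sample_rate as f64;
        // Half amplitude keeps the tone well clear of clipping.
        let value = (t * freq * std::f64::consts::TAU).sin() * i16::MAX as f64 * 0.5;
        wav.extend_from_slice(&(value as i16).to_le_bytes());
    }
    wav
}
```

Because the output depends only on the arguments, calling `synthetic_wav` twice with the same text yields byte-identical files, which is what makes this style of engine convenient for CI assertions.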

Desktop UI (on by default)

The worker ships a native desktop window built on egui/eframe that surfaces every config knob, the live job in flight, the recent-jobs history, the rolling log tail, and a system-tray icon with Open / Pause-Resume / Quit. It is on by default: cargo install studio-worker gives you the windowed worker, and studio-worker ui launches it.

The UI build is free of GTK: the window uses eframe/glow (OpenGL via dlopen), notifications use notify-rust (pure-Rust zbus on Linux), and the system tray uses ksni (pure-Rust StatusNotifierItem) on Linux and the native tray-icon APIs on macOS / Windows. So a source build needs no pkg-config, no -dev packages, and no OpenSSL (reqwest + sentry use rustls). Headless rigs can still opt out:

cargo install studio-worker --no-default-features   # service / `run` only

Five tabs:

| Tab | What it shows |
|---|---|
| Status | Worker id, API URL, VRAM total + threshold, busy / idle / paused badge, last heartbeat age + outcome. When the worker isn't registered, an in-window Register form. |
| Jobs | Current job in flight (kind, model, prompt, elapsed time) + bounded ring of the last 50 finished jobs with completed / failed badges. |
| Config | Every config.toml field as an editable widget grouped into Connection / Worker / Engine / Auto-update / Models / Notifications / Background mode. Save writes through config::save and the runtime picks up new values on the next tick. Engine swaps surface a "restart required" banner. |
| Logs | Level filter (info / warn / error), free-text search across category / message / job id, auto-scroll toggle, windowed at the last 500 entries. |
| About | Version, Sentry release name, resolved config path, "Check for updates" button. |
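
The Jobs history (last 50) and the Logs window (last 500) are both bounded collections that drop their oldest entry when full. A minimal sketch of that idea, assuming a `VecDeque`-backed container; the `BoundedRing` type is hypothetical and not taken from the worker's source:

```rust
use std::collections::VecDeque;

/// Keeps only the most recent `cap` items, dropping the oldest first.
pub struct BoundedRing<T> {
    cap: usize,
    items: VecDeque<T>,
}

impl<T> BoundedRing<T> {
    pub fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::with_capacity(cap) }
    }

    /// Append an item, evicting the oldest one once the ring is full.
    pub fn push(&mut self, item: T) {
        if self.cap == 0 {
            return; // a zero-capacity ring stores nothing
        }
        if self.items.len() == self.cap {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Iterate from oldest to newest.
    pub fn iter(&self) -> std::collections::vec_deque::Iter<'_, T> {
        self.items.iter()
    }
}
```

With a capacity of 50, pushing 60 finished jobs leaves jobs 10 through 59 in the ring, so memory stays flat no matter how long the worker runs.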

The tray icon reflects state (idle = green, busy = amber, disconnected = red) and exposes:

  • Open Window — re-show the window after hide-to-tray.
  • Pause / Resume claiming — toggles auto_enabled, persisted to config.toml.
  • Quit — signals the runtime loops to stop, awaits any in-flight job briefly, then exits.

Closing the window hides it to the tray; the worker keeps running. For an autostart-on-login workflow, tick the Run in tray on login toggle on the Config tab (writes ~/.config/autostart/studio-worker-ui.desktop on Linux, a LaunchAgent plist on macOS, a marker file on Windows).

Build-time deps

None for the UI itself on any platform — that's the point of the GTK-free stack above (no pkg-config, no cairo/gtk -dev packages, no OpenSSL). A standard Rust toolchain is enough.

The all-backends build (--features all, used for the release binaries) additionally compiles llama.cpp in-process, which needs cmake + a C/C++ toolchain. The release runners install cmake automatically (cargo-dist system dependency); for a local cargo install studio-worker --features all make sure cmake and a C++ compiler are on PATH.

Quick install

Linux / macOS

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/webbertakken/studio-worker/releases/latest/download/studio-worker-installer.sh | sh

Windows (PowerShell)

irm https://github.com/webbertakken/studio-worker/releases/latest/download/studio-worker-installer.ps1 | iex

From cargo

cargo install studio-worker              # windowed UI by default
cargo install studio-worker --features all   # + in-process llama.cpp + media (needs cmake)
cargo install studio-worker --no-default-features  # headless service build

The install script is the turnkey path: its pre-built binaries already bundle the UI and every backend (in-process llama.cpp LLM + media engines), auto-start on login, auto-update, and auto-download models on demand — nothing else to install. cargo install studio-worker from source is UI-first but ships only the synthetic engine unless you add --features all (which needs a C/C++ toolchain).

Each release ships pre-built binaries for:

  • x86_64-pc-windows-msvc
  • x86_64-unknown-linux-gnu
  • aarch64-unknown-linux-gnu
  • aarch64-apple-darwin
  • x86_64-apple-darwin

First run

No shared secret to copy around. The worker auto-registers against https://studio.minis.gg on first launch; the studio operator sees a row in the dashboard's Pending Workers panel and clicks Approve, and the worker's next 30s poll picks up its worker_id + auth_token and starts heartbeating. Two ways to launch:

# Windowed (recommended) — Status tab shows 'Waiting for approval'
# until the operator approves.
studio-worker ui

# Headless — same flow, no window; pipe to journalctl in production.
studio-worker run

Optional pre-launch tweaks (none of these talk to the network):

# Pre-set the human label shown in the dashboard's Pending Workers panel.
studio-worker register --label "alice's gaming rig"

# Point at a self-hosted studio instead of studio.minis.gg.
studio-worker register --api-base-url https://my-studio.example.com

# Optionally install the auto-start OS service (systemd --user on Linux,
# launchd on macOS, scheduled task on Windows).  Alternative: the desktop
# UI's Config tab has a `Run in tray on login` toggle.
studio-worker install-service

If your registration is rejected (or you want to move the worker to a different studio), clear the local state and submit a fresh request:

studio-worker register --reset

CLI subcommands

| Subcommand | Purpose |
|---|---|
| run | Auto-register if needed, then hold the WS session + auto-update loop. |
| ui (default) | Same as run plus the desktop window + tray + notifications. Built unless installed with --no-default-features. |
| register | Persist --label / --api-base-url; --reset clears local state. |
| status | Print the local config + registration state. |
| install-service | Install the auto-start OS service. |
| uninstall-service | Remove the auto-start OS service. |
| enable | Set auto_enabled = true (resume claiming). |
| disable | Set auto_enabled = false (worker online but doesn't claim). |
| set-threshold <gb> | Set the max VRAM (GB) the worker is willing to claim per job. |
| config | Print the resolved config + its on-disk path. |
| check-update | Check the release feed for a newer version (does not install). |

Configuration

Config lives at:

  • Linux/macOS — ~/.config/minis-studio-worker/config.toml
  • Windows — %APPDATA%\minis-studio-worker\config.toml

api_base_url        = "https://studio.minis.gg"
worker_id           = "<filled on operator approval>"
auth_token          = "<filled on operator approval>"
vram_threshold_gb   = 12.0                       # max GB per claim
auto_start          = true

# Where on-demand model files are cached (defaults to ~/models).
models_root         = "~/models"

# Auto-update — checks the release feed on the cadence below, applies
# updates only when no job is running, then re-execs the new binary.
auto_update_enabled       = true
auto_update_interval_secs = 1800
auto_update_feed          = "https://api.github.com/repos/webbertakken/studio-worker/releases"
auto_update_prerelease    = false

# WebSocket reconnect cap.  When the session drops the worker tries
# to reconnect with exponential backoff up to this many times before
# exiting non-zero (and letting systemd/launchd/Task-Scheduler
# restart it).  `0` = infinite.  Omit to use the default of 5.
ws_reconnect_attempts     = 5

# Internal state written by the auto-register flow.  Don't edit by hand.
install_id              = "<uuidv4>"
registration_request_id = "<rr-...>"             # cleared on approval
registration_secret     = "<hex>"                # cleared on approval
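
The reconnect behaviour described in the `ws_reconnect_attempts` comment can be sketched as two small helpers. This is an illustration under stated assumptions, not the worker's code: the function names are hypothetical, and the README does not state the base delay or the ceiling, so both are parameters here.

```rust
use std::time::Duration;

/// `attempt` is the number of reconnects already tried.
/// `cap == 0` means retry forever, matching the config comment.
pub fn should_retry(attempt: u32, cap: u32) -> bool {
    cap == 0 || attempt < cap
}

/// Exponential backoff: `base * 2^attempt`, clamped to `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // An overflowing multiplication falls back to the ceiling.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a cap of 5 the worker would try attempts 0 through 4 and then exit non-zero, leaving the restart to systemd, launchd, or Task Scheduler.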

Registration flow

The worker doesn't ship a shared secret. On first launch:

  1. Generates a per-install UUID + 256-bit registration_secret and keeps both in config.toml. Only the SHA-256 hash of the secret leaves the box.
  2. POSTs /workers/register-request to api_base_url with hostname, username, VRAM, supported models, optional label.
  3. The studio creates a Pending Workers row. The operator sees it in the studio dashboard, clicks Approve (or Reject), and the worker's next 30s poll picks up the decision.
  4. On Approve: worker_id + auth_token written to config.toml, normal heartbeat / claim loops take over.
  5. On Reject: worker stops trying. studio-worker register --reset clears state and the next launch submits a fresh request.

See docs/architecture/overview.md for the full state machine + per-install identity details.
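
The five steps above amount to a small state machine. The sketch below is one way to model it and is not the implementation documented in docs/architecture/overview.md; the type, variant, and method names are assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Registration {
    /// No request submitted yet (fresh install, or after `register --reset`).
    Unregistered,
    /// Request submitted; the worker polls for the operator's decision.
    Pending { request_id: String },
    /// Approved: credentials are persisted and the worker may connect.
    Approved { worker_id: String, auth_token: String },
    /// Rejected: the worker stops trying until it is reset.
    Rejected,
}

/// What a single poll of the studio API can report (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PollOutcome {
    StillPending,
    Approved { worker_id: String, auth_token: String },
    Rejected,
}

impl Registration {
    /// Step 2: the register request was accepted by the API.
    pub fn on_submitted(self, request_id: String) -> Self {
        match self {
            Registration::Unregistered => Registration::Pending { request_id },
            other => other,
        }
    }

    /// Steps 3 to 5: apply the result of one poll. Only `Pending` can move.
    pub fn on_poll(self, outcome: PollOutcome) -> Self {
        match (self, outcome) {
            (Registration::Pending { .. }, PollOutcome::Approved { worker_id, auth_token }) => {
                Registration::Approved { worker_id, auth_token }
            }
            (Registration::Pending { .. }, PollOutcome::Rejected) => Registration::Rejected,
            (state, _) => state,
        }
    }

    /// `studio-worker register --reset`: forget everything and start over.
    pub fn reset(self) -> Self {
        Registration::Unregistered
    }
}
```

Modelling `Rejected` as a terminal state until `reset` mirrors step 5: the worker does not resubmit on its own.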

Troubleshooting

  • Worker exits with ws auth failed: ... — the studio API rejected the auth token, either on the upgrade (HTTP 401) or via close code 4001 after a successful upgrade. The token was revoked, the worker was deleted from the studio admin UI, or config.toml carries a stale token. Clear local state and let the next launch auto-register again: studio-worker register --reset, then studio-worker run (or studio-worker ui).
  • Worker exits with ws reconnect cap reached — every reconnect attempt failed (DNS, TLS, or the API is down). The service manager will restart the worker; if it keeps happening, check that the API is reachable from the worker host.

Engines

There's no engine-selection knob in the config. The worker advertises capabilities for every backend compiled into the binary and routes each incoming job to the first backend that supports its (kind, model) pair (see MultiEngine).

  • synthetic (always present, last in the chain) — produces deterministic, real WEBP/PNG/WAV/JSON outputs keyed by SHA-256 of the prompt/text/input. No GPU required. Use for smoke-tests, CI, and end-to-end verification of every modality.
  • sd-cpp — real image inference via stable-diffusion.cpp as a subprocess. Self-registers only when the sd-cli binary and at least one model's files are present under models_root. See docs/engines/sdcpp.md.
  • llama — real LLM inference via llama.cpp linked in-process (llama-cpp-2). Shipped in the release binaries (and any --features all / --features llama build); downloads the GGUF named by the offer's ModelSource into <models_root>/llm/ on demand and advertises the llama-cpp:* wildcard so a fresh worker is claimable.
  • feature-gated heavyweights: whisper (STT), image-candle (pure-Rust SD), video, tts drop in via the same trait when their cargo feature is enabled. whisper and llama each static-link their own ggml, which can't coexist in one binary, so whisper ships in its own bundle (all-engines-stt); the all-backends release pairs llama (in-process) with sd-cli (subprocess) to sidestep the clash.

When the studio offers a model whose engine isn't compiled into the worker, the job fails loudly with an actionable message (install the all-backends release, or rebuild with --features all) rather than silently producing placeholder bytes.
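
The first-match routing and the loud failure described above can be sketched as follows. This is a simplified model, not `MultiEngine` itself: the trait shape, `StaticEngine`, `pattern_matches`, and the `synthetic:*` model pattern are assumptions made for illustration. Only the `llama-cpp:*` wildcard comes from this README.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskKind {
    Image,
    Llm,
    AudioStt,
}

pub trait Engine {
    fn name(&self) -> &'static str;
    fn supports(&self, kind: TaskKind, model: &str) -> bool;
}

/// Matches an exact model id, or a prefix when the pattern ends in `*`
/// (for example `llama-cpp:*`).
pub fn pattern_matches(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

/// Hypothetical engine described by a static capability list.
pub struct StaticEngine {
    pub name: &'static str,
    pub capabilities: Vec<(TaskKind, &'static str)>,
}

impl Engine for StaticEngine {
    fn name(&self) -> &'static str {
        self.name
    }

    fn supports(&self, kind: TaskKind, model: &str) -> bool {
        self.capabilities
            .iter()
            .any(|(k, pattern)| *k == kind && pattern_matches(pattern, model))
    }
}

/// Return the first engine in the chain that supports the pair, or an
/// actionable error instead of silently falling back to placeholder output.
pub fn route<'a>(
    chain: &'a [Box<dyn Engine>],
    kind: TaskKind,
    model: &str,
) -> Result<&'a dyn Engine, String> {
    for engine in chain {
        if engine.supports(kind, model) {
            return Ok(&**engine);
        }
    }
    Err(format!(
        "no compiled-in engine supports {:?} model `{}`; install the all-backends release or rebuild with --features all",
        kind, model
    ))
}
```

Keeping the synthetic engine last in the chain means a real backend always wins when both advertise the same pair.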

Adding a real engine

Implement the Engine trait under src/engine/ (see SyntheticEngine and SdCppEngine for examples). An engine declares its capabilities (per-kind supported models) and a dispatch(model, task) -> TaskResult function. Wire it into engine::build() behind a cargo feature, e.g.:

[features]
llama = ["dep:llama-cpp-2"]

The trait is already kind-aware so a single binary can host multiple engines (one per modality).
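
For the "behind a cargo feature" part, the sketch below shows the general `#[cfg(feature = ...)]` pattern. It is an assumption about how `engine::build()` might be organised, not its actual body: the function name `build_chain` is hypothetical, and it returns backend names rather than engine objects to stay self-contained.

```rust
/// Returns backend names in routing order.
/// Hypothetical stand-in for the real `engine::build()`.
pub fn build_chain() -> Vec<&'static str> {
    let mut chain = Vec::new();

    // Compiled in only when the matching cargo feature is enabled.
    #[cfg(feature = "llama")]
    chain.push("llama");

    #[cfg(feature = "image-candle")]
    chain.push("image-candle");

    // Always present and always last, so real backends take precedence.
    chain.push("synthetic");
    chain
}
```

In a build without extra features this yields only `synthetic`; the gated lines are removed at compile time, so disabled backends add nothing to the binary.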

VRAM threshold

The worker reports two numbers to the API:

  • vramTotalGb — physical VRAM on the host (probed from /proc/driver/nvidia on Linux; 0 when no NVIDIA GPU is present).
  • vramThresholdGb — the max estimated VRAM per claim, controlled by the operator via set-threshold or by editing config.toml.

The studio API only hands a job to a worker if job.vramGbEstimate ≤ worker.vramThresholdGb and job.model ∈ worker.supportedModels. Jobs that no worker can take stay queued until either a suitable worker appears or the operator cancels.
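
The eligibility rule is a two-part predicate. A minimal sketch, assuming exact model-id matching (wildcards such as `llama-cpp:*` are ignored here); the struct and field names are hypothetical Rust spellings of the API fields named above:

```rust
pub struct Job {
    pub model: String,
    pub vram_gb_estimate: f64,
}

pub struct Worker {
    pub vram_threshold_gb: f64,
    pub supported_models: Vec<String>,
}

/// True when the job fits under the worker's threshold and the worker
/// advertises the job's model.
pub fn can_take(worker: &Worker, job: &Job) -> bool {
    job.vram_gb_estimate <= worker.vram_threshold_gb
        && worker.supported_models.iter().any(|m| m == &job.model)
}
```

Under this rule a worker with a 12 GB threshold takes a job estimated at exactly 12 GB but not one estimated at 12.5 GB, however much physical VRAM the host reports.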

Auto-update

A dedicated background task polls the GitHub Releases feed every auto_update_interval_secs (default 30 min). When a higher semver is available the worker:

  1. Confirms no job is currently in flight (per a shared busy flag).
  2. Downloads the cargo-dist installer for the current platform.
  3. Runs it (it overwrites the binary in place).
  4. Re-execs itself so the new code takes over.

Set auto_update_enabled = false to opt out. Set auto_update_prerelease = true to track pre-releases.
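
The "higher semver and not busy" decision can be sketched with the standard library alone. This is an illustration, not the updater's code: the function names are hypothetical, the optional `v` tag prefix is an assumption, and pre-release tags are simply rejected rather than ordered.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Parse `MAJOR.MINOR.PATCH` (optionally prefixed with `v`).
/// Tags with pre-release suffixes such as `1.2.3-rc.1` return `None`.
pub fn parse_version(tag: &str) -> Option<(u64, u64, u64)> {
    let core = tag.trim_start_matches('v');
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// Apply an update only when `latest` is strictly newer and no job is running.
pub fn should_apply(current: &str, latest: &str, busy: &AtomicBool) -> bool {
    let newer = match (parse_version(current), parse_version(latest)) {
        // Tuples compare component by component, so 0.10.0 > 0.4.1.
        (Some(cur), Some(new)) => new > cur,
        _ => false,
    };
    newer && !busy.load(Ordering::SeqCst)
}
```

Comparing parsed numbers rather than strings matters: as text, "0.10.0" sorts before "0.4.1".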

Observability

The worker batches log entries every second and pushes them as a logBatch frame over the WS session. The DO ingests them into the workerLogs D1 table; the studio LogViewer reads them from there.
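
A minimal sketch of the once-per-second batching, assuming a buffer that is drained when the interval has elapsed. `LogEntry`, `LogBatcher`, and `take_due` are hypothetical names; the real `logBatch` frame layout is defined by the WS contract and is not reproduced here.

```rust
use std::time::{Duration, Instant};

pub struct LogEntry {
    pub level: &'static str,
    pub message: String,
}

pub struct LogBatcher {
    interval: Duration,
    last_flush: Instant,
    pending: Vec<LogEntry>,
}

impl LogBatcher {
    pub fn new(interval: Duration, now: Instant) -> Self {
        Self { interval, last_flush: now, pending: Vec::new() }
    }

    /// Buffer an entry until the next flush.
    pub fn push(&mut self, entry: LogEntry) {
        self.pending.push(entry);
    }

    /// Returns the entries to send once the interval has elapsed.
    /// Yields `None` when it is too early or there is nothing to send.
    pub fn take_due(&mut self, now: Instant) -> Option<Vec<LogEntry>> {
        if now.duration_since(self.last_flush) < self.interval {
            return None;
        }
        self.last_flush = now;
        if self.pending.is_empty() {
            return None;
        }
        Some(std::mem::take(&mut self.pending))
    }
}
```

Passing `now` in explicitly keeps the batcher testable without sleeping; the caller would wrap each returned batch in a single `logBatch` frame.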

Sentry (opt-in)

The worker integrates with Sentry for crash + error reporting. Disabled by default — set the following env vars before launching to enable it:

| Env var | Purpose |
|---|---|
| SENTRY_DSN | The project DSN. Telemetry stays off when unset. |
| SENTRY_ENVIRONMENT | Optional environment tag (defaults to production). |

When enabled the worker:

  • captures panics automatically (sentry's default panic handler);
  • forwards tracing::error! events as Sentry events;
  • attaches preceding tracing::warn! events as breadcrumbs;
  • tags every event with the worker's release (= studio-worker@<crate version>, the Sentry-conventional namespaced form) and hostname (server_name).

No DSN is baked into the binary, so the public repo never carries credentials. Performance tracing is intentionally off — Sentry is used purely for error/crash visibility.

Development

cargo test                              # default (UI) build
cargo test --no-default-features        # headless core
cargo test --features all               # + llama.cpp + candle (needs cmake)
cargo clippy --tests -- -D warnings
cargo fmt --check
# Coverage gates the headless core (UI rendering isn't unit-testable):
cargo llvm-cov --workspace --no-default-features \
  --ignore-filename-regex 'src/main\.rs$|src/engine/sdcpp\.rs$|src/ws/session\.rs$' \
  --summary-only

Coverage CI enforces ≥ 90% line coverage on the headless core. Truly-untestable bits excluded from the gate:

  • src/main.rs — the CLI bootstrap (all logic lives in lib.rs).
  • src/engine/sdcpp.rs, src/ws/session.rs — subprocess / live-socket paths exercised by the dev loop, not unit tests.
  • the ui feature (egui rendering + OS tray glue) — not unit-testable; excluded by gating coverage on --no-default-features.
  • update::RealRunner::{download, run_installer} — real network + process spawn (tested through the UpdateRunner trait with a fake).
  • update::restart_self — calls execvp, never returns.
  • sys::detect_vram_gb NVIDIA-specific branch — requires NVIDIA hardware.

Integration tests live under tests/:

  • tests/ws_wire.rs — round-trip tests for every WorkerInbound / WorkerOutbound frame against the TS contract.
  • tests/ws_client_contract.rs — the WS client against a live tokio-tungstenite server (upgrade headers, hello roundtrip, 401 → AuthFailed, close 4001 → AuthFailed, binary-frame rejection, close idempotency).
  • tests/ws_session_full_loop.rs — end-to-end walk: hello → welcome → LLM offer → accept + completeJson → STT offer → accept + completeJson → clean close.
  • tests/http_contract.rs — register + multipart complete (image + audio) against wiremock.
  • tests/http_errors.rs — error-status paths for register + multipart complete plus the tracing-emission contract.
  • tests/multi_modal.rs — every TaskKind round-trips through the synthetic engine + decoders.
  • tests/auto_update.rs — release feed parsing + apply_with full flow.
  • tests/runtime_helpers.rs — one-shot CLI helpers via wiremock.
  • tests/runtime_ticks.rs — auto-update ticks + run_returns_when_aborted smoke test that exercises the AuthFailed exit path.

Release process

  1. PRs merge to main with conventional-commit titles (feat:, fix:, docs:, etc. — enforced by the Commit lint workflow).
  2. release-please opens a release PR that bumps the version and updates the changelog.
  3. Merging the release PR creates a git tag.
  4. The tag triggers the release.yml workflow (cargo-dist), which builds binaries for all supported targets and uploads them to the GitHub release alongside installer.sh + installer.ps1 one-liners.

Licence

MIT. See LICENSE.
