Skip to content

auth+router: per-installation routing alpha (quality vs cost knob)#94

Open
steventohme wants to merge 2 commits into
mainfrom
steven/router-routing-alpha-per-installation
Open

auth+router: per-installation routing alpha (quality vs cost knob)#94
steventohme wants to merge 2 commits into
mainfrom
steven/router-routing-alpha-per-installation

Conversation

@steventohme
Copy link
Copy Markdown
Collaborator

Summary

Lets each installation pick its own quality-vs-cost tradeoff for the cluster router in 0.1 steps via the self-hosted dashboard.

  • New routing_alpha SMALLINT 0..10 on model_router_installations (5 = α=0.5, matches today's shipped global default). Hot-path API-key join returns it inline.
  • cluster.Multiversion parses an -a<NN> suffix on the default version, builds an α→bundle map at boot, and dispatches: header override > installation α > default. Missing α bundles fall back silently with a Debug log; legacy bundles (no -aNN suffix) ignore α entirely.
  • auth.Service.SetInstallationRoutingAlpha validates the range (ErrAlphaOutOfRange); persistence rides the same write-through-by-TTL contract as excluded models.
  • /admin/v1/routing-alpha GET/PUT under the admin-cookie mgmt group.
  • /ui/settings gains a Quality vs cost section with a native range slider, labeled Lowest cost / Default (0.5) / Highest quality, displaying the canonical α float.
  • scripts/train_alpha_sweep.sh (in the WorkWeave outer repo, not this submodule) drives 11 alpha bundles per stem; only the -a05 bundle is promoted to latest.
  • CLAUDE.md / AGENTS.md updated — the old "Don't add a runtime α knob" rule is replaced with the per-installation contract; per-request x-weave-routing-alpha header remains P1.

Why α can be tunable without retraining N models

α is a scalar in the blend score = α·p̃ + (1−α)·(1−q̃) over pre-computed per-cluster price () and quality () tables. Today it's pre-blended at training time and only the result is shipped in rankings.json. Producing 11 alpha variants is 11 reruns of the existing training script with the same upstream embeddings — no model training. The runtime selects the matching bundle at request time.

Test plan

  • go build ./..., go vet ./... clean
  • go test ./... all green; new tests cover:
    • parseAlphaSuffix edge cases (no suffix, out-of-range, non-numeric, single-digit)
    • Multiversion α dispatch: default-on-unset, exact picks at 0/5/10, header beats α, missing bundle falls back, legacy default ignores α
    • auth.Service.SetInstallationRoutingAlpha: persistence with external_id scoping, boundary values, ErrAlphaOutOfRange below/above range
  • Migration applied locally (wv db run-migrations -r) — verify default 5 backfills, then wv db undo-migration -r
  • Local wv mr with ROUTER_CLUSTER_BUILD_ALL_VERSIONS=true after the α sweep lands; seed two API keys with different α values and confirm the routing decision flips for the same prompt
  • Eval harness pinned to -a02 vs -a08 shows quality/cost numbers move in the expected direction
  • Dashboard slider persists across reload and survives commit on mouseup / touchend / keyup without spamming the PUT endpoint

Out of scope (follow-ups)

  • The α sweep artifact bundles themselves (v<X.Y>-a00 … -a10). Land this PR first; ship bundles in a follow-up so reviewers don't have to wade through ~11× the artifact diff. Until they land, the runtime serves the legacy v0.37 default and treats α as a no-op (the test suite covers that legacy fallback).
  • Per-request x-weave-routing-alpha header override (P1 — defer until a customer asks).

🤖 Generated with Claude Code

Surfaces the AvengersPro α-blend (`score = α·p̃ + (1−α)·(1−q̃)`) as a
per-installation knob: each installation row stores `routing_alpha`
(SMALLINT 0..10), the cluster Multiversion router maps that value to a
pre-baked `v<stem>-a<NN>` artifact bundle, and the self-hosted dashboard
exposes a slider between "lowest cost" and "highest quality" in 0.1 steps.

α is not a neural-net parameter — it is a scalar blend over pre-computed
per-cluster price/quality tables. Tuning it across installations does not
require training/deploying separate models; it requires producing 11
pre-blended ranking bundles, one per α step. The
`scripts/train_alpha_sweep.sh` wrapper drives the sweep.

This supersedes the old "Don't add … runtime α knob" rule in CLAUDE.md /
AGENTS.md — per-installation persisted α has shipped; the per-request
`x-weave-routing-alpha` header override remains P1, deferred until a
customer asks.

- Migration 0004 adds `routing_alpha SMALLINT NOT NULL DEFAULT 5
  CHECK (BETWEEN 0 AND 10)`; the API-key hot-path join returns it inline
  so the request path stays single-roundtrip.
- `auth.Service.SetInstallationRoutingAlpha` validates the range and
  reuses the write-through-by-TTL cache pattern from excluded models.
- `cluster.Multiversion` parses an `-a<NN>` suffix on the default
  version, populates an `alpha→bundle` map at boot, and dispatches:
  header override > installation α > default. Missing α bundles fall
  back to the default scorer with a Debug log (legacy bundles ignore α
  entirely).
- `/admin/v1/routing-alpha` GET/PUT under the admin-cookie mgmt group.
- Self-hosted dashboard `/ui/settings` gains a "Quality vs cost" section
  with a slider; commits on release via the same write-through-by-TTL
  contract.
Trailing comments on the no-match cases were over-padded; gofmt collapses
the alignment when the longest line has no trailing comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant