auth+router: per-installation routing alpha (quality vs cost knob)#94
Open
steventohme wants to merge 2 commits into
Open
auth+router: per-installation routing alpha (quality vs cost knob)#94steventohme wants to merge 2 commits into
steventohme wants to merge 2 commits into
Conversation
Surfaces the AvengersPro α-blend (`score = α·p̃ + (1−α)·(1−q̃)`) as a per-installation knob: each installation row stores `routing_alpha` (SMALLINT 0..10), the cluster Multiversion router maps that value to a pre-baked `v<stem>-a<NN>` artifact bundle, and the self-hosted dashboard exposes a slider between "lowest cost" and "highest quality" in 0.1 steps. α is not a neural-net parameter — it is a scalar blend over pre-computed per-cluster price/quality tables. Tuning it across installations does not require training/deploying separate models; it requires producing 11 pre-blended ranking bundles, one per α step. The `scripts/train_alpha_sweep.sh` wrapper drives the sweep. This supersedes the old "Don't add … runtime α knob" rule in CLAUDE.md / AGENTS.md — per-installation persisted α has shipped; the per-request `x-weave-routing-alpha` header override remains P1, deferred until a customer asks. - Migration 0004 adds `routing_alpha SMALLINT NOT NULL DEFAULT 5 CHECK (BETWEEN 0 AND 10)`; the API-key hot-path join returns it inline so the request path stays single-roundtrip. - `auth.Service.SetInstallationRoutingAlpha` validates the range and reuses the write-through-by-TTL cache pattern from excluded models. - `cluster.Multiversion` parses an `-a<NN>` suffix on the default version, populates an `alpha→bundle` map at boot, and dispatches: header override > installation α > default. Missing α bundles fall back to the default scorer with a Debug log (legacy bundles ignore α entirely). - `/admin/v1/routing-alpha` GET/PUT under the admin-cookie mgmt group. - Self-hosted dashboard `/ui/settings` gains a "Quality vs cost" section with a slider; commits on release via the same write-through-by-TTL contract.
Trailing comments on the no-match cases were over-padded; gofmt collapses the alignment when the longest line has no trailing comment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Lets each installation pick its own quality-vs-cost tradeoff for the cluster router in 0.1 steps via the self-hosted dashboard.
routing_alpha SMALLINT 0..10onmodel_router_installations(5 = α=0.5, matches today's shipped global default). Hot-path API-key join returns it inline.cluster.Multiversionparses an-a<NN>suffix on the default version, builds an α→bundle map at boot, and dispatches: header override > installation α > default. Missing α bundles fall back silently with a Debug log; legacy bundles (no-aNNsuffix) ignore α entirely.auth.Service.SetInstallationRoutingAlphavalidates the range (ErrAlphaOutOfRange); persistence rides the same write-through-by-TTL contract as excluded models./admin/v1/routing-alphaGET/PUT under the admin-cookie mgmt group./ui/settingsgains a Quality vs cost section with a native range slider, labeledLowest cost / Default (0.5) / Highest quality, displaying the canonical α float.scripts/train_alpha_sweep.sh(in the WorkWeave outer repo, not this submodule) drives 11 alpha bundles per stem; only the-a05bundle is promoted tolatest.CLAUDE.md/AGENTS.mdupdated — the old "Don't add a runtime α knob" rule is replaced with the per-installation contract; per-requestx-weave-routing-alphaheader remains P1.Why α can be tunable without retraining N models
α is a scalar in the blend
score = α·p̃ + (1−α)·(1−q̃)over pre-computed per-cluster price (p̃) and quality (q̃) tables. Today it's pre-blended at training time and only the result is shipped inrankings.json. Producing 11 alpha variants is 11 reruns of the existing training script with the same upstream embeddings — no model training. The runtime selects the matching bundle at request time.Test plan
go build ./...,go vet ./...cleango test ./...all green; new tests cover:parseAlphaSuffixedge cases (no suffix, out-of-range, non-numeric, single-digit)Multiversionα dispatch: default-on-unset, exact picks at 0/5/10, header beats α, missing bundle falls back, legacy default ignores αauth.Service.SetInstallationRoutingAlpha: persistence withexternal_idscoping, boundary values,ErrAlphaOutOfRangebelow/above rangewv db run-migrations -r) — verify default5backfills, thenwv db undo-migration -rwv mrwithROUTER_CLUSTER_BUILD_ALL_VERSIONS=trueafter the α sweep lands; seed two API keys with different α values and confirm the routing decision flips for the same prompt-a02vs-a08shows quality/cost numbers move in the expected directionmouseup/touchend/keyupwithout spamming the PUT endpointOut of scope (follow-ups)
v<X.Y>-a00 … -a10). Land this PR first; ship bundles in a follow-up so reviewers don't have to wade through ~11× the artifact diff. Until they land, the runtime serves the legacyv0.37default and treats α as a no-op (the test suite covers that legacy fallback).x-weave-routing-alphaheader override (P1 — defer until a customer asks).🤖 Generated with Claude Code