feat(router): runtime-tunable routing knobs via v2 bundle format #214
Merged
Collapse the (alpha, speed_weight, output_cost_ratio,
expected_output_tokens, per_model_verbosity) sweep from N committed
bundles into 1 bundle + per-request headers. The v2 artifact format
ships the components of the alpha-blend (not the blended scalar);
runtime reconstructs the blend exactly. No retrain required when only
these five knobs change.
Runtime
-------
- v2 bundle loader in artifacts.go: probes for quality_means.json,
falls back to v1 rankings.json. New Bundle fields: QualityMeans,
ModelAxes, MedianVerbosity, IsV2.
- Scorer.Route implements the v2 blend math per the plan's section 4,
including degenerate-range guards, speed redistribution for models
missing AA timing, and pure-speed fallback to qNorm.
- Per-cluster alpha vector with scalar-override fan-out: a header
x-weave-routing-alpha uniformly overwrites every alpha_vec[k]
(sledgehammer behavior; calibration-preserving overrides deferred).
- ErrInvalidRoutingKnobs sentinel mapped to HTTP 400 in Anthropic,
OpenAI, Gemini, and /v1/route handlers.
Middleware + plumbing
---------------------
- New x-weave-routing-{alpha, speed-weight, output-cost-ratio,
expected-output-tokens, per-model-verbosity} headers parsed in
middleware/routing_knobs_override.go.
- router.Request gains RoutingKnobs *Overrides; proxy/api request
builders for all four surfaces copy parsed overrides into the
routing request.
- bucketKey in cache.go gains clusterVersion + knobsHash so requests
with different effective knobs (or running under
ROUTER_CLUSTER_BUILD_ALL_VERSIONS) get isolated cache buckets. Fixes
a latent correctness bug that pre-existed this work.
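A minimal sketch of the cache-key idea, assuming a comparable struct key and a SHA-256 digest over the effective knobs. The Knobs type, the installationID field, and newBucketKey are hypothetical stand-ins for illustration; the real bucketKey in cache.go may carry different fields and use a different hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Knobs is a hypothetical, simplified view of the effective routing knobs.
type Knobs struct {
	Alpha                []float64
	SpeedWeight          float64
	OutputCostRatio      float64
	ExpectedOutputTokens int
	PerModelVerbosity    map[string]float64
}

// knobsHash returns a short deterministic digest of the knobs.
// fmt prints map entries in sorted key order, so the text form is stable.
func knobsHash(k Knobs) string {
	h := sha256.New()
	fmt.Fprintf(h, "alpha=%g|speed=%g|ocr=%g|eot=%d|verbosity=%v",
		k.Alpha, k.SpeedWeight, k.OutputCostRatio, k.ExpectedOutputTokens, k.PerModelVerbosity)
	return hex.EncodeToString(h.Sum(nil))[:16]
}

// bucketKey is comparable, so it can be used directly as a map key.
// installationID is an assumed pre-existing component of the key.
type bucketKey struct {
	installationID string
	clusterVersion string
	knobsHash      string
}

// newBucketKey builds the key from the request's effective routing state.
func newBucketKey(installationID, clusterVersion string, k Knobs) bucketKey {
	return bucketKey{
		installationID: installationID,
		clusterVersion: clusterVersion,
		knobsHash:      knobsHash(k),
	}
}
```

With a key shaped like this, two requests that differ only in cluster version or in any effective knob land in separate buckets, while identical requests still share one.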
Artifact layout
---------------
- Move v0.21-v0.52 to artifacts/legacy/. The loader's
bundleDirForVersion resolves either root or legacy transparently.
- v0.53 is the first v2 candidate: includes quality_means.json,
model_axes.json, default_routing_knobs, and recommended_ui_defaults
in metadata.yaml. Co-hosts rankings.json so the diff test can run.
- READMEs in artifacts/ and artifacts/legacy/ document the split.
Tests
-----
- Cache: TestBucketKeyIsolatesByVersion / IsolatesByKnobs prove the
cache fix.
- Scorer: TestDegenerateRangeFallthrough,
TestSpeedRangeZeroFallsBackToTwoAxis,
TestKnobOverrideOnlyReweights, TestAlphaScalarReplacesVector,
TestZeroAATimingFallback,
TestPureSpeedMissingTimingFallsBackToQuality,
TestInvalidEffectiveKnobs.
- Artifacts: TestListVersions_FlattensLegacyAndOmitsPseudoName,
TestResolveVersion_LegacyBundleIsReachable.
- Diff test (release gate, build-tagged "diff_v2 onnx_integration
ORT"): TestV2MatchesV1 routes a 1000-prompt fixture through both v1
and v2 scorers from the same bundle dir, asserts >=99% top-1
agreement, dumps divergences CSV. TestDiffV2PromptsFixtureSHA pins
the fixture SHA so drift fails CI before the gate runs.
LoadBundleFromDir / LoadBundleFromFS / LoadBundleV1Only added to
artifacts.go to support the diff test driver loading bundles from a
non-embedded path.
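The release gate's agreement check can be sketched as follows. This is an illustration only: top1Agreement and passesGate are hypothetical names, and the real TestV2MatchesV1 drives ONNX-backed scorers rather than comparing pre-computed pick lists.

```go
package main

import "errors"

// minTop1Agreement mirrors the >=99% threshold named in the commit message.
const minTop1Agreement = 0.99

// top1Agreement compares the top-1 model chosen by the v1 and v2 scorers for
// each prompt. It returns the agreement rate and the indices that diverged,
// which a test could dump to CSV.
func top1Agreement(v1Picks, v2Picks []string) (rate float64, diverged []int, err error) {
	if len(v1Picks) == 0 || len(v1Picks) != len(v2Picks) {
		return 0, nil, errors.New("pick lists must be non-empty and equal in length")
	}
	for i := range v1Picks {
		if v1Picks[i] != v2Picks[i] {
			diverged = append(diverged, i)
		}
	}
	rate = float64(len(v1Picks)-len(diverged)) / float64(len(v1Picks))
	return rate, diverged, nil
}

// passesGate reports whether the agreement rate clears the release threshold.
func passesGate(rate float64) bool {
	return rate >= minTop1Agreement
}
```

On a 1000-prompt fixture, this threshold tolerates at most 10 diverging prompts.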
cf4498f to 6d7e4e6
…lice
Two PR review findings on the v2 runtime-tunable knobs:
1. NaN/Inf bypass (middleware/routing_knobs_override.go): strconv.ParseFloat accepts "NaN", "Inf", "-Inf" as valid floats, and the range check `val < 0 || val > 1` never fires for NaN (every comparison with NaN returns false). An authenticated caller could send `x-weave-routing-alpha: NaN` and force undefined routing math. Add explicit math.IsNaN / math.IsInf guards on Alpha, SpeedWeight, and OutputCostRatio.
2. Alpha slice aliasing (cluster/scorer.go): `activeKnobs = *s.metadata.Training.DefaultRoutingKnobs` is a shallow struct copy; the Alpha slice header points at the bundle's backing array. The subsequent per-request override mutation (`activeKnobs.Alpha[i] = *req.Alpha`) wrote through that array, leaking the override into every later request on the same scorer and racing concurrent overrides. Clone the slice immediately after the copy so activeKnobs is a true deep copy.
Found by Cursor security/bugbot on PR #214.
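Both findings can be illustrated with a short sketch. The names parseUnitFloat, knobs, and withAlphaOverride are hypothetical, and the [0, 1] range is taken from the alpha check quoted in the commit message; the other knobs may use different bounds.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
)

var errInvalidKnob = errors.New("invalid routing knob")

// parseUnitFloat parses a header value that must be a finite number in [0, 1].
func parseUnitFloat(raw string) (float64, error) {
	val, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("%w: %v", errInvalidKnob, err)
	}
	// ParseFloat accepts "NaN" and "Inf". NaN compares false against
	// everything, so the range check below would let it through on its own.
	if math.IsNaN(val) || math.IsInf(val, 0) {
		return 0, fmt.Errorf("%w: non-finite value %q", errInvalidKnob, raw)
	}
	if val < 0 || val > 1 {
		return 0, fmt.Errorf("%w: %v is outside [0, 1]", errInvalidKnob, val)
	}
	return val, nil
}

// knobs is a simplified stand-in for the bundle's default routing knobs.
type knobs struct {
	Alpha       []float64
	SpeedWeight float64
}

// withAlphaOverride returns a per-request copy of defaults in which every
// alpha entry is replaced by override. The shared defaults are left untouched.
func withAlphaOverride(defaults knobs, override float64) knobs {
	active := defaults // shallow copy: Alpha still shares the backing array
	active.Alpha = append([]float64(nil), defaults.Alpha...) // detach before mutating
	for i := range active.Alpha {
		active.Alpha[i] = override
	}
	return active
}
```

Without the append-based clone, the loop in withAlphaOverride would write into the defaults that every other request reads.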
- Remove unused LoadBundleFromFS wrapper (Bugbot: dead exported API). LoadBundle covers the embedded path; LoadBundleFromDir covers the on-disk diff-test path.
- Fix mixed-timing scoring asymmetry in v2 blend (Bugbot: medium). When sRange > 0, untimed models now get sNorm=1 (worst-case speed, no wS bonus) instead of taking the redistribution branch. This keeps wQ/wC weighting consistent with their timed peers, so the cost axis no longer silently disappears when wC=0. The wS-redistribution path is preserved for the all-untimed/sRange==0 case.
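The mixed-timing rule might look like the sketch below. It assumes min-max normalization over the timed models, with 0 for the fastest and 1 for the slowest, and it represents a model without AA timing as a missing map entry. speedNorms is a hypothetical helper; the actual blend math lives in the plan's section 4.

```go
package main

// speedNorms maps each candidate model to a normalized speed penalty in [0, 1].
// latency holds a timing value for models that have one; a missing entry means
// the model is untimed. The second result is false when the timed range is
// degenerate (no timed models, or all timings equal), in which case the caller
// would take the wS-redistribution path instead.
func speedNorms(models []string, latency map[string]float64) (map[string]float64, bool) {
	var lo, hi float64
	seen := false
	for _, m := range models {
		t, timed := latency[m]
		if !timed {
			continue
		}
		if !seen || t < lo {
			lo = t
		}
		if !seen || t > hi {
			hi = t
		}
		seen = true
	}
	sRange := hi - lo
	if !seen || sRange <= 0 {
		return nil, false
	}
	norms := make(map[string]float64, len(models))
	for _, m := range models {
		if t, timed := latency[m]; timed {
			norms[m] = (t - lo) / sRange
		} else {
			norms[m] = 1 // untimed: worst-case speed, so no speed bonus
		}
	}
	return norms, true
}
```

Because untimed models receive an ordinary sNorm value, they pass through the same weighted blend as their timed peers.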
- Validate bundle metadata's default_routing_knobs.alpha length against centroids.K at NewScorer time so a misconfigured bundle fails fast at load rather than masquerading as an HTTP 400 ErrInvalidRoutingKnobs on every v2 request. The override path is a scalar replacement that preserves length, so this is the only place a length mismatch can be introduced.
- Remap the request-time alpha-length backstop to ErrClusterUnavailable (HTTP 503): a mismatch there can only mean a server-side bundle bug.
- Validate speed_weight bounds before the per-alpha loop so an out-of-range speed_weight surfaces its own error instead of being shadowed by the combined alpha+speed_weight constraint.
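A rough sketch of the validation order described above. The function names are hypothetical, the [0, 1] bounds are assumed, and the combined constraint is assumed to be alpha[k] + speed_weight <= 1; the commit message does not spell out its exact form.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrInvalidRoutingKnobs = errors.New("invalid routing knobs")
	ErrClusterUnavailable  = errors.New("cluster unavailable")
)

// checkBundleAlpha runs once when the scorer is constructed, so a bundle whose
// default alpha vector does not match the cluster count fails at load time.
func checkBundleAlpha(defaultAlpha []float64, k int) error {
	if len(defaultAlpha) != k {
		return fmt.Errorf("default_routing_knobs.alpha has %d entries, centroids.K is %d",
			len(defaultAlpha), k)
	}
	return nil
}

// checkEffectiveKnobs runs per request, after overrides have been applied.
func checkEffectiveKnobs(alpha []float64, speedWeight float64, k int) error {
	if len(alpha) != k {
		// Overrides preserve length, so this can only be a server-side bug.
		return fmt.Errorf("%w: alpha length %d, K %d", ErrClusterUnavailable, len(alpha), k)
	}
	// Check speed_weight on its own first so its error is not shadowed by the
	// combined check below. The negated form also rejects NaN.
	if !(speedWeight >= 0 && speedWeight <= 1) {
		return fmt.Errorf("%w: speed_weight %v is outside [0, 1]", ErrInvalidRoutingKnobs, speedWeight)
	}
	for i, a := range alpha {
		if !(a >= 0 && a <= 1) {
			return fmt.Errorf("%w: alpha[%d]=%v is outside [0, 1]", ErrInvalidRoutingKnobs, i, a)
		}
		// ASSUMPTION: illustrative form of the combined constraint.
		if a+speedWeight > 1 {
			return fmt.Errorf("%w: alpha[%d]+speed_weight=%v exceeds 1", ErrInvalidRoutingKnobs, i, a+speedWeight)
		}
	}
	return nil
}
```

Splitting the checks this way keeps client mistakes (HTTP 400) apart from bundle defects (HTTP 503).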
… bucket map
Two PR #214 review fixes:
- internal/api/openai/responses.go: add the ErrInvalidRoutingKnobs -> HTTP 400 mapping that was already present in completions.go / messages.go / route.go / generate_content.go. Without this, the /v1/responses endpoint surfaces routing-knob validation failures as 502 instead of 400 even though the WithRoutingKnobsOverride middleware is wired for that route.
- internal/router/cache/cache.go: replace the unbounded outer buckets map with an LRU of MaxBuckets (default 4096). Bucket identity now includes clusterVersion and knobsHash, both of which are influenceable by the request, so an unbounded outer map let an authenticated caller drive memory growth by varying x-weave-routing-* headers. The per-bucket inner LRU and TTL are unchanged.
Test: TestBucketMapEvictsBeyondMaxBuckets pins the outer-map cap.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the log.Warn call that the other four ErrInvalidRoutingKnobs handlers (anthropic messages, anthropic route, openai completions, gemini generate_content) already emit, so operators can debug 400s from /v1/responses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
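The error-to-status mapping shared by the handlers could be sketched as below. statusForRouteError and errUpstream are hypothetical names, and the standard library's log.Printf stands in for the project's log.Warn; the 400, 503, and 502 codes are the ones named in the commits above.

```go
package main

import (
	"errors"
	"log"
	"net/http"
)

var (
	ErrInvalidRoutingKnobs = errors.New("invalid routing knobs")
	ErrClusterUnavailable  = errors.New("cluster unavailable")

	// errUpstream stands in for any other routing failure.
	errUpstream = errors.New("upstream failure")
)

// statusForRouteError picks the HTTP status for a routing error. errors.Is
// sees through wrapping, so callers may add context with fmt.Errorf and %w.
func statusForRouteError(err error) int {
	switch {
	case errors.Is(err, ErrInvalidRoutingKnobs):
		// Logged so operators can trace 400s back to a bad x-weave-routing-* header.
		log.Printf("WARN invalid routing knobs: %v", err)
		return http.StatusBadRequest
	case errors.Is(err, ErrClusterUnavailable):
		return http.StatusServiceUnavailable
	default:
		return http.StatusBadGateway
	}
}
```

Routing every handler through one helper of this kind would avoid the drift that left /v1/responses returning 502.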
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3e0ccd2.
Two PR #214 review fixes on the semantic cache:
- Cross-tenant eviction: replaced the global outer bucket LRU with a per-installation LRU (`installationCache.buckets`), capped by `MaxBucketsPerInstallation`. An attacker varying `x-weave-routing-*` headers can now only evict their own buckets, not other tenants'. The outer installation map is capped by `MaxInstallations` (defense in depth: installation IDs come from auth, not headers).
- Inner-LRU goroutine leak: dropped `expirable.LRU` for inner buckets. Its v2.0.7 cleanup goroutine has no public shutdown (`Close()` is commented out upstream), so every bucket evicted by the outer cap would leak one goroutine + retain its LRU forever. Inner buckets are now plain `lru.Cache` with `storedAt` stamped on each entry; Lookup evicts expired entries lazily.
Tests:
- TestBucketMapEvictsBeyondPerInstallationCap (renamed from the global-cap test; same pinning, new field name)
- TestPerInstallationBucketCapIsolatesTenants (victim bucket survives 50 attacker knob churns under a cap of 2)
- Existing TestCache_TTLExpiry still passes (lazy expiry on Lookup).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
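The two ideas, per-tenant caps and lazy TTL expiry without a background goroutine, can be sketched using only the standard library. This is a simplification under stated assumptions: it collapses the bucket level and the entry level into one LRU per installation, omits the MaxInstallations cap and all locking, and every name in it (bucketLRU, perInstallation, manualClock, defaultTTL) is hypothetical.

```go
package main

import (
	"container/list"
	"time"
)

// defaultTTL is an illustrative lifetime, not the project's setting.
const defaultTTL = time.Minute

// manualClock is a controllable time source for exercising expiry.
type manualClock struct{ t time.Time }

func (c *manualClock) Now() time.Time          { return c.t }
func (c *manualClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

type entry struct {
	key, value string
	storedAt   time.Time // stamped on every write, checked on Lookup
}

// bucketLRU is a small LRU with lazy expiry. Not safe for concurrent use.
type bucketLRU struct {
	capacity int
	ttl      time.Duration
	now      func() time.Time
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *entry
}

func newBucketLRU(capacity int, ttl time.Duration, now func() time.Time) *bucketLRU {
	return &bucketLRU{capacity: capacity, ttl: ttl, now: now,
		order: list.New(), items: make(map[string]*list.Element)}
}

// Add inserts or refreshes key, evicting the least recently used entry when full.
func (b *bucketLRU) Add(key, value string) {
	if el, ok := b.items[key]; ok {
		e := el.Value.(*entry)
		e.value, e.storedAt = value, b.now()
		b.order.MoveToFront(el)
		return
	}
	b.items[key] = b.order.PushFront(&entry{key: key, value: value, storedAt: b.now()})
	if b.order.Len() > b.capacity {
		oldest := b.order.Back()
		b.order.Remove(oldest)
		delete(b.items, oldest.Value.(*entry).key)
	}
}

// Lookup returns the value for key, dropping the entry if its TTL has elapsed.
func (b *bucketLRU) Lookup(key string) (string, bool) {
	el, ok := b.items[key]
	if !ok {
		return "", false
	}
	e := el.Value.(*entry)
	if b.now().Sub(e.storedAt) > b.ttl {
		b.order.Remove(el)
		delete(b.items, key)
		return "", false
	}
	b.order.MoveToFront(el)
	return e.value, true
}

// perInstallation gives each installation its own capped LRU, so one tenant's
// churn cannot evict another tenant's entries.
type perInstallation struct {
	capacity int
	ttl      time.Duration
	now      func() time.Time
	caches   map[string]*bucketLRU
}

func newPerInstallation(capacity int, ttl time.Duration, now func() time.Time) *perInstallation {
	return &perInstallation{capacity: capacity, ttl: ttl, now: now,
		caches: make(map[string]*bucketLRU)}
}

func (p *perInstallation) forInstallation(id string) *bucketLRU {
	if c, ok := p.caches[id]; ok {
		return c
	}
	c := newBucketLRU(p.capacity, p.ttl, p.now)
	p.caches[id] = c
	return c
}
```

Expired entries cost memory until the next Lookup or eviction touches them, which is the trade-off accepted in exchange for having no goroutine to shut down.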

Summary
Collapse the (alpha, speed_weight, output_cost_ratio, expected_output_tokens, per_model_verbosity) sweep from N committed bundles into one bundle + per-request headers. The v2 artifact format ships the components of the alpha-blend (not the blended scalar); runtime reconstructs the blend exactly. No retrain required when only these five knobs change.
Implements ROUTER_RUNTIME_TUNABLE_KNOBS.md.
What changed
Runtime
- v2 bundle loader probes for quality_means.json; falls back to v1 rankings.json.
- Scorer.Route implements the v2 blend math (§4 pseudocode): per-cluster alpha vector, degenerate-range guards, speed redistribution for models missing AA timing, pure-speed fallback to qNorm.
- x-weave-routing-{alpha,speed-weight,output-cost-ratio,expected-output-tokens,per-model-verbosity} parsed in middleware, threaded through proxy/api to router.Request.RoutingKnobs.
- ErrInvalidRoutingKnobs → HTTP 400 in Anthropic / OpenAI / Gemini / /v1/route handlers.
Cache fix (latent bug, predating this work)
- bucketKey gains clusterVersion + knobsHash. Without this, requests differing only in knobs share cache buckets, and v0.51-cached responses can be served to v0.52 requests. The new RoutingMetadata.EffectiveKnobsHash is computed once during routing and read by the proxy when building the bucket key.
Artifact layout
- v0.21-v0.52 moved to artifacts/legacy/. bundleDirForVersion resolves either root or legacy transparently; ListVersions flattens both without leaking the legacy pseudo-name.
- v0.53 is the first v2 candidate (co-hosts rankings.json for the diff test).
- READMEs in artifacts/ and artifacts/legacy/ explain the split.
Tests
- Cache: TestBucketKeyIsolatesByVersion / IsolatesByKnobs prove the fix.
- Scorer: TestDegenerateRangeFallthrough, TestSpeedRangeZeroFallsBackToTwoAxis, TestKnobOverrideOnlyReweights, TestAlphaScalarReplacesVector, TestZeroAATimingFallback, TestPureSpeedMissingTimingFallsBackToQuality, TestInvalidEffectiveKnobs.
- Artifacts: TestListVersions_FlattensLegacyAndOmitsPseudoName, TestResolveVersion_LegacyBundleIsReachable.
- Diff test (release gate, build-tagged diff_v2 onnx_integration ORT): TestV2MatchesV1 routes a 1000-prompt fixture through both v1 and v2 scorers from the same bundle dir, asserts ≥99% top-1 agreement, dumps divergences CSV. TestDiffV2PromptsFixtureSHA pins the fixture SHA so drift fails CI before the gate runs. Driver: router-internal/scripts/diff_v2_vs_v1.py (in the workweave repo PR).
- LoadBundleFromDir / LoadBundleFromFS / LoadBundleV1Only added so the diff test can load bundles from a temp dir.
Test plan
- go test -tags no_onnx ./... : all 17 packages pass
- go vet -tags no_onnx ./... clean
- go vet -tags "diff_v2 onnx_integration ORT" ./internal/router/cluster/ clean
- TestV2MatchesV1 against v0.53 via diff_v2_vs_v1.py --bundle v0.53 (needs ONNX runtime; covered separately)
🤖 Generated with Claude Code