
feat(router): runtime-tunable routing knobs via v2 bundle format #214

Merged
steventohme merged 8 commits into main from steven/router-runtime-tunable-knobs on May 22, 2026

@steventohme
Collaborator

Summary

Collapse the (alpha, speed_weight, output_cost_ratio, expected_output_tokens, per_model_verbosity) sweep from N committed bundles into one bundle + per-request headers. The v2 artifact format ships the components of the alpha-blend (not the blended scalar); runtime reconstructs the blend exactly. No retrain required when only these five knobs change.

Implements ROUTER_RUNTIME_TUNABLE_KNOBS.md.

What changed

Runtime

  • v2 bundle loader detects v2 via quality_means.json; falls back to v1 rankings.json.
  • Scorer.Route implements the v2 blend math (§4 pseudocode): per-cluster alpha vector, degenerate-range guards, speed redistribution for models missing AA timing, pure-speed fallback to qNorm.
  • Five header overrides: x-weave-routing-{alpha,speed-weight,output-cost-ratio,expected-output-tokens,per-model-verbosity} parsed in middleware, threaded through proxy/api to router.Request.RoutingKnobs. ErrInvalidRoutingKnobs → HTTP 400 in Anthropic / OpenAI / Gemini / /v1/route handlers.

Cache fix (latent bug that predates this work)

  • bucketKey gains clusterVersion + knobsHash. Without this, requests that differ only in their knobs share cache buckets, and v0.51-cached responses can be served to v0.52 requests. The new RoutingMetadata.EffectiveKnobsHash is computed once during routing and read by the proxy when building the bucket key.
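
As an illustration of the cache fix, here is a minimal sketch of a bucket key that carries both the cluster version and a hash of the effective knobs. The struct layouts, the `installationID` field, the JSON-then-SHA-256 hashing scheme, and the `demoKey` helper are all assumptions made for the example; the PR does not show how `EffectiveKnobsHash` is actually computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// effectiveKnobs is a hypothetical stand-in for the resolved routing knobs
// (bundle defaults with any header overrides applied).
type effectiveKnobs struct {
	Alpha                []float64          `json:"alpha"`
	SpeedWeight          float64            `json:"speed_weight"`
	OutputCostRatio      float64            `json:"output_cost_ratio"`
	ExpectedOutputTokens int                `json:"expected_output_tokens"`
	PerModelVerbosity    map[string]float64 `json:"per_model_verbosity"`
}

// knobsHash returns a short, stable digest of the knobs. encoding/json emits
// struct fields in declaration order and sorts map keys, so equal values
// always serialize to the same bytes.
func knobsHash(k effectiveKnobs) (string, error) {
	b, err := json.Marshal(k)
	if err != nil {
		return "", err // e.g. NaN or Inf, which JSON cannot represent
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:8]), nil
}

// bucketKey is a comparable struct, so it can be used directly as a map key.
type bucketKey struct {
	installationID string // assumed pre-existing dimension of the key
	clusterVersion string
	knobsHash      string
}

// demoKey builds a key for a made-up installation "inst-1".
func demoKey(version string, k effectiveKnobs) bucketKey {
	h, err := knobsHash(k)
	if err != nil {
		panic(err)
	}
	return bucketKey{installationID: "inst-1", clusterVersion: version, knobsHash: h}
}

func main() {
	base := effectiveKnobs{Alpha: []float64{0.5, 0.5}, SpeedWeight: 0.25}
	fast := base
	fast.SpeedWeight = 0.5

	fmt.Println(demoKey("v0.53", base) == demoKey("v0.53", base)) // true: same bucket
	fmt.Println(demoKey("v0.53", base) == demoKey("v0.53", fast)) // false: knobs differ
	fmt.Println(demoKey("v0.52", base) == demoKey("v0.53", base)) // false: version differs
}
```

Hashing the effective knobs (after defaults are merged) rather than the raw headers means a request that spells out the default values would land in the same bucket as one that sends no headers, which is presumably the desired behavior.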

Artifact layout

  • v0.21–v0.52 moved to artifacts/legacy/. bundleDirForVersion resolves either root or legacy transparently; ListVersions flattens both without leaking the legacy pseudo-name.
  • v0.53 is the first v2 candidate (also keeps rankings.json for the diff test).
  • READMEs in artifacts/ and artifacts/legacy/ explain the split.
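
The root-or-legacy lookup described above could look roughly like the following sketch. It borrows the names `bundleDirForVersion` and `ListVersions` from the PR, but the signatures, the use of `io/fs`, and the in-memory `demoFS` layout are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"sort"
	"testing/fstest"
)

const legacyDir = "legacy" // pseudo-name that must never be reported as a version

// bundleDirForVersion returns the directory holding a bundle, preferring the
// artifacts root and falling back to legacy/.
func bundleDirForVersion(fsys fs.FS, version string) (string, error) {
	if version == legacyDir {
		return "", fmt.Errorf("bundle %q: %w", version, fs.ErrNotExist)
	}
	for _, dir := range []string{version, path.Join(legacyDir, version)} {
		if info, err := fs.Stat(fsys, dir); err == nil && info.IsDir() {
			return dir, nil
		}
	}
	return "", fmt.Errorf("bundle %q: %w", version, fs.ErrNotExist)
}

// listVersions flattens root and legacy bundles into one sorted list and
// omits the legacy directory itself.
func listVersions(fsys fs.FS) ([]string, error) {
	var versions []string
	for _, root := range []string{".", legacyDir} {
		entries, err := fs.ReadDir(fsys, root)
		if errors.Is(err, fs.ErrNotExist) {
			continue // a missing legacy/ directory is not an error
		}
		if err != nil {
			return nil, err
		}
		for _, e := range entries {
			if e.IsDir() && e.Name() != legacyDir {
				versions = append(versions, e.Name())
			}
		}
	}
	sort.Strings(versions) // lexicographic, which is enough for this demo
	return versions, nil
}

// demoFS is an in-memory layout with one root bundle and one legacy bundle.
func demoFS() fs.FS {
	return fstest.MapFS{
		"README.md":                  {Data: []byte("root")},
		"v0.53/quality_means.json":   {Data: []byte("{}")},
		"legacy/v0.52/rankings.json": {Data: []byte("{}")},
	}
}

func main() {
	dir, err := bundleDirForVersion(demoFS(), "v0.52")
	fmt.Println(dir, err) // legacy/v0.52 <nil>
	versions, err := listVersions(demoFS())
	fmt.Println(versions, err) // [v0.52 v0.53] <nil>
}
```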

Tests

  • Cache: TestBucketKeyIsolatesByVersion / IsolatesByKnobs prove the fix.
  • Scorer: TestDegenerateRangeFallthrough, TestSpeedRangeZeroFallsBackToTwoAxis, TestKnobOverrideOnlyReweights, TestAlphaScalarReplacesVector, TestZeroAATimingFallback, TestPureSpeedMissingTimingFallsBackToQuality, TestInvalidEffectiveKnobs.
  • Artifacts: TestListVersions_FlattensLegacyAndOmitsPseudoName, TestResolveVersion_LegacyBundleIsReachable.
  • Release-gate diff test (build-tagged diff_v2 onnx_integration ORT): TestV2MatchesV1 routes a 1000-prompt fixture through both v1 and v2 scorers from the same bundle dir, asserts ≥99% top-1 agreement, dumps divergences CSV. TestDiffV2PromptsFixtureSHA pins the fixture SHA so drift fails CI before the gate runs. Driver: router-internal/scripts/diff_v2_vs_v1.py (in the workweave repo PR).
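
For readers unfamiliar with the release gate, the top-1 agreement check can be sketched as below. This is not the code of TestV2MatchesV1; the function names, the `divergence` record, and the made-up model names are assumptions, and the real test also runs the ONNX embedder and writes a CSV.

```go
package main

import (
	"errors"
	"fmt"
)

// divergence records one prompt where the two scorers disagree.
type divergence struct {
	Index  int
	V1, V2 string
}

// top1Agreement compares the top-1 model chosen by each scorer, prompt by prompt.
func top1Agreement(v1, v2 []string) (agree int, diffs []divergence, err error) {
	if len(v1) != len(v2) {
		return 0, nil, errors.New("scorers returned different numbers of decisions")
	}
	for i := range v1 {
		if v1[i] == v2[i] {
			agree++
			continue
		}
		diffs = append(diffs, divergence{Index: i, V1: v1[i], V2: v2[i]})
	}
	return agree, diffs, nil
}

// passesGate uses integer arithmetic so the threshold comparison is exact.
func passesGate(agree, total, minPercent int) bool {
	return total > 0 && agree*100 >= minPercent*total
}

// demoDecisions fabricates n decisions in which the first `flips` disagree.
func demoDecisions(n, flips int) (v1, v2 []string) {
	for i := 0; i < n; i++ {
		v1 = append(v1, "model-a")
		if i < flips {
			v2 = append(v2, "model-b")
		} else {
			v2 = append(v2, "model-a")
		}
	}
	return v1, v2
}

func main() {
	v1, v2 := demoDecisions(1000, 10)
	agree, diffs, err := top1Agreement(v1, v2)
	if err != nil {
		panic(err)
	}
	fmt.Println(agree, len(diffs), passesGate(agree, len(v1), 99)) // 990 10 true
}
```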

LoadBundleFromDir / LoadBundleFromFS / LoadBundleV1Only added so the diff test can load bundles from a temp dir.

Test plan

  • go test -tags no_onnx ./... — all 17 packages pass
  • go vet -tags no_onnx ./... clean
  • go vet -tags "diff_v2 onnx_integration ORT" ./internal/router/cluster/ clean
  • Run TestV2MatchesV1 against v0.53 via diff_v2_vs_v1.py --bundle v0.53 (needs ONNX runtime; covered separately)
  • CI green

🤖 Generated with Claude Code

Comment thread internal/server/middleware/routing_knobs_override.go
Collapse the (alpha, speed_weight, output_cost_ratio,
expected_output_tokens, per_model_verbosity) sweep from N committed
bundles into 1 bundle + per-request headers. The v2 artifact format
ships the components of the alpha-blend (not the blended scalar);
runtime reconstructs the blend exactly. No retrain required when only
these five knobs change.

Runtime
-------
- v2 bundle loader in artifacts.go: probes for quality_means.json,
  falls back to v1 rankings.json. New Bundle fields: QualityMeans,
  ModelAxes, MedianVerbosity, IsV2.
- Scorer.Route implements the v2 blend math per the plan's section 4,
  including degenerate-range guards, speed redistribution for models
  missing AA timing, and pure-speed fallback to qNorm.
- Per-cluster alpha vector with scalar-override fan-out: a header
  x-weave-routing-alpha uniformly overwrites every alpha_vec[k]
  (sledgehammer behavior; calibration-preserving overrides deferred).
- ErrInvalidRoutingKnobs sentinel mapped to HTTP 400 in Anthropic,
  OpenAI, Gemini, and /v1/route handlers.

Middleware + plumbing
---------------------
- New x-weave-routing-{alpha, speed-weight, output-cost-ratio,
  expected-output-tokens, per-model-verbosity} headers parsed in
  middleware/routing_knobs_override.go.
- router.Request gains RoutingKnobs *Overrides; proxy/api request
  builders for all four surfaces copy parsed overrides into the
  routing request.
- bucketKey in cache.go gains clusterVersion + knobsHash so requests
  with different effective knobs (or running under
  ROUTER_CLUSTER_BUILD_ALL_VERSIONS) get isolated cache buckets. Fixes
  a latent correctness bug that pre-existed this work.

Artifact layout
---------------
- Move v0.21-v0.52 to artifacts/legacy/. The loader's
  bundleDirForVersion resolves either root or legacy transparently.
- v0.53 is the first v2 candidate: includes quality_means.json,
  model_axes.json, default_routing_knobs, and recommended_ui_defaults
  in metadata.yaml. Co-hosts rankings.json so the diff test can run.
- READMEs in artifacts/ and artifacts/legacy/ document the split.

Tests
-----
- Cache: TestBucketKeyIsolatesByVersion / IsolatesByKnobs prove the
  cache fix.
- Scorer: TestDegenerateRangeFallthrough,
  TestSpeedRangeZeroFallsBackToTwoAxis,
  TestKnobOverrideOnlyReweights, TestAlphaScalarReplacesVector,
  TestZeroAATimingFallback,
  TestPureSpeedMissingTimingFallsBackToQuality,
  TestInvalidEffectiveKnobs.
- Artifacts: TestListVersions_FlattensLegacyAndOmitsPseudoName,
  TestResolveVersion_LegacyBundleIsReachable.
- Diff test (release gate, build-tagged "diff_v2 onnx_integration
  ORT"): TestV2MatchesV1 routes a 1000-prompt fixture through both v1
  and v2 scorers from the same bundle dir, asserts >=99% top-1
  agreement, dumps divergences CSV. TestDiffV2PromptsFixtureSHA pins
  the fixture SHA so drift fails CI before the gate runs.

LoadBundleFromDir / LoadBundleFromFS / LoadBundleV1Only added to
artifacts.go to support the diff test driver loading bundles from a
non-embedded path.
@steventohme force-pushed the steven/router-runtime-tunable-knobs branch from cf4498f to 6d7e4e6 on May 20, 2026 01:01
Comment thread internal/router/cluster/scorer.go
…lice

Two PR review findings on the v2 runtime-tunable knobs:

1. NaN/Inf bypass (middleware/routing_knobs_override.go): strconv.ParseFloat
   accepts "NaN", "Inf", "-Inf" as valid floats, and the range check
   `val < 0 || val > 1` never fires for NaN (every ordered comparison with
   NaN is false), so NaN passes validation. An authenticated caller could
   send `x-weave-routing-alpha: NaN` and force undefined routing math.
   Add explicit math.IsNaN / math.IsInf guards on Alpha, SpeedWeight,
   and OutputCostRatio.

2. Alpha slice aliasing (cluster/scorer.go) — `activeKnobs = *s.metadata
   .Training.DefaultRoutingKnobs` is a shallow struct copy; the Alpha
   slice header points at the bundle's backing array. The subsequent
   per-request override mutation (`activeKnobs.Alpha[i] = *req.Alpha`)
   wrote through that array, leaking the override into every later
   request on the same scorer and racing concurrent overrides.
   Clone the slice immediately after the copy so activeKnobs is a true
   deep copy.
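
The aliasing in finding 2 is easy to reproduce in isolation. The sketch below uses a hypothetical `routingKnobs` struct and `applyAlphaOverride` helper (not the scorer's real types) to show why the slice has to be cloned after the struct copy.

```go
package main

import "fmt"

// routingKnobs is a hypothetical stand-in for the bundle's default knobs.
type routingKnobs struct {
	Alpha       []float64
	SpeedWeight float64
}

// applyAlphaOverride returns per-request knobs. A scalar override fans out to
// every element of the per-cluster alpha vector.
func applyAlphaOverride(defaults routingKnobs, alpha *float64) routingKnobs {
	active := defaults // shallow copy: active.Alpha still aliases defaults.Alpha

	// Detach the backing array so writes below cannot reach the shared defaults.
	active.Alpha = append([]float64(nil), defaults.Alpha...)

	if alpha != nil {
		for i := range active.Alpha {
			active.Alpha[i] = *alpha
		}
	}
	return active
}

func main() {
	defaults := routingKnobs{Alpha: []float64{0.2, 0.4, 0.6}, SpeedWeight: 0.1}
	override := 0.9

	active := applyAlphaOverride(defaults, &override)
	fmt.Println(active.Alpha)   // [0.9 0.9 0.9]
	fmt.Println(defaults.Alpha) // [0.2 0.4 0.6]
}
```

Without the clone, the second line would also print `[0.9 0.9 0.9]`, which is the cross-request leak the commit describes.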

Found by Cursor security/bugbot on PR #214.
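
The NaN bypass from finding 1 can be demonstrated with the standard library alone. In this sketch, `naiveAccepts` mirrors the pre-fix check and `parseUnitInterval` adds the guards; both function names and the error wording are made up for the example, and the [0, 1] range is taken from the check quoted in the commit message (the actual bounds for each knob may differ).

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
)

var errInvalidRoutingKnobs = errors.New("invalid routing knobs")

// naiveAccepts mirrors the pre-fix logic: parse, then reject only when the
// range comparison is true. NaN makes both comparisons false, so it passes.
func naiveAccepts(raw string) bool {
	val, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return false
	}
	return !(val < 0 || val > 1)
}

// parseUnitInterval parses a header value that must be a finite number in [0, 1].
func parseUnitInterval(header, raw string) (float64, error) {
	val, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("%w: %s: not a number", errInvalidRoutingKnobs, header)
	}
	if math.IsNaN(val) || math.IsInf(val, 0) {
		return 0, fmt.Errorf("%w: %s: must be finite", errInvalidRoutingKnobs, header)
	}
	if val < 0 || val > 1 {
		return 0, fmt.Errorf("%w: %s: %v outside [0, 1]", errInvalidRoutingKnobs, header, val)
	}
	return val, nil
}

func main() {
	fmt.Println(naiveAccepts("NaN")) // true: the bypass

	for _, raw := range []string{"0.5", "NaN", "-Inf", "1.5", "abc"} {
		val, err := parseUnitInterval("x-weave-routing-alpha", raw)
		fmt.Printf("%q -> %v, rejected=%v\n", raw, val, err != nil)
	}
}
```

Note that the infinities are already caught by the range comparison; the explicit `math.IsInf` guard mainly makes the intent obvious and keeps the check correct if the bounds ever change.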
Comment thread internal/router/cluster/artifacts.go Outdated
Comment thread internal/router/cluster/scorer.go
- Remove unused LoadBundleFromFS wrapper (Bugbot: dead exported API).
  LoadBundle covers the embedded path; LoadBundleFromDir covers the
  on-disk diff-test path.

- Fix mixed-timing scoring asymmetry in v2 blend (Bugbot: medium).
  When sRange > 0, untimed models now get sNorm=1 (worst-case speed,
  no wS bonus) instead of taking the redistribution branch. This keeps
  wQ/wC weighting consistent with their timed peers, so the cost axis
  no longer silently disappears when wC=0. The wS-redistribution path
  is preserved for the all-untimed/sRange==0 case.
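
To make the mixed-timing rule concrete, here is a hedged sketch of the speed normalization only. It assumes min-max normalization over latency where lower is better (consistent with "sNorm=1 (worst-case speed)" above); the function name, the map-based timing input, and the `ok` return are illustrative, and the actual redistribution of wS in the sRange==0 case is not shown because the PR does not spell it out.

```go
package main

import "fmt"

// speedNorms maps each model to a normalized speed penalty in [0, 1], where 0
// is the fastest timed model and 1 the slowest. Models without timing data
// get 1. ok is false when the timed models span no range (none are timed, or
// all are equal); the caller would then take the wS-redistribution path.
func speedNorms(models []string, latency map[string]float64) (norms map[string]float64, ok bool) {
	first := true
	var lo, hi float64
	for _, m := range models {
		t, timed := latency[m]
		if !timed {
			continue
		}
		if first {
			lo, hi, first = t, t, false
			continue
		}
		if t < lo {
			lo = t
		}
		if t > hi {
			hi = t
		}
	}
	sRange := hi - lo
	if first || sRange <= 0 {
		return nil, false
	}

	norms = make(map[string]float64, len(models))
	for _, m := range models {
		if t, timed := latency[m]; timed {
			norms[m] = (t - lo) / sRange
		} else {
			norms[m] = 1 // untimed: worst-case speed, same weighting as timed peers
		}
	}
	return norms, true
}

func main() {
	models := []string{"fast", "mid", "slow", "untimed"}
	latency := map[string]float64{"fast": 100, "mid": 200, "slow": 300}

	norms, ok := speedNorms(models, latency)
	fmt.Println(ok, norms["fast"], norms["mid"], norms["slow"], norms["untimed"]) // true 0 0.5 1 1

	_, ok = speedNorms(models, map[string]float64{})
	fmt.Println(ok) // false
}
```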
Comment thread internal/router/cluster/scorer.go Outdated
Comment thread internal/router/cluster/scorer.go Outdated
- Validate bundle metadata's default_routing_knobs.alpha length against
  centroids.K at NewScorer time so a misconfigured bundle fails fast at
  load rather than masquerading as an HTTP 400 ErrInvalidRoutingKnobs on
  every v2 request. The override path is a scalar replacement that
  preserves length, so this is the only place a length mismatch can be
  introduced.
- Remap the request-time alpha-length backstop to ErrClusterUnavailable
  (HTTP 503) — a mismatch there can only mean a server-side bundle bug.
- Validate speed_weight bounds before the per-alpha loop so an
  out-of-range speed_weight surfaces its own error instead of being
  shadowed by the combined alpha+speed_weight constraint.
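
The split between load-time and request-time validation might be sketched as follows. The sentinel names follow the PR, but everything else is assumed: the function signatures, the [0, 1] bounds, the `alpha + speed_weight <= 1` form of the combined constraint, and the 502 default in `statusFor`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

var (
	errInvalidRoutingKnobs = errors.New("invalid routing knobs") // caller error
	errClusterUnavailable  = errors.New("cluster unavailable")   // server-side fault
)

// validateBundleAlpha runs once at load time so a bad bundle fails fast.
func validateBundleAlpha(alpha []float64, k int) error {
	if len(alpha) != k {
		return fmt.Errorf("bundle alpha has %d entries, centroids.K is %d", len(alpha), k)
	}
	return nil
}

// validateEffectiveKnobs runs per request, after overrides are applied.
func validateEffectiveKnobs(alpha []float64, speedWeight float64, k int) error {
	// Backstop only: overrides preserve length, so this indicates a bundle bug.
	if len(alpha) != k {
		return fmt.Errorf("%w: alpha length %d != K %d", errClusterUnavailable, len(alpha), k)
	}
	// Checked before the loop so it reports its own error. The negated form
	// also rejects NaN.
	if !(speedWeight >= 0 && speedWeight <= 1) {
		return fmt.Errorf("%w: speed_weight %v outside [0, 1]", errInvalidRoutingKnobs, speedWeight)
	}
	for i, a := range alpha {
		if !(a >= 0 && a <= 1) {
			return fmt.Errorf("%w: alpha[%d]=%v outside [0, 1]", errInvalidRoutingKnobs, i, a)
		}
		if a+speedWeight > 1 { // assumed form of the combined constraint
			return fmt.Errorf("%w: alpha[%d]+speed_weight exceeds 1", errInvalidRoutingKnobs, i)
		}
	}
	return nil
}

// statusFor maps routing errors to HTTP status codes.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errInvalidRoutingKnobs):
		return http.StatusBadRequest
	case errors.Is(err, errClusterUnavailable):
		return http.StatusServiceUnavailable
	default:
		return http.StatusBadGateway
	}
}

func main() {
	fmt.Println(validateBundleAlpha([]float64{0.5}, 2) != nil)                       // true
	fmt.Println(statusFor(validateEffectiveKnobs([]float64{0.5, 0.5}, 1.5, 2)))      // 400
	fmt.Println(statusFor(validateEffectiveKnobs([]float64{0.5}, 0.25, 2)))          // 503
	fmt.Println(statusFor(validateEffectiveKnobs([]float64{0.5, 0.5}, 0.25, 2)))     // 200
}
```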
Comment thread internal/api/openai/completions.go
Comment thread internal/router/cache/cache.go
… bucket map

Two PR #214 review fixes:

- internal/api/openai/responses.go: add the ErrInvalidRoutingKnobs ->
  HTTP 400 mapping that was already present in completions.go /
  messages.go / route.go / generate_content.go. Without this, the
  /v1/responses endpoint surfaces routing-knob validation failures as
  502 instead of 400 even though the WithRoutingKnobsOverride
  middleware is wired for that route.

- internal/router/cache/cache.go: replace the unbounded outer
  buckets map with an LRU of MaxBuckets (default 4096). Bucket
  identity now includes clusterVersion and knobsHash, both of which
  are influenceable by the request, so an unbounded outer map let an
  authenticated caller drive memory growth by varying
  x-weave-routing-* headers. The per-bucket inner LRU and TTL are
  unchanged.

Test: TestBucketMapEvictsBeyondMaxBuckets pins the outer-map cap.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread internal/api/openai/responses.go
Comment thread internal/router/cache/cache.go
Emit the same log.Warn that the other four ErrInvalidRoutingKnobs
handlers (anthropic messages, anthropic route, openai completions,
gemini generate_content) already emit, so operators can debug 400s
from /v1/responses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 3e0ccd2.

Comment thread internal/router/cache/cache.go Outdated
Two PR #214 review fixes on the semantic cache:

- Cross-tenant eviction: replaced the global outer bucket LRU with a
  per-installation LRU (`installationCache.buckets`), capped by
  `MaxBucketsPerInstallation`. An attacker varying `x-weave-routing-*`
  headers can now only evict their own buckets, not other tenants'.
  The outer installation map is capped by `MaxInstallations` (defense
  in depth — installation IDs come from auth, not headers).

- Inner-LRU goroutine leak: dropped `expirable.LRU` for inner buckets.
  Its v2.0.7 cleanup goroutine has no public shutdown (`Close()` is
  commented out upstream), so every bucket evicted by the outer cap
  would leak one goroutine + retain its LRU forever. Inner buckets are
  now plain `lru.Cache` with `storedAt` stamped on each entry; Lookup
  evicts expired entries lazily.
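
Both fixes can be illustrated with a small standard-library sketch: a per-tenant bounded LRU of buckets plus entries that expire lazily on read. This is not the cache's real code. The type and method names are invented, the inner bucket is a plain map rather than an `lru.Cache`, the `MaxInstallations` cap is omitted, there is no locking, and `newFakeClockCache` exists only to make the demo deterministic.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

type entry struct {
	response string
	storedAt time.Time
}

type bucket struct {
	key     string
	entries map[string]entry // simplified inner store
}

// installationCache is a bounded LRU of buckets belonging to one tenant.
type installationCache struct {
	order   *list.List               // front = most recently used
	buckets map[string]*list.Element // bucket key -> element holding *bucket
}

type cache struct {
	ttl        time.Duration
	maxBuckets int // per-installation cap, assumed >= 1
	now        func() time.Time
	tenants    map[string]*installationCache
}

// bucketFor finds (or creates) a tenant's bucket. Eviction only ever removes
// the least recently used bucket of that same tenant.
func (c *cache) bucketFor(tenant, key string, create bool) *bucket {
	ic := c.tenants[tenant]
	if ic == nil {
		if !create {
			return nil
		}
		ic = &installationCache{order: list.New(), buckets: map[string]*list.Element{}}
		c.tenants[tenant] = ic
	}
	if el, ok := ic.buckets[key]; ok {
		ic.order.MoveToFront(el)
		return el.Value.(*bucket)
	}
	if !create {
		return nil
	}
	b := &bucket{key: key, entries: map[string]entry{}}
	ic.buckets[key] = ic.order.PushFront(b)
	if ic.order.Len() > c.maxBuckets {
		oldest := ic.order.Back()
		ic.order.Remove(oldest)
		delete(ic.buckets, oldest.Value.(*bucket).key)
	}
	return b
}

func (c *cache) Store(tenant, key, prompt, response string) {
	c.bucketFor(tenant, key, true).entries[prompt] = entry{response, c.now()}
}

// Lookup removes an expired entry when it is read; no background goroutine.
func (c *cache) Lookup(tenant, key, prompt string) (string, bool) {
	b := c.bucketFor(tenant, key, false)
	if b == nil {
		return "", false
	}
	e, ok := b.entries[prompt]
	if !ok {
		return "", false
	}
	if c.now().Sub(e.storedAt) > c.ttl {
		delete(b.entries, prompt)
		return "", false
	}
	return e.response, true
}

// newFakeClockCache returns a cache and a function that advances its clock.
func newFakeClockCache(ttlSeconds, maxBuckets int) (*cache, func(seconds int)) {
	clock := time.Unix(0, 0)
	c := &cache{
		ttl:        time.Duration(ttlSeconds) * time.Second,
		maxBuckets: maxBuckets,
		now:        func() time.Time { return clock },
		tenants:    map[string]*installationCache{},
	}
	return c, func(s int) { clock = clock.Add(time.Duration(s) * time.Second) }
}

func main() {
	c, advance := newFakeClockCache(60, 2)
	c.Store("victim", "default", "hello", "cached")
	for i := 0; i < 50; i++ { // attacker churns knob hashes
		c.Store("attacker", fmt.Sprintf("knobs-%d", i), "hello", "x")
	}
	_, hit := c.Lookup("victim", "default", "hello")
	fmt.Println(hit) // true: attacker churn cannot evict another tenant's bucket
	advance(61)
	_, hit = c.Lookup("victim", "default", "hello")
	fmt.Println(hit) // false: expired entry dropped lazily on read
}
```

The trade-off of lazy expiry is that an entry nobody reads again stays in memory until its bucket is evicted, which is why the per-installation cap matters for bounding memory.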

Tests:
- TestBucketMapEvictsBeyondPerInstallationCap (renamed from the
  global-cap test; same pinning, new field name)
- TestPerInstallationBucketCapIsolatesTenants (victim bucket survives
  50 attacker knob churns under a cap of 2)
- Existing TestCache_TTLExpiry still passes (lazy expiry on Lookup).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@steventohme merged commit 80ce62e into main on May 22, 2026
7 checks passed
@steventohme deleted the steven/router-runtime-tunable-knobs branch on May 22, 2026 01:22