proxy: gate cluster routing on Anthropic 95% rate-limit utilization #108
steventohme wants to merge 5 commits into
While the unified rate-limit utilization Anthropic returns on every
Messages response (`anthropic-ratelimit-unified-{5h,weekly}-utilization`)
stays below a configurable threshold, the router now skips the cluster
scorer and proxies straight to api.anthropic.com with the
caller-requested model. Once either window crosses the threshold the
scorer takes over and may substitute a cheaper / non-Anthropic model.
Defaulted OFF (ROUTER_USAGE_BYPASS_ENABLED=false) so the observer
records utilization in staging without changing routing behavior;
flip the flag to enable the gate.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixed (both flagged by Cursor Bugbot + Security Reviewer):
- `byokCredentialsFromContext` type-asserted the wrong shape (map vs `[]*auth.ExternalAPIKey`), so BYOK credentials were silently invisible to the bypass gate and the resolver fell through to header-supplied `x-api-key` values. It now routes through `externalKeysFromContext` + `BuildCredentialsMap`, matching the pattern used elsewhere in the service.
- The bypass branch ran before installation-level model exclusions, letting tenants reach Anthropic models that policy had denied. `ProxyMessages` now consults `excludedModelsForRequest` first and falls through to `runTurnLoop` when the requested model is on the deny list, so the tier clamp can substitute an allowed model.

Added regression tests covering both paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
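The ordering fix for installation-level model exclusions can be sketched as a pure decision function. `decideRoute`, `Route`, and the model names in the test are hypothetical; the point is only that the deny-list check runs before the utilization gate.

```go
package main

// Route is a hypothetical routing outcome.
type Route int

const (
	RouteTurnLoop Route = iota // scorer + planner path (runTurnLoop)
	RouteBypass                // straight to Anthropic with the requested model
)

// decideRoute applies installation-level exclusions BEFORE the bypass gate,
// so a denied model falls through to the turn loop, where the tier clamp
// can substitute an allowed one.
func decideRoute(requested string, excluded map[string]bool, gateOpen bool) Route {
	if excluded[requested] {
		return RouteTurnLoop
	}
	if gateOpen {
		return RouteBypass
	}
	return RouteTurnLoop
}
```

Reading from a nil `excluded` map is safe in Go and simply reports "not excluded".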
… mismatch

Fixed:
- Gate `shouldBypassToAnthropic` on `Observer.Enabled()`. Without it, a non-nil but disabled observer caused every authed request to skip the scorer, since `Observer.ShouldBypassRouting` returns true when disabled.
- Mirror `resolveAndInjectCredentials`' router-key auth check: on installation-authed requests, do not fall back to inbound `x-api-key` client headers. This closes the trust-boundary mismatch where a caller could control the gate's key (and force a cold-start bypass) while the upstream used the deployment key.
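A sketch of both fixes, assuming a hypothetical `Observer` whose per-key state is reduced to a `below` map. `shouldBypassToAnthropic` checks `Enabled()` before trusting `ShouldBypassRouting`, and the hypothetical `gateKey` picks the gate's credential the way the upstream resolver is assumed to, ignoring the inbound header on installation-authed requests.

```go
package main

// Observer is a hypothetical stand-in for the PR's usage observer.
type Observer struct {
	enabled bool
	below   map[string]bool // key -> every window under threshold (hypothetical)
}

// Enabled is nil-safe, so callers can gate on it directly.
func (o *Observer) Enabled() bool { return o != nil && o.enabled }

// ShouldBypassRouting returns true when the observer is disabled (the
// behavior that motivated the fix); a key with no reading is a cold start.
func (o *Observer) ShouldBypassRouting(key string) bool {
	if !o.Enabled() {
		return true
	}
	below, seen := o.below[key]
	return !seen || below
}

// shouldBypassToAnthropic checks Enabled() first, so a non-nil but disabled
// observer no longer sends every request around the scorer.
func shouldBypassToAnthropic(o *Observer, key string) bool {
	return o.Enabled() && o.ShouldBypassRouting(key)
}

// gateKey (hypothetical) selects the credential the gate looks up. On
// installation-authed requests it never trusts the inbound x-api-key header.
func gateKey(installationAuthed bool, resolvedKey, headerKey string) string {
	if installationAuthed || headerKey == "" {
		return resolvedKey
	}
	return headerKey
}
```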
Bugbot flagged that `Observer.entries` grew without bound: `RecordObservation` adds entries, and `Latest` treats stale ones as a cold start but never deletes them. Under BYOK credential rotation the map would grow indefinitely. Sweep stale entries on the write path once per TTL interval, reusing the write lock we're already holding.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 2fd3f7e.
```go
		o.lastSweep = now
	}
	o.mu.Unlock()
}
```
Partial observation silently overwrites prior near-limit reading
Medium Severity
RecordObservation replaces the entire previous Observation when HasSignal() is true, but HasSignal only requires one window to have a valid reading. If a response carries e.g. 5h-utilization: 0.3 but drops the weekly-utilization header, the new observation {FiveHour: 0.3, Weekly: -1} passes HasSignal and overwrites a previous {FiveHour: 0.2, Weekly: 0.99}. Since ShouldBypassRouting treats -1 as below-threshold, the gate incorrectly opens despite the weekly window actually being near the cap. The existing test only covers the both-missing case; a partial-signal response silently clears a known near-limit reading.
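One possible fix, sketched under the same assumption that `-1` marks a missing window: merge the new reading into the previous one per window instead of replacing the whole `Observation`. `mergeObservation` is a hypothetical helper, not code from this PR.

```go
package main

// Observation (hypothetical): -1 marks a window whose header was missing.
type Observation struct {
	FiveHour float64
	Weekly   float64
}

// mergeObservation keeps the prior value for any window the new response
// did not report, so a partial signal cannot clear a known near-limit reading.
func mergeObservation(prev, next Observation) Observation {
	out := next
	if out.FiveHour < 0 {
		out.FiveHour = prev.FiveHour
	}
	if out.Weekly < 0 {
		out.Weekly = prev.Weekly
	}
	return out
}
```

A caveat of this approach: the carried-over window inherits the new entry's timestamp, so if freshness matters per window, each window would need its own observed-at time.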
```go
		"proxy_err", proxyErr,
	)
	return proxyErr
}
```
Bypass path drops all telemetry and OTel observability
Medium Severity
bypassToAnthropic emits no OTel spans, persists no telemetry DB rows, and doesn't wrap the response writer in a UsageExtractor. When the feature flag is enabled and utilization is below threshold (the common case), bypassed requests become invisible to the dashboard cost panel, distributed tracing, and all token/cost accounting. The normal ProxyMessages path records both a router.decision and router.upstream span plus a telemetry row; the bypass path only emits a structured log line.
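A sketch of what wrapping the bypass path's writer could look like. `recordingWriter` is a hypothetical stand-in for the `UsageExtractor` the normal path uses: it forwards writes (and flushes, for streaming) to the client while keeping a copy, and `Usage` parses the `usage.input_tokens` / `usage.output_tokens` fields of a non-streaming Messages response. Span and telemetry-row plumbing is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/http/httptest"
)

// recordingWriter (hypothetical) forwards to the client and keeps a copy.
type recordingWriter struct {
	http.ResponseWriter
	status int
	body   bytes.Buffer
}

func (w *recordingWriter) WriteHeader(code int) {
	if w.status == 0 {
		w.status = code
	}
	w.ResponseWriter.WriteHeader(code)
}

func (w *recordingWriter) Write(p []byte) (int, error) {
	if w.status == 0 {
		w.status = http.StatusOK // implicit 200, matching net/http semantics
	}
	w.body.Write(p) // bytes.Buffer.Write always returns a nil error
	return w.ResponseWriter.Write(p)
}

// Flush passes through so streamed (SSE) responses still reach the client promptly.
func (w *recordingWriter) Flush() {
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

type messageUsage struct {
	Usage struct {
		InputTokens  int `json:"input_tokens"`
		OutputTokens int `json:"output_tokens"`
	} `json:"usage"`
}

// Usage parses token counts from a buffered non-streaming Messages response body.
func (w *recordingWriter) Usage() (int, int, error) {
	var m messageUsage
	if err := json.Unmarshal(w.body.Bytes(), &m); err != nil {
		return 0, 0, err
	}
	return m.Usage.InputTokens, m.Usage.OutputTokens, nil
}

// exampleUse exercises the wrapper against an in-memory recorder.
func exampleUse() (*recordingWriter, *httptest.ResponseRecorder) {
	rec := httptest.NewRecorder()
	w := &recordingWriter{ResponseWriter: rec}
	w.Write([]byte(`{"usage":{"input_tokens":3,"output_tokens":5}}`))
	w.Flush()
	return w, rec
}
```

Buffering the whole body is fine for a sketch but not for long streams; a production extractor would parse SSE events incrementally instead.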


Summary
- The `internal/proxy/usage` package observes the `anthropic-ratelimit-unified-{5h,weekly}-utilization` headers Anthropic returns on every Messages response (the same data the `claude /usage` CLI surfaces) and caches them per upstream credential with a TTL.
- `proxy.Service.ProxyMessages` now short-circuits straight to the Anthropic provider with the caller-requested model when both windows are below threshold; once either crosses it, the existing cluster scorer + planner path runs unchanged.
- A `WithUsageRecorder` hook keeps the observer primed on every response, routed or bypassed.
- Defaulted off via `ROUTER_USAGE_BYPASS_ENABLED=false`: the observer still records utilization in staging so we can verify the signal, but routing behavior is identical to today until the flag is flipped on.

Config
- `ROUTER_USAGE_BYPASS_ENABLED` (default `false`)
- `ROUTER_USAGE_BYPASS_THRESHOLD` (default `0.95`)
- `ROUTER_USAGE_OBSERVATION_TTL` (default `10m`)

Bypass contract:
Test plan
- `go build ./...`, `go vet ./...`, `go test ./...` clean
- `internal/proxy/usage` unit tests cover cold start, threshold edges, missing-signal-doesn't-overwrite, TTL expiry, and concurrent Record/ShouldBypass under `-race`-friendly contention
- `proxy.Service` tests assert the scorer is NOT called when bypass engages (and the caller-requested model wins) and IS called once utilization crosses the threshold
- `wv mr` once merged + gitlink bump: confirm `x-router-decision: usage_bypass` on a cold start and the `x-router-model: claude-...` swap when the threshold is lowered to 0

🤖 Generated with Claude Code