proxy: gate cluster routing on Anthropic 95% rate-limit utilization#108

Open
steventohme wants to merge 5 commits into main from steven/anthropic-usage-bypass

Conversation

@steventohme
Collaborator

Summary

  • New internal/proxy/usage package observes the anthropic-ratelimit-unified-{5h,weekly}-utilization headers Anthropic returns on every Messages response (the same data the claude /usage CLI surfaces) and caches them per upstream credential with a TTL.
  • proxy.Service.ProxyMessages now short-circuits straight to the Anthropic provider with the caller-requested model when both windows are below threshold; once either crosses, the existing cluster scorer + planner path runs unchanged.
  • Anthropic provider adapter gains an optional WithUsageRecorder hook so every response (routed or bypassed) keeps the observer primed.
  • Default ROUTER_USAGE_BYPASS_ENABLED=false — the observer still records utilization in staging so we can verify the signal, but routing behavior is identical to today until the flag is flipped on.

Config

  • ROUTER_USAGE_BYPASS_ENABLED (default false)
  • ROUTER_USAGE_BYPASS_THRESHOLD (default 0.95)
  • ROUTER_USAGE_OBSERVATION_TTL (default 10m)

Bypass contract:

  • Cold start (no observation) → bypass (passthrough). First request feeds the observer.
  • Stale observation past TTL → bypass.
  • Either window ≥ threshold → engage cluster routing.
  • Feature flag off → always bypass (legacy behavior).
  • No resolvable Anthropic credential → run scorer.

Test plan

  • go build ./..., go vet ./..., go test ./... clean
  • New internal/proxy/usage unit tests cover cold-start, threshold edges, missing-signal-doesn't-overwrite, TTL expiry, and concurrent Record/ShouldBypass contention under -race
  • New Anthropic-client tests assert recorder fires once with parsed headers AND skips when no credential is resolvable
  • New proxy.Service tests assert scorer is NOT called when bypass engages (and the caller-requested model wins) and IS called once utilization crosses threshold
  • Local smoke with wv mr once merged + gitlink bump: confirm x-router-decision: usage_bypass on a cold start and x-router-model: claude-... swap when threshold is lowered to 0

🤖 Generated with Claude Code

steventohme and others added 2 commits May 13, 2026 19:45
While the unified rate-limit utilization Anthropic returns on every
Messages response (`anthropic-ratelimit-unified-{5h,weekly}-utilization`)
stays below a configurable threshold, the router now skips the cluster
scorer and proxies straight to api.anthropic.com with the
caller-requested model. Once either window crosses the threshold the
scorer takes over and may substitute a cheaper / non-Anthropic model.

Defaulted OFF behind ROUTER_USAGE_BYPASS_ENABLED=false so the observer
records utilization in staging without changing routing behavior;
flip the flag to enable the gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread internal/proxy/service.go Outdated
Comment thread internal/proxy/service.go
Comment thread internal/proxy/service.go Outdated
Fixed (both flagged by Cursor Bugbot + Security Reviewer):
- byokCredentialsFromContext type-asserted the wrong shape (map vs
  []*auth.ExternalAPIKey), so BYOK credentials were silently invisible
  to the bypass gate and the resolver fell through to header-supplied
  x-api-key values. Now routes through externalKeysFromContext +
  BuildCredentialsMap, matching the pattern used elsewhere in the
  service.
- The bypass branch ran before installation-level model exclusions,
  letting tenants reach Anthropic models that policy had denied.
  ProxyMessages now consults excludedModelsForRequest first and falls
  through to runTurnLoop when the requested model is on the deny list
  so the tier clamp can substitute an allowed model.
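
The ordering fix in the second item can be pictured as below. This is a sketch only: `routeFor` is a hypothetical helper, the deny list is modeled as a plain map, and the string results stand in for whatever the service actually dispatches to (`usage_bypass` is the decision value named in the test plan).

```go
package main

// routeFor checks the deny list before the bypass gate, so an excluded
// model can never take the passthrough path.
func routeFor(requestedModel string, excluded map[string]bool, bypassOK bool) string {
	if excluded[requestedModel] {
		// Fall through to the turn loop so the tier clamp can substitute
		// an allowed model.
		return "turn_loop"
	}
	if bypassOK {
		return "usage_bypass"
	}
	return "turn_loop"
}
```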

Added regression tests covering both paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread internal/proxy/service.go Outdated
Comment thread internal/proxy/service.go
Comment thread internal/proxy/service.go
… mismatch

Fixed:
- Gate shouldBypassToAnthropic on Observer.Enabled(): without it, a
  non-nil-but-disabled observer caused every authed request to skip the
  scorer, since Observer.ShouldBypassRouting returns true when disabled.
- Mirror resolveAndInjectCredentials' router-key auth check: on
  installation-authed requests, do not fall back to inbound x-api-key
  client headers. Closes the trust-boundary mismatch where a caller
  could control the gate's key (and force cold-start bypass) while the
  upstream used the deployment key.
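
A minimal sketch of the two checks above, under stated assumptions: `gateInput`, `resolveGateKey`, and `shouldConsiderBypass` are hypothetical names, and the precedence between BYOK, deployment, and inbound keys is a guess for illustration rather than the order the service actually uses.

```go
package main

// gateInput is a hypothetical bundle of what the gate needs to know.
type gateInput struct {
	ObserverEnabled    bool
	InstallationAuthed bool   // request authenticated with a router key
	BYOKKey            string // tenant-supplied Anthropic key, if any
	DeploymentKey      string // key the deployment itself sends upstream
	InboundAPIKey      string // caller-supplied x-api-key header
}

// resolveGateKey picks the credential the observation lookup is keyed on.
// On installation-authed requests the inbound x-api-key is never used, so
// the caller cannot choose the gate's key.
func resolveGateKey(in gateInput) (string, bool) {
	if in.BYOKKey != "" {
		return in.BYOKKey, true
	}
	if in.InstallationAuthed {
		if in.DeploymentKey != "" {
			return in.DeploymentKey, true
		}
		return "", false
	}
	if in.InboundAPIKey != "" {
		return in.InboundAPIKey, true
	}
	return "", false
}

// shouldConsiderBypass is false when the observer is disabled or no
// credential resolves; in both cases the scorer runs.
func shouldConsiderBypass(in gateInput) bool {
	if !in.ObserverEnabled {
		return false
	}
	_, ok := resolveGateKey(in)
	return ok
}
```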
Comment thread internal/proxy/usage/observer.go
Bugbot flagged that Observer.entries grew without bound: RecordObservation
adds entries, Latest treats stale ones as cold-start but never deletes
them. Under BYOK credential rotation the map would grow indefinitely.

Sweep stale entries on the write path once per TTL interval, reusing the
write lock we're already holding.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 2fd3f7e.

o.lastSweep = now
}
o.mu.Unlock()
}

Partial observation silently overwrites prior near-limit reading

Medium Severity

RecordObservation replaces the entire previous Observation when HasSignal() is true, but HasSignal only requires one window to have a valid reading. If a response carries e.g. 5h-utilization: 0.3 but drops the weekly-utilization header, the new observation {FiveHour: 0.3, Weekly: -1} passes HasSignal and overwrites a previous {FiveHour: 0.2, Weekly: 0.99}. Since ShouldBypassRouting treats -1 as below-threshold, the gate incorrectly opens despite the weekly window actually being near the cap. The existing test only covers the both-missing case; a partial-signal response silently clears a known near-limit reading.
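
One possible way to address this, sketched under assumptions: merge window by window, so a missing reading (-1) keeps the previous value instead of clearing it. `mergeObservation` is a hypothetical helper, not something in the diff.

```go
package main

// Observation uses -1 to mark a window with no valid reading.
type Observation struct {
	FiveHour float64
	Weekly   float64
}

// HasSignal reports whether at least one window carried a usable value.
func (o Observation) HasSignal() bool {
	return o.FiveHour >= 0 || o.Weekly >= 0
}

// mergeObservation returns the observation to store. A response with no
// signal leaves the previous state untouched; a partial response updates
// only the windows it actually reported.
func mergeObservation(prev Observation, hadPrev bool, next Observation) (Observation, bool) {
	if !next.HasSignal() {
		return prev, hadPrev
	}
	if !hadPrev {
		return next, true
	}
	merged := next
	if merged.FiveHour < 0 {
		merged.FiveHour = prev.FiveHour
	}
	if merged.Weekly < 0 {
		merged.Weekly = prev.Weekly
	}
	return merged, true
}
```

With the values from the report, merging {FiveHour: 0.3, Weekly: -1} over {FiveHour: 0.2, Weekly: 0.99} keeps Weekly at 0.99, so the gate stays closed. A carried-over window would need its own timestamp if it should still expire on the original TTL.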

Comment thread internal/proxy/service.go
"proxy_err", proxyErr,
)
return proxyErr
}

Bypass path drops all telemetry and OTel observability

Medium Severity

bypassToAnthropic emits no OTel spans, persists no telemetry DB rows, and doesn't wrap the response writer in a UsageExtractor. When the feature flag is enabled and utilization is below threshold (the common case), bypassed requests become invisible to the dashboard cost panel, distributed tracing, and all token/cost accounting. The normal ProxyMessages path records both a router.decision and router.upstream span plus a telemetry row; the bypass path only emits a structured log line.