fix(channels): distinguish rate-limit sources in chat error classifier (#2364)#2371
fix(channels): distinguish rate-limit sources in chat error classifier (#2364)#2371CodeGhost21 wants to merge 2 commits into
Conversation
tinyhumansai#2364) User-perception bug from tinyhumansai#2364: any agent-loop error string containing "rate limit" was classified as `rate_limited` and the user saw "You're being rate-limited. Please wait a moment and try again." — copy that implies the AI provider is throttling them and gives no hint about which thread is affected or how long to wait. When the real cause was the SecurityPolicy hourly action cap on built-in tools (web_fetch / curl / http_request), the message was misleading; users opened new threads, saw those "work" (because the new thread didn't trigger a tool-call storm), and concluded the original thread was "stuck". Fix: split the catch-all rate-limit branch in `classify_inference_error` into three sources and surface retry-after when available. Classifier (src/openhuman/channels/providers/web.rs) - `action_budget_exceeded` (new): catches the SecurityPolicy strings emitted by the built-in tools (`Rate limit exceeded: action budget exhausted`, `Rate limit exceeded: too many actions in the last hour`, `Action blocked: rate limit exceeded`). User-facing copy: *"You've hit OpenHuman's per-hour action budget — this is a local safety cap, not your AI provider. The window decays gradually; you can keep chatting in this thread …"*. Ordered BEFORE the provider-429 branch so its substring `rate limit` no longer leaks into the wrong bucket. - `max_iterations` (new): catches the canonical agent-loop cap string ("Agent exceeded maximum tool iterations") via the existing `is_max_iterations_error` predicate so the user sees: *"The agent ran the maximum number of tool steps for one turn … You can retry the same question in this thread once the underlying limit clears."* — previously this fell through to the opaque generic `inference` bucket. - `rate_limited` (kept, enriched): now extracts `Retry-After` / `retry_after` seconds from the error body and appends a concrete hint to the user message ("Try again in 30 seconds" / "Try again in about 3 minutes" for windows ≥90s / "You can retry immediately" for 0). Fractional values round up so we never under-promise. Copy also now states the limit is upstream and that retrying in the same thread is fine. Tests (src/openhuman/channels/providers/web_tests.rs) - `classify_inference_error_distinguishes_action_budget_from_provider_429` — all three SecurityPolicy strings classify as `action_budget_exceeded` and the copy says "local safety cap" + "can keep chatting in this thread". - `classify_inference_error_max_iterations_gets_dedicated_branch` — the flattened web-channel error wrapper resolves to `max_iterations` with the same-thread recovery hint. - `classify_inference_error_rate_limited_surfaces_retry_after_seconds` — 30-second retry-after appears verbatim and the thread-recovery hint is present. - `classify_inference_error_rate_limited_no_retry_after_omits_hint` — 429 without Retry-After does NOT hallucinate a window. - `classify_inference_error_rate_limited_handles_fractional_and_minute_windows` — 2.4s rounds to 3s; 180s renders as "about 3 minutes". `cargo test --lib classify_inference_error` → 7 passed, 0 failed. `cargo check` clean. `cargo fmt` applied. What's intentionally out of scope - A real per-thread rate-limit Redux state: the trace shows none exists; `inferenceStatus` / `lifecycle` / `activeThread` all clear on `chat_error` in `ChatRuntimeProvider.onError`, and the FE test suite already exercises this. - A countdown timer in the composer: separate UI work; this PR keeps the fix server-side so the existing FE forwarder picks it up automatically.
📝 WalkthroughWalkthroughThe web channel error classifier now detects OpenHuman action-budget exhaustion, agent max-iteration tool-step exhaustion, and provider rate limits. It adds helpers to parse/format Retry-After timing, reorders classification priority to surface budgets first, and updates rate-limited messages to include optional retry timing hints. ChangesError classification improvements with retry messaging
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/channels/providers/web.rs`:
- Around line 271-278: The retry hint currently floors minutes in the Some(n)
branch of the match on secs, producing "about 1 minutes" for 90–119s and
understating the wait; change the minutes calculation to round up (e.g., minutes
= (n + 59) / 60) in the Some(n) => branch and format the message using correct
singular/plural ("minute"/"minutes") when building the string for "Try again in
about {mins} minute(s).".
- Around line 243-263: The current prefix list in the retry-after parsing loop
misses quoted JSON keys like "\"retry_after\": 30"; update the prefix matching
in the loop (where variables lower and err are used and the num_str parsing
occurs) to also check for quoted key patterns (e.g. "\"retry_after\":",
"\"retry-after\":", "\"retry_after\" :", "\"retry-after\" :") or use a small
regex that matches optional surrounding quotes, optional whitespace, the key
(retry_after|retry-after) and the colon, then extract the same trailing numeric
substring and parse it as before so quoted JSON bodies return the retry hint.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3857d90f-f32f-4d2a-9bfa-1dd877069a70
📒 Files selected for processing (2)
src/openhuman/channels/providers/web.rssrc/openhuman/channels/providers/web_tests.rs
…oted JSON retry_after + round minutes up
Two CodeRabbit findings on src/openhuman/channels/providers/web.rs:
1. **Minor — quoted JSON retry_after.** A serialised provider body
like `{"retry_after": 30}` would miss every prefix because the
surrounding quote stopped `lower.find("retry_after:")` from
matching, and the user lost the retry hint the upstream
actually supplied. Normalise by stripping double quotes from
the lowercased scan buffer before searching for prefixes.
New test `classify_inference_error_rate_limited_parses_quoted_json_retry_after`.
2. **Minor — minute hint rounds up + uses singular/plural.** The
`Some(n)` arm used `n / 60` (integer floor) and a hard-coded
"minutes" suffix, so 90–119s rendered as "about 1 minutes" —
both grammatically wrong and an instruction to retry sooner
than the upstream allows. Round up via
`(n / 60) + u64::from(n % 60 != 0)` and pick singular vs
plural. New test
`classify_inference_error_rate_limited_minute_window_uses_singular_and_rounds_up`
pins 90s → "about 2 minutes" and 119s → "about 2 minutes".
`cargo test --lib classify_inference_error` → 9 passed, 0 failed
(7 pre-existing + 2 new).
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/openhuman/channels/providers/web.rs (1)
305-309: 💤 Low valueConsider case-sensitivity and pattern fragility.
The
is_action_budget_exhaustedfunction receives an already-lowercased string (err_lower) from the caller, but the function signature doesn't enforce or document this contract. If a caller passes a non-lowercased string, the match will silently fail.Additionally, these exact substring matches are brittle if the SecurityPolicy error messages ever change slightly (e.g., "Action budget exhausted" vs "action budget exhausted").
🔧 Optional: Add defensive lowercasing and doc comment
/// Detect the SecurityPolicy global hourly action-budget signal /// emitted by the built-in tools (`web_fetch`, `curl`, `http_request`, /// `polymarket`, `composio`, etc.) — see `src/openhuman/security/ /// policy.rs::SecurityPolicy::is_rate_limited`. +/// +/// # Arguments +/// * `err_lower` - The error string, expected to be pre-lowercased by the caller. /// /// We match the canonical English strings those tools emit. This is /// load-bearing for issue `#2364`: before this check ran, any string /// containing "rate limit" was misclassified as a provider 429 and /// the user saw the generic "You're being rate-limited" copy, which /// hides that the cap is OpenHuman's own per-hour safety budget, /// not the upstream LLM provider. fn is_action_budget_exhausted(err_lower: &str) -> bool { + // Defensive: ensure lowercase even if caller forgets + let err_lower = err_lower.to_ascii_lowercase(); err_lower.contains("rate limit exceeded: action budget exhausted")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/channels/providers/web.rs` around lines 305 - 309, The helper is_action_budget_exhausted currently assumes its input is lowercased and uses brittle exact substrings; change it to accept &str, perform defensive normalization inside (e.g., let s = err.to_lowercase();) and run the contains checks against s so callers need not lowercase; add a short doc comment above is_action_budget_exhausted indicating it normalizes the input and what error substrings it checks for, and consider keeping the three existing substrings but document they are lowercased for future maintainers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/channels/providers/web.rs`:
- Around line 305-309: The helper is_action_budget_exhausted currently assumes
its input is lowercased and uses brittle exact substrings; change it to accept
&str, perform defensive normalization inside (e.g., let s = err.to_lowercase();)
and run the contains checks against s so callers need not lowercase; add a short
doc comment above is_action_budget_exhausted indicating it normalizes the input
and what error substrings it checks for, and consider keeping the three existing
substrings but document they are lowercased for future maintainers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8ec2649e-61d8-49d3-90de-081fb75b091d
📒 Files selected for processing (2)
src/openhuman/channels/providers/web.rssrc/openhuman/channels/providers/web_tests.rs
Summary
classify_inference_errorinweb.rsused to tag any error string containing"rate limit"asrate_limited, including the SecurityPolicy hourly-cap strings emitted by built-in tools — so users saw "Your AI provider is rate-limiting you" copy for an OpenHuman-local safety cap.action_budget_exceeded,max_iterations,rate_limited) and surfaceRetry-Afterseconds in the user-facing message when the upstream supplies them.chat_error.messageforwarder inChatRuntimeProvider.Problem
Issue #2364 reports that one chat thread shows "You're being rate-limited" while a fresh thread works. Tracing the frontend (
ChatRuntimeProvider.onError) showed no per-thread sticky state —inferenceStatus,inferenceTurnLifecycle, andactiveThreadIdall clear onchat_error. The stickiness is perceived, driven by:Solution
Three new / reshaped branches in
classify_inference_error:action_budget_exceededmax_iterationsrate_limited(enriched)Retry-After:/retry_after:header bodies, fractional values rounded up.The SecurityPolicy and max-iterations checks run BEFORE the provider-429 branch so their substring
rate limitno longer leaks into the wrong bucket. Existingrate_limitedcallers (FE handler, tests) continue to receive the sameerror_typetoken; only the human-readablemessagechanged.Submission Checklist
cargo test --lib classify_inference_error→ 7 passed.## Related— N/A: no matrix row touched.Closes #NNNin the## Relatedsection — see below.Impact
channels::providers::web::classify_inference_error).error_typetokens are unchanged for upstream consumers; only themessagebody is reworded. FE error forwarder picks up the new copy automatically.Tests
cargo test --lib classify_inference_error→ 7 passed, 0 failed (5 new, 2 pre-existing).classify_inference_error_distinguishes_action_budget_from_provider_429action_budget_exceeded; copy says "local safety cap" + "can keep chatting in this thread"classify_inference_error_max_iterations_gets_dedicated_branchmax_iterationswith the same-thread recovery hintclassify_inference_error_rate_limited_surfaces_retry_after_secondsclassify_inference_error_rate_limited_no_retry_after_omits_hintclassify_inference_error_rate_limited_handles_fractional_and_minute_windowsRelated
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/2364-rate-limit-classification(branched fromorigin/mainafter fresh fetch)Validation Run
pnpm --filter openhuman-app format:check— N/A: no frontend changes.pnpm typecheck— N/A: no TypeScript changes.cargo test --lib classify_inference_error→ 7 passed, 0 failed.cargo fmt --manifest-path Cargo.tomlapplied;cargo checkclean.Validation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
error_typetokens are unchanged for the existingrate_limitedconsumers; onlymessagebody changed. Generic-inferencefallback unchanged. Order of remaining branches (timeout / auth / budget / provider / model_unavailable / config-rejection) is unchanged.rate limitno longer leaks into the wrong bucket — locked in by_distinguishes_action_budget_from_provider_429and_max_iterations_gets_dedicated_branch.Duplicate / Superseded PR Handling
Summary by CodeRabbit
Bug Fixes
Tests