Skip to content

fix(observability): classify Kimi access_terminated 403 as expected provider user-state#2090

Merged
senamakel merged 2 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/sentry-kimi-access-terminated-403-classification
May 20, 2026
Merged

fix(observability): classify Kimi access_terminated 403 as expected provider user-state#2090
senamakel merged 2 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/sentry-kimi-access-terminated-403-classification

Conversation

@YellowSnnowmann
Copy link
Copy Markdown
Contributor

@YellowSnnowmann YellowSnnowmann commented May 18, 2026

Summary

  • Added a new provider error classifier for HTTP 403 access-policy denials (access_terminated_error) in the inference provider ops layer.
  • Wired this classifier into all OpenAI-compatible provider error paths (responses_api, streaming_chat, chat_completions, native_chat, stream_chat) to avoid noisy per-attempt Sentry error reports.
  • Extended observability expected-error classification to map this Kimi-specific 403 shape into ProviderUserState.
  • Added regression tests in both provider ops and observability to lock the behavior.
  • Preserved existing behavior for actionable 4xx/5xx errors; only the targeted policy-denied shape is demoted.

Problem

  • Sentry issue OPENHUMAN-TAURI-S7 (7486075476) was repeatedly triggered by custom_openai requests routed to kimi-for-coding.
  • The upstream provider returns HTTP 403 with access_terminated_error (“currently only available for Coding Agents”), which is a provider policy/user-access condition, not an OpenHuman runtime defect.
  • The previous behavior treated this as a regular error, generating high-noise unresolved events and masking actionable signals.

Solution

  • Introduced is_provider_access_policy_denied_http_403(status, body) in provider ops to detect the exact 403 access-policy pattern.
  • Added log_provider_access_policy_denied_http_403(...) and used it as a suppression branch before report_error(...) in all compatible provider HTTP failure handlers.
  • Updated observability classifier (is_provider_user_state_message) to detect access_terminated_error / “currently only available for coding agents”, ensuring upstream re-wrapped errors are also demoted consistently.
  • Added tests:
    • Provider ops: positive/negative coverage for 403 suppression matching.
    • Observability: classification coverage for direct and wrapped error strings.
  • Tradeoff: this does not “fix” provider access; it intentionally reclassifies known user/provider-state failures to reduce alert fatigue while preserving actionable remediation messaging.

Related

  • Sentry issue: OPENHUMAN-TAURI-S7 (7486075476)
  • Feature IDs: N/A — observability/error-classification hardening change (no feature-matrix row updated)
  • GitHub issue: N/A — no linked upstream issue number provided (Closes #NNN not included)

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
  • Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change (or N/A: behaviour-only change) — N/A: behaviour-only observability classification
  • All affected feature IDs from the matrix are listed in the PR description under ## RelatedN/A: no matrix feature IDs apply
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: does not touch release UI/runtime flows
  • Linked issue closed via Closes #NNN in the ## Related section — N/A: no GitHub issue number provided

Impact

  • Runtime/platform: Rust core observability/inference provider layers only; affects desktop runtime behavior for error reporting semantics.
  • Performance: negligible (simple string/status checks on existing error path).
  • Security: no credential or auth-scope expansion; reduces noisy logging path while retaining sanitized error handling.
  • Compatibility: no API contract changes; only reclassification of a known provider-policy 403 from error-level Sentry to expected user-state handling.

Related

Summary by CodeRabbit

  • Bug Fixes
    • Improved detection of provider access-policy denials (including access-terminated cases) so they are classified correctly.
    • Suppressed noisy error reporting for deterministic policy denials; these are logged at info level instead of being treated as system failures.
    • Enhanced error handling across multiple request paths to distinguish policy denials from genuine provider failures.

Review Change Stack

- Added a new function to check for provider access policy denials based on HTTP 403 responses, specifically for cases where requests are rejected for not being from approved coding agents.
- Introduced logging for these denials to avoid reporting them to Sentry.
- Updated tests to verify the correct classification of access-terminated errors as provider user state messages.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fc4770d6-5a84-4fa3-97ad-e9de77ec0539

📥 Commits

Reviewing files that changed from the base of the PR and between 9260a8a and d97ac97.

📒 Files selected for processing (1)
  • src/core/observability.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/observability.rs

📝 Walkthrough

Walkthrough

Detects a specific provider access-policy 403 payload, logs it as an info (not reported to Sentry), classifies it as an expected provider user-state error in observability, and wires this handling into five request/response paths.

Changes

Provider Access Policy Denial Handling

Layer / File(s) Summary
Observability classification for access-terminated provider errors
src/core/observability.rs
is_provider_user_state_message now matches access_terminated_error and "currently only available for coding agents" to classify these 403 bodies as ProviderUserState. Adds a unit test for direct and agent-wrapped messages.
Access-policy denial detection and logging in ops
src/openhuman/inference/provider/ops.rs
Adds is_provider_access_policy_denied_http_403 predicate, log_provider_access_policy_denied_http_403 helper, integrates the predicate into api_error with a new branch that logs (info) instead of reporting to Sentry, and adds unit tests for the predicate.
Integration across multiple API request paths
src/openhuman/inference/provider/compatible.rs
Five methods (Responses-API fallback, native streaming, chat_with_system, main native chat, stream_chat_with_system) each add a conditional branch that detects the access-policy 403 payload and calls the logging helper with the appropriate context tag before generic failure handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • graycyrus
  • senamakel

Poem

🐰 A rabbit nods to graceful 403s,
Info logs whisper where alarms once rose.
Coding-agent gates now named and mild,
Five paths tuned gentle, errors reconciled.
Sentry sleeps; the logs are kind and styled.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: classifying a specific provider HTTP 403 error pattern as an expected provider user-state condition. The change affects observability and error classification across three files with this as the unifying objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 18, 2026 11:42
@YellowSnnowmann YellowSnnowmann requested a review from a team May 18, 2026 11:42
@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 18, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/inference/provider/ops.rs (1)

317-326: ⚡ Quick win

Use debug/trace level for this diagnostic suppression log.

This branch is diagnostic observability flow; align it with repo logging-level policy.

Suggested change
-    tracing::info!(
+    tracing::debug!(
         domain = "llm_provider",
         operation = operation,
         provider = provider,
         model = model.unwrap_or(""),
         status = status.as_u16(),
         failure = "non_2xx",
         kind = "provider_access_policy",
         "[llm_provider] {operation} provider access-policy 403 — not reporting to Sentry"
     );

As per coding guidelines: src/**/*.rs: Use log or tracing crate at debug or trace level for Rust diagnostic logs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/inference/provider/ops.rs` around lines 317 - 326, The
observability log in ops.rs currently uses tracing::info! for the provider
access-policy 403 diagnostic; change it to a lower level (tracing::debug! or
tracing::trace!) per repo policy so this diagnostic suppression log is not at
info level—locate the tracing::info! call (the invocation that includes fields
domain="llm_provider", operation, provider, model.unwrap_or(""),
status.as_u16(), failure="non_2xx", kind="provider_access_policy") and replace
the macro with tracing::debug! (or tracing::trace!) while keeping the same
structured fields and message text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/inference/provider/ops.rs`:
- Around line 317-326: The observability log in ops.rs currently uses
tracing::info! for the provider access-policy 403 diagnostic; change it to a
lower level (tracing::debug! or tracing::trace!) per repo policy so this
diagnostic suppression log is not at info level—locate the tracing::info! call
(the invocation that includes fields domain="llm_provider", operation, provider,
model.unwrap_or(""), status.as_u16(), failure="non_2xx",
kind="provider_access_policy") and replace the macro with tracing::debug! (or
tracing::trace!) while keeping the same structured fields and message text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1532b9dd-10de-4204-8f09-26ff4d3e2ef8

📥 Commits

Reviewing files that changed from the base of the PR and between 70fdedc and 9260a8a.

📒 Files selected for processing (3)
  • src/core/observability.rs
  • src/openhuman/inference/provider/compatible.rs
  • src/openhuman/inference/provider/ops.rs

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 18, 2026
@senamakel senamakel self-assigned this May 19, 2026
# Conflicts:
#	src/openhuman/inference/provider/ops.rs
model: Option<&str>,
status: reqwest::StatusCode,
) {
tracing::info!(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring this nitpick. The two sibling demotion-log helpers in this same file — log_budget_exhausted_http_400 and log_provider_config_rejection (added on main) — both use tracing::info!. Switching just this one to debug! would break the established pattern for "non-2xx user-state suppressed from Sentry" diagnostic logs. Happy to revisit in a follow-up that demotes all three together if repo policy prefers debug!.

@senamakel senamakel merged commit ff8d60c into tinyhumansai:main May 20, 2026
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants