fix(inference): actionable error for completion-only model 404 + unmistakable run_code failure (#3193) #3211

Merged
senamakel merged 3 commits into tinyhumansai:main from oxoxDev:fix/3193-run-code-completion-only-404 on Jun 2, 2026

Conversation

@oxoxDev
Contributor

@oxoxDev oxoxDev commented Jun 2, 2026

Summary

  • run_code (and any chat call) against a completion-only / base model now fails fast with an actionable error that names the model and the fix, instead of an opaque "chat completions unavailable; responses fallback failed" chain.
  • A subagent-delegation failure is now wrapped in an unmistakable failure envelope so a weak orchestrator can't narrate a fabricated success ("wrote the file") when nothing actually ran.
  • Completion-only detection is shared across all three chat entrypoints (chat_with_system, chat_with_history, native tool-calling chat).

Problem

Reported in #3193 (openhuman-core 0.56.0, Win11, agentic_provider = "openhuman"): every run_code call returns

openai API error (404 Not Found): "This is not a chat model and thus not supported in the v1/chat/completions endpoint. Did you mean to use v1/completions?" (chat completions unavailable; responses fallback failed)

Root cause: OpenHuman only speaks the chat-completions API (with an optional /v1/responses fallback); there is no /v1/completions path. When the model bound to the coding role (run_code → code_executor, model hint coding) is a completion-only model, /v1/chat/completions returns 404 and the responses fallback can't rescue it. The surfaced error was opaque and gave no remediation. The reporter also observed the orchestrator presenting the failure as success and fabricating output: the bare error text alone didn't stop a weak model from narrating a fake result.

Solution

  • compatible.rs: add is_completion_only_model_404 (tight match on the OpenAI signature — ordinary "model does not exist" 404s are intentionally not matched) and completion_only_model_message (names the model + "assign a chat-capable model"). A shared completion_only_404_guard is invoked at all three chat 404 handlers and short-circuits before the doomed /v1/responses fallback.
  • dispatch.rs: route subagent-failure results through format_subagent_failure, which states the task did not run and instructs the model not to report success or fabricate output, while preserving the underlying error.
  • Deliberately not adding a legacy /v1/completions transport — the real cause is a misconfigured model, so the fix is an actionable diagnostic, not a new (deprecated) endpoint surface.
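The detection described above can be sketched as follows. This is a minimal illustration, not the merged code: the predicate body mirrors the lines quoted in the review, but the plain `u16` status, the free-function form of the guard (the PR calls it as a method), and the exact message wording are assumptions.

```rust
/// True only for a 404 whose body carries the "not a chat model" signature.
/// Assumption: `status` is a plain u16 here; the real code may use an HTTP status type.
fn is_completion_only_model_404(status: u16, error: &str) -> bool {
    if status != 404 {
        return false;
    }
    let lower = error.to_lowercase();
    lower.contains("not a chat model")
        || (lower.contains("v1/chat/completions") && lower.contains("v1/completions"))
}

/// Hypothetical wording: the PR only states that the message names the model
/// and tells the user to assign a chat-capable model.
fn completion_only_model_message(model: &str) -> String {
    format!(
        "model '{model}' is a completion-only (base) model and cannot serve chat requests; \
         assign a chat-capable model to this role"
    )
}

/// Some(message) on a match, so the caller can return early before any
/// /v1/responses fallback; None lets the existing 404 handling proceed.
fn completion_only_404_guard(status: u16, sanitized: &str, model: &str) -> Option<String> {
    is_completion_only_model_404(status, sanitized).then(|| completion_only_model_message(model))
}
```

Returning an `Option` keeps each call site to a short `if let Some(err) = ...` early return, which is the shape the review excerpts show.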

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — Coverage Gate (diff-cover ≥ 80%) passes in CI; focused Rust tests (completion_only*, subagent_failure_envelope) pass locally.
  • Coverage matrix updated — N/A: behaviour-only change (error-handling/diagnostic; no feature row added/removed/renamed)
  • No new external network dependencies introduced (wiremock mock server used for the 404 path)
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: no release-cut surface touched
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Desktop/CLI (Rust core). No schema, migration, or protocol change.
  • Behaviour change is limited to error paths: a completion-only model now yields a clear, fixable message and stops before a futile fallback; failed delegations are reported as failures instead of being mistakable for success. Successful calls are unaffected.

Related

  • Closes: Cannot write code on filesystem #3193
  • Follow-up PR(s)/TODOs: deferred sub-symptoms from Cannot write code on filesystem #3193 needing the reporter's transcript — delegate_tools_agent "400 Insufficient budget", spawn_parallel_agents non-determinism, and whether parallel_tools is honored. Confirmation of a literal success:true flag bug (vs. orchestrator narration) also pending the transcript.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/3193-run-code-completion-only-404
  • Commit SHA: d555ae48472ce694570791ccbe7d99b742f9a3c4

Validation Run

  • N/A: no app/ changes — pnpm --filter openhuman-app format:check
  • N/A: no app/ changes — pnpm typecheck
  • Focused tests: cargo test --lib completion_only (6 passed) · cargo test --lib subagent_failure_envelope (1 passed)
  • Rust fmt/check (if changed): cargo fmt applied; cargo check --lib clean
  • N/A: no Tauri-shell changes — Tauri fmt/check

Validation Blocked

  • command: pnpm test:coverage / pnpm test:rust (full suites)
  • error: not run locally (heavy; full Rust test matrix OOMs locally)
  • impact: diff-coverage verified by CI gate (passing)

Behavior Changes

  • Intended behavior change: completion-only-model 404 → actionable fail-fast error; failed subagent delegation → unmistakable failure result.
  • User-visible effect: clear "assign a chat-capable model" guidance instead of an opaque fallback chain; agent no longer reports fabricated success after a hard failure.

Parity Contract

  • Legacy behavior preserved: ordinary 404s keep their existing fallback/enrich path; non-404 errors still flow through api_error (Sentry/classification) unchanged; successful responses unchanged.
  • Guard/fallback/dispatch parity checks: completion-only guard added uniformly to all three chat entrypoints; /v1/responses fallback still runs for non-completion-only 404s.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none found (no open PR touches run_code / completion endpoint)
  • Canonical PR: this PR
  • Resolution: N/A

oxoxDev added 3 commits June 2, 2026 17:25
…umansai#3193)

A completion-only/base model assigned to a chat role 404s on
/v1/chat/completions ('This is not a chat model … did you mean
v1/completions?') and the /v1/responses fallback cannot rescue it,
leaving an opaque 'responses fallback failed' chain. Detect the
signature and fail fast with a message naming the model and the
remediation (assign a chat-capable model), skipping the doomed fallback.
…ai#3193)

On a hard delegation failure (e.g. run_code's coding model 404ing) the
bare error text let a weak orchestrator narrate a plausible success and
fabricate output. Wrap failures in an envelope that states the task did
not run and forbids reporting success, while preserving the root error.
tinyhumansai#3193)

Extract the completion-only detection into completion_only_404_guard and
apply it in all three chat entrypoints (chat_with_system, chat_with_history,
native chat) so a completion-only model fails fast on every path. Add a
wiremock test proving the guard pre-empts the /v1/responses fallback end to
end, plus a unit test for the guard's match/no-match branches.
@oxoxDev oxoxDev requested a review from a team June 2, 2026 12:03
@coderabbitai
Contributor

coderabbitai Bot commented Jun 2, 2026

📝 Walkthrough

Two independent error-handling improvements: (1) subagent failures now return a standardized envelope stating the task did not complete and discouraging output fabrication, and (2) OpenAI-compatible chat endpoints detect completion-only models on 404 and return actionable guidance instead of silent fallback.

Changes

Subagent Failure Envelope

| Layer / File(s) | Summary |
| --- | --- |
| Standardized failure message and validation (src/openhuman/agent_orchestration/tools/dispatch.rs) | Error-handling path replaces simple error string with format_subagent_failure() helper, which builds an anti-fabrication envelope. Unit test validates required wording (no completion, anti-fabrication language) and preservation of tool name and root error. |
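A sketch of what such an envelope helper might look like. The signature matches the one quoted in the review below; the wording is an assumption, apart from the "do NOT treat this as success or fabricate an output" phrase, which the reviewer quotes.

```rust
/// Wrap a delegation error so the orchestrator cannot read it as success.
/// Side-effect free, so the wording can be unit-tested in isolation.
/// The envelope text is illustrative, not the merged wording.
fn format_subagent_failure(tool_name: &str, message: &str) -> String {
    format!(
        "[SUBAGENT FAILURE] `{tool_name}` did NOT complete; no work was performed. \
         Do NOT treat this as success or fabricate an output. \
         Underlying error: {message}"
    )
}
```

Keeping the root error at the end of the envelope means the original diagnostic still reaches transcripts and logs unchanged.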

Completion-Only Model 404 Detection

| Layer / File(s) | Summary |
| --- | --- |
| Completion-only model detection predicates and guard (src/openhuman/inference/provider/compatible.rs) | New private helpers provide is_completion_only_model_404 and completion_only_404_guard that recognize the specific 404 pattern from OpenAI-compatible providers indicating a completion-only model, and generate actionable error messages with remediation guidance. |
| Guard integration into chat paths (src/openhuman/inference/provider/compatible.rs) | chat_with_system, chat_with_history, and chat paths now apply the completion-only guard after sanitizing 404 error bodies: on match, they early-return the actionable error before any /v1/responses fallback. |
| Completion-only detection test coverage (src/openhuman/inference/provider/compatible_tests.rs) | Unit tests validate signature detection (true positives, false negatives, status-code gating) and message construction. Guard behavior test confirms the guard fires only on exact match. End-to-end test verifies the guard short-circuits and prevents responses fallback. |

Sequence Diagram

flowchart TD
    A["404 from chat/completions"] --> B["Sanitize error body"]
    B --> C["Apply completion_only_404_guard"]
    C --> D{Guard matches<br/>completion-only<br/>signature?}
    D -->|Yes| E["Return actionable error<br/>with model name &amp;<br/>remediation hint"]
    D -->|No| F{"Responses fallback<br/>enabled?"}
    F -->|Yes| G["Attempt /v1/responses<br/>fallback"]
    F -->|No| H["Return enriched<br/>generic 404 error"]
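The ordering in the flowchart can be read as a small decision function. The sketch below is an illustration only: `NotFoundAction` and `on_chat_404` are hypothetical names, the signature check is reduced to a single substring, and the message texts are assumptions.

```rust
/// Hypothetical outcome of handling a 404 from chat/completions.
#[derive(Debug, PartialEq)]
enum NotFoundAction {
    /// Completion-only model: stop here with an actionable message.
    FailFast(String),
    /// Ordinary 404 and the fallback is enabled: try /v1/responses next.
    TryResponsesFallback,
    /// Ordinary 404 with no fallback: return the enriched generic error.
    EnrichedError(String),
}

/// `sanitized` is the already-sanitized 404 body.
fn on_chat_404(sanitized: &str, model: &str, responses_fallback_enabled: bool) -> NotFoundAction {
    // The guard runs first so a completion-only model never reaches the fallback.
    if sanitized.to_lowercase().contains("not a chat model") {
        return NotFoundAction::FailFast(format!(
            "model '{model}' is completion-only; assign a chat-capable model"
        ));
    }
    if responses_fallback_enabled {
        NotFoundAction::TryResponsesFallback
    } else {
        NotFoundAction::EnrichedError(format!("API error (404 Not Found): {sanitized}"))
    }
}
```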

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2214: Modifies compatible.rs 404 error handling to influence /v1/responses fallback logic and updates compatible_tests.rs with overlapping test scope.
  • tinyhumansai/openhuman#2814: Modifies compatible.rs non-streaming chat_completions error handling in the same control-flow region (main PR adds 404 completion-only guard, retrieved PR adds SessionExpired handling).

Suggested labels

rust-core, agent, bug, working

Suggested reviewers

  • graycyrus
  • senamakel

Poem

🐰 A subagent stumbled, but now truth will ring—
No more silent success when nothing took wing!
And models that whisper "I'm completion, not chat"
Get caught with a guard: "Not today, friend, flat fact!"
Clarity blooms where confusion once grew. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the two main changes: actionable error for completion-only model 404 and unmistakable run_code failure, directly matching the code changes. |
| Linked Issues check | ✅ Passed | The PR addresses core coding requirements from #3193: detecting completion-only model 404 errors and preventing silent success fabrication via failure envelope in subagent delegation. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to addressing #3193: dispatch.rs adds failure envelope, compatible.rs adds 404 detection, and tests validate both features; no unrelated alterations present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the rust-core, agent, working, and bug labels Jun 2, 2026
Contributor

@CodeGhost21 CodeGhost21 left a comment

Inline findings posted on the changed lines. Overall: the core fix is correct, tightly scoped, and well-tested. Main follow-ups concern adjacent code paths (chat_with_tools, streaming) that share the same 404 surface but don't currently invoke the new guard.

/// Format a subagent-delegation failure so the orchestrator cannot mistake it
/// for success. Kept as a standalone, side-effect-free fn so the exact wording
/// is unit-testable without standing up a registry + failing model (#3193).
fn format_subagent_failure(tool_name: &str, message: &str) -> String {
Contributor

In-band prompt-engineering preamble now applied to every subagent failure.

format_subagent_failure is invoked on the generic run_subagent error path — so timeouts, budget exhaustion, transport errors, and any other transient failure now also carry the ~200-char "do NOT treat this as success or fabricate an output" preamble in transcripts, logs, and observability. That's intentional given #3193's "hallucinated success" symptom, but it's a behavior change broader than the issue title suggests ("completion-only 404"). Worth either:

  • calling out in the PR body's Behavior Changes section that the envelope is applied to all subagent failures, or
  • gating the envelope to a narrower class of "hard, non-retryable" failures so retry/budget cases keep their shorter error string.

Not blocking — flagging so it's a deliberate decision.
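The second option could look roughly like the sketch below. Everything in it is hypothetical: the PR as merged applies the envelope to every subagent failure, and `FailureClass`, `render_subagent_failure`, and both message texts are invented here for illustration.

```rust
/// Hypothetical classification of a subagent failure.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FailureClass {
    /// Non-retryable, e.g. a misconfigured model.
    Hard,
    /// Retryable, e.g. a timeout or budget exhaustion.
    Transient,
}

/// Only hard failures get the anti-fabrication envelope; transient ones
/// keep a short error string. Wording is illustrative.
fn render_subagent_failure(class: FailureClass, tool_name: &str, message: &str) -> String {
    match class {
        FailureClass::Hard => format!(
            "`{tool_name}` did NOT run to completion. \
             Do NOT treat this as success or fabricate an output. Root error: {message}"
        ),
        FailureClass::Transient => format!("{tool_name} failed: {message}"),
    }
}
```

The trade-off is that a weak orchestrator could still narrate success after a transient failure, which is the symptom #3193 reported.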

}
let lower = error.to_lowercase();
lower.contains("not a chat model")
|| (lower.contains("v1/chat/completions") && lower.contains("v1/completions"))
Contributor

Second disjunct is correct today but fragile to OpenAI rewording.

The string "v1/chat/completions" does not contain "v1/completions" as a continuous substring (the /chat/ infix breaks it), so the two contains calls really are independent matches — but that's only obvious if you stop and parse it. The whole clause currently fires only because OpenAI's body happens to mention both endpoint paths ("...the v1/chat/completions endpoint. Did you mean to use v1/completions?"). If OpenAI ever drops the second URL from the wording, this branch silently stops matching and we fall back to the opaque path.

Suggest a one-line comment on line 853 making the "two independent substrings" intent explicit, e.g.:

// Defensive fallback: OpenAI's current phrasing references BOTH endpoint paths
// as two separate substrings (`v1/chat/completions` does not contain `v1/completions`).
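The substring claim is easy to check directly. In this small sketch, `mentions_both_endpoints` is a hypothetical helper that isolates the second disjunct; the sample body is the error text from #3193.

```rust
/// The second disjunct in isolation: both endpoint paths must appear in the body.
fn mentions_both_endpoints(body: &str) -> bool {
    let lower = body.to_lowercase();
    lower.contains("v1/chat/completions") && lower.contains("v1/completions")
}

/// The current OpenAI wording, as quoted in #3193.
const CURRENT_BODY: &str = "This is not a chat model and thus not supported in the \
    v1/chat/completions endpoint. Did you mean to use v1/completions?";

/// A hypothetical rewording that drops the second URL.
const REWORDED_BODY: &str = "Not supported in the v1/chat/completions endpoint.";
```

Because `"v1/chat/completions".contains("v1/completions")` is false, the reworded body satisfies only the first `contains` call and the clause stops matching.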

.map_err(|responses_err| {
let fb = super::format_anyhow_chain(&responses_err);
anyhow::anyhow!(
"{} API error ({status}): {sanitized} (chat completions unavailable; responses fallback failed: {fb})",
Contributor

Undocumented behavior change on the non-completion-only fallback path.

Before this PR, chat_with_history's responses-fallback failure read:

{name} API error (chat completions unavailable; responses fallback failed: {fb})

After this PR it reads:

{name} API error ({status}): {sanitized} (chat completions unavailable; responses fallback failed: {fb})

That's a strict improvement (now includes the original 404 body and status), but it's a behavior change to a path unrelated to completion-only detection — any ordinary 404 that triggers the responses fallback will now have a different error string. The PR body's Behavior Changes section only lists the completion-only and subagent-envelope changes, not this. Suggest adding a one-liner there so downstream log parsers / Sentry classifiers aren't surprised.

(Cross-reference: the same widening was applied at the native-tool-calling chat 404 path — consistent with this one, which is good.)
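To make the impact on log parsers concrete, the two formats can be put side by side. The function names below are hypothetical and the parameters are plain strings for simplicity; the format strings are the ones quoted above.

```rust
/// Error string before this PR (chat_with_history fallback failure).
fn fallback_error_before(name: &str, fb: &str) -> String {
    format!("{name} API error (chat completions unavailable; responses fallback failed: {fb})")
}

/// Error string after this PR: status and sanitized 404 body are included.
fn fallback_error_after(name: &str, status: &str, sanitized: &str, fb: &str) -> String {
    format!(
        "{name} API error ({status}): {sanitized} \
         (chat completions unavailable; responses fallback failed: {fb})"
    )
}

/// A hypothetical classifier keyed on the old prefix; it stops matching
/// once the status and body are inserted after "API error".
fn matches_old_prefix(name: &str, line: &str) -> bool {
    line.starts_with(&format!("{name} API error (chat completions unavailable"))
}
```

A classifier that matches on the "chat completions unavailable" substring rather than the prefix would keep working across both formats.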


// A completion-only model 404s here and the /v1/responses fallback
// cannot rescue it — fail fast with actionable guidance (#3193).
if let Some(err) = self.completion_only_404_guard(status, &sanitized, model) {
Contributor

Adjacent code paths share the same 404 surface but skip the new guard.

The PR body claims "completion-only detection is shared across all three chat entrypoints", but two more entrypoints on this same provider can hit the same OpenAI completion-only 404:

  1. chat_with_tools (line ~1697): if !response.status().is_success() { return Err(super::api_error(&self.name, response).await); }. No guard, no responses fallback. If a run_code-style flow ever passes through the native tool-calling non-streaming path (and tools is non-empty), the user gets the opaque "not a chat model" 404 again. The transport-error path does fall back to chat_with_history (which has the guard), but the 404-status path does not.

  2. Streaming paths: stream_chat_with_system (~line 2138) and stream_chat_with_history (~line 2298). Both already sanitize the error body, so adding a completion_only_404_guard branch is a one-liner that matches the pattern used here.

Proposal: either replicate the 3-line guard at each of the additional sites listed above (preferred: the helper is already shared, so each call site is a short early return), or document in the PR body why streaming + chat_with_tools are intentionally excluded. Otherwise #3193 can re-emerge through any of these paths.

@senamakel senamakel merged commit 1113965 into tinyhumansai:main Jun 2, 2026
26 of 31 checks passed

Labels

  • agent: Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.
  • bug
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.
  • working: A PR that is being worked on by the team.

Development

Successfully merging this pull request may close these issues.

Cannot write code on filesystem

3 participants