fix(inference): actionable error for completion-only model 404 + unmistakable run_code failure (#3193)#3211
Conversation
…umansai#3193) A completion-only/base model assigned to a chat role 404s on /v1/chat/completions ('This is not a chat model … did you mean v1/completions?') and the /v1/responses fallback cannot rescue it, leaving an opaque 'responses fallback failed' chain. Detect the signature and fail fast with a message naming the model and the remediation (assign a chat-capable model), skipping the doomed fallback.
…ai#3193) On a hard delegation failure (e.g. run_code's coding model 404ing) the bare error text let a weak orchestrator narrate a plausible success and fabricate output. Wrap failures in an envelope that states the task did not run and forbids reporting success, while preserving the root error.
tinyhumansai#3193) Extract the completion-only detection into completion_only_404_guard and apply it in all three chat entrypoints (chat_with_system, chat_with_history, native chat) so a completion-only model fails fast on every path. Add a wiremock test proving the guard pre-empts the /v1/responses fallback end to end, plus a unit test for the guard's match/no-match branches.
📝 WalkthroughWalkthroughTwo independent error-handling improvements: (1) subagent failures now return a standardized envelope stating the task did not complete and discouraging output fabrication, and (2) OpenAI-compatible chat endpoints detect completion-only models on 404 and return actionable guidance instead of silent fallback. ChangesSubagent Failure Envelope
Completion-Only Model 404 Detection
Sequence Diagramflowchart TD
A["404 from chat/completions"] --> B["Sanitize error body"]
B --> C["Apply completion_only_404_guard"]
C --> D{Guard matches<br/>completion-only<br/>signature?}
D -->|Yes| E["Return actionable error<br/>with model name &<br/>remediation hint"]
D -->|No| F{"Responses fallback<br/>enabled?"}
F -->|Yes| G["Attempt /v1/responses<br/>fallback"]
F -->|No| H["Return enriched<br/>generic 404 error"]
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
CodeGhost21
left a comment
There was a problem hiding this comment.
Inline findings posted on the changed lines. Overall: the core fix is correct, tightly scoped, and well-tested. Main follow-ups concern adjacent code paths (chat_with_tools, streaming) that share the same 404 surface but don't currently invoke the new guard.
| /// Format a subagent-delegation failure so the orchestrator cannot mistake it | ||
| /// for success. Kept as a standalone, side-effect-free fn so the exact wording | ||
| /// is unit-testable without standing up a registry + failing model (#3193). | ||
| fn format_subagent_failure(tool_name: &str, message: &str) -> String { |
There was a problem hiding this comment.
In-band prompt-engineering preamble now applied to every subagent failure.
format_subagent_failure is invoked on the generic run_subagent error path — so timeouts, budget exhaustion, transport errors, and any other transient failure now also carry the ~200-char "do NOT treat this as success or fabricate an output" preamble in transcripts, logs, and observability. That's intentional given #3193's "hallucinated success" symptom, but it's a behavior change broader than the issue title suggests ("completion-only 404"). Worth either:
- calling out in the PR body's Behavior Changes section that the envelope is applied to all subagent failures, or
- gating the envelope to a narrower class of "hard, non-retryable" failures so retry/budget cases keep their shorter error string.
Not blocking — flagging so it's a deliberate decision.
| } | ||
| let lower = error.to_lowercase(); | ||
| lower.contains("not a chat model") | ||
| || (lower.contains("v1/chat/completions") && lower.contains("v1/completions")) |
There was a problem hiding this comment.
Second disjunct is correct today but fragile to OpenAI rewording.
The string "v1/chat/completions" does not contain "v1/completions" as a continuous substring (the /chat/ infix breaks it), so the two contains calls really are independent matches — but that's only obvious if you stop and parse it. The whole clause currently fires only because OpenAI's body happens to mention both endpoint paths ("...the v1/chat/completions endpoint. Did you mean to use v1/completions?"). If OpenAI ever drops the second URL from the wording, this branch silently stops matching and we fall back to the opaque path.
Suggest a one-line comment on line 853 making the "two independent substrings" intent explicit, e.g.:
// Defensive fallback: OpenAI's current phrasing references BOTH endpoint paths
// as two separate substrings (`v1/chat/completions` does not contain `v1/completions`).| .map_err(|responses_err| { | ||
| let fb = super::format_anyhow_chain(&responses_err); | ||
| anyhow::anyhow!( | ||
| "{} API error ({status}): {sanitized} (chat completions unavailable; responses fallback failed: {fb})", |
There was a problem hiding this comment.
Undocumented behavior change on the non-completion-only fallback path.
Before this PR, chat_with_history's responses-fallback failure read:
{name} API error (chat completions unavailable; responses fallback failed: {fb})
After this PR it reads:
{name} API error ({status}): {sanitized} (chat completions unavailable; responses fallback failed: {fb})
That's a strict improvement (now includes the original 404 body and status), but it's a behavior change to a path unrelated to completion-only detection — any ordinary 404 that triggers the responses fallback will now have a different error string. The PR body's Behavior Changes section only lists the completion-only and subagent-envelope changes, not this. Suggest adding a one-liner there so downstream log parsers / Sentry classifiers aren't surprised.
(Cross-reference: the same widening was applied at the native-tool-calling chat 404 path — consistent with this one, which is good.)
|
|
||
| // A completion-only model 404s here and the /v1/responses fallback | ||
| // cannot rescue it — fail fast with actionable guidance (#3193). | ||
| if let Some(err) = self.completion_only_404_guard(status, &sanitized, model) { |
There was a problem hiding this comment.
Adjacent code paths share the same 404 surface but skip the new guard.
The PR body claims "completion-only detection is shared across all three chat entrypoints", but two more entrypoints on this same provider can hit the same OpenAI completion-only 404:
-
chat_with_tools(line ~1697) —if !response.status().is_success() { return Err(super::api_error(&self.name, response).await); }. No guard, no responses fallback. If arun_code-style flow ever passes through the native tool-calling non-streaming path (and tools is non-empty), the user gets the opaque"not a chat model"404 again. Transport-error path does fall back tochat_with_history(which has the guard), but the 404-status path does not. -
Streaming paths —
stream_chat_with_system(~line 2138) andstream_chat_with_history(~line 2298). Both already sanitize the error body, so adding acompletion_only_404_guardbranch is a one-liner that matches the pattern used here.
Proposal: either replicate the 3-line guard at all four additional sites (preferred — they're 1-liners and the helper is already shared), or document in the PR body why streaming + chat_with_tools are intentionally excluded. Otherwise #3193 can re-emerge through any of these paths.
Summary
run_code(and any chat call) against a completion-only / base model now fails fast with an actionable error that names the model and the fix, instead of an opaquechat completions unavailable; responses fallback failedchain.chat_with_system,chat_with_history, native tool-callingchat).Problem
Reported in #3193 (openhuman-core 0.56.0, Win11,
agentic_provider = "openhuman"): everyrun_codecall returnsRoot cause: OpenHuman only speaks the chat-completions API (with an optional
/v1/responsesfallback) — there is no/v1/completionspath. When the model bound to the coding role (run_code→code_executor, model hintcoding) is a completion-only model,/v1/chat/completions404s and the responses fallback can't rescue it. The surfaced error was opaque and gave no remediation. The reporter also observed the orchestrator presenting the failure as success and fabricating output — the bare error text alone didn't stop a weak model from narrating a fake result.Solution
compatible.rs: addis_completion_only_model_404(tight match on the OpenAI signature — ordinary "model does not exist" 404s are intentionally not matched) andcompletion_only_model_message(names the model + "assign a chat-capable model"). A sharedcompletion_only_404_guardis invoked at all three chat 404 handlers and short-circuits before the doomed/v1/responsesfallback.dispatch.rs: route subagent-failure results throughformat_subagent_failure, which states the task did not run and instructs the model not to report success or fabricate output, while preserving the underlying error./v1/completionstransport — the real cause is a misconfigured model, so the fix is an actionable diagnostic, not a new (deprecated) endpoint surface.Submission Checklist
completion_only*,subagent_failure_envelope) pass locally.N/A: behaviour-only change(error-handling/diagnostic; no feature row added/removed/renamed)N/A: no release-cut surface touchedCloses #NNNin the## RelatedsectionImpact
Related
delegate_tools_agent"400 Insufficient budget",spawn_parallel_agentsnon-determinism, and whetherparallel_toolsis honored. Confirmation of a literalsuccess:trueflag bug (vs. orchestrator narration) also pending the transcript.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/3193-run-code-completion-only-404d555ae48472ce694570791ccbe7d99b742f9a3c4Validation Run
app/changes —pnpm --filter openhuman-app format:checkapp/changes —pnpm typecheckcargo test --lib completion_only(6 passed) ·cargo test --lib subagent_failure_envelope(1 passed)cargo fmtapplied;cargo check --libcleanValidation Blocked
command:pnpm test:coverage/pnpm test:rust(full suites)error:not run locally (heavy; full Rust test matrix OOMs locally)impact:diff-coverage verified by CI gate (passing)Behavior Changes
Parity Contract
api_error(Sentry/classification) unchanged; successful responses unchanged./v1/responsesfallback still runs for non-completion-only 404s.Duplicate / Superseded PR Handling
run_code/ completion endpoint)