fix(inference,ai-settings): prevent + handle ollama embedding-model-as-chat (Sentry TAURI-RUST-4P6) (#3359) #3360
…er-state (Sentry TAURI-RUST-4P6) (tinyhumansai#3359) An embedding model (bge-m3) picked as the Ollama chat model is rejected with a 400 'does not support chat' on every turn. The 400 bypasses the 404-only completion_only guard and the classifier had no matching phrase, so the raw body re-reported each turn — 36.6k events / 2 users. Add 'does not support chat' to is_provider_config_rejection_message so the event is demoted error->info, same treatment as the sibling 'does not support tools' (TAURI-RUST-35). Tests cover the verbatim Sentry body and the enriched message shape.
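The demotion described in this commit comes down to a phrase match. A minimal sketch, assuming the classifier is a case-insensitive substring check over a list of anchor phrases; only the two phrases named in this PR are listed, and the constant name `CONFIG_REJECTION_PHRASES` is hypothetical:

```rust
/// Anchor phrases that mark an upstream error as a provider-configuration
/// rejection. Hypothetical constant: only the two phrases named in this PR
/// are listed; the real list is assumed to contain more.
const CONFIG_REJECTION_PHRASES: &[&str] = &[
    "does not support tools", // sibling case, TAURI-RUST-35
    "does not support chat",  // this PR, TAURI-RUST-4P6
];

/// Returns true when the message looks like a deterministic user-config
/// problem, so the caller can report it at info level instead of error.
pub fn is_provider_config_rejection_message(message: &str) -> bool {
    let lowered = message.to_lowercase();
    CONFIG_REJECTION_PHRASES
        .iter()
        .any(|phrase| lowered.contains(*phrase))
}
```

Any message that carries one of the phrases is demoted, regardless of the surrounding JSON or HTTP status text; everything else keeps its original severity.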
…ntry TAURI-RUST-4P6) (tinyhumansai#3359) Ollama 400s an embedding-model-as-chat with 'does not support chat'. Unlike the completion-only base-model case (404), this is a 400/422, so add a not_chat_capable_guard that fires on those statuses and rewrites the opaque upstream JSON into 'model <m> does not support chat — assign a chat-capable model in Settings -> AI'. The message preserves the phrase so it stays demoted by the config-rejection classifier on re-report. Wired into all chat-completions error paths (chat_with_system, chat_with_tools, and the api_error-delegated chat_with_history non-404 branch). 6 unit + wire tests.
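The guard described in this commit can be sketched as a pure function. This is an illustration under assumptions, not the code in `compatible.rs`: the signature (status, body, model name) is hypothetical, and the message wording is paraphrased from the commit text.

```rust
/// Builds the actionable replacement message. It must keep the phrase
/// "does not support chat" so the config-rejection classifier still
/// demotes it when the error is re-reported.
pub fn not_chat_capable_model_message(model: &str) -> String {
    format!(
        "model '{model}' does not support chat; assign a chat-capable model in Settings -> AI"
    )
}

/// Hypothetical signature: returns Some(rewritten message) only when the
/// status is 400 or 422 and the body carries the anchor phrase.
pub fn not_chat_capable_guard(status: u16, body: &str, model: &str) -> Option<String> {
    // Status gate: the completion-only base-model case is a 404 and is
    // handled by a separate guard.
    if !matches!(status, 400 | 422) {
        return None;
    }
    // Phrase gate: an unrelated 400 falls through to normal error handling.
    if !body.to_lowercase().contains("does not support chat") {
        return None;
    }
    Some(not_chat_capable_model_message(model))
}
```

A caller would try the guard first on each chat-completions error path and fall back to the generic error when it returns `None`.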
📝 Walkthrough
Adds fast-fail detection and actionable remediation for OpenAI-compatible backends (Ollama) rejecting embedding models used as chat models. Detects HTTP 400/422 "does not support chat" errors, replaces opaque JSON with actionable configuration guidance, wires guards into three chat request paths, and extends error classification to prevent Sentry re-reporting.
Changes
Embedding Model as Chat Model — Handling
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5 passed
…nyhumansai#3359) Parse the /api/show 'capabilities' list (already deserialized but unused) and classify each model chat-capable / embedding-only / unknown via ollama_chat_capability. Surface it as 'chat_capable' on each diagnostics installed_models entry, fetched in the same /api/show round-trip that already resolves the context window. Fail-open: unknown (empty/unrecognised capabilities) stays chat-capable. Prereq for hiding embedding-only models from the chat picker (Sentry TAURI-RUST-4P6).
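A minimal sketch of the three-way classification, assuming Ollama reports capability tags such as `completion` and `embedding`. The tag names and the precedence rule are assumptions drawn from this PR's description and review notes, not the code in `ollama.rs`:

```rust
/// Some(true)  = chat-capable
/// Some(false) = embedding-only
/// None        = unknown; callers keep the model visible (fail-open)
pub fn ollama_chat_capability(capabilities: &[String]) -> Option<bool> {
    // Tolerate case and surrounding whitespace in the reported tags.
    let normalized: Vec<String> = capabilities
        .iter()
        .map(|c| c.trim().to_lowercase())
        .collect();
    let has = |name: &str| normalized.iter().any(|c| c.as_str() == name);

    if has("completion") || has("chat") {
        // A chat/completion tag wins even when an embedding tag is present.
        Some(true)
    } else if has("embedding") {
        Some(false)
    } else {
        // Empty or unrecognised capability lists stay unknown.
        None
    }
}
```

Returning `Option<bool>` rather than `bool` keeps "unknown" distinct from "not chat-capable", which is what lets the picker hide only the models it is confident about.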
…inyhumansai#3359) Filter models the core flagged chat_capable=false out of the local-model pickers (CustomRoutingDialog + GlobalOwnModelSelector) via the new isChatSelectableLocalModel helper in useInstalledModels. Prevents the root-cause misconfig behind TAURI-RUST-4P6: picking an embedding model (bge-m3) as the chat model 400s every turn. Unknown capability stays visible (fail-open); embedding selection is a separate panel, unaffected. 4 helper tests added.
…per (tinyhumansai#3359) Extract the embedding-only filter + picker-shape map out of the useInstalledModels hook into a pure toSelectableChatModels helper in aiRouting.ts, with unit tests. The hook's map() lines were the only changed frontend lines uncovered by Vitest (AIPanel.tsx 484,486-487), dropping diff coverage to 50% and failing the >=80% Coverage Gate.
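The picker helpers in these two commits are TypeScript. The fail-open rule they apply is rendered below in Rust, purely for illustration and to match the core side; the `InstalledModel` struct and both function names are hypothetical stand-ins for the diagnostics entry, `isChatSelectableLocalModel`, and `toSelectableChatModels`.

```rust
/// Hypothetical stand-in for a diagnostics `installed_models` entry.
#[derive(Debug, Clone, PartialEq)]
pub struct InstalledModel {
    pub name: String,
    /// Some(false) = embedding-only, Some(true) = chat-capable, None = unknown.
    pub chat_capable: Option<bool>,
}

/// Fail-open predicate: only a confident `Some(false)` hides a model.
pub fn is_chat_selectable(model: &InstalledModel) -> bool {
    model.chat_capable != Some(false)
}

/// Filter out embedding-only models, then map to the shape the picker
/// needs (here simply the model name).
pub fn to_selectable_chat_models(models: &[InstalledModel]) -> Vec<String> {
    models
        .iter()
        .filter(|m| is_chat_selectable(m))
        .map(|m| m.name.clone())
        .collect()
}
```

Keeping the filter and map in one pure function is what makes it straightforward to unit-test outside the hook, which is the coverage concern the second commit addresses.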
M3gA-Mind
left a comment
Reviewed locally — well-structured, correctly layered fix (prevention + Sentry demotion + actionable error), with consistent fail-open behavior end to end and thorough tests. Approving in spirit; a few inline non-blocking notes below.
    ) {
        return false;
    }
    error.to_lowercase().contains("does not support chat")
Non-blocking: this does a raw substring match (does not support chat) over the whole sanitized body. Given the verbatim Ollama wire shape this is near-impossible to false-positive in practice, and the 4xx gate above + the not_chat_capable_ignores_unrelated_400 test guard it well. The only theoretical snag is an unrelated 400 whose body happens to contain that phrase in a different field (e.g. an error about some other feature not supporting "chat history"). No change needed — flagging for completeness.
    // in `compatible.rs` rewrites the opaque upstream JSON into an
    // actionable "assign a chat-capable model" message that still carries
    // this substring, so it stays demoted.
    "does not support chat",
Worth making the coupling load-bearing in a comment here too (it already is in compatible.rs): this anchor phrase is what keeps the rewritten actionable message demoted. If anyone ever softens the rewrite in not_chat_capable_model_message and drops the phrase, the 36.6k stream reopens. The TAURI-RUST-4P6-enriched classifier test covers it — good — but a one-liner here pointing at that dependency would help the next editor.
    /// Callers treat `None` as "keep visible" — fail-open, never hide a model
    /// that might be usable for chat. Mirrors the non-rejecting `Unknown` arm of
    /// [`super::model_requirements::ContextEligibility`]. See Sentry TAURI-RUST-4P6.
    pub(crate) fn ollama_chat_capability(capabilities: &[String]) -> Option<bool> {
Nice — conservative in the right direction: only Some(false) when confident it's embedding-only, None everywhere ambiguous, and completion/chat wins over an embedding tag. The case/whitespace tolerance and the ["insert"]-only → None case are both covered by the matrix test. Mirrors the existing Unknown eligibility arm; consistent with the codebase's fail-open posture.
M3gA-Mind
left a comment
Approving. Correct root-cause fix with well-reasoned defense-in-depth (prevention + Sentry demotion + actionable error), fail-open behavior that's consistent across Rust/TS/picker layers, exemplary documentation of the cross-module anchor-phrase coupling, and thorough tests (classifier matrix, guard signature, wiremock E2E, classifier-parity for both raw and enriched messages). The inline notes are non-blocking. Nice work.
Summary
- A user can pick an embedding model (`bge-m3:latest`) as their Ollama chat model → Ollama 400s `"does not support chat"` every turn → 36.6k Sentry events / 2 users on `0.57.13` (TAURI-RUST-4P6).
- Chat capability is read from the `/api/show` `capabilities` list (already fetched for the context-window gate, so no extra round-trip). Fail-open on unknown capability.
Problem
`bge-m3:latest` is OpenHuman's default memory-tree embedding model. Nothing stopped a user from selecting it as their Ollama chat model, after which every chat turn failed with a 400 `does not support chat`.
Two failures compounded it: the 400 bypassed the 404-only `completion_only_404_guard`, the classifier had no matching phrase (so it re-reported every retry → 36.6k events), and the user saw raw upstream JSON instead of remediation.
This is deterministic user-state (a capability mismatch), not a server bug. It belongs to the same family as TAURI-RUST-35 (`does not support tools`).
Solution
Layer 1 — prevention (root cause):
- Parse the `/api/show` `capabilities` list (previously deserialized but unused) and classify each model chat-capable / embedding-only / unknown (`ollama_chat_capability`, `ollama.rs`). Surface `chat_capable` on each diagnostics `installed_models` entry, fetched in the same `/api/show` round-trip that already resolves the context window.
- Filter models with `chat_capable === false` out of the local-model pickers (`CustomRoutingDialog` + `GlobalOwnModelSelector`) via `isChatSelectableLocalModel`. Unknown (`null`) stays visible (fail-open). The embedding model is configured in a separate panel, so embedding selection is unaffected.
Layer 2 — Sentry demotion: add `"does not support chat"` to `is_provider_config_rejection_message` (`config_rejection.rs`). Demotes error→info via both the inline guard sites and `api_error`. 36.6k → 0.
Layer 3 — actionable error: `not_chat_capable_guard` (400/422) rewrites the opaque JSON into `model '<m>' does not support chat — assign a chat-capable model in Settings → AI`, preserving the phrase so it stays demoted. Wired into all chat-completions error paths.
Submission Checklist
- `compatible` guard/wire tests + `ollama_chat_capability` matrix; Frontend: 4 `isChatSelectableLocalModel` cases (false / true / unknown-fail-open / list-filter)
- `toSelectableChatModels` helper with unit tests so the changed FE lines are covered (Coverage Gate)
- `N/A: behaviour-only change` (no new feature row)
- `N/A: none touched`
- `wiremock`; capability uses the existing `/api/show` call
- `N/A: error-classification + picker filter, no release-cut surface`
- `Closes #NNN` in the `## Related` section
Impact
- Capability is read in the existing `/api/show` round-trip, so there are no extra requests.
- `/api/show` miss → `chat_capable` unknown → model stays selectable, so no usable model is ever hidden.
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/4p6-ollama-chat-capability-reject
Validation Run
- `pnpm --filter openhuman-app format:check`: prettier + rust fmt clean (verified locally)
- `pnpm typecheck`: clean
- `cargo test --lib openhuman::inference::provider::{compatible,config_rejection}` (151 + 11 ok), `...::local::ollama` (44 ok); `vitest run aiRouting.test.ts` (10 ok)
- `cargo fmt` + `cargo check` clean; `cargo clippy --lib` clean on changed files
- `pnpm rust:check` Finished clean (no shell change; verified post-submodule-init)
Validation Blocked
- `command:` N/A
- `error:` N/A
- `impact:` N/A
Behavior Changes
Parity Contract
- `not_chat_capable_ignores_unrelated_400` + `not_chat_capable_requires_4xx_status`
- `chat_capability_classifies_*` covers completion/chat/embedding/unknown
- `api_error` SessionExpired/classification on `chat_with_history` preserved (message upgraded post-hoc)
Duplicate / Superseded PR Handling
does not support chat/bge-m3/4P6/config_rejection)