
fix(inference,ai-settings): prevent + handle ollama embedding-model-as-chat (Sentry TAURI-RUST-4P6) (#3359) #3360

Merged
M3gA-Mind merged 5 commits into tinyhumansai:main from oxoxDev:fix/4p6-ollama-chat-capability-reject
Jun 4, 2026

@oxoxDev oxoxDev commented Jun 4, 2026

Summary

  • A user picked an embedding model (bge-m3:latest) as their Ollama chat model → Ollama 400s "does not support chat" every turn → 36.6k Sentry events / 2 users on 0.57.13 (TAURI-RUST-4P6).
  • Prevention (the real fix): hide embedding-only Ollama models from the chat/LLM model pickers so the misconfig can't be made.
  • Handling (defense-in-depth): demote the 400 from Sentry (classifier) + replace the opaque JSON with an actionable "assign a chat-capable model" message.
  • Capability comes from Ollama /api/show capabilities (already fetched for the context-window gate — no extra round-trip). Fail-open on unknown capability.

Problem

bge-m3:latest is OpenHuman's default memory-tree embedding model. Nothing stopped a user selecting it as their Ollama chat model, after which every chat turn failed:

ollama API error (400 Bad Request): {"error":{"message":"\"bge-m3:latest\" does not support chat","type":"invalid_request_error","param":null,"code":null}}

Two failures compounded it:

  1. No prevention — the chat-model picker listed every installed Ollama model, embedding-only ones included.
  2. Noise + opaque UX — the 400 bypassed the 404-only completion_only_404_guard, the classifier had no matching phrase (so it re-reported every retry → 36.6k events), and the user saw raw upstream JSON instead of remediation.

Deterministic user-state (capability mismatch), not a server bug — same family as TAURI-RUST-35 (does not support tools).

Solution

Layer 1 — prevention (root cause):

  • Parse the Ollama /api/show capabilities list (previously deserialized but unused) and classify each model chat-capable / embedding-only / unknown (ollama_chat_capability, ollama.rs). Surface chat_capable on each diagnostics installed_models entry, fetched in the same /api/show round-trip that already resolves the context window.
  • Frontend filters chat_capable === false out of the local-model pickers (CustomRoutingDialog + GlobalOwnModelSelector) via isChatSelectableLocalModel. Unknown (null) stays visible — fail-open. The embedding model is configured in a separate panel, so embedding selection is unaffected.
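
The two Layer 1 rules (classify from the capabilities list, then hide only confirmed embedding-only models) can be sketched as below. This is an illustrative sketch, not the code in ollama.rs: the signature of ollama_chat_capability follows the one quoted in the review, but the tag strings "completion", "chat", and "embedding" are assumptions about what /api/show returns, and is_chat_selectable is a hypothetical Rust stand-in for the TypeScript isChatSelectableLocalModel helper.

```rust
/// Sketch: classify a model from its `/api/show` capabilities list.
/// Some(true)  = chat-capable
/// Some(false) = embedding-only
/// None        = unknown (callers keep the model visible, fail-open)
/// The tag strings matched here are assumed, not taken from the PR diff.
fn ollama_chat_capability(capabilities: &[String]) -> Option<bool> {
    let mut saw_embedding = false;
    for cap in capabilities {
        // Tolerate case and surrounding whitespace in the reported tags.
        match cap.trim().to_lowercase().as_str() {
            // A completion/chat tag wins even if an embedding tag is present.
            "completion" | "chat" => return Some(true),
            "embedding" => saw_embedding = true,
            // Anything else (e.g. "insert", "tools") carries no signal here.
            _ => {}
        }
    }
    if saw_embedding {
        Some(false)
    } else {
        None
    }
}

/// Hypothetical stand-in for the frontend picker rule: hide a model only
/// when it is known to be embedding-only; unknown stays selectable.
fn is_chat_selectable(chat_capable: Option<bool>) -> bool {
    chat_capable != Some(false)
}
```

Returning Option<bool> rather than bool keeps "unknown" distinct from "not chat-capable", which is what lets the picker fail open on older Ollama versions or an /api/show miss.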

Layer 2 — Sentry demotion: add "does not support chat" to is_provider_config_rejection_message (config_rejection.rs). Demotes error→info via both the inline guard sites and api_error. 36.6k → 0.
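
A rough sketch of the substring classifier this layer extends. Only the two phrases named in this PR are listed; the constant name, the case-insensitive match, and the shape of the list are assumptions, and the real list in config_rejection.rs is presumably longer.

```rust
/// Sketch: phrases that mark a provider error as deterministic user
/// configuration state rather than a server bug. CONFIG_REJECTION_PHRASES
/// is a hypothetical name used for illustration.
const CONFIG_REJECTION_PHRASES: &[&str] = &[
    "does not support tools", // TAURI-RUST-35
    "does not support chat",  // TAURI-RUST-4P6
];

/// Returns true when the message contains any known config-rejection phrase.
/// Lowercasing before matching is an assumption of this sketch.
fn is_provider_config_rejection_message(message: &str) -> bool {
    let lowered = message.to_lowercase();
    CONFIG_REJECTION_PHRASES
        .iter()
        .any(|phrase| lowered.contains(*phrase))
}
```

Because the match is a plain substring test, both the verbatim upstream body and any rewritten message that keeps the phrase classify the same way.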

Layer 3 — actionable error: not_chat_capable_guard (400/422) rewrites the opaque JSON into model '<m>' does not support chat — assign a chat-capable model in Settings → AI, preserving the phrase so it stays demoted. Wired into all chat-completions error paths.
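
The guard described above might look roughly like this. The three function names come from the PR, but the signatures are hypothetical: the status is modelled as a plain u16 and the guard returns Option<String> instead of the project's real error type. The message punctuation is also simplified from the wording quoted above.

```rust
/// The anchor phrase shared with the config-rejection classifier.
const NOT_CHAT_CAPABLE_PHRASE: &str = "does not support chat";

/// Sketch: true only for a 400/422 whose body carries the anchor phrase.
fn is_not_chat_capable_model(status: u16, error: &str) -> bool {
    if !matches!(status, 400 | 422) {
        return false;
    }
    error.to_lowercase().contains(NOT_CHAT_CAPABLE_PHRASE)
}

/// Sketch: actionable replacement for the opaque upstream JSON. The phrase
/// is kept verbatim so the rewritten message is still demoted on re-report.
fn not_chat_capable_model_message(model: &str) -> String {
    format!(
        "model '{}' {}; assign a chat-capable model in Settings -> AI",
        model, NOT_CHAT_CAPABLE_PHRASE
    )
}

/// Sketch: Some(message) when the guard fires, None to fall through to the
/// existing error handling.
fn not_chat_capable_guard(status: u16, error: &str, model: &str) -> Option<String> {
    is_not_chat_capable_model(status, error).then(|| not_chat_capable_model_message(model))
}
```

Building the message from the same constant that the detector matches on is one way to keep the rewrite and the classifier from drifting apart.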

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) — Rust: classifier verbatim body + 6 compatible guard/wire tests + ollama_chat_capability matrix; Frontend: 4 isChatSelectableLocalModel cases (false / true / unknown-fail-open / list-filter)
  • Diff coverage ≥ 80% — filter+map extracted to pure toSelectableChatModels helper with unit tests so the changed FE lines are covered (Coverage Gate)
  • Coverage matrix updated — N/A: behaviour-only change (no new feature row)
  • All affected feature IDs from the matrix are listed — N/A: none touched
  • No new external network dependencies introduced — wire tests use wiremock; capability uses the existing /api/show call
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: error-classification + picker filter, no release-cut surface
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Desktop (all platforms). Touches the Ollama chat-model picker (hides embedding-only models) and the chat-completions error path.
  • Security/perf: none. Capability reuses the existing /api/show round-trip — no extra requests.
  • Observability: removes the 36.6k/2-user Sentry stream; event becomes an info-level config-rejection log.
  • Compat / fail-open: older Ollama or an /api/show miss → chat_capable unknown → model stays selectable, so no usable model is ever hidden.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/4p6-ollama-chat-capability-reject
  • Commit SHA: f073111 (classifier) · 751fa71 (guard) · 7497730 (capability) · 3baf728 (picker filter)

Validation Run

  • pnpm --filter openhuman-app format:check — prettier + rust fmt clean (verified locally)
  • pnpm typecheck — clean
  • Focused tests: cargo test --lib openhuman::inference::provider::{compatible,config_rejection} (151 + 11 ok), ...::local::ollama (44 ok); vitest run aiRouting.test.ts (10 ok)
  • Rust fmt/check (if changed): cargo fmt + cargo check clean; cargo clippy --lib clean on changed files
  • Tauri fmt/check (if changed): pnpm rust:check Finished clean (no shell change; verified post-submodule-init)

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: embedding-only Ollama models no longer appear in chat-model pickers; the residual 400 is demoted + made actionable
  • User-visible effect: can't misconfigure an embedding model as chat; clearer error if one slips through

Parity Contract

  • Legacy behavior preserved: unknown-capability models stay visible (fail-open); embedding-model selection panel unchanged; all other 4xx/5xx handling unchanged; guard fires only on 400/422 carrying the exact phrase
  • Guard/fallback/dispatch parity checks: not_chat_capable_ignores_unrelated_400 + not_chat_capable_requires_4xx_status; chat_capability_classifies_* covers completion/chat/embedding/unknown; api_error SessionExpired/classification on chat_with_history preserved (message upgraded post-hoc)

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none (searched open + merged for does not support chat / bge-m3 / 4P6 / config_rejection)
  • Canonical PR: this
  • Resolution: N/A

oxoxDev added 2 commits June 4, 2026 16:36
…er-state (Sentry TAURI-RUST-4P6) (tinyhumansai#3359)

An embedding model (bge-m3) picked as the Ollama chat model is rejected
with a 400 'does not support chat' on every turn. The 400 bypasses the
404-only completion_only guard and the classifier had no matching phrase,
so the raw body re-reported each turn — 36.6k events / 2 users.

Add 'does not support chat' to is_provider_config_rejection_message so the
event is demoted error->info, same treatment as the sibling 'does not
support tools' (TAURI-RUST-35). Tests cover the verbatim Sentry body and
the enriched message shape.
…ntry TAURI-RUST-4P6) (tinyhumansai#3359)

Ollama 400s an embedding-model-as-chat with 'does not support chat'. Unlike
the completion-only base-model case (404), this is a 400/422, so add a
not_chat_capable_guard that fires on those statuses and rewrites the opaque
upstream JSON into 'model <m> does not support chat — assign a chat-capable
model in Settings -> AI'. The message preserves the phrase so it stays
demoted by the config-rejection classifier on re-report.

Wired into all chat-completions error paths (chat_with_system,
chat_with_tools, and the api_error-delegated chat_with_history non-404
branch). 6 unit + wire tests.
@oxoxDev oxoxDev requested a review from a team June 4, 2026 11:07

coderabbitai Bot commented Jun 4, 2026

📝 Walkthrough

Adds fast-fail detection and actionable remediation for OpenAI-compatible backends (Ollama) rejecting embedding models used as chat models. Detects HTTP 400/422 "does not support chat" errors, replaces opaque JSON with actionable configuration guidance, wires guards into three chat request paths, and extends error classification to prevent Sentry re-reporting.

Changes

Embedding Model as Chat Model — Handling

| Layer / File(s) | Summary |
| --- | --- |
| Core error detection and message generation: src/openhuman/inference/provider/compatible.rs, src/openhuman/inference/provider/compatible_tests.rs | Adds is_not_chat_capable_model, not_chat_capable_model_message, and not_chat_capable_guard helpers to detect HTTP 400/422 errors containing "does not support chat", generate model-specific actionable remediation text, and wrap into fast-fail errors. Unit tests validate status-code requirements, signature matching, message enrichment, classifier integration, and guard firing. |
| Chat request path integration: src/openhuman/inference/provider/compatible.rs, src/openhuman/inference/provider/compatible_tests.rs | Wires the guard into chat_with_system, chat_with_history, and native chat entry points to short-circuit on non-chat-capable model rejections. End-to-end test confirms chat_with_history fails fast with actionable message and preserves config-rejection classification without fallback. |
| Provider config-rejection classifier extension: src/openhuman/inference/provider/config_rejection.rs | Extends is_provider_config_rejection_message to recognize "does not support chat" as a provider-config-rejection substring. Unit tests validate raw and enriched error bodies for TAURI-RUST-4P6 (embedding model as chat) classify correctly. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • tinyhumansai/openhuman#3211: Adds fast-fail model capability mismatch guard in the same three chat entry points for completion-only model 404 rejection pattern.
  • tinyhumansai/openhuman#2813: Extends is_provider_config_rejection_message to detect "does not support tools" rejection (same family of capability mismatch errors).
  • tinyhumansai/openhuman#2346: Adds logging and suppression of provider config-rejection errors in streaming and native chat paths.

Suggested labels

sentry-traced-bug, bug, rust-core

Suggested reviewers

  • M3gA-Mind
  • senamakel

Poem

🐰 A rabbit hops through error logs with glee,
"Your embedding's confused—use chat, you'll see!"
Four hundred's now actionable, helpful, and true,
No more Sentry floods from a model askew. 🔧✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | All objectives from #3359 are met: the PR adds detection for 'does not support chat' to the config-rejection classifier, implements a guard generating actionable remediation messages, wires it into all chat completion paths, and includes comprehensive tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the linked issue: classifier updates, guard implementation, error path integration, and related test coverage; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: fixing Ollama embedding-model-as-chat handling (Sentry TAURI-RUST-4P6) by adding detection and actionable error messages. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot added the rust-core, sentry-traced-bug, and bug labels Jun 4, 2026
coderabbitai Bot previously approved these changes Jun 4, 2026
oxoxDev added 2 commits June 4, 2026 17:00
…nyhumansai#3359)

Parse the /api/show 'capabilities' list (already deserialized but unused)
and classify each model chat-capable / embedding-only / unknown via
ollama_chat_capability. Surface it as 'chat_capable' on each
diagnostics installed_models entry, fetched in the same /api/show
round-trip that already resolves the context window. Fail-open: unknown
(empty/unrecognised capabilities) stays chat-capable. Prereq for hiding
embedding-only models from the chat picker (Sentry TAURI-RUST-4P6).
…inyhumansai#3359)

Filter models the core flagged chat_capable=false out of the local-model
pickers (CustomRoutingDialog + GlobalOwnModelSelector) via the new
isChatSelectableLocalModel helper in useInstalledModels. Prevents the
root-cause misconfig behind TAURI-RUST-4P6: picking an embedding model
(bge-m3) as the chat model 400s every turn. Unknown capability stays
visible (fail-open); embedding selection is a separate panel, unaffected.
4 helper tests added.
@oxoxDev oxoxDev changed the title from "fix(observability,inference): classify ollama embedding-model-as-chat 400 + actionable error (Sentry TAURI-RUST-4P6) (#3359)" to "fix(inference,ai-settings): prevent + handle ollama embedding-model-as-chat (Sentry TAURI-RUST-4P6) (#3359)" Jun 4, 2026
…per (tinyhumansai#3359)

Extract the embedding-only filter + picker-shape map out of the
useInstalledModels hook into a pure toSelectableChatModels helper in
aiRouting.ts, with unit tests. The hook's map() lines were the only
changed frontend lines uncovered by Vitest (AIPanel.tsx 484,486-487),
dropping diff coverage to 50% and failing the >=80% Coverage Gate.

@M3gA-Mind M3gA-Mind left a comment

Reviewed locally — well-structured, correctly layered fix (prevention + Sentry demotion + actionable error), with consistent fail-open behavior end to end and thorough tests. Approving in spirit; a few inline non-blocking notes below.

) {
    return false;
}
error.to_lowercase().contains("does not support chat")

Non-blocking: this does a raw substring match (does not support chat) over the whole sanitized body. Given the verbatim Ollama wire shape this is near-impossible to false-positive in practice, and the 4xx gate above + the not_chat_capable_ignores_unrelated_400 test guard it well. The only theoretical snag is an unrelated 400 whose body happens to contain that phrase in a different field (e.g. an error about some other feature not supporting "chat history"). No change needed — flagging for completeness.

// in `compatible.rs` rewrites the opaque upstream JSON into an
// actionable "assign a chat-capable model" message that still carries
// this substring, so it stays demoted.
"does not support chat",

Worth making the coupling load-bearing in a comment here too (it already is in compatible.rs): this anchor phrase is what keeps the rewritten actionable message demoted. If anyone ever softens the rewrite in not_chat_capable_model_message and drops the phrase, the 36.6k stream reopens. The TAURI-RUST-4P6-enriched classifier test covers it — good — but a one-liner here pointing at that dependency would help the next editor.

/// Callers treat `None` as "keep visible" — fail-open, never hide a model
/// that might be usable for chat. Mirrors the non-rejecting `Unknown` arm of
/// [`super::model_requirements::ContextEligibility`]. See Sentry TAURI-RUST-4P6.
pub(crate) fn ollama_chat_capability(capabilities: &[String]) -> Option<bool> {

Nice — conservative in the right direction: only Some(false) when confident it's embedding-only, None everywhere ambiguous, and completion/chat wins over an embedding tag. The case/whitespace tolerance and the ["insert"]-only → None case are both covered by the matrix test. Mirrors the existing Unknown eligibility arm; consistent with the codebase's fail-open posture.

@M3gA-Mind M3gA-Mind left a comment

Approving. Correct root-cause fix with well-reasoned defense-in-depth (prevention + Sentry demotion + actionable error), fail-open behavior that's consistent across Rust/TS/picker layers, exemplary documentation of the cross-module anchor-phrase coupling, and thorough tests (classifier matrix, guard signature, wiremock E2E, classifier-parity for both raw and enriched messages). The inline notes are non-blocking. Nice work.

@M3gA-Mind M3gA-Mind merged commit 122196b into tinyhumansai:main Jun 4, 2026
19 of 22 checks passed

Labels

  • bug
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.
  • sentry-traced-bug: Bug identified via Sentry triage

Development

Successfully merging this pull request may close these issues.

Embedding model picked as chat model floods Sentry with ollama 400 "does not support chat" (TAURI-RUST-4P6)

2 participants