
fix(memory): add fallback model chain for unavailable GMI models #1704

Merged
senamakel merged 1 commit into tinyhumansai:main from Sathvik-1007:fix/cloud-chat-model-fallback
May 15, 2026

Conversation

@Sathvik-1007
Contributor

@Sathvik-1007 Sathvik-1007 commented May 14, 2026

Summary

  • Add automatic fallback model chain when configured GMI model is unavailable for the user's org.
  • Detect "not available for your organization" errors and try known summarization models before failing.
  • Add 4 unit tests for the detection logic and fallback list invariants.

Problem

  • Summarization pipeline fails completely when deepseek-ai/DeepSeek-V4-Flash (or any configured model) isn't provisioned for the org.
  • API returns 404 with clear message but no recovery path exists — blocks all summarization for affected users.
  • Sentry: OPENHUMAN-TAURI-CC.

Solution

  • Add FALLBACK_MODELS constant with known working summarization models (summarization-v1, deepseek-ai/DeepSeek-V3-0324, deepseek-ai/DeepSeek-V3).
  • Add is_model_unavailable_error() helper that detects the 404 "not available" pattern.
  • In chat_for_json: try configured model first, on unavailable error iterate fallbacks (skipping configured model), return first success or bail with clear message.
  • Follows the module's soft-fallback contract — transient model unavailability doesn't abort ingest.
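The first two Solution bullets can be sketched roughly as follows. This is a minimal illustration, not the merged code: the model names and the "not available for your organization" phrase come from the description above, while `UNAVAILABLE_PHRASE`, `fallback_candidates`, the `&str` error representation, and the case-sensitive match are assumptions made for the example.

```rust
/// Ordered fallback list, using the model names given in the PR description.
const FALLBACK_MODELS: &[&str] = &[
    "summarization-v1",
    "deepseek-ai/DeepSeek-V3-0324",
    "deepseek-ai/DeepSeek-V3",
];

/// Phrase quoted in the PR description. Storing it in a constant is an
/// assumption of this sketch.
const UNAVAILABLE_PHRASE: &str = "not available for your organization";

/// Returns true only when the formatted error carries the explicit phrase.
/// Assumption: the error has already been formatted into a string, and the
/// match is case-sensitive.
fn is_model_unavailable_error(formatted_error: &str) -> bool {
    formatted_error.contains(UNAVAILABLE_PHRASE)
}

/// Hypothetical helper: the fallback models to try, in order, skipping the
/// configured model so it is never requested twice.
fn fallback_candidates(configured: &str) -> Vec<&'static str> {
    FALLBACK_MODELS
        .iter()
        .copied()
        .filter(|m| *m != configured)
        .collect()
}
```

With these definitions, a configured model of `summarization-v1` would leave only the two DeepSeek entries as candidates, while an unlisted model such as `deepseek-ai/DeepSeek-V4-Flash` would leave all three.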

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case)
  • Diff coverage >= 80% — 4 new tests cover all detection + fallback logic
  • Coverage matrix updated — N/A: bugfix-only
  • All affected feature IDs from the matrix are listed — N/A: bugfix
  • No new external network dependencies introduced
  • Manual smoke checklist updated — N/A: internal memory pipeline
  • Linked issue closed via Closes #NNN in the Related section

Impact

  • Desktop only (Tauri sidecar). No web/mobile/CLI impact.
  • Users whose org lacks the configured model now get automatic fallback instead of hard failure.
  • No performance regression — fallback only triggers on 404, adds one extra HTTP call per unavailable model.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

  • N/A (human PR)

Summary by CodeRabbit

  • Bug Fixes

    • Chat now automatically retries and falls back to alternative models when the configured model is unavailable, and surfaces clearer errors for other failures.
  • Tests

    • Added tests for unavailable-model detection, errors that must not trigger fallback, generic 404s, and validation of the fallback list.
  • Documentation

    • Updated comments to describe the new retry-and-fallback behavior.

Review Change Stack

@Sathvik-1007 Sathvik-1007 requested a review from a team May 14, 2026 05:47
@coderabbitai
Contributor

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e3b69d01-5437-405f-8480-ece786077004

📥 Commits

Reviewing files that changed from the base of the PR and between 9af4136 and 1dbb797.

📒 Files selected for processing (1)
  • src/openhuman/memory/tree/chat/cloud.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/memory/tree/chat/cloud.rs

📝 Walkthrough

Walkthrough

CloudChatProvider::chat_for_json now tries the configured model, and on "not available for your organization" errors iterates an ordered FALLBACK_MODELS list (skipping duplicates), returning on first success or bailing with an explicit message if all models are unavailable.
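The control flow described in the walkthrough might look roughly like the sketch below. It is an assumption-laden stand-in rather than the merged implementation: `chat_with_fallback` is a hypothetical free function, the `try_model` closure replaces the real single-model backend call, errors are plain `String`s, `eprintln!` stands in for the repository's logging macros, and stopping on the first non-unavailability error from a fallback is a choice made for this example.

```rust
const FALLBACK_MODELS: &[&str] = &[
    "summarization-v1",
    "deepseek-ai/DeepSeek-V3-0324",
    "deepseek-ai/DeepSeek-V3",
];

fn is_model_unavailable_error(msg: &str) -> bool {
    msg.contains("not available for your organization")
}

/// Tries `configured` first, then each fallback in order.
/// `try_model` is a hypothetical stand-in for the single-model backend call.
fn chat_with_fallback<F>(configured: &str, mut try_model: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    match try_model(configured) {
        Ok(response) => return Ok(response),
        // Any error other than "model unavailable" is returned unchanged.
        Err(e) if !is_model_unavailable_error(&e) => {
            eprintln!("[cloud-chat] model={configured} failed, no fallback: {e}");
            return Err(e);
        }
        Err(_) => {}
    }

    // Walk the ordered list, skipping the model that was already tried.
    for &fallback in FALLBACK_MODELS.iter().filter(|m| **m != configured) {
        match try_model(fallback) {
            Ok(response) => return Ok(response),
            Err(e) if is_model_unavailable_error(&e) => continue,
            // Assumption: other errors from a fallback end the chain.
            Err(e) => return Err(e),
        }
    }

    eprintln!("[cloud-chat] model={configured} and all fallbacks unavailable");
    Err(format!(
        "configured model {configured} and all fallback models are unavailable"
    ))
}
```

Passing the backend call as a closure keeps the sketch testable without a network: a test can record which models were requested and in what order, and confirm that an unrelated error results in exactly one call.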

Changes

Model Fallback and Unavailability Detection

| Layer / File(s) | Summary |
| --- | --- |
| Model unavailability detection and fallback constants / src/openhuman/memory/tree/chat/cloud.rs | Module documentation updated to describe fallback behavior. Introduces FALLBACK_MODELS ordered list and is_model_unavailable_error helper that classifies errors by matching the "not available for your organization" phrase on the formatted error string. |
| Fallback retry mechanism and try_model helper / src/openhuman/memory/tree/chat/cloud.rs | Private try_model helper wraps single-model calls to the backend provider. chat_for_json now tries the configured model first, then iterates through fallback models in order (skipping duplicates) on unavailability errors, returning on first success or bailing with an explicit message if all models are unavailable. |
| Fallback error detection and list validation tests / src/openhuman/memory/tree/chat/cloud.rs | Unit tests verify is_model_unavailable_error correctly detects and distinguishes unavailability errors from generic 404s and unrelated errors. Tests confirm FALLBACK_MODELS contains summarization-v1 and is non-empty. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant chat_for_json
  participant try_model
  participant OpenHumanBackendProvider

  Caller->>chat_for_json: request summarization (configured model)
  chat_for_json->>try_model: try configured model
  try_model->>OpenHumanBackendProvider: chat_with_history(model)
  OpenHumanBackendProvider-->>try_model: error (model unavailable)
  try_model-->>chat_for_json: Err(Unavailable)
  chat_for_json->>chat_for_json: is_model_unavailable_error? true
  chat_for_json->>try_model: try fallback model A
  try_model->>OpenHumanBackendProvider: chat_with_history(fallback A)
  OpenHumanBackendProvider-->>try_model: Ok(response)
  try_model-->>chat_for_json: Ok(response)
  chat_for_json-->>Caller: return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hop through models, one then two,
If one won't answer, I'll try a few.
Summaries waiting at each little door,
I'll nudge the fallbacks till answers pour.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding a fallback model chain for unavailable GMI models, which is the core objective of this PR. |
| Linked Issues check | ✅ Passed | The PR implements the fallback chain approach from issue #1598: it tries the configured model and, when that model is unavailable, retries with known alternatives before failing. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing the fallback chain mechanism in CloudChatProvider for the configured GMI model, directly addressing the linked issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/openhuman/memory/tree/ingest.rs (1)

556-561: 💤 Low value

Consider tightening the ASCII test assertion.

For pure ASCII input, every byte is already a char boundary, so floor_char_boundary will return exactly len - 2048, producing a result of exactly 2048 bytes. The current range assertion preview.len() <= 2048 + 3 is correct but looser than necessary for this test case.

🔍 Suggested refinement for test precision
 fn body_preview_long_ascii_truncates_to_trailing_bytes() {
     let long = "A".repeat(4096);
     let preview = super::build_body_preview(&long);
-    assert!(preview.len() >= 2048);
-    assert!(preview.len() <= 2048 + 3); // at most 3 extra bytes from boundary rounding
+    // Pure ASCII: every byte is a char boundary, so we get exactly 2048 bytes
+    assert_eq!(preview.len(), 2048);
 }
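The boundary reasoning behind this nitpick can be illustrated with a small stand-in. `trailing_preview` below is a hypothetical simplification, not the repository's `build_body_preview`: it assumes the preview is the last 2048 bytes of the body, with the start index rounded down to a UTF-8 char boundary (the rounding the comment attributes to `floor_char_boundary`, written here as a manual `is_char_boundary` loop so it does not depend on a specific toolchain version).

```rust
/// Assumed preview budget in bytes, taken from the review comment.
const PREVIEW_BYTES: usize = 2048;

/// Hypothetical stand-in for build_body_preview: keeps the trailing
/// PREVIEW_BYTES of `body`, moving the cut point back to the nearest
/// char boundary so the slice is always valid UTF-8.
fn trailing_preview(body: &str) -> &str {
    if body.len() <= PREVIEW_BYTES {
        return body;
    }
    let mut start = body.len() - PREVIEW_BYTES;
    // Index 0 is always a boundary, so this loop terminates.
    while !body.is_char_boundary(start) {
        start -= 1;
    }
    &body[start..]
}
```

For pure ASCII input the loop never runs, so the preview is exactly 2048 bytes. For multi-byte input the cut point can move back by at most 3 bytes, because a UTF-8 scalar value occupies at most 4 bytes, which is where the `2048 + 3` upper bound in the original assertion comes from.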
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/tree/ingest.rs` around lines 556 - 561, The test
body_preview_long_ascii_truncates_to_trailing_bytes is too loose for ASCII
input; since ASCII characters are single-byte, build_body_preview(long) should
produce exactly 2048 bytes. Update the assertion in that test (function
body_preview_long_ascii_truncates_to_trailing_bytes) to assert equality
(preview.len() == 2048) instead of the current range check, keeping references
to build_body_preview and the ASCII long string setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory/tree/chat/cloud.rs`:
- Around line 129-136: Add a grep-friendly tracing/log line before each terminal
return in the cloud chat error branches: in the Err(e) case inside the cloud
chat request closure (the block that calls Err(e).with_context and references
prompt.kind and self.model) and in the final "all fallbacks unavailable" exit
path; use tracing::debug or tracing::trace to log a concise, searchable message
that includes the failure context (prompt.kind, self.model, and the
error/summary) right before returning so the terminal branches follow the repo
logging standard.
- Around line 33-35: The current is_model_unavailable_error function treats any
message containing "model" and "404" as an unavailable-model case which is too
broad; update is_model_unavailable_error to only return true for the explicit
"not available for your organization" phrase OR for error strings that clearly
indicate the model resource was not found/unavailable (e.g., match a tighter
pattern such as "model.*not found" or provider-specific "model .* not found|does
not exist|is not available" using a regex or explicit substrings) rather than
any generic "404" alongside "model", and add a unit test that verifies a generic
404 error message (containing "404" and "model" but not the precise unavailable
phrasing) returns false to prevent regression.
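The second inline comment can be illustrated by contrasting a broad check with a tight one. Both functions below are hypothetical reconstructions for illustration: `is_unavailable_broad` approximates the behavior the review describes (any message containing "model" and "404"), and `is_unavailable_tight` matches only the explicit phrase, which is the simpler of the options the comment offers. The sample error strings are invented for the example.

```rust
/// Hypothetical reconstruction of the over-broad check the review flags:
/// any message mentioning both "model" and "404" counts as unavailable.
fn is_unavailable_broad(msg: &str) -> bool {
    msg.contains("not available for your organization")
        || (msg.contains("model") && msg.contains("404"))
}

/// Tightened check: only the explicit phrase triggers the fallback chain.
fn is_unavailable_tight(msg: &str) -> bool {
    msg.contains("not available for your organization")
}

/// Invented sample messages, used only to show where the two checks differ.
const GENERIC_404: &str = "HTTP 404: route for model listing not found";
const ORG_UNAVAILABLE: &str =
    "HTTP 404: model 'deepseek-ai/DeepSeek-V4-Flash' is not available for your organization";
```

On `GENERIC_404` the broad check would start the fallback chain even though no model is actually unprovisioned, while the tight check leaves the error to surface as-is. Both checks agree on `ORG_UNAVAILABLE`. A negative test on a message like `GENERIC_404` is the regression guard the comment asks for.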


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ccc73f0a-9ca8-46f0-8f8c-4bbad2d05102

📥 Commits

Reviewing files that changed from the base of the PR and between be21451 and eea811e.

📒 Files selected for processing (2)
  • src/openhuman/memory/tree/chat/cloud.rs
  • src/openhuman/memory/tree/ingest.rs

Comment thread src/openhuman/memory/tree/chat/cloud.rs Outdated
Comment thread src/openhuman/memory/tree/chat/cloud.rs
@graycyrus
Contributor

@Sathvik-1007 please resolve merge conflicts before review.

@Sathvik-1007 Sathvik-1007 force-pushed the fix/cloud-chat-model-fallback branch from eea811e to 9401748 Compare May 14, 2026 07:52
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 14, 2026
When configured cloud_llm_model returns 404 'not available for your
organization', try FALLBACK_MODELS list before failing. Prevents
summarization pipeline from blocking entirely on model provisioning.

- Tighten is_model_unavailable_error to only match explicit phrase
- Add log::warn at all terminal error paths per repo logging standard
- Add negative test: generic 404 must NOT trigger fallback chain

Closes tinyhumansai#1598
@Sathvik-1007 Sathvik-1007 force-pushed the fix/cloud-chat-model-fallback branch 2 times, most recently from 9af4136 to 1dbb797 Compare May 14, 2026 17:18
@senamakel senamakel merged commit c4e9ce7 into tinyhumansai:main May 15, 2026
27 of 44 checks passed


Development

Successfully merging this pull request may close these issues.

GMI model 'deepseek-ai/DeepSeek-V4-Flash' unavailable — summarization fallback needed

3 participants