fix: add provider/model failover to streaming LLM calls by Sathvik-1007 · Pull Request #2022 · tinyhumansai/openhuman

Sathvik-1007 · 2026-05-17T20:23:24Z

Summary

stream_chat_with_system() in ReliableProvider only tried the first streaming-capable provider with the first model in the chain. Any transient error (rate limit, timeout, 503) propagated to the caller immediately, whereas the non-streaming methods (chat_with_system, chat, chat_with_tools) already had full retry, provider failover, and model fallback.

Changes

- Pre-create streams for all provider+model candidates (full model chain × all streaming providers)
- Iterate candidates in a spawned task; commit to the first stream that yields a successful first chunk
- On transient failure, apply exponential backoff, then try the next candidate
- On non-retryable error, skip backoff and move to the next candidate immediately
- If all candidates are exhausted, emit a single StreamError::Provider with a clear message
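
The steps above can be sketched with plain iterators standing in for the async streams. This is a minimal, synchronous illustration under stated assumptions, not the code in reliable.rs: `SketchError`, `Candidate`, and `failover` are hypothetical names, an empty stream is assumed to count as a failure, and the backoff delays are recorded rather than slept so the control flow is easy to inspect.

```rust
use std::iter::Peekable;
use std::time::Duration;

/// Hypothetical stand-in for the crate's stream error type.
#[derive(Debug, Clone, PartialEq)]
enum SketchError {
    Transient(String),
    Fatal(String),
    Provider(String),
}

type Chunk = Result<String, SketchError>;
type ChunkIter = Box<dyn Iterator<Item = Chunk>>;

/// One provider+model pair with its (pre-created) chunk source.
struct Candidate {
    provider: String,
    model: String,
    chunks: ChunkIter,
}

/// Returns the first candidate whose first chunk is Ok, plus the backoff
/// delays that a real implementation would have slept along the way.
fn failover(
    candidates: Vec<Candidate>,
    base_backoff: Duration,
    max_backoff: Duration,
) -> (Result<Peekable<ChunkIter>, SketchError>, Vec<Duration>) {
    let mut backoff = base_backoff;
    let mut waits = Vec::new();
    let mut failed = Vec::new();

    for Candidate { provider, model, chunks } in candidates {
        let mut chunks = chunks.peekable();
        // Commit as soon as a candidate yields a successful first chunk.
        if matches!(chunks.peek(), Some(Ok(_))) {
            return (Ok(chunks), waits);
        }
        // Only transient failures incur a (doubling, capped) backoff.
        if matches!(chunks.peek(), Some(Err(SketchError::Transient(_)))) {
            waits.push(backoff);
            backoff = (backoff * 2).min(max_backoff);
        }
        // Non-retryable errors and empty streams move on immediately.
        failed.push(format!("{}/{}", provider, model));
    }

    // Every candidate failed: report one aggregated provider error.
    let message = format!("all streaming candidates failed: {}", failed.join(", "));
    (Err(SketchError::Provider(message)), waits)
}
```

Because the committed iterator is returned still peeked, the first chunk is not lost: the caller sees it as the first item when it starts consuming.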

Testing

- All 53 reliable::tests pass
- All 201 providers tests pass
- cargo check clean (no new warnings)

Closes #1931

Summary by CodeRabbit

Bug Fixes
- Improved streaming reliability with robust failover across multiple providers and models.
- Better classification of streaming errors to distinguish retryable vs non-retryable failures.
- Added exponential backoff for retry attempts to increase resilience and recovery.
- Returns immediate, clear error when streaming is disabled or no providers support streaming.
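
To make the retryable vs non-retryable split concrete, here is one way such a classifier could look. The walkthrough names a helper `is_stream_error_non_retryable(&StreamError)`, but its body and the real `StreamError` variants are not shown on this page, so the enum below (`SketchStreamError`) and the status-code policy are assumptions for illustration only.

```rust
/// Hypothetical error shape; the real StreamError variants may differ.
#[derive(Debug)]
enum SketchStreamError {
    Http { status: u16 },
    Timeout,
    Provider(String),
}

/// Assumed policy: client errors (4xx) are permanent, except 408 (request
/// timeout) and 429 (rate limit); server errors and timeouts are transient.
fn is_stream_error_non_retryable(err: &SketchStreamError) -> bool {
    match err {
        SketchStreamError::Http { status } => {
            (400..500).contains(status) && *status != 408 && *status != 429
        }
        SketchStreamError::Timeout => false,
        // Opaque provider messages are treated as retryable in this sketch.
        SketchStreamError::Provider(_) => false,
    }
}
```

Matching on the typed error, rather than on its string form, keeps the decision independent of how the message happens to be worded.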

coderabbitai · 2026-05-17T20:23:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 360e53c2-8fe6-4419-b5c8-2fdec9b819f8

📥 Commits

Reviewing files that changed from the base of the PR and between d2fd505 and 9008933.

📒 Files selected for processing (1)

src/openhuman/inference/provider/reliable.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/openhuman/inference/provider/reliable.rs

📝 Walkthrough


ReliableProvider::stream_chat_with_system now performs provider+model streaming failover: it pre-creates candidate streams, spawns a task that peeks the first chunk to decide commit vs retry (with exponential backoff using is_stream_error_non_retryable), forwards committed chunks via an mpsc channel, and returns a BoxStream built from the receiver.
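
The peek-then-forward shape described here can be illustrated with the standard library alone. In this rough sketch a thread stands in for the spawned async task and `std::sync::mpsc` for the async channel; the function name `stream_with_failover` and the string-based chunk type are hypothetical, and retries and backoff are left out to keep the channel hand-off in focus.

```rust
use std::sync::mpsc;
use std::thread;

type Chunk = Result<String, String>;

/// Spawns a worker that picks the first candidate whose first chunk is Ok
/// and forwards that candidate's chunks to the returned receiver.
fn stream_with_failover(candidates: Vec<Vec<Chunk>>) -> mpsc::Receiver<Chunk> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for candidate in candidates {
            let mut chunks = candidate.into_iter().peekable();
            // Peek without consuming, so the first chunk is still forwarded.
            if matches!(chunks.peek(), Some(Ok(_))) {
                for chunk in chunks {
                    // A send error means the consumer hung up; stop early.
                    if tx.send(chunk).is_err() {
                        return;
                    }
                }
                return; // committed: never fall through to other candidates
            }
        }
        // No candidate produced a healthy first chunk.
        let _ = tx.send(Err("all streaming candidates failed".to_string()));
    });
    rx
}
```

The caller only ever sees the receiver, so it cannot tell which candidate was chosen. Dropping the sender when the worker returns is what ends the consumer's iteration.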

Changes

Provider/model failover for streaming requests

| Layer / File(s) | Summary |
| --- | --- |
| Streaming failover implementation `src/openhuman/inference/provider/reliable.rs` | Adds `is_stream_error_non_retryable(&StreamError)` and rewrites `ReliableProvider::stream_chat_with_system` to return an immediate error stream when disabled or unsupported, build provider×model candidate streams, spawn a failover task that peeks the first chunk to commit or retry (with exponential backoff up to max_retries), forward committed-stream chunks via an `mpsc` channel, and construct the returned `BoxStream` with `stream::unfold` over the channel receiver. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

tinyhumansai/openhuman#1723: Related work around wrapping backends with ReliableProvider for retry behavior; interacts with the streaming failover changes here.

Suggested reviewers

senamakel

Poem

🐰 I peek the first hop, then choose who to trust,
Spawned tasks hum softly and backoff is just—
Streams find their home through channels so neat,
Chunks hop along on resilient feet. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding provider/model failover to streaming calls, which directly addresses the bug fix objective. |
| Linked Issues check | ✅ Passed | The PR successfully implements all coding requirements from `#1931`: retry logic with exponential backoff, provider/model failover for transient errors, and parity with non-streaming behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the streaming failover implementation in reliable.rs with no unrelated alterations to exported APIs or external behavior. |


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/inference/provider/reliable.rs`:
- Around line 996-1011: The current branch converts the StreamError to a string
and wraps it in anyhow, which loses type information and prevents
is_non_retryable from downcasting; instead, preserve and inspect the original
StreamError: in the Some(Err(ref e)) arm of the stream handling, match on
StreamError's variants (or add/call a getter like StreamError::source_error()
or StreamError::is_non_retryable()) to extract the inner anyhow::Error or
underlying reqwest::Error and pass that to is_non_retryable (or call the new
StreamError::is_non_retryable helper), while keeping the tracing::warn message
(use e.to_string() only for the log text). Update the branch around
provider_name/current_model to use the original error value for retry
classification rather than the stringified error.
- Around line 980-1023: The streaming failover loop over candidate_streams only
tries each (provider_name, current_model, candidate_stream) once, so transient
errors immediately fail over; add an inner retry loop (using max_retries and
backoff_ms similar to the non-streaming path) that attempts to re-open/advance
the same candidate stream up to max_retries before moving to the next candidate;
on rate-limit/rotate-worthy errors call rotate_key(), record each failure with
push_failure (and use format_failure_aggregate when sending the aggregated error
downstream), honor is_non_retryable to skip retries, and keep sending successful
chunks to tx as currently implemented; ensure backoff_ms is reset/managed
per-candidate and that failure aggregation is attached to the final error sent
when all retries for a candidate are exhausted.
- Around line 961-973: The current code eagerly builds candidate_streams by
calling stream_chat_with_system for every provider/model pair (using
candidate_streams, stream_chat_with_system, model_chain, streaming_providers),
which opens connections up-front; instead, change to collect a Vec of
lightweight candidates (e.g., tuples of provider_name, model, and an
Arc/cloneable reference to the provider) and move the call to
stream_chat_with_system into the spawned failover task where you attempt each
candidate in order, creating the stream only when you start that attempt so
unused providers are never contacted.
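
The per-candidate retry loop suggested in the second comment could look roughly like the following. This is a sketch under assumptions, not the project's implementation: `AttemptError`, `try_candidates`, and the injected `open` and `sleep` closures are hypothetical, the failure list only approximates what helpers such as push_failure and format_failure_aggregate might record, and key rotation is omitted.

```rust
use std::time::Duration;

/// Hypothetical classification of a failed attempt to open a stream.
#[derive(Debug, Clone, PartialEq)]
enum AttemptError {
    Retryable(String),
    NonRetryable(String),
}

/// Tries each (provider, model) pair up to `max_retries + 1` times.
/// `open` returns the first chunk on success; `sleep` is injected so the
/// backoff can be observed (or skipped) by the caller.
fn try_candidates<O, S>(
    candidates: &[(&str, &str)],
    max_retries: u32,
    base_backoff: Duration,
    mut open: O,
    mut sleep: S,
) -> Result<String, Vec<String>>
where
    O: FnMut(&str, &str) -> Result<String, AttemptError>,
    S: FnMut(Duration),
{
    let mut failures = Vec::new();
    for &(provider, model) in candidates {
        let mut backoff = base_backoff; // reset for every candidate
        for attempt in 0..=max_retries {
            match open(provider, model) {
                Ok(first_chunk) => return Ok(first_chunk),
                Err(AttemptError::NonRetryable(msg)) => {
                    failures.push(format!("{}/{}: {} (non-retryable)", provider, model, msg));
                    break; // retrying would not help; go to next candidate
                }
                Err(AttemptError::Retryable(msg)) => {
                    failures.push(format!("{}/{}: {}", provider, model, msg));
                    if attempt < max_retries {
                        sleep(backoff);
                        backoff *= 2; // exponential backoff
                    }
                }
            }
        }
    }
    // All candidates exhausted: hand back every recorded failure.
    Err(failures)
}
```

Injecting `sleep` is a design choice for the sketch: tests can record the delays instead of waiting, while an async version would await a timer at the same point.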


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e987385b-e313-4996-b980-669ff048dbdf

📥 Commits

Reviewing files that changed from the base of the PR and between f9de38d and d2fd505.

📒 Files selected for processing (1)

src/openhuman/inference/provider/reliable.rs

coderabbitai · 2026-05-17T20:26:59Z

+        let mut candidate_streams: Vec<(String, String, stream::BoxStream<'static, StreamResult<StreamChunk>>)> = Vec::new();
+        for current_model in &model_chain {
+            for (provider_name, provider) in &streaming_providers {
+                let s = provider.stream_chat_with_system(
+                    system_prompt,
+                    message,
+                    current_model,
+                    temperature,
+                    options,
+                );
+                candidate_streams.push(((*provider_name).clone(), current_model.clone(), s));
            }
+        }


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Pre-creating all candidate streams initiates connections to all providers immediately.

Calling stream_chat_with_system on every provider×model combination upfront may open HTTP connections to all providers before any failover is needed. If the first candidate succeeds, the remaining connections are wasted and may unnecessarily consume rate-limit budgets.

Consider lazily creating streams inside the spawned task. One approach: pass a Vec of (provider_name, model, Arc<dyn Provider>) into the task and call stream_chat_with_system only when attempting that candidate.
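
A small sketch of that lazy approach, under stated assumptions: `SketchProvider` and `FakeProvider` are hypothetical stand-ins for the crate's provider trait, iterators replace async streams, and a counter records how often each provider is actually contacted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

type Chunk = Result<String, String>;

/// Hypothetical, synchronous stand-in for the provider trait.
trait SketchProvider {
    fn stream_chat(&self, model: &str) -> Box<dyn Iterator<Item = Chunk>>;
}

/// Test double that counts how many streams were opened against it.
struct FakeProvider {
    healthy: bool,
    opened: AtomicUsize,
}

impl SketchProvider for FakeProvider {
    fn stream_chat(&self, model: &str) -> Box<dyn Iterator<Item = Chunk>> {
        self.opened.fetch_add(1, Ordering::SeqCst);
        let first = if self.healthy {
            Ok(format!("hello from {}", model))
        } else {
            Err("503".to_string())
        };
        Box::new(std::iter::once(first))
    }
}

/// Candidates carry only names and a shared provider handle; the stream is
/// opened inside the loop, at the moment that candidate is attempted.
fn first_committed(
    candidates: Vec<(String, String, Arc<dyn SketchProvider>)>,
) -> Option<(String, Vec<Chunk>)> {
    for (name, model, provider) in candidates {
        let mut chunks = provider.stream_chat(&model).peekable();
        if matches!(chunks.peek(), Some(Ok(_))) {
            return Some((name, chunks.collect()));
        }
    }
    None
}
```

With this shape, a provider that sits behind a healthy candidate in the list is never called, so it spends no connection or rate-limit budget.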


stream_chat_with_system only tried the first streaming-capable provider with the first model. Transient errors propagated immediately while non-streaming methods had full retry + failover. Now iterates all provider+model candidates with exponential backoff between transient failures, matching non-streaming reliability behavior. Closes tinyhumansai#1931

Sathvik-1007 requested a review from a team May 17, 2026 20:23

coderabbitai Bot requested changes May 17, 2026


Sathvik-1007 force-pushed the fix/streaming-retry-failover branch from d2fd505 to 9008933 Compare May 17, 2026 20:53

senamakel merged commit 3e7fde4 into tinyhumansai:main May 17, 2026
25 checks passed

senamakel merged 1 commit into tinyhumansai:main from Sathvik-1007:fix/streaming-retry-failover
