
fix: add provider/model failover to streaming LLM calls #2022

Merged
senamakel merged 1 commit into
tinyhumansai:mainfrom
Sathvik-1007:fix/streaming-retry-failover
May 17, 2026

Conversation

@Sathvik-1007
Contributor

@Sathvik-1007 Sathvik-1007 commented May 17, 2026

Summary

stream_chat_with_system() in ReliableProvider only tried the first streaming-capable provider with the first model in the chain. Any transient error (rate limit, timeout, 503) propagated immediately — while non-streaming methods (chat_with_system, chat, chat_with_tools) had full retry + provider failover + model fallback.

Changes

  • Pre-create streams for all provider+model candidates (full model chain × all streaming providers)
  • Iterate candidates in a spawned task; commit to the first stream that yields a successful first chunk
  • On transient failure, apply exponential backoff then try next candidate
  • On non-retryable error, skip backoff and move to next candidate immediately
  • If all candidates are exhausted, emit a single StreamError::Provider with a clear message
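
The candidate-selection steps listed above can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the code in reliable.rs: the `StreamError` variants `Transient` and `Fatal`, the `Candidate` alias, `Outcome`, `Verdict`, and `select_stream` are hypothetical names, a `Vec` of results stands in for an async stream, and backoff delays are recorded rather than slept.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the crate's stream error type.
#[derive(Debug, Clone, PartialEq)]
enum StreamError {
    /// Transient failure such as a rate limit, timeout, or 503.
    Transient(String),
    /// Failure that retrying the same candidate will not fix.
    Fatal(String),
    /// Single aggregate error emitted when every candidate has failed.
    Provider(String),
}

/// A Vec of results stands in for an async stream of chunks.
type Chunks = Vec<Result<String, StreamError>>;
/// (provider name, model name, chunks); hypothetical shape.
type Candidate = (String, String, Chunks);

struct Outcome {
    /// The committed candidate, or the aggregate error.
    committed: Result<Candidate, StreamError>,
    /// Delays that would have been slept between attempts.
    backoffs: Vec<Duration>,
}

enum Verdict {
    Commit,
    BackoffThenNext,
    NextNow,
}

fn select_stream(candidates: Vec<Candidate>, base_backoff: Duration) -> Outcome {
    let total = candidates.len();
    let mut backoff = base_backoff;
    let mut backoffs = Vec::new();

    for (provider, model, chunks) in candidates {
        let mut iter = chunks.into_iter().peekable();
        // Peek at the first chunk only; nothing is consumed yet.
        let verdict = match iter.peek() {
            Some(Ok(_)) => Verdict::Commit,
            Some(Err(StreamError::Transient(_))) => Verdict::BackoffThenNext,
            Some(Err(_)) | None => Verdict::NextNow,
        };
        match verdict {
            // Commit: hand back every chunk, including the peeked one.
            Verdict::Commit => {
                return Outcome {
                    committed: Ok((provider, model, iter.collect())),
                    backoffs,
                };
            }
            // Transient failure: record the delay, then double it.
            Verdict::BackoffThenNext => {
                backoffs.push(backoff);
                backoff = backoff.saturating_mul(2);
            }
            // Non-retryable or empty stream: move on without waiting.
            Verdict::NextNow => {}
        }
    }

    Outcome {
        committed: Err(StreamError::Provider(format!(
            "all {} streaming candidates failed",
            total
        ))),
        backoffs,
    }
}
```

Calling `select_stream` with a transient failure, a fatal failure, and a healthy candidate commits to the third and records one backoff delay, since only the transient failure waits.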

Testing

  • All 53 reliable::tests pass
  • All 201 providers tests pass
  • cargo check clean (no new warnings)

Closes #1931

Summary by CodeRabbit

  • Bug Fixes
    • Improved streaming reliability with robust failover across multiple providers and models.
    • Better classification of streaming errors to distinguish retryable vs non-retryable failures.
    • Added exponential backoff for retry attempts to increase resilience and recovery.
    • Returns immediate, clear error when streaming is disabled or no providers support streaming.


@Sathvik-1007 Sathvik-1007 requested a review from a team May 17, 2026 20:23
@coderabbitai
Contributor

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 360e53c2-8fe6-4419-b5c8-2fdec9b819f8

📥 Commits

Reviewing files that changed from the base of the PR and between d2fd505 and 9008933.

📒 Files selected for processing (1)
  • src/openhuman/inference/provider/reliable.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/inference/provider/reliable.rs

📝 Walkthrough

Walkthrough

ReliableProvider::stream_chat_with_system now performs provider+model streaming failover: it pre-creates candidate streams, spawns a task that peeks the first chunk to decide commit vs retry (with exponential backoff using is_stream_error_non_retryable), forwards committed chunks via an mpsc channel, and returns a BoxStream built from the receiver.
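
The channel hand-off described in the walkthrough can be illustrated with a synchronous analogue. The sketch below is an assumption-laden stand-in, not the PR's code: it uses `std::sync::mpsc` and `std::thread` where the walkthrough describes an mpsc channel fed by a spawned task, and `std::iter::from_fn` plays the role that `stream::unfold` plays over the receiver. `Chunk` and `forward_committed` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical chunk type: text on success, an error message otherwise.
type Chunk = Result<String, String>;

/// Forwards the chunks of an already committed stream through a channel
/// and returns the receiving side as an iterator.
fn forward_committed(committed: Vec<Chunk>) -> impl Iterator<Item = Chunk> {
    let (tx, rx) = mpsc::channel::<Chunk>();

    // Background worker: the analogue of the spawned failover task
    // after it has committed to one candidate.
    thread::spawn(move || {
        for chunk in committed {
            // A send error means the consumer dropped the receiver,
            // so there is no point in forwarding further chunks.
            if tx.send(chunk).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Yield one item per received chunk; `recv` returns Err once the
    // sender is dropped and the queue is drained, ending the iterator.
    std::iter::from_fn(move || rx.recv().ok())
}
```

The caller sees only the returned iterator, so which provider and model ended up producing the chunks stays an internal detail of the worker.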

Changes

Provider/model failover for streaming requests

| Layer / File(s) | Summary |
| --- | --- |
| **Streaming failover implementation**<br>`src/openhuman/inference/provider/reliable.rs` | Adds `is_stream_error_non_retryable(&StreamError)` and rewrites `ReliableProvider::stream_chat_with_system` to return an immediate error stream when disabled or unsupported, build provider×model candidate streams, spawn a failover task that peeks the first chunk to commit or retry (with exponential backoff up to `max_retries`), forward committed-stream chunks via an mpsc channel, and construct the returned `BoxStream` with `stream::unfold` over the channel receiver. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1723: Related work around wrapping backends with ReliableProvider for retry behavior; interacts with the streaming failover changes here.

Suggested reviewers

  • senamakel

Poem

🐰 I peek the first hop, then choose who to trust,
Spawned tasks hum softly and backoff is just—
Streams find their home through channels so neat,
Chunks hop along on resilient feet. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding provider/model failover to streaming calls, which directly addresses the bug fix objective. |
| Linked Issues check | ✅ Passed | The PR successfully implements all coding requirements from #1931: retry logic with exponential backoff, provider/model failover for transient errors, and parity with non-streaming behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the streaming failover implementation in reliable.rs with no unrelated alterations to exported APIs or external behavior. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/inference/provider/reliable.rs`:
- Around line 996-1011: The current branch converts the StreamError to a string
and wraps it in anyhow, which loses type information and prevents
is_non_retryable from downcasting; instead, preserve and inspect the original
StreamError: in the Some(Err(ref e)) arm of the stream handling, match on
StreamError's variants (or add/ call a getter like StreamError::source_error()
or StreamError::is_non_retryable()) to extract the inner anyhow::Error or
underlying reqwest::Error and pass that to is_non_retryable (or call the new
StreamError::is_non_retryable helper), while keeping the tracing::warn message
(use e.to_string() only for the log text). Update the branch around
provider_name/current_model to use the original error value for retry
classification rather than the stringified error.
- Around line 980-1023: The streaming failover loop over candidate_streams only
tries each (provider_name, current_model, candidate_stream) once, so transient
errors immediately fail over; add an inner retry loop (using max_retries and
backoff_ms similar to the non-streaming path) that attempts to re-open/advance
the same candidate stream up to max_retries before moving to the next candidate;
on rate-limit/rotate-worthy errors call rotate_key(), record each failure with
push_failure (and use format_failure_aggregate when sending the aggregated error
downstream), honor is_non_retryable to skip retries, and keep sending successful
chunks to tx as currently implemented; ensure backoff_ms is reset/managed
per-candidate and that failure aggregation is attached to the final error sent
when all retries for a candidate are exhausted.
- Around line 961-973: The current code eagerly builds candidate_streams by
calling stream_chat_with_system for every provider/model pair (using
candidate_streams, stream_chat_with_system, model_chain, streaming_providers),
which opens connections up-front; instead, change to collect a Vec of
lightweight candidates (e.g., tuples of provider_name, model, and an
Arc/cloneable reference to the provider) and move the call to
stream_chat_with_system into the spawned failover task where you attempt each
candidate in order, creating the stream only when you start that attempt so
unused providers are never contacted.
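
The first two findings above (classifying the typed error instead of its string form, and retrying the same candidate before failing over) might look roughly like the following sketch. Everything here is hypothetical: the `Http` and `Timeout` variants, the status-code rule inside `is_non_retryable`, and the `try_candidate` helper are assumptions made for illustration. Key rotation and the `push_failure` / `format_failure_aggregate` bookkeeping mentioned in the review are not modeled, and delays are recorded instead of slept.

```rust
/// Hypothetical typed stream error; the real variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum StreamError {
    Http { status: u16 },
    Timeout,
    Provider(String),
}

impl StreamError {
    /// Classify on the typed error so no information is lost to a
    /// string conversion. The rule is an assumption: 4xx responses
    /// other than 408 and 429 are treated as non-retryable.
    fn is_non_retryable(&self) -> bool {
        match self {
            StreamError::Http { status } => {
                (400..500).contains(status) && *status != 408 && *status != 429
            }
            StreamError::Timeout => false,
            StreamError::Provider(_) => true,
        }
    }
}

/// Tries one candidate up to `max_retries` extra times.
/// `open(attempt)` stands in for opening the stream and reading its
/// first chunk. Returns the first chunk or the collected failures,
/// plus the backoff delays (in ms) that would have been slept.
fn try_candidate<F>(
    mut open: F,
    max_retries: u32,
    base_backoff_ms: u64,
) -> (Result<String, Vec<StreamError>>, Vec<u64>)
where
    F: FnMut(u32) -> Result<String, StreamError>,
{
    let mut failures = Vec::new();
    let mut delays = Vec::new();
    // Backoff starts fresh for every candidate.
    let mut backoff_ms = base_backoff_ms;

    for attempt in 0..=max_retries {
        match open(attempt) {
            Ok(first_chunk) => return (Ok(first_chunk), delays),
            Err(e) => {
                let fatal = e.is_non_retryable();
                failures.push(e);
                // Stop on a non-retryable error or when retries run out.
                if fatal || attempt == max_retries {
                    break;
                }
                delays.push(backoff_ms);
                backoff_ms = backoff_ms.saturating_mul(2);
            }
        }
    }
    (Err(failures), delays)
}
```

With `max_retries = 3` and a base delay of 50 ms, a candidate that returns 503 twice and then succeeds yields its first chunk after recorded delays of 50 and 100 ms, while a 401 fails immediately with no delay.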

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e987385b-e313-4996-b980-669ff048dbdf

📥 Commits

Reviewing files that changed from the base of the PR and between f9de38d and d2fd505.

📒 Files selected for processing (1)
  • src/openhuman/inference/provider/reliable.rs

Comment on lines +961 to +973
    let mut candidate_streams: Vec<(String, String, stream::BoxStream<'static, StreamResult<StreamChunk>>)> = Vec::new();
    for current_model in &model_chain {
        for (provider_name, provider) in &streaming_providers {
            let s = provider.stream_chat_with_system(
                system_prompt,
                message,
                current_model,
                temperature,
                options,
            );
            candidate_streams.push(((*provider_name).clone(), current_model.clone(), s));
        }
    }
Contributor


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Pre-creating all candidate streams initiates connections to all providers immediately.

Calling stream_chat_with_system on every provider×model combination upfront may open HTTP connections to all providers before any failover is needed. If the first candidate succeeds, the remaining connections are wasted and may unnecessarily consume rate-limit budgets.

Consider lazily creating streams inside the spawned task. One approach: pass a Vec of (provider_name, model, Arc<dyn Provider>) into the task and call stream_chat_with_system only when attempting that candidate.
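
A minimal sketch of that lazy approach follows, assuming a simplified, synchronous `Provider` trait. The trait, its `open_stream` method, `FakeProvider`, and `first_working` are hypothetical and exist only to show that a provider is contacted when its turn comes and not before; the real trait and its streaming method have a different signature.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical, simplified provider trait for this sketch.
trait Provider {
    /// Stands in for opening a streaming request to the provider.
    fn open_stream(&self, model: &str) -> Result<Vec<String>, String>;
}

/// Test double that counts how often it was contacted.
struct FakeProvider {
    fail: bool,
    opened: AtomicUsize,
}

impl Provider for FakeProvider {
    fn open_stream(&self, model: &str) -> Result<Vec<String>, String> {
        self.opened.fetch_add(1, Ordering::SeqCst);
        if self.fail {
            Err(format!("{} unavailable", model))
        } else {
            Ok(vec![format!("chunk from {}", model)])
        }
    }
}

/// Lightweight candidate: names plus a shared handle, no open stream.
type LazyCandidate = (String, String, Arc<dyn Provider>);

/// Opens candidates one at a time, in order, and stops at the first
/// that succeeds. Later candidates are never contacted.
fn first_working(candidates: &[LazyCandidate]) -> Option<(String, Vec<String>)> {
    candidates.iter().find_map(|(name, model, provider)| {
        provider
            .open_stream(model)
            .ok()
            .map(|chunks| (name.clone(), chunks))
    })
}
```

Because `find_map` is lazy, a candidate list of failing, healthy, healthy contacts the first two providers once each and leaves the third untouched, which is the behavior the comment asks for.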


Comment thread src/openhuman/inference/provider/reliable.rs
Comment thread src/openhuman/inference/provider/reliable.rs Outdated
stream_chat_with_system only tried the first streaming-capable
provider with the first model. Transient errors propagated
immediately while non-streaming methods had full retry + failover.

Now iterates all provider+model candidates with exponential
backoff between transient failures, matching non-streaming
reliability behavior.

Closes tinyhumansai#1931
@Sathvik-1007 Sathvik-1007 force-pushed the fix/streaming-retry-failover branch from d2fd505 to 9008933 Compare May 17, 2026 20:53
@senamakel senamakel merged commit 3e7fde4 into tinyhumansai:main May 17, 2026
25 checks passed

Development

Successfully merging this pull request may close these issues.

Bug: Streaming LLM calls have no retry/failover (providers/reliable.rs)

2 participants