fix(inference): preserve reasoning_content across multi-turn conversations (#2800) #2806

Closed
staimoorulhassan wants to merge 2 commits into tinyhumansai:main from staimoorulhassan:fix/reasoning-content-multi-turn

Conversation

@staimoorulhassan
Contributor

@staimoorulhassan staimoorulhassan commented May 28, 2026

Summary

Closes #2800.

Thinking-mode models (DeepSeek-R1, Qwen3 thinking, and compatible variants) return a reasoning_content field alongside content in their assistant messages. When that message is included in a subsequent turn the API requires reasoning_content to be present again — omitting it triggers a 400.

Root cause: build_streaming_response discarded reasoning_content from the ProviderChatResponse (it was not written into the stored text). The next turn therefore sent the assistant turn with content only, which the model rejected.

Changes:

  • compatible.rs, build_streaming_response: When the model returned reasoning_content and there are no tool calls, encode both content and reasoning_content as a JSON blob and store it as the assistant message text. Adds a debug! log line for observability.
  • compatible.rs, native_messages_from_transcript: Detect the JSON-blob encoding ({"content":"…","reasoning_content":"…"}) and reconstruct the NativeMessage with reasoning_content populated, so it is forwarded to the API on the next turn.
  • compatible_types.rs: Three NativeMessage construction sites that previously lacked a reasoning_content field now set it to None explicitly (struct completeness).
  • compatible_tests.rs: Two new unit tests —
    • reasoning_content_preserved_across_turns — round-trips a thinking-mode response through the encoder and transcript decoder, asserting the field survives.
    • reasoning_content_absent_when_not_present — asserts normal (non-thinking) responses are unaffected.

Test plan

  • cargo test -p openhuman -- inference::provider::compatible_tests passes (two new tests + existing suite unchanged)
  • Multi-turn chat with a DeepSeek-R1 or Qwen3-thinking model no longer returns 400 on the second turn
  • Non-thinking models (no reasoning_content in response) are unaffected — text stored as plain string as before


Summary by CodeRabbit

  • Bug Fixes
    • Preserve and replay assistant "reasoning" output across turns for thinking models with OpenAI-compatible providers, ensuring continuity when both visible content and reasoning are returned.
  • Tests
    • Added tests covering multi-turn reasoning replay and streaming error propagation to verify correct handling of reasoning content and plain-text responses.

…tions

Thinking-mode models (DeepSeek-R1, Qwen3, etc.) return a reasoning_content
field that the API requires to be passed back verbatim on subsequent turns.
Previously the field was discarded after the first response, causing every
follow-up turn to fail with 400 "reasoning_content must be passed back".

Two changes together fix the round-trip:

1. parse_native_response: when the response has reasoning_content and no
   tool calls, encode both content and reasoning_content as a JSON object
   in the returned text (matching the existing pattern for tool-call
   messages). The JSON is transparent to callers — they store it as the
   ChatMessage content unchanged.

2. convert_messages_for_native: detect the {"content":..,"reasoning_content":..}
   JSON envelope in stored assistant messages and unpack it into a
   NativeMessage with the reasoning_content field set. This ensures the
   field appears in the outbound API payload on the next turn.

NativeMessage gains a new optional reasoning_content field with
skip_serializing_if = "Option::is_none" so it is only emitted when present
and never breaks vanilla providers that do not understand the field.
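
To make the "only emitted when present" behaviour concrete, here is a minimal sketch that uses only the standard library. It is not the PR's code: the `NativeMessage` shape, `json_escape`, and `to_wire_json` are hypothetical stand-ins, and the hand-written serializer merely imitates what `skip_serializing_if = "Option::is_none"` is described as doing above.

```rust
/// Hypothetical stand-in for the provider's NativeMessage (field names follow the PR text).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct NativeMessage {
    pub role: String,
    pub content: String,
    pub reasoning_content: Option<String>,
}

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Assumed wire format: the `reasoning_content` key is written only when the
/// field is `Some`, so providers that do not know the field never see it.
pub fn to_wire_json(msg: &NativeMessage) -> String {
    let mut fields = vec![
        format!("\"role\":\"{}\"", json_escape(&msg.role)),
        format!("\"content\":\"{}\"", json_escape(&msg.content)),
    ];
    if let Some(reasoning) = &msg.reasoning_content {
        fields.push(format!(
            "\"reasoning_content\":\"{}\"",
            json_escape(reasoning)
        ));
    }
    format!("{{{}}}", fields.join(","))
}
```

With this sketch, a message whose `reasoning_content` is `None` serializes to an object with only `role` and `content`, while a thinking-model message carries the extra key.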

Four unit tests cover: response preservation, plain-text pass-through,
round-trip restoration in convert_messages_for_native, and no spurious
field for non-thinking models.

Fixes tinyhumansai#2800
@staimoorulhassan staimoorulhassan requested a review from a team May 28, 2026 04:35
@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ed9282e0-30f7-4131-ac88-d27ae7f3fd4e

📥 Commits

Reviewing files that changed from the base of the PR and between f6ec9b1 and b129db8.

📒 Files selected for processing (1)
  • src/openhuman/inference/provider/compatible_tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/inference/provider/compatible_tests.rs

📝 Walkthrough

The PR preserves and replays thinking-model reasoning_content by adding an optional reasoning_content field to NativeMessage, extracting JSON-encoded reasoning during outbound conversion, encoding both content and reasoning_content into response text when present, and adding tests that validate the multi-turn replay behavior.

Changes

Thinking model reasoning_content multi-turn replay

| Layer / File(s) | Summary |
|---|---|
| NativeMessage reasoning_content field (src/openhuman/inference/provider/compatible_types.rs) | NativeMessage adds an optional reasoning_content: Option<String> field with Serde skip-if-none handling to carry thinking output from upstream responses. |
| Assistant message reasoning_content extraction (src/openhuman/inference/provider/compatible.rs) | convert_messages_for_native now deserializes JSON-encoded reasoning_content from stored assistant message content into NativeMessage.reasoning_content; tool and other message conversions explicitly set reasoning_content to None. |
| Native response reasoning_content preservation (src/openhuman/inference/provider/compatible.rs) | parse_native_response encodes both content and reasoning_content into a single JSON string when reasoning_content is present and no tool calls exist, preserving the field for subsequent turns. |
| Reasoning replay validation tests (src/openhuman/inference/provider/compatible_tests.rs) | New tests (#2800) validate that thinking-model responses with reasoning_content emit JSON text for replay, and that message conversion restores JSON-encoded reasoning_content into NativeMessage.reasoning_content while non-thinking models remain unaffected. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant OpenAiCompatibleProvider
  participant ThinkingModel
  participant ThreadStorage
  Client->>OpenAiCompatibleProvider: Turn 1 request
  OpenAiCompatibleProvider->>ThinkingModel: convert_messages_for_native (no prior reasoning)
  ThinkingModel->>OpenAiCompatibleProvider: response with content + reasoning_content
  OpenAiCompatibleProvider->>OpenAiCompatibleProvider: parse_native_response (encode content+reasoning into JSON text)
  OpenAiCompatibleProvider->>ThreadStorage: store message text (JSON with reasoning)
  Client->>OpenAiCompatibleProvider: Turn 2 request
  ThreadStorage->>OpenAiCompatibleProvider: load prior message (JSON text)
  OpenAiCompatibleProvider->>OpenAiCompatibleProvider: convert_messages_for_native (decode JSON, restore reasoning_content)
  OpenAiCompatibleProvider->>ThinkingModel: send message with reasoning_content restored

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A rabbit hops through thinking's maze,
Each thought preserved in encoded haze,
Turn to turn the reasoning stays,
No missing clues in future days,
Replayed thoughts light up the ways!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: preserving reasoning_content across multi-turn conversations, which is the core objective of the PR. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from issue #2800: adds reasoning_content field to NativeMessage, implements encoding/decoding in parse_native_response and convert_messages_for_native, includes comprehensive unit tests, and prevents 400 errors on multi-turn thinking model conversations. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to preserving reasoning_content in the OpenAI-compatible provider: field addition to NativeMessage, JSON encoding/decoding logic, and corresponding unit tests covering the round-trip behavior. |


@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. bug labels May 28, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 28, 2026
@oxoxDev oxoxDev self-assigned this May 28, 2026
Contributor

@oxoxDev oxoxDev left a comment

Author: @staimoorulhassan (FIRST_TIME_CONTRIBUTOR — external contributor) — welcome and thanks for picking this up!

⚠ Duplicate situation (3 PRs targeting same bug)

Three open PRs address the same Sentry issue / #2800:

| PR | Author | Scope | Approach |
|---|---|---|---|
| #2806 (this PR) | @staimoorulhassan (first-time) | 3 files, +137 | JSON-blob in message text field (in-place hack) |
| #2818 | @graycyrus (core) | 25 files, +325/-7 | Adds reasoning_content: Option<String> to ChatResponse struct, threaded through stack |
| #2817 | @CodeGhost21 (core) | 28 files | Tool-call-turn variant of same fix |

#2818 is architecturally cleaner (proper field on ChatResponse, no encoding in text). #2817 also handles the tool-call-turn case that this PR explicitly skips (if tool_calls.is_empty() gate at compatible.rs:843).

Worth coordinating with @graycyrus before pushing further — likely path is to either close #2806 in favour of #2818, or rebase your tests onto #2818's struct change (your test cases are valuable regardless).

Concerns (independent of duplicate question)

Major

1. src/openhuman/inference/provider/compatible.rs:843 — tool-call turns LOSE reasoning_content

The gate if tool_calls.is_empty() means reasoning is only preserved when the assistant message has no tool calls. Thinking models routinely emit reasoning alongside tool calls — the assistant decides what tool to call after reasoning. That entire path drops reasoning_content. #2817 by @CodeGhost21 specifically addresses this case. Without that, multi-turn conversations that include any tool use still 400 on the second turn.
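
A small sketch of the behaviour this concern describes, under stated assumptions: `ProviderResponse` and both helper functions are hypothetical names invented for illustration, not code from the PR or from #2817. It only contrasts "preserve reasoning when there are no tool calls" with "preserve reasoning regardless".

```rust
/// Hypothetical, simplified provider response (not the real ProviderChatResponse).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderResponse {
    pub content: String,
    pub reasoning_content: Option<String>,
    pub tool_calls: Vec<String>,
}

/// Gated variant, as the review describes the PR: reasoning survives only
/// when the assistant turn contains no tool calls.
pub fn preserved_with_gate(resp: &ProviderResponse) -> Option<String> {
    if resp.tool_calls.is_empty() {
        resp.reasoning_content.clone()
    } else {
        None // reasoning is silently dropped on tool-call turns
    }
}

/// Ungated variant: reasoning survives whether or not tools were called.
pub fn preserved_without_gate(resp: &ProviderResponse) -> Option<String> {
    resp.reasoning_content.clone()
}
```

For a response that carries both reasoning and a tool call, the gated helper returns `None` while the ungated one returns the reasoning, which is the gap the reviewer is pointing at.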

2. src/openhuman/inference/provider/compatible.rs:846-855 — JSON-blob in text field leaks into UI / transcript / exports

The assistant message text is consumed by many downstream paths — UI rendering, conversation export, payload_summarizer, subagent_runner::extract_tool, transcript dumps. Any of these will now see a raw {"content":"...","reasoning_content":"..."} string instead of plain message text. Risks:

  • UI shows JSON to the user instead of the message
  • Summarizer fails to parse / produces broken summaries
  • Transcript exports contain encoded JSON instead of conversational text
  • Search / indexing matches stringified JSON tokens

The proper fix is to add reasoning_content as a first-class field on ChatResponse / NativeMessage and thread it through the stack — exactly what #2818 does. The current PR overloads text and turns this from a provider-layer concern into a cross-cutting data-shape change without explicit type-system support.
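
As a rough illustration of the first-class-field idea (not the actual change in #2818, which is not shown on this page), the sketch below keeps the visible text and the reasoning in separate fields. All type and function names here are hypothetical.

```rust
/// Hypothetical response type with reasoning as its own field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ChatResponse {
    pub text: String,
    pub reasoning_content: Option<String>,
}

/// Hypothetical stored transcript entry for an assistant turn.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct StoredAssistantMessage {
    pub text: String,
    pub reasoning_content: Option<String>,
}

/// Storing the turn copies both fields; nothing is encoded into `text`.
pub fn store(resp: ChatResponse) -> StoredAssistantMessage {
    StoredAssistantMessage {
        text: resp.text,
        reasoning_content: resp.reasoning_content,
    }
}

/// UI, summarizer, and export paths read only the plain text.
pub fn render_for_ui(msg: &StoredAssistantMessage) -> &str {
    &msg.text
}

/// The provider layer reads both fields when replaying the turn.
pub fn replay_fields(msg: &StoredAssistantMessage) -> (&str, Option<&str>) {
    (&msg.text, msg.reasoning_content.as_deref())
}
```

Under this shape, downstream consumers of the text never see an encoded blob, and only the provider layer needs to know that `reasoning_content` exists.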

3. src/openhuman/inference/provider/compatible.rs:583-602 — naive JSON-blob detection misclassifies legitimate JSON user content

The decode logic in native_messages_from_transcript:

if let Some(rc) = value.get("reasoning_content").and_then(serde_json::Value::as_str).filter(|s| !s.is_empty())

If the user message text legitimately contains the JSON {"content":"...","reasoning_content":"..."} (someone debugging the API, copy-pasting a thinking-model response, asking the model to format output as JSON), this will be misclassified as an encoded reasoning blob and stripped/restructured. Worth a magic-token prefix (e.g. __openhuman_thinking__{...} or a struct sentinel) to make detection unambiguous.
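
A minimal sketch of the suggested magic-token prefix, assuming the sentinel string from the example above. The constant and both functions are hypothetical, and the JSON payload is treated as an opaque string here.

```rust
/// Hypothetical sentinel, taken from the reviewer's example.
pub const THINKING_SENTINEL: &str = "__openhuman_thinking__";

/// Prefix an already-encoded JSON envelope so it can be recognised later.
pub fn tag_envelope(json_envelope: &str) -> String {
    format!("{}{}", THINKING_SENTINEL, json_envelope)
}

/// Return the JSON envelope only if the stored text carries the sentinel.
/// Plain text that merely looks like the envelope is left alone.
pub fn untag_envelope(stored: &str) -> Option<&str> {
    stored.strip_prefix(THINKING_SENTINEL)
}
```

A prefix narrows the ambiguity but does not remove it entirely, since text that happens to start with the sentinel would still be picked up; a separate struct field avoids the question altogether.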

Minor

  • compatible.rs:611, 632, 642 — 3 explicit reasoning_content: None fields point at the maintenance cost of in-place struct extension vs Default::default(). If #2818's approach lands, all of these go away.
  • compatible_types.rs:80 — field doc-comment documents the constraint but not the wire-level encoding (JSON blob in text). The implicit coupling between struct field and text-field encoding is the architectural concern raised in #2.
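
The first minor point can be illustrated with Rust's struct update syntax. This is a sketch with a hypothetical, simplified `NativeMessage` and a hypothetical `tool_message` constructor; the real struct may have more fields or may not derive `Default`.

```rust
/// Hypothetical, simplified NativeMessage that derives Default.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct NativeMessage {
    pub role: String,
    pub content: String,
    pub reasoning_content: Option<String>,
}

/// A construction site that names only the fields it cares about.
/// Optional fields added later fall back to their defaults (None)
/// without this call site having to change.
pub fn tool_message(content: &str) -> NativeMessage {
    NativeMessage {
        role: "tool".to_string(),
        content: content.to_string(),
        ..Default::default()
    }
}
```

The trade-off is that a forgotten field no longer produces a compile error, so explicit `None` initializers can still be the safer choice when every call site should be revisited.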

Questions

  • Did you see #2818 / #2817 before opening this? Both target the same Sentry issue and were created earlier today.
  • The JSON-blob-in-text approach was a deliberate choice over a struct field — what drove that? (Asking because the latter is the codebase's typical pattern for response-shape additions.)
  • Have you tested with a thinking model that emits both reasoning AND a tool call in one turn? Per concern #1, that path is currently silently broken.

Verified / looks good

  • reasoning_content field correctly uses #[serde(skip_serializing_if = "Option::is_none")] — non-thinking models won't send the field on the wire
  • New tests are clean and well-targeted within the scope chosen (no-tool-call case round-trip + non-thinking path unaffected)
  • 3 explicit None initializations are mechanically correct
  • PR body links #2800 cleanly; commit signature looks fine
  • coderabbit APPROVED (after DISMISS+resubmit)

Suggested next step

  1. Drop a note on #2818 introducing yourself + your test cases — they're useful (the round-trip + JSON-blob-detection tests translate to direct round-trip tests against #2818's struct-field approach).
  2. Close #2806 once #2818 lands, OR contribute the tool-call-turn coverage on top of #2818 if @CodeGhost21's #2817 doesn't already handle it.

Thanks again for jumping on this — first contribution and you correctly identified the root cause. Coordination is the main blocker here, not your code.

Contributor

@graycyrus graycyrus left a comment

Root cause is correctly identified and the test coverage is solid. Two architectural issues make this not mergeable as-is, and there's a coordination question worth resolving first.

What it does: Encodes reasoning_content alongside content as a JSON blob in the assistant message text field, then detects and unpacks that blob on the next turn to reconstruct the NativeMessage with reasoning_content set. Fixes the 400 from DeepSeek-R1 / Qwen3 on multi-turn conversations.

Breaking risk: Medium. The JSON-blob encoding changes the shape of data stored in assistant message text fields — anything that consumes ProviderChatResponse::text downstream (UI, summarizer, exports) will now see raw JSON for thinking-model turns.

Security risk: Zero.

Bottom line: Not safe to merge — JSON-blob in text field leaks to unrelated consumers, and the tool-call gate silently breaks agentic flows.


Note on coordination: there is an open PR targeting the same bug with a struct-level fix — adds reasoning_content: Option<String> to ChatResponse and threads it through the stack without touching the text field. Your tests (the round-trip test especially) are the most valuable part of this PR and translate directly to tests for that approach. Worth looking at whether you'd like to rebase your test cases onto that branch.

Comment thread src/openhuman/inference/provider/compatible.rs
Comment thread src/openhuman/inference/provider/compatible.rs
Comment thread src/openhuman/inference/provider/compatible.rs
Contributor Author

@staimoorulhassan staimoorulhassan left a comment

changes confirmed

@sanil-23
Contributor

Closing as superseded by #2818 (merged), which landed an architectural fix for the same reasoning_content multi-turn issue (#2800). Your patch on compatible.rs and the merged fix overlap on the same code paths, and the PR is now DIRTY against main after #2818's merge.

Thanks for the contribution. Your investigation helped surface the bug. If there's a residual coverage gap or test case from your branch you'd like upstreamed, please open a follow-up PR cherry-picking just that piece on top of current main.

@sanil-23 sanil-23 closed this May 28, 2026

Labels

bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Thinking model reasoning_content not passed back in multi-turn conversations

4 participants