
fix(chat): stop duplicating assistant replies on multi-segment turns #1648

Merged
graycyrus merged 1 commit into tinyhumansai:main from sanil-23:debug/double-messages-chat
May 13, 2026

Conversation

Contributor

@sanil-23 sanil-23 commented May 13, 2026

Summary

  • Stop duplicating every multi-paragraph assistant reply in the chat UI.
  • Replace a content-equality check in segment reconciliation with a count + index-presence check.
  • Update the "out-of-order full_response" unit test (which had asserted the buggy behaviour) and add a regression test for a genuinely missing segment.

Problem

When the server splits a long assistant reply into multiple bubbles via presentation.rs::segment_for_delivery, the client receives N chat_segment events followed by a chat_done. Each segment is persisted as its own message in onSegment. onDone then runs a "did all segments arrive?" check and, if not, falls back to appending chat_done.full_response as one more message.
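
For reference, a rough sketch of the event shapes involved; the field names (segment_index, segment_total, full_response) come from this PR, but the interfaces themselves are illustrative, not the project's actual type definitions:

// Illustrative event shapes only; not copied from the codebase.
interface ChatSegmentEvent {
  thread_id: string;
  request_id: string;
  segment_index: number;  // 0-based index of this bubble
  segment_total: number;  // number of bubbles expected for the turn
  content: string;        // trimmed, normalised paragraph text (assumed field name)
}

interface ChatDoneEvent {
  thread_id: string;
  request_id: string;
  segment_total: number;
  full_response: string;  // raw, untrimmed LLM text
}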

That fallback check used:

return reconstructed === event.full_response;

where reconstructed is the received segments joined with no separator. The server-side segmenter .trim()s each segment and joins paragraphs with a normalised \n\n, while chat_done.full_response ships the raw LLM text (leading/trailing whitespace, original separators). The strings almost never matched, so the fallback fired on every multi-segment turn and produced N segment bubbles plus one duplicate full-text bubble. The duplicates were persisted to the backend via threadApi.appendMessage, so they survive in thread history across reloads.

The bug only fires when segmentation kicks in (response ≥ 80 chars, no code fences, not predominantly list/table content), so short replies and structured responses were unaffected — easy to miss in unit tests that used clean concatenable inputs.
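
To make the mismatch concrete, a hypothetical example (the strings are invented; only the trim-and-join behaviour is taken from the description above):

// Server-side segmenter delivers trimmed paragraphs; full_response keeps the raw text.
const segments = ['First paragraph.', 'Second paragraph.'];          // as received
const reconstructed = segments.join('');                             // joined with no separator
const fullResponse = ' First paragraph.\n\n\nSecond paragraph.\n';   // raw LLM text
console.log(reconstructed === fullResponse);                         // false, so the fallback fired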

Solution

Trust the count, not the content. Delivery is complete iff every expected segment_index arrived:

  • delivery.segments.size === segment_total
  • every index in [0, segment_total) is present in the Map

Per-segment dedup already runs at the cache key segment:${thread}:${request}:${segment_index}, so the Map can only reach size segment_total when all distinct indices have been seen exactly once. The content-equality check added nothing except a guaranteed false-negative because of the lossy server-side normalisation.
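
A minimal sketch of the resulting check, assuming the delivery holds received segments in a Map keyed by segment_index (the exact signature in ChatRuntimeProvider.tsx may differ):

function hasCompleteSegmentDelivery(
  delivery: { segments: Map<number, string> },
  segmentTotal: number,
): boolean {
  // Byte-comparing joined segments against full_response is unreliable:
  // the server trims segments and normalises separators to "\n\n".
  // Fast path: fewer entries than expected means at least one index is absent.
  if (delivery.segments.size < segmentTotal) return false;
  // Delivery is complete iff every index in [0, segment_total) arrived.
  for (let index = 0; index < segmentTotal; index++) {
    if (!delivery.segments.has(index)) return false;
  }
  return true;
}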

Reconciliation still fires when a segment_index is genuinely missing (socket drop / partial delivery), which is the case it was designed for — covered by a new test.

Added a leading comment to hasCompleteSegmentDelivery explaining why the equality check was removed, so it doesn't get reintroduced.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
  • N/A: behaviour-only change — no new/removed/renamed feature rows for docs/TEST-COVERAGE-MATRIX.md
  • N/A: behaviour-only change, no matrix feature IDs affected
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: touches chat-runtime client glue, no new surface in docs/RELEASE-MANUAL-SMOKE.md — existing "send a chat message" smoke covers it
  • N/A: no linked issue (regression caught during live triage)

Impact

  • Desktop: every multi-paragraph assistant reply previously produced one redundant bubble persisted to backend; now produces exactly N segment bubbles.
  • No protocol change — purely a client-side check tightening.
  • No migration: existing duplicate messages already persisted in user threads stay until manually deleted; future turns are clean.
  • Performance: one fewer appendMessage RPC + one fewer Redux dispatch per affected turn.
  • Backwards compatibility: the genuine "segment dropped on the wire" recovery path still works (covered by new test).

Related

  • Closes: no GitHub issue — regression found during live debugging session
  • Follow-up PR(s)/TODOs: scripts/run-dev-win.sh has two unrelated Windows build-tooling bugs (greedy-regex SDK detection, and a leaked VSINSTALLDIR causing "Generator Ninja does not support instance specification" on fresh worktrees); to be fixed in a separate PR.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: debug/double-messages-chat
  • Commit SHA: 37b9cd4667c98046ddb80cd95756f141f87ab243

Validation Run

  • pnpm --filter openhuman-app format:check
  • pnpm typecheck
  • Focused tests: pnpm --filter openhuman-app test:unit src/providers/__tests__/ChatRuntimeProvider.test.tsx — 22/22 passed
  • N/A: no Rust changes
  • N/A: no Tauri shell changes

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: multi-segment assistant replies no longer produce a duplicate full-response bubble.
  • User-visible effect: in the chat list, an assistant reply that previously rendered as e.g. 2 segment bubbles followed by a 3rd duplicate-content bubble now renders as exactly 2 bubbles.

Parity Contract

  • Legacy behavior preserved: the genuine missing-segment recovery path (when a segment_index never arrives) is unchanged — onDone still appends full_response in that case. Covered by the new 'reconciles when a segment is missing' test.
  • Guard/fallback/dispatch parity checks: !event.segment_total branch (single-bubble path) untouched; onSegment per-segment dedup untouched; segment-delivery TTL/eviction untouched. The branch structure is sketched below.
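
A rough sketch of the onDone branch structure these parity claims refer to; takeSegmentDelivery and hasCompleteSegmentDelivery are named elsewhere in this PR, while appendFullResponse and the event/delivery shapes are placeholders for illustration:

// Placeholder declarations for the sketch; real implementations live in ChatRuntimeProvider.tsx.
declare function takeSegmentDelivery(
  threadId: string,
  requestId: string,
): { segments: Map<number, string> } | undefined;
declare function hasCompleteSegmentDelivery(
  delivery: { segments: Map<number, string> },
  segmentTotal: number,
): boolean;
declare function appendFullResponse(event: { full_response: string }): void;

function onDone(event: {
  thread_id: string;
  request_id: string;
  segment_total?: number;
  full_response: string;
}): void {
  if (!event.segment_total) {
    appendFullResponse(event); // single-bubble path: unchanged by this PR
    return;
  }
  // The delivery is removed from the map here, so the completeness check cannot double-fire.
  const delivery = takeSegmentDelivery(event.thread_id, event.request_id);
  if (delivery && hasCompleteSegmentDelivery(delivery, event.segment_total)) {
    return; // all segment bubbles were already persisted in onSegment
  }
  // Genuine missing-segment recovery (socket drop / partial delivery): unchanged.
  appendFullResponse(event);
}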

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none
  • Canonical PR: N/A
  • Resolution (closed/superseded/updated): N/A

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved chat streaming completion detection to prevent unnecessary reconciliation when segment delivery is complete, even if response formatting differs.
  • Tests

    • Updated test coverage for streaming chat scenarios, including regression tests for segment delivery verification and reconciliation behavior.

Review Change Stack

The reconciliation path in ChatRuntimeProvider treated a segmented turn as
"incomplete" whenever the concatenation of received segments did not byte-
for-byte equal chat_done.full_response, then appended full_response as an
extra assistant message. That equality essentially never held in practice
— the server-side segmenter trims each segment and normalises paragraph
breaks to "\n\n" (presentation.rs::segment_for_delivery), while
chat_done.full_response ships the raw, untrimmed LLM text. So every
multi-paragraph reply produced N segment bubbles + one duplicate full-text
bubble.

Trust the count instead of content: delivery is complete iff every
expected segment_index arrived. Per-segment dedup (markChatEventSeen on
segment:thread:request:index) already guarantees Map size = expected only
when all distinct indices have been seen, so the count + index-presence
check is sufficient. The reconciliation path still fires when a
segment_index is genuinely missing, which is what it was meant to cover.

Updated the "out-of-order full_response" test (which asserted the buggy
content-equality behaviour) to assert the new contract, and added a
regression test that exercises a missing segment_index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 requested a review from a team May 13, 2026 14:27
Contributor

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6212c8ad-cb88-4ef3-a2b7-2b239473ac43

📥 Commits

Reviewing files that changed from the base of the PR and between 9160317 and 37b9cd4.

📒 Files selected for processing (2)
  • app/src/providers/ChatRuntimeProvider.tsx
  • app/src/providers/__tests__/ChatRuntimeProvider.test.tsx

📝 Walkthrough


ChatRuntimeProvider's segment delivery completion check is refactored to verify that all expected segment indices (0 through segment_total - 1) have been received, rather than reconstructing the full response and byte-comparing it against event.full_response. Tests add a regression case and update assertions to confirm reconciliation is skipped when segments are complete, and to validate reconciliation behavior when segments are missing.

Changes

Segment Completion Verification

  • app/src/providers/ChatRuntimeProvider.tsx (segment index coverage verification): hasCompleteSegmentDelivery now checks that all expected segment_index values (0 to segment_total - 1) are present in the delivery, rather than reconstructing the full response and byte-comparing it to event.full_response. Comments explain that segment trimming and joiner normalization make byte-equality unreliable.
  • app/src/providers/__tests__/ChatRuntimeProvider.test.tsx (segment delivery and reconciliation tests): A regression test verifies reconciliation does not occur when all segments arrive, even if chat_done.full_response formatting differs. The missing-segment test is updated to ensure reconciliation occurs only after chat_done when segments are incomplete, producing the full joined message content with the agent sender.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1469: Both PRs modify ChatRuntimeProvider's segmented-response reconciliation logic and update tests to cover reconciliation behavior when all expected segments have arrived despite formatting differences.
  • tinyhumansai/openhuman#1051: Directly related refactor to stop byte-equality-based reconciliation in favor of segment-index coverage verification to handle joiner/trim formatting divergence in full_response.
  • tinyhumansai/openhuman#1261: Both PRs modify ChatRuntimeProvider segment completion logic and update tests to avoid unnecessary reconciliation based on segment presence verification.

Suggested reviewers

  • senamakel

Poem

🐰 A rabbit hops through segments bright,
No longer haunted by byte-equal plight!
Coverage counts, not string comparison's way,
Streaming flows smoothly, hooray hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix(chat): stop duplicating assistant replies on multi-segment turns' directly and clearly summarizes the main change: fixing duplicate assistant replies on multi-segment chat responses.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Contributor

@graycyrus graycyrus left a comment


PR Review — fix(chat): stop duplicating assistant replies on multi-segment turns

Walkthrough

This PR fixes a genuine product bug: every assistant reply long enough to trigger the Rust-side segmenter produced a spurious extra bubble in the chat UI. The root cause was a byte-equality check in hasCompleteSegmentDelivery that compared client-reconstructed segment text against full_response. Because segment_for_delivery in presentation.rs trims each segment and joins on \n\n while full_response preserves raw LLM whitespace, the strings almost never matched — so the reconciliation path appended full_response as a new message on nearly every multi-segment turn, and that message was persisted to the backend via appendMessage, meaning duplicates survived page reload.

The fix is correct and minimal: replace the string-equality gate with a count + index-presence loop. The genuine missing-segment recovery (socket drop / partial delivery) is preserved and now has its own test.

Changes

  • app/src/providers/ChatRuntimeProvider.tsx: in hasCompleteSegmentDelivery, drop the reconstructed-string concatenation and the === full_response equality check; replace with a count guard plus a 0..N-1 index-presence loop. Add a 6-line comment explaining the removal.
  • app/src/providers/__tests__/ChatRuntimeProvider.test.tsx: rename and invert the old "out-of-order" test into a regression test asserting reconciliation does NOT fire. Add a new 'reconciles when a segment is missing' test for the genuine drop scenario.

Actionable comments

[minor] ChatRuntimeProvider.tsx:136 — segments.size < expected guard is now redundant

The early-exit if (delivery.segments.size < expected) return false was load-bearing when the function also relied on the reconstructed string — it short-circuited an expensive equality check. Now that the body is a loop that returns false on the first missing index, this guard adds no new information. Consider labeling it as a fast-path optimization:

// Fast path: if fewer entries than expected, at least one index is absent.
if (delivery.segments.size < expected) return false;

[minor] ChatRuntimeProvider.tsx:771 — no debug log when segment delivery is complete (happy path)

The chat_done_segment_reconcile log fires on reconciliation. There is no corresponding log when completeSegmentDelivery === true. Before this fix, that branch was never hit for multi-segment turns; now it will be the dominant case. Per the project's debug-logging requirement, consider:

if (completeSegmentDelivery) {
  rtLog('chat_done_segment_complete', {
    thread: event.thread_id,
    request: event.request_id,
    segments: event.segment_total,
  });
}

[minor] Test: 'reconciles when a segment is missing' — intermediate assertion could be more specific

The await waitFor(() => expect(threadApi.appendMessage).toHaveBeenCalledTimes(1)) before onDone asserts count but not content. Making it content-aware would strengthen the test:

await waitFor(() =>
  expect(threadApi.appendMessage).toHaveBeenCalledWith(
    't-missing',
    expect.objectContaining({ content: 'Part one.', sender: 'agent' })
  )
);

Verified / looks good

  • segment_for_delivery in presentation.rs confirms the PR description: it calls trim() on each paragraph during the split, so byte-equality was a guaranteed false-negative.
  • takeSegmentDelivery removes the delivery from the map before hasCompleteSegmentDelivery is called — no double-fire risk.
  • The segment:${thread}:${request}:${segment_index} dedupe key ensures delivery.segments.size can only reach segment_total when all distinct indices arrived exactly once. The count guard is therefore sufficient.
  • No dynamic imports, no direct import.meta.env, no window.__TAURI__ checks — all clean.
  • Test data uses generic IDs (t-trim, r-missing) — no hardcoded real names/emails.
  • Coverage gate passed. All 22 tests pass.

CI note

The only CI failure is "PR Submission Checklist" — 6 N/A items need [x] marking. Not a code issue.

Overall: clean, well-scoped fix with good test coverage. All comments are minor — nothing blocking merge.

@graycyrus graycyrus merged commit bddfbb1 into tinyhumansai:main May 13, 2026
18 of 20 checks passed