-
Notifications
You must be signed in to change notification settings - Fork 88
fix: prevent duplicate output for streaming tools in subagents (#898) #905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughSuppress final onMessage for tool calls that stream live output (tool.canUpdateOutput = true); add tests covering streaming, non‑streaming, error, and undefined‑tool cases; adjust scheduler to update UI locally and avoid invoking external output handler to prevent duplicate renders. Changes
Sequence Diagram(s)sequenceDiagram
participant SubAgent
participant Core as Core (subagent.ts)
participant UI as UI Scheduler (useReactToolScheduler)
participant Ext as External Handler
SubAgent->>Core: deliver completed tool call (display, status, tool.canUpdateOutput)
Core->>Core: compute toolCanUpdateOutput = (status == "success" && tool?.canUpdateOutput == true)
alt toolCanUpdateOutput == true (tool streams live)
Core--x UI: do NOT call onMessage(final display)
UI->>UI: updateToolCallOutput via streaming updates
UI--x Ext: do NOT invoke external _outputUpdateHandler (avoid duplicate render)
else toolCanUpdateOutput == false (no streaming)
Core->>UI: call onMessage with final display text
end
opt error response
Core->>UI: call onMessage with error text
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.{ts,tsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (2)📓 Common learnings📚 Learning: 2025-11-16T22:51:26.374ZApplied to files:
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
🔇 Additional comments (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
I'll start by examining the context of this PR to understand the issue, then analyze the proposed solution. Let me find the exact buildPartsFromCompletedCalls method that's being modified: Let me look for the LLxprt PR Review – PR #905Issue Alignment
Side Effects
Code Quality
Tests & Coverage
Verdict
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/core/src/core/subagent.test.ts(1 hunks)packages/core/src/core/subagent.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Don't useany- Always specify proper types. Useunknownif the type is truly unknown and add proper type guards.
Do not useconsole.logorconsole.debug- Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings aboutanytypes
Files:
packages/core/src/core/subagent.test.tspackages/core/src/core/subagent.ts
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/core/subagent.test.tspackages/core/src/core/subagent.ts
📚 Learning: 2025-12-05T14:59:34.368Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-05T14:59:34.368Z
Learning: Use subagents (general-purpose, typescript-code-reviewer, or typescript-coder) for complex multi-step tasks, research, complex implementations, code review, and extensive file searching
Applied to files:
packages/core/src/core/subagent.test.ts
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: E2E Test (macOS)
- GitHub Check: Slow E2E - Win
🔇 Additional comments (1)
packages/core/src/core/subagent.ts (1)
1510-1524: LGTM! Deduplication logic correctly prevents duplicate output for streaming tools.The implementation properly addresses issue #898 by suppressing
onMessagecalls for successful tool executions wherecanUpdateOutput=true, while preserving error display. The logic handles edge cases gracefully (undefined tool, empty display, missing onMessage callback).Based on learnings, this aligns with the expected behavior: streaming tools provide real-time UI updates via
outputUpdateHandlerduring execution and produce one finalToolResponseBlock, so the post-completionresultDisplayshould not trigger an additionalonMessagecall.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-24.x-ubuntu-latest' artifact from the main CI run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/cli/src/ui/hooks/useReactToolScheduler.ts (1)
374-374: Use the sophisticated logging system instead ofconsole.warn.The coding guidelines specify that
console.logandconsole.debugshould not be used. Whileconsole.warnisn't explicitly listed, the guideline indicates that the sophisticated logging system should be used for all logging, with files written to~/.llxprt/debug/.🔎 Suggested refactor
Consider using a logger instance (similar to how it's used in other parts of the codebase) instead of
console.warn:- console.warn(`Unknown core status encountered: ${coreStatus}`); + // Use logger instance here, e.g.: + // logger.warn(() => `Unknown core status encountered: ${coreStatus}`);Note: You'll need access to a logger instance in this context.
As per coding guidelines.
♻️ Duplicate comments (1)
packages/core/src/core/subagent.test.ts (1)
1774-2067: Critical: MissingpromptConfigdefinition will cause test failures.All four tests reference
promptConfig(at lines 1810, 1878, 1946, and 2014), but this variable is never defined within the test suite scope. This will cause all four tests to fail with aReferenceError.🔎 Proposed fix
Add the missing
promptConfigconstant after the test suite declaration:describe('buildPartsFromCompletedCalls output deduplication', () => { + const promptConfig: PromptConfig = { systemPrompt: 'Execute task.' }; + beforeEach(() => {This issue was previously identified in an earlier review.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/cli/src/ui/hooks/useReactToolScheduler.ts(1 hunks)packages/core/src/core/subagent.test.ts(1 hunks)packages/core/src/core/subagent.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/core/src/core/subagent.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Don't useany- Always specify proper types. Useunknownif the type is truly unknown and add proper type guards.
Do not useconsole.logorconsole.debug- Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings aboutanytypes
Files:
packages/core/src/core/subagent.test.tspackages/cli/src/ui/hooks/useReactToolScheduler.ts
🧠 Learnings (3)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
📚 Learning: 2025-12-05T14:59:34.368Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-05T14:59:34.368Z
Learning: Use subagents (general-purpose, typescript-code-reviewer, or typescript-coder) for complex multi-step tasks, research, complex implementations, code review, and extensive file searching
Applied to files:
packages/core/src/core/subagent.test.ts
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Applied to files:
packages/core/src/core/subagent.test.ts
🧬 Code graph analysis (1)
packages/cli/src/ui/hooks/useReactToolScheduler.ts (1)
packages/core/src/core/coreToolScheduler.ts (1)
CoreToolScheduler(385-1569)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: Slow E2E - Win
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (macOS)
🔇 Additional comments (5)
packages/cli/src/ui/hooks/useReactToolScheduler.ts (1)
226-271: LGTM! Deduplication logic correctly prevents double-rendering.The changes properly address issue #898 by ensuring subagent tool outputs are not duplicated. The underscore prefix convention (
_outputUpdateHandler) and explanatory comments make the intent clear: only local UI updates viaupdateToolCallOutputare used, preventing the external handler from creating a second display.packages/core/src/core/subagent.test.ts (4)
1801-1867: Test logic correctly verifies deduplication for streaming tools.The test properly validates that tools with
canUpdateOutput=truedo not trigger duplicateonMessagecalls, directly addressing issue #898. The use of reflection to access the private method is acceptable for unit testing, and the mock setup correctly simulates a streaming tool scenario.Assuming the
promptConfigdefinition is added as noted in the previous comment.
1869-1935: Test correctly validates non-streaming tools still display output.This test ensures the deduplication fix doesn't inadvertently suppress output from tools that don't stream (
canUpdateOutput=false). The verification of both call count and message content provides comprehensive coverage.
1937-2003: Essential test for error visibility in streaming tools.This test verifies a critical edge case: even when a tool normally streams output (
canUpdateOutput=true), error messages must still be displayed viaonMessage. This ensures errors remain visible for debugging, which is crucial for maintaining system observability.
2005-2066: Good defensive test for undefined tool handling.This test ensures graceful degradation when tool metadata is unavailable. The three-part assertion (no throw, parts produced, error displayed) comprehensively validates that the system handles this edge case without crashing while still providing useful error information.
When tools with canUpdateOutput=true (like shell) were executed via subagents, their output was displayed twice: 1. Live streaming via outputUpdateHandler during execution 2. After completion via buildPartsFromCompletedCalls calling onMessage This fix addresses both sources of duplication: 1. In subagent.ts buildPartsFromCompletedCalls: Skip the post-completion onMessage call for successful tool calls where tool.canUpdateOutput=true, since that output was already streamed live. Error cases still display resultDisplay to ensure error messages are shown. 2. In useReactToolScheduler.ts createExternalScheduler: Don't call the outputUpdateHandler callback when updating the local UI state. The subagent's outputUpdateHandler calls onMessage which goes to task.updateOutput, creating a second display path. The local updateToolCallOutput handles the UI rendering for subagent tools. fixes #898
Summary
Fixes #898 - Shell tool output displays twice when run via subagents.
Problem
When tools with
canUpdateOutput=true(like the shell tool) were executed via subagents, their output was displayed twice:outputUpdateHandlerduring executionbuildPartsFromCompletedCallscallingonMessagewithresultDisplayThe user reported this appeared as "two running live" rather than a streaming-then-final display pattern.
Solution
Modified
buildPartsFromCompletedCallsinsubagent.tsto skip the post-completiononMessagecall for successful tool calls wheretool.canUpdateOutput === true, since that output was already streamed live via theoutputUpdateHandler.Error cases still display
resultDisplayto ensure error messages are shown to the user.Changes
call.status === 'success' && call.tool?.canUpdateOutput === truebefore callingonMessagewithresultDisplaybuildPartsFromCompletedCallsoutput deduplication behavior:canUpdateOutput=truedon't trigger duplicateonMessagecallscanUpdateOutput=falsestill displayresultDisplayTesting
node scripts/start.js --profile-load syntheticSummary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.