feat(platform): structured AI responses with section markers and timeout budget chain#496
…kers Introduce marker-based structured responses that allow the AI to emit section markers ([[CONCLUSION]], [[KEY_POINTS]], [[DETAILS]], [[QUESTIONS]], [[NEXT_STEPS]]) which the frontend parses and renders with rich UI — highlighted conclusions, collapsible details, clickable follow-up buttons, and independent per-section streaming animation.
Update DetailsSection to dynamically switch between "Show details" and "Hide details" based on open state. Fix generate_response to use agentInstructions instead of undefined instructions variable for context window metadata.
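To make the marker format concrete, here is a minimal sketch of how text containing the five section markers could be split into typed sections. It is not the PR's `marker-parser.ts`: the names `parseMarkersSketch` and `ParsedSectionSketch` are hypothetical, and the choices to drop empty sections and to treat unknown markers as ordinary text are assumptions made for illustration.

```typescript
// Illustrative sketch only; not the PR's marker-parser.ts. All names are hypothetical.
const MARKER_NAMES = ['CONCLUSION', 'KEY_POINTS', 'DETAILS', 'QUESTIONS', 'NEXT_STEPS'] as const;
type MarkerName = (typeof MARKER_NAMES)[number];

interface ParsedSectionSketch {
  type: MarkerName | 'plain'; // 'plain' holds any text that precedes the first marker
  content: string;
}

function parseMarkersSketch(text: string): ParsedSectionSketch[] {
  // Build the regex per call so the global flag's lastIndex state is never shared.
  const markerRe = new RegExp(`\\[\\[(${MARKER_NAMES.join('|')})\\]\\]`, 'g');
  const sections: ParsedSectionSketch[] = [];
  let currentType: MarkerName | 'plain' = 'plain';
  let cursor = 0;

  // Close the current section at `end`, dropping sections that are empty after trimming.
  const flush = (end: number): void => {
    const content = text.slice(cursor, end).trim();
    if (content !== '') sections.push({ type: currentType, content });
  };

  let match: RegExpExecArray | null;
  while ((match = markerRe.exec(text)) !== null) {
    flush(match.index);
    currentType = match[1] as MarkerName;
    cursor = match.index + match[0].length;
  }
  flush(text.length);
  return sections;
}
```

Because only the five known names are matched, a stray `[[SOMETHING_ELSE]]` in the model output stays inside the surrounding section's text instead of starting a new section.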
…llback Propagate an absolute deadline through the full agent chain (chat → sub-agent → operator) so each layer knows its remaining time budget. Sub-agent tools now check budget before starting and refuse when time is exhausted. On the operator side, when the browser agent times out without producing a text response, a Phase 2 map-reduce summarization step synthesizes collected page content into a coherent answer using a fast LLM model. Navigation is also capped at 15 page visits. Timeout budget chain: operator 180s → client 240s → web sub-agent 300s → chat 420s.
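The budget check described above can be sketched as follows. This is an assumption-laden illustration, not the PR's `check_budget.ts`: the function name `checkBudgetSketch` and the result shape are hypothetical, and the 60s minimum and 30s buffer are taken from the figures in the sequence diagram of this PR. The order of operations (compare first, subtract the buffer afterwards) follows the diagram; the review comments further down propose subtracting the buffer before the comparison.

```typescript
// Hypothetical sketch of a sub-agent budget check; names and result shape are assumptions.
const MIN_TOOL_BUDGET_MS = 60_000; // refuse to start with less than this remaining
const SUB_AGENT_BUFFER_MS = 30_000; // time reserved for the caller to wrap up

type BudgetResultSketch =
  | { ok: true; deadlineMs: number | undefined }
  | { ok: false; reason: string };

function checkBudgetSketch(rawDeadline: unknown, nowMs: number = Date.now()): BudgetResultSketch {
  // Accept both numeric and string-encoded absolute deadlines (epoch milliseconds).
  const deadline =
    typeof rawDeadline === 'number'
      ? rawDeadline
      : typeof rawDeadline === 'string'
        ? Number(rawDeadline)
        : undefined;

  // No usable deadline was propagated: let the tool run without one.
  if (deadline === undefined || !Number.isFinite(deadline)) {
    return { ok: true, deadlineMs: undefined };
  }

  const remainingMs = deadline - nowMs;
  if (remainingMs < MIN_TOOL_BUDGET_MS) {
    return { ok: false, reason: `only ${remainingMs}ms left, need ${MIN_TOOL_BUDGET_MS}ms` };
  }

  // Hand the sub-agent an earlier absolute deadline so the caller keeps a buffer.
  return { ok: true, deadlineMs: deadline - SUB_AGENT_BUFFER_MS };
}
```

Passing an absolute timestamp rather than a duration means each layer recomputes its remaining time from its own clock reading, so time already spent upstream is accounted for automatically.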
Greptile Summary
This PR introduces three major enhancements: structured AI responses with section markers parsed into rich UI components, a timeout budget chain that propagates an absolute deadline through the full agent chain (chat → sub-agent → operator) to prevent premature timeouts, and a Phase 2 summarization fallback that synthesizes collected page content using map-reduce when browser operations time out without producing text. The implementation is comprehensive, with strong test coverage and well-architected timeout handling with recovery mechanisms.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| services/platform/convex/lib/agent_response/with_timeout.ts | New timeout utility with Promise.race-based abort controller, clean implementation |
| services/platform/convex/agent_tools/sub_agents/helpers/check_budget.ts | Budget checking for sub-agents, validates remaining time before starting operations |
| services/platform/convex/lib/agent_response/generate_response.ts | Core timeout handling with recovery, deadline propagation, and structured response support |
| services/operator/app/services/browser_service.py | Phase 2 map-reduce summarization, page content capture, navigation limits added |
| services/platform/lib/utils/marker-parser.ts | Robust marker parser with partial marker buffering during streaming, well-tested |
| services/platform/app/features/chat/components/structured-message/structured-message.tsx | Orchestrates structured section rendering with independent typewriter animations per section |
| services/platform/convex/lib/context_management/constants.ts | Timeout constants added for each agent type in the budget chain hierarchy |
| services/platform/convex/lib/agent_chat/start_agent_chat.ts | Deadline initialization at chat start, propagates absolute timestamp through chain |
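
The table credits `marker-parser.ts` with partial marker buffering during streaming. One way such buffering could work is sketched below: while text is still arriving, any trailing fragment that might be the start of a marker (for example `[[CONC`) is held back so it never flashes on screen as literal text. The function name `splitStableText` and the rule that marker names consist of uppercase letters and underscores are assumptions for illustration, not the PR's implementation.

```typescript
// Hypothetical sketch of partial-marker buffering for streamed text.
interface StableSplit {
  stable: string; // safe to render now
  pending: string; // possible start of a marker; hold until more text arrives
}

function splitStableText(buffer: string): StableSplit {
  const open = buffer.lastIndexOf('[[');
  // An opened marker with no closing "]]" yet, whose body still looks like a marker name.
  if (open !== -1 && buffer.indexOf(']]', open) === -1) {
    const tail = buffer.slice(open + 2);
    if (/^[A-Z_]*$/.test(tail)) {
      return { stable: buffer.slice(0, open), pending: buffer.slice(open) };
    }
  }
  // A single trailing "[" could become "[[" on the next chunk.
  if (buffer.charAt(buffer.length - 1) === '[') {
    return { stable: buffer.slice(0, -1), pending: '[' };
  }
  return { stable: buffer, pending: '' };
}
```

A caller would render `stable` immediately and prepend `pending` to the next chunk before splitting again; once the stream ends, whatever is left in `pending` can be flushed as plain text.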
Sequence Diagram
sequenceDiagram
participant User
participant ChatUI
participant startAgentChat
participant generateResponse
participant SubAgent
participant BrowserOperator
participant Phase2
User->>ChatUI: Send message
ChatUI->>startAgentChat: Start chat (compute deadline)
Note over startAgentChat: deadline = now() + 420s
startAgentChat->>generateResponse: Generate response (deadline=420s)
Note over generateResponse: Set variables.actionDeadlineMs<br/>Add structured markers to prompt
alt Sub-agent tool called
generateResponse->>SubAgent: Invoke sub-agent tool
SubAgent->>SubAgent: checkBudget()
Note over SubAgent: Verify remaining > 60s<br/>Subtract 30s buffer
SubAgent->>generateResponse: Execute with deadline=390s
end
alt Browser operation
generateResponse->>BrowserOperator: browserOperate()
Note over BrowserOperator: Client timeout: 240s<br/>Operator timeout: 180s
BrowserOperator->>BrowserOperator: Phase 1 browsing<br/>(max 15 pages)
alt Phase 1 timeout without text
BrowserOperator->>Phase2: Summarize page content
Note over Phase2: Map-reduce summarization<br/>Parallel chunk summaries<br/>Final synthesis
Phase2-->>BrowserOperator: Generated summary
end
BrowserOperator-->>generateResponse: Response with text
end
alt Generation timeout
generateResponse->>generateResponse: withTimeout() triggers
Note over generateResponse: AbortController.abort()<br/>Catch AgentTimeoutError
generateResponse->>generateResponse: Recovery generation<br/>(60s, no tools)
Note over generateResponse: Use accumulated tool results
end
generateResponse->>ChatUI: Parse response for markers
Note over ChatUI: [[CONCLUSION]]<br/>[[KEY_POINTS]]<br/>[[DETAILS]]<br/>[[QUESTIONS]]<br/>[[NEXT_STEPS]]
ChatUI->>User: Render structured sections<br/>with independent animations
Last reviewed commit: c263a0a
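The "Generation timeout" branch of the diagram relies on a `Promise.race`-based timeout that also aborts the in-flight work. A minimal sketch of that pattern follows. It is not the PR's `with_timeout.ts`: the names `withTimeoutSketch` and `SketchTimeoutError` are hypothetical, and the callback-receives-signal signature is an assumption chosen so the abort is always wired to the work being raced.

```typescript
// Hypothetical sketch of a Promise.race timeout that aborts the underlying work.
class SketchTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`operation timed out after ${timeoutMs}ms`);
    this.name = 'SketchTimeoutError';
  }
}

async function withTimeoutSketch<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Rejects after timeoutMs and signals the work to stop.
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new SketchTimeoutError(timeoutMs));
    }, timeoutMs);
  });

  try {
    // Whichever settles first wins; the loser's later outcome is ignored.
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    // Always clear the timer so a fast result does not leave it pending.
    clearTimeout(timer);
  }
}
```

Handing the signal to the callback addresses the concern raised in the review below about the recovery path: a race alone only stops waiting, while the abort signal lets the underlying request actually be cancelled, provided the callee honours it.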
📝 Walkthrough
This PR implements Phase 2 content summarization for browser operations, structured response formatting with marker-based UI sections, deadline-aware execution across agent pipelines, and enhanced timeout recovery mechanisms. Key additions include:
- browser_service Phase 2 map-reduce summarization for accumulated page content
- structured message rendering components (CONCLUSION, KEY_POINTS, DETAILS, QUESTIONS, NEXT_STEPS) with marker parsing
- deadlineMs propagation through agent generation layers
- timeout utilities and recovery flows on agent generation timeout
- configuration adjustments (operator timeout 270→180 seconds, openai_fast_model field)
- follow-up item extraction utilities

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
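The Phase 2 map-reduce summarization mentioned in the walkthrough lives in the Python operator service; the sketch below shows the same shape in TypeScript purely for illustration. Everything here is an assumption: the `SummarizeFn` callback stands in for the fast-model LLM call, fixed-size character chunking is a simplification, and the function names are hypothetical.

```typescript
// Hypothetical LLM call, injected so the sketch stays self-contained.
type SummarizeFn = (text: string) => Promise<string>;

// Split text into fixed-size character chunks (a simplification; real code may split on tokens).
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error('maxChars must be positive');
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

async function mapReduceSummarize(
  pages: string[],
  summarize: SummarizeFn,
  maxChars = 4000,
): Promise<string> {
  const chunks: string[] = [];
  for (const page of pages) chunks.push(...chunkText(page, maxChars));
  if (chunks.length === 0) return '';

  // Map: summarize every chunk in parallel.
  const partials = await Promise.all(chunks.map((chunk) => summarize(chunk)));
  if (partials.length === 1) return partials[0];

  // Reduce: synthesize the partial summaries into one answer.
  return summarize(partials.join('\n\n'));
}
```

Running the map step in parallel keeps the fallback's wall-clock cost close to two sequential LLM calls regardless of how many pages were collected, which matters because it only runs after the browsing budget is already spent.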
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
services/platform/convex/agents/web/generate_response.ts (1)
37-37: 🧹 Nitpick | 🔵 Trivial
Consider explicit validation like the document agent.
Using `|| ''` for a missing model will cause a downstream failure with a less clear error message. The document agent throws explicitly with a descriptive message, which aids debugging.
♻️ Proposed fix to align with document agent
     export async function generateWebResponse(
       args: GenerateWebResponseArgs,
     ): Promise<GenerateWebResponseResult> {
    +  const model = process.env.OPENAI_FAST_MODEL;
    +  if (!model) {
    +    throw new Error('OPENAI_FAST_MODEL environment variable is not configured');
    +  }
    +
       return generateAgentResponse(
         {
           agentType: 'web',
           createAgent: createWebAgent,
    -      model: process.env.OPENAI_FAST_MODEL || '',
    +      model,
           provider: 'openai',
           debugTag: '[WebAgent]',
           instructions: WEB_AGENT_INSTRUCTIONS,
         },
         args,
       );
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agents/web/generate_response.ts` at line 37, The config uses a fallback "model: process.env.OPENAI_FAST_MODEL || ''" which yields an opaque downstream failure; instead validate OPENAI_FAST_MODEL explicitly (as the document agent does) and throw a clear error when missing. Locate the model assignment in generate_response.ts (the object containing model: process.env.OPENAI_FAST_MODEL) and replace the silent fallback with an explicit check that throws a descriptive Error (e.g., "OPENAI_FAST_MODEL is not set; please configure the model for the web agent") before proceeding, ensuring callers get an immediate, helpful message.
services/platform/convex/agents/document/generate_response.ts (1)
33-36: 🛠️ Refactor suggestion | 🟠 Major
Adopt centralized model configuration pattern across all agents.
The document and integration agents throw if `OPENAI_FAST_MODEL` is missing, while web and crm agents silently fall back to an empty string. This inconsistency risks different failure modes. More importantly, agents should source runtime configuration via `getDefaultAgentRuntimeConfig()` (as the chat agent does) rather than reading `process.env` directly. This centralizes configuration and ensures consistent handling across all agents.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agents/document/generate_response.ts` around lines 33 - 36, The code currently reads OPENAI_FAST_MODEL directly and throws if missing; instead, use the centralized runtime config helper getDefaultAgentRuntimeConfig() (the same pattern as the chat agent) to obtain the model/fast model setting for generate_response.ts, e.g., call getDefaultAgentRuntimeConfig() and pull the appropriate model field instead of process.env.OPENAI_FAST_MODEL, remove the direct throw, and ensure the agent follows the same fallback/validation behavior as other agents (silently default or surface a consistent error) so all agents handle missing model config uniformly.
services/platform/convex/lib/agent_response/generate_response.ts (1)
686-694: ⚠️ Potential issue | 🟡 Minor
Token estimation underestimates for streaming agents.
Line 686 uses `instructions` for token estimation, but for streaming agents, the actual system prompt includes `STRUCTURED_RESPONSE_INSTRUCTIONS` (appended at line 228). This underestimates the token count by the size of the structured response instructions (~300 tokens).
🔧 Proposed fix for accurate token estimation
    - const systemPromptTokens = instructions ? estimateTokens(instructions) : 0;
    + const systemPromptTokens = agentInstructions ? estimateTokens(agentInstructions) : 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/lib/agent_response/generate_response.ts` around lines 686 - 694, The token estimate currently uses only instructions and toolsSummary to compute contextStats.totalTokens, which undercounts when the streaming agent appends STRUCTURED_RESPONSE_INSTRUCTIONS; update the token calculation in generate_response (where structuredThreadContext.stats and estimateTokens are used) to add estimateTokens(STRUCTURED_RESPONSE_INSTRUCTIONS) when the agent is a streaming agent (or whenever STRUCTURED_RESPONSE_INSTRUCTIONS is appended), e.g., compute systemPromptTokens = estimateTokens(instructions) + (isStreamingAgent ? estimateTokens(STRUCTURED_RESPONSE_INSTRUCTIONS) : 0) and use that in contextStats.totalTokens so the total includes the structured response instructions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/operator/app/services/browser_service.py`:
- Around line 588-612: The code conflates navigation-limit termination with an
actual timeout by setting timed_out = True when nav_terminated is true;
introduce a separate boolean (e.g., terminated_early or nav_terminated_early)
and set that when accumulator.should_terminate triggers, keep timed_out only for
real TimeoutError cases, and update subsequent logging/response logic to use the
new flag (referencing nav_terminated, timed_out, accumulator,
MAX_NAVIGATION_COUNT, workspace_dir) so the response semantics distinguish
timeout vs early navigation termination.
- Around line 310-339: In _call_llm, add defensive parsing and clearer error
logging around the LLM response: after response.json() assign to result and
validate that result is a dict, has a non-empty "choices" list, that choices[0]
is a dict containing "message" which itself contains "content"; if any of these
checks fail, log a descriptive error including the raw result (or its truncated
string) and return None. Also catch specific parsing issues
(KeyError/IndexError/TypeError, ValueError) to produce a targeted log like
"Phase 2 LLM parse error: ..." instead of the generic exception message, while
preserving the existing timeout handling and final generic exception catch as a
fallback.
In `@services/operator/tests/test_phase2_summarization.py`:
- Line 3: Remove the unused top-level import "import json" from the test module
(the import statement in test_phase2_summarization.py); simply delete the unused
import so the file no longer imports json that isn't referenced anywhere in
functions or tests such as those around Phase2 summarization helpers.
In
`@services/platform/app/features/chat/components/structured-message/section-renderers.tsx`:
- Around line 166-198: NextStepsSection renders follow-up buttons even when
onSendFollowUp is undefined; update the component (NextStepsSection) to return
null early when onSendFollowUp is not provided (e.g., add a guard like if
(!onSendFollowUp) return null) so parseFollowUpItems(content) and the mapped
Button elements are only rendered when a handler exists.
- Around line 105-115: The <details> element is forced open by the hardcoded
open attribute so toggles don't persist; change it to be controlled by the
component state by replacing the literal open with open={isOpen} (or, if you
want uncontrolled behavior, use defaultOpen={isOpen} and remove onToggle), and
ensure the existing handleToggle updates the isOpen state; keep the
aria-label/summary logic that uses isOpen so the label stays in sync.
In `@services/platform/convex/agent_tools/sub_agents/helpers/check_budget.ts`:
- Around line 27-37: The minimum budget check currently compares remainingMs to
MIN_TOOL_BUDGET_MS but only subtracts SUB_AGENT_BUFFER_MS when returning
deadlineMs, which can allow starting a sub-agent with less than the intended
minimum; adjust the logic in the check (using remainingMs, MIN_TOOL_BUDGET_MS,
SUB_AGENT_BUFFER_MS and deadline) to subtract SUB_AGENT_BUFFER_MS from
remainingMs first (e.g., effectiveRemaining = remainingMs - SUB_AGENT_BUFFER_MS)
and then enforce effectiveRemaining >= MIN_TOOL_BUDGET_MS, returning deadlineMs
based on deadline - SUB_AGENT_BUFFER_MS when OK.
- Around line 21-24: The current logic only converts string actionDeadlineMs to
a number and ignores numeric values; update the handling around
ctx.variables?.actionDeadlineMs (raw) so that if raw is a number you use it
directly (e.g., typeof raw === 'number' ? raw : typeof raw === 'string' ?
Number(raw) : undefined), then keep the existing Number.isFinite(deadline) check
and return behavior; update the variables named raw and deadline in
check_budget.ts accordingly to ensure numeric inputs are accepted defensively.
In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 524-553: The empty-text retry path loses timeout handling and
response metadata like steps/usage—fix retryAgent.generateText call to use the
same timeout and context/storage options as the tool-result retry (e.g., pass
the timeout wrapper or timeout option and matching storageOptions), and ensure
the final result preserves/merges steps and usage correctly (merge
retryResult.steps with existing result.steps or append retry steps rather than
overwriting), keep finishReason coming from retryResult, and maintain debugLog
usage; update references in this block (retryAgent.generateText, retryResult,
result, mergeUsage, emptyRetrySystemPrompt, promptMessage, subAgentContext) to
mirror the tool-result retry logic for consistency.
- Around line 464-493: The retry path currently calls retryAgent.generateText
without a timeout and then overwrites result without preserving the response
metadata; wrap the retryAgent.generateText call in the same timeout mechanism
used elsewhere (apply the existing timeout helper used for streaming retries) to
abort on timeout, and when merging the retry output into result (the block that
sets result = { text: ..., steps: ..., usage: ..., finishReason: ... }),
include/preserve the retryResult.response (and any existing result.response) so
result.response is not lost (ensure mergeUsage stays as-is and debugLog remains
unchanged); update references to retryAgent.generateText, the result assignment
block, and merge logic so actualModel (which reads result.response later) will
be available.
- Around line 599-619: The recovery path uses withTimeout around
recoveryAgent.generateText but doesn't pass an AbortSignal, so the LLM call can
continue after timeout; create an AbortController (e.g., recoveryAbort), pass
recoveryAbort.signal into recoveryAgent.generateText via the same options shape
used in the main paths, and ensure withTimeout will call recoveryAbort.abort()
on timeout (or wire the abort into withTimeout) so the in-flight request is
cancelled when RECOVERY_TIMEOUT_MS elapses; update the recovery call sites (the
withTimeout wrapping recoveryAgent.generateText for
contextWithOrg/threadId/userId and the storage/contextOptions) to include the
abortSignal.
- Around line 351-370: The retryAgent.generateText call lacks timeout
protection; wrap the call to retryAgent.generateText (the block that assigns
retryResult) with the existing withTimeout helper and use the remaining time
budget variable (or compute remainingMs from the current request
timeout/timeoutBudget) so the retry cannot hang indefinitely; ensure you
propagate or handle the timeout error the same way as other generateText calls
so retryResult is either returned or fails fast when the remaining timeout
elapses.
In `@services/platform/lib/utils/marker-parser.ts`:
- Around line 144-145: The file exports MARKERS, MarkerType, and
MarkerParseResult but knip flags them as unused; update exports to match actual
consumers by removing the unused exports (MARKERS, MarkerType,
MarkerParseResult) from the export list and keep only parseMarkers and
ParsedSection, or alternatively add real consumers if they must remain public
(e.g., import them in structured-message.tsx or another module); ensure the
changed export statement references only the symbols in use (parseMarkers and
ParsedSection) or document/annotate the API if you intentionally keep the
others.
---
Outside diff comments:
In `@services/platform/convex/agents/document/generate_response.ts`:
- Around line 33-36: The code currently reads OPENAI_FAST_MODEL directly and
throws if missing; instead, use the centralized runtime config helper
getDefaultAgentRuntimeConfig() (the same pattern as the chat agent) to obtain
the model/fast model setting for generate_response.ts, e.g., call
getDefaultAgentRuntimeConfig() and pull the appropriate model field instead of
process.env.OPENAI_FAST_MODEL, remove the direct throw, and ensure the agent
follows the same fallback/validation behavior as other agents (silently default
or surface a consistent error) so all agents handle missing model config
uniformly.
In `@services/platform/convex/agents/web/generate_response.ts`:
- Line 37: The config uses a fallback "model: process.env.OPENAI_FAST_MODEL ||
''" which yields an opaque downstream failure; instead validate
OPENAI_FAST_MODEL explicitly (as the document agent does) and throw a clear
error when missing. Locate the model assignment in generate_response.ts (the
object containing model: process.env.OPENAI_FAST_MODEL) and replace the silent
fallback with an explicit check that throws a descriptive Error (e.g.,
"OPENAI_FAST_MODEL is not set; please configure the model for the web agent")
before proceeding, ensuring callers get an immediate, helpful message.
In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 686-694: The token estimate currently uses only instructions and
toolsSummary to compute contextStats.totalTokens, which undercounts when the
streaming agent appends STRUCTURED_RESPONSE_INSTRUCTIONS; update the token
calculation in generate_response (where structuredThreadContext.stats and
estimateTokens are used) to add estimateTokens(STRUCTURED_RESPONSE_INSTRUCTIONS)
when the agent is a streaming agent (or whenever
STRUCTURED_RESPONSE_INSTRUCTIONS is appended), e.g., compute systemPromptTokens
= estimateTokens(instructions) + (isStreamingAgent ?
estimateTokens(STRUCTURED_RESPONSE_INSTRUCTIONS) : 0) and use that in
contextStats.totalTokens so the total includes the structured response
instructions.
- Wrap long lines in operator synthesis prompt to stay under 120 char limit
- Run ruff format on browser_service.py
- Remove unused exports (MARKERS, MarkerType, MarkerParseResult) from marker-parser.ts
- Fix unnecessary regex escape in parse-follow-up-items.ts
… response messages
The non-streaming tool-result and empty-text retry paths were missing the `response` field, causing `actualModel` to be undefined after retries.
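To illustrate the kind of merge this fix implies, here is a hedged sketch of combining an initial generation result with a retry result while keeping the `response` field. The interfaces and the `mergeRetryResult` name are hypothetical simplifications, not the types used in `generate_response.ts`; the merge rules (append steps, sum usage, take `finishReason` and `text` from the retry) follow the review comments above.

```typescript
// Hypothetical, simplified result shapes; the real types in the PR may differ.
interface UsageSketch {
  inputTokens: number;
  outputTokens: number;
}

interface GenerationResultSketch {
  text: string;
  steps: unknown[];
  usage: UsageSketch;
  finishReason: string;
  response?: { modelId?: string }; // later read to determine the actual model
}

function mergeRetryResult(
  first: GenerationResultSketch,
  retry: GenerationResultSketch,
): GenerationResultSketch {
  return {
    text: retry.text,
    // Append rather than overwrite so tool-call history from the first pass survives.
    steps: [...first.steps, ...retry.steps],
    usage: {
      inputTokens: first.usage.inputTokens + retry.usage.inputTokens,
      outputTokens: first.usage.outputTokens + retry.usage.outputTokens,
    },
    finishReason: retry.finishReason,
    // Carry the response through; dropping it is what left the model undefined.
    response: retry.response ?? first.response,
  };
}
```

Falling back to the first result's `response` when the retry has none is a design choice made here for robustness; the PR itself may simply copy the retry's value.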
…out budget chain (#496)
Summary
…[[CONCLUSION]], [[KEY_POINTS]], [[DETAILS]], [[QUESTIONS]], [[NEXT_STEPS]]) that the frontend parses into rich UI: highlighted conclusions, collapsible details, clickable follow-up buttons, and independent per-section streaming animation.
Test plan
… marker-parser.test.ts)
… parse-follow-up-items.test.ts)
… test_output_accumulator.py)
… test_phase2_summarization.py)
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Improvements