
feat(platform): structured AI responses with section markers and timeout budget chain#496

Merged
larryro merged 15 commits into
mainfrom
feat/structured-ai-responses
Feb 19, 2026

@larryro
Collaborator

@larryro larryro commented Feb 19, 2026

Summary

  • Structured AI responses: Introduce marker-based section rendering ([[CONCLUSION]], [[KEY_POINTS]], [[DETAILS]], [[QUESTIONS]], [[NEXT_STEPS]]) that the frontend parses into rich UI — highlighted conclusions, collapsible details, clickable follow-up buttons, and independent per-section streaming animation.
  • Timeout budget chain: Propagate an absolute deadline through the full agent chain (chat → sub-agent → operator) so each layer knows its remaining time budget. Sub-agent tools refuse to start when time is exhausted.
  • Phase 2 summarization fallback: When the browser agent times out without producing a text response, a map-reduce summarization step synthesizes collected page content into a coherent answer using a fast LLM model. Navigation is capped at 15 page visits.
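As a rough illustration of the 15-page cap, the sketch below models an accumulator that collects page text and records why browsing stopped. It is written in TypeScript for consistency with the rest of this page even though the operator is Python, and `PageAccumulator`, `recordVisit`, `markTimedOut`, and `TerminationReason` are hypothetical names, not the operator's actual API. Keeping a navigation-limit stop separate from a genuine timeout lets downstream code report the two cases differently.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the operator's real code.
type TerminationReason = "none" | "navigation_limit" | "timeout";

interface PageContent {
  url: string;
  text: string;
}

class PageAccumulator {
  private readonly maxNavigations: number;
  private readonly pages: PageContent[] = [];
  private reason: TerminationReason = "none";

  constructor(maxNavigations = 15) {
    this.maxNavigations = maxNavigations;
  }

  // Stores a visited page until the cap is hit; later visits are ignored.
  recordVisit(page: PageContent): void {
    if (this.shouldTerminate) return;
    this.pages.push(page);
    if (this.pages.length >= this.maxNavigations) {
      this.reason = "navigation_limit";
    }
  }

  // A real timeout is tracked separately from the navigation cap.
  markTimedOut(): void {
    if (this.reason === "none") this.reason = "timeout";
  }

  get shouldTerminate(): boolean {
    return this.reason !== "none";
  }

  get terminationReason(): TerminationReason {
    return this.reason;
  }

  get collected(): readonly PageContent[] {
    return this.pages;
  }
}
```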

Test plan

  • Verify structured markers render correctly in chat (conclusion highlight, collapsible details, follow-up buttons)
  • Confirm streaming animation works independently per section
  • Test timeout budget propagation across agent chain layers
  • Verify Phase 2 summarization triggers on operator timeout
  • Run marker-parser unit tests (marker-parser.test.ts)
  • Run parse-follow-up-items unit tests (parse-follow-up-items.test.ts)
  • Run operator output accumulator tests (test_output_accumulator.py)
  • Run Phase 2 summarization tests (test_phase2_summarization.py)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Structured response sections (conclusion, key points, details, questions, next steps) for better-organized AI responses
    • Suggested follow-up prompts in chat messages for quicker conversation continuation
    • Automatic content summarization for lengthy web pages during browsing operations
  • Improvements

    • Shorter timeout windows (operator timeout reduced from 270 s to 180 s) so slow operations fail sooner
    • Enhanced recovery handling when operations approach time limits
    • Better aggregation of page content during web assistant tasks

…kers

Introduce marker-based structured responses that allow the AI to emit
section markers ([[CONCLUSION]], [[KEY_POINTS]], [[DETAILS]], [[QUESTIONS]],
[[NEXT_STEPS]]) which the frontend parses and renders with rich UI —
highlighted conclusions, collapsible details, clickable follow-up buttons,
and independent per-section streaming animation.
Update DetailsSection to dynamically switch between "Show details" and
"Hide details" based on open state. Fix generate_response to use
agentInstructions instead of undefined instructions variable for context
window metadata.
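A minimal sketch of how such a parser might split streamed text into sections while holding back a half-received marker (for example a chunk that ends in `[[DET`). It borrows the `parseMarkers` and `ParsedSection` names from this PR, but the signature, the `pending` field, the `isComplete` flag, and the `"text"` pseudo-section for content before the first marker are assumptions, not the actual `marker-parser.ts`.

```typescript
// Hypothetical sketch; not the actual services/platform/lib/utils/marker-parser.ts.
const MARKERS = ["CONCLUSION", "KEY_POINTS", "DETAILS", "QUESTIONS", "NEXT_STEPS"] as const;
type MarkerType = (typeof MARKERS)[number];

interface ParsedSection {
  type: MarkerType | "text"; // "text" holds anything before the first marker
  content: string;
}

interface ParseResult {
  sections: ParsedSection[];
  pending: string; // trailing characters that may be the start of a marker
}

const MARKER_TOKENS = MARKERS.map((m) => `[[${m}]]`);

// Length of the longest suffix of `text` that is a proper prefix of a marker token.
function partialMarkerLength(text: string): number {
  const longest = Math.max(...MARKER_TOKENS.map((t) => t.length)) - 1;
  for (let len = Math.min(longest, text.length); len > 0; len--) {
    const suffix = text.slice(text.length - len);
    if (MARKER_TOKENS.some((t) => t.length > len && t.startsWith(suffix))) return len;
  }
  return 0;
}

function parseMarkers(text: string, isComplete = false): ParseResult {
  // While streaming, hold back a possible partial marker; flush it once complete.
  const held = isComplete ? 0 : partialMarkerLength(text);
  const body = text.slice(0, text.length - held);
  const re = /\[\[(CONCLUSION|KEY_POINTS|DETAILS|QUESTIONS|NEXT_STEPS)\]\]/g;
  const sections: ParsedSection[] = [];
  let type: MarkerType | "text" = "text";
  let start = 0;

  const flush = (end: number): void => {
    const content = body.slice(start, end).trim();
    // Keep marker sections even while empty, so each can start animating as soon as it opens.
    if (type !== "text" || content !== "") sections.push({ type, content });
  };

  let match: RegExpExecArray | null;
  while ((match = re.exec(body)) !== null) {
    flush(match.index);
    type = match[1] as MarkerType;
    start = match.index + match[0].length;
  }
  flush(body.length);
  return { sections, pending: text.slice(body.length) };
}
```

Holding back the partial marker keeps literal `[[DET` from flashing on screen between stream chunks; the cost is that ordinary text ending in `[` is delayed by one chunk.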
…llback

Propagate an absolute deadline through the full agent chain (chat → sub-agent → operator)
so each layer knows its remaining time budget. Sub-agent tools now check budget before
starting and refuse when time is exhausted.

On the operator side, when the browser agent times out without producing a text response,
a Phase 2 map-reduce summarization step synthesizes collected page content into a coherent
answer using a fast LLM model. Navigation is also capped at 15 page visits.
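The map-reduce fallback described above can be sketched as follows: pack the collected page text into bounded chunks, summarize the chunks in parallel (map), then request one synthesis over the partial notes (reduce). This is a TypeScript illustration rather than the Python in `browser_service.py`; `summarizePages`, `chunkPages`, the prompt wording, and the 8,000-character chunk size are assumptions, and the `summarize` callback stands in for whatever fast-model call the operator actually makes.

```typescript
// Hypothetical sketch of a map-reduce summarization fallback.
// `summarize` stands in for a fast-model LLM call; it returns null on failure.
type Summarize = (prompt: string) => Promise<string | null>;

// Packs page texts into chunks of at most `maxChars` characters.
function chunkPages(pages: string[], maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const page of pages) {
    const piece = page.slice(0, maxChars); // truncate oversized pages
    if (current !== "" && current.length + 2 + piece.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current = current === "" ? piece : `${current}\n\n${piece}`;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

async function summarizePages(
  question: string,
  pages: string[],
  summarize: Summarize,
  maxChunkChars = 8000,
): Promise<string | null> {
  const chunks = chunkPages(pages, maxChunkChars);
  if (chunks.length === 0) return null;

  // Map: extract relevant notes from every chunk in parallel.
  const notes = await Promise.all(
    chunks.map((chunk, i) =>
      summarize(`Question: ${question}\nNotes from part ${i + 1} of ${chunks.length}:\n${chunk}`),
    ),
  );
  const usable = notes.filter((n): n is string => n !== null && n.trim() !== "");
  if (usable.length === 0) return null;

  // Reduce: synthesize one answer from the partial notes.
  return summarize(`Question: ${question}\nCombine these notes into one answer:\n${usable.join("\n---\n")}`);
}
```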

Timeout budget chain: operator 180s → client 240s → web sub-agent 300s → chat 420s.
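One way to read these numbers: each layer has a fixed ceiling, and when it starts it takes the earlier of its own ceiling and the absolute deadline it inherited, so an inner layer can never outlive its caller. The sketch assumes that min-of-both rule; `LAYER_TIMEOUT_MS`, `effectiveDeadline`, and `remainingMs` are illustrative names, and only the four timeout values come from this PR.

```typescript
// Timeout ceilings from the commit message; the identifiers are hypothetical.
const LAYER_TIMEOUT_MS = {
  chat: 420 * 1000,
  webSubAgent: 300 * 1000,
  operatorClient: 240 * 1000,
  operator: 180 * 1000,
} as const;

type Layer = keyof typeof LAYER_TIMEOUT_MS;

// An inner layer inherits an absolute deadline (epoch ms) and never exceeds it.
function effectiveDeadline(layer: Layer, nowMs: number, inheritedDeadlineMs?: number): number {
  const own = nowMs + LAYER_TIMEOUT_MS[layer];
  return inheritedDeadlineMs === undefined ? own : Math.min(own, inheritedDeadlineMs);
}

// Time left before a deadline, clamped at zero.
function remainingMs(deadlineMs: number, nowMs: number): number {
  return Math.max(0, deadlineMs - nowMs);
}
```

Under this rule, a web sub-agent started 200 s into a chat gets 220 s (the chat's remaining budget), not its own 300 s ceiling.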
@greptile-apps

greptile-apps Bot commented Feb 19, 2026

Greptile Summary

This PR introduces three major enhancements: structured AI responses, whose section markers are parsed into rich UI components; a timeout budget chain, which propagates an absolute deadline through the full agent chain (chat → sub-agent → operator) to prevent premature timeouts; and a Phase 2 summarization fallback, which uses map-reduce to synthesize collected page content when a browser operation times out without producing text. The implementation is comprehensive, with strong test coverage and well-architected timeout handling and recovery mechanisms.

Key changes:

  • Timeout budget chain propagates actionDeadlineMs through agent layers with appropriate buffers (chat: 420s → web: 300s → browser: 240s/180s)
  • Sub-agent tools check remaining budget before starting operations (MIN_TOOL_BUDGET_MS = 60s)
  • withTimeout() utility wraps agent generation with AbortController to gracefully handle timeouts
  • Phase 2 map-reduce summarization triggers when browser times out without text (parallel chunk summaries → final synthesis)
  • Navigation capped at 15 pages to prevent endless browsing
  • Marker parser handles streaming with partial marker buffering ([[CONCLUSION]], [[KEY_POINTS]], [[DETAILS]], [[QUESTIONS]], [[NEXT_STEPS]])
  • Frontend parses markers and renders structured sections with independent typewriter animations
  • Recovery generation (60s) runs on timeout to synthesize available tool results
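The `withTimeout()` pattern summarized above can be approximated as below: race the work against a timer and, when the timer wins, abort the shared `AbortController` so the underlying request is cancelled rather than merely ignored. The signature (a callback that receives an `AbortSignal`) and the `AgentTimeoutError` shape are assumptions and may not match `with_timeout.ts`.

```typescript
// Hypothetical sketch; the real with_timeout.ts may differ in signature and error type.
class AgentTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`Agent generation timed out after ${timeoutMs} ms`);
    this.name = "AgentTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

async function withTimeout<T>(run: (signal: AbortSignal) => Promise<T>, timeoutMs: number): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // cancel the in-flight work, not just the wait
      reject(new AgentTimeoutError(timeoutMs));
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // no stray timer once the work settles
  }
}
```

Cancellation only takes effect if the callee actually honors the signal, which is why each generation call (including the recovery path) needs the signal passed through.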

Confidence Score: 4/5

  • This PR is safe to merge with moderate risk: the implementation is solid but involves complex timeout orchestration across multiple layers.
  • The score reflects strong code quality and comprehensive test coverage, but the complexity of coordinating the timeout chain and the Phase 2 fallback logic introduces some risk. The timeout budget chain relies on accurate deadline propagation and buffer calculations across agent boundaries, and Phase 2 summarization adds latency when browser operations time out. Thorough testing of edge cases (nested timeouts, buffer exhaustion, partial failures) is recommended before production deployment.
  • Pay close attention to services/platform/convex/lib/agent_response/generate_response.ts for timeout error handling and recovery logic, and services/operator/app/services/browser_service.py for Phase 2 summarization triggering conditions

Important Files Changed

Filename | Overview
services/platform/convex/lib/agent_response/with_timeout.ts | New timeout utility with Promise.race-based abort controller, clean implementation
services/platform/convex/agent_tools/sub_agents/helpers/check_budget.ts | Budget checking for sub-agents, validates remaining time before starting operations
services/platform/convex/lib/agent_response/generate_response.ts | Core timeout handling with recovery, deadline propagation, and structured response support
services/operator/app/services/browser_service.py | Phase 2 map-reduce summarization, page content capture, navigation limits added
services/platform/lib/utils/marker-parser.ts | Robust marker parser with partial marker buffering during streaming, well-tested
services/platform/app/features/chat/components/structured-message/structured-message.tsx | Orchestrates structured section rendering with independent typewriter animations per section
services/platform/convex/lib/context_management/constants.ts | Timeout constants added for each agent type in the budget chain hierarchy
services/platform/convex/lib/agent_chat/start_agent_chat.ts | Deadline initialization at chat start, propagates absolute timestamp through chain

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatUI
    participant startAgentChat
    participant generateResponse
    participant SubAgent
    participant BrowserOperator
    participant Phase2

    User->>ChatUI: Send message
    ChatUI->>startAgentChat: Start chat (compute deadline)
    Note over startAgentChat: deadline = now() + 420s
    startAgentChat->>generateResponse: Generate response (deadline=420s)
    Note over generateResponse: Set variables.actionDeadlineMs<br/>Add structured markers to prompt
    
    alt Sub-agent tool called
        generateResponse->>SubAgent: Invoke sub-agent tool
        SubAgent->>SubAgent: checkBudget()
        Note over SubAgent: Verify remaining > 60s<br/>Subtract 30s buffer
        SubAgent->>generateResponse: Execute with deadline=390s
    end
    
    alt Browser operation
        generateResponse->>BrowserOperator: browserOperate()
        Note over BrowserOperator: Client timeout: 240s<br/>Operator timeout: 180s
        BrowserOperator->>BrowserOperator: Phase 1 browsing<br/>(max 15 pages)
        
        alt Phase 1 timeout without text
            BrowserOperator->>Phase2: Summarize page content
            Note over Phase2: Map-reduce summarization<br/>Parallel chunk summaries<br/>Final synthesis
            Phase2-->>BrowserOperator: Generated summary
        end
        
        BrowserOperator-->>generateResponse: Response with text
    end
    
    alt Generation timeout
        generateResponse->>generateResponse: withTimeout() triggers
        Note over generateResponse: AbortController.abort()<br/>Catch AgentTimeoutError
        generateResponse->>generateResponse: Recovery generation<br/>(60s, no tools)
        Note over generateResponse: Use accumulated tool results
    end
    
    generateResponse->>ChatUI: Parse response for markers
    Note over ChatUI: [[CONCLUSION]]<br/>[[KEY_POINTS]]<br/>[[DETAILS]]<br/>[[QUESTIONS]]<br/>[[NEXT_STEPS]]
    ChatUI->>User: Render structured sections<br/>with independent animations

Last reviewed commit: c263a0a

@coderabbitai

coderabbitai Bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

This PR implements Phase 2 content summarization for browser operations, structured response formatting with marker-based UI sections, deadline-aware execution across agent pipelines, and enhanced timeout recovery mechanisms. Key additions include: browser_service Phase 2 map-reduce summarization for accumulated page content; structured message rendering components (CONCLUSION, KEY_POINTS, DETAILS, QUESTIONS, NEXT_STEPS) with marker parsing; deadlineMs propagation through agent generation layers; timeout utilities and recovery flows on agent generation timeout; configuration adjustments (operator timeout 270→180 seconds, openai_fast_model field); and follow-up item extraction utilities.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 34.78%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped because CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat(platform): structured AI responses with section markers and timeout budget chain' directly and comprehensively summarizes the main changes: introduction of structured AI response markers and timeout budget propagation across agent layers.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
services/platform/convex/agents/web/generate_response.ts (1)

37-37: 🧹 Nitpick | 🔵 Trivial

Consider explicit validation like document agent.

Using || '' for a missing model will cause a downstream failure with a less clear error message. The document agent throws explicitly with a descriptive message, which aids debugging.

♻️ Proposed fix to align with document agent
 export async function generateWebResponse(
   args: GenerateWebResponseArgs,
 ): Promise<GenerateWebResponseResult> {
+  const model = process.env.OPENAI_FAST_MODEL;
+  if (!model) {
+    throw new Error('OPENAI_FAST_MODEL environment variable is not configured');
+  }
+
   return generateAgentResponse(
     {
       agentType: 'web',
       createAgent: createWebAgent,
-      model: process.env.OPENAI_FAST_MODEL || '',
+      model,
       provider: 'openai',
       debugTag: '[WebAgent]',
       instructions: WEB_AGENT_INSTRUCTIONS,
     },
     args,
   );
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agents/web/generate_response.ts` at line 37, The
config uses a fallback "model: process.env.OPENAI_FAST_MODEL || ''" which yields
an opaque downstream failure; instead validate OPENAI_FAST_MODEL explicitly (as
the document agent does) and throw a clear error when missing. Locate the model
assignment in generate_response.ts (the object containing model:
process.env.OPENAI_FAST_MODEL) and replace the silent fallback with an explicit
check that throws a descriptive Error (e.g., "OPENAI_FAST_MODEL is not set;
please configure the model for the web agent") before proceeding, ensuring
callers get an immediate, helpful message.
services/platform/convex/agents/document/generate_response.ts (1)

33-36: 🛠️ Refactor suggestion | 🟠 Major

Adopt centralized model configuration pattern across all agents.

The document and integration agents throw if OPENAI_FAST_MODEL is missing, while web and crm agents silently fall back to an empty string. This inconsistency risks different failure modes. More importantly, agents should source runtime configuration via getDefaultAgentRuntimeConfig() (as the chat agent does) rather than reading process.env directly. This centralizes configuration and ensures consistent handling across all agents.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agents/document/generate_response.ts` around lines
33 - 36, The code currently reads OPENAI_FAST_MODEL directly and throws if
missing; instead, use the centralized runtime config helper
getDefaultAgentRuntimeConfig() (the same pattern as the chat agent) to obtain
the model/fast model setting for generate_response.ts, e.g., call
getDefaultAgentRuntimeConfig() and pull the appropriate model field instead of
process.env.OPENAI_FAST_MODEL, remove the direct throw, and ensure the agent
follows the same fallback/validation behavior as other agents (silently default
or surface a consistent error) so all agents handle missing model config
uniformly.
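A sketch of the single validation point this comment argues for: every agent reads its model through one helper that fails with the same descriptive error. The helper name `getAgentRuntimeConfig`, its `env` parameter, and the returned shape are hypothetical; the project's actual `getDefaultAgentRuntimeConfig()` may differ, and it could just as well choose a shared default instead of throwing.

```typescript
// Hypothetical centralized config helper; not the project's getDefaultAgentRuntimeConfig().
interface AgentRuntimeConfig {
  provider: "openai";
  fastModel: string;
}

function getAgentRuntimeConfig(env: Record<string, string | undefined>): AgentRuntimeConfig {
  const fastModel = env["OPENAI_FAST_MODEL"]?.trim();
  // One place decides how a missing model is reported, so all agents fail the same way.
  if (!fastModel) {
    throw new Error("OPENAI_FAST_MODEL environment variable is not configured");
  }
  return { provider: "openai", fastModel };
}
```

Each agent would then call something like `getAgentRuntimeConfig(process.env).fastModel` instead of reading `process.env` directly.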
services/platform/convex/lib/agent_response/generate_response.ts (1)

686-694: ⚠️ Potential issue | 🟡 Minor

Token estimation underestimates for streaming agents.

Line 686 uses instructions for token estimation, but for streaming agents, the actual system prompt includes STRUCTURED_RESPONSE_INSTRUCTIONS (appended at line 228). This underestimates the token count by the size of the structured response instructions (~300 tokens).

🔧 Proposed fix for accurate token estimation
-    const systemPromptTokens = instructions ? estimateTokens(instructions) : 0;
+    const systemPromptTokens = agentInstructions ? estimateTokens(agentInstructions) : 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/lib/agent_response/generate_response.ts` around
lines 686 - 694, The token estimate currently uses only instructions and
toolsSummary to compute contextStats.totalTokens, which undercounts when the
streaming agent appends STRUCTURED_RESPONSE_INSTRUCTIONS; update the token
calculation in generate_response (where structuredThreadContext.stats and
estimateTokens are used) to add estimateTokens(STRUCTURED_RESPONSE_INSTRUCTIONS)
when the agent is a streaming agent (or whenever
STRUCTURED_RESPONSE_INSTRUCTIONS is appended), e.g., compute systemPromptTokens
= estimateTokens(instructions) + (isStreamingAgent ?
estimateTokens(STRUCTURED_RESPONSE_INSTRUCTIONS) : 0) and use that in
contextStats.totalTokens so the total includes the structured response
instructions.
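The proposed correction amounts to counting the structured-response instructions whenever they are appended. In this sketch, `estimateTokens` is a stand-in using a rough four-characters-per-token heuristic, not the project's estimator, and `isStreamingAgent` is assumed to be the condition under which the instructions are added.

```typescript
// Stand-in estimator (about 4 characters per token); the real estimateTokens() may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Counts the structured-response block only when it is actually appended to the prompt.
function systemPromptTokens(
  instructions: string | undefined,
  structuredInstructions: string,
  isStreamingAgent: boolean,
): number {
  const base = instructions ? estimateTokens(instructions) : 0;
  return base + (isStreamingAgent ? estimateTokens(structuredInstructions) : 0);
}
```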
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/operator/app/services/browser_service.py`:
- Around line 588-612: The code conflates navigation-limit termination with an
actual timeout by setting timed_out = True when nav_terminated is true;
introduce a separate boolean (e.g., terminated_early or nav_terminated_early)
and set that when accumulator.should_terminate triggers, keep timed_out only for
real TimeoutError cases, and update subsequent logging/response logic to use the
new flag (referencing nav_terminated, timed_out, accumulator,
MAX_NAVIGATION_COUNT, workspace_dir) so the response semantics distinguish
timeout vs early navigation termination.
- Around line 310-339: In _call_llm, add defensive parsing and clearer error
logging around the LLM response: after response.json() assign to result and
validate that result is a dict, has a non-empty "choices" list, that choices[0]
is a dict containing "message" which itself contains "content"; if any of these
checks fail, log a descriptive error including the raw result (or its truncated
string) and return None. Also catch specific parsing issues
(KeyError/IndexError/TypeError, ValueError) to produce a targeted log like
"Phase 2 LLM parse error: ..." instead of the generic exception message, while
preserving the existing timeout handling and final generic exception catch as a
fallback.
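The defensive parsing suggested for `_call_llm` can be sketched in TypeScript as a narrowing function over the OpenAI-style `choices[0].message.content` path, logging a truncated copy of the raw payload when the shape is wrong. The function names, the 500-character truncation, and the injectable `log` callback are assumptions; the operator's Python code would express the same checks with `isinstance` and `try`/`except`.

```typescript
// Hypothetical TypeScript analogue of the checks suggested for _call_llm.
function extractContent(result: unknown): string | null {
  if (typeof result !== "object" || result === null) return null;
  const choices = (result as { choices?: unknown }).choices;
  if (!Array.isArray(choices) || choices.length === 0) return null;
  const first: unknown = choices[0];
  if (typeof first !== "object" || first === null) return null;
  const message = (first as { message?: unknown }).message;
  if (typeof message !== "object" || message === null) return null;
  const content = (message as { content?: unknown }).content;
  return typeof content === "string" && content.trim() !== "" ? content : null;
}

function parseLlmResponse(result: unknown, log: (msg: string) => void = console.error): string | null {
  const content = extractContent(result);
  if (content === null) {
    // Truncate the raw payload so a malformed response cannot flood the logs.
    const raw = String(JSON.stringify(result)).slice(0, 500);
    log(`Phase 2 LLM parse error: unexpected response shape ${raw}`);
  }
  return content;
}
```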

In `@services/operator/tests/test_phase2_summarization.py`:
- Line 3: Remove the unused top-level import "import json" from the test module
(the import statement in test_phase2_summarization.py); simply delete the unused
import so the file no longer imports json that isn't referenced anywhere in
functions or tests such as those around Phase2 summarization helpers.

In
`@services/platform/app/features/chat/components/structured-message/section-renderers.tsx`:
- Around line 166-198: NextStepsSection renders follow-up buttons even when
onSendFollowUp is undefined; update the component (NextStepsSection) to return
null early when onSendFollowUp is not provided (e.g., add a guard like if
(!onSendFollowUp) return null) so parseFollowUpItems(content) and the mapped
Button elements are only rendered when a handler exists.
- Around line 105-115: The <details> element is forced open by the hardcoded
open attribute so toggles don't persist; change it to be controlled by the
component state by replacing the literal open with open={isOpen} (or, if you
want uncontrolled behavior, use defaultOpen={isOpen} and remove onToggle), and
ensure the existing handleToggle updates the isOpen state; keep the
aria-label/summary logic that uses isOpen so the label stays in sync.
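For reference, a follow-up parser of the kind `NextStepsSection` relies on might simply strip list bullets and numbering from each line. This is a guess at the behavior of `parseFollowUpItems`, not its actual source, and the accepted bullet styles are assumptions.

```typescript
// Hypothetical sketch; the real parse-follow-up-items.ts may differ.
function parseFollowUpItems(content: string): string[] {
  return content
    .split("\n")
    // Drop a leading "-", "*", "•", "1." or "1)" marker, then surrounding whitespace.
    .map((line) => line.trim().replace(/^(?:[-*•]|\d+[.)])\s+/, "").trim())
    .filter((line) => line !== "");
}
```

Per the review comment, the component should return `null` before calling this whenever `onSendFollowUp` is undefined, so no dead buttons are rendered.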

In `@services/platform/convex/agent_tools/sub_agents/helpers/check_budget.ts`:
- Around line 27-37: The minimum budget check currently compares remainingMs to
MIN_TOOL_BUDGET_MS but only subtracts SUB_AGENT_BUFFER_MS when returning
deadlineMs, which can allow starting a sub-agent with less than the intended
minimum; adjust the logic in the check (using remainingMs, MIN_TOOL_BUDGET_MS,
SUB_AGENT_BUFFER_MS and deadline) to subtract SUB_AGENT_BUFFER_MS from
remainingMs first (e.g., effectiveRemaining = remainingMs - SUB_AGENT_BUFFER_MS)
and then enforce effectiveRemaining >= MIN_TOOL_BUDGET_MS, returning deadlineMs
based on deadline - SUB_AGENT_BUFFER_MS when OK.
- Around line 21-24: The current logic only converts string actionDeadlineMs to
a number and ignores numeric values; update the handling around
ctx.variables?.actionDeadlineMs (raw) so that if raw is a number you use it
directly (e.g., typeof raw === 'number' ? raw : typeof raw === 'string' ?
Number(raw) : undefined), then keep the existing Number.isFinite(deadline) check
and return behavior; update the variables named raw and deadline in
check_budget.ts accordingly to ensure numeric inputs are accepted defensively.
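Both `check_budget.ts` suggestions combined, as a sketch: accept numeric or numeric-string deadlines, reserve the hand-back buffer first, then require the minimum budget from what is left. The 60 s and 30 s values come from the review summary; the function shape (a plain `variables` object and an injectable `nowMs`) and the `BudgetCheck` result type are hypothetical.

```typescript
// Values from the review summary; the identifiers below are hypothetical.
const MIN_TOOL_BUDGET_MS = 60 * 1000;
const SUB_AGENT_BUFFER_MS = 30 * 1000;

type BudgetCheck =
  | { ok: true; deadlineMs?: number } // deadlineMs absent: no deadline in force
  | { ok: false; remainingMs: number };

// Accepts numbers or numeric strings; anything else means "no deadline".
function toDeadline(raw: unknown): number | undefined {
  const value =
    typeof raw === "number" ? raw : typeof raw === "string" && raw.trim() !== "" ? Number(raw) : undefined;
  return value !== undefined && Number.isFinite(value) ? value : undefined;
}

function checkBudget(variables: Record<string, unknown> | undefined, nowMs: number = Date.now()): BudgetCheck {
  const deadline = toDeadline(variables?.["actionDeadlineMs"]);
  if (deadline === undefined) return { ok: true };
  // Reserve the buffer first, then require the minimum from what is left.
  const effectiveRemaining = deadline - nowMs - SUB_AGENT_BUFFER_MS;
  if (effectiveRemaining < MIN_TOOL_BUDGET_MS) {
    return { ok: false, remainingMs: Math.max(0, deadline - nowMs) };
  }
  return { ok: true, deadlineMs: deadline - SUB_AGENT_BUFFER_MS };
}
```

With 420 s remaining this hands the sub-agent a 390 s deadline, matching the sequence diagram. With 80 s remaining, the original order (check, then subtract) would pass and leave only 50 s; this version refuses.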

In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 524-553: The empty-text retry path loses timeout handling and
response metadata like steps/usage—fix retryAgent.generateText call to use the
same timeout and context/storage options as the tool-result retry (e.g., pass
the timeout wrapper or timeout option and matching storageOptions), and ensure
the final result preserves/merges steps and usage correctly (merge
retryResult.steps with existing result.steps or append retry steps rather than
overwriting), keep finishReason coming from retryResult, and maintain debugLog
usage; update references in this block (retryAgent.generateText, retryResult,
result, mergeUsage, emptyRetrySystemPrompt, promptMessage, subAgentContext) to
mirror the tool-result retry logic for consistency.
- Around line 464-493: The retry path currently calls retryAgent.generateText
without a timeout and then overwrites result without preserving the response
metadata; wrap the retryAgent.generateText call in the same timeout mechanism
used elsewhere (apply the existing timeout helper used for streaming retries) to
abort on timeout, and when merging the retry output into result (the block that
sets result = { text: ..., steps: ..., usage: ..., finishReason: ... }),
include/preserve the retryResult.response (and any existing result.response) so
result.response is not lost (ensure mergeUsage stays as-is and debugLog remains
unchanged); update references to retryAgent.generateText, the result assignment
block, and merge logic so actualModel (which reads result.response later) will
be available.
- Around line 599-619: The recovery path uses withTimeout around
recoveryAgent.generateText but doesn't pass an AbortSignal, so the LLM call can
continue after timeout; create an AbortController (e.g., recoveryAbort), pass
recoveryAbort.signal into recoveryAgent.generateText via the same options shape
used in the main paths, and ensure withTimeout will call recoveryAbort.abort()
on timeout (or wire the abort into withTimeout) so the in-flight request is
cancelled when RECOVERY_TIMEOUT_MS elapses; update the recovery call sites (the
withTimeout wrapping recoveryAgent.generateText for
contextWithOrg/threadId/userId and the storage/contextOptions) to include the
abortSignal.
- Around line 351-370: The retryAgent.generateText call lacks timeout
protection; wrap the call to retryAgent.generateText (the block that assigns
retryResult) with the existing withTimeout helper and use the remaining time
budget variable (or compute remainingMs from the current request
timeout/timeoutBudget) so the retry cannot hang indefinitely; ensure you
propagate or handle the timeout error the same way as other generateText calls
so retryResult is either returned or fails fast when the remaining timeout
elapses.
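The common thread in these four comments is that every retry should run under the remaining budget and should add to, not replace, the first attempt's metadata. A sketch under assumed shapes: `GenerationResult`, the `inputTokens`/`outputTokens` usage fields, `mergeRetry`, and `retryBudgetMs` are illustrative, and this `mergeUsage` is a stand-in for the helper of the same name in `generate_response.ts`.

```typescript
// Hypothetical result shapes; the real SDK result types are richer.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface GenerationResult {
  text: string;
  steps: unknown[];
  usage: Usage;
  finishReason: string;
  response?: { modelId?: string };
}

function mergeUsage(a: Usage, b: Usage): Usage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
  };
}

// Keeps the first attempt's steps and response metadata instead of overwriting them.
function mergeRetry(first: GenerationResult, retry: GenerationResult): GenerationResult {
  return {
    text: retry.text,
    steps: [...first.steps, ...retry.steps],
    usage: mergeUsage(first.usage, retry.usage),
    finishReason: retry.finishReason,
    response: retry.response ?? first.response,
  };
}

// Timeout for a retry: the remaining budget, capped at `capMs`, never negative.
function retryBudgetMs(deadlineMs: number | undefined, nowMs: number, capMs: number): number {
  if (deadlineMs === undefined) return capMs;
  return Math.max(0, Math.min(capMs, deadlineMs - nowMs));
}
```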

In `@services/platform/lib/utils/marker-parser.ts`:
- Around line 144-145: The file exports MARKERS, MarkerType, and
MarkerParseResult but knip flags them as unused; update exports to match actual
consumers by removing the unused exports (MARKERS, MarkerType,
MarkerParseResult) from the export list and keep only parseMarkers and
ParsedSection, or alternatively add real consumers if they must remain public
(e.g., import them in structured-message.tsx or another module); ensure the
changed export statement references only the symbols in use (parseMarkers and
ParsedSection) or document/annotate the API if you intentionally keep the
others.


@larryro larryro merged commit 0dd19b0 into main Feb 19, 2026
17 checks passed
@larryro larryro deleted the feat/structured-ai-responses branch February 19, 2026 14:36
yannickmonney pushed a commit that referenced this pull request Apr 8, 2026