Skip to content

feat(operator,platform): refactor operator agent loop and improve platform reliability#499

Merged
larryro merged 10 commits into
mainfrom
feat/operator-and-platform-updates
Feb 20, 2026
Merged

feat(operator,platform): refactor operator agent loop and improve platform reliability#499
larryro merged 10 commits into
mainfrom
feat/operator-and-platform-updates

Conversation

@larryro
Copy link
Copy Markdown
Collaborator

@larryro larryro commented Feb 20, 2026

Summary

  • Operator service: Restructured with a dedicated agent loop (agent_loop.py) and browser pool (browser_pool.py), replacing the MCP vision server and workspace manager for cleaner separation of concerns
  • Platform reliability: Improved error boundaries, RLS context handling, branding queries, and agent response generation with better timeout budget chain support
  • Chat UX: Refined chat pending state logic and updated tests to match new behavior

Changes

Operator (Python)

  • Added agent_loop.py — dedicated agent loop service with structured task execution
  • Added browser_pool.py — connection pooling for browser instances
  • Removed workspace_manager.py and mcp/vision_server.py (consolidated into new modules)
  • Simplified browser_service.py by extracting pool management
  • Updated Docker configuration and entrypoint for the new architecture
  • Trimmed uv.lock dependencies

Platform (TypeScript/Convex)

  • Improved error boundary components (layout error boundary, base error boundary)
  • Enhanced generate_response.ts with better timeout budget handling
  • Refined RLS context creation and organization access validation
  • Updated branding queries and custom agent queries
  • Improved chat pending state hook with updated tests

Test plan

  • Added test_agent_loop.py (728 lines) covering agent loop behavior
  • Added test_browser_pool.py (173 lines) covering browser pool management
  • Updated test_output_accumulator.py and test_phase2_summarization.py for new structure
  • Updated use-chat-pending-state.test.ts for refined pending state logic
  • Verify operator service starts correctly with new architecture
  • Verify platform chat interface behavior with updated error boundaries

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Direct browser automation now available for AI-powered task execution
    • Added fast model option for quicker agent responses
    • Vision API integration for screenshot and image analysis
  • Improvements

    • Enhanced error recovery with automatic retry mechanisms
    • Improved chat responsiveness with better pending state management
    • Optimized timeout handling with platform safety limits
    • Streamlined browser request concurrency management
  • Changes

    • Browser automation now uses direct integration instead of layered approach
    • Simplified service configuration and setup requirements

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Feb 20, 2026

Greptile Summary

Major architectural refactoring replacing OpenCode CLI with direct LLM function-calling agent loop and browser pooling, plus platform reliability improvements.

Operator Service Refactoring

  • New agent_loop.py (824 lines): Implements direct LLM function-calling with Playwright browser tools, replacing the OpenCode CLI subprocess model
  • New browser_pool.py (128 lines): Manages persistent Chromium browser with per-request isolated contexts using semaphore-based concurrency control
  • Simplified browser_service.py: Removed 280+ lines of subprocess management, now delegates to agent_loop and browser_pool
  • Removed workspace_manager.py (260 lines) and mcp/vision_server.py (102 lines): Consolidated into new architecture
  • Comprehensive test coverage: Added 728-line test_agent_loop.py and 173-line test_browser_pool.py

Platform Reliability Improvements

  • generate_response.ts: Added 9-minute platform hard limit cap to prevent Convex from killing actions mid-operation, switched retries to fast model for efficiency
  • Error boundaries: Added auto-retry with exponential backoff for transient errors
  • RLS optimization: Added optional user parameter to createRLSContext and validateOrganizationAccess to skip expensive auth queries when user already available (saves 2 DB queries per call)
  • Chat UX: Refined pending state to clear only when no assistant messages are in non-terminal state

Issues Found

  • Critical: config.py headless default changed from True to False without migration guidance — production deployments may unexpectedly run browsers in non-headless mode

Confidence Score: 3/5

  • Safe to merge with one critical configuration issue requiring attention
  • The refactoring is well-architected with excellent test coverage (900+ lines of new tests), but the headless default change is a breaking configuration change that could cause issues in production environments if not explicitly handled
  • services/operator/app/config.py requires environment variable audit before deployment

Important Files Changed

Filename Overview
services/operator/app/services/agent_loop.py New 824-line agent loop with direct LLM function-calling and Playwright browser automation, replacing OpenCode CLI
services/operator/app/services/browser_pool.py New browser context pooling with persistent Chromium instance and semaphore-based concurrency control
services/operator/app/config.py Removed workspace config fields, added vision config, changed headless default from True to False
services/platform/convex/lib/agent_response/generate_response.ts Added 9-minute platform hard limit cap, improved retry logic with fast model, better timeout budget handling
services/platform/app/components/error-boundaries/core/error-boundary-base.tsx Added auto-retry with exponential backoff, cleanup for timers in componentWillUnmount
services/platform/app/features/chat/hooks/use-chat-pending-state.ts Simplified pending state logic to clear only when no assistant messages are in non-terminal state

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Request] --> B[BrowserService]
    B --> C{Initialize?}
    C -->|Yes| D[BrowserPool.initialize]
    D --> E[Launch Chromium]
    C -->|No| F[BrowserPool.acquire]
    F --> G[Create BrowserContext]
    G --> H[run_agent_loop]
    H --> I{Agent Loop}
    I --> J[Call LLM with tools]
    J --> K{Response Type}
    K -->|Tool Calls| L[Execute Playwright Tools]
    L --> M[navigate/snapshot/click/etc]
    M --> N{Continue?}
    N -->|Yes, budget remains| I
    N -->|Timeout/Max turns| O[Phase 2: Summarize]
    K -->|Text Response| P[Return Response]
    O --> P
    P --> Q[BrowserPool.release]
    Q --> R[Close Context]
    R --> S[Return Results]
Loading

Last reviewed commit: faa8ad3

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

40 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment thread services/operator/app/config.py
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 20, 2026

📝 Walkthrough

Walkthrough

This PR refactors the Operator service from an OpenCode + Playwright MCP architecture to direct Playwright browser automation with an in-process LLM agent loop. Key changes include removing WorkspaceManager and vision_server modules, introducing BrowserPool for concurrent context management and agent_loop for orchestrating tool-calling agent interactions. Configuration removes workspace-related settings and adds vision API credentials. Additional platform-side changes introduce error boundary retry mechanisms with exponential backoff, propagate incomplete message signals through chat UI components, add useFastModel options to agent factories, refactor authentication flows using getAuthUserIdentity, and impose platform hard timeout limits on agent execution.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: operator agent loop refactoring and platform reliability improvements, matching the PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/operator-and-platform-updates

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/platform/app/features/chat/hooks/__tests__/use-chat-pending-state.test.ts (1)

8-19: 🧹 Nitpick | 🔵 Trivial

Prefer satisfies UIMessage over as UIMessage in this test fixture helper.

This provides compile-time validation while keeping type inference intact, aligning with the repository's guidance to avoid type casting.

♻️ Suggested refactor
-  return {
+  const message = {
     key: overrides.id,
     role: 'assistant',
     text: '',
     _creationTime: Date.now(),
     status: 'success',
     parts: [],
     ...overrides,
-  } as UIMessage;
+  } satisfies UIMessage;
+  return message;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@services/platform/app/features/chat/hooks/__tests__/use-chat-pending-state.test.ts`
around lines 8 - 19, The test helper createUIMessage currently casts the
returned object with "as UIMessage"; replace that cast by using the TypeScript
"satisfies UIMessage" operator on the object literal (keep the function
signature returning UIMessage) so the object is validated at compile time
without losing inference. Update the return expression in createUIMessage to use
"satisfies UIMessage" instead of "as UIMessage" and ensure the function still
returns UIMessage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/operator/app/services/agent_loop.py`:
- Around line 736-743: The check `context is not None` is redundant because
`context: BrowserContext` is required; remove that condition and simplify
`should_batch = len(navigate_calls) > 1 and context is not None` to just
`should_batch = len(navigate_calls) > 1`, leaving the batching logic using
`navigate_calls`, `parsed_calls`, `_fetch_single_page`, and `urls` unchanged so
the parallel `asyncio.gather` and `nav_results` creation behave the same.
- Around line 453-460: The select_option branch in the tool handler (tool_name
== "select_option") currently calls _resolve_locator(...) and await
locator.select_option(value) but doesn't wait for page load like the click and
type_text branches; update the handler to await the same post-action load wait
(e.g. await page.wait_for_load_state("domcontentloaded") or the existing helper
used elsewhere) after await locator.select_option(value) to make behavior
consistent and prevent race conditions before returning the "Selected ..."
message.
- Around line 569-620: The function _call_llm_with_tools currently returns
immediately on httpx.TimeoutException instead of retrying; change the handler
for TimeoutException to mirror the general Exception branch (set last_error to a
descriptive timeout message, log a warning with attempt info, backoff with await
asyncio.sleep(2**attempt) and continue if attempt < retries, otherwise
break/allow final error logging), and rename the parameter _retries to retries
(and update all internal uses and any callers) so the retry loop uses the public
name; ensure final logger.error still reports the last_error after retries are
exhausted.

In `@services/operator/app/services/browser_pool.py`:
- Around line 75-76: Replace the two assert statements that check
self._semaphore and self._browser (used in BrowserPool methods around the
asserts at lines showing the checks) with explicit runtime checks that raise a
clear RuntimeError (or custom exception) if either is None; do the same for the
similar check at the other location (around the check near line 92).
Specifically, locate the methods referencing self._semaphore and self._browser
in browser_pool.py, remove the assert lines and insert explicit if-not checks
that raise RuntimeError with a helpful message (e.g., "BrowserPool not
initialized: _semaphore is None" / "_browser is None") so the error cannot be
silently skipped when Python runs with -O.
- Line 54: The disconnected callback creates a fire-and-forget task via
asyncio.create_task(self._on_browser_disconnected()), which can be orphaned
during fast shutdown; change this to store the created Task (e.g.
self._disconnect_task = asyncio.create_task(self._on_browser_disconnected()))
and ensure that any shutdown/cleanup method for BrowserPool (or equivalent)
checks for self._disconnect_task and awaits or cancels it (and awaits) so the
_on_browser_disconnected coroutine is not left running; update the
_on_browser_disconnected reference and any close/shutdown logic to handle a
pending task safely.

In `@services/operator/Dockerfile`:
- Around line 74-75: The Dockerfile has a standalone RUN chown -R pwuser:pwuser
/app that can be merged with adjacent RUN instructions to reduce image layers
(per Hadolint DL3059); update the Dockerfile by combining this chown into the
nearest preceding or following RUN that performs related setup (using shell
chaining like &&) so the ownership change and other setup steps execute in a
single RUN, referencing the existing RUN chown -R pwuser:pwuser /app command and
any neighboring RUN that installs/copies app files.

In
`@services/platform/app/components/error-boundaries/core/error-boundary-base.tsx`:
- Around line 60-71: The retryCount is only incremented in the timeout handler,
which means once a boundary recovers it still accumulates retries; update the
setTimeout callback in the error boundary's retry logic (where retryTimerId is
set and getRetryDelay is used) to reset retryCount to 0 when clearing
hasError/error and isRetrying, e.g. setState should set retryCount: 0 instead of
incrementing prev.retryCount so future independent errors start with fresh
retries.

In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 194-206: Compute remainingPlatformMs = Math.max(platformDeadline -
Date.now(), 0) and use it when computing effectiveTimeoutMs and actionDeadline
to prevent negative timeouts; if remainingPlatformMs === 0 then throw an
AgentTimeoutError immediately (use AgentTimeoutError symbol) to abort early;
ensure rawTimeoutMs calculation still uses args.deadlineMs ?
Math.max(args.deadlineMs - Date.now(), 30_000) : agentConfig.timeoutMs but cap
effectiveTimeoutMs = Math.min(rawTimeoutMs, remainingPlatformMs) and set
actionDeadline = Math.min(args.deadlineMs ?? startTime + effectiveTimeoutMs,
platformDeadline). Also update the recovery timeout logic (the code referenced
around lines 689–725) to clamp any recovery allocation to remainingPlatformMs so
recovery never requests more time than the platform budget.

---

Outside diff comments:
In
`@services/platform/app/features/chat/hooks/__tests__/use-chat-pending-state.test.ts`:
- Around line 8-19: The test helper createUIMessage currently casts the returned
object with "as UIMessage"; replace that cast by using the TypeScript "satisfies
UIMessage" operator on the object literal (keep the function signature returning
UIMessage) so the object is validated at compile time without losing inference.
Update the return expression in createUIMessage to use "satisfies UIMessage"
instead of "as UIMessage" and ensure the function still returns UIMessage.

Comment thread services/operator/app/services/agent_loop.py
Comment thread services/operator/app/services/agent_loop.py
Comment thread services/operator/app/services/agent_loop.py
Comment thread services/operator/app/services/browser_pool.py Outdated
Comment thread services/operator/app/services/browser_pool.py Outdated
Comment thread services/operator/Dockerfile Outdated
Comment thread services/platform/convex/lib/agent_response/generate_response.ts
@larryro larryro force-pushed the feat/operator-and-platform-updates branch from 8b3b831 to f0de21c Compare February 20, 2026 07:10
@larryro larryro merged commit 754744a into main Feb 20, 2026
17 checks passed
@larryro larryro deleted the feat/operator-and-platform-updates branch February 20, 2026 07:30
yannickmonney pushed a commit that referenced this pull request Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant