feat(operator,platform): refactor operator agent loop and improve platform reliability by larryro · Pull Request #499 · tale-project/tale

larryro · 2026-02-20T05:21:34Z

Summary

Operator service: Restructured with a dedicated agent loop (agent_loop.py) and browser pool (browser_pool.py), replacing the MCP vision server and workspace manager for cleaner separation of concerns
Platform reliability: Improved error boundaries, RLS context handling, branding queries, and agent response generation with better timeout budget chain support
Chat UX: Refined chat pending state logic and updated tests to match new behavior

Changes

Operator (Python)

Added agent_loop.py — dedicated agent loop service with structured task execution
Added browser_pool.py — connection pooling for browser instances
Removed workspace_manager.py and mcp/vision_server.py (consolidated into new modules)
Simplified browser_service.py by extracting pool management
Updated Docker configuration and entrypoint for the new architecture
Trimmed uv.lock dependencies

Platform (TypeScript/Convex)

Improved error boundary components (layout error boundary, base error boundary)
Enhanced generate_response.ts with better timeout budget handling
Refined RLS context creation and organization access validation
Updated branding queries and custom agent queries
Improved chat pending state hook with updated tests

Test plan

Added test_agent_loop.py (728 lines) covering agent loop behavior
Added test_browser_pool.py (173 lines) covering browser pool management
Updated test_output_accumulator.py and test_phase2_summarization.py for new structure
Updated use-chat-pending-state.test.ts for refined pending state logic
Verify operator service starts correctly with new architecture
Verify platform chat interface behavior with updated error boundaries

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Direct browser automation now available for AI-powered task execution
- Added fast model option for quicker agent responses
- Vision API integration for screenshot and image analysis
Improvements
- Enhanced error recovery with automatic retry mechanisms
- Improved chat responsiveness with better pending state management
- Optimized timeout handling with platform safety limits
- Streamlined browser request concurrency management
Changes
- Browser automation now uses direct integration instead of layered approach
- Simplified service configuration and setup requirements

greptile-apps · 2026-02-20T05:24:24Z

Greptile Summary

Major architectural refactoring replacing OpenCode CLI with direct LLM function-calling agent loop and browser pooling, plus platform reliability improvements.

Operator Service Refactoring

New agent_loop.py (824 lines): Implements direct LLM function-calling with Playwright browser tools, replacing the OpenCode CLI subprocess model
New browser_pool.py (128 lines): Manages persistent Chromium browser with per-request isolated contexts using semaphore-based concurrency control
Simplified browser_service.py: Removed 280+ lines of subprocess management, now delegates to agent_loop and browser_pool
Removed workspace_manager.py (260 lines) and mcp/vision_server.py (102 lines): Consolidated into new architecture
Comprehensive test coverage: Added 728-line test_agent_loop.py and 173-line test_browser_pool.py

Platform Reliability Improvements

generate_response.ts: Added 9-minute platform hard limit cap to prevent Convex from killing actions mid-operation, switched retries to fast model for efficiency
Error boundaries: Added auto-retry with exponential backoff for transient errors
RLS optimization: Added optional user parameter to createRLSContext and validateOrganizationAccess to skip expensive auth queries when user already available (saves 2 DB queries per call)
Chat UX: Refined pending state to clear only when no assistant messages are in non-terminal state

Issues Found

Critical: config.py headless default changed from True to False without migration guidance — production deployments may unexpectedly run browsers in non-headless mode

Confidence Score: 3/5

Safe to merge with one critical configuration issue requiring attention
The refactoring is well-architected with excellent test coverage (900+ lines of new tests), but the headless default change is a breaking configuration change that could cause issues in production environments if not explicitly handled
services/operator/app/config.py requires environment variable audit before deployment

Important Files Changed

Filename	Overview
services/operator/app/services/agent_loop.py	New 824-line agent loop with direct LLM function-calling and Playwright browser automation, replacing OpenCode CLI
services/operator/app/services/browser_pool.py	New browser context pooling with persistent Chromium instance and semaphore-based concurrency control
services/operator/app/config.py	Removed workspace config fields, added vision config, changed headless default from True to False
services/platform/convex/lib/agent_response/generate_response.ts	Added 9-minute platform hard limit cap, improved retry logic with fast model, better timeout budget handling
services/platform/app/components/error-boundaries/core/error-boundary-base.tsx	Added auto-retry with exponential backoff, cleanup for timers in componentWillUnmount
services/platform/app/features/chat/hooks/use-chat-pending-state.ts	Simplified pending state logic to clear only when no assistant messages are in non-terminal state

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Request] --> B[BrowserService]
    B --> C{Initialize?}
    C -->|Yes| D[BrowserPool.initialize]
    D --> E[Launch Chromium]
    C -->|No| F[BrowserPool.acquire]
    F --> G[Create BrowserContext]
    G --> H[run_agent_loop]
    H --> I{Agent Loop}
    I --> J[Call LLM with tools]
    J --> K{Response Type}
    K -->|Tool Calls| L[Execute Playwright Tools]
    L --> M[navigate/snapshot/click/etc]
    M --> N{Continue?}
    N -->|Yes, budget remains| I
    N -->|Timeout/Max turns| O[Phase 2: Summarize]
    K -->|Text Response| P[Return Response]
    O --> P
    P --> Q[BrowserPool.release]
    Q --> R[Close Context]
    R --> S[Return Results]

_{Last reviewed commit: faa8ad3}

greptile-apps

_{40 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

coderabbitai · 2026-02-20T05:30:53Z

📝 Walkthrough

Walkthrough

This PR refactors the Operator service from an OpenCode + Playwright MCP architecture to direct Playwright browser automation with an in-process LLM agent loop. Key changes include removing WorkspaceManager and vision_server modules, introducing BrowserPool for concurrent context management and agent_loop for orchestrating tool-calling agent interactions. Configuration removes workspace-related settings and adds vision API credentials. Additional platform-side changes introduce error boundary retry mechanisms with exponential backoff, propagate incomplete message signals through chat UI components, add useFastModel options to agent factories, refactor authentication flows using getAuthUserIdentity, and impose platform hard timeout limits on agent execution.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fix(operator): use @playwright/mcp bundled browser CLI to prevent version mismatch #488 — Modifies the Operator Dockerfile's Playwright installation logic to use direct playwright install chromium instead of MCP-bundled CLI, same as this PR's docker changes.
fix: replace DB-based auth with JWT identity in queries #404 — Replaces authComponent.getAuthUser(...) with getAuthUserIdentity(...) pattern across query handlers, identical refactoring used in this PR's RLS/auth updates.
fix(platform): prevent chat loading state flicker during tool call transitions #492 — Propagates the hasIncompleteAssistantMessage flag through chat UI components (chat-interface, chat-messages, hooks), same signal handling as this PR's chat-related changes.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: operator agent loop refactoring and platform reliability improvements, matching the PR objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/operator-and-platform-updates

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

services/platform/app/features/chat/hooks/__tests__/use-chat-pending-state.test.ts (1)

8-19: 🧹 Nitpick | 🔵 Trivial

Prefer satisfies UIMessage over as UIMessage in this test fixture helper.

This provides compile-time validation while keeping type inference intact, aligning with the repository's guidance to avoid type casting.

♻️ Suggested refactor

-  return {
+  const message = {
     key: overrides.id,
     role: 'assistant',
     text: '',
     _creationTime: Date.now(),
     status: 'success',
     parts: [],
     ...overrides,
-  } as UIMessage;
+  } satisfies UIMessage;
+  return message;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@services/platform/app/features/chat/hooks/__tests__/use-chat-pending-state.test.ts`
around lines 8 - 19, The test helper createUIMessage currently casts the
returned object with "as UIMessage"; replace that cast by using the TypeScript
"satisfies UIMessage" operator on the object literal (keep the function
signature returning UIMessage) so the object is validated at compile time
without losing inference. Update the return expression in createUIMessage to use
"satisfies UIMessage" instead of "as UIMessage" and ensure the function still
returns UIMessage.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/operator/app/services/agent_loop.py`:
- Around line 736-743: The check `context is not None` is redundant because
`context: BrowserContext` is required; remove that condition and simplify
`should_batch = len(navigate_calls) > 1 and context is not None` to just
`should_batch = len(navigate_calls) > 1`, leaving the batching logic using
`navigate_calls`, `parsed_calls`, `_fetch_single_page`, and `urls` unchanged so
the parallel `asyncio.gather` and `nav_results` creation behave the same.
- Around line 453-460: The select_option branch in the tool handler (tool_name
== "select_option") currently calls _resolve_locator(...) and await
locator.select_option(value) but doesn't wait for page load like the click and
type_text branches; update the handler to await the same post-action load wait
(e.g. await page.wait_for_load_state("domcontentloaded") or the existing helper
used elsewhere) after await locator.select_option(value) to make behavior
consistent and prevent race conditions before returning the "Selected ..."
message.
- Around line 569-620: The function _call_llm_with_tools currently returns
immediately on httpx.TimeoutException instead of retrying; change the handler
for TimeoutException to mirror the general Exception branch (set last_error to a
descriptive timeout message, log a warning with attempt info, backoff with await
asyncio.sleep(2**attempt) and continue if attempt < retries, otherwise
break/allow final error logging), and rename the parameter _retries to retries
(and update all internal uses and any callers) so the retry loop uses the public
name; ensure final logger.error still reports the last_error after retries are
exhausted.

In `@services/operator/app/services/browser_pool.py`:
- Around line 75-76: Replace the two assert statements that check
self._semaphore and self._browser (used in BrowserPool methods around the
asserts at lines showing the checks) with explicit runtime checks that raise a
clear RuntimeError (or custom exception) if either is None; do the same for the
similar check at the other location (around the check near line 92).
Specifically, locate the methods referencing self._semaphore and self._browser
in browser_pool.py, remove the assert lines and insert explicit if-not checks
that raise RuntimeError with a helpful message (e.g., "BrowserPool not
initialized: _semaphore is None" / "_browser is None") so the error cannot be
silently skipped when Python runs with -O.
- Line 54: The disconnected callback creates a fire-and-forget task via
asyncio.create_task(self._on_browser_disconnected()), which can be orphaned
during fast shutdown; change this to store the created Task (e.g.
self._disconnect_task = asyncio.create_task(self._on_browser_disconnected()))
and ensure that any shutdown/cleanup method for BrowserPool (or equivalent)
checks for self._disconnect_task and awaits or cancels it (and awaits) so the
_on_browser_disconnected coroutine is not left running; update the
_on_browser_disconnected reference and any close/shutdown logic to handle a
pending task safely.

In `@services/operator/Dockerfile`:
- Around line 74-75: The Dockerfile has a standalone RUN chown -R pwuser:pwuser
/app that can be merged with adjacent RUN instructions to reduce image layers
(per Hadolint DL3059); update the Dockerfile by combining this chown into the
nearest preceding or following RUN that performs related setup (using shell
chaining like &&) so the ownership change and other setup steps execute in a
single RUN, referencing the existing RUN chown -R pwuser:pwuser /app command and
any neighboring RUN that installs/copies app files.

In
`@services/platform/app/components/error-boundaries/core/error-boundary-base.tsx`:
- Around line 60-71: The retryCount is only incremented in the timeout handler,
which means once a boundary recovers it still accumulates retries; update the
setTimeout callback in the error boundary's retry logic (where retryTimerId is
set and getRetryDelay is used) to reset retryCount to 0 when clearing
hasError/error and isRetrying, e.g. setState should set retryCount: 0 instead of
incrementing prev.retryCount so future independent errors start with fresh
retries.

In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 194-206: Compute remainingPlatformMs = Math.max(platformDeadline -
Date.now(), 0) and use it when computing effectiveTimeoutMs and actionDeadline
to prevent negative timeouts; if remainingPlatformMs === 0 then throw an
AgentTimeoutError immediately (use AgentTimeoutError symbol) to abort early;
ensure rawTimeoutMs calculation still uses args.deadlineMs ?
Math.max(args.deadlineMs - Date.now(), 30_000) : agentConfig.timeoutMs but cap
effectiveTimeoutMs = Math.min(rawTimeoutMs, remainingPlatformMs) and set
actionDeadline = Math.min(args.deadlineMs ?? startTime + effectiveTimeoutMs,
platformDeadline). Also update the recovery timeout logic (the code referenced
around lines 689–725) to clamp any recovery allocation to remainingPlatformMs so
recovery never requests more time than the platform budget.

---

Outside diff comments:
In
`@services/platform/app/features/chat/hooks/__tests__/use-chat-pending-state.test.ts`:
- Around line 8-19: The test helper createUIMessage currently casts the returned
object with "as UIMessage"; replace that cast by using the TypeScript "satisfies
UIMessage" operator on the object literal (keep the function signature returning
UIMessage) so the object is validated at compile time without losing inference.
Update the return expression in createUIMessage to use "satisfies UIMessage"
instead of "as UIMessage" and ensure the function still returns UIMessage.

…tform reliability Restructure operator service with dedicated agent loop and browser pool, removing MCP vision server and workspace manager. Improve platform error boundaries, agent response generation, RLS context handling, branding queries, and chat pending state logic.

…ring shutdown

…hin platform budget

Use contextlib.suppress for CancelledError in BrowserPool shutdown and fix line formatting in OutputAccumulator.

…tform reliability (#499)

greptile-apps Bot reviewed Feb 20, 2026

View reviewed changes

Comment thread services/operator/app/config.py

coderabbitai Bot requested changes Feb 20, 2026

View reviewed changes

coderabbitai Bot approved these changes Feb 20, 2026

View reviewed changes

larryro added 10 commits February 20, 2026 15:10

fix(operator): add wait_for_load_state after select_option in agent loop

6abaea2

fix(operator): retry LLM calls on timeout with exponential backoff

2e67c29

fix(operator): remove redundant None check on required context parameter

27d810d

fix(operator): track disconnect callback task to prevent orphaning du…

7e11e3f

…ring shutdown

fix(operator): replace assert with explicit RuntimeError in BrowserPool

805e8c4

fix(operator): consolidate consecutive RUN instructions in Dockerfile

341b1c1

fix(platform): reset retryCount after successful error boundary recovery

9e3e5ca

fix(convex): guard against non-positive timeouts and cap recovery wit…

a7b0d1b

…hin platform budget

fix(operator): resolve ruff lint and format errors

f0de21c

Use contextlib.suppress for CancelledError in BrowserPool shutdown and fix line formatting in OutputAccumulator.

larryro force-pushed the feat/operator-and-platform-updates branch from 8b3b831 to f0de21c Compare February 20, 2026 07:10

larryro merged commit 754744a into main Feb 20, 2026
17 checks passed

larryro deleted the feat/operator-and-platform-updates branch February 20, 2026 07:30

This was referenced Feb 20, 2026

feat(platform,crawler): add website page embeddings with vector search #501

Merged

feat: migrate website search and embeddings to crawler service #595

Merged

yannickmonney pushed a commit that referenced this pull request Apr 8, 2026

feat(operator,platform): refactor operator agent loop and improve pla…

1b7379e

…tform reliability (#499)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(operator,platform): refactor operator agent loop and improve platform reliability#499

feat(operator,platform): refactor operator agent loop and improve platform reliability#499
larryro merged 10 commits into
mainfrom
feat/operator-and-platform-updates

larryro commented Feb 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

greptile-apps Bot commented Feb 20, 2026

Important Files Changed

Uh oh!

greptile-apps Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot commented Feb 20, 2026

Walkthrough

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

larryro commented Feb 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Operator (Python)

Platform (TypeScript/Convex)

Test plan

Summary by CodeRabbit

Uh oh!

greptile-apps Bot commented Feb 20, 2026

Greptile Summary

Operator Service Refactoring

Platform Reliability Improvements

Issues Found

Confidence Score: 3/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot commented Feb 20, 2026

Walkthrough

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

larryro commented Feb 20, 2026 •

edited by coderabbitai Bot

Loading