refactor(agent): unify the three agent-turn loops into one TurnEngine#3012
Conversation
…gine/ Phase 1 of unifying the three agent-turn loops (Agent::turn, run_tool_call_loop, subagent run_inner_loop) behind one engine. Introduces src/openhuman/agent/harness/engine/ and lifts two pieces that were duplicated across the loops, behavior-preserving: - engine::run_one_tool — the per-call tool executor (start event, policy gate, scope guard, approval gate, execute+timeout, scrub/tokenjuice/cap/summarize, audit, completion event). Collapses ~400 lines of branches in tool_loop.rs into one call + a uniform push/writeln/circuit-breaker tail. - engine::spawn_delta_forwarder — the ProviderDelta -> AgentProgress streaming forwarder, previously copied byte-for-byte in tool_loop.rs and turn.rs. No behavior change. tool_loop tests: 24/24 pass.
Phase 2 of the agent-turn unification. Moves the canonical agentic loop body out of run_tool_call_loop into engine::run_turn_engine, behavior-preserving, and introduces the first axis seam: - engine::ToolSource (+ RegistryToolSource) — owns tool advertisement (request_specs) and per-call execution (execute_call). The channel/CLI source resolves registry+extra under the visibility whitelist and delegates to run_one_tool, exactly as before. - engine::run_turn_engine — the single loop (turn/iteration/cost/completed progress, stop hooks, context guard, token-budget trim, multimodal prep, streaming forwarder, native/text parse, circuit breaker, max-iter error). - run_tool_call_loop is now a ~30-line adapter that builds a RegistryToolSource and hands off. Signature unchanged; bus/triage/summarizer/tests unaffected. tool_loop + harness suites: 128/128 pass.
…int seams
Phase 3a. Adds the remaining axis seams so the subagent and Agent loops can route
through run_turn_engine next:
- ProgressReporter (TurnProgress / SubagentProgress / NullProgress) — the engine
no longer names a concrete AgentProgress variant; flavor (Turn* vs Subagent*)
+ streaming-on/off is chosen by the impl. run_one_tool now reports tool
start/complete through it too.
- TurnObserver (NullObserver) — per-caller side effects (context mgmt,
transcript, worker-thread mirroring) around each step; all hooks default no-op.
- CheckpointStrategy (ErrorCheckpoint) — max-iteration outcome (error vs
summarize). run_turn_engine now returns TurnEngineOutcome { text, iterations,
cost } and accumulates a tool digest for summarizing strategies.
run_tool_call_loop wires TurnProgress + NullObserver + ErrorCheckpoint; behavior
unchanged. tool_loop + harness suites: 118/118 pass.
Phase 3. run_inner_loop is replaced by run_turn_engine + three subagent seams:
- SubagentToolSource — lazy toolkit resolution + allowlist gating + the
progressive-disclosure handoff, with per-call execution now via the shared
engine::run_one_tool. Sub-agents thereby gain the same approval gate, audit
rows, credential scrub, tokenjuice and timeout as the channel loop, and tool
success is now structured (not the fragile !starts_with("Error")). force-text
mode falls out of advertising zero specs (engine skips native tools → XML
fallback parse → batched [Tool results]).
- SubagentObserver — accumulates AggregatedUsage, persists the per-iteration
transcript, and mirrors assistant intents / per-call results / text-mode
batches / final responses to the spawn's worker thread.
- SubagentCheckpoint — summarize-on-cap (resumable checkpoint) with the
deterministic-digest fallback; its summarization usage is folded back into
the turn cost + AggregatedUsage by the engine.
Spawn-depth + can't-spawn-subagents enforcement is untouched (it lives in
run_subagent / run_typed_mode setup, outside the loop). Sanctioned behavior
changes: sub-agents now run stop-hooks + context-guard + token-trim, get
scrub/tokenjuice, and the text-mode tool_result block drops the status=
attribute. Engine TurnObserver grew on_assistant/on_tool_result/
on_results_batch/after_iteration hooks; CheckpointStrategy now returns
CheckpointOutcome { text, usage }.
subagent + tool_filter + harness_gap: 68/68; broad harness + agent: 460/460.
…dwork) The engine's hardcoded native-first + XML-fallback parse becomes the DefaultParser behind a ResponseParser seam. This lets Agent::turn (next) plug in its configured ToolDispatcher — crucially PFormat, whose positional name[args] grammar the built-in parser can't read — without the engine knowing about dispatchers. DispatcherParser adapts any ToolDispatcher to the seam (dispatcher::ParsedToolCall -> engine ParsedToolCall). Channel loop + subagent pass DefaultParser; behavior unchanged.
Lets the turn engine's DispatcherParser hold a cheap clone of the dispatcher without borrowing the Agent (whose history/context the turn observer borrows mutably). Builder setter still takes Box; conversion to Arc happens at build(). No behavior change.
…rver hooks (Phase 4 groundwork) - session/agent_tool_exec.rs: run_agent_tool_call — the Agent's per-call path (visibility gate, session policy, per-call permission, ToolPolicy.check, execute_with_options + payload summarizer, per-result budget) as a free fn so both Agent::execute_tool_call and the upcoming AgentToolSource share it. Agent::execute_tool_call now delegates to it (old body kept as execute_tool_call_legacy, dead, to delete in cleanup). - engine: TurnObserver hooks enriched (on_assistant now carries display+raw text, reasoning_content, native + parsed calls; on_tool_result carries success) so Agent::turn can rebuild its typed ConversationMessage history. TurnEngineOutcome gains hit_cap. agent + session::turn + session::tests: 164/164 pass.
…nEngine Phase 4 — the third and final loop. Agent::turn now drives the shared run_turn_engine via three seams, preserving its richer state: - AgentToolSource: owns Arc/value clones of the Agent's tool state (disjoint from the &mut Agent the observer holds) and runs each call through the shared run_agent_tool_call; collects ToolCallRecords for post-turn hooks. Advertises specs only when the dispatcher's should_send_tool_specs() is true (PFormat/XML fall back to the engine's text path, same as subagent force-text-mode). - DispatcherParser: plugs the Agent's ToolDispatcher (native/XML/PFormat) into the engine's ResponseParser seam, so PFormat's positional grammar is preserved. - AgentObserver: borrows the Agent mutably — runs the ContextManager reduction + re-materializes the engine buffer from typed history each iteration, rebuilds the typed ConversationMessage history (Chat + AssistantToolCalls with reasoning_content round-trip + dispatcher.format_results), accumulates usage, persists the transcript. allow_empty_final()=false preserves the EmptyProviderResponse error. - AgentCheckpoint: summarize-on-cap with streaming + deterministic fallback. Supporting: tool_dispatcher is now Arc; turn_checkpoint moved to a session sibling module; execute_tool_call delegates to the shared executor (old body kept as execute_tool_call_legacy, deleted in Phase 5). Sanctioned behavior changes (web chat now matches the channel loop): emits TurnStarted + TurnCostUpdated progress events, gains the repeated-failure circuit breaker, and runs multimodal prep. agent + session + harness: 482/482.
…ts, docs
- Delete Agent::execute_tool_call_legacy (the pre-extraction duplicate body);
Agent::execute_tool_call now solely delegates to the shared run_agent_tool_call.
- Drop now-unused imports across turn.rs + engine/mod.rs re-exports; cargo fmt.
- Document the unified engine in gitbooks agent-harness architecture ("One
engine, three entry points" + the five seams).
Full agent + channels::runtime lib suite: 980/980 pass.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughConsolidates three agent loops (Agent::turn, run_tool_call_loop, run_subagent) into a single run_turn_engine with pluggable seams: ToolSource, ProgressReporter, TurnObserver, CheckpointStrategy, and ResponseParser; three call sites adapt by constructing concrete seam implementations and delegating iteration control to the engine. ChangesUnified Turn Engine Consolidation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…t-turn-engine # Conflicts: # src/openhuman/agent/harness/subagent_runner/ops.rs
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/harness/engine/core.rs`:
- Around line 85-86: The boolean use_native_tools is computed once outside the
loop so it won't reflect lazy registrations; recompute it each iteration right
before it's used instead of once at declaration—replace the fixed
use_native_tools with a local per-iteration evaluation using
provider.supports_native_tools() && !tools.request_specs().is_empty() inside the
loop (where use_native_tools is read), and make the same change in the other
occurrence around the block referenced by the code using lines ~216-220 so
native-tool enablement follows dynamic changes to tools.request_specs().
- Around line 536-548: The max-iteration success path in the turn engine returns
a TurnEngineOutcome after checkpoint.on_max_iter(...) without emitting the
lifecycle event; call progress.turn_completed(...) with the same outcome data
(text/co.text, iterations/max_iterations, cost/turn_cost, hit_cap=true) after
folding usage (turn_cost.add_call and observer.record_usage) and before
returning so consumers receive the terminal event; locate this in the block
handling checkpoint.on_max_iter and ensure progress.turn_completed is invoked
with the constructed TurnEngineOutcome (or equivalent parameters) prior to
Ok(TurnEngineOutcome { ... }).
In `@src/openhuman/agent/harness/engine/tools.rs`:
- Around line 290-296: The tracing::warn call is logging raw tool output
(output) before redaction; update the logic so you scrub sensitive data first
(use scrub_credentials(&output) into scrubbed) and then use that scrubbed value
in the tracing::warn message and pass scrubbed (not output) into
crate::openhuman::tokenjuice::compact_tool_output so no unsanitized tool payload
is emitted; locate the logging and compact call around the agent_loop/tool
handling (references: tracing::warn, scrub_credentials, compact_tool_output,
call.name) and replace uses of output with the scrubbed value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2e06d35c-10f1-4189-a4df-0918908173f1
📒 Files selected for processing (19)
gitbooks/developing/architecture/agent-harness.mdsrc/openhuman/agent/harness/engine/checkpoint.rssrc/openhuman/agent/harness/engine/core.rssrc/openhuman/agent/harness/engine/mod.rssrc/openhuman/agent/harness/engine/parser.rssrc/openhuman/agent/harness/engine/progress.rssrc/openhuman/agent/harness/engine/state.rssrc/openhuman/agent/harness/engine/tool_source.rssrc/openhuman/agent/harness/engine/tools.rssrc/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/session/agent_tool_exec.rssrc/openhuman/agent/harness/session/builder.rssrc/openhuman/agent/harness/session/mod.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/session/turn_engine_adapter.rssrc/openhuman/agent/harness/session/types.rssrc/openhuman/agent/harness/subagent_runner/ops.rssrc/openhuman/agent/harness/tool_loop.rssrc/openhuman/agent/harness/tool_loop_tests.rs
- core.rs: recompute native-tool enablement each iteration (a ToolSource may register tools lazily mid-turn, so it can flip off->on). - core.rs: emit turn_completed on the max-iteration checkpoint exit too, so consumers always get a terminal lifecycle event; drop the now-duplicate manual emit in Agent::turn's post-loop. - tools.rs: scrub the failing tool output before logging it (avoid leaking credentials/PII into logs). harness + agent + subagent suites: 229/229 pass.
…flow tokio worker stack The unified turn engine (PR tinyhumansai#3012) made `run_turn_engine` a ~600-line async fn. When the orchestrator delegates to a sub-agent — e.g. the `chat-harness-subagent` Playwright spec hitting `researcher` via `dispatch_subagent` → `run_subagent` → `run_typed_mode` → `run_inner_loop` → child `run_turn_engine` — the parent's polling stack carries the child's engine state machine inline on top of its own, plus `run_typed_mode`'s ~1000-line state. The sum crosses tokio's 2 MiB default worker stack and the core aborts with: thread 'tokio-rt-worker' has overflowed its stack fatal runtime error: stack overflow, aborting That kills the openhuman-core process mid-test, and every alphabetically later spec in the Playwright `web lane 1/4` shard cascades into `ECONNREFUSED 127.0.0.1:17788` — visible across many unrelated PRs. Fix the root cause at the two unboxed recursion boundaries inside `run_typed_mode`: - box-pin the `run_inner_loop` call so its (engine-wrapping) state lives on the heap independently of `run_typed_mode` - box-pin the child `run_turn_engine` call inside `run_inner_loop` so the child engine's poll frame doesn't pile on top of the parent's Adds `nested_subagent_dispatch_runs_on_a_constrained_worker_stack` as a smoke test that exercises the exact recursion shape (outer subagent → tool exec → inner subagent) on a 1 MiB worker stack so refactors that inline these boxes away are caught at `cargo test` time. The deep end-to-end catcher remains the existing `chat-harness-subagent.spec.ts` Playwright spec, which is exactly what surfaced this bug. No tests are skipped, marked flaky, or otherwise hidden — this fixes the underlying Rust crash so the lane-1 spec and its cascade run for real.
Summary
Agent::turn(web/desktop chat),run_tool_call_loop(other channels + triage), and the subagentrun_inner_loop— into a singleengine::run_turn_enginethat all three drive.ToolSource,ProgressReporter,TurnObserver,CheckpointStrategy,ResponseParser.run_one_tool/run_agent_tool_call), the repeated-failure circuit breaker, and theProviderDelta → AgentProgressstream forwarder are now shared by all three.tool_loop.rsis a ~30-line shim, the subagent + Agent loops are thin adapters.run_subagent/setup).Problem
The same agentic loop (call LLM → parse tool calls → execute tools → append results → repeat to final text / cap) was implemented three times and had drifted: the repeated-failure circuit breaker existed in two of three; streaming in two of three; the subagent computed tool success via a fragile
!starts_with("Error"); each carried its own copy of the per-call approval-gate + audit + tokenjuice path. Every fix had to be applied up to three times.Solution
A single loop in
src/openhuman/agent/harness/engine/, parameterized by trait seams:ToolSource— which tools are advertised (request_specs) + how a call executes (execute_call). Impls:RegistryToolSource(channels/CLI/triage),SubagentToolSource(lazy toolkit resolution + allowlist + progressive-disclosure handoff),AgentToolSource(session policy + per-call permission +execute_with_options).ProgressReporter—TurnProgress(top-levelTurn*events + streaming) vsSubagentProgress(nestedSubagent*events, no streaming) vsNullProgress.TurnObserver— caller-specific side effects (context management, transcript persistence, typed-history rebuild, worker-thread mirroring). The channel loop usesNullObserver; the subagent + Agent rebuild their own state.CheckpointStrategy—ErrorCheckpoint(typedMaxIterationsExceeded) vs summarize-on-cap for the subagent + Agent.ResponseParser—DefaultParser(native + XML fallback) vsDispatcherParser(the Agent'sToolDispatcher, incl. PFormat's positional grammar).Agent::turnkeeps its richer state: it materializes the engine'sChatMessagebuffer from its typedConversationMessagehistory each iteration (via an observer that holds&mut Agent), rebuilds the typed history (withreasoning_contentround-trip +dispatcher.format_results), and persists the transcript — while the engine owns the loop.Landed as 8 behavior-preserving commits (extract shared helpers → engine + channel shim → generalize seams → subagent → Agent groundwork → Agent → cleanup), each green.
Submission Checklist
turn_errors_on_empty_*,turn_emits_checkpoint_at_max_iterations, circuit-breaker, policy-deny, unknown-tool);tool_loop_tests.rsupdated for the shim.Impact
TurnStarted+TurnCostUpdatedprogress events and runs multimodal prep.!starts_with("Error")); sub-agent text-mode<tool_result>blocks drop thestatus=attribute.Agent.tool_dispatcherchangedBox→Arc(internal only).Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
pnpm --filter openhuman-app format:check— no frontend/TS changes in this PR.pnpm typecheck— no frontend/TS changes.cargo test --lib(10,564 passed; 2 pre-existing/environmental failures incomposio::tools+ macOScwd_jailseatbelt, untouched by this diff).cargo fmtapplied;cargo check --manifest-path Cargo.tomlclean; clippy clean on changed files.cargo check --manifest-path app/src-tauri/Cargo.tomlclean.Validation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
status=).Parity Contract
ConversationMessagehistory,reasoning_contentround-trip, KV-cache transcript prefix,EmptyProviderResponseerror, and dispatcher (native/XML/PFormat) parse are all preserved.MAX_SPAWN_DEPTH) + "sub-agents can't spawn" untouched (live inrun_subagentsetup); approval gate + audit rows shared viarun_one_tool; force-text-mode preserved via advertising zero specs.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests