Skip to content

feat(agent): progressive-disclosure handoff + token-based summarizer threshold (#574)#586

Merged
senamakel merged 6 commits intotinyhumansai:mainfrom
sanil-23:feat/574-summarizer-subagent
Apr 17, 2026
Merged

feat(agent): progressive-disclosure handoff + token-based summarizer threshold (#574)#586
senamakel merged 6 commits intotinyhumansai:mainfrom
sanil-23:feat/574-summarizer-subagent

Conversation

@sanil-23
Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 commented Apr 15, 2026

Summary

  • Dedicated summarizer sub-agent that compresses oversized tool results before they enter the orchestrator's history (the core Implement summarizer sub-agent to prevent orchestrator context overflows #574 ask). Thresholds now in tokens (default 500 000; cap 2 000 000) rather than bytes — serde aliases keep old byte-named fields loading.
  • Progressive-disclosure handoff for typed sub-agents (skills_agent): payloads over 20 000 tokens are stashed in a per-spawn cache and replaced in history with a short placeholder + extract_from_result(result_id, query) tool. Chunked map-reduce extraction with bounded concurrency (buffer_unordered(3)).
  • Text-mode dispatcher fallback for skills_agent so large Composio toolkits (Gmail, Notion, GitHub, Salesforce, HubSpot, Google Workspace) survive the provider's 65 535-grammar-rule ceiling.
  • Per-toolkit fuzzy-filter top_k split: 12 for heavy-schema toolkits, 25 default.
  • csv_export removed from skills_agent extra_tools (kept file_write); it was triggering superfluous export calls after extraction had already answered.

Problem

Two distinct failure modes were observed in staging once Composio toolkits were exercised end-to-end:

  1. Context-budget exhaustion on the orchestrator. Large tool payloads (Gmail / Notion list operations routinely return hundreds of KB or multiple MB of JSON) landed verbatim in orchestrator history, burning context on raw dumps. The only guardrail, tool_result_budget_bytes, hard-truncated mid-payload and lost anything past the cut — no identifiers preserved, no structure.
  2. Provider rejection on skills_agent spawn. When the orchestrator delegated work to skills_agent over a heavy toolkit, the combined per-action JSON schemas in tools: [...] exceeded Fireworks' uint16_t grammar-rule cap (65 535) and the provider returned 400 before any generation ran. Fuzzy filtering the toolkit down didn't help — even 16 Notion actions produced 101 295 rules.
  3. Secondary failure on the sub-agent. Even with grammar-decoding sidestepped, oversized tool results on the sub-agent side still overflowed the 196 607-token model context on the sub-agent's second iteration (observed: 276k-token prompt on a Gmail follow-up).

Solution

Summarizer sub-agent (payload_summarizer.rs, orchestrator-scoped)

Dedicated summarizer sub-agent (model.hint = "summarization") automatically dispatched when a tool result exceeds summarizer_payload_threshold_tokens. Compresses per an extraction contract that preserves identifiers; compressed summary replaces the raw payload. Session-scoped circuit breaker (3 consecutive failures disables for the session). Scoped to orchestrator only — welcome / researcher / planner etc. receive None. Runtime-dispatched only; orchestrator_tools::collect_orchestrator_tools filters the summarizer id so the orchestrator's LLM never sees a delegate_summarizer tool.

Progressive-disclosure handoff (subagent_runner.rs, skills_agent-scoped)

When skills_agent is spawned with a Composio toolkit, a per-spawn ResultHandoffCache + new ExtractFromResultTool are wired in. Any tool result over HANDOFF_OVERSIZE_THRESHOLD_TOKENS (20 000) is stashed and replaced in history with a placeholder:

[oversized tool output: 187432 bytes — stashed as result_id="res_1"]
Preview (first 1500 chars):
...
If the preview does not answer your task, call:
extract_from_result(result_id="res_1", query="<specific question>")

Three options for the sub-agent:

  1. Answer from the preview (zero extra cost).
  2. Call extract_from_result(result_id, query) — dispatches the summarizer against the cached payload with query as the extraction contract. For content over SUMMARIZER_CHUNK_CHAR_BUDGET (60 000 chars) runs a chunked map-reduce: split at blank-line / newline boundaries, map each chunk to a per-chunk partial with bounded concurrency (buffer_unordered(3) — unbounded join_all was storming the gateway with 504s), then reduce partials in one more summarizer turn. Falls back to concatenation if the reduce stage fails.
  3. Re-call the original tool with a narrower filter.

Cache is bounded at 8 entries (FIFO eviction) per sub-agent spawn.

Text-mode dispatcher for skills_agent

run_inner_loop now detects skills_agent and omits tools: [...] from the API request, instead appending an <tool_call>{...}</tool_call> XML protocol block with compact per-tool parameter summaries into the system prompt. Tool calls are parsed from the model's free-form response via the shared parse_tool_calls helper. Mirrors XmlToolDispatcher behaviour. Fireworks never compiles a grammar from the tool list, so the rule-count ceiling stops mattering.

Per-toolkit top-K

HEAVY_SCHEMA_TOOLKITS = [gmail, notion, github, salesforce, hubspot, googledrive, googlesheets, googledocs, microsoftteams]top_k = 12. All other toolkits → top_k = 25 (existing default).

csv_export removed from skills_agent.extra_tools

The export tool was triggering superfluous file writes after extraction had already produced a complete answer. file_write alone covers the Path B export case. The tool implementation remains for anyone who needs to re-enable it.

Submission Checklist

  • Unit testscargo test green across payload_summarizer::tests (6), tool_loop::tests (10, including run_tool_call_loop_intercepts_oversized_tool_results_via_summarizer), subagent_runner::tests (19), builtin_definitions::tests (5), csv_export::tests (7).
  • E2E / integration — Manually verified on staging: handoff fires at 90k tokens on Notion, chunked extraction completes, skills_agent returns from preview in ~14 s on the fast path. No automated E2E spec added in this PR.
  • Doc comments — Module-level rustdoc on payload_summarizer.rs; item-level docs on ResultHandoffCache, ExtractFromResultTool, build_text_mode_tool_instructions, summarise_parameters, chunk_content, top_k_for_toolkit, and build_handoff_placeholder.
  • Inline comments — Grep-friendly prefixes ([subagent_runner:text-mode], [subagent_runner:handoff], [extract_from_result], [payload_summarizer], [agent::builder]) at every decision point, with structured fields (raw_tokens, raw_bytes, chunk_count, chunk_budget, result_id, threshold_tokens, top_k).

Impact

  • Runtime: Rust core only; no Tauri shell changes. Orchestrator unaffected for small payloads (< 500k tokens). skills_agent routing swaps to text mode when toolkit is present — slight reliability trade vs. native structured tool calls, offset by being able to run at all on large toolkits (Notion was returning 400 before).
  • Performance: Progressive disclosure pays ~N+1 summarizer round-trips only when the sub-agent calls extract_from_result; trivial queries answerable from the preview pay zero extra. Bounded parallelism (3) trades some wall-clock time for reliability under gateway limits.
  • Security: No new surface area; tools remain sandboxed per existing SecurityPolicy. Handoff cache is per-spawn (not global) and lives only for the duration of run_typed_mode.
  • Migration: #[serde(alias = "summarizer_payload_threshold_bytes")] / "summarizer_max_payload_bytes" keep existing config.toml files parsing cleanly through the rename.
  • Backward compat: All existing callers of run_inner_loop updated to pass None for the new handoff_cache parameter — zero existing tests broke.

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Summarizer agent: automatic compression/summarization of very large tool outputs
    • CSV export tool: export JSON arrays to CSV files
    • Skills Agent can persist files (file write) and export CSVs
  • Improvements

    • Smarter handling of oversized tool results: extraction, chunking, and cached handoff for large outputs
    • Orchestrator integrates payload summarization for more efficient processing
  • Configuration

    • New thresholds to control when automatic summarization triggers

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 15, 2026

Warning

Rate limit exceeded

@sanil-23 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 37 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 37 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d6665e14-9af7-463c-bd6f-d6fc5a07652f

📥 Commits

Reviewing files that changed from the base of the PR and between 79540c4 and 299233d.

📒 Files selected for processing (4)
  • app/src/pages/Conversations.tsx
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/subagent_runner.rs
📝 Walkthrough

Walkthrough

Introduces a runtime-dispatched "summarizer" sub-agent and payload-summarizer plumbing to compress oversized tool outputs before history/orchestrator consumption; adds AgentDefinition.extra_tools, skills_agent export allowance, CSV export tool, progressive-disclosure handoff, and related session/tool-loop/subagent-runner changes and tests.

Changes

Cohort / File(s) Summary
Built-in agents
src/openhuman/agent/agents/mod.rs, src/openhuman/agent/agents/summarizer/agent.toml, src/openhuman/agent/agents/summarizer/prompt.md
Adds summarizer built-in agent (runtime-dispatched) with prompt and agent.toml; updates builtin count/tests.
Orchestrator registration
src/openhuman/agent/agents/orchestrator/agent.toml, src/openhuman/tools/orchestrator_tools.rs
Registers summarizer as a subagent and prevents synthesis of delegate tools for it (skip delegate generation).
AgentDefinition & harness
src/openhuman/agent/harness/definition.rs, src/openhuman/agent/harness/builtin_definitions.rs, src/openhuman/agent/harness/mod.rs
Adds extra_tools: Vec<String> to AgentDefinition with serde default; initializes in test fixtures; exposes new payload_summarizer harness module.
Skills agent & prompt
src/openhuman/agent/agents/skills_agent/agent.toml, src/openhuman/agent/agents/skills_agent/prompt.md
Adds extra_tools = ["file_write"]; prompt adds "Handling Oversized Tool Results" rules (extract vs persist) and response-length limits.
Payload summarizer implementation
src/openhuman/agent/harness/payload_summarizer.rs
New SubagentPayloadSummarizer + PayloadSummarizer trait: token estimation, prompt construction, circuit breaker, chunking, tests, returns SummarizedPayload on success.
Session wiring
src/openhuman/agent/harness/session/builder.rs, src/openhuman/agent/harness/session/types.rs
AgentBuilder/Agent gain optional payload_summarizer field and setter; orchestrator wiring constructs SubagentPayloadSummarizer when enabled; skills_agent dispatcher override in builder.
Tool loop & turn integration
src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/bus.rs, src/openhuman/agent/harness/tests.rs
Threaded optional payload_summarizer through run_tool_call_loop/execute_tool_call; tool outputs may be summarized before history insertion; call sites/tests updated with extra arg.
Subagent runner & handoff cache
src/openhuman/agent/harness/subagent_runner.rs
Large refactor: dynamic toolkit top-K, skills_agent text-mode XML protocol, parsing of XML tool calls, progressive-disclosure handoff cache, extract_from_result injection, chunked summarization and concurrency controls, many utilities and behavior changes.
CSV export tool
src/openhuman/tools/impl/filesystem/csv_export.rs, src/openhuman/tools/impl/filesystem/mod.rs, src/openhuman/tools/ops.rs
New CsvExportTool (RFC4180 formatting, JSON→CSV, security/path checks, symlink protection) and registration in all_tools_with_runtime; tests added.
Configuration & defaults
src/openhuman/config/schema/context.rs
Adds summarizer_payload_threshold_tokens and summarizer_max_payload_tokens to ContextConfig with defaults and serde aliases.
Tests & fixtures
src/openhuman/channels/runtime/dispatch.rs, src/openhuman/context/debug_dump.rs, various tests
Updated test fixtures to include extra_tools: vec![]; updated run_tool_call_loop test call sites to match new signature.
Frontend minor change
app/src/pages/Conversations.tsx
Silence timer refactor: separate sendingThreadIdRef, armSilenceTimer helper, and effect to re-arm on inference status changes.

Sequence Diagram(s)

sequenceDiagram
    participant Tool as Tool impl
    participant Agent as Agent/run_tool_call_loop
    participant Summarizer as PayloadSummarizer
    participant Subagent as Summarizer Subagent
    participant Registry as AgentDefinitionRegistry

    Tool->>Agent: execute -> returns large payload
    Agent->>Summarizer: maybe_summarize(tool_name, parent_hint, raw_payload)
    Summarizer->>Registry: lookup "summarizer" definition
    Registry-->>Summarizer: AgentDefinition (if present)
    Summarizer->>Subagent: dispatch summarizer subagent with wrapped prompt
    Subagent-->>Summarizer: summary (or error/none)
    alt summary smaller & valid
        Summarizer-->>Agent: Ok(Some(SummarizedPayload{summary,...}))
        Agent->>History: insert summarized payload into history
    else summarization failed or skipped
        Summarizer-->>Agent: Ok(None) or Err
        Agent->>History: insert original payload (possibly truncated later)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • senamakel
  • graycyrus
  • M3gA-Mind

"I nibble bytes and make them small,
I hop through logs and prune the sprawl,
I stitch the facts, preserve each link,
I hide the noise and keep the ink.
A rabbit’s hush — big payloads fall to neat, tiny thrall." 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main features: a progressive-disclosure handoff mechanism and token-based summarizer threshold implementation for handling oversized tool results.
Linked Issues check ✅ Passed All key coding objectives from #574 are implemented: dedicated summarizer sub-agent with token-based thresholds, extraction contract with required identifiers/facts, chunking + map-reduce logic, circuit breaker safeguards, observability logging, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly related to the summarizer implementation and progressive-disclosure handoff. Minor related changes include text-mode dispatcher fallback for skills_agent, per-toolkit fuzzy-filter tuning, config schema updates, and frontend silence timer fix—all supporting the main objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sanil-23 sanil-23 marked this pull request as draft April 15, 2026 21:45
@sanil-23 sanil-23 force-pushed the feat/574-summarizer-subagent branch from ae2f7e2 to 46d6e15 Compare April 17, 2026 07:42
sanil-23 and others added 3 commits April 17, 2026 13:26
Issue tinyhumansai#574.

Before this change, tool results larger than a few KB landed verbatim
in orchestrator history, burning context budget on raw JSON/HTML/file
dumps. The only guardrail was tool_result_budget_bytes which
hard-truncated mid-payload, dropping everything past the cut.

This PR introduces a dedicated `summarizer` sub-agent
(model.hint = "summarization") that the runtime automatically
dispatches whenever a tool result exceeds
summarizer_payload_threshold_bytes (default 100 KB). The summarizer
compresses the payload per an extraction contract that preserves
identifiers and key facts, and the compressed summary replaces the
raw payload before it enters agent history. Payloads above
summarizer_max_payload_bytes (default 5 MB) skip summarization and
fall through to the existing truncation path — paying for an LLM
call on a 5 MB blob is counterproductive.

Scoped to the orchestrator session only. Welcome, skills_agent,
researcher, planner, and every other typed sub-agent get None and
their tool results are untouched. Gated in
build_session_agent_inner by checking agent_id == "orchestrator".

Instrumented at both tool-loop paths:
  - run_tool_call_loop in tool_loop.rs (event-bus/channels path)
  - Agent::execute_tool_call in session/turn.rs (web channel path
    via run_single)

Both paths call into the same `PayloadSummarizer::maybe_summarize`
trait so the threshold check, circuit breaker (3 consecutive
failures disables for the session), and sub-agent dispatch policy
live in exactly one place (agent/harness/payload_summarizer.rs).

The summarizer sub-agent is runtime-dispatched only — it is NOT
exposed as a delegation tool to the orchestrator's LLM. It is listed
in the orchestrator's `subagents = [...]` for explicit registration,
but `collect_orchestrator_tools` filters out the `summarizer` id and
never synthesises a `delegate_summarizer` tool.

Files:
  * NEW: src/openhuman/agent/agents/summarizer/{agent.toml,prompt.md}
  * NEW: src/openhuman/agent/harness/payload_summarizer.rs (trait +
    SubagentPayloadSummarizer impl + circuit breaker + tests)
  * src/openhuman/agent/agents/mod.rs — register summarizer in
    BUILTINS
  * src/openhuman/agent/agents/orchestrator/agent.toml — list
    summarizer in subagents (runtime-only, no LLM delegation tool)
  * src/openhuman/agent/harness/mod.rs — declare module
  * src/openhuman/agent/harness/builtin_definitions.rs — expect
    summarizer in `expected_builtin_ids_are_present`
  * src/openhuman/agent/harness/tool_loop.rs — thread Option<&dyn
    PayloadSummarizer> into run_tool_call_loop + agent_turn,
    intercept at the tool-execution success site, plus a new
    integration test using a MockSummarizer
  * src/openhuman/agent/harness/tests.rs — thread None through the
    existing run_tool_call_loop callsites
  * src/openhuman/agent/harness/session/turn.rs — same interception
    in Agent::execute_tool_call
  * src/openhuman/agent/harness/session/types.rs — payload_summarizer
    field on Agent and AgentBuilder
  * src/openhuman/agent/harness/session/builder.rs —
    .payload_summarizer() setter + orchestrator-only construction in
    build_session_agent_inner
  * src/openhuman/agent/bus.rs — pass None for the bus path
  * src/openhuman/config/schema/context.rs —
    summarizer_payload_threshold_bytes + summarizer_max_payload_bytes
    on ContextConfig
  * src/openhuman/tools/orchestrator_tools.rs — filter out the
    summarizer id from delegation-tool synthesis

Tests: unit tests in payload_summarizer pin the pass-through rules
(below threshold, above max cap, breaker tripped) and the prompt
construction. Integration test in tool_loop::tests uses a
MockSummarizer to verify the interception wires through end-to-end.

Closes tinyhumansai#574.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a Composio tool returns a large payload inside skills_agent,
the agent now has two prompt-driven paths:

  Path A — user wants an answer derived from the data:
    skills_agent extracts the answer in its next iteration
    and returns a targeted response (no file I/O needed).

  Path B — user wants the actual raw dataset:
    skills_agent calls csv_export (tabular data) or file_write
    (non-tabular) to persist the output to workspace/exports/,
    then returns a summary + file path.

Changes:
  * NEW: src/openhuman/tools/impl/filesystem/csv_export.rs
    — CsvExportTool: parses JSON array, formats as CSV, writes
    to workspace/exports/{filename}. Handles missing keys
    (empty cells), nested values (JSON-serialised), and optional
    column ordering. Sandboxed via SecurityPolicy.
  * src/openhuman/tools/impl/filesystem/mod.rs — wire module
  * src/openhuman/tools/ops.rs — register CsvExportTool
  * src/openhuman/agent/harness/definition.rs — new extra_tools
    field on AgentDefinition, allowing named system tools to
    bypass category_filter
  * src/openhuman/agent/harness/subagent_runner.rs — inject
    extra_tools into allowed_indices after category filtering
  * src/openhuman/agent/agents/skills_agent/agent.toml
    — add file_write + csv_export via extra_tools
  * src/openhuman/agent/agents/skills_agent/prompt.md
    — new "Handling Oversized Tool Results" section with
    Path A (extract answer) vs Path B (export file) decision
    tree, intent-detection heuristics, and examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…based summarizer threshold (tinyhumansai#574)

Two layered fixes for tinyhumansai#574 plus supporting changes.

## Summarizer thresholds in tokens, not bytes

Renamed `ContextConfig::summarizer_payload_threshold_bytes` ->
`summarizer_payload_threshold_tokens` (default 500 000, was 100 KB in
bytes) and `summarizer_max_payload_bytes` -> `summarizer_max_payload_tokens`
(default 2 000 000). Token count is estimated at ~4 chars/token via a
local helper that mirrors `tree_summarizer::types::estimate_tokens`.
Serde aliases on both fields keep existing config.toml files parsing.
`payload_summarizer.rs` now compares token estimates for the threshold
and cap, with both `tokens=` and `bytes=` surfaced in log events.

## Progressive-disclosure handoff for skills_agent

`skills_agent` typed subagents (when spawned with a Composio toolkit)
now receive a per-spawn `ResultHandoffCache` and a new built-in
`extract_from_result` tool. When a per-action tool returns more than
`HANDOFF_OVERSIZE_THRESHOLD_TOKENS` (20 000 tokens), the raw payload
is stashed in the cache and a compact placeholder replaces it in
history:

    [oversized tool output: 187432 bytes - stashed as result_id="res_1"]
    Preview (first 1500 chars):
    ...
    If the preview does not answer your task, call:
    extract_from_result(result_id="res_1", query="<specific question>")

The subagent can answer from the preview, call `extract_from_result`
with a targeted query, or re-call the original tool with tighter
filters. `extract_from_result` dispatches the `summarizer` subagent
against the cached payload with the query as the extraction contract.

For payloads over `SUMMARIZER_CHUNK_CHAR_BUDGET` (60 000 chars) the
extractor runs a chunked map-reduce:
- split at blank-line / newline boundaries
- map each chunk to a per-chunk partial via bounded concurrency
  (`buffer_unordered(3)`) to avoid storming the provider gateway
- reduce partials in one more summarizer turn, or fall back to the
  concatenation if the reduce stage fails

Falls back gracefully when the summarizer definition is missing from
the registry (placeholder + preview still work; just no extractor).

## Text-mode dispatcher for skills_agent + tighter tool filter

Provider-side grammar decoders (Fireworks etc.) cap tool-schema rules
at 65 535 (uint16_t). Large Composio toolkits - Gmail, Notion, GitHub,
Salesforce, HubSpot, Google Workspace - ship per-action JSON schemas
dense enough that even a handful of them exceed the cap.

- `subagent_runner::run_inner_loop` now detects `skills_agent` and
  omits `tools: [...]` from the API request, instead appending an
  `<tool_call>{...}</tool_call>` XML protocol block (with compact
  per-tool parameter summaries) into the system prompt. Tool calls
  are parsed out of the model's free-form response via the shared
  `parse_tool_calls` helper. Mirrors XmlToolDispatcher behaviour.
- `top_k_for_toolkit()` picks 12 for HEAVY_SCHEMA_TOOLKITS (the six
  listed above plus googledocs, googlesheets, microsoftteams) vs the
  default 25 for lighter toolkits.

## Skills_agent agent.toml / prompt changes

- Dropped `csv_export` from `skills_agent.extra_tools` (kept
  `file_write`) - the export tool was triggering superfluous file
  writes after extraction had already answered the query.
- `builtin_definitions::skills_agent_has_extra_tools_for_export` test
  inverted to assert `csv_export` is NOT present.
- `prompt.md` re-aligned: Path B now references only `file_write`.

## Wiring

- `session/builder.rs`: threshold rename wiring (`_tokens`).
- `subagent_runner.rs::run_inner_loop`: new `handoff_cache` parameter
  (threaded through both call sites; fork-mode passes `None`).

## Tests

All green:
- `payload_summarizer::tests`: 6/6 (thresholds in tokens)
- `tool_loop::tests`: 10/10 (MockSummarizer integration)
- `subagent_runner::tests`: 19/19
- `builtin_definitions::tests`: 5/5 (csv_export removal)
- `csv_export::tests`: 7/7 (implementation kept, wiring removed)

Closes tinyhumansai#574.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 force-pushed the feat/574-summarizer-subagent branch from 46d6e15 to 13c9ca9 Compare April 17, 2026 14:43
@sanil-23 sanil-23 marked this pull request as ready for review April 17, 2026 14:46
@sanil-23 sanil-23 changed the title feat(agent): summarizer sub-agent compresses oversized tool results (#574) feat(agent): progressive-disclosure handoff + token-based summarizer threshold (#574) Apr 17, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (9)
src/openhuman/tools/ops.rs (1)

291-712: Missing test coverage for csv_export registration.

Given CsvExportTool is now unconditionally registered in all_tools_with_runtime, a one-line regression guard (mirroring the existing all_tools_includes_spawn_subagent and all_tools_includes_complete_onboarding tests) would prevent accidental removal. Not blocking.

Proposed test
+    #[test]
+    fn all_tools_includes_csv_export() {
+        let tmp = TempDir::new().unwrap();
+        let security = Arc::new(SecurityPolicy::default());
+        let mem_cfg = MemoryConfig { backend: "markdown".into(), ..MemoryConfig::default() };
+        let mem: Arc<dyn Memory> = Arc::from(
+            crate::openhuman::memory::create_memory(&mem_cfg, tmp.path(), None).unwrap(),
+        );
+        let browser = BrowserConfig::default();
+        let http = crate::openhuman::config::HttpRequestConfig::default();
+        let cfg = test_config(&tmp);
+        let tools = all_tools(
+            Arc::new(Config::default()), &security, mem, None, None,
+            &browser, &http, tmp.path(), &HashMap::new(), None, &cfg,
+        );
+        let names: Vec<&str> = tools.iter().map(|t| t.name()).collect();
+        assert!(names.contains(&"csv_export"), "got: {names:?}");
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/ops.rs` around lines 291 - 712, The test suite lacks a
regression guard ensuring CsvExportTool remains registered: add a new unit test
in the existing tests mod (same style as all_tools_includes_spawn_subagent) that
constructs the environment (TempDir, SecurityPolicy, Memory via create_memory,
BrowserConfig and HttpRequestConfig, test_config), calls all_tools (or
all_tools_with_runtime if you want runtime-specific registration) and asserts
the returned tools' names contain "csv_export"; reference CsvExportTool,
all_tools_with_runtime (or all_tools) and the tests mod to place the one-line
assertion mirroring the existing spawn_subagent/complete_onboarding tests.
src/openhuman/config/schema/context.rs (1)

75-104: Consider documenting/validating the threshold ≤ max invariant.

Nothing enforces summarizer_payload_threshold_tokens <= summarizer_max_payload_tokens. If a user inverts them, payloads in between would both "trigger summarization" and "skip summarization" — behavior depends on evaluation order in PayloadSummarizer::maybe_summarize. A debug-level validation log (or clamp) on Config load would catch misconfigurations early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/config/schema/context.rs` around lines 75 - 104, Add a
validation step on config load to ensure summarizer_payload_threshold_tokens <=
summarizer_max_payload_tokens: in the Context config
construction/deserialization path (where summarizer_payload_threshold_tokens and
summarizer_max_payload_tokens are set) check the invariant and either clamp
threshold to max or emit a debug/error log and adjust/fail; reference the two
fields by name and mention PayloadSummarizer::maybe_summarize so reviewers can
find the related runtime behavior. Ensure the validation runs once after
deserialization (or in a Context::validate / Context::new) and use a clear log
message indicating the original and adjusted values if you choose to clamp.
src/openhuman/agent/harness/mod.rs (1)

34-34: Module declaration placement deviates from repo convention.

Per coding guidelines, new functionality should live in a dedicated subdirectory module (e.g. payload_summarizer/mod.rs) rather than a standalone payload_summarizer.rs. This is a minor concern here since the file sits under harness/ rather than directly at src/openhuman/, and the surrounding harness siblings (parse.rs, credentials.rs, tool_loop.rs, self_healing.rs) follow the same flat-file pattern, so this is locally consistent. Leaving as nit — promote to a folder if the module grows to have tests/helpers as separate files.

As per coding guidelines: "New functionality in Rust must live in a dedicated subdirectory module (e.g., openhuman/my_domain/mod.rs), not new standalone *.rs files at src/openhuman/ root".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/mod.rs` at line 34, The module declaration
pub(crate) mod payload_summarizer should be converted to a submodule directory:
move the current payload_summarizer.rs into a new folder named
payload_summarizer with a mod.rs (or lib.rs) inside, update the harness mod tree
so the top-level harness mod (mod.rs in harness) continues to declare pub(crate)
mod payload_summarizer, and ensure any internal tests/helpers are placed
alongside inside payload_summarizer/; this keeps the symbol payload_summarizer
as the module root while following the repo convention for expandable modules.
src/openhuman/tools/orchestrator_tools.rs (2)

71-84: Consider moving the runtime-only flag onto AgentDefinition instead of a hardcoded id match.

The string-literal "summarizer" check here creates a distributed contract: every future runtime-dispatched sub-agent must be added to this hardcoded list (and the comment explicitly invites that growth). A cleaner, self-documenting alternative is a boolean on AgentDefinition (e.g. runtime_dispatched_only: bool, defaulting to false with a serde default) sourced from the agent's own agent.toml. That keeps the "this agent is not LLM-visible" property co-located with the agent definition that declares it, and removes the need for orchestrator-tool code to hold a list of special ids.

Not blocking — the current implementation is correct for the single-agent case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/orchestrator_tools.rs` around lines 71 - 84, Replace the
hardcoded agent_id == "summarizer" check with a flag on the agent definition:
add a runtime_dispatched_only: bool (serde default false) field to
AgentDefinition (so it can be set from agent.toml), update the
orchestrator_tools.rs filter to consult agent_def.runtime_dispatched_only
instead of matching the string literal, and ensure any code that
constructs/deserializes AgentDefinition provides the default; this co-locates
the "not LLM-visible" property with AgentDefinition and removes the special-case
id check.

181-348: Missing test coverage for the new summarizer skip branch.

The new runtime-only filter at lines 71–84 has no dedicated unit test. The existing tests cover known agents, unknown agents, skills wildcards, and empty subagents — but not the case where an AgentId entry is explicitly skipped because it's runtime-only. Worth adding a small test that includes SubagentEntry::AgentId("summarizer".into()) in the subagents list and asserts no delegate_summarizer tool is synthesised.

🧪 Suggested test
#[test]
fn runtime_only_summarizer_is_never_exposed_as_delegation_tool() {
    let mut orch = def("orchestrator", "test", None);
    orch.subagents = vec![
        SubagentEntry::AgentId("researcher".into()),
        SubagentEntry::AgentId("summarizer".into()),
    ];
    let mut reg = registry_with_targets();
    reg.insert(def("summarizer", "runtime-only", None));
    let tools = collect_orchestrator_tools(&orch, &reg, &[]);
    let names: Vec<&str> = tools.iter().map(|t| t.name()).collect();
    assert_eq!(names, vec!["research"]);
    assert!(!names.iter().any(|n| n.contains("summarizer")));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/orchestrator_tools.rs` around lines 181 - 348, The tests
are missing coverage for the new runtime-only filter that should skip certain
agents at runtime; add a unit test that exercises collect_orchestrator_tools
with an orchestrator whose subagents include
SubagentEntry::AgentId("summarizer".into()) and a registry entry for
"summarizer" so you can assert the summarizer is not emitted as a delegation
tool (no "delegate_summarizer" in the returned tool names). Ensure the test uses
the existing def(), registry_with_targets()/reg.insert(...), and asserts that
only the expected tools (e.g., "research") are present and that
names.iter().any(|n| n.contains("summarizer")) is false.
src/openhuman/agent/harness/subagent_runner.rs (4)

958-982: starts_with("Error") is a fragile sentinel for "don't stash".

The handoff-cache skip relies on tool outputs that fail to begin with the literal "Error". A tool whose legitimate payload happens to start with that word (error-log exports, "Error codes reference" dumps from Notion/Gmail, etc.) will sidestep handoff and blow the context budget, while the equivalent path at line 985 will also misclassify it as status="error" in text mode.

Consider plumbing ToolResult.is_error (or a typed enum) through alongside the text so both sites use a structured signal rather than a prefix match. Non-blocking, but worth noting since the two checks are likely to drift over time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 958 - 982, The
code currently uses result_text.starts_with("Error") to decide not to stash
outputs; instead pass a structured error flag (e.g., ToolResult::is_error or an
enum) alongside the text from the tool invocation into the handoff logic and use
that flag in the condition here (the block referencing handoff_cache,
result_text, call.name, and build_handoff_placeholder) and at the other site
that sets status="error" so both use the same typed signal; update the call
sites that construct/return tool results to include the boolean/enum and replace
any starts_with("Error") checks with the new is_error property to avoid brittle
prefix matching.

1473-1500: Heavy-schema toolkit list is baked into source — consider a config surface.

HEAVY_SCHEMA_TOOLKITS + the two TOOL_FILTER_TOP_K_* consts require a code change + rebuild every time a new dense Composio toolkit trips the 65 535-rule / 196 607-token ceilings. Given the comment history ("determined empirically from fixture dumps + staging failures"), this list is expected to grow.

Options (in increasing invasiveness): (1) move to a const table keyed by toolkit slug → top_k; (2) expose via AgentConfig so operators can tune without a release; (3) derive dynamically from action count / average schema size in integration.tools. A simple size-based auto-classifier (e.g., "if avg parameters schema > N bytes, use HEAVY top_k") would obsolete the allow-list entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 1473 - 1500, The
hard-coded HEAVY_SCHEMA_TOOLKITS list and TOOL_FILTER_TOP_K_* constants should
be made configurable: add a mapping field to AgentConfig (e.g., toolkit_top_k:
HashMap<String, usize>) and change top_k_for_toolkit(toolkit: &str) to first
look up toolkit_top_k (case-insensitive key) and return that value, then fall
back to TOOL_FILTER_TOP_K_DEFAULT/TOOL_FILTER_TOP_K_HEAVY or a single default;
update callers to pass AgentConfig into top_k_for_toolkit or make it a method
(AgentConfig::top_k_for_toolkit) and persist existing constants as final
fallbacks so behavior is unchanged if config is absent. Ensure names referenced
are HEAVY_SCHEMA_TOOLKITS, TOOL_FILTER_TOP_K_DEFAULT, TOOL_FILTER_TOP_K_HEAVY,
and top_k_for_toolkit so reviewers can locate and remove/replace the hard-coded
list.

788-815: force_text_mode fires for every skills_agent run, not just toolkit-scoped ones.

The surrounding comment says "Scoped to skills_agent because that's the only path where we pass Composio toolkit schemas", but the check is agent_id == "skills_agent" && !tool_specs.is_empty(). When skills_agent is spawned without a toolkit_override (discovery / auth / list-connections flows), dynamic_tools is empty and the only specs are the lightweight composio_* meta-tools — no grammar-cap risk — yet native tool-calling is still disabled and the model is forced into the XML prose protocol.

run_typed_mode already computes is_skills_agent_with_toolkit; consider threading that down (or a dedicated flag) rather than re-deriving from agent_id inside run_inner_loop. This keeps text-mode targeted to the path that actually needs it and preserves native tool-calling's stricter formatting elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 788 - 815, The
current force_text_mode boolean is being derived from agent_id == "skills_agent"
&& !tool_specs.is_empty(), which incorrectly triggers text-mode for all
skills_agent runs (including ones with only composio_* meta-tools). Instead,
thread the already-computed is_skills_agent_with_toolkit (or a new explicit flag
from run_typed_mode/run_inner_loop) down into this module and replace the
derivation: set force_text_mode only when that flag is true and the
toolkit-specific dynamic tool specs are non-empty (not just any composio
meta-tools). Update the condition that sets force_text_mode (and any logic that
appends XML protocol via build_text_mode_tool_instructions) to use
is_skills_agent_with_toolkit (or the new dedicated flag) and ensure native tool
support (supports_native/request_tools) remains enabled for non-toolkit
skills_agent cases.

1416-1464: Missing unit coverage for the new pure helpers.

chunk_content, top_k_for_toolkit, build_handoff_placeholder, summarise_parameters, and first_line_truncated are all deterministic, dependency-free functions with nontrivial edge cases (single-line-over-budget char-boundary slicing, case-insensitive toolkit matching, required-param marker rendering, ellipsis truncation). None of them have tests in the mod tests block below.

Worth adding at least a handful of cases — particularly chunk_content with (a) content ≤ budget, (b) multi-line boundaries, (c) a single line longer than budget with multi-byte UTF-8 near the cut point, and (d) budget equal to content length. Cheap to write and guards against regressions when the summarizer/handoff thresholds are retuned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 1416 - 1464, Add
unit tests covering the pure helpers chunk_content, top_k_for_toolkit,
build_handoff_placeholder, summarise_parameters, and first_line_truncated in the
existing mod tests; specifically for chunk_content add cases: content shorter
than or equal to budget, multi-line content that splits at line boundaries, a
single line longer than budget that includes a multi-byte UTF-8 character near
the cut (to exercise char-boundary slicing), and budget exactly equal to content
length; for top_k_for_toolkit add case-insensitive toolkit matching and
tie/limit behavior; for build_handoff_placeholder and summarise_parameters add
tests for required-param marker rendering and expected summary strings; for
first_line_truncated add tests for strings shorter, equal, and longer than the
truncation length verifying ellipsis placement.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/agents/skills_agent/agent.toml`:
- Around line 13-18: Conflicting tests assert opposite expectations about
csv_export in extra_tools: decide the intended behavior and make the tests and
runtime registration consistent—either (A) if csv_export should be disabled by
default (preferred per the builtin_definitions.rs comment), remove CsvExportTool
from all_tools_with_runtime (ops.rs:78) and update the assertion in
agents::mod.rs that currently requires csv_export to assert it is absent, or (B)
if csv_export must be available, add "csv_export" to extra_tools in agent.toml
and update the builtin_definitions.rs test (and its explanatory comment) to
allow it; ensure the unique symbols CsvExportTool, extra_tools, and
all_tools_with_runtime are updated together so both tests agree with the chosen
design.

In `@src/openhuman/agent/agents/skills_agent/prompt.md`:
- Around line 48-50: The fenced code block containing the file_write snippet in
prompt.md lacks a language hint; update the triple-backtick fence around
file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted
markdown>) to include a language tag (e.g., change ``` to ```text) so
markdownlint MD040 is satisfied and renderers/lints treat it as plain text.

In `@src/openhuman/agent/agents/summarizer/prompt.md`:
- Around line 50-52: The prompt currently requires an exact byte count under the
header "## Original size" using the placeholder "<original_bytes>", which the
runtime does not supply; change the prompt text so it does not demand an exact
byte count — for example replace the placeholder and heading with
optional/approximate wording like "## Original size (if known)" or "Approximate
original size" and allow "unknown" or "approx." as valid values; update the
template lines containing "## Original size" and "<original_bytes>" to reflect
this optional/approximate wording so the model won't try to guess exact bytes,
or alternatively add a clear note that the runtime may inject the exact byte
count if a future Rust change provides it.

In `@src/openhuman/agent/bus.rs`:
- Around line 194-198: The bus handler currently passes None for
payload_summarizer which skips payload summarization for orchestrator turns
routed via channels; change the logic in the bus dispatch where
payload_summarizer is passed (look for the None argument alongside
target_agent_id and Agent::turn/agent.run_turn invocations) so that when
target_agent_id == "orchestrator" you supply the session's/AgentBuilder's
PayloadSummarizer instance instead of None (i.e., forward the same
payload_summarizer used by the AgentBuilder/session context so orchestrator
turns take the same summarization path).

In `@src/openhuman/agent/harness/builtin_definitions.rs`:
- Around line 122-137: The two tests disagree about whether "csv_export" should
be in skills_agent.extra_tools; update the assertion in
skills_agent_is_wildcard_with_skill_category_filter (in agents/mod.rs) to match
the intended state (csv_export removed) so both tests are consistent: locate the
assertion that checks for "csv_export" in that function and remove or invert it
so it no longer requires "csv_export" to be present, leaving the existing
assertion in skills_agent_has_extra_tools_for_export (which verifies absence) as
the canonical expectation.

In `@src/openhuman/agent/harness/payload_summarizer.rs`:
- Around line 326-333: The prompt construction in build_summarizer_prompt embeds
untrusted raw tool output directly between fixed sentinels (the --- BEGIN
---/--- END --- markers) which can be broken out of if raw contains
sentinel-like text; modify build_summarizer_prompt to sanitize or encode raw
before interpolation (e.g., base64 or URL-escape the payload) or replace the
sentinel wrapper with a length-delimited or structured wrapper (include a
"length: N" header or JSON field) so the summarizer cannot be tricked by
injected prompt-like tokens; ensure references to the variable raw and the
build_summarizer_prompt function are updated accordingly.

In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 718-759: The code currently passes r.output() directly to
payload_summarizer.maybe_summarize, risking leakage of credentials; call the
shared scrub_credentials routine on r.output() first and pass that sanitized
string into payload_summarizer.maybe_summarize (and if the summarizer returns
payload.summary, use the already-scrubbed summary or re-run scrub_credentials on
the summary before assigning to output). Update the block that initializes
output (currently let mut output = r.output()) to produce a scrubbed value, and
ensure all uses of payload_summarizer::maybe_summarize and the final (output,
true) return use the scrubbed text.

In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 246-261: The loop that adds definition.extra_tools into
allowed_indices ignores definition.tools (ToolScope) and thus can leak tools;
modify the logic in subagent_runner.rs so each extra_tools entry is also vetted
against definition.tools: when definition.tools is ToolScope::Named([...]) only
allow the tool if its name is listed there; when ToolScope::Wildcard allow as
before; still enforce disallowed_tools via disallow_set and existing
allowed_indices checks; reference definition.extra_tools, definition.tools
(ToolScope), definition.disallowed_tools, parent.all_tools, allowed_indices and
tool.name() to locate and update the condition.

In `@src/openhuman/config/schema/context.rs`:
- Around line 88-104: Update the migration notes/CHANGELOG to call out that the
serde aliases (summarizer_payload_threshold_bytes ->
summarizer_payload_threshold_tokens and summarizer_max_payload_bytes ->
summarizer_max_payload_tokens) preserve parsing but the interpreted units
changed from bytes to tokens; explicitly document that existing numeric values
will now be treated as token counts (give a concrete example such as 100_000
bytes becoming 100k tokens ≈ 400k chars), mention the default changes
(100k→500k), and add a short migration recommendation for users to review and
adjust these two config keys (summarizer_payload_threshold_tokens and
summarizer_max_payload_tokens) if they relied on byte-based tuning.

In `@src/openhuman/tools/impl/filesystem/csv_export.rs`:
- Around line 166-181: After confirming parsed.as_array() returns items, ensure
every element is a JSON object before proceeding: check items.iter().all(|v|
v.is_object()) and if any element is non-object return Err via ToolResult::error
with a clear message (e.g., "Data array must contain only JSON objects (rows)").
Do this check before calling resolve_columns(items, ...) and render_csv(...);
use the items variable and reject mixed arrays like [1,2,3] or arrays with
strings so we never pass non-object rows into resolve_columns or render_csv.

---

Nitpick comments:
In `@src/openhuman/agent/harness/mod.rs`:
- Line 34: The module declaration pub(crate) mod payload_summarizer should be
converted to a submodule directory: move the current payload_summarizer.rs into
a new folder named payload_summarizer with a mod.rs (or lib.rs) inside, update
the harness mod tree so the top-level harness mod (mod.rs in harness) continues
to declare pub(crate) mod payload_summarizer, and ensure any internal
tests/helpers are placed alongside inside payload_summarizer/; this keeps the
symbol payload_summarizer as the module root while following the repo convention
for expandable modules.

In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 958-982: The code currently uses result_text.starts_with("Error")
to decide not to stash outputs; instead pass a structured error flag (e.g.,
ToolResult::is_error or an enum) alongside the text from the tool invocation
into the handoff logic and use that flag in the condition here (the block
referencing handoff_cache, result_text, call.name, and
build_handoff_placeholder) and at the other site that sets status="error" so
both use the same typed signal; update the call sites that construct/return tool
results to include the boolean/enum and replace any starts_with("Error") checks
with the new is_error property to avoid brittle prefix matching.
- Around line 1473-1500: The hard-coded HEAVY_SCHEMA_TOOLKITS list and
TOOL_FILTER_TOP_K_* constants should be made configurable: add a mapping field
to AgentConfig (e.g., toolkit_top_k: HashMap<String, usize>) and change
top_k_for_toolkit(toolkit: &str) to first look up toolkit_top_k
(case-insensitive key) and return that value, then fall back to
TOOL_FILTER_TOP_K_DEFAULT/TOOL_FILTER_TOP_K_HEAVY or a single default; update
callers to pass AgentConfig into top_k_for_toolkit or make it a method
(AgentConfig::top_k_for_toolkit) and persist existing constants as final
fallbacks so behavior is unchanged if config is absent. Ensure names referenced
are HEAVY_SCHEMA_TOOLKITS, TOOL_FILTER_TOP_K_DEFAULT, TOOL_FILTER_TOP_K_HEAVY,
and top_k_for_toolkit so reviewers can locate and remove/replace the hard-coded
list.
- Around line 788-815: The current force_text_mode boolean is being derived from
agent_id == "skills_agent" && !tool_specs.is_empty(), which incorrectly triggers
text-mode for all skills_agent runs (including ones with only composio_*
meta-tools). Instead, thread the already-computed is_skills_agent_with_toolkit
(or a new explicit flag from run_typed_mode/run_inner_loop) down into this
module and replace the derivation: set force_text_mode only when that flag is
true and the toolkit-specific dynamic tool specs are non-empty (not just any
composio meta-tools). Update the condition that sets force_text_mode (and any
logic that appends XML protocol via build_text_mode_tool_instructions) to use
is_skills_agent_with_toolkit (or the new dedicated flag) and ensure native tool
support (supports_native/request_tools) remains enabled for non-toolkit
skills_agent cases.
- Around line 1416-1464: Add unit tests covering the pure helpers chunk_content,
top_k_for_toolkit, build_handoff_placeholder, summarise_parameters, and
first_line_truncated in the existing mod tests; specifically for chunk_content
add cases: content shorter than or equal to budget, multi-line content that
splits at line boundaries, a single line longer than budget that includes a
multi-byte UTF-8 character near the cut (to exercise char-boundary slicing), and
budget exactly equal to content length; for top_k_for_toolkit add
case-insensitive toolkit matching and tie/limit behavior; for
build_handoff_placeholder and summarise_parameters add tests for required-param
marker rendering and expected summary strings; for first_line_truncated add
tests for strings shorter, equal, and longer than the truncation length
verifying ellipsis placement.

In `@src/openhuman/config/schema/context.rs`:
- Around line 75-104: Add a validation step on config load to ensure
summarizer_payload_threshold_tokens <= summarizer_max_payload_tokens: in the
Context config construction/deserialization path (where
summarizer_payload_threshold_tokens and summarizer_max_payload_tokens are set)
check the invariant and either clamp threshold to max or emit a debug/error log
and adjust/fail; reference the two fields by name and mention
PayloadSummarizer::maybe_summarize so reviewers can find the related runtime
behavior. Ensure the validation runs once after deserialization (or in a
Context::validate / Context::new) and use a clear log message indicating the
original and adjusted values if you choose to clamp.

In `@src/openhuman/tools/ops.rs`:
- Around line 291-712: The test suite lacks a regression guard ensuring
CsvExportTool remains registered: add a new unit test in the existing tests mod
(same style as all_tools_includes_spawn_subagent) that constructs the
environment (TempDir, SecurityPolicy, Memory via create_memory, BrowserConfig
and HttpRequestConfig, test_config), calls all_tools (or all_tools_with_runtime
if you want runtime-specific registration) and asserts the returned tools' names
contain "csv_export"; reference CsvExportTool, all_tools_with_runtime (or
all_tools) and the tests mod to place the one-line assertion mirroring the
existing spawn_subagent/complete_onboarding tests.

In `@src/openhuman/tools/orchestrator_tools.rs`:
- Around line 71-84: Replace the hardcoded agent_id == "summarizer" check with a
flag on the agent definition: add a runtime_dispatched_only: bool (serde default
false) field to AgentDefinition (so it can be set from agent.toml), update the
orchestrator_tools.rs filter to consult agent_def.runtime_dispatched_only
instead of matching the string literal, and ensure any code that
constructs/deserializes AgentDefinition provides the default; this co-locates
the "not LLM-visible" property with AgentDefinition and removes the special-case
id check.
- Around line 181-348: The tests are missing coverage for the new runtime-only
filter that should skip certain agents at runtime; add a unit test that
exercises collect_orchestrator_tools with an orchestrator whose subagents
include SubagentEntry::AgentId("summarizer".into()) and a registry entry for
"summarizer" so you can assert the summarizer is not emitted as a delegation
tool (no "delegate_summarizer" in the returned tool names). Ensure the test uses
the existing def(), registry_with_targets()/reg.insert(...), and asserts that
only the expected tools (e.g., "research") are present and that
names.iter().any(|n| n.contains("summarizer")) is false.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2b0446e-03e9-4605-ab9c-2bc7fc2142e8

📥 Commits

Reviewing files that changed from the base of the PR and between 6733720 and 13c9ca9.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • src/openhuman/agent/agents/mod.rs
  • src/openhuman/agent/agents/orchestrator/agent.toml
  • src/openhuman/agent/agents/skills_agent/agent.toml
  • src/openhuman/agent/agents/skills_agent/prompt.md
  • src/openhuman/agent/agents/summarizer/agent.toml
  • src/openhuman/agent/agents/summarizer/prompt.md
  • src/openhuman/agent/bus.rs
  • src/openhuman/agent/harness/builtin_definitions.rs
  • src/openhuman/agent/harness/definition.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/payload_summarizer.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/subagent_runner.rs
  • src/openhuman/agent/harness/tests.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/config/schema/context.rs
  • src/openhuman/context/debug_dump.rs
  • src/openhuman/tools/impl/filesystem/csv_export.rs
  • src/openhuman/tools/impl/filesystem/mod.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/tools/orchestrator_tools.rs

Comment on lines +13 to +18
# `file_write` bypasses category_filter so the agent can save
# oversized tool payloads to workspace files (Path B in prompt.md).
# `csv_export` was previously included but removed — it was triggering
# superfluous export calls even after the answer had already been
# synthesised. Re-add here only if a typed CSV path is needed again.
extra_tools = ["file_write"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Any other agent.toml referencing csv_export?
rg -nP --type=toml -C2 'csv_export'
# Any non-toml references (prompt docs, code) naming csv_export?
rg -nP -C2 '\bcsv_export\b'

Repository: tinyhumansai/openhuman

Length of output: 5555


🏁 Script executed:

# Check context around the contradictory assertions
echo "=== Assertion in src/openhuman/agent/agents/mod.rs ===" 
sed -n '300,315p' src/openhuman/agent/agents/mod.rs

echo ""
echo "=== Assertion in src/openhuman/agent/harness/builtin_definitions.rs ===" 
sed -n '125,140p' src/openhuman/agent/harness/builtin_definitions.rs

echo ""
echo "=== Check if csv_export is in all_tools_with_runtime ===" 
rg -nP 'all_tools_with_runtime|CsvExportTool' src/openhuman/tools/ops.rs | head -20

Repository: tinyhumansai/openhuman

Length of output: 1649


Critical: conflicting test assertions about csv_export in extra_tools.

The codebase has two contradictory test assertions:

  • src/openhuman/agent/agents/mod.rs:309–310 asserts that csv_export must be in extra_tools
  • src/openhuman/agent/harness/builtin_definitions.rs:133–135 asserts that csv_export must not be in extra_tools

The actual state is that agent.toml contains only ["file_write"], and CsvExportTool is still globally registered in all_tools_with_runtime (ops.rs:78). One of these tests will fail. The comment in builtin_definitions.rs explains the removal rationale (superfluous export calls), but this contradicts the assertion in agents/mod.rs. Reconcile these tests to match the intended design.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/skills_agent/agent.toml` around lines 13 - 18,
Conflicting tests assert opposite expectations about csv_export in extra_tools:
decide the intended behavior and make the tests and runtime registration
consistent—either (A) if csv_export should be disabled by default (preferred per
the builtin_definitions.rs comment), remove CsvExportTool from
all_tools_with_runtime (ops.rs:78) and update the assertion in agents::mod.rs
that currently requires csv_export to assert it is absent, or (B) if csv_export
must be available, add "csv_export" to extra_tools in agent.toml and update the
builtin_definitions.rs test (and its explanatory comment) to allow it; ensure
the unique symbols CsvExportTool, extra_tools, and all_tools_with_runtime are
updated together so both tests agree with the chosen design.

Comment on lines +48 to +50
```
file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted markdown>)
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor: add a language hint to the fenced code block.

markdownlint MD040 flags this fence. Tagging it as text (or rust-adjacent pseudocode) keeps renderers happy and avoids noisy lint in CI.

📝 Proposed diff
-   ```
+   ```text
    file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted markdown>)
-   ```
+   ```
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 48-48: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/skills_agent/prompt.md` around lines 48 - 50, The
fenced code block containing the file_write snippet in prompt.md lacks a
language hint; update the triple-backtick fence around
file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted
markdown>) to include a language tag (e.g., change ``` to ```text) so
markdownlint MD040 is satisfied and renderers/lints treat it as plain text.

Comment on lines +50 to +52
## Original size
<original_bytes> bytes → summary of <this note>
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don't require an exact byte count the runtime never passes in.

The prompt asks for ## Original size with <original_bytes>, but this agent only receives tool name, optional hint, and raw output. That forces the model to guess or burn tokens estimating size. Either inject the exact byte count from Rust or make this section approximate/optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/summarizer/prompt.md` around lines 50 - 52, The
prompt currently requires an exact byte count under the header "## Original
size" using the placeholder "<original_bytes>", which the runtime does not
supply; change the prompt text so it does not demand an exact byte count — for
example replace the placeholder and heading with optional/approximate wording
like "## Original size (if known)" or "Approximate original size" and allow
"unknown" or "approx." as valid values; update the template lines containing "##
Original size" and "<original_bytes>" to reflect this optional/approximate
wording so the model won't try to guess exact bytes, or alternatively add a
clear note that the runtime may inject the exact byte count if a future Rust
change provides it.

Comment on lines +194 to +198
// Bus path runs ad-hoc agent turns without an Agent
// handle, so we pass None — payload summarization is
// wired into the orchestrator session via Agent::turn,
// not the bus dispatcher.
None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether orchestrator tool results from channel turns are expected to be summarized
rg -nP -C3 'maybe_summarize|PayloadSummarizer' 
rg -nP -C2 'AGENT_RUN_TURN_METHOD|agent\.run_turn'

Repository: tinyhumansai/openhuman

Length of output: 44901


🏁 Script executed:

# Get the complete bus handler implementation to see what's passed to run_tool_call_loop
sed -n '138,211p' src/openhuman/agent/bus.rs

Repository: tinyhumansai/openhuman

Length of output: 2799


🏁 Script executed:

# Search for how orchestrator turns are executed - look for Agent::turn or agent.run_turn usage with orchestrator
rg -nP 'orchestrator.*turn|Agent::turn' --type rs -A2 -B2 | head -100

Repository: tinyhumansai/openhuman

Length of output: 92


🏁 Script executed:

# Search for how orchestrator turns are executed
rg -nP 'orchestrator.*turn|Agent::turn' -A2 -B2 | head -150

Repository: tinyhumansai/openhuman

Length of output: 8598


🏁 Script executed:

# Search for where orchestrator agent is invoked or routed to
rg -nP 'target_agent_id.*orchestrator|agent_id.*orchestrator' -B3 -A3 | head -150

Repository: tinyhumansai/openhuman

Length of output: 6032


🏁 Script executed:

# Search for whether target_agent_id="orchestrator" is ever passed to the bus handler
rg -nP 'target_agent_id.*=.*"orchestrator"|"orchestrator".*target_agent_id' -B3 -A3

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Search for how target_agent_id is populated and used in dispatch
rg -nP 'target_agent_id' src/openhuman/channels/runtime/dispatch.rs -B2 -A2 | head -100

Repository: tinyhumansai/openhuman

Length of output: 1462


🏁 Script executed:

# Find where target_id is determined (the code that sets target_agent_id)
rg -nP 'target_id\s*=' src/openhuman/channels/runtime/dispatch.rs -B5 -A5

Repository: tinyhumansai/openhuman

Length of output: 509


The scope gap is real: orchestrator turns routed through channels skip payload summarization.

The bus handler passes None for payload_summarizer regardless of whether target_agent_id is "orchestrator". Since channels route orchestrator turns through agent.run_turn when onboarding is complete (src/openhuman/channels/runtime/dispatch.rs:355-356), oversized tool results from orchestrator turns initiated via channels are not summarized—they bypass the PayloadSummarizer logic that is only wired at the AgentBuilder level in session context. This is not intentional channel-specific compaction; it's a gap where orchestrator's summarization machinery is disabled depending on dispatch path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/bus.rs` around lines 194 - 198, The bus handler currently
passes None for payload_summarizer which skips payload summarization for
orchestrator turns routed via channels; change the logic in the bus dispatch
where payload_summarizer is passed (look for the None argument alongside
target_agent_id and Agent::turn/agent.run_turn invocations) so that when
target_agent_id == "orchestrator" you supply the session's/AgentBuilder's
PayloadSummarizer instance instead of None (i.e., forward the same
payload_summarizer used by the AgentBuilder/session context so orchestrator
turns take the same summarization path).

Comment thread src/openhuman/agent/harness/builtin_definitions.rs
Comment thread src/openhuman/agent/harness/payload_summarizer.rs
Comment thread src/openhuman/agent/harness/session/turn.rs
Comment on lines +246 to +261
if !definition.extra_tools.is_empty() {
let disallow_set: std::collections::HashSet<&str> = definition
.disallowed_tools
.iter()
.map(|s| s.as_str())
.collect();
for (i, tool) in parent.all_tools.iter().enumerate() {
let name = tool.name();
if definition.extra_tools.iter().any(|n| n == name)
&& !allowed_indices.contains(&i)
&& !disallow_set.contains(name)
{
allowed_indices.push(i);
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find every definition that sets extra_tools together with a Named ToolScope
# to gauge whether this divergence is reachable today.
rg -nP -C6 'extra_tools\s*='
rg -nP -C3 'ToolScope::Named'

Repository: tinyhumansai/openhuman

Length of output: 18041


extra_tools ignores ToolScope restrictions — diverges from definition contract.

The loop force-pushes every name in definition.extra_tools into allowed_indices as long as it isn't in disallowed_tools, but does not consult definition.tools (ToolScope). The field's docstring in src/openhuman/agent/harness/definition.rs:126-132 explicitly states extra_tools "bypass the category_filter check but are still subject to disallowed_tools and ToolScope restrictions."

Currently latent: skills_agent (the only agent declaring extra_tools in TOML) uses ToolScope::Wildcard, so no scope breach occurs today. However, if a future agent declares ToolScope::Named([...]) + extra_tools = [...], this code will incorrectly leak tools the scope intended to exclude. Either enforce scope here, or update the field doc to say scope is also bypassed.

🛠️ Proposed fix (enforce scope)
     if !definition.extra_tools.is_empty() {
         let disallow_set: std::collections::HashSet<&str> = definition
             .disallowed_tools
             .iter()
             .map(|s| s.as_str())
             .collect();
         for (i, tool) in parent.all_tools.iter().enumerate() {
             let name = tool.name();
-            if definition.extra_tools.iter().any(|n| n == name)
-                && !allowed_indices.contains(&i)
-                && !disallow_set.contains(name)
-            {
-                allowed_indices.push(i);
-            }
+            if !definition.extra_tools.iter().any(|n| n == name)
+                || allowed_indices.contains(&i)
+                || disallow_set.contains(name)
+            {
+                continue;
+            }
+            let scope_ok = match &definition.tools {
+                ToolScope::Wildcard => true,
+                ToolScope::Named(allowed) => allowed.iter().any(|n| n == name),
+            };
+            if scope_ok {
+                allowed_indices.push(i);
+            }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 246 - 261, The
loop that adds definition.extra_tools into allowed_indices ignores
definition.tools (ToolScope) and thus can leak tools; modify the logic in
subagent_runner.rs so each extra_tools entry is also vetted against
definition.tools: when definition.tools is ToolScope::Named([...]) only allow
the tool if its name is listed there; when ToolScope::Wildcard allow as before;
still enforce disallowed_tools via disallow_set and existing allowed_indices
checks; reference definition.extra_tools, definition.tools (ToolScope),
definition.disallowed_tools, parent.all_tools, allowed_indices and tool.name()
to locate and update the condition.

Comment thread src/openhuman/config/schema/context.rs
Comment thread src/openhuman/tools/impl/filesystem/csv_export.rs
sanil-23 and others added 2 commits April 18, 2026 01:51
…ed extraction

Two optimisations to the oversize-handoff / extract_from_result pipeline:

1. Concat chunk summaries in order; drop reduce stage.
   The reduce-stage summarizer call was the slowest single turn of the
   pipeline and a single point of failure when the upstream provider
   hung — a hung reduce burned minutes before timing out. Map summaries
   are now sorted by original chunk index and concatenated with a
   plain-separator join. For listing/extraction queries this is
   equivalent; for top-N / global-ordering queries the caller can
   post-process. Observed: 8m24s → ~70s on a Notion SEARCH run.

2. Generic cleaner before the oversize check.
   Strips <script>/<style>/<svg> blocks, HTML comments, data:...;base64,...
   inline URIs, remaining HTML tags, and collapses whitespace. Applied
   on every tool output so results that are mostly markup drop below
   the 20k-token threshold and skip the extract pipeline entirely; the
   ones that remain oversized get chunked on clean bytes. Debug log
   emits before/after bytes + saved_pct per call.

No changes to the extract_from_result tool surface or placeholder format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The client-side safety timer in Conversations.tsx was a flat 120s
deadline from the send, not a silence window. Long agent turns
(chained tool calls, chunked extraction fan-outs, subagent spawns)
would trip the timeout even though the backend was actively making
progress — showing "No response from the assistant after 2 minutes"
while the server was still working.

The effect now watches inferenceStatusByThread and rearms the timer
on every status change for the sending thread. Any tool_call /
tool_result / iteration_start / subagent_{spawned,done} event resets
the 120s window. When status is cleared (chat_done / chat_error),
the timer is dropped. A truly silent server still fails after 120s
as before.

Tracks the sending thread id in a ref (sendingThreadIdRef) so
switching threads mid-turn doesn't move the timer's reference point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
app/src/pages/Conversations.tsx (1)

300-316: Add [chat] debug logs for arm/rearm/clear transitions.

Only the timeout-fire path currently logs. Given this is a new reliability control with three distinct state transitions (initial arm, rearm-on-progress, clear-on-completion), traceable logs would make staging/production incident triage much easier — especially for the rearm path, which is driven by a Redux selector and can be hard to attribute after the fact.

🪵 Proposed logs
   const armSilenceTimer = (threadId: string) => {
-    if (sendingTimeoutRef.current) clearTimeout(sendingTimeoutRef.current);
+    if (sendingTimeoutRef.current) {
+      clearTimeout(sendingTimeoutRef.current);
+      console.debug('[chat] silence-timer rearm', { threadId });
+    } else {
+      console.debug('[chat] silence-timer arm', { threadId });
+    }
     sendingThreadIdRef.current = threadId;
     sendingTimeoutRef.current = setTimeout(() => {
@@
   useEffect(() => {
     const threadId = sendingThreadIdRef.current;
     if (!threadId || !sendingTimeoutRef.current) return;
     const status = inferenceStatusByThread[threadId];
     if (status === undefined) {
+      console.debug('[chat] silence-timer clear (status cleared)', { threadId });
       clearTimeout(sendingTimeoutRef.current);
       sendingTimeoutRef.current = null;
       sendingThreadIdRef.current = null;
       return;
     }
     armSilenceTimer(threadId);

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows … log critical checkpoints including … state transitions, and error handling paths" and "Use grep-friendly log prefixes".

Also applies to: 323-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/Conversations.tsx` around lines 300 - 316, Add grep-friendly
"[chat]" debug logs tracing the three silence-timer state transitions: when
arming (in armSilenceTimer right after setting sendingTimeoutRef.current and
sendingThreadIdRef.current), when rearming (wherever the selector-driven rearm
logic updates sendingTimeoutRef or calls armSilenceTimer again — log previous
threadId, new threadId and timeout reset), and when clearing (where you call
dispatch(clearRuntimeForThread and dispatch(setActiveThread) or clearTimeout —
log that the timer was cleared and which thread was cleared). Ensure logs appear
before/after the state changes and include sendingThreadIdRef.current, whether a
previous timeout was cleared, and a concise message like "[chat] arm", "[chat]
rearm", "[chat] clear" to aid grep-based triage.
src/openhuman/agent/harness/subagent_runner.rs (2)

789-816: Text-mode override silently no-ops if no system message exists.

history.iter_mut().find(|m| m.role == "system") is best-effort: if an upstream caller ever invokes run_inner_loop with a history that lacks a system message (e.g. a future refactor, or a test harness), the XML tool protocol + catalogue never get injected, the model has no call format to follow, and the subsequent parse_tool_calls pass returns zero calls — the loop then terminates on iteration 1 with whatever prose the model guessed. Both run_typed_mode and run_fork_mode currently guarantee a system message at index 0, so this is latent, but a debug_assert! or an explicit error here would make the invariant explicit and catch regressions early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 789 - 816, The
text-mode injection silently no-ops if no system message exists; update the
force_text_mode path (around force_text_mode and the history.iter_mut().find(|m|
m.role == "system") usage) to assert or fail fast when a system message is
missing: add a debug_assert! that history contains a message with role "system"
(or return an Err / tracing::error and bail from run_inner_loop) before calling
build_text_mode_tool_instructions so the XML protocol is never silently omitted;
reference run_inner_loop / build_text_mode_tool_instructions / parse_tool_calls
in the check to make the invariant explicit and catch regressions early.

966-1019: Fragile starts_with("Error") used for both skip-handoff and status classification.

Lines 968 and 1008 both rely on the cleaned text starting with "Error" to decide (a) whether to bypass the handoff cache and (b) what status to emit on the <tool_result> tag. A legitimate tool output whose first characters happen to be "Error" (e.g. a docs lookup about HTTP error codes, or a Composio action that returned a summary prefixed with "Error budget remaining…") will both skip the handoff (landing a potentially multi‑MB payload inline and risking the context-overflow this PR is meant to prevent) and get mislabelled status="error" to the model.

The error path in the tool dispatch above already has precise signals — result.is_error, the Err / timeout arms, and the explicit "not available" string — so a dedicated is_error: bool accumulated alongside result_text would be more robust than string-sniffing the final text.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 966 - 1019, The
code is using fragile string-sniffing (result_text.starts_with("Error")) to
decide handoff bypass and tool_result status; instead, propagate a boolean error
flag (e.g., is_error from the tool dispatch — result.is_error / Err / timeout
branches) together with result_text and use that flag in both the handoff logic
(skip_cleaning) and when computing status for force_text_mode, replacing both
starts_with checks; update the code paths that set result_text (where
result.is_error is available) to also set is_error, then reference is_error in
the skip_cleaning condition (alongside call.name checks) and when setting status
for text_mode_result_block so oversized outputs aren't misclassified or inlined
due to incidental text content.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 1069-1090: The single-shot extraction path (extract_single_shot)
is never taken because HANDOFF_OVERSIZE_THRESHOLD_TOKENS (20_000 ≈ 80k chars)
makes cached entries larger than SUMMARIZER_CHUNK_CHAR_BUDGET (60_000), so
reduce the mismatch by either lowering HANDOFF_OVERSIZE_THRESHOLD_TOKENS to
~15_000 (≈60k chars) or raising SUMMARIZER_CHUNK_CHAR_BUDGET above the 80k floor
so cached.content.len() can be <= budget, and add a unit/integration test that
constructs a cached entry around the threshold to assert extract_single_shot is
invoked; update references to HANDOFF_OVERSIZE_THRESHOLD_TOKENS,
SUMMARIZER_CHUNK_CHAR_BUDGET, and extract_single_shot accordingly.

---

Nitpick comments:
In `@app/src/pages/Conversations.tsx`:
- Around line 300-316: Add grep-friendly "[chat]" debug logs tracing the three
silence-timer state transitions: when arming (in armSilenceTimer right after
setting sendingTimeoutRef.current and sendingThreadIdRef.current), when rearming
(wherever the selector-driven rearm logic updates sendingTimeoutRef or calls
armSilenceTimer again — log previous threadId, new threadId and timeout reset),
and when clearing (where you call dispatch(clearRuntimeForThread and
dispatch(setActiveThread) or clearTimeout — log that the timer was cleared and
which thread was cleared). Ensure logs appear before/after the state changes and
include sendingThreadIdRef.current, whether a previous timeout was cleared, and
a concise message like "[chat] arm", "[chat] rearm", "[chat] clear" to aid
grep-based triage.

In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 789-816: The text-mode injection silently no-ops if no system
message exists; update the force_text_mode path (around force_text_mode and the
history.iter_mut().find(|m| m.role == "system") usage) to assert or fail fast
when a system message is missing: add a debug_assert! that history contains a
message with role "system" (or return an Err / tracing::error and bail from
run_inner_loop) before calling build_text_mode_tool_instructions so the XML
protocol is never silently omitted; reference run_inner_loop /
build_text_mode_tool_instructions / parse_tool_calls in the check to make the
invariant explicit and catch regressions early.
- Around line 966-1019: The code is using fragile string-sniffing
(result_text.starts_with("Error")) to decide handoff bypass and tool_result
status; instead, propagate a boolean error flag (e.g., is_error from the tool
dispatch — result.is_error / Err / timeout branches) together with result_text
and use that flag in both the handoff logic (skip_cleaning) and when computing
status for force_text_mode, replacing both starts_with checks; update the code
paths that set result_text (where result.is_error is available) to also set
is_error, then reference is_error in the skip_cleaning condition (alongside
call.name checks) and when setting status for text_mode_result_block so
oversized outputs aren't misclassified or inlined due to incidental text
content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a6ccf19-887f-4b07-aff7-cb36dcbc1aea

📥 Commits

Reviewing files that changed from the base of the PR and between 13c9ca9 and 79540c4.

📒 Files selected for processing (2)
  • app/src/pages/Conversations.tsx
  • src/openhuman/agent/harness/subagent_runner.rs

Comment on lines +1069 to +1090
/// Token threshold above which a tool result is routed to the handoff
/// cache instead of being pushed into history raw. 20 000 tokens keeps
/// the sub-agent's per-iteration prompt well below the 196 607-token
/// model ceiling even after many iterations, and leaves comfortable
/// headroom for the system prompt + tool catalogue. Token count is
/// estimated at ~4 chars/token (mirrors
/// [`crate::openhuman::agent::harness::payload_summarizer`] and
/// [`crate::openhuman::tree_summarizer::types::estimate_tokens`]).
const HANDOFF_OVERSIZE_THRESHOLD_TOKENS: usize = 20_000;

/// Characters of the raw payload to surface in the placeholder preview.
/// Enough for the sub-agent to recognise the shape (JSON keys, first
/// record) and often small enough to answer trivial questions without a
/// follow-up `extract_from_result` call.
const HANDOFF_PREVIEW_CHARS: usize = 1500;

/// Maximum entries per session. Bounded to keep memory use predictable
/// on long-running sub-agents that might call many large tools. When
/// over capacity we evict the oldest entry (FIFO); callers see "no
/// cached result" for evicted ids and can either re-run the tool or
/// ask the user/orchestrator to narrow the request.
const HANDOFF_MAX_ENTRIES: usize = 8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Single-shot extraction path is unreachable with current thresholds.

  • HANDOFF_OVERSIZE_THRESHOLD_TOKENS = 20_000 at ~4 chars/token means the cache only stores payloads when cleaned.len() ≥ 80_001 bytes (line 985‑987).
  • SUMMARIZER_CHUNK_CHAR_BUDGET = 60_000 bytes gates the fast path at line 1249 (cached.content.len() <= 60_000).

Every cached entry is therefore strictly larger than the single-shot budget, so extract_single_shot is dead code in production. Functionally harmless (the chunk path with vec![content] would produce equivalent output for a 1-chunk split), but you lose the stated wall-clock optimization and the extra scaffolding runs for no reason.

Consider either lowering HANDOFF_OVERSIZE_THRESHOLD_TOKENS to ~15_000 (≈60k bytes) so small-ish oversized payloads can take the fast path, or raising SUMMARIZER_CHUNK_CHAR_BUDGET above the 80k-byte floor, or adding a test that pins the single-shot path to catch future drift.

Also applies to: 1249-1258, 1411-1411

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 1069 - 1090, The
single-shot extraction path (extract_single_shot) is never taken because
HANDOFF_OVERSIZE_THRESHOLD_TOKENS (20_000 ≈ 80k chars) makes cached entries
larger than SUMMARIZER_CHUNK_CHAR_BUDGET (60_000), so reduce the mismatch by
either lowering HANDOFF_OVERSIZE_THRESHOLD_TOKENS to ~15_000 (≈60k chars) or
raising SUMMARIZER_CHUNK_CHAR_BUDGET above the 80k floor so cached.content.len()
can be <= budget, and add a unit/integration test that constructs a cached entry
around the threshold to assert extract_single_shot is invoked; update references
to HANDOFF_OVERSIZE_THRESHOLD_TOKENS, SUMMARIZER_CHUNK_CHAR_BUDGET, and
extract_single_shot accordingly.

…-subagent

# Conflicts:
#	src/openhuman/agent/agents/mod.rs
@senamakel senamakel merged commit 8d4934e into tinyhumansai:main Apr 17, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement summarizer sub-agent to prevent orchestrator context overflows

2 participants