feat(agent): progressive-disclosure handoff + token-based summarizer threshold (#574)#586
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 37 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughIntroduces a runtime-dispatched "summarizer" sub-agent and payload-summarizer plumbing to compress oversized tool outputs before history/orchestrator consumption; adds AgentDefinition.extra_tools, skills_agent export allowance, CSV export tool, progressive-disclosure handoff, and related session/tool-loop/subagent-runner changes and tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Tool as Tool impl
participant Agent as Agent/run_tool_call_loop
participant Summarizer as PayloadSummarizer
participant Subagent as Summarizer Subagent
participant Registry as AgentDefinitionRegistry
Tool->>Agent: execute -> returns large payload
Agent->>Summarizer: maybe_summarize(tool_name, parent_hint, raw_payload)
Summarizer->>Registry: lookup "summarizer" definition
Registry-->>Summarizer: AgentDefinition (if present)
Summarizer->>Subagent: dispatch summarizer subagent with wrapped prompt
Subagent-->>Summarizer: summary (or error/none)
alt summary smaller & valid
Summarizer-->>Agent: Ok(Some(SummarizedPayload{summary,...}))
Agent->>History: insert summarized payload into history
else summarization failed or skipped
Summarizer-->>Agent: Ok(None) or Err
Agent->>History: insert original payload (possibly truncated later)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ae2f7e2 to
46d6e15
Compare
Issue tinyhumansai#574. Before this change, tool results larger than a few KB landed verbatim in orchestrator history, burning context budget on raw JSON/HTML/file dumps. The only guardrail was tool_result_budget_bytes which hard-truncated mid-payload, dropping everything past the cut. This PR introduces a dedicated `summarizer` sub-agent (model.hint = "summarization") that the runtime automatically dispatches whenever a tool result exceeds summarizer_payload_threshold_bytes (default 100 KB). The summarizer compresses the payload per an extraction contract that preserves identifiers and key facts, and the compressed summary replaces the raw payload before it enters agent history. Payloads above summarizer_max_payload_bytes (default 5 MB) skip summarization and fall through to the existing truncation path — paying for an LLM call on a 5 MB blob is counterproductive. Scoped to the orchestrator session only. Welcome, skills_agent, researcher, planner, and every other typed sub-agent get None and their tool results are untouched. Gated in build_session_agent_inner by checking agent_id == "orchestrator". Instrumented at both tool-loop paths: - run_tool_call_loop in tool_loop.rs (event-bus/channels path) - Agent::execute_tool_call in session/turn.rs (web channel path via run_single) Both paths call into the same `PayloadSummarizer::maybe_summarize` trait so the threshold check, circuit breaker (3 consecutive failures disables for the session), and sub-agent dispatch policy live in exactly one place (agent/harness/payload_summarizer.rs). The summarizer sub-agent is runtime-dispatched only — it is NOT exposed as a delegation tool to the orchestrator's LLM. It is listed in the orchestrator's `subagents = [...]` for explicit registration, but `collect_orchestrator_tools` filters out the `summarizer` id and never synthesises a `delegate_summarizer` tool. Files: * NEW: src/openhuman/agent/agents/summarizer/{agent.toml,prompt.md} * NEW: src/openhuman/agent/harness/payload_summarizer.rs (trait + SubagentPayloadSummarizer impl + circuit breaker + tests) * src/openhuman/agent/agents/mod.rs — register summarizer in BUILTINS * src/openhuman/agent/agents/orchestrator/agent.toml — list summarizer in subagents (runtime-only, no LLM delegation tool) * src/openhuman/agent/harness/mod.rs — declare module * src/openhuman/agent/harness/builtin_definitions.rs — expect summarizer in `expected_builtin_ids_are_present` * src/openhuman/agent/harness/tool_loop.rs — thread Option<&dyn PayloadSummarizer> into run_tool_call_loop + agent_turn, intercept at the tool-execution success site, plus a new integration test using a MockSummarizer * src/openhuman/agent/harness/tests.rs — thread None through the existing run_tool_call_loop callsites * src/openhuman/agent/harness/session/turn.rs — same interception in Agent::execute_tool_call * src/openhuman/agent/harness/session/types.rs — payload_summarizer field on Agent and AgentBuilder * src/openhuman/agent/harness/session/builder.rs — .payload_summarizer() setter + orchestrator-only construction in build_session_agent_inner * src/openhuman/agent/bus.rs — pass None for the bus path * src/openhuman/config/schema/context.rs — summarizer_payload_threshold_bytes + summarizer_max_payload_bytes on ContextConfig * src/openhuman/tools/orchestrator_tools.rs — filter out the summarizer id from delegation-tool synthesis Tests: unit tests in payload_summarizer pin the pass-through rules (below threshold, above max cap, breaker tripped) and the prompt construction. Integration test in tool_loop::tests uses a MockSummarizer to verify the interception wires through end-to-end. Closes tinyhumansai#574. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a Composio tool returns a large payload inside skills_agent,
the agent now has two prompt-driven paths:
Path A — user wants an answer derived from the data:
skills_agent extracts the answer in its next iteration
and returns a targeted response (no file I/O needed).
Path B — user wants the actual raw dataset:
skills_agent calls csv_export (tabular data) or file_write
(non-tabular) to persist the output to workspace/exports/,
then returns a summary + file path.
Changes:
* NEW: src/openhuman/tools/impl/filesystem/csv_export.rs
— CsvExportTool: parses JSON array, formats as CSV, writes
to workspace/exports/{filename}. Handles missing keys
(empty cells), nested values (JSON-serialised), and optional
column ordering. Sandboxed via SecurityPolicy.
* src/openhuman/tools/impl/filesystem/mod.rs — wire module
* src/openhuman/tools/ops.rs — register CsvExportTool
* src/openhuman/agent/harness/definition.rs — new extra_tools
field on AgentDefinition, allowing named system tools to
bypass category_filter
* src/openhuman/agent/harness/subagent_runner.rs — inject
extra_tools into allowed_indices after category filtering
* src/openhuman/agent/agents/skills_agent/agent.toml
— add file_write + csv_export via extra_tools
* src/openhuman/agent/agents/skills_agent/prompt.md
— new "Handling Oversized Tool Results" section with
Path A (extract answer) vs Path B (export file) decision
tree, intent-detection heuristics, and examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…based summarizer threshold (tinyhumansai#574) Two layered fixes for tinyhumansai#574 plus supporting changes. ## Summarizer thresholds in tokens, not bytes Renamed `ContextConfig::summarizer_payload_threshold_bytes` -> `summarizer_payload_threshold_tokens` (default 500 000, was 100 KB in bytes) and `summarizer_max_payload_bytes` -> `summarizer_max_payload_tokens` (default 2 000 000). Token count is estimated at ~4 chars/token via a local helper that mirrors `tree_summarizer::types::estimate_tokens`. Serde aliases on both fields keep existing config.toml files parsing. `payload_summarizer.rs` now compares token estimates for the threshold and cap, with both `tokens=` and `bytes=` surfaced in log events. ## Progressive-disclosure handoff for skills_agent `skills_agent` typed subagents (when spawned with a Composio toolkit) now receive a per-spawn `ResultHandoffCache` and a new built-in `extract_from_result` tool. When a per-action tool returns more than `HANDOFF_OVERSIZE_THRESHOLD_TOKENS` (20 000 tokens), the raw payload is stashed in the cache and a compact placeholder replaces it in history: [oversized tool output: 187432 bytes - stashed as result_id="res_1"] Preview (first 1500 chars): ... If the preview does not answer your task, call: extract_from_result(result_id="res_1", query="<specific question>") The subagent can answer from the preview, call `extract_from_result` with a targeted query, or re-call the original tool with tighter filters. `extract_from_result` dispatches the `summarizer` subagent against the cached payload with the query as the extraction contract. For payloads over `SUMMARIZER_CHUNK_CHAR_BUDGET` (60 000 chars) the extractor runs a chunked map-reduce: - split at blank-line / newline boundaries - map each chunk to a per-chunk partial via bounded concurrency (`buffer_unordered(3)`) to avoid storming the provider gateway - reduce partials in one more summarizer turn, or fall back to the concatenation if the reduce stage fails Falls back gracefully when the summarizer definition is missing from the registry (placeholder + preview still work; just no extractor). ## Text-mode dispatcher for skills_agent + tighter tool filter Provider-side grammar decoders (Fireworks etc.) cap tool-schema rules at 65 535 (uint16_t). Large Composio toolkits - Gmail, Notion, GitHub, Salesforce, HubSpot, Google Workspace - ship per-action JSON schemas dense enough that even a handful of them exceed the cap. - `subagent_runner::run_inner_loop` now detects `skills_agent` and omits `tools: [...]` from the API request, instead appending an `<tool_call>{...}</tool_call>` XML protocol block (with compact per-tool parameter summaries) into the system prompt. Tool calls are parsed out of the model's free-form response via the shared `parse_tool_calls` helper. Mirrors XmlToolDispatcher behaviour. - `top_k_for_toolkit()` picks 12 for HEAVY_SCHEMA_TOOLKITS (the six listed above plus googledocs, googlesheets, microsoftteams) vs the default 25 for lighter toolkits. ## Skills_agent agent.toml / prompt changes - Dropped `csv_export` from `skills_agent.extra_tools` (kept `file_write`) - the export tool was triggering superfluous file writes after extraction had already answered the query. - `builtin_definitions::skills_agent_has_extra_tools_for_export` test inverted to assert `csv_export` is NOT present. - `prompt.md` re-aligned: Path B now references only `file_write`. ## Wiring - `session/builder.rs`: threshold rename wiring (`_tokens`). - `subagent_runner.rs::run_inner_loop`: new `handoff_cache` parameter (threaded through both call sites; fork-mode passes `None`). ## Tests All green: - `payload_summarizer::tests`: 6/6 (thresholds in tokens) - `tool_loop::tests`: 10/10 (MockSummarizer integration) - `subagent_runner::tests`: 19/19 - `builtin_definitions::tests`: 5/5 (csv_export removal) - `csv_export::tests`: 7/7 (implementation kept, wiring removed) Closes tinyhumansai#574. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
46d6e15 to
13c9ca9
Compare
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (9)
src/openhuman/tools/ops.rs (1)
291-712: Missing test coverage forcsv_exportregistration.Given
CsvExportToolis now unconditionally registered inall_tools_with_runtime, a one-line regression guard (mirroring the existingall_tools_includes_spawn_subagentandall_tools_includes_complete_onboardingtests) would prevent accidental removal. Not blocking.Proposed test
+ #[test] + fn all_tools_includes_csv_export() { + let tmp = TempDir::new().unwrap(); + let security = Arc::new(SecurityPolicy::default()); + let mem_cfg = MemoryConfig { backend: "markdown".into(), ..MemoryConfig::default() }; + let mem: Arc<dyn Memory> = Arc::from( + crate::openhuman::memory::create_memory(&mem_cfg, tmp.path(), None).unwrap(), + ); + let browser = BrowserConfig::default(); + let http = crate::openhuman::config::HttpRequestConfig::default(); + let cfg = test_config(&tmp); + let tools = all_tools( + Arc::new(Config::default()), &security, mem, None, None, + &browser, &http, tmp.path(), &HashMap::new(), None, &cfg, + ); + let names: Vec<&str> = tools.iter().map(|t| t.name()).collect(); + assert!(names.contains(&"csv_export"), "got: {names:?}"); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/ops.rs` around lines 291 - 712, The test suite lacks a regression guard ensuring CsvExportTool remains registered: add a new unit test in the existing tests mod (same style as all_tools_includes_spawn_subagent) that constructs the environment (TempDir, SecurityPolicy, Memory via create_memory, BrowserConfig and HttpRequestConfig, test_config), calls all_tools (or all_tools_with_runtime if you want runtime-specific registration) and asserts the returned tools' names contain "csv_export"; reference CsvExportTool, all_tools_with_runtime (or all_tools) and the tests mod to place the one-line assertion mirroring the existing spawn_subagent/complete_onboarding tests.src/openhuman/config/schema/context.rs (1)
75-104: Consider documenting/validating the threshold ≤ max invariant.Nothing enforces
summarizer_payload_threshold_tokens <= summarizer_max_payload_tokens. If a user inverts them, payloads in between would both "trigger summarization" and "skip summarization" — behavior depends on evaluation order inPayloadSummarizer::maybe_summarize. A debug-level validation log (or clamp) on Config load would catch misconfigurations early.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/config/schema/context.rs` around lines 75 - 104, Add a validation step on config load to ensure summarizer_payload_threshold_tokens <= summarizer_max_payload_tokens: in the Context config construction/deserialization path (where summarizer_payload_threshold_tokens and summarizer_max_payload_tokens are set) check the invariant and either clamp threshold to max or emit a debug/error log and adjust/fail; reference the two fields by name and mention PayloadSummarizer::maybe_summarize so reviewers can find the related runtime behavior. Ensure the validation runs once after deserialization (or in a Context::validate / Context::new) and use a clear log message indicating the original and adjusted values if you choose to clamp.src/openhuman/agent/harness/mod.rs (1)
34-34: Module declaration placement deviates from repo convention.Per coding guidelines, new functionality should live in a dedicated subdirectory module (e.g.
payload_summarizer/mod.rs) rather than a standalonepayload_summarizer.rs. This is a minor concern here since the file sits underharness/rather than directly atsrc/openhuman/, and the surrounding harness siblings (parse.rs,credentials.rs,tool_loop.rs,self_healing.rs) follow the same flat-file pattern, so this is locally consistent. Leaving as nit — promote to a folder if the module grows to have tests/helpers as separate files.As per coding guidelines: "New functionality in Rust must live in a dedicated subdirectory module (e.g., openhuman/my_domain/mod.rs), not new standalone *.rs files at src/openhuman/ root".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/mod.rs` at line 34, The module declaration pub(crate) mod payload_summarizer should be converted to a submodule directory: move the current payload_summarizer.rs into a new folder named payload_summarizer with a mod.rs (or lib.rs) inside, update the harness mod tree so the top-level harness mod (mod.rs in harness) continues to declare pub(crate) mod payload_summarizer, and ensure any internal tests/helpers are placed alongside inside payload_summarizer/; this keeps the symbol payload_summarizer as the module root while following the repo convention for expandable modules.src/openhuman/tools/orchestrator_tools.rs (2)
71-84: Consider moving the runtime-only flag ontoAgentDefinitioninstead of a hardcoded id match.The string-literal
"summarizer"check here creates a distributed contract: every future runtime-dispatched sub-agent must be added to this hardcoded list (and the comment explicitly invites that growth). A cleaner, self-documenting alternative is a boolean onAgentDefinition(e.g.runtime_dispatched_only: bool, defaulting tofalsewith a serde default) sourced from the agent's ownagent.toml. That keeps the "this agent is not LLM-visible" property co-located with the agent definition that declares it, and removes the need for orchestrator-tool code to hold a list of special ids.Not blocking — the current implementation is correct for the single-agent case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/orchestrator_tools.rs` around lines 71 - 84, Replace the hardcoded agent_id == "summarizer" check with a flag on the agent definition: add a runtime_dispatched_only: bool (serde default false) field to AgentDefinition (so it can be set from agent.toml), update the orchestrator_tools.rs filter to consult agent_def.runtime_dispatched_only instead of matching the string literal, and ensure any code that constructs/deserializes AgentDefinition provides the default; this co-locates the "not LLM-visible" property with AgentDefinition and removes the special-case id check.
181-348: Missing test coverage for the new summarizer skip branch.The new runtime-only filter at lines 71–84 has no dedicated unit test. The existing tests cover known agents, unknown agents, skills wildcards, and empty subagents — but not the case where an AgentId entry is explicitly skipped because it's runtime-only. Worth adding a small test that includes
SubagentEntry::AgentId("summarizer".into())in the subagents list and asserts nodelegate_summarizertool is synthesised.🧪 Suggested test
#[test] fn runtime_only_summarizer_is_never_exposed_as_delegation_tool() { let mut orch = def("orchestrator", "test", None); orch.subagents = vec![ SubagentEntry::AgentId("researcher".into()), SubagentEntry::AgentId("summarizer".into()), ]; let mut reg = registry_with_targets(); reg.insert(def("summarizer", "runtime-only", None)); let tools = collect_orchestrator_tools(&orch, ®, &[]); let names: Vec<&str> = tools.iter().map(|t| t.name()).collect(); assert_eq!(names, vec!["research"]); assert!(!names.iter().any(|n| n.contains("summarizer"))); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/orchestrator_tools.rs` around lines 181 - 348, The tests are missing coverage for the new runtime-only filter that should skip certain agents at runtime; add a unit test that exercises collect_orchestrator_tools with an orchestrator whose subagents include SubagentEntry::AgentId("summarizer".into()) and a registry entry for "summarizer" so you can assert the summarizer is not emitted as a delegation tool (no "delegate_summarizer" in the returned tool names). Ensure the test uses the existing def(), registry_with_targets()/reg.insert(...), and asserts that only the expected tools (e.g., "research") are present and that names.iter().any(|n| n.contains("summarizer")) is false.src/openhuman/agent/harness/subagent_runner.rs (4)
958-982:starts_with("Error")is a fragile sentinel for "don't stash".The handoff-cache skip relies on tool outputs that fail to begin with the literal
"Error". A tool whose legitimate payload happens to start with that word (error-log exports, "Error codes reference" dumps from Notion/Gmail, etc.) will sidestep handoff and blow the context budget, while the equivalent path at line 985 will also misclassify it asstatus="error"in text mode.Consider plumbing
ToolResult.is_error(or a typed enum) through alongside the text so both sites use a structured signal rather than a prefix match. Non-blocking, but worth noting since the two checks are likely to drift over time.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 958 - 982, The code currently uses result_text.starts_with("Error") to decide not to stash outputs; instead pass a structured error flag (e.g., ToolResult::is_error or an enum) alongside the text from the tool invocation into the handoff logic and use that flag in the condition here (the block referencing handoff_cache, result_text, call.name, and build_handoff_placeholder) and at the other site that sets status="error" so both use the same typed signal; update the call sites that construct/return tool results to include the boolean/enum and replace any starts_with("Error") checks with the new is_error property to avoid brittle prefix matching.
1473-1500: Heavy-schema toolkit list is baked into source — consider a config surface.
HEAVY_SCHEMA_TOOLKITS+ the twoTOOL_FILTER_TOP_K_*consts require a code change + rebuild every time a new dense Composio toolkit trips the 65 535-rule / 196 607-token ceilings. Given the comment history ("determined empirically from fixture dumps + staging failures"), this list is expected to grow.Options (in increasing invasiveness): (1) move to a
consttable keyed by toolkit slug → top_k; (2) expose viaAgentConfigso operators can tune without a release; (3) derive dynamically from action count / average schema size inintegration.tools. A simple size-based auto-classifier (e.g., "if avg parameters schema > N bytes, use HEAVY top_k") would obsolete the allow-list entirely.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 1473 - 1500, The hard-coded HEAVY_SCHEMA_TOOLKITS list and TOOL_FILTER_TOP_K_* constants should be made configurable: add a mapping field to AgentConfig (e.g., toolkit_top_k: HashMap<String, usize>) and change top_k_for_toolkit(toolkit: &str) to first look up toolkit_top_k (case-insensitive key) and return that value, then fall back to TOOL_FILTER_TOP_K_DEFAULT/TOOL_FILTER_TOP_K_HEAVY or a single default; update callers to pass AgentConfig into top_k_for_toolkit or make it a method (AgentConfig::top_k_for_toolkit) and persist existing constants as final fallbacks so behavior is unchanged if config is absent. Ensure names referenced are HEAVY_SCHEMA_TOOLKITS, TOOL_FILTER_TOP_K_DEFAULT, TOOL_FILTER_TOP_K_HEAVY, and top_k_for_toolkit so reviewers can locate and remove/replace the hard-coded list.
788-815:force_text_modefires for everyskills_agentrun, not just toolkit-scoped ones.The surrounding comment says "Scoped to
skills_agentbecause that's the only path where we pass Composio toolkit schemas", but the check isagent_id == "skills_agent" && !tool_specs.is_empty(). Whenskills_agentis spawned without atoolkit_override(discovery / auth / list-connections flows),dynamic_toolsis empty and the only specs are the lightweightcomposio_*meta-tools — no grammar-cap risk — yet native tool-calling is still disabled and the model is forced into the XML prose protocol.
run_typed_modealready computesis_skills_agent_with_toolkit; consider threading that down (or a dedicated flag) rather than re-deriving fromagent_idinsiderun_inner_loop. This keeps text-mode targeted to the path that actually needs it and preserves native tool-calling's stricter formatting elsewhere.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 788 - 815, The current force_text_mode boolean is being derived from agent_id == "skills_agent" && !tool_specs.is_empty(), which incorrectly triggers text-mode for all skills_agent runs (including ones with only composio_* meta-tools). Instead, thread the already-computed is_skills_agent_with_toolkit (or a new explicit flag from run_typed_mode/run_inner_loop) down into this module and replace the derivation: set force_text_mode only when that flag is true and the toolkit-specific dynamic tool specs are non-empty (not just any composio meta-tools). Update the condition that sets force_text_mode (and any logic that appends XML protocol via build_text_mode_tool_instructions) to use is_skills_agent_with_toolkit (or the new dedicated flag) and ensure native tool support (supports_native/request_tools) remains enabled for non-toolkit skills_agent cases.
1416-1464: Missing unit coverage for the new pure helpers.
chunk_content,top_k_for_toolkit,build_handoff_placeholder,summarise_parameters, andfirst_line_truncatedare all deterministic, dependency-free functions with nontrivial edge cases (single-line-over-budget char-boundary slicing, case-insensitive toolkit matching, required-param marker rendering, ellipsis truncation). None of them have tests in themod testsblock below.Worth adding at least a handful of cases — particularly
chunk_contentwith (a) content ≤ budget, (b) multi-line boundaries, (c) a single line longer than budget with multi-byte UTF-8 near the cut point, and (d) budget equal to content length. Cheap to write and guards against regressions when the summarizer/handoff thresholds are retuned.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 1416 - 1464, Add unit tests covering the pure helpers chunk_content, top_k_for_toolkit, build_handoff_placeholder, summarise_parameters, and first_line_truncated in the existing mod tests; specifically for chunk_content add cases: content shorter than or equal to budget, multi-line content that splits at line boundaries, a single line longer than budget that includes a multi-byte UTF-8 character near the cut (to exercise char-boundary slicing), and budget exactly equal to content length; for top_k_for_toolkit add case-insensitive toolkit matching and tie/limit behavior; for build_handoff_placeholder and summarise_parameters add tests for required-param marker rendering and expected summary strings; for first_line_truncated add tests for strings shorter, equal, and longer than the truncation length verifying ellipsis placement.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/agents/skills_agent/agent.toml`:
- Around line 13-18: Conflicting tests assert opposite expectations about
csv_export in extra_tools: decide the intended behavior and make the tests and
runtime registration consistent—either (A) if csv_export should be disabled by
default (preferred per the builtin_definitions.rs comment), remove CsvExportTool
from all_tools_with_runtime (ops.rs:78) and update the assertion in
agents::mod.rs that currently requires csv_export to assert it is absent, or (B)
if csv_export must be available, add "csv_export" to extra_tools in agent.toml
and update the builtin_definitions.rs test (and its explanatory comment) to
allow it; ensure the unique symbols CsvExportTool, extra_tools, and
all_tools_with_runtime are updated together so both tests agree with the chosen
design.
In `@src/openhuman/agent/agents/skills_agent/prompt.md`:
- Around line 48-50: The fenced code block containing the file_write snippet in
prompt.md lacks a language hint; update the triple-backtick fence around
file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted
markdown>) to include a language tag (e.g., change ``` to ```text) so
markdownlint MD040 is satisfied and renderers/lints treat it as plain text.
In `@src/openhuman/agent/agents/summarizer/prompt.md`:
- Around line 50-52: The prompt currently requires an exact byte count under the
header "## Original size" using the placeholder "<original_bytes>", which the
runtime does not supply; change the prompt text so it does not demand an exact
byte count — for example replace the placeholder and heading with
optional/approximate wording like "## Original size (if known)" or "Approximate
original size" and allow "unknown" or "approx." as valid values; update the
template lines containing "## Original size" and "<original_bytes>" to reflect
this optional/approximate wording so the model won't try to guess exact bytes,
or alternatively add a clear note that the runtime may inject the exact byte
count if a future Rust change provides it.
In `@src/openhuman/agent/bus.rs`:
- Around line 194-198: The bus handler currently passes None for
payload_summarizer which skips payload summarization for orchestrator turns
routed via channels; change the logic in the bus dispatch where
payload_summarizer is passed (look for the None argument alongside
target_agent_id and Agent::turn/agent.run_turn invocations) so that when
target_agent_id == "orchestrator" you supply the session's/AgentBuilder's
PayloadSummarizer instance instead of None (i.e., forward the same
payload_summarizer used by the AgentBuilder/session context so orchestrator
turns take the same summarization path).
In `@src/openhuman/agent/harness/builtin_definitions.rs`:
- Around line 122-137: The two tests disagree about whether "csv_export" should
be in skills_agent.extra_tools; update the assertion in
skills_agent_is_wildcard_with_skill_category_filter (in agents/mod.rs) to match
the intended state (csv_export removed) so both tests are consistent: locate the
assertion that checks for "csv_export" in that function and remove or invert it
so it no longer requires "csv_export" to be present, leaving the existing
assertion in skills_agent_has_extra_tools_for_export (which verifies absence) as
the canonical expectation.
In `@src/openhuman/agent/harness/payload_summarizer.rs`:
- Around line 326-333: The prompt construction in build_summarizer_prompt embeds
untrusted raw tool output directly between fixed sentinels (the --- BEGIN
---/--- END --- markers) which can be broken out of if raw contains
sentinel-like text; modify build_summarizer_prompt to sanitize or encode raw
before interpolation (e.g., base64 or URL-escape the payload) or replace the
sentinel wrapper with a length-delimited or structured wrapper (include a
"length: N" header or JSON field) so the summarizer cannot be tricked by
injected prompt-like tokens; ensure references to the variable raw and the
build_summarizer_prompt function are updated accordingly.
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 718-759: The code currently passes r.output() directly to
payload_summarizer.maybe_summarize, risking leakage of credentials; call the
shared scrub_credentials routine on r.output() first and pass that sanitized
string into payload_summarizer.maybe_summarize (and if the summarizer returns
payload.summary, use the already-scrubbed summary or re-run scrub_credentials on
the summary before assigning to output). Update the block that initializes
output (currently let mut output = r.output()) to produce a scrubbed value, and
ensure all uses of payload_summarizer::maybe_summarize and the final (output,
true) return use the scrubbed text.
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 246-261: The loop that adds definition.extra_tools into
allowed_indices ignores definition.tools (ToolScope) and thus can leak tools;
modify the logic in subagent_runner.rs so each extra_tools entry is also vetted
against definition.tools: when definition.tools is ToolScope::Named([...]) only
allow the tool if its name is listed there; when ToolScope::Wildcard allow as
before; still enforce disallowed_tools via disallow_set and existing
allowed_indices checks; reference definition.extra_tools, definition.tools
(ToolScope), definition.disallowed_tools, parent.all_tools, allowed_indices and
tool.name() to locate and update the condition.
In `@src/openhuman/config/schema/context.rs`:
- Around line 88-104: Update the migration notes/CHANGELOG to call out that the
serde aliases (summarizer_payload_threshold_bytes ->
summarizer_payload_threshold_tokens and summarizer_max_payload_bytes ->
summarizer_max_payload_tokens) preserve parsing but the interpreted units
changed from bytes to tokens; explicitly document that existing numeric values
will now be treated as token counts (give a concrete example such as 100_000
bytes becoming 100k tokens ≈ 400k chars), mention the default changes
(100k→500k), and add a short migration recommendation for users to review and
adjust these two config keys (summarizer_payload_threshold_tokens and
summarizer_max_payload_tokens) if they relied on byte-based tuning.
In `@src/openhuman/tools/impl/filesystem/csv_export.rs`:
- Around line 166-181: After confirming parsed.as_array() returns items, ensure
every element is a JSON object before proceeding: check items.iter().all(|v|
v.is_object()) and if any element is non-object return Err via ToolResult::error
with a clear message (e.g., "Data array must contain only JSON objects (rows)").
Do this check before calling resolve_columns(items, ...) and render_csv(...);
use the items variable and reject mixed arrays like [1,2,3] or arrays with
strings so we never pass non-object rows into resolve_columns or render_csv.
---
Nitpick comments:
In `@src/openhuman/agent/harness/mod.rs`:
- Line 34: The module declaration pub(crate) mod payload_summarizer should be
converted to a submodule directory: move the current payload_summarizer.rs into
a new folder named payload_summarizer with a mod.rs (or lib.rs) inside, update
the harness mod tree so the top-level harness mod (mod.rs in harness) continues
to declare pub(crate) mod payload_summarizer, and ensure any internal
tests/helpers are placed alongside inside payload_summarizer/; this keeps the
symbol payload_summarizer as the module root while following the repo convention
for expandable modules.
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 958-982: The code currently uses result_text.starts_with("Error")
to decide not to stash outputs; instead pass a structured error flag (e.g.,
ToolResult::is_error or an enum) alongside the text from the tool invocation
into the handoff logic and use that flag in the condition here (the block
referencing handoff_cache, result_text, call.name, and
build_handoff_placeholder) and at the other site that sets status="error" so
both use the same typed signal; update the call sites that construct/return tool
results to include the boolean/enum and replace any starts_with("Error") checks
with the new is_error property to avoid brittle prefix matching.
- Around line 1473-1500: The hard-coded HEAVY_SCHEMA_TOOLKITS list and
TOOL_FILTER_TOP_K_* constants should be made configurable: add a mapping field
to AgentConfig (e.g., toolkit_top_k: HashMap<String, usize>) and change
top_k_for_toolkit(toolkit: &str) to first look up toolkit_top_k
(case-insensitive key) and return that value, then fall back to
TOOL_FILTER_TOP_K_DEFAULT/TOOL_FILTER_TOP_K_HEAVY or a single default; update
callers to pass AgentConfig into top_k_for_toolkit or make it a method
(AgentConfig::top_k_for_toolkit) and persist existing constants as final
fallbacks so behavior is unchanged if config is absent. Ensure names referenced
are HEAVY_SCHEMA_TOOLKITS, TOOL_FILTER_TOP_K_DEFAULT, TOOL_FILTER_TOP_K_HEAVY,
and top_k_for_toolkit so reviewers can locate and remove/replace the hard-coded
list.
- Around line 788-815: The current force_text_mode boolean is being derived from
agent_id == "skills_agent" && !tool_specs.is_empty(), which incorrectly triggers
text-mode for all skills_agent runs (including ones with only composio_*
meta-tools). Instead, thread the already-computed is_skills_agent_with_toolkit
(or a new explicit flag from run_typed_mode/run_inner_loop) down into this
module and replace the derivation: set force_text_mode only when that flag is
true and the toolkit-specific dynamic tool specs are non-empty (not just any
composio meta-tools). Update the condition that sets force_text_mode (and any
logic that appends XML protocol via build_text_mode_tool_instructions) to use
is_skills_agent_with_toolkit (or the new dedicated flag) and ensure native tool
support (supports_native/request_tools) remains enabled for non-toolkit
skills_agent cases.
- Around line 1416-1464: Add unit tests covering the pure helpers chunk_content,
top_k_for_toolkit, build_handoff_placeholder, summarise_parameters, and
first_line_truncated in the existing mod tests; specifically for chunk_content
add cases: content shorter than or equal to budget, multi-line content that
splits at line boundaries, a single line longer than budget that includes a
multi-byte UTF-8 character near the cut (to exercise char-boundary slicing), and
budget exactly equal to content length; for top_k_for_toolkit add
case-insensitive toolkit matching and tie/limit behavior; for
build_handoff_placeholder and summarise_parameters add tests for required-param
marker rendering and expected summary strings; for first_line_truncated add
tests for strings shorter, equal, and longer than the truncation length
verifying ellipsis placement.
In `@src/openhuman/config/schema/context.rs`:
- Around line 75-104: Add a validation step on config load to ensure
summarizer_payload_threshold_tokens <= summarizer_max_payload_tokens: in the
Context config construction/deserialization path (where
summarizer_payload_threshold_tokens and summarizer_max_payload_tokens are set)
check the invariant and either clamp threshold to max or emit a debug/error log
and adjust/fail; reference the two fields by name and mention
PayloadSummarizer::maybe_summarize so reviewers can find the related runtime
behavior. Ensure the validation runs once after deserialization (or in a
Context::validate / Context::new) and use a clear log message indicating the
original and adjusted values if you choose to clamp.
In `@src/openhuman/tools/ops.rs`:
- Around line 291-712: The test suite lacks a regression guard ensuring
CsvExportTool remains registered: add a new unit test in the existing tests mod
(same style as all_tools_includes_spawn_subagent) that constructs the
environment (TempDir, SecurityPolicy, Memory via create_memory, BrowserConfig
and HttpRequestConfig, test_config), calls all_tools (or all_tools_with_runtime
if you want runtime-specific registration) and asserts the returned tools' names
contain "csv_export"; reference CsvExportTool, all_tools_with_runtime (or
all_tools) and the tests mod to place the one-line assertion mirroring the
existing spawn_subagent/complete_onboarding tests.
In `@src/openhuman/tools/orchestrator_tools.rs`:
- Around line 71-84: Replace the hardcoded agent_id == "summarizer" check with a
flag on the agent definition: add a runtime_dispatched_only: bool (serde default
false) field to AgentDefinition (so it can be set from agent.toml), update the
orchestrator_tools.rs filter to consult agent_def.runtime_dispatched_only
instead of matching the string literal, and ensure any code that
constructs/deserializes AgentDefinition provides the default; this co-locates
the "not LLM-visible" property with AgentDefinition and removes the special-case
id check.
- Around line 181-348: The tests are missing coverage for the new runtime-only
filter that should skip certain agents at runtime; add a unit test that
exercises collect_orchestrator_tools with an orchestrator whose subagents
include SubagentEntry::AgentId("summarizer".into()) and a registry entry for
"summarizer" so you can assert the summarizer is not emitted as a delegation
tool (no "delegate_summarizer" in the returned tool names). Ensure the test uses
the existing def(), registry_with_targets()/reg.insert(...), and asserts that
only the expected tools (e.g., "research") are present and that
names.iter().any(|n| n.contains("summarizer")) is false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d2b0446e-03e9-4605-ab9c-2bc7fc2142e8
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockapp/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (24)
src/openhuman/agent/agents/mod.rssrc/openhuman/agent/agents/orchestrator/agent.tomlsrc/openhuman/agent/agents/skills_agent/agent.tomlsrc/openhuman/agent/agents/skills_agent/prompt.mdsrc/openhuman/agent/agents/summarizer/agent.tomlsrc/openhuman/agent/agents/summarizer/prompt.mdsrc/openhuman/agent/bus.rssrc/openhuman/agent/harness/builtin_definitions.rssrc/openhuman/agent/harness/definition.rssrc/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/payload_summarizer.rssrc/openhuman/agent/harness/session/builder.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/session/types.rssrc/openhuman/agent/harness/subagent_runner.rssrc/openhuman/agent/harness/tests.rssrc/openhuman/agent/harness/tool_loop.rssrc/openhuman/channels/runtime/dispatch.rssrc/openhuman/config/schema/context.rssrc/openhuman/context/debug_dump.rssrc/openhuman/tools/impl/filesystem/csv_export.rssrc/openhuman/tools/impl/filesystem/mod.rssrc/openhuman/tools/ops.rssrc/openhuman/tools/orchestrator_tools.rs
| # `file_write` bypasses category_filter so the agent can save | ||
| # oversized tool payloads to workspace files (Path B in prompt.md). | ||
| # `csv_export` was previously included but removed — it was triggering | ||
| # superfluous export calls even after the answer had already been | ||
| # synthesised. Re-add here only if a typed CSV path is needed again. | ||
| extra_tools = ["file_write"] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Any other agent.toml referencing csv_export?
rg -nP --type=toml -C2 'csv_export'
# Any non-toml references (prompt docs, code) naming csv_export?
rg -nP -C2 '\bcsv_export\b'Repository: tinyhumansai/openhuman
Length of output: 5555
🏁 Script executed:
# Check context around the contradictory assertions
echo "=== Assertion in src/openhuman/agent/agents/mod.rs ==="
sed -n '300,315p' src/openhuman/agent/agents/mod.rs
echo ""
echo "=== Assertion in src/openhuman/agent/harness/builtin_definitions.rs ==="
sed -n '125,140p' src/openhuman/agent/harness/builtin_definitions.rs
echo ""
echo "=== Check if csv_export is in all_tools_with_runtime ==="
rg -nP 'all_tools_with_runtime|CsvExportTool' src/openhuman/tools/ops.rs | head -20Repository: tinyhumansai/openhuman
Length of output: 1649
Critical: conflicting test assertions about csv_export in extra_tools.
The codebase has two contradictory test assertions:
src/openhuman/agent/agents/mod.rs:309–310asserts thatcsv_exportmust be inextra_toolssrc/openhuman/agent/harness/builtin_definitions.rs:133–135asserts thatcsv_exportmust not be inextra_tools
The actual state is that agent.toml contains only ["file_write"], and CsvExportTool is still globally registered in all_tools_with_runtime (ops.rs:78). One of these tests will fail. The comment in builtin_definitions.rs explains the removal rationale (superfluous export calls), but this contradicts the assertion in agents/mod.rs. Reconcile these tests to match the intended design.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/agents/skills_agent/agent.toml` around lines 13 - 18,
Conflicting tests assert opposite expectations about csv_export in extra_tools:
decide the intended behavior and make the tests and runtime registration
consistent—either (A) if csv_export should be disabled by default (preferred per
the builtin_definitions.rs comment), remove CsvExportTool from
all_tools_with_runtime (ops.rs:78) and update the assertion in agents::mod.rs
that currently requires csv_export to assert it is absent, or (B) if csv_export
must be available, add "csv_export" to extra_tools in agent.toml and update the
builtin_definitions.rs test (and its explanatory comment) to allow it; ensure
the unique symbols CsvExportTool, extra_tools, and all_tools_with_runtime are
updated together so both tests agree with the chosen design.
| ``` | ||
| file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted markdown>) | ||
| ``` |
There was a problem hiding this comment.
Minor: add a language hint to the fenced code block.
markdownlint MD040 flags this fence. Tagging it as text (or rust-adjacent pseudocode) keeps renderers happy and avoids noisy lint in CI.
📝 Proposed diff
- ```
+ ```text
file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted markdown>)
- ```
+ ```🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 48-48: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/agents/skills_agent/prompt.md` around lines 48 - 50, The
fenced code block containing the file_write snippet in prompt.md lacks a
language hint; update the triple-backtick fence around
file_write(path="exports/slack-thread-general-2026-04-16.md", content=<formatted
markdown>) to include a language tag (e.g., change ``` to ```text) so
markdownlint MD040 is satisfied and renderers/lints treat it as plain text.
| ## Original size | ||
| <original_bytes> bytes → summary of <this note> | ||
| ``` |
There was a problem hiding this comment.
Don't require an exact byte count the runtime never passes in.
The prompt asks for ## Original size with <original_bytes>, but this agent only receives tool name, optional hint, and raw output. That forces the model to guess or burn tokens estimating size. Either inject the exact byte count from Rust or make this section approximate/optional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/agents/summarizer/prompt.md` around lines 50 - 52, The
prompt currently requires an exact byte count under the header "## Original
size" using the placeholder "<original_bytes>", which the runtime does not
supply; change the prompt text so it does not demand an exact byte count — for
example replace the placeholder and heading with optional/approximate wording
like "## Original size (if known)" or "Approximate original size" and allow
"unknown" or "approx." as valid values; update the template lines containing "##
Original size" and "<original_bytes>" to reflect this optional/approximate
wording so the model won't try to guess exact bytes, or alternatively add a
clear note that the runtime may inject the exact byte count if a future Rust
change provides it.
| // Bus path runs ad-hoc agent turns without an Agent | ||
| // handle, so we pass None — payload summarization is | ||
| // wired into the orchestrator session via Agent::turn, | ||
| // not the bus dispatcher. | ||
| None, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether orchestrator tool results from channel turns are expected to be summarized
rg -nP -C3 'maybe_summarize|PayloadSummarizer'
rg -nP -C2 'AGENT_RUN_TURN_METHOD|agent\.run_turn'Repository: tinyhumansai/openhuman
Length of output: 44901
🏁 Script executed:
# Get the complete bus handler implementation to see what's passed to run_tool_call_loop
sed -n '138,211p' src/openhuman/agent/bus.rsRepository: tinyhumansai/openhuman
Length of output: 2799
🏁 Script executed:
# Search for how orchestrator turns are executed - look for Agent::turn or agent.run_turn usage with orchestrator
rg -nP 'orchestrator.*turn|Agent::turn' --type rs -A2 -B2 | head -100Repository: tinyhumansai/openhuman
Length of output: 92
🏁 Script executed:
# Search for how orchestrator turns are executed
rg -nP 'orchestrator.*turn|Agent::turn' -A2 -B2 | head -150Repository: tinyhumansai/openhuman
Length of output: 8598
🏁 Script executed:
# Search for where orchestrator agent is invoked or routed to
rg -nP 'target_agent_id.*orchestrator|agent_id.*orchestrator' -B3 -A3 | head -150Repository: tinyhumansai/openhuman
Length of output: 6032
🏁 Script executed:
# Search for whether target_agent_id="orchestrator" is ever passed to the bus handler
rg -nP 'target_agent_id.*=.*"orchestrator"|"orchestrator".*target_agent_id' -B3 -A3Repository: tinyhumansai/openhuman
Length of output: 48
🏁 Script executed:
# Search for how target_agent_id is populated and used in dispatch
rg -nP 'target_agent_id' src/openhuman/channels/runtime/dispatch.rs -B2 -A2 | head -100Repository: tinyhumansai/openhuman
Length of output: 1462
🏁 Script executed:
# Find where target_id is determined (the code that sets target_agent_id)
rg -nP 'target_id\s*=' src/openhuman/channels/runtime/dispatch.rs -B5 -A5Repository: tinyhumansai/openhuman
Length of output: 509
The scope gap is real: orchestrator turns routed through channels skip payload summarization.
The bus handler passes None for payload_summarizer regardless of whether target_agent_id is "orchestrator". Since channels route orchestrator turns through agent.run_turn when onboarding is complete (src/openhuman/channels/runtime/dispatch.rs:355-356), oversized tool results from orchestrator turns initiated via channels are not summarized—they bypass the PayloadSummarizer logic that is only wired at the AgentBuilder level in session context. This is not intentional channel-specific compaction; it's a gap where orchestrator's summarization machinery is disabled depending on dispatch path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/bus.rs` around lines 194 - 198, The bus handler currently
passes None for payload_summarizer which skips payload summarization for
orchestrator turns routed via channels; change the logic in the bus dispatch
where payload_summarizer is passed (look for the None argument alongside
target_agent_id and Agent::turn/agent.run_turn invocations) so that when
target_agent_id == "orchestrator" you supply the session's/AgentBuilder's
PayloadSummarizer instance instead of None (i.e., forward the same
payload_summarizer used by the AgentBuilder/session context so orchestrator
turns take the same summarization path).
| if !definition.extra_tools.is_empty() { | ||
| let disallow_set: std::collections::HashSet<&str> = definition | ||
| .disallowed_tools | ||
| .iter() | ||
| .map(|s| s.as_str()) | ||
| .collect(); | ||
| for (i, tool) in parent.all_tools.iter().enumerate() { | ||
| let name = tool.name(); | ||
| if definition.extra_tools.iter().any(|n| n == name) | ||
| && !allowed_indices.contains(&i) | ||
| && !disallow_set.contains(name) | ||
| { | ||
| allowed_indices.push(i); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find every definition that sets extra_tools together with a Named ToolScope
# to gauge whether this divergence is reachable today.
rg -nP -C6 'extra_tools\s*='
rg -nP -C3 'ToolScope::Named'Repository: tinyhumansai/openhuman
Length of output: 18041
extra_tools ignores ToolScope restrictions — diverges from definition contract.
The loop force-pushes every name in definition.extra_tools into allowed_indices as long as it isn't in disallowed_tools, but does not consult definition.tools (ToolScope). The field's docstring in src/openhuman/agent/harness/definition.rs:126-132 explicitly states extra_tools "bypass the category_filter check but are still subject to disallowed_tools and ToolScope restrictions."
Currently latent: skills_agent (the only agent declaring extra_tools in TOML) uses ToolScope::Wildcard, so no scope breach occurs today. However, if a future agent declares ToolScope::Named([...]) + extra_tools = [...], this code will incorrectly leak tools the scope intended to exclude. Either enforce scope here, or update the field doc to say scope is also bypassed.
🛠️ Proposed fix (enforce scope)
if !definition.extra_tools.is_empty() {
let disallow_set: std::collections::HashSet<&str> = definition
.disallowed_tools
.iter()
.map(|s| s.as_str())
.collect();
for (i, tool) in parent.all_tools.iter().enumerate() {
let name = tool.name();
- if definition.extra_tools.iter().any(|n| n == name)
- && !allowed_indices.contains(&i)
- && !disallow_set.contains(name)
- {
- allowed_indices.push(i);
- }
+ if !definition.extra_tools.iter().any(|n| n == name)
+ || allowed_indices.contains(&i)
+ || disallow_set.contains(name)
+ {
+ continue;
+ }
+ let scope_ok = match &definition.tools {
+ ToolScope::Wildcard => true,
+ ToolScope::Named(allowed) => allowed.iter().any(|n| n == name),
+ };
+ if scope_ok {
+ allowed_indices.push(i);
+ }
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 246 - 261, The
loop that adds definition.extra_tools into allowed_indices ignores
definition.tools (ToolScope) and thus can leak tools; modify the logic in
subagent_runner.rs so each extra_tools entry is also vetted against
definition.tools: when definition.tools is ToolScope::Named([...]) only allow
the tool if its name is listed there; when ToolScope::Wildcard allow as before;
still enforce disallowed_tools via disallow_set and existing allowed_indices
checks; reference definition.extra_tools, definition.tools (ToolScope),
definition.disallowed_tools, parent.all_tools, allowed_indices and tool.name()
to locate and update the condition.
…ed extraction Two optimisations to the oversize-handoff / extract_from_result pipeline: 1. Concat chunk summaries in order; drop reduce stage. The reduce-stage summarizer call was the slowest single turn of the pipeline and a single point of failure when the upstream provider hung — a hung reduce burned minutes before timing out. Map summaries are now sorted by original chunk index and concatenated with a plain-separator join. For listing/extraction queries this is equivalent; for top-N / global-ordering queries the caller can post-process. Observed: 8m24s → ~70s on a Notion SEARCH run. 2. Generic cleaner before the oversize check. Strips <script>/<style>/<svg> blocks, HTML comments, data:...;base64,... inline URIs, remaining HTML tags, and collapses whitespace. Applied on every tool output so results that are mostly markup drop below the 20k-token threshold and skip the extract pipeline entirely; the ones that remain oversized get chunked on clean bytes. Debug log emits before/after bytes + saved_pct per call. No changes to the extract_from_result tool surface or placeholder format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The client-side safety timer in Conversations.tsx was a flat 120s
deadline from the send, not a silence window. Long agent turns
(chained tool calls, chunked extraction fan-outs, subagent spawns)
would trip the timeout even though the backend was actively making
progress — showing "No response from the assistant after 2 minutes"
while the server was still working.
The effect now watches inferenceStatusByThread and rearms the timer
on every status change for the sending thread. Any tool_call /
tool_result / iteration_start / subagent_{spawned,done} event resets
the 120s window. When status is cleared (chat_done / chat_error),
the timer is dropped. A truly silent server still fails after 120s
as before.
Tracks the sending thread id in a ref (sendingThreadIdRef) so
switching threads mid-turn doesn't move the timer's reference point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
app/src/pages/Conversations.tsx (1)
300-316: Add[chat]debug logs for arm/rearm/clear transitions.Only the timeout-fire path currently logs. Given this is a new reliability control with three distinct state transitions (initial arm, rearm-on-progress, clear-on-completion), traceable logs would make staging/production incident triage much easier — especially for the rearm path, which is driven by a Redux selector and can be hard to attribute after the fact.
🪵 Proposed logs
const armSilenceTimer = (threadId: string) => { - if (sendingTimeoutRef.current) clearTimeout(sendingTimeoutRef.current); + if (sendingTimeoutRef.current) { + clearTimeout(sendingTimeoutRef.current); + console.debug('[chat] silence-timer rearm', { threadId }); + } else { + console.debug('[chat] silence-timer arm', { threadId }); + } sendingThreadIdRef.current = threadId; sendingTimeoutRef.current = setTimeout(() => { @@ useEffect(() => { const threadId = sendingThreadIdRef.current; if (!threadId || !sendingTimeoutRef.current) return; const status = inferenceStatusByThread[threadId]; if (status === undefined) { + console.debug('[chat] silence-timer clear (status cleared)', { threadId }); clearTimeout(sendingTimeoutRef.current); sendingTimeoutRef.current = null; sendingThreadIdRef.current = null; return; } armSilenceTimer(threadId);As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows … log critical checkpoints including … state transitions, and error handling paths" and "Use grep-friendly log prefixes".
Also applies to: 323-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/pages/Conversations.tsx` around lines 300 - 316, Add grep-friendly "[chat]" debug logs tracing the three silence-timer state transitions: when arming (in armSilenceTimer right after setting sendingTimeoutRef.current and sendingThreadIdRef.current), when rearming (wherever the selector-driven rearm logic updates sendingTimeoutRef or calls armSilenceTimer again — log previous threadId, new threadId and timeout reset), and when clearing (where you call dispatch(clearRuntimeForThread and dispatch(setActiveThread) or clearTimeout — log that the timer was cleared and which thread was cleared). Ensure logs appear before/after the state changes and include sendingThreadIdRef.current, whether a previous timeout was cleared, and a concise message like "[chat] arm", "[chat] rearm", "[chat] clear" to aid grep-based triage.src/openhuman/agent/harness/subagent_runner.rs (2)
789-816: Text-mode override silently no-ops if no system message exists.
history.iter_mut().find(|m| m.role == "system")is best-effort: if an upstream caller ever invokesrun_inner_loopwith a history that lacks a system message (e.g. a future refactor, or a test harness), the XML tool protocol + catalogue never get injected, the model has no call format to follow, and the subsequentparse_tool_callspass returns zero calls — the loop then terminates on iteration 1 with whatever prose the model guessed. Bothrun_typed_modeandrun_fork_modecurrently guarantee a system message at index 0, so this is latent, but adebug_assert!or an explicit error here would make the invariant explicit and catch regressions early.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 789 - 816, The text-mode injection silently no-ops if no system message exists; update the force_text_mode path (around force_text_mode and the history.iter_mut().find(|m| m.role == "system") usage) to assert or fail fast when a system message is missing: add a debug_assert! that history contains a message with role "system" (or return an Err / tracing::error and bail from run_inner_loop) before calling build_text_mode_tool_instructions so the XML protocol is never silently omitted; reference run_inner_loop / build_text_mode_tool_instructions / parse_tool_calls in the check to make the invariant explicit and catch regressions early.
966-1019: Fragilestarts_with("Error")used for both skip-handoff and status classification.Lines 968 and 1008 both rely on the cleaned text starting with
"Error"to decide (a) whether to bypass the handoff cache and (b) whatstatusto emit on the<tool_result>tag. A legitimate tool output whose first characters happen to be"Error"(e.g. a docs lookup about HTTP error codes, or a Composio action that returned a summary prefixed with "Error budget remaining…") will both skip the handoff (landing a potentially multi‑MB payload inline and risking the context-overflow this PR is meant to prevent) and get mislabelledstatus="error"to the model.The error path in the tool dispatch above already has precise signals —
result.is_error, theErr/timeoutarms, and the explicit"not available"string — so a dedicatedis_error: boolaccumulated alongsideresult_textwould be more robust than string-sniffing the final text.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 966 - 1019, The code is using fragile string-sniffing (result_text.starts_with("Error")) to decide handoff bypass and tool_result status; instead, propagate a boolean error flag (e.g., is_error from the tool dispatch — result.is_error / Err / timeout branches) together with result_text and use that flag in both the handoff logic (skip_cleaning) and when computing status for force_text_mode, replacing both starts_with checks; update the code paths that set result_text (where result.is_error is available) to also set is_error, then reference is_error in the skip_cleaning condition (alongside call.name checks) and when setting status for text_mode_result_block so oversized outputs aren't misclassified or inlined due to incidental text content.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 1069-1090: The single-shot extraction path (extract_single_shot)
is never taken because HANDOFF_OVERSIZE_THRESHOLD_TOKENS (20_000 ≈ 80k chars)
makes cached entries larger than SUMMARIZER_CHUNK_CHAR_BUDGET (60_000), so
reduce the mismatch by either lowering HANDOFF_OVERSIZE_THRESHOLD_TOKENS to
~15_000 (≈60k chars) or raising SUMMARIZER_CHUNK_CHAR_BUDGET above the 80k floor
so cached.content.len() can be <= budget, and add a unit/integration test that
constructs a cached entry around the threshold to assert extract_single_shot is
invoked; update references to HANDOFF_OVERSIZE_THRESHOLD_TOKENS,
SUMMARIZER_CHUNK_CHAR_BUDGET, and extract_single_shot accordingly.
---
Nitpick comments:
In `@app/src/pages/Conversations.tsx`:
- Around line 300-316: Add grep-friendly "[chat]" debug logs tracing the three
silence-timer state transitions: when arming (in armSilenceTimer right after
setting sendingTimeoutRef.current and sendingThreadIdRef.current), when rearming
(wherever the selector-driven rearm logic updates sendingTimeoutRef or calls
armSilenceTimer again — log previous threadId, new threadId and timeout reset),
and when clearing (where you call dispatch(clearRuntimeForThread and
dispatch(setActiveThread) or clearTimeout — log that the timer was cleared and
which thread was cleared). Ensure logs appear before/after the state changes and
include sendingThreadIdRef.current, whether a previous timeout was cleared, and
a concise message like "[chat] arm", "[chat] rearm", "[chat] clear" to aid
grep-based triage.
In `@src/openhuman/agent/harness/subagent_runner.rs`:
- Around line 789-816: The text-mode injection silently no-ops if no system
message exists; update the force_text_mode path (around force_text_mode and the
history.iter_mut().find(|m| m.role == "system") usage) to assert or fail fast
when a system message is missing: add a debug_assert! that history contains a
message with role "system" (or return an Err / tracing::error and bail from
run_inner_loop) before calling build_text_mode_tool_instructions so the XML
protocol is never silently omitted; reference run_inner_loop /
build_text_mode_tool_instructions / parse_tool_calls in the check to make the
invariant explicit and catch regressions early.
- Around line 966-1019: The code is using fragile string-sniffing
(result_text.starts_with("Error")) to decide handoff bypass and tool_result
status; instead, propagate a boolean error flag (e.g., is_error from the tool
dispatch — result.is_error / Err / timeout branches) together with result_text
and use that flag in both the handoff logic (skip_cleaning) and when computing
status for force_text_mode, replacing both starts_with checks; update the code
paths that set result_text (where result.is_error is available) to also set
is_error, then reference is_error in the skip_cleaning condition (alongside
call.name checks) and when setting status for text_mode_result_block so
oversized outputs aren't misclassified or inlined due to incidental text
content.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7a6ccf19-887f-4b07-aff7-cb36dcbc1aea
📒 Files selected for processing (2)
app/src/pages/Conversations.tsxsrc/openhuman/agent/harness/subagent_runner.rs
| /// Token threshold above which a tool result is routed to the handoff | ||
| /// cache instead of being pushed into history raw. 20 000 tokens keeps | ||
| /// the sub-agent's per-iteration prompt well below the 196 607-token | ||
| /// model ceiling even after many iterations, and leaves comfortable | ||
| /// headroom for the system prompt + tool catalogue. Token count is | ||
| /// estimated at ~4 chars/token (mirrors | ||
| /// [`crate::openhuman::agent::harness::payload_summarizer`] and | ||
| /// [`crate::openhuman::tree_summarizer::types::estimate_tokens`]). | ||
| const HANDOFF_OVERSIZE_THRESHOLD_TOKENS: usize = 20_000; | ||
|
|
||
| /// Characters of the raw payload to surface in the placeholder preview. | ||
| /// Enough for the sub-agent to recognise the shape (JSON keys, first | ||
| /// record) and often small enough to answer trivial questions without a | ||
| /// follow-up `extract_from_result` call. | ||
| const HANDOFF_PREVIEW_CHARS: usize = 1500; | ||
|
|
||
| /// Maximum entries per session. Bounded to keep memory use predictable | ||
| /// on long-running sub-agents that might call many large tools. When | ||
| /// over capacity we evict the oldest entry (FIFO); callers see "no | ||
| /// cached result" for evicted ids and can either re-run the tool or | ||
| /// ask the user/orchestrator to narrow the request. | ||
| const HANDOFF_MAX_ENTRIES: usize = 8; |
There was a problem hiding this comment.
Single-shot extraction path is unreachable with current thresholds.
HANDOFF_OVERSIZE_THRESHOLD_TOKENS = 20_000at ~4 chars/token means the cache only stores payloads whencleaned.len() ≥ 80_001bytes (line 985‑987).SUMMARIZER_CHUNK_CHAR_BUDGET = 60_000bytes gates the fast path at line 1249 (cached.content.len() <= 60_000).
Every cached entry is therefore strictly larger than the single-shot budget, so extract_single_shot is dead code in production. Functionally harmless (the chunk path with vec![content] would produce equivalent output for a 1-chunk split), but you lose the stated wall-clock optimization and the extra scaffolding runs for no reason.
Consider either lowering HANDOFF_OVERSIZE_THRESHOLD_TOKENS to ~15_000 (≈60k bytes) so small-ish oversized payloads can take the fast path, or raising SUMMARIZER_CHUNK_CHAR_BUDGET above the 80k-byte floor, or adding a test that pins the single-shot path to catch future drift.
Also applies to: 1249-1258, 1411-1411
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/harness/subagent_runner.rs` around lines 1069 - 1090, The
single-shot extraction path (extract_single_shot) is never taken because
HANDOFF_OVERSIZE_THRESHOLD_TOKENS (20_000 ≈ 80k chars) makes cached entries
larger than SUMMARIZER_CHUNK_CHAR_BUDGET (60_000), so reduce the mismatch by
either lowering HANDOFF_OVERSIZE_THRESHOLD_TOKENS to ~15_000 (≈60k chars) or
raising SUMMARIZER_CHUNK_CHAR_BUDGET above the 80k floor so cached.content.len()
can be <= budget, and add a unit/integration test that constructs a cached entry
around the threshold to assert extract_single_shot is invoked; update references
to HANDOFF_OVERSIZE_THRESHOLD_TOKENS, SUMMARIZER_CHUNK_CHAR_BUDGET, and
extract_single_shot accordingly.
…-subagent # Conflicts: # src/openhuman/agent/agents/mod.rs
Summary
skills_agent): payloads over 20 000 tokens are stashed in a per-spawn cache and replaced in history with a short placeholder +extract_from_result(result_id, query)tool. Chunked map-reduce extraction with bounded concurrency (buffer_unordered(3)).skills_agentso large Composio toolkits (Gmail, Notion, GitHub, Salesforce, HubSpot, Google Workspace) survive the provider's 65 535-grammar-rule ceiling.top_ksplit: 12 for heavy-schema toolkits, 25 default.csv_exportremoved fromskills_agentextra_tools(keptfile_write); it was triggering superfluous export calls after extraction had already answered.Problem
Two distinct failure modes were observed in staging once Composio toolkits were exercised end-to-end:
tool_result_budget_bytes, hard-truncated mid-payload and lost anything past the cut — no identifiers preserved, no structure.skills_agentspawn. When the orchestrator delegated work toskills_agentover a heavy toolkit, the combined per-action JSON schemas intools: [...]exceeded Fireworks'uint16_tgrammar-rule cap (65 535) and the provider returned 400 before any generation ran. Fuzzy filtering the toolkit down didn't help — even 16 Notion actions produced 101 295 rules.Solution
Summarizer sub-agent (
payload_summarizer.rs, orchestrator-scoped)Dedicated
summarizersub-agent (model.hint = "summarization") automatically dispatched when a tool result exceedssummarizer_payload_threshold_tokens. Compresses per an extraction contract that preserves identifiers; compressed summary replaces the raw payload. Session-scoped circuit breaker (3 consecutive failures disables for the session). Scoped to orchestrator only — welcome / researcher / planner etc. receiveNone. Runtime-dispatched only;orchestrator_tools::collect_orchestrator_toolsfilters the summarizer id so the orchestrator's LLM never sees adelegate_summarizertool.Progressive-disclosure handoff (
subagent_runner.rs,skills_agent-scoped)When
skills_agentis spawned with a Composio toolkit, a per-spawnResultHandoffCache+ newExtractFromResultToolare wired in. Any tool result overHANDOFF_OVERSIZE_THRESHOLD_TOKENS(20 000) is stashed and replaced in history with a placeholder:Three options for the sub-agent:
extract_from_result(result_id, query)— dispatches the summarizer against the cached payload withqueryas the extraction contract. For content overSUMMARIZER_CHUNK_CHAR_BUDGET(60 000 chars) runs a chunked map-reduce: split at blank-line / newline boundaries, map each chunk to a per-chunk partial with bounded concurrency (buffer_unordered(3)— unboundedjoin_allwas storming the gateway with 504s), then reduce partials in one more summarizer turn. Falls back to concatenation if the reduce stage fails.Cache is bounded at 8 entries (FIFO eviction) per sub-agent spawn.
Text-mode dispatcher for
skills_agentrun_inner_loopnow detectsskills_agentand omitstools: [...]from the API request, instead appending an<tool_call>{...}</tool_call>XML protocol block with compact per-tool parameter summaries into the system prompt. Tool calls are parsed from the model's free-form response via the sharedparse_tool_callshelper. MirrorsXmlToolDispatcherbehaviour. Fireworks never compiles a grammar from the tool list, so the rule-count ceiling stops mattering.Per-toolkit top-K
HEAVY_SCHEMA_TOOLKITS = [gmail, notion, github, salesforce, hubspot, googledrive, googlesheets, googledocs, microsoftteams]→top_k = 12. All other toolkits →top_k = 25(existing default).csv_exportremoved fromskills_agent.extra_toolsThe export tool was triggering superfluous file writes after extraction had already produced a complete answer.
file_writealone covers the Path B export case. The tool implementation remains for anyone who needs to re-enable it.Submission Checklist
cargo testgreen acrosspayload_summarizer::tests(6),tool_loop::tests(10, includingrun_tool_call_loop_intercepts_oversized_tool_results_via_summarizer),subagent_runner::tests(19),builtin_definitions::tests(5),csv_export::tests(7).skills_agentreturns from preview in ~14 s on the fast path. No automated E2E spec added in this PR.payload_summarizer.rs; item-level docs onResultHandoffCache,ExtractFromResultTool,build_text_mode_tool_instructions,summarise_parameters,chunk_content,top_k_for_toolkit, andbuild_handoff_placeholder.[subagent_runner:text-mode],[subagent_runner:handoff],[extract_from_result],[payload_summarizer],[agent::builder]) at every decision point, with structured fields (raw_tokens,raw_bytes,chunk_count,chunk_budget,result_id,threshold_tokens,top_k).Impact
skills_agentrouting swaps to text mode when toolkit is present — slight reliability trade vs. native structured tool calls, offset by being able to run at all on large toolkits (Notion was returning 400 before).extract_from_result; trivial queries answerable from the preview pay zero extra. Bounded parallelism (3) trades some wall-clock time for reliability under gateway limits.SecurityPolicy. Handoff cache is per-spawn (not global) and lives only for the duration ofrun_typed_mode.#[serde(alias = "summarizer_payload_threshold_bytes")]/"summarizer_max_payload_bytes"keep existingconfig.tomlfiles parsing cleanly through the rename.run_inner_loopupdated to passNonefor the newhandoff_cacheparameter — zero existing tests broke.Related
model.hint = "summarization"and trusts model routing to resolve the hint; Wire new summarizer model into routing and agent architecture #575 is out of scope here.skills_agentextract-loop discipline: the model occasionally re-callsextract_from_resultmultiple times despite a clear first answer. Needs a stronger in-prompt stopping rule or a dedupe at the tool layer.file_readat 50 KB, heavier Composio actions at a higher cap).🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Configuration