feat(agent): codify chat → reasoning → worker spawn hierarchy #2026
Introduces an `AgentTier` ({Chat, Reasoning, Worker}, default Worker)
field on AgentDefinition and a loader-time `validate_tier_hierarchy`
check so the spawn surface mirrors the cost/latency split between
models:
* chat (fast UX, e.g. orchestrator) → reasoning OR worker, never chat
* reasoning (deep thinking, e.g. planner) → worker, never reasoning
* worker (leaf executors) → nothing in `subagents`
Tags `orchestrator = chat` and `planner = reasoning`; all other
built-ins inherit the worker default. Skill-wildcard entries are
exempt because they collapse to a single `delegate_to_integrations_agent`
tool aimed at a worker.
Adds matching prompt-level rules to orchestrator/prompt.md and
planner/prompt.md, and a new "Spawn hierarchy and tiers" section in
gitbooks/developing/architecture/agent-harness.md.
Registry::load() re-validates after merging workspace TOML overrides
so custom user agents are held to the same contract.
Runtime depth gate (MAX_SPAWN_DEPTH = 3 task-local) is referenced in
the doc and prompts as defence-in-depth but is deferred to a follow-up.
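
The tier rules above can be sketched in a few lines of Rust. This is a minimal illustration, not the code from `agents/loader.rs`: the `SubagentEntry` shape, the `may_spawn` helper, the `sample_registry` fixture, and the `String` error type are all assumptions made for the example. It uses the allow-list from the bullets above (chat → reasoning or worker; reasoning → worker), which also rejects reasoning → chat; whether the real validator rejects that edge is not stated here.

```rust
use std::collections::BTreeMap;

/// Cost/latency tier of an agent. `Worker` is the default, so untagged
/// definitions stay leaf executors.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AgentTier {
    Chat,
    Reasoning,
    #[default]
    Worker,
}

/// Simplified stand-in for one entry in `subagents` (hypothetical shape).
#[derive(Debug, Clone)]
pub enum SubagentEntry {
    /// Direct reference to another agent by id.
    Agent(String),
    /// Skill wildcard; exempt from the tier comparison.
    SkillWildcard,
}

/// Simplified stand-in for the real `AgentDefinition`.
#[derive(Debug, Clone, Default)]
pub struct AgentDefinition {
    pub agent_tier: AgentTier,
    pub subagents: Vec<SubagentEntry>,
}

/// Returns true when `parent` may list `child` as a direct subagent.
fn may_spawn(parent: AgentTier, child: AgentTier) -> bool {
    matches!(
        (parent, child),
        (AgentTier::Chat, AgentTier::Reasoning)
            | (AgentTier::Chat, AgentTier::Worker)
            | (AgentTier::Reasoning, AgentTier::Worker)
    )
}

/// Walks every `subagents` list and returns the first violation found.
pub fn validate_tier_hierarchy(defs: &BTreeMap<String, AgentDefinition>) -> Result<(), String> {
    for (id, def) in defs {
        // Workers are leaves: any entry at all, wildcard included, is an error.
        if def.agent_tier == AgentTier::Worker && !def.subagents.is_empty() {
            return Err(format!("worker agent `{id}` must not list subagents"));
        }
        for entry in &def.subagents {
            // Skill wildcards are exempt on non-worker tiers.
            let SubagentEntry::Agent(child_id) = entry else { continue };
            // Unknown ids are left to other validation in this sketch.
            let Some(child) = defs.get(child_id) else { continue };
            if !may_spawn(def.agent_tier, child.agent_tier) {
                return Err(format!(
                    "`{id}` ({:?}) must not spawn `{child_id}` ({:?})",
                    def.agent_tier, child.agent_tier
                ));
            }
        }
    }
    Ok(())
}

/// Helper for the fixture below: an agent with direct subagent references.
fn agent(tier: AgentTier, subs: &[&str]) -> AgentDefinition {
    AgentDefinition {
        agent_tier: tier,
        subagents: subs.iter().map(|s| SubagentEntry::Agent(s.to_string())).collect(),
    }
}

/// Builds a small registry shaped like the built-ins described above.
pub fn sample_registry() -> BTreeMap<String, AgentDefinition> {
    let mut defs = BTreeMap::new();
    defs.insert("orchestrator".to_string(), agent(AgentTier::Chat, &["planner", "researcher"]));
    defs.insert("planner".to_string(), agent(AgentTier::Reasoning, &["researcher"]));
    defs.insert("researcher".to_string(), AgentDefinition::default());
    defs
}
```

Calling `validate_tier_hierarchy(&sample_registry())` succeeds. Adding `"planner"` to the planner's own `subagents`, or any entry to `researcher`, makes it return an error. Running the same function again after merging workspace overrides into the map is one way to hold custom agents to the same contract.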
📝 Walkthrough
This PR adds tier-based spawn-hierarchy validation to the agent harness. Built-in agents (orchestrator=Chat, planner=Reasoning) are assigned tiers that enforce static delegation rules (Chat→Reasoning/Worker, Reasoning→Worker, Worker is leaf). A new validator runs at loader and registry build time, while test fixtures are updated to include the required
Changes
Agent Tier Hierarchy and Validation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/openhuman/agent/agents/planner/prompt.md (1)
43-43: 💤 Low value
Clarify the tier prohibition phrasing.
The parenthetical examples group "no planner-spawns-planner" (reasoning→reasoning) with "no planner-spawns-orchestrator" (reasoning→chat) under the label "Never delegate to another reasoning agent", which incorrectly implies the orchestrator is a reasoning-tier agent. The orchestrator is chat tier (per `orchestrator/agent.toml` line 13).
The rule is correct (reasoning tier can only spawn worker tier), but the phrasing could be clearer.
Suggested rephrasing for clarity
-**You are the reasoning tier.** The chat-tier Orchestrator handed off to you because the task needs sustained thinking. Compose plans for the **worker tier** — `code_executor`, `researcher`, `critic`, `integrations_agent`, `archivist`. **Never delegate to another reasoning agent** (no planner-spawns-planner, no planner-spawns-orchestrator); the loader and the harness depth gate will reject it. If a single worker can't cover a node, split the node — don't smuggle a second reasoning hop in.
+**You are the reasoning tier.** The chat-tier Orchestrator handed off to you because the task needs sustained thinking. Compose plans for the **worker tier** — `code_executor`, `researcher`, `critic`, `integrations_agent`, `archivist`. **Never delegate to chat or reasoning tiers** (no planner-spawns-planner, no planner-spawns-orchestrator); the loader and the harness depth gate will reject it. If a single worker can't cover a node, split the node — don't smuggle a second reasoning hop in.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/agents/planner/prompt.md` at line 43, The phrasing incorrectly groups "no planner-spawns-orchestrator" with reasoning→reasoning examples and may imply the orchestrator is a reasoning-tier agent; update the sentence in the "You are the reasoning tier." paragraph to state clearly that the reasoning tier may only spawn worker-tier agents (code_executor, researcher, critic, integrations_agent, archivist) and not other reasoning or chat-tier agents, and remove or reword the parenthetical so it does not list "orchestrator" as an example of a reasoning agent (refer to the symbol orchestrator and the worker names to locate the text to edit).
gitbooks/developing/architecture/agent-harness.md (1)
199-204: ⚡ Quick win
Clarify implementation status of runtime depth gate.
The description on line 202 uses present tense ("caps total spawn chain depth") but the status note on line 204 indicates this is "sketched" rather than live. Consider rewording the runtime enforcement description to make it clear this is a planned safeguard, not yet active.
✏️ Suggested clarification
-2. **Runtime depth gate (dynamic).** Independent of tier, the sub-agent runner caps total spawn chain depth at `MAX_SPAWN_DEPTH = 3` via a task-local counter incremented across `run_subagent`. A user-shipped TOML that drops the tier annotation still can't recurse past three hops. The harness surfaces this as the `SpawnDepthExceeded` agent error.
+2. **Runtime depth gate (dynamic, planned).** Independent of tier, the sub-agent runner will cap total spawn chain depth at `MAX_SPAWN_DEPTH = 3` via a task-local counter incremented across `run_subagent`. A user-shipped TOML that drops the tier annotation will not be able to recurse past three hops. The harness will surface this as the `SpawnDepthExceeded` agent error.
Alternatively, if the runtime gate is partially implemented, clarify which parts are live vs. sketched.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gitbooks/developing/architecture/agent-harness.md` around lines 199 - 204, The runtime depth-gate description uses present-tense but the status note says it's only sketched; update the wording to clearly mark the runtime enforcement as planned/partial or describe which pieces are implemented vs. sketched: mention MAX_SPAWN_DEPTH, the task-local counter sketch in harness/fork_context.rs, and the gating in subagent_runner::run_subagent as not-yet-fully-active (or list which of those are already implemented), while keeping the loader-time enforcement (agents::loader::validate_tier_hierarchy and the agent_tier field) identified as live and keep SpawnDepthExceeded referenced as the intended surfaced error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gitbooks/developing/architecture/agent-harness.md`:
- Around line 174-182: The fenced ASCII diagram block beginning with "Chat
(fast, UX-focused — e.g. orchestrator on `chat` hint)" is missing a language
specifier; update the opening triple backticks to include a language (e.g.,
```text) so the diagram renders correctly, leaving the block contents unchanged.
In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Line 43: The text in the spawn hierarchy section asserts "Total chain depth is
capped at 3 hops by the harness" but the runtime enforcement is described
elsewhere as a planned gate (MAX_SPAWN_DEPTH = 3), so update the sentence in
prompt.md (the "Spawn hierarchy (hard rule)" line) to reflect that enforcement
is not yet live: either change to "will be capped at 3 hops" or append a
parenthetical note like "(enforcement tracked in `#XXXX` / planned via
MAX_SPAWN_DEPTH = 3)"; ensure references to MAX_SPAWN_DEPTH remain consistent
and add the issue/PR number if available.
📒 Files selected for processing (14)
- gitbooks/developing/architecture/agent-harness.md
- src/openhuman/agent/agents/loader.rs
- src/openhuman/agent/agents/mod.rs
- src/openhuman/agent/agents/orchestrator/agent.toml
- src/openhuman/agent/agents/orchestrator/prompt.md
- src/openhuman/agent/agents/planner/agent.toml
- src/openhuman/agent/agents/planner/prompt.md
- src/openhuman/agent/harness/builtin_definitions.rs
- src/openhuman/agent/harness/definition.rs
- src/openhuman/agent/harness/definition_tests.rs
- src/openhuman/agent/harness/payload_summarizer.rs
- src/openhuman/agent/harness/subagent_runner/ops_tests.rs
- src/openhuman/channels/runtime/dispatch.rs
- src/openhuman/tools/orchestrator_tools.rs
Address CodeRabbit suggestions on PR tinyhumansai#2026:
- arch-doc ASCII diagram: add `text` language tag to the fenced block (markdownlint MD040).
- orchestrator/prompt.md, planner/prompt.md, agent-harness.md: soften "Total chain depth is capped at 3 hops by the harness" to reflect that the runtime `MAX_SPAWN_DEPTH` task-local is a planned follow-up; only the loader-time tier check is live today.
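
Since the runtime gate is only planned, the following is a rough sketch of what a depth cap of this kind could look like, not a description of the follow-up PR. It threads the depth explicitly through a hypothetical `SpawnContext` value instead of a task-local, and `run_subagent` here is a stand-in that only recurses; the `depth` and `max` fields on the error variant are also assumptions.

```rust
/// Planned cap quoted in the PR text; not live in the change described here.
pub const MAX_SPAWN_DEPTH: usize = 3;

/// Stand-in error type with the variant name used in the PR text.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentError {
    SpawnDepthExceeded { depth: usize, max: usize },
}

/// Hypothetical per-chain context passed down to each spawned sub-agent.
#[derive(Debug, Clone, Copy, Default)]
pub struct SpawnContext {
    depth: usize,
}

impl SpawnContext {
    /// Returns the context for a child spawn, or an error once the cap is hit.
    pub fn descend(self) -> Result<SpawnContext, AgentError> {
        let next = self.depth + 1;
        if next > MAX_SPAWN_DEPTH {
            return Err(AgentError::SpawnDepthExceeded { depth: next, max: MAX_SPAWN_DEPTH });
        }
        Ok(SpawnContext { depth: next })
    }
}

/// Stand-in for `run_subagent`: performs one hop, then `remaining` nested hops,
/// and returns the depth reached by the innermost sub-agent.
pub fn run_subagent(ctx: SpawnContext, remaining: usize) -> Result<usize, AgentError> {
    let child = ctx.descend()?;
    if remaining == 0 {
        return Ok(child.depth);
    }
    run_subagent(child, remaining - 1)
}
```

With this sketch, `run_subagent(SpawnContext::default(), 2)` reaches depth 3 and succeeds, while one more nested hop returns `SpawnDepthExceeded`. The check is independent of `agent_tier`, which is what makes it defence-in-depth for definitions that slip past the loader-time validation.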
Summary
- `AgentTier` enum ({Chat, Reasoning, Worker}, default Worker) on `AgentDefinition`; tagged `orchestrator = chat` and `planner = reasoning`. Everything else inherits the worker default.
- `validate_tier_hierarchy()` enforces: chat must not spawn chat, reasoning must not spawn reasoning, worker must not list any subagents. Skill-wildcard entries are exempt (they collapse to one `delegate_to_integrations_agent` tool pointed at a worker).
- Matching prompt-level rules in `orchestrator/prompt.md` (hand off to reasoning tier for sustained thinking; never spawn chat) and `planner/prompt.md` (never delegate to another reasoning agent).
- New "Spawn hierarchy and tiers" section in `gitbooks/developing/architecture/agent-harness.md` with the ASCII diagram, tier table, and enforcement notes.
- `Registry::load()` re-validates after merging workspace TOML overrides so custom user agents are held to the same contract.
Problem
The newly-introduced chat tier (fast UX model on the `chat` hint, `reasoning-quick-v1` / Kimi K2.6 Turbo) is great for TTFT but weak at sustained multi-step reasoning. There was no codified rule that:
Without these rules, an over-eager router or a custom user TOML can produce `chat → chat → chat` chains or recursive reasoning loops that burn tokens and latency without buying capability.
Solution
Codify the hierarchy at three layers so the rules are visible to humans, the model, and the registry loader:
- `harness/definition.rs`: `AgentTier` enum + `agent_tier` field. Default `Worker` so existing specialists need no edits. Documented contract on the enum doc comment.
- `agents/loader.rs`: `validate_tier_hierarchy()` walks `subagents` lists and rejects same-tier and worker-with-subagents entries. Called from `load_builtins()` and from `AgentDefinitionRegistry::load()` after workspace overrides are merged.
- Tier tags: orchestrator → `chat`; planner → `reasoning`; all others inherit `worker`.
- Runtime depth gate (`MAX_SPAWN_DEPTH = 3`) is referenced as the planned defence-in-depth follow-up.
Tests: 7 new contract tests (`orchestrator_is_chat_tier`, `planner_is_reasoning_tier`, `other_builtins_default_to_worker_tier`, `rejects_chat_to_chat_delegation`, `rejects_reasoning_to_reasoning_delegation`, `rejects_worker_with_subagents`, `allows_skill_wildcards_on_any_non_worker_tier`). All 32 `agents::loader` tests pass.
Submission Checklist
- `loader.rs` (`validate_tier_hierarchy`) is covered by 7 new tests (chat→chat, reasoning→reasoning, worker-with-subagents, skill-wildcards-allowed, plus three tier-tag assertions). New `AgentTier` field is exercised in every loader test. No Vitest changes; Rust-only.
- `## Related`: no matrix entries touched.
- `Closes #NNN`: no issue tracker entry for this proactive refactor.
Impact
- Loader-time validation is `O(n_agents * n_subagents)` (trivial). No per-spawn overhead.
- Agent definitions without an `agent_tier` field default to `worker` (so they keep working unless they also declared subagents, which would now correctly fail).
- `MAX_SPAWN_DEPTH = 3` task-local gate + `SpawnDepthExceeded` error variant is referenced in the doc as defence-in-depth; deferred to a separate PR to keep this change focused.
Related
- Follow-up: runtime depth gate (task-local `SPAWN_DEPTH` in `harness/fork_context.rs`, gated in `subagent_runner::run_subagent`, new `AgentError::SpawnDepthExceeded` variant; see the gap noted in `harness_gap_tests.rs`).
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
- `cargo test --lib agents::loader` → 32/32 pass (including 7 new tier contract tests)
- `cargo fmt && cargo check` clean
Validation Blocked
Behavior Changes
Parity Contract
- `agent_tier` defaults to `worker`, every existing built-in keeps its current subagent surface, and all pre-existing tests still pass.
- Skill-wildcard entries resolve to the `integrations_agent` worker via a single delegation tool, not a recursive spawn.
Duplicate / Superseded PR Handling