feat(agent): codify chat → reasoning → worker spawn hierarchy #2026

Merged: senamakel merged 3 commits into tinyhumansai:main from senamakel:feat/agent-spawn-hierarchy-tiers on May 18, 2026

@senamakel senamakel commented May 18, 2026

Summary

  • New AgentTier enum ({Chat, Reasoning, Worker}, default Worker) on AgentDefinition; tagged orchestrator = chat and planner = reasoning. Everything else inherits the worker default.
  • Loader-time validate_tier_hierarchy() enforces: chat must not spawn chat, reasoning must not spawn reasoning, worker must not list any subagents. Skill-wildcard entries are exempt (they collapse to one delegate_to_integrations_agent tool pointed at a worker).
  • Prompt-level rules added to orchestrator/prompt.md (hand off to reasoning tier for sustained thinking; never spawn chat) and planner/prompt.md (never delegate to another reasoning agent).
  • New "Spawn hierarchy and tiers" section in gitbooks/developing/architecture/agent-harness.md with the ASCII diagram, tier table, and enforcement notes.
  • Registry::load() re-validates after merging workspace TOML overrides so custom user agents are held to the same contract.
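
For illustration, the tier tag from the first bullet might look roughly like the sketch below. The variant names and the `Worker` default come from this PR; the derives, the trimmed-down `AgentDefinition` fields, and the use of std `Default` (rather than whatever serde default the real code uses) are assumptions, not the actual contents of harness/definition.rs.

```rust
/// Sketch of the tier tag. Variant names follow the PR description;
/// the derives are an assumption made for this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AgentTier {
    /// Fast, user-facing tier (the orchestrator in this PR).
    Chat,
    /// Slower tier for sustained thinking (the planner in this PR).
    Reasoning,
    /// Leaf executors. Marked as the default so untagged agents need no edits.
    #[default]
    Worker,
}

/// Hypothetical, trimmed-down stand-in for `AgentDefinition`.
#[derive(Debug, Default)]
pub struct AgentDefinition {
    pub id: String,
    pub agent_tier: AgentTier,
    pub subagents: Vec<String>,
}
```

With this shape, a definition built without an explicit tier, for example `AgentDefinition { id: "researcher".to_string(), ..Default::default() }`, ends up on the worker tier.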

Problem

The newly-introduced chat tier (fast UX model on the chat hint, reasoning-quick-v1 / Kimi K2.6 Turbo) is great for TTFT but weak at sustained multi-step reasoning. There was no codified rule that:

  1. The chat tier must not spawn another chat agent (defeats the fast-tier purpose; doubles TTFT).
  2. The reasoning tier must not spawn another reasoning agent (chains of slow models re-decompose the same problem and blow up depth).
  3. Workers are leaves (so the parent always sees one compact result, not a transcript of nested delegations).
  4. Total spawn chain depth has a ceiling.

Without these rules, an over-eager router or a custom user TOML can produce chat → chat → chat chains or recursive reasoning loops that burn tokens and latency without buying capability.
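
Rules 1 to 3 can be read as a small parent/child predicate. The sketch below is only an illustration of that reading: `Tier` and `may_spawn` are hypothetical names, and rule 4 (the depth ceiling) is deliberately left out because it depends on the chain, not on a single parent/child pair.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Chat,
    Reasoning,
    Worker,
}

/// Returns true when `parent` may list `child` as a subagent under rules 1-3.
fn may_spawn(parent: Tier, child: Tier) -> bool {
    match (parent, child) {
        // Rule 3: workers are leaves and spawn nothing.
        (Tier::Worker, _) => false,
        // Rule 1: chat must not spawn chat.
        (Tier::Chat, Tier::Chat) => false,
        // Rule 2: reasoning must not spawn reasoning.
        (Tier::Reasoning, Tier::Reasoning) => false,
        // Everything else passes. Reasoning -> Chat is not named by
        // rules 1-3, so this sketch lets it through; whether the real
        // loader does the same is not stated in this description.
        _ => true,
    }
}
```

Under this reading, `may_spawn(Tier::Chat, Tier::Reasoning)` and `may_spawn(Tier::Reasoning, Tier::Worker)` hold, while a chat → chat pair is rejected.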

Solution

Codify the hierarchy so the rules are visible to three audiences (humans, the model, and the registry loader):

  • Data model (harness/definition.rs): AgentTier enum + agent_tier field. Default Worker so existing specialists need no edits. Documented contract on the enum doc comment.
  • Loader validation (agents/loader.rs): validate_tier_hierarchy() walks subagents lists and rejects same-tier and worker-with-subagents entries. Called from load_builtins() and from AgentDefinitionRegistry::load() after workspace overrides are merged.
  • TOML tagging: orchestrator → chat; planner → reasoning; all others inherit worker.
  • Prompts: chat-tier and reasoning-tier rules added to the two agents that occupy those tiers today.
  • Doc: canonical "Spawn hierarchy and tiers" section with the ASCII diagram and a tier table; runtime depth gate (MAX_SPAWN_DEPTH = 3) is referenced as the planned defence-in-depth follow-up.
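
A minimal sketch of what the loader check described above could look like. Only the function name `validate_tier_hierarchy`, the three rules, and the skill-wildcard exemption come from this PR. The `SubagentEntry` shape, the `agent` helper, the `String` error type, and the choice to skip unknown subagent ids are assumptions made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentTier { Chat, Reasoning, Worker }

/// Hypothetical shape for one `subagents` entry.
#[derive(Debug, Clone)]
pub enum SubagentEntry {
    /// Delegation to a named agent.
    Agent(String),
    /// Skill wildcard; exempt from the same-tier check.
    SkillWildcard,
}

#[derive(Debug, Clone)]
pub struct AgentDefinition {
    pub id: String,
    pub agent_tier: AgentTier,
    pub subagents: Vec<SubagentEntry>,
}

/// Small constructor used only to keep examples short (hypothetical).
pub fn agent(id: &str, tier: AgentTier, subagents: Vec<SubagentEntry>) -> AgentDefinition {
    AgentDefinition { id: id.to_string(), agent_tier: tier, subagents }
}

pub fn validate_tier_hierarchy(defs: &[AgentDefinition]) -> Result<(), String> {
    // Build the id -> tier lookup once.
    let tiers: HashMap<&str, AgentTier> =
        defs.iter().map(|d| (d.id.as_str(), d.agent_tier)).collect();

    for def in defs {
        // Workers are leaves: any subagent entry at all is rejected.
        if def.agent_tier == AgentTier::Worker && !def.subagents.is_empty() {
            return Err(format!("worker `{}` must not list subagents", def.id));
        }
        for entry in &def.subagents {
            let child = match entry {
                SubagentEntry::Agent(id) => id,
                SubagentEntry::SkillWildcard => continue,
            };
            // Unknown ids are skipped in this sketch; the real loader may differ.
            if tiers.get(child.as_str()) == Some(&def.agent_tier) {
                return Err(format!(
                    "`{}` must not delegate to same-tier agent `{}`",
                    def.id, child
                ));
            }
        }
    }
    Ok(())
}
```

The single pass over every `subagents` list is where the O(n_agents * n_subagents) boot-time cost comes from.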

Tests: 7 new contract tests (orchestrator_is_chat_tier, planner_is_reasoning_tier, other_builtins_default_to_worker_tier, rejects_chat_to_chat_delegation, rejects_reasoning_to_reasoning_delegation, rejects_worker_with_subagents, allows_skill_wildcards_on_any_non_worker_tier). All 32 agents::loader tests pass.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — new code in loader.rs (validate_tier_hierarchy) is covered by 5 new tests (chat→chat, reasoning→reasoning, worker-with-subagents, skill-wildcards-allowed, plus tier-tag assertions). New AgentTier field is exercised in every loader test. No Vitest changes — Rust-only.
  • N/A: Coverage matrix updated — behaviour-only / architecture change; no new user-visible feature row.
  • N/A: All affected feature IDs from the matrix are listed in the PR description under ## Related — no matrix entries touched.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: Manual smoke checklist updated — change is internal to the agent harness; no release-cut surface.
  • N/A: Linked issue closed via Closes #NNN — no issue tracker entry for this proactive refactor.

Impact

  • Runtime: Desktop core only. No frontend, Tauri, mobile, or CLI surface affected.
  • Performance: Loader runs the new validation once at boot (O(n_agents * n_subagents), trivial). No per-spawn overhead.
  • Security: Tightens the spawn surface — workers can no longer be tagged with subagents in custom TOMLs, and reasoning/chat loops are statically rejected at boot rather than discovered at runtime.
  • Migration: Backwards-compatible. Existing custom user TOMLs without an agent_tier field default to worker (so they keep working unless they also declared subagents — which would now correctly fail).
  • Follow-up: Runtime MAX_SPAWN_DEPTH = 3 task-local gate + SpawnDepthExceeded error variant is referenced in the doc as defence-in-depth; deferred to a separate PR to keep this change focused.
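
To make the migration note concrete, here is a hedged sketch of how a legacy definition could resolve. `resolve_tier` is a hypothetical helper, and the lowercase strings `chat`, `reasoning`, and `worker` are assumed spellings of the TOML values; the real loader deserializes the field rather than matching strings by hand.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentTier {
    Chat,
    Reasoning,
    Worker,
}

/// `raw_tier` models the optional `agent_tier` value from a user TOML;
/// `subagent_count` is how many entries its `subagents` list declares.
fn resolve_tier(raw_tier: Option<&str>, subagent_count: usize) -> Result<AgentTier, String> {
    let tier = match raw_tier {
        // A missing field falls back to the worker default.
        None | Some("worker") => AgentTier::Worker,
        Some("chat") => AgentTier::Chat,
        Some("reasoning") => AgentTier::Reasoning,
        Some(other) => return Err(format!("unknown agent_tier `{other}`")),
    };
    // A legacy file that omits the tier but declares subagents now fails,
    // because it resolves to a worker and workers are leaves.
    if tier == AgentTier::Worker && subagent_count > 0 {
        return Err("worker agents must not list subagents".to_string());
    }
    Ok(tier)
}
```

So `resolve_tier(None, 0)` keeps working as a worker, while `resolve_tier(None, 2)` is the case that would now fail at boot.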

Related

  • Closes:
  • Follow-up PR(s)/TODOs: runtime depth-gate task-local (SPAWN_DEPTH in harness/fork_context.rs, gated in subagent_runner::run_subagent, new AgentError::SpawnDepthExceeded variant — see the gap noted in harness_gap_tests.rs).
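
The deferred depth gate might be sketched as follows. This is not the planned implementation: the follow-up describes a task-local counter, whereas this example passes the depth explicitly to stay dependency-free. `MAX_SPAWN_DEPTH` and the `SpawnDepthExceeded` variant name come from this PR; the variant's fields, the function signature, and the `remaining_hops` parameter are assumptions.

```rust
/// Ceiling named in the PR for the planned runtime gate.
const MAX_SPAWN_DEPTH: usize = 3;

#[derive(Debug, PartialEq, Eq)]
enum AgentError {
    /// Fields are hypothetical; only the variant name is from the PR.
    SpawnDepthExceeded { depth: usize, max: usize },
}

/// Hypothetical stand-in for `run_subagent`.
/// `depth` is the number of hops already taken above this spawn;
/// `remaining_hops` is how many further nested spawns this chain attempts.
/// Returns the depth reached by the deepest successful spawn.
fn run_subagent(depth: usize, remaining_hops: usize) -> Result<usize, AgentError> {
    let next = depth + 1;
    // Gate before doing any work, so an over-deep spawn costs nothing.
    if next > MAX_SPAWN_DEPTH {
        return Err(AgentError::SpawnDepthExceeded { depth: next, max: MAX_SPAWN_DEPTH });
    }
    if remaining_hops == 0 {
        return Ok(next);
    }
    run_subagent(next, remaining_hops - 1)
}
```

A chain of three hops from the top level succeeds under this sketch, and a fourth hop returns `SpawnDepthExceeded` regardless of any tier annotation.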

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/agent-spawn-hierarchy-tiers
  • Commit SHA: 1d2efd4

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only change
  • N/A: pnpm typecheck — Rust-only change
  • Focused tests: cargo test --lib agents::loader → 32/32 pass (including 7 new tier contract tests)
  • Rust fmt/check (if changed): cargo fmt && cargo check clean
  • N/A: Tauri fmt/check (if changed) — Tauri shell untouched

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: Loader now rejects agent registries that declare chat→chat or reasoning→reasoning delegation, or workers with non-empty subagent lists.
  • User-visible effect: None at runtime today (no built-in violates the contract). Custom user TOMLs that violate it will now fail at boot with a descriptive error instead of behaving subtly wrong at spawn time.
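
As an illustration of what a descriptive boot-time error could read like, here is a small sketch. `TierViolation`, its variants, and the message wording are all hypothetical; the PR does not show the actual error type or text.

```rust
use std::fmt;

/// Hypothetical error type for tier-contract violations.
#[derive(Debug, PartialEq, Eq)]
enum TierViolation {
    SameTier { parent: String, child: String, tier: &'static str },
    WorkerHasSubagents { worker: String, count: usize },
}

impl fmt::Display for TierViolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Name both agents and the shared tier so the TOML is easy to fix.
            TierViolation::SameTier { parent, child, tier } => write!(
                f,
                "agent `{parent}` ({tier} tier) must not delegate to `{child}` (same tier)"
            ),
            TierViolation::WorkerHasSubagents { worker, count } => write!(
                f,
                "worker agent `{worker}` must not list subagents (found {count})"
            ),
        }
    }
}

impl std::error::Error for TierViolation {}
```

Naming the offending agent ids in the message is the design choice that matters here: the failure happens at boot, so the user has no spawn-time trace to consult.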

Parity Contract

  • Legacy behavior preserved: Yes — agent_tier defaults to worker, every existing built-in keeps its current subagent surface, and all pre-existing tests still pass.
  • Guard/fallback/dispatch parity checks: Skill-wildcard expansion is intentionally exempt from the tier check (documented inline) because it always routes to the integrations_agent worker via a single delegation tool — not a recursive spawn.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution: N/A

Summary by CodeRabbit

  • New Features

    • Enforced a three-tier agent spawn hierarchy (Chat, Reasoning, Worker) at registry/load time to prevent invalid delegation chains.
    • Clarified orchestrator/planner roles and updated prompts to enforce tiered delegation rules.
    • Planned runtime spawn-depth gate (max chain depth = 3) documented but not yet activated.
  • Chores

    • Updated built-in agent definitions, docs, and tests to reflect tier-based spawning.

Review Change Stack

senamakel added 2 commits May 17, 2026 18:15
Introduces an `AgentTier` ({Chat, Reasoning, Worker}, default Worker)
field on AgentDefinition and a loader-time `validate_tier_hierarchy`
check so the spawn surface mirrors the cost/latency split between
models:

  * chat (fast UX, e.g. orchestrator) → reasoning OR worker, never chat
  * reasoning (deep thinking, e.g. planner) → worker, never reasoning
  * worker (leaf executors) → nothing in `subagents`

Tags `orchestrator = chat` and `planner = reasoning`; all other
built-ins inherit the worker default. Skill-wildcard entries are
exempt because they collapse to a single `delegate_to_integrations_agent`
tool aimed at a worker.

Adds matching prompt-level rules to orchestrator/prompt.md and
planner/prompt.md, and a new "Spawn hierarchy and tiers" section in
gitbooks/developing/architecture/agent-harness.md.

Registry::load() re-validates after merging workspace TOML overrides
so custom user agents are held to the same contract.

Runtime depth gate (MAX_SPAWN_DEPTH = 3 task-local) is referenced in
the doc and prompts as defence-in-depth but is deferred to a follow-up.
@senamakel senamakel requested a review from a team May 18, 2026 01:19

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05f488a1-1baf-4d19-a15c-20b457316568

📥 Commits

Reviewing files that changed from the base of the PR and between 1d2efd4 and ab14f2e.

📒 Files selected for processing (3)
  • gitbooks/developing/architecture/agent-harness.md
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/agents/planner/prompt.md
✅ Files skipped from review due to trivial changes (2)
  • src/openhuman/agent/agents/planner/prompt.md
  • gitbooks/developing/architecture/agent-harness.md

📝 Walkthrough

Walkthrough

This PR adds tier-based spawn-hierarchy validation to the agent harness. Built-in agents (orchestrator=Chat, planner=Reasoning) are assigned tiers that enforce static delegation rules (Chat→Reasoning/Worker, Reasoning→Worker, Worker is leaf). A new validator runs at loader and registry build time, while test fixtures are updated to include the required agent_tier field.

Changes

Agent Tier Hierarchy and Validation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Tier Definition and Core Data Structures | src/openhuman/agent/harness/definition.rs | AgentTier enum (Chat, Reasoning, Worker) and agent_tier: AgentTier field added to AgentDefinition with serde defaults and comprehensive tier-semantics documentation. |
| Tier Hierarchy Validation Implementation | src/openhuman/agent/agents/loader.rs | validate_tier_hierarchy() function builds a tier lookup, iterates subagents, enforces tier constraints (Worker=leaf, Chat/Reasoning no-self-delegation, Skills wildcard exempt), with unit tests verifying built-in assignments and failure cases. |
| Built-in Loader and Module Exports | src/openhuman/agent/agents/loader.rs, src/openhuman/agent/agents/mod.rs | load_builtins() validates tier hierarchy on loaded built-ins; validator is re-exported from agents::mod.rs public API. |
| Registry Load-time Validation | src/openhuman/agent/harness/definition.rs | AgentDefinitionRegistry::load() re-validates merged (custom + built-in) definitions after overrides, surfacing hierarchy violations with context. |
| Built-in Agent Tier Assignments and Prompts | src/openhuman/agent/agents/orchestrator/agent.toml, src/openhuman/agent/agents/orchestrator/prompt.md, src/openhuman/agent/agents/planner/agent.toml, src/openhuman/agent/agents/planner/prompt.md | Orchestrator assigned Chat tier with delegation rules (use reasoning or worker, never chat→chat); planner assigned Reasoning tier with worker-only delegation constraints. Prompts and TOML updated with explicit tier-based handoff guidance. |
| Test Fixture Updates Across Harness | src/openhuman/agent/harness/builtin_definitions.rs, src/openhuman/agent/harness/definition_tests.rs, src/openhuman/agent/harness/payload_summarizer.rs, src/openhuman/agent/harness/subagent_runner/ops_tests.rs, src/openhuman/channels/runtime/dispatch.rs, src/openhuman/tools/orchestrator_tools.rs | All test-only AgentDefinition constructors updated with explicit agent_tier: AgentTier::Worker to match new required field. |
| Architecture Documentation | gitbooks/developing/architecture/agent-harness.md | New "Spawn hierarchy and tiers" section documents tier constraints, loader-time static validation, runtime spawn-depth enforcement, and status notes on implementation coverage. |
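
The "Registry Load-time Validation" step above (merge overrides, then validate the merged set) could be sketched like this. All names here are hypothetical, merging by agent id with overrides winning is an assumption about how workspace TOMLs are applied, and the stand-in validator checks only the "workers are leaves" rule to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down agent definition for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AgentDef {
    id: String,
    is_worker: bool,
    subagents: Vec<String>,
}

/// Stand-in validator: only the "workers are leaves" rule.
fn workers_are_leaves(defs: &[AgentDef]) -> Result<(), String> {
    match defs.iter().find(|d| d.is_worker && !d.subagents.is_empty()) {
        Some(bad) => Err(format!("worker `{}` must not list subagents", bad.id)),
        None => Ok(()),
    }
}

/// Merge workspace overrides over built-ins, then validate the merged set.
fn load_registry(builtins: Vec<AgentDef>, overrides: Vec<AgentDef>) -> Result<Vec<AgentDef>, String> {
    let mut merged: BTreeMap<String, AgentDef> = BTreeMap::new();
    for def in builtins.into_iter().chain(overrides) {
        // Later entries win, so an override replaces the built-in with the same id.
        merged.insert(def.id.clone(), def);
    }
    let defs: Vec<AgentDef> = merged.into_values().collect();
    // Validating after the merge is what catches a bad custom override.
    workers_are_leaves(&defs)
        .map_err(|e| format!("agent registry failed tier validation: {e}"))?;
    Ok(defs)
}
```

Validating only the built-ins would miss the case where a valid built-in is replaced by an invalid custom definition, which is why the check runs a second time on the merged result.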

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1957: Updates test-only agent definitions in builtin_definitions.rs that now require tier assignments per this PR's schema changes.

Suggested labels

working

Poem

🐰 I hop through tiers where thoughts align,
Chat greets the user, plans trace the line,
Reasoning crafts maps for workers to run,
Load-time guards keep the spawn-chain to one,
A rabbit applauds: rules checked, tasks done.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and precisely describes the main change: introduction of a formal spawn-hierarchy structure for agents organized into three tiers (chat, reasoning, worker) with arrows indicating the delegation flow. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 18, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/openhuman/agent/agents/planner/prompt.md (1)

43-43: 💤 Low value

Clarify the tier prohibition phrasing.

The parenthetical examples group "no planner-spawns-planner" (reasoning→reasoning) with "no planner-spawns-orchestrator" (reasoning→chat) under the label "Never delegate to another reasoning agent", which incorrectly implies the orchestrator is a reasoning-tier agent. The orchestrator is chat tier (per orchestrator/agent.toml line 13).

The rule is correct (reasoning tier can only spawn worker tier), but the phrasing could be clearer.

Suggested rephrasing for clarity
-**You are the reasoning tier.** The chat-tier Orchestrator handed off to you because the task needs sustained thinking. Compose plans for the **worker tier** — `code_executor`, `researcher`, `critic`, `integrations_agent`, `archivist`. **Never delegate to another reasoning agent** (no planner-spawns-planner, no planner-spawns-orchestrator); the loader and the harness depth gate will reject it. If a single worker can't cover a node, split the node — don't smuggle a second reasoning hop in.
+**You are the reasoning tier.** The chat-tier Orchestrator handed off to you because the task needs sustained thinking. Compose plans for the **worker tier** — `code_executor`, `researcher`, `critic`, `integrations_agent`, `archivist`. **Never delegate to chat or reasoning tiers** (no planner-spawns-planner, no planner-spawns-orchestrator); the loader and the harness depth gate will reject it. If a single worker can't cover a node, split the node — don't smuggle a second reasoning hop in.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/agents/planner/prompt.md` at line 43, The phrasing
incorrectly groups "no planner-spawns-orchestrator" with reasoning→reasoning
examples and may imply the orchestrator is a reasoning-tier agent; update the
sentence in the "You are the reasoning tier." paragraph to state clearly that
the reasoning tier may only spawn worker-tier agents (code_executor, researcher,
critic, integrations_agent, archivist) and not other reasoning or chat-tier
agents, and remove or reword the parenthetical so it does not list
"orchestrator" as an example of a reasoning agent (refer to the symbol
orchestrator and the worker names to locate the text to edit).
gitbooks/developing/architecture/agent-harness.md (1)

199-204: ⚡ Quick win

Clarify implementation status of runtime depth gate.

The description on line 202 uses present tense ("caps total spawn chain depth") but the status note on line 204 indicates this is "sketched" rather than live. Consider rewording the runtime enforcement description to make it clear this is a planned safeguard, not yet active.

✏️ Suggested clarification
-2. **Runtime depth gate (dynamic).** Independent of tier, the sub-agent runner caps total spawn chain depth at `MAX_SPAWN_DEPTH = 3` via a task-local counter incremented across `run_subagent`. A user-shipped TOML that drops the tier annotation still can't recurse past three hops. The harness surfaces this as the `SpawnDepthExceeded` agent error.
+2. **Runtime depth gate (dynamic, planned).** Independent of tier, the sub-agent runner will cap total spawn chain depth at `MAX_SPAWN_DEPTH = 3` via a task-local counter incremented across `run_subagent`. A user-shipped TOML that drops the tier annotation will not be able to recurse past three hops. The harness will surface this as the `SpawnDepthExceeded` agent error.

Alternatively, if the runtime gate is partially implemented, clarify which parts are live vs. sketched.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@gitbooks/developing/architecture/agent-harness.md` around lines 199 - 204,
The runtime depth-gate description uses present-tense but the status note says
it's only sketched; update the wording to clearly mark the runtime enforcement
as planned/partial or describe which pieces are implemented vs. sketched:
mention MAX_SPAWN_DEPTH, the task-local counter sketch in
harness/fork_context.rs, and the gating in subagent_runner::run_subagent as
not-yet-fully-active (or list which of those are already implemented), while
keeping the loader-time enforcement (agents::loader::validate_tier_hierarchy and
the agent_tier field) identified as live and keep SpawnDepthExceeded referenced
as the intended surfaced error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@gitbooks/developing/architecture/agent-harness.md`:
- Around line 174-182: The fenced ASCII diagram block beginning with "Chat      
(fast, UX-focused — e.g. orchestrator on `chat` hint)" is missing a language
specifier; update the opening triple backticks to include a language (e.g.,
```text) so the diagram renders correctly, leaving the block contents unchanged.

In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Line 43: The text in the spawn hierarchy section asserts "Total chain depth is
capped at 3 hops by the harness" but the runtime enforcement is described
elsewhere as a planned gate (MAX_SPAWN_DEPTH = 3), so update the sentence in
prompt.md (the "Spawn hierarchy (hard rule)" line) to reflect that enforcement
is not yet live: either change to "will be capped at 3 hops" or append a
parenthetical note like "(enforcement tracked in `#XXXX` / planned via
MAX_SPAWN_DEPTH = 3)"; ensure references to MAX_SPAWN_DEPTH remain consistent
and add the issue/PR number if available.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6076be2b-20f7-4cec-b08f-03e5547aaf68

📥 Commits

Reviewing files that changed from the base of the PR and between ac245a0 and 1d2efd4.

📒 Files selected for processing (14)
  • gitbooks/developing/architecture/agent-harness.md
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/mod.rs
  • src/openhuman/agent/agents/orchestrator/agent.toml
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/agents/planner/agent.toml
  • src/openhuman/agent/agents/planner/prompt.md
  • src/openhuman/agent/harness/builtin_definitions.rs
  • src/openhuman/agent/harness/definition.rs
  • src/openhuman/agent/harness/definition_tests.rs
  • src/openhuman/agent/harness/payload_summarizer.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/tools/orchestrator_tools.rs

Comment thread gitbooks/developing/architecture/agent-harness.md Outdated
Comment thread src/openhuman/agent/agents/orchestrator/prompt.md Outdated
Address CodeRabbit suggestions on PR tinyhumansai#2026:

- arch-doc ASCII diagram: add `text` language tag to the fenced block
  (markdownlint MD040).
- orchestrator/prompt.md, planner/prompt.md, agent-harness.md: soften
  "Total chain depth is capped at 3 hops by the harness" to reflect
  that the runtime `MAX_SPAWN_DEPTH` task-local is a planned
  follow-up; only the loader-time tier check is live today.
@senamakel senamakel merged commit 0257b2e into tinyhumansai:main May 18, 2026
25 checks passed
