Multi-Agent is a coordination tax — why we chose Multi-Skill instead #12

xg-gh-25 · 2026-05-18T05:01:36Z

xg-gh-25
May 18, 2026
Maintainer

"Coordination is a tax on limited cognition. Don't pay taxes you don't owe." — KD29

The Industry Consensus (and why we disagree)

The AI agent industry has converged on multi-agent as the architecture for complex tasks:

CrewAI: Define agents with roles (researcher, writer, reviewer), orchestrate them
LangGraph: Build DAGs of agent nodes passing messages
AutoGen: Multiple agents conversing in group chat
gstack (Garry Tan): 23 "specialists" — CEO, Designer, Eng Manager, QA...
OpenClaw (Claude Code): Spawns sub-agents via Task tool

The pitch: "Division of labor works for humans, so it works for AI."

We chose the opposite. One agent, multiple skills, role-switching within a single context. Here's why.

The Coordination Tax

Every time you split work across agents, you pay:

Tax	Cost	Example
Context transfer	Lost nuance	Agent A knows WHY the user wants X. Agent B only sees "do X."
State sync	Race conditions	Agent A changes a file. Agent B reads the old version.
Handoff overhead	Time + tokens	Summarizing context for the next agent = re-processing work already done.
Error propagation	Cascading failures	Agent A makes a wrong assumption. Agents B, C, D build on it.
Coordination protocol	Complexity	Who goes first? Who resolves conflicts? Who owns the final output?

These aren't implementation details you can engineer away. They're structural properties of distributed systems. The CAP theorem applies to agents too.

What Multi-Agent Actually Solves

Multi-agent genuinely helps when:

Parallel execution — tasks with zero dependency can run simultaneously
Isolation — one agent's failure shouldn't corrupt another's state
Resource limits — one context window isn't enough for the full problem

For everything else, it's overhead disguised as architecture.

Our Alternative: Multi-Skill Orchestration

ONE agent + MANY skills + ROLE-SWITCHING within pipeline stages

EVALUATE (judgment role) → THINK (researcher role) → PLAN (architect role) →
BUILD (engineer role) → REVIEW (adversarial reviewer role) → TEST (QA role) →
DELIVER (release engineer role) → REFLECT (retrospective role)

Same "virtual team" as gstack's 23 specialists. But:

Zero context transfer cost — the agent switching from "builder" to "reviewer" already knows everything
Zero state sync — one process, one filesystem view, one git state
Zero handoff — no summarization needed, full conversation history preserved
Adversarial review still works — we spawn sub-agents ONLY for review (fresh context = genuine second opinion)

When We DO Use Multiple Agents

The exception proves the rule. We use sub-agents for exactly ONE thing: adversarial review.

Why? Because the builder's context IS the blind spot. A reviewer who read the same conversation has the same blind spots. A fresh-context reviewer catches what self-review structurally cannot.

But this is "multiple contexts for review" — not "multiple agents coordinating on one task."

The gstack Observation

Garry Tan's gstack is interesting because it looks like multi-agent but isn't:

/plan-ceo-review    → Claude in "CEO" role, same context
/review             → Claude in "reviewer" role, same context
/qa                 → Claude in "QA" role, same context
/ship               → Claude in "release eng" role, same context

It's role-based prompting within one agent. The "23 specialists" are 23 system prompts, not 23 processes. This is exactly our architecture — skills = roles, pipeline stages = role transitions, one agent = one context.

The branding says "virtual team." The architecture says "single agent, multiple hats."

The Deeper Principle

Division of labor is a compromise for limited human cognitive bandwidth, not an optimal design.

Humans need teams because one person can't hold all context. AI agents with 1M context windows DON'T have this limitation. Splitting them up re-introduces the exact problem (limited context) that the technology solved.

Every industry trend eliminates handoffs:

Full-stack > frontend + backend
DevOps > dev + ops
Cross-functional pods > siloed departments

Multi-agent frameworks go the opposite direction. They re-introduce handoffs artificially.

When Multi-Agent WILL Win (future)

Our position changes if:

Models get shared real-time memory (one agent writes, another reads instantly)
Context windows shrink (forcing distribution)
Tool-use becomes blocking (parallel execution matters more)

Until then: one agent, many skills, role-switching. Pay for coordination only where it's structurally unavoidable (adversarial review).

Questions

Is the "one agent, many roles" approach actually scalable? What breaks at 100 skills?
Does gstack's success (98K stars) validate single-agent-multi-role, or is it just good marketing?
When CrewAI/LangGraph users say "multi-agent works for us" — are they solving a real coordination problem, or cargo-culting human org design?
Is there a task complexity threshold where multi-agent genuinely outperforms? (Our hypothesis: only when tasks require >1M tokens of context total.)

Our architecture: 61 skills, 1 agent, 8 pipeline stages, zero coordination overhead. SwarmAI

Wyifei · 2026-05-22T06:36:28Z

Wyifei
May 22, 2026

单agent跑多个step，每个step通过skill渐进式加载，但这种方式也会带来上下文腐坏的问题，应该如何去解决呢

0 replies

xg-gh-25 · 2026-05-22T10:38:58Z

xg-gh-25
May 22, 2026
Maintainer Author

好问题。上下文腐坏（context corruption）是单 Agent 长链路执行的核心挑战 —— 我们在生产中确实遇到过，这里分享实际的应对机制。

"上下文腐坏" 的三种形态

形态	表现	例子
陈旧假设	Step 1 的观察到了 Step 5 已经过期	"这个文件有 200 行" —— 但 Step 3 改了它
注意力衰减	窗口前端的指令被后面的内容淹没	Skill 的 INSTRUCTIONS.md 被 3000 行工具输出覆盖
幻觉叠加	早期 step 的小推测在后续 step 被当作事实	"应该有个 config.yaml" → 后续围绕不存在的文件规划

我们的 5 层防御（生产验证过的）

1. Skill 即时加载 + 即时遗忘

Skill 不是"一次性全部灌入"。Lazy skill 只在调用时加载 INSTRUCTIONS.md，执行完毕后这些指令随自然对话流向上滚动。关键：skill 不往 system prompt 里堆积 —— system prompt 是固定的 11 个 context 文件，skill 指令在 conversation turns 里，自然参与 compaction。

2. SDK Auto-Compaction（最关键的一层）

Claude Agent SDK 在 context 接近容量时触发自动压缩 —— 保留 system prompt + 最近 N 轮 + 结构化摘要替代早期轮次。这意味着 Step 1 的 3000 行 grep 输出不会永远占着窗口 —— 它会被压缩成 "搜索了 X，发现了 Y" 的摘要。

但压缩有风险：如果 Step 1 的一个关键细节在压缩时丢了，后续 step 就在错误假设上工作。我们的应对：

3. Pipeline Stage 隔离 + 显式交接物

我们的 Pipeline（EVALUATE → PLAN → BUILD → REVIEW → DELIVER）每个阶段产出显式 artifact 写入文件系统，而不是依赖对话记忆：

PLAN → .artifacts/plan.md（写到磁盘）
BUILD → 读 plan.md（从磁盘，不是从记忆）
REVIEW → 读实际代码（从磁盘，不是从 BUILD 的记忆）

每个 stage 的 "上下文" 来自文件系统状态，不是对话历史。对话可以被压缩、可以丢失细节 —— 但磁盘上的 artifact 是精确的。

4. 验证 > 推理（P1 原则）

这是我们 SOUL.md 里的认知原则：Agent 对自己记忆的信任度应该低于对文件系统的信任度。

实操：每个 step 开始时，如果需要上一步的产出，不要回忆 —— 重新 Read。这看起来浪费 token，但消除了陈旧假设。比如 REVIEW 阶段不信任 BUILD 阶段说 "我改了 X" —— 它用 git diff 验证实际改了什么。

5. 长链路超过阈值 → 新 Session

当一个任务跑到 context 的 ~70%（约 700K tokens 对于 1M 模型），我们做 session checkpoint + resume：

把当前进度写入结构化 checkpoint（最近请求、关键发现、未完成步骤）
在新 session 恢复，注入 ~50-100K tokens 的结构化上下文
新 session 的 context 是干净的 —— 没有早期 step 的噪音

诚实回答：单 Agent 上下文腐坏 vs Multi-Agent 上下文丢失

这是一个 trade-off，不是一个已解决的问题：

	单 Agent + Multi-Skill	Multi-Agent
腐坏风险	高（长对话积累噪音）	低（每个 Agent 独立窗口）
丢失风险	低（全在同一窗口）	高（handoff 摘要必然丢细节）
修复成本	低（Agent 自己能回看）	高（需要重新问上游 Agent）

我们选了 "腐坏但可自修" 而不是 "丢失且不可逆"。关键词是可观察性 —— 当上下文在同一个窗口里，Agent 至少有机会发现矛盾（"等等，我刚才说文件有 200 行，但现在 git diff 显示 250 行"）。跨 Agent 的丢失是静默的 —— Agent B 不知道 Agent A 知道什么。

最后：什么情况下应该切换到 Multi-Agent

如果你的任务满足以下条件，Multi-Agent 可能更合适：

单 step 输出 > 50K tokens（一个 Agent 窗口放不下）
Steps 之间真正零依赖（可并行）
需要不同的 model/temperature/tool 配置

否则，单 Agent + 上面 5 层防御 + 诚实的 "checkpoint and restart" 在我们的经验里比 Multi-Agent 更可靠。

Related:

Design Philosophy — Six Pillars（第二支柱 "验证 > 推理" 就是应对腐坏的核心原则）
Agent Memory Architecture（4 层记忆系统的完整实现）

0 replies

xg-gh-25 · 2026-05-22T10:55:53Z

xg-gh-25
May 22, 2026
Maintainer Author

一句话总结： 对话窗口是易腐品，文件系统是确定性桥梁。

单 Agent 的上下文腐坏问题，本质上通过 "每步写磁盘、下步从磁盘读" 来解决 —— 文件系统承担了跨步骤的状态传递职责，对话记忆只负责 "当前在想什么"，不负责 "之前确认了什么"。

这样，单 Agent 偷到了 Multi-Agent 的显式状态传递优势（精确、不丢），同时保留了单窗口的全局视野（能发现矛盾、能回看上下文）。两个世界的好处，不付协调税。

0 replies

xg-gh-25 · 2026-05-25T08:08:04Z

xg-gh-25
May 25, 2026
Maintainer Author

好奇你实际在跑的场景是什么？比如：

是一个 pipeline 串联 5+ 个 skill（类似 evaluate → build → review → test）
还是单次对话里反复调用不同工具，但累计 token 很深
还是长时间挂着的 daemon 任务（几小时级别）

不同场景下腐坏的主要形态不一样，对应的防御优先级也不同。如果你在跑前者（pipeline 串联），我们最近刚上了一个 checkpoint + resume 机制专门解决跨 stage 状态丢失 —— 有兴趣可以展开聊聊具体实现。

0 replies

MertEnesYurtseven · 2026-06-14T16:09:37Z

MertEnesYurtseven
Jun 14, 2026

Your breakdown of the coordination tax (context transfer, state sync, handoff overhead, error propagation, coordination protocol) is precise. But these are properties of message-passing architectures — not multi-agent systems per se.

An alternative: agents never message each other. They read from and write to a single versioned causal tree. Evidence traces are append-only and content-addressed (SHA-256), so there is no context transfer — Agent B reads the same branch Agent A wrote to. The orient pass (an automated cleanup phase between investigation rounds) merges near-duplicate branches, detects contradictions, re-parents misattributed work, and flags orphans — covering handoff and error propagation without any agent-to-agent coordination.

structura implements this architecture: https://github.com/MertEnesYurtseven/structura

How do you handle the equivalent of orient between EVALUATE→PLAN→BUILD stages in SwarmAI's pipeline — is that purely human-mediated, or is there an automated cleanup step?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-Agent is a coordination tax — why we chose Multi-Skill instead #12

Uh oh!

{{title}}

Uh oh!

Replies: 5 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Multi-Agent is a coordination tax — why we chose Multi-Skill instead #12

Uh oh!

xg-gh-25 May 18, 2026 Maintainer

The Industry Consensus (and why we disagree)

The Coordination Tax

What Multi-Agent Actually Solves

Our Alternative: Multi-Skill Orchestration

When We DO Use Multiple Agents

The gstack Observation

The Deeper Principle

When Multi-Agent WILL Win (future)

Questions

Replies: 5 comments

Uh oh!

Wyifei May 22, 2026

Uh oh!

xg-gh-25 May 22, 2026 Maintainer Author

"上下文腐坏" 的三种形态

我们的 5 层防御（生产验证过的）

1. Skill 即时加载 + 即时遗忘

2. SDK Auto-Compaction（最关键的一层）

3. Pipeline Stage 隔离 + 显式交接物

4. 验证 > 推理（P1 原则）

5. 长链路超过阈值 → 新 Session

诚实回答：单 Agent 上下文腐坏 vs Multi-Agent 上下文丢失

最后：什么情况下应该切换到 Multi-Agent

Uh oh!

xg-gh-25 May 22, 2026 Maintainer Author

Uh oh!

xg-gh-25 May 25, 2026 Maintainer Author

Uh oh!

MertEnesYurtseven Jun 14, 2026

xg-gh-25
May 18, 2026
Maintainer

Wyifei
May 22, 2026

xg-gh-25
May 22, 2026
Maintainer Author

xg-gh-25
May 22, 2026
Maintainer Author

xg-gh-25
May 25, 2026
Maintainer Author

MertEnesYurtseven
Jun 14, 2026