Replies: 5 comments
-
|
单agent跑多个step,每个step通过skill渐进式加载,但这种方式也会带来上下文腐坏的问题,应该如何去解决呢 |
Beta Was this translation helpful? Give feedback.
-
|
好问题。上下文腐坏(context corruption)是单 Agent 长链路执行的核心挑战 —— 我们在生产中确实遇到过,这里分享实际的应对机制。 "上下文腐坏" 的三种形态
我们的 5 层防御(生产验证过的)1. Skill 即时加载 + 即时遗忘Skill 不是"一次性全部灌入"。Lazy skill 只在调用时加载 INSTRUCTIONS.md,执行完毕后这些指令随自然对话流向上滚动。关键:skill 不往 system prompt 里堆积 —— system prompt 是固定的 11 个 context 文件,skill 指令在 conversation turns 里,自然参与 compaction。 2. SDK Auto-Compaction(最关键的一层)Claude Agent SDK 在 context 接近容量时触发自动压缩 —— 保留 system prompt + 最近 N 轮 + 结构化摘要替代早期轮次。这意味着 Step 1 的 3000 行 grep 输出不会永远占着窗口 —— 它会被压缩成 "搜索了 X,发现了 Y" 的摘要。 但压缩有风险:如果 Step 1 的一个关键细节在压缩时丢了,后续 step 就在错误假设上工作。我们的应对: 3. Pipeline Stage 隔离 + 显式交接物我们的 Pipeline(EVALUATE → PLAN → BUILD → REVIEW → DELIVER)每个阶段产出显式 artifact 写入文件系统,而不是依赖对话记忆: 每个 stage 的 "上下文" 来自文件系统状态,不是对话历史。对话可以被压缩、可以丢失细节 —— 但磁盘上的 artifact 是精确的。 4. 验证 > 推理(P1 原则)这是我们 SOUL.md 里的认知原则:Agent 对自己记忆的信任度应该低于对文件系统的信任度。 实操:每个 step 开始时,如果需要上一步的产出,不要回忆 —— 重新 Read。这看起来浪费 token,但消除了陈旧假设。比如 REVIEW 阶段不信任 BUILD 阶段说 "我改了 X" —— 它用 5. 长链路超过阈值 → 新 Session当一个任务跑到 context 的 ~70%(约 700K tokens 对于 1M 模型),我们做 session checkpoint + resume:
诚实回答:单 Agent 上下文腐坏 vs Multi-Agent 上下文丢失这是一个 trade-off,不是一个已解决的问题:
我们选了 "腐坏但可自修" 而不是 "丢失且不可逆"。关键词是可观察性 —— 当上下文在同一个窗口里,Agent 至少有机会发现矛盾("等等,我刚才说文件有 200 行,但现在 git diff 显示 250 行")。跨 Agent 的丢失是静默的 —— Agent B 不知道 Agent A 知道什么。 最后:什么情况下应该切换到 Multi-Agent如果你的任务满足以下条件,Multi-Agent 可能更合适:
否则,单 Agent + 上面 5 层防御 + 诚实的 "checkpoint and restart" 在我们的经验里比 Multi-Agent 更可靠。 Related:
|
Beta Was this translation helpful? Give feedback.
-
|
一句话总结: 对话窗口是易腐品,文件系统是确定性桥梁。 单 Agent 的上下文腐坏问题,本质上通过 "每步写磁盘、下步从磁盘读" 来解决 —— 文件系统承担了跨步骤的状态传递职责,对话记忆只负责 "当前在想什么",不负责 "之前确认了什么"。 这样,单 Agent 偷到了 Multi-Agent 的显式状态传递优势(精确、不丢),同时保留了单窗口的全局视野(能发现矛盾、能回看上下文)。两个世界的好处,不付协调税。 |
Beta Was this translation helpful? Give feedback.
-
|
好奇你实际在跑的场景是什么?比如:
不同场景下腐坏的主要形态不一样,对应的防御优先级也不同。如果你在跑前者(pipeline 串联),我们最近刚上了一个 checkpoint + resume 机制专门解决跨 stage 状态丢失 —— 有兴趣可以展开聊聊具体实现。 |
Beta Was this translation helpful? Give feedback.
-
|
Your breakdown of the coordination tax (context transfer, state sync, handoff overhead, error propagation, coordination protocol) is precise. But these are properties of message-passing architectures — not multi-agent systems per se. An alternative: agents never message each other. They read from and write to a single versioned causal tree. Evidence traces are append-only and content-addressed (SHA-256), so there is no context transfer — Agent B reads the same branch Agent A wrote to. The orient pass (an automated cleanup phase between investigation rounds) merges near-duplicate branches, detects contradictions, re-parents misattributed work, and flags orphans — covering handoff and error propagation without any agent-to-agent coordination. structura implements this architecture: https://github.com/MertEnesYurtseven/structura How do you handle the equivalent of orient between EVALUATE→PLAN→BUILD stages in SwarmAI's pipeline — is that purely human-mediated, or is there an automated cleanup step? |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
The Industry Consensus (and why we disagree)
The AI agent industry has converged on multi-agent as the architecture for complex tasks:
The pitch: "Division of labor works for humans, so it works for AI."
We chose the opposite. One agent, multiple skills, role-switching within a single context. Here's why.
The Coordination Tax
Every time you split work across agents, you pay:
These aren't implementation details you can engineer away. They're structural properties of distributed systems. The CAP theorem applies to agents too.
What Multi-Agent Actually Solves
Multi-agent genuinely helps when:
For everything else, it's overhead disguised as architecture.
Our Alternative: Multi-Skill Orchestration
Same "virtual team" as gstack's 23 specialists. But:
When We DO Use Multiple Agents
The exception proves the rule. We use sub-agents for exactly ONE thing: adversarial review.
Why? Because the builder's context IS the blind spot. A reviewer who read the same conversation has the same blind spots. A fresh-context reviewer catches what self-review structurally cannot.
But this is "multiple contexts for review" — not "multiple agents coordinating on one task."
The gstack Observation
Garry Tan's gstack is interesting because it looks like multi-agent but isn't:
It's role-based prompting within one agent. The "23 specialists" are 23 system prompts, not 23 processes. This is exactly our architecture — skills = roles, pipeline stages = role transitions, one agent = one context.
The branding says "virtual team." The architecture says "single agent, multiple hats."
The Deeper Principle
Humans need teams because one person can't hold all context. AI agents with 1M context windows DON'T have this limitation. Splitting them up re-introduces the exact problem (limited context) that the technology solved.
Every industry trend eliminates handoffs:
Multi-agent frameworks go the opposite direction. They re-introduce handoffs artificially.
When Multi-Agent WILL Win (future)
Our position changes if:
Until then: one agent, many skills, role-switching. Pay for coordination only where it's structurally unavoidable (adversarial review).
Questions
Our architecture: 61 skills, 1 agent, 8 pipeline stages, zero coordination overhead. SwarmAI
Beta Was this translation helpful? Give feedback.
All reactions