Replies: 1 comment
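Unexpectedly honest: real insight on "read the source, not the docs"

@xg-gh-25 This discussion confirms a problem I keep running into in AI system design but that few people are willing to say out loud: CLIs and SDKs are completely different things, yet most tool designers pretend they are the same.

The deeper problem behind the two bugs you found

Bug #1: the silent failure of `maxTurns=100`. Your code computes a token budget dynamically for each session, which shows you have already learned that every deployment scenario needs different defaults. The Claude Code SDK, however, applies one default to every scenario.

Bug #2: the invisible compaction at `task_budget=128K`. You mention a 1M-context model, and a 128K task budget means the agent is forced to forget after using only 12.8% of the available window. It is like telling a researcher with a full library that they may only look at 12 books at a time.

Three observations on the fixes

1. You did not just fix it, you built a defensive layer.

```python
if result.subtype == "error_max_turns":
    yield {"type": "turn_limit_reached", "content": result.content}
```

This is not a patch. It is a reclassification: it turns an "error" into a "state transition" and tells the frontend "this is not a crash, it is a pause, and the user can continue." The effect on user experience is tenfold.

2. Your `task_budget` override is conservative, which is smart. It is a bounded budget rather than "no limit." This shows that you understand API cost is real, but also that 128K is absurd as a default.

3. You did what every SDK consumer should do but almost nobody does. You documented the finding and turned it into team knowledge:

That one key line is worth more than 100 lines of design documentation. It tells everyone who comes later: "Read the source. The docs will lie to you."

"Read the source, not the docs": why this principle matters so much

The "Hidden Default Matrix" you listed is a gold mine:

Before going to production, every SDK consumer should work through a checklist of devil's-advocate questions:

Your `maxTurns` and `task_budget` findings are perfect answers to that checklist.

A bigger architectural problem

You mentioned this:

But I would go further: the design team has little incentive to discover this difference. Why? Because:

This is nobody's fault. It is a failure of incentive structure. SDK reliability and CLI UX are not optimized toward the same goal.

Suggestions (now that you have fixed it)

1. File an issue or Discussion with Anthropic. Not just a bug report, but a design proposal:

2. Publish a best-practices guide for SDK consumers, based on what you found and fixed in SwarmAI:

3. Turn this problem into a test. In CI:

That way the next maintainer of an unattended agent system can say "I have already fallen into this pit."

Context (for anyone reading this)

This is not a criticism of Claude Code. Claude Code is good. The point is that when a tool designed for interactive use is consumed as an SDK, its defaults turn into landmines. Any such tool will have this problem.

Anthropic could solve this with one thing: a section in the SDK docs titled "Defaults differ between the interactive CLI and the autonomous SDK," followed by a list of every parameter that needs an override. Not "read the source to find the hidden defaults," but "this is on the first page of the docs."

You have already done this work. Anthropic should standardize it.

The irony in all of this: the system you used to fix the problem (SwarmAI's context loader) reproduces the same pattern: dynamic token budgets, profile-aware configuration, explicit overrides. You did not just fix a bug; you learned how to design a system that avoids it.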
Your AI agent works perfectly for 18 minutes — tools calling, code flowing, pipeline running beautifully. Then suddenly: "Interrupted." All content gone. No warning. No graceful degradation.
We just spent a full debugging session on two P0 production bugs caused by undocumented CLI defaults in the Claude Code SDK. This post shares what we found, how we fixed it, and what every SDK consumer should know.
Bug #1: The Invisible Turn Limit (`maxTurns=100`)

Symptom: Our autonomous pipeline (EVALUATE→BUILD→REVIEW→TEST→DELIVER) ran perfectly for ~18 minutes. At turn 101, the CLI subprocess exited with `is_error=True, subtype="error_max_turns"`. Frontend displayed "Interrupted" and the user lost all visible progress.

Root cause: Claude Code CLI defaults to `maxTurns=100`. This limit:

- […] `claude --help`

For interactive terminal use (where a human can type `/continue`), 100 turns is generous. For SDK consumers running autonomous pipelines, it's a landmine.

The fix:
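The original snippet is not preserved on this page. As a minimal sketch only, assuming the `CLAUDE_CODE_MAX_TURNS` environment variable listed under "Override Methods" in this post is honored by the CLI subprocess, the limit can be set explicitly when building the subprocess environment. The helper `build_cli_env` and the value 300 are hypothetical and chosen for illustration.

```python
import os
from typing import Mapping, Optional


def build_cli_env(max_turns: int, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with an explicit turn limit set.

    Hypothetical helper: the variable name comes from this post, not from
    verified SDK documentation.
    """
    if max_turns <= 0:
        raise ValueError("max_turns must be a positive integer")
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["CLAUDE_CODE_MAX_TURNS"] = str(max_turns)
    return env


# 300 is an illustrative value, not the one used in the original fix.
env = build_cli_env(300, base_env={"PATH": "/usr/bin"})
print(env["CLAUDE_CODE_MAX_TURNS"])
```

The resulting dictionary would be passed as the `env` argument when the subprocess is spawned. Setting the value in one helper keeps the override visible in code review instead of depending on an implicit default.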
Plus graceful handling when the limit IS hit:
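A minimal sketch of that handling, built around the `subtype == "error_max_turns"` check quoted in the reply to this post. `AgentResult` is a hypothetical stand-in for the SDK's final result message, and the `"error"` and `"done"` event types are assumptions added for illustration; only `"turn_limit_reached"` appears in the discussion.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class AgentResult:
    """Hypothetical stand-in for the final result message of a run."""
    is_error: bool
    subtype: str
    content: str


def to_frontend_events(result: AgentResult) -> Iterator[dict]:
    """Map a final result to an event the frontend can render."""
    if result.subtype == "error_max_turns":
        # Turn limit reached: a pause the user can resume from, not a crash.
        yield {"type": "turn_limit_reached", "content": result.content}
    elif result.is_error:
        # Any other error is surfaced as a real failure (assumed event type).
        yield {"type": "error", "content": result.content}
    else:
        # Normal completion (assumed event type).
        yield {"type": "done", "content": result.content}


paused = AgentResult(is_error=True, subtype="error_max_turns", content="partial output")
events = list(to_frontend_events(paused))
print(events)
```

Checking `subtype` before `is_error` matters here: a turn-limit result also carries `is_error=True`, so the order of the branches is what keeps it from being reported as a failure.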
Bug #2: The Context Compaction Trap (`task_budget=128K`)

Symptom: During deep investigations, the agent would suddenly "forget" everything it had discovered mid-task. It would re-read files it already analyzed, re-ask questions it already answered, and lose its chain of reasoning.

Root cause: Claude Code CLI defaults to `task_budget=128K` tokens. When a single user→agent interaction chain exceeds this budget, the CLI triggers `autoCompact`, summarizing the conversation to free space. The agent loses granular context and starts over from a compressed summary.

On a 1M context window model, compacting at 128K means you're using 12.8% of available capacity before forced compression. For complex tasks (multi-file refactors, pipeline runs, deep research), 128K is easily exceeded in a single interaction.

The fix:
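The original snippet is not preserved on this page. A minimal sketch of per-profile limits, using only the figures given in this post (800K where a user can press stop, 400K and `max_turns=15` for channel sessions, a 1M context window). `SessionLimits`, `limits_for`, the profile name `"attended"`, and the turn value 300 are hypothetical; how the limits are then handed to the SDK is left out on purpose.

```python
from dataclasses import dataclass

CONTEXT_WINDOW_TOKENS = 1_000_000  # the 1M context window discussed in this post


@dataclass(frozen=True)
class SessionLimits:
    """Hypothetical container for limits set explicitly on every session."""
    task_budget: int  # tokens per interaction chain before compaction
    max_turns: int


def limits_for(profile: str, attended_max_turns: int) -> SessionLimits:
    """Pick explicit limits per session profile instead of relying on CLI defaults."""
    if profile == "attended":
        # A user is watching and has a stop button, so allow a large budget.
        return SessionLimits(task_budget=800_000, max_turns=attended_max_turns)
    if profile == "channel":
        # Unattended: smaller budget, and the low turn cap is the main safety.
        return SessionLimits(task_budget=400_000, max_turns=15)
    raise ValueError(f"unknown session profile: {profile!r}")


# 300 is an illustrative turn limit; the post does not state the real value.
attended = limits_for("attended", attended_max_turns=300)
headroom = CONTEXT_WINDOW_TOKENS - attended.task_budget
print(headroom)  # 200000
```

Raising on an unknown profile is a deliberate choice in this sketch: a new session type should fail loudly rather than fall back to a default nobody chose.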
Why not unlimited? Runaway cost protection. 800K still leaves 200K headroom, and users have a stop button. Channel sessions use 400K because they're unattended; `max_turns=15` is the primary safety there.

The Broader Problem: SDK ≠ CLI
Claude Code was designed as an interactive terminal tool. The defaults make sense for that context:
But when you use Claude Code as an SDK (subprocess driving autonomous agents), these defaults become invisible failure modes:
- […] (`is_error=True`) with no machine-readable distinction
- […] `--help` documentation of these parameters

The Hidden Default Matrix

- `maxTurns`
- `task_budget`
- `autoCompact`

Override Methods (discovered by reading source)

- `maxTurns`: `ClaudeAgentOptions.max_turns` or `CLAUDE_CODE_MAX_TURNS` env
- `task_budget`: `ClaudeAgentOptions.task_budget` or `--task-budget` flag
- `autoCompact`: `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` env

Lessons for Anyone Building on Claude Code SDK
1. Read the source, not the docs. The SDK's true behavior lives in the implementation. `--help` shows you flags, not defaults. The most dangerous parameters are the ones that aren't listed.

2. Interactive defaults ≠ autonomous defaults. Any time you're wrapping a CLI tool designed for humans into an autonomous pipeline, audit every implicit assumption. Timeouts, limits, and safety nets designed for "user can intervene" become silent killers in "unattended agent" mode.

3. Error semantics matter. `is_error=True` is not granular enough. Our fix distinguishes `error_max_turns` from real errors: the former is a graceful pause, not a failure. Treating all non-success as "broken" loses user trust.

4. Persist streaming state. If your agent streams results over 18 minutes and the transport can break, you need a checkpoint mechanism. We now persist content to `sessionStorage` every 10 seconds, so if the stream breaks, a page refresh recovers everything.

5. Test at scale, not at demo. 100 turns and 128K tokens? You'll never hit those in a 3-minute demo. You'll hit them every time your agent does real work. Test with your actual pipelines, not toy examples.
What We'd Like to See
For Anthropic / Claude Code maintainers:
- […] `error_max_turns` is great, but make it a first-class field, not something we reverse-engineer from `subtype`

Context
We're building SwarmAI — a personal AI command center that runs Claude Code as its execution engine. Our autonomous pipeline regularly runs 150+ turns per task. These bugs were invisible until production load exposed them.
The fixes shipped in commits `e2e604ca` through `004c2c16` (6 commits, 2 rounds of adversarial review, 95 tests passing). Full technical details in our KNOWLEDGE.md.

If you're building on Claude Code SDK and hit similar issues, check your `maxTurns` and `task_budget`. The defaults were made for a different use case.