Symptom (observed in PR #64 Layer 3 dogfood)
Bro fires 4–5 parallel Bash exploratory calls during tmb_project-prescan on a fresh project. Several of them fail-by-design on an empty repo (git log --oneline -5 exits 128 when there are no commits; ls .claude/skills/ errors when the dir doesn't exist), and CC's parallel runtime cancels the entire batch when any sibling fails — even the calls that would have succeeded standalone.
Bro then retries serially. If the retry batch also contains a fragile call, it gets cancelled too. Two cascading cancellations were observed in a single prescan, wasting ~9 tool-uses and consuming subagent context.
Verbatim trace (from #64 Mode B run, fresh /tmp/tmb-smoke)
⏺ Bash(git status) → Cancelled: parallel tool call git log errored
⏺ Bash(git log --oneline -5) → Error: Exit code 128 (no commits)
⏺ Bash(ls -1) → Cancelled: parallel tool call git log errored
⏺ Bash(find ...) → Cancelled
⏺ Bash(ls .claude/agents/ ...) → Cancelled
[bro retries serially]
⏺ Bash(git status) → succeeds
⏺ Bash(ls -1) → empty
⏺ Bash(find ...) → empty
⏺ Bash(ls .claude/agents/ ...) → Cancelled AGAIN (different transient failure)
Root cause
This is CC platform behavior (parallel-batch-cancellation on any sibling failure) interacting with prompt design that doesn't account for it. The fix is in our prompts.
Two fixes
Fix A — harden tmb_project-prescan per-call
In skills/tmb_project-prescan/SKILL.md, gate fragile calls behind a cheap probe:
# BEFORE (fragile — cancels whole batch on empty repo)
git status
git log --oneline -5
ls -1
ls .claude/agents/
find . -name "package.json"
# AFTER (probe first, branch on result, batch survivors)
HAS_COMMITS=$(git rev-parse HEAD 2>/dev/null && echo yes || echo no)
HAS_AGENTS_DIR=$([ -d .claude/agents ] && echo yes || echo no)
# Then batch only what's known-safe
if [ "$HAS_COMMITS" = yes ]; then git log --oneline -5; fi
if [ "$HAS_AGENTS_DIR" = yes ]; then ls .claude/agents/; fi
Or simpler: append || true to every prescan Bash call so non-zero exits don't cascade. Trade-off: loses the actual error info if something genuinely breaks. Probe-first is cleaner.
Fix B — global guidance for tmb_* skills
Add a new section to tmb_project-prescan, or a new top-level rule in CLAUDE.md:
Parallel-batching rule. Never batch a known-fragile call (anything that can exit non-zero on a valid project state — empty repo, missing dir, no remote, etc.) with other calls in the same assistant response. CC cancels the entire batch on any single failure. Either probe first, or run fragile things solo, or append `|| true` to absorb the exit code.
This applies to bro's prescan, lazy-regen-check, and any other skill that issues exploratory Bash batches.
Acceptance
Priority
Medium — doesn't break the chain (bro recovers serially), but bleeds context budget on every fresh-project run. Worth fixing before next dogfood retest.
Related
Surfaced during PR #64 Layer 3 dogfood verification. Same hypothesis applies to tmb_lazy-regen-check (does similar exploration in fresh-DB scenarios) — audit and apply same hardening.
Symptom (observed in PR #64 Layer 3 dogfood)
Bro fires 4–5 parallel Bash exploratory calls during
tmb_project-prescanon a fresh project. Several of them fail-by-design on an empty repo (git log --oneline -5exits 128 when there are no commits;ls .claude/skills/errors when the dir doesn't exist), and CC's parallel runtime cancels the entire batch when any sibling fails — even the calls that would have succeeded standalone.Bro then retries serially. If the retry batch also contains a fragile call, it gets cancelled too. Two cascading cancellations were observed in a single prescan, wasting ~9 tool-uses and consuming subagent context.
Verbatim trace (from #64 Mode B run, fresh
/tmp/tmb-smoke)Root cause
This is CC platform behavior (parallel-batch-cancellation on any sibling failure) interacting with prompt design that doesn't account for it. The fix is in our prompts.
Two fixes
Fix A — harden
tmb_project-prescanper-callIn
skills/tmb_project-prescan/SKILL.md, gate fragile calls behind a cheap probe:Or simpler: append
|| trueto every prescan Bash call so non-zero exits don't cascade. Trade-off: loses the actual error info if something genuinely breaks. Probe-first is cleaner.Fix B — global guidance for
tmb_*skillsAdd a new section to
tmb_project-prescan, or a new top-level rule in CLAUDE.md:This applies to bro's prescan, lazy-regen-check, and any other skill that issues exploratory Bash batches.
Acceptance
tmb_project-prescanupdated with probe-first pattern (Fix A).tmb_project-prescanbody includes the parallel-batching guidance (Fix B).Priority
Medium — doesn't break the chain (bro recovers serially), but bleeds context budget on every fresh-project run. Worth fixing before next dogfood retest.
Related
Surfaced during PR #64 Layer 3 dogfood verification. Same hypothesis applies to
tmb_lazy-regen-check(does similar exploration in fresh-DB scenarios) — audit and apply same hardening.