Observed
During live dogfood, bro's first response took ~18 seconds for a trivial onboarding-trigger message:
@"tmb:bro (agent)" I'm Zax
⏺ tmb:bro(TMB bro entry point)
⎿ Done (0 tool uses · 21.7k tokens · 18s)
Two signals in that line matter:
- 21.7k tokens of context loaded per spawn
- 0 MCP tool uses despite "I'm Zax" being a textbook
identity_set trigger (separate correctness concern — noted for follow-up, not this issue's scope)
Hypotheses for the 18s latency, ordered by expected impact
H1. Model choice — Opus 4.7 for routing is overkill (highest expected win)
agents/bro.md frontmatter: model: opus
- Routing, classification, and onboarding flow are not reasoning-heavy — Sonnet handles them just as well.
- Opus first-token latency on 21k tokens is ~5-10x longer than Sonnet. Switching is a one-line change.
- Estimated impact: 60-70% latency reduction. Should drop 18s → ~6s.
H2. Eager skill loading — 7 skills declared in frontmatter, all loaded on spawn
Bro declares all of these in skills: frontmatter:
first-run-onboarding, tmb-reonboard, lazy-regen-check,
project-prescan, branch-id-proposal, refresh-architecture, agent-creator
If CC loads them all on spawn (rather than on-trigger), that's likely 15-18k of the 21.7k tokens. A typical request only needs one of them.
- Research question: does CC 2026 support on-demand skill loading via
paths: or trigger: frontmatter? If yes, move 6 of the 7 skills to on-demand. first-run-onboarding is probably the only one that's genuinely eager.
- Estimated impact: 40-60% context reduction; first-token latency scales with context so expect similar perf win.
H3. Chain-of-thought block mandated at response start
Bro's prompt requires a <chain_of_thought> block before any tool call. That's a reasoning preamble every response pays.
- For trivial asks (conversational, onboarding answers), CoT may be more cost than benefit.
- Fix candidate: exempt onboarding mode from CoT requirement ("skip it for one-liner acknowledgements or trivial lookups" is already in the prompt, but may not be honored).
H4. Agent spawn overhead itself
CC has to cold-start the agent on first invocation of a session. Some of the 18s is not bro's fault — it's CC's spawn round-trip. Hard to measure without instrumentation; likely 1-3s.
Suggested action order
- Switch bro to Sonnet (
model: sonnet) — smallest change, largest expected win. Keep Architect on Opus (architectural reasoning actually needs it).
- Audit skill frontmatter for on-demand triggers — if CC supports it, only
first-run-onboarding stays eager. Others load when their trigger condition hits.
- Measure — add timing to the next dogfood; look for Sonnet bro under 5s on the same trigger.
- Only if still slow: revisit CoT enforcement, prompt size, agent cold-start.
Acceptance criteria
Related observation (separate follow-up, not this issue's scope)
The 0 tool uses on "I'm Zax" — bro should have called identity_set(human_name='Zax') as part of the onboarding flow. That it didn't suggests first-run-onboarding skill didn't trigger, OR bro's prompt is drowning out the trigger. Worth investigating once the latency fix is in — if H1/H2 reduce context by 60%, this may self-resolve. If it persists at low-latency, file a separate correctness issue.
Surfaced during PR #41 dogfood.
Observed
During live dogfood, bro's first response took ~18 seconds for a trivial onboarding-trigger message:
Two signals in that line matter:
identity_settrigger (separate correctness concern — noted for follow-up, not this issue's scope)Hypotheses for the 18s latency, ordered by expected impact
H1. Model choice — Opus 4.7 for routing is overkill (highest expected win)
agents/bro.mdfrontmatter:model: opusH2. Eager skill loading — 7 skills declared in frontmatter, all loaded on spawn
Bro declares all of these in
skills:frontmatter:If CC loads them all on spawn (rather than on-trigger), that's likely 15-18k of the 21.7k tokens. A typical request only needs one of them.
paths:ortrigger:frontmatter? If yes, move 6 of the 7 skills to on-demand.first-run-onboardingis probably the only one that's genuinely eager.H3. Chain-of-thought block mandated at response start
Bro's prompt requires a
<chain_of_thought>block before any tool call. That's a reasoning preamble every response pays.H4. Agent spawn overhead itself
CC has to cold-start the agent on first invocation of a session. Some of the 18s is not bro's fault — it's CC's spawn round-trip. Hard to measure without instrumentation; likely 1-3s.
Suggested action order
model: sonnet) — smallest change, largest expected win. Keep Architect on Opus (architectural reasoning actually needs it).first-run-onboardingstays eager. Others load when their trigger condition hits.Acceptance criteria
Related observation (separate follow-up, not this issue's scope)
The 0 tool uses on "I'm Zax" — bro should have called
identity_set(human_name='Zax')as part of the onboarding flow. That it didn't suggestsfirst-run-onboardingskill didn't trigger, OR bro's prompt is drowning out the trigger. Worth investigating once the latency fix is in — if H1/H2 reduce context by 60%, this may self-resolve. If it persists at low-latency, file a separate correctness issue.Surfaced during PR #41 dogfood.