bro first-response latency — 18s with 21.7k token context, 0 MCP tool calls #47

@ZaxShen

Observed

During live dogfood, bro's first response took ~18 seconds for a trivial onboarding-trigger message:

@"tmb:bro (agent)" I'm Zax
⏺ tmb:bro(TMB bro entry point)
  ⎿  Done (0 tool uses · 21.7k tokens · 18s)

Two signals in that line matter:

  • 21.7k tokens of context loaded per spawn
  • 0 MCP tool uses despite "I'm Zax" being a textbook identity_set trigger (separate correctness concern — noted for follow-up, not this issue's scope)
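For tracking these two signals across dogfood runs, a small parser for the summary line may help. This is a minimal sketch: the pattern is inferred from the single line quoted above rather than from any documented format, and `parse_status` is a hypothetical helper name.

```python
import re

# Pattern inferred from one observed line, e.g.
# "Done (0 tool uses · 21.7k tokens · 18s)". Not a documented format;
# other shapes (such as minute-level durations) would need extra handling.
STATUS_RE = re.compile(
    r"Done \((?P<tools>\d+) tool uses? · "
    r"(?P<tokens>[\d.]+)(?P<kilo>k?) tokens · "
    r"(?P<secs>[\d.]+)s\)"
)

def parse_status(line: str) -> dict:
    """Extract tool-call count, token count, and seconds from a summary line."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    # A trailing "k" means thousands of tokens.
    tokens = float(m["tokens"]) * (1000 if m["kilo"] else 1)
    return {
        "tool_uses": int(m["tools"]),
        "tokens": round(tokens),
        "seconds": float(m["secs"]),
    }
```

Feeding it the line from this report yields 0 tool uses, 21700 tokens, and 18.0 seconds.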

Hypotheses for the 18s latency, ordered by expected impact

H1. Model choice — Opus 4.7 for routing is overkill (highest expected win)

  • agents/bro.md frontmatter: model: opus
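  • Routing, classification, and onboarding flow are not reasoning-heavy, so Sonnet should handle them just as well (to be confirmed against the dogfood scenarios).
  • Opus first-token latency on ~21k tokens is expected to be several times longer than Sonnet's (rough guess: ~5-10x, not yet measured). Switching is a one-line change.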
  • Estimated impact: 60-70% latency reduction. Should drop 18s → ~6s.
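The arithmetic behind the ~6s figure, as a quick sketch. The only inputs are the 18s baseline and the 60-70% estimate from this hypothesis; `projected_latency` is a hypothetical helper, and the model assumes the whole 18s shrinks uniformly, which ignores fixed costs such as spawn overhead.

```python
def projected_latency(baseline_s: float, reduction: float) -> float:
    """Latency left after cutting `reduction` (0..1) of the baseline."""
    return round(baseline_s * (1.0 - reduction), 1)

BASELINE_S = 18.0  # observed first-response latency

best = projected_latency(BASELINE_S, 0.70)   # 5.4
worst = projected_latency(BASELINE_S, 0.60)  # 7.2
print(f"projected range: {best}s to {worst}s")
```

Under these assumptions the 6s target sits inside the projected range, but the 60% end of the estimate would miss it.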

H2. Eager skill loading — 7 skills declared in frontmatter, all loaded on spawn

Bro declares all of these in skills: frontmatter:

first-run-onboarding, tmb-reonboard, lazy-regen-check,
project-prescan, branch-id-proposal, refresh-architecture, agent-creator

If CC loads them all on spawn (rather than on-trigger), that's likely 15-18k of the 21.7k tokens. A typical request only needs one of them.

  • Research question: does CC 2026 support on-demand skill loading via paths: or trigger: frontmatter? If yes, move 6 of the 7 skills to on-demand. first-run-onboarding is probably the only one that genuinely needs to load eagerly.
  • Estimated impact: 40-60% context reduction; first-token latency scales roughly with context size, so expect a similar latency win.
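A back-of-envelope check on the context estimate. This sketch assumes the seven skills are roughly equal in size, which is not established anywhere in this issue; the 15-18k skill total and the 21.7k context are the figures quoted above.

```python
TOTAL_CONTEXT = 21_700   # tokens observed per spawn
SKILL_COUNT = 7          # skills declared in frontmatter
ON_DEMAND = 6            # skills that would move to on-demand loading

def context_after(skill_tokens: int) -> tuple[int, float]:
    """Return (remaining tokens, fraction of context saved).

    Assumes all skills are the same size (an assumption, not a measurement).
    """
    saved = skill_tokens * ON_DEMAND / SKILL_COUNT
    return round(TOTAL_CONTEXT - saved), saved / TOTAL_CONTEXT

for skill_tokens in (15_000, 18_000):
    remaining, fraction = context_after(skill_tokens)
    print(f"{skill_tokens} skill tokens -> {remaining} remaining ({fraction:.0%} saved)")
```

With equal-sized skills this gives roughly 59-71% saved, at or above the top of the 40-60% estimate. The gap would close if first-run-onboarding is larger than the average skill or if the skills account for less than 15k tokens, so actual per-skill sizes are worth measuring.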

H3. Chain-of-thought block mandated at response start

Bro's prompt requires a <chain_of_thought> block before any tool call. That's a reasoning preamble every response pays.

  • For trivial asks (conversational, onboarding answers), CoT may be more cost than benefit.
  • Fix candidate: exempt onboarding mode from the CoT requirement. The prompt already says "skip it for one-liner acknowledgements or trivial lookups", but that instruction may not be honored.

H4. Agent spawn overhead itself

CC has to cold-start the agent on first invocation of a session. Some of the 18s is not bro's fault — it's CC's spawn round-trip. Hard to measure without instrumentation; likely 1-3s.

Suggested action order

  1. Switch bro to Sonnet (model: sonnet) — smallest change, largest expected win. Keep Architect on Opus (architectural reasoning actually needs it).
  2. Audit skill frontmatter for on-demand triggers — if CC supports it, only first-run-onboarding stays eager. Others load when their trigger condition hits.
  3. Measure: add timing to the next dogfood; look for Sonnet bro at or under the 6s acceptance target on the same trigger.
  4. Only if still slow: revisit CoT enforcement, prompt size, agent cold-start.
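For step 3, a minimal wall-clock timing sketch. Everything here is illustrative: `time_call` and `fake_bro_spawn` are hypothetical names, and the stand-in function only sleeps. A real harness would wrap whatever actually invokes bro. Note that this measures total response time, not first-token latency.

```python
import time
from typing import Callable

def time_call(fn: Callable[[], object]) -> tuple[object, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def fake_bro_spawn() -> str:
    """Hypothetical stand-in for the real agent invocation."""
    time.sleep(0.05)
    return "hi Zax"

result, elapsed = time_call(fake_bro_spawn)
print(f"response={result!r} elapsed={elapsed:.2f}s")
```

Running the same trigger on a cold session and again on a warm one would also give a rough handle on H4's spawn overhead.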

Acceptance criteria

  • bro first-response latency ≤ 6s on "I'm Zax" or an equivalent onboarding trigger, measured on a cold session.
  • Decision recorded on each hypothesis (kept / rejected / deferred) with evidence.
  • Sonnet bro matches Opus bro's results on SCENARIOS.md dogfood scenarios 1-6 (onboarding + routing + classification). If Sonnet fails any, revert and document why Opus is load-bearing.
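The measurable parts of these criteria could be checked mechanically, along the lines of the sketch below. The input shapes (a latency in seconds and a dict of scenario number to pass/fail) and the `evaluate` name are assumptions for illustration; how scenario results get recorded is not defined in this issue.

```python
LATENCY_BUDGET_S = 6.0            # from the first criterion
REQUIRED_SCENARIOS = range(1, 7)  # SCENARIOS.md scenarios 1-6

def evaluate(latency_s: float, scenario_passed: dict[int, bool]) -> list[str]:
    """Return a list of failure messages; an empty list means the run passes."""
    failures = []
    if latency_s > LATENCY_BUDGET_S:
        failures.append(
            f"latency {latency_s:.1f}s exceeds {LATENCY_BUDGET_S:.0f}s budget"
        )
    for n in REQUIRED_SCENARIOS:
        # A scenario with no recorded result counts as a failure.
        if not scenario_passed.get(n, False):
            failures.append(f"scenario {n} failed or missing")
    return failures

# Example: the baseline run from this report, with no scenario results recorded.
print(evaluate(18.0, {}))
```

Any scenario failure on Sonnet is the signal to revert and document why Opus is load-bearing, per the third criterion.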

Related observation (separate follow-up, not this issue's scope)

Bro made 0 tool calls on "I'm Zax", but it should have called identity_set(human_name='Zax') as part of the onboarding flow. That it didn't suggests either that the first-run-onboarding skill didn't trigger or that the rest of bro's prompt is drowning out the trigger. Worth investigating once the latency fix is in: H1 does not change context size, but if H2 cuts context by the estimated 40-60%, this may self-resolve. If it persists after the latency fix, file a separate correctness issue.

Surfaced during PR #41 dogfood.
