bro first-response latency — 18s with 21.7k token context, 0 MCP tool calls #47

## Observed

During live dogfood, bro's first response took **~18 seconds** for a trivial onboarding-trigger message:

```
@"tmb:bro (agent)" I'm Zax
⏺ tmb:bro(TMB bro entry point)
  ⎿  Done (0 tool uses · 21.7k tokens · 18s)
```

Two signals in that line matter:
- **21.7k tokens** of context loaded per spawn
- **0 MCP tool uses** despite "I'm Zax" being a textbook `identity_set` trigger (separate correctness concern — noted for follow-up, not this issue's scope)

## Hypotheses for the 18s latency, ordered by expected impact

### H1. Model choice — Opus 4.7 for routing is overkill (highest expected win)

- `agents/bro.md` frontmatter: `model: opus`
- Routing, classification, and onboarding flow are not reasoning-heavy — Sonnet handles them just as well.
- Opus first-token latency on ~21k tokens is estimated at ~5-10x Sonnet's (not yet measured). Switching is a one-line change.
- **Estimated impact:** 60-70% latency reduction. Should drop 18s → ~6s.
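
As a sanity check on the estimate above, the arithmetic can be sketched in Python. The helper name `projected_latency` is made up for illustration; the 18s baseline and the 60-70% range come from this issue, and the reduction itself is still unmeasured.

```python
def projected_latency(baseline_s: float, reduction: float) -> float:
    """Latency left after removing `reduction` (a fraction in [0, 1]) of the baseline."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return baseline_s * (1.0 - reduction)


BASELINE_S = 18.0  # observed in the dogfood run above

# Estimated (not measured) reduction range from H1.
low = projected_latency(BASELINE_S, 0.70)   # optimistic end of the range
high = projected_latency(BASELINE_S, 0.60)  # pessimistic end of the range
print(f"projected: {low:.1f}s to {high:.1f}s")  # projected: 5.4s to 7.2s
```

The ~6s figure sits inside that range, but it is only reached if the reduction is at least two-thirds, so the pessimistic end of H1 alone would not hit it.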

### H2. Eager skill loading — 7 skills declared in frontmatter, all loaded on spawn

Bro declares all of these in `skills:` frontmatter:
```
first-run-onboarding, tmb-reonboard, lazy-regen-check,
project-prescan, branch-id-proposal, refresh-architecture, agent-creator
```

If CC loads them all on spawn (rather than on-trigger), that's likely 15-18k of the 21.7k tokens. A typical request only needs one of them.

- **Research question:** does CC 2026 support on-demand skill loading via `paths:` or `trigger:` frontmatter? If yes, move 6 of the 7 skills to on-demand. `first-run-onboarding` is probably the only one that's genuinely eager.
- **Estimated impact:** 40-60% context reduction. First-token latency scales roughly with context size, so expect a similar latency win.
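
To replace the 15-18k guess with numbers, a rough per-skill size audit could look like the sketch below. Everything here is an assumption for illustration: the `skills/<name>/SKILL.md` layout, the helper names, and the 4-characters-per-token heuristic, which is not the model's real tokenizer.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # ASSUMPTION: crude heuristic, not the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def skill_budget(skill_texts: dict[str, str]) -> list[tuple[str, int]]:
    """Return (skill name, estimated tokens) pairs, largest first."""
    sizes = [(name, estimate_tokens(body)) for name, body in skill_texts.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def load_skills(root: Path, names: list[str]) -> dict[str, str]:
    """Read skill bodies from disk, skipping any that are missing.

    ASSUMPTION: each skill lives at <root>/<name>/SKILL.md. Adjust to the
    repo's actual layout.
    """
    texts = {}
    for name in names:
        path = root / name / "SKILL.md"
        if path.is_file():
            texts[name] = path.read_text(encoding="utf-8")
    return texts


# Skill names as listed in bro's frontmatter above.
BRO_SKILLS = [
    "first-run-onboarding", "tmb-reonboard", "lazy-regen-check",
    "project-prescan", "branch-id-proposal", "refresh-architecture",
    "agent-creator",
]

if __name__ == "__main__":
    for name, tokens in skill_budget(load_skills(Path("skills"), BRO_SKILLS)):
        print(f"{name:24s} ~{tokens} tokens")
```

Sorting largest-first makes it easy to see which skills are worth moving to on-demand loading first, if CC turns out to support it.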

### H3. Chain-of-thought block mandated at response start

Bro's prompt requires a `<chain_of_thought>` block before any tool call. That's a reasoning preamble every response pays.

- For trivial asks (conversational, onboarding answers), CoT may be more cost than benefit.
- **Fix candidate:** exempt onboarding mode from the CoT requirement. The prompt already says "skip it for one-liner acknowledgements or trivial lookups", but that exemption may not be honored.

### H4. Agent spawn overhead itself

CC has to cold-start the agent on first invocation of a session. Some of the 18s is not bro's fault — it's CC's spawn round-trip. Hard to measure without instrumentation; likely 1-3s.

## Suggested action order

1. **Switch bro to Sonnet** (`model: sonnet`) — smallest change, largest expected win. Keep Architect on Opus (architectural reasoning actually needs it).
2. **Audit skill frontmatter for on-demand triggers** — if CC supports it, only `first-run-onboarding` stays eager. Others load when their trigger condition hits.
3. **Measure** — add timing to the next dogfood; look for Sonnet bro under 5s on the same trigger.
4. **Only if still slow:** revisit CoT enforcement, prompt size, agent cold-start.
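
For the measurement step, a minimal timing harness might look like this. It is a sketch only: `fake_bro_call` is a placeholder, since this issue does not say how bro is invoked programmatically, and the real call would need to start a cold session each run to match the acceptance criterion.

```python
import statistics
import time
from typing import Callable


def time_calls(send: Callable[[], object], runs: int = 5) -> list[float]:
    """Call `send` `runs` times and return the elapsed seconds for each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Reduce raw samples to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_bro_call() -> None:
    # PLACEHOLDER: stands in for sending "I'm Zax" to bro in a fresh session.
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(time_calls(fake_bro_call, runs=3)))
```

Reporting the median alongside min and max keeps one slow cold start from being mistaken for the typical case.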

## Acceptance criteria

- [ ] bro first-response latency ≤ 6s on "I'm Zax" or equivalent onboarding trigger, measured on cold session.
- [ ] Decision recorded on each hypothesis (kept / rejected / deferred) with evidence.
- [ ] Sonnet bro produces the same correctness on SCENARIOS.md dogfood scenarios 1-6 (onboarding + routing + classification). If Sonnet fails any, revert and document why Opus is load-bearing.
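
The three criteria above could be checked mechanically with something like the following sketch. The `Decision` type and `unmet_criteria` helper are hypothetical names, not part of the repo; the 6s target, the kept / rejected / deferred outcomes, and scenarios 1-6 are taken from the list above.

```python
from dataclasses import dataclass

LATENCY_TARGET_S = 6.0
HYPOTHESES = ("H1", "H2", "H3", "H4")
ALLOWED_OUTCOMES = {"kept", "rejected", "deferred"}
SCENARIOS = range(1, 7)  # SCENARIOS.md dogfood scenarios 1-6


@dataclass
class Decision:
    outcome: str   # one of ALLOWED_OUTCOMES
    evidence: str  # link or note backing the outcome


def unmet_criteria(
    latency_s: float,
    decisions: dict[str, Decision],
    scenario_passed: dict[int, bool],
) -> list[str]:
    """Return a description of every acceptance criterion not yet met."""
    problems = []
    if latency_s > LATENCY_TARGET_S:
        problems.append(f"latency {latency_s:.1f}s exceeds {LATENCY_TARGET_S:.0f}s")
    for name in HYPOTHESES:
        decision = decisions.get(name)
        if (
            decision is None
            or decision.outcome not in ALLOWED_OUTCOMES
            or not decision.evidence.strip()
        ):
            problems.append(f"{name} has no recorded decision with evidence")
    for number in SCENARIOS:
        if not scenario_passed.get(number, False):
            problems.append(f"scenario {number} did not pass on Sonnet")
    return problems
```

An empty list means the issue can be closed; any scenario failure is the signal to revert and document why Opus is load-bearing.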

## Related observation (separate follow-up, not this issue's scope)

The 0 tool uses on "I'm Zax": bro should have called `identity_set(human_name='Zax')` as part of the onboarding flow. That it didn't suggests either the `first-run-onboarding` skill didn't trigger or bro's prompt is drowning out the trigger. Worth investigating once the latency fix is in: if H2 reduces context by ~60%, this may self-resolve (H1 changes the model, not the context size). If it persists at low latency, file a separate correctness issue.

Surfaced during PR #41 dogfood.
