π§ chore: latency reduction β proposals on review#64
Merged
Conversation
Branch awaits Human approval of the latency-reduction proposals. See TMB/bro/DISCUSSION.md (workspace-local) for the full breakdown: - Latency analysis from the verified Layer 3 Mode B run that closed in #63 - 5 candidate optimizations ranked by impact-to-risk ratio - Recommended ship plan (Rounds 1β3) - Three explicit decisions requested from Zax before code lands This commit creates the branch + PR shell. Real changes ship after review approval β no template/skill/CLAUDE.md edits in this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦vial edits Implements the Round 1 latency-reduction plan from #64's DISCUSSION.md. Per Zax's review of the proposals (DISCUSSION.md in TMB workspace): ### Doctrine changes **Bro is the task gate.** Bro auto-flips a task to `status='closed'` as soon as SWE returns with `status='completed'` and a `commit_sha`. No pr-reviewer at task close. **pr-reviewer is the push gate.** It fires only at `git push` time over the batch of unsigned commits about to ship. Triggered by the `git-push-guard.sh` PreToolUse hook + the Human invoking the magic phrase `@bro review before push`. Cost amortized across multiple tasks per push instead of paid per task. **Direct Mode for trivial single-file changes.** Bro is now allowed to Edit a file directly when ALL of: single file, β€3 lines diff, no public API change, no test required, no docs/trustmybot/architecture/ touched. Logged as `direct_mode_used` in the ledger. Anything bigger falls back to the default chain. ### File changes - **CLAUDE.md** β rewrote `## Role`, `## First-action chain`, `## Code-touching asks`, `## Routing`. Dropped the bootstrap-check step (onboarding now handles template copy inline). Added `## Direct Mode` and `## Push gate` sections. Catchphrase now bound to the push gate's pass verdict. - **skills/tmb_architect-workflow/SKILL.md** β rewrote the main sequence: bro flips task β closed immediately after SWE returns. No pr-reviewer step. Added explicit instruction: emit `task_create_batch` + `Task(swe)` + `ledger_log(planning_complete)` as multiple tool_use blocks in one assistant response so CC executes them concurrently. Same change to the simple-fast-lane handoff step. - **skills/tmb_first-run-onboarding/SKILL.md** β added Step 5: silently copy templates/agents/{swe,pr-reviewer}.md + templates/skills/* into the project as part of onboarding's mandatory write sequence. No extra AskUserQuestion. Logs the same `tmb_bootstrap_complete` event for audit consistency. Frontmatter `allowed-tools` extended with `Read, Write, ledger_log`. - **skills/tmb_bootstrap/SKILL.md** β relegated to RECOVERY skill. Triggers only on the rare edge case where identity is set but `.claude/agents/swe.md` is missing (e.g. user hand-deleted between sessions). Onboarding handles the normal path now. - **templates/agents/swe.md** β dropped the chain-of-thought block. Removed `swe-checklist` from eager `skills:` (now lazy-load via Skill tool only when verification needs interpretation). Instructed to batch `task_get` + `git worktree add` in the first response so they run concurrently. Net 22 β 21 lines. - **templates/agents/pr-reviewer.md** β reframed as the push gate. Body explicitly notes it fires over a batch of unsigned tasks, not per individual task close. Skills array empty (project attaches review skills via tmb_skill-creator). - **scripts/hooks/git-push-guard.sh (NEW)** β replaces `require-review-sign.sh`. PreToolUse on Bash matching `git push`. Uses `git log @{u}..HEAD` to determine commits being pushed; queries the trajectory DB for tasks with matching commit_sha that lack a `validation_attempts.verdict='pass'` row. Blocks with a structured message naming the specific unsigned tasks and instructing `@bro review before push`. - **scripts/hooks/lib/query-task.sh** β `tmb_unsigned_tasks` updated: now queries `status='closed' AND commit_sha IS NOT NULL` (was `status='completed'`). Reflects the new doctrine where bro auto- closes immediately and pr-reviewer fires later. - **scripts/hooks/require-review-sign.sh + test** β DELETED, replaced by git-push-guard.sh. - **hooks/hooks.json** β swapped `require-review-sign.sh` β `git-push-guard.sh` in the PreToolUse Bash matcher. - **docs/PERFORMANCE.md (NEW)** β records target, baseline (#63 12-min run), Tier 1/2/3 doctrine of what's safe to trim, and the changes shipped here. Re-evaluation triggers spelled out. - **docs/architecture/FLOWS.md** β Flow 2 (simple task) sequence diagram redrawn for the new chain (parallel batched writes, bro closes inline, no pr-reviewer step). Flow 6 (PR review) reframed as the push gate flow with parallel pr-reviewer spawns over multiple unsigned tasks. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS Lego cap (β€30 lines templates/): PASS (max 26) MCP server unit + integration: PASS Hook tests (git-guards, task-spec): PASS ``` The deleted require-review-sign.sh test file went away cleanly; git- push-guard.sh has no dedicated test yet (Layer 3 dogfood will exercise it via real `git push`). ### Expected impact (estimate, to be verified in Layer 3) - Per-task time: ~12 min β ~3β5 min on simple-triage path - Direct Mode trivial edits: ~10β20s, near pure-Claude - Per-push pr-reviewer pass: ~2β3 min flat regardless of unsigned-task count (parallel spawns) Layer 3 verification next. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Author
Round 1 implementation pushed (`b2bc2ac`)Per Zax's review of `TMB/bro/DISCUSSION.md` (workspace-local), the Round 1 changes are in. PR is now ready for review. What changed (vs the original 5 proposals)
File changes
Layer 1 + 2 status: GREEN`require-review-sign.sh` test file was deleted alongside the hook. `git-push-guard.sh` has no dedicated unit test yet β its behavior is exercised by Layer 3 dogfood (next step). Expected impact
Layer 3 verification planRe-run the same `@bro write a cli todo by python` scenario from the #63 baseline in a fresh Mode B session. Compare line-by-line to the #63 timeline (`docs/PERFORMANCE.md` records the methodology). Specific scenarios to walk:
π€ Generated with Claude Code |
3 tasks
β¦cascade (closes #66) Observed in PR #64 Layer 3 dogfood on a fresh empty repo: bro fired 4β5 parallel Bash exploratory calls during tmb_project-prescan, one of them (`git log` on an empty repo, exit 128) failed, and CC's parallel runtime cancelled the entire batch. Bro retried serially; the second batch contained another fragile call which also got cancelled. Net cost: ~9 wasted tool-uses + meaningful subagent context burn. The bug is prompt design, not platform β CC's cancel-on-sibling-failure behavior is a fact of life. The fix is to never batch fragile-by-design calls with healthy ones in the same response. ### tmb_project-prescan rewrite - Hard-rule section added at the top of the procedure: explains the cancel-cascade pathology and lists the two safe patterns (probe-first vs `|| true` defang). - Procedure split into three explicit phases: - Phase 1 β probe state serially (git rev-parse HEAD, dir existence checks). Cheap, β€2s, always single-call. - Phase 2 β batched read fan-out, gated on probe results. Fragile calls (git log, ls of missing dirs) only fire when the probe says they'll succeed. - Phase 3 β MCP queries (always safe to batch β they return null on miss instead of exiting non-zero). - Glob calls remain in their own block (Glob never errors). ### CLAUDE.md global rule Added a "Parallel-batching safety" section under Code-touching asks. Lists the four most common fragile commands (`git log`, `ls <dir>`, `find <path>`, `git diff @{u}..`) and the two safe patterns. Notes that Glob, Grep, and MCP tool calls are safe to batch (they don't fail with non-zero exits). This guidance applies to bro and any subagent firing exploratory Bash batches. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS Lego cap (β€30 lines templates/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` ### Expected impact Fresh-project prescan: 0 batch cancellations (down from 2). Bro completes prescan in ~1 batch instead of 2β3 retry rounds. Bonus: less context burn in subagent budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closed
Observed in PR #64 Layer 3 dogfood: onboarding's Step 5 copied BOTH swe.md AND pr-reviewer.md eagerly into .claude/agents/. That contradicts the "pr-reviewer is the push gate, not the task gate" doctrine β pr-reviewer shouldn't appear in the project until the first time it's actually needed (at push time), not at onboarding. User reported this as "still automatically creates pr-reviewer" after quitting Mode B mid-test. ### Mental model alignment | Agent | First needed at | Copy timing | |---|---|---| | swe.md | First task | Onboarding (eager) β first task could fire any moment | | pr-reviewer.md | First `git push` | Lazy β first push-gate trigger only | | consultants (architect, cto, ceo, pm) | First "@bro get the X opinion" | Lazy β first request | ### Changes - **skills/tmb_first-run-onboarding/SKILL.md** β Step 5 rewritten: - Removed pr-reviewer.md from the eager copy list - Removed review-protocol/SKILL.md and review-findings/SKILL.md (those are pr-reviewer's skills, copied alongside pr-reviewer when it's finally copied) - Kept the 5 swe-side skills (swe-checklist, code-quality, docs-conventions, git-conventions, naming-conventions) - Status message now mentions "pr-reviewer will be set up automatically the first time you push" so the user knows what to expect - **CLAUDE.md** β Push gate flow gains an explicit Step 1: lazy-copy pr-reviewer.md + review-protocol + review-findings from templates if they're missing. No AskUserQuestion (the Human invoked the push-review flow β that's the consent boundary). Logs as `tmb_pr_reviewer_lazy_copied` ledger event. - **CLAUDE.md** β "Templates shipped" table updated: pr-reviewer.md row now reads "First time the push gate fires (`@bro review before push`) β lazy, not at onboarding." - **CLAUDE.md** β "templates/skills/" paragraph clarifies which skills are swe-side vs pr-reviewer-side and when each gets copied. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS Lego cap (β€30 lines templates/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` ### Expected behavior after this fix Fresh project after onboarding completes: ``` .claude/ βββ agents/ β βββ swe.md # only swe β pr-reviewer NOT copied βββ skills/ βββ code-quality/ βββ docs-conventions/ βββ git-conventions/ βββ naming-conventions/ βββ swe-checklist/ ``` pr-reviewer.md + review-protocol + review-findings appear later, only after the first `@bro review before push` fires. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mode A retest in /tmp/tmb-scratch landed the file copies correctly but left no ledger row for tmb_bootstrap_complete. The chain still worked end-to-end (~2:16 wall-clock, 5Γ faster than the #63 baseline) but the audit trail lost the "onboarding ran here" anchor that downstream skills + tests rely on. Root cause: in the prior version of Step 5, the ledger_log call was described after the user-visible status message, framed as "Then log the copy event:" β bro's LLM treated that as a courtesy step instead of mandatory and skipped it once the human-facing line was emitted. ### Changes - **skills/tmb_first-run-onboarding** β Step 5 split into 5a (file copy), 5b (REQUIRED ledger_log + ledger_list verify, framed as "not optional" with a dedicated header), 5c (status message). Mandatory MCP write sequence at the top of the skill renumbered to include all 6 items (was "ALL FOUR" β stale after Step 5 was added). Closing message now explicitly waits for ledger_list to confirm the audit row landed. Frontmatter allowed-tools extended with ledger_list. - **tests/lint/onboarding-skill-contract.sh** β added 4 new assertions to catch this regression class: - skill must reference `tmb_bootstrap_complete` - skill must reference `ledger_log` (the write tool) - skill must reference `ledger_list` (the verify tool) - skill must contain the literal "not optional" so the LLM can't read the ledger step as courtesy ### Layer 1 + 2 status ``` Onboarding contract lint: PASS (4 new assertions added) Lego cap (β€30 lines templates/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` Next Mode A run should produce a `tmb_bootstrap_complete` ledger row within onboarding, before the closing message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦icult; bro verification non-negotiable Per Zax review of latency proposals (DISCUSSION.md): ### #1 β Split tmb_architect-workflow (443 lines) into two narrower skills Bro on the simple-fast-lane was loading 443 lines of skill prose to use ~80 lines of it (defaults table + waiver guidance + handoff). Most of the file is difficult-path content (env probe, ambiguous-choices list, RED FLAG examples, blueprint format) that simple-path bro never uses. Split: - **skills/tmb_planning-simple** (114 lines): defaults table, scope-gate waiver, batched-handoff hard rule, bro verification protocol, escalate triggers, trivial template. The whole simple-triage protocol. - **skills/tmb_planning-difficult** (210 lines): triage confirmation, env probe, grounded Q+A rounds, scope-ambiguity HARD RULE + worked examples, ADR + decision capture, standard template, bro verification protocol. - **skills/tmb_architect-workflow** (deleted) β the combined file. Bro loads exactly one of the two based on the triage decision (which it already makes per `tmb_branch-id-proposal`). Simple-path planner reads ~25% of what it used to. ### #2 β Hard-rule prose for batched handoff Both new planning skills mark the post-triage tool-call batch as a HARD RULE: emit `task_create_batch` + `Task(swe)` + `discussion_append(routing note)` + `ledger_log(planning_complete)` as multiple tool_use blocks in ONE assistant response. Prior runs split these across 3+ messages, costing ~30s of round-trip latency per task. ### #3 β Bro verification protocol (lean but mandatory) Per Zax's hard constraint ("bro should never skip verification"), both planning skills include a "Bro verification protocol β never skip this" section that bro runs after SWE returns and BEFORE flipping to closed: 1. **V1 β Pull spec + diff**: `task_get` + `git diff <commit_sha>~1..<sha>` 2. **V2 β Three checks (all required)**: - Files match `## Files` - `## Verification` commands re-run and pass - Each `## Success Criteria` bullet visibly met in the diff 3. **V3 β Decide**: all-pass β `task_update_status(status='closed')` (+ `issue_close` batched in same response if last task). Any-fail β log to discussion, re-spawn SWE with feedback (max 3) or escalate. This is the **task gate** β distinct from pr-reviewer which is the **push gate** (deeper style/security checks at git push time). ### Other refs swept - CLAUDE.md routing + first-action chain rewritten to load the right planning skill per triage; new explicit "bro verification is non-negotiable" section. - README.md workflow line updated. - skills/tmb_branch-id-proposal cross-references updated. - docs/architecture/FILES.md tree refreshed. - docs/architecture/FLOWS.md tables + Flow 2 sequence diagram updated. - tests/lint/onboarding-skill-contract.sh β old architect-workflow assertions split across the two new files; added new assertions for the verification step ("Bro verification protocol", "never skip", "task gate") on both files. - tests/mcp-integration/scope-gate.test.mjs comment refresh. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS (24 assertions across 4 skills) Lego cap (β€30 lines templates/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` ### Expected impact The 1m 34s "planning gap" measured in the prior Mode A run came from bro loading + processing 443 lines of architect-workflow before the batched write. Cutting that to 114 lines for the simple path should drop the gap by ~30-50s. Combined with the hard-rule batching prose forcing bro to emit fewer total messages (target: 2 instead of 3), simple-triage planning should land in ~30-45s of bro work plus SWE/Human time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦ll templates Per Zax's review of d5b762e: the template skills shipped from `templates/skills/` still carried "Architect plans / Architect spawns pr-reviewer / Architect must revert" wording from the pre-bro-as-planner era. These templates get copied into every TMB project during onboarding (per `tmb_first-run-onboarding` Step 5), so the stale doctrine was leaking into every install. ### Changes per template - **git-conventions** β full rewrite of the gates section. Two real gates now: (1) Bro task gate (auto-runs after SWE returns, per planning skills' verification protocol), (2) PR-reviewer push gate (fires on `git push` to protected branches via `git-push-guard.sh`), (3) Human merge gate (final eyeball on GitHub diff). Old "Architect spawns pr-reviewer at gate 1/2" wording removed. Source-code authorship check now references `tasks` row + `commit_sha` correlation, with a Direct Mode exception that cross-checks the `direct_mode_used` ledger event. - **docs-conventions** β frontmatter `agent:` updated (architect β bro). Enforcement section: "PR Reviewer flags... The Architect escalates" β "PR-reviewer flags at push time. Bro escalates during planning." Editing-prompts section: "any agent touching prompt files β usually architect" β "bro (planning + Direct Mode docs) or SWE (when task spec names a markdown file)". "route via architect β SWE" β "route via bro β SWE (the standard task chain)". - **naming-conventions** β frontmatter `agent: swe, architect` β `swe, pr-reviewer`. Architect doesn't load this; pr-reviewer does at push gate. - **code-quality** β Description "Used by Architect (design-time)" β "Used by bro (design-time)". Per-section "Design-time questions (Architect)" β "(bro)" across all 4 categories. - **review-findings** β "Architect β consult before writing task files" β "Bro β consult before authoring task_create_batch spec_body". ### Why this matters Template skills are inert plugin content until onboarding copies them into `<project>/.claude/skills/`. Every project that runs onboarding inherits these files verbatim. The Lego rule says we never edit them post-copy, so ANY stale doctrine in the templates becomes permanent project doctrine unless the user notices and edits manually. Catching this before the next Mode B run avoids re-baking the rot. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS Lego cap (β€30 lines templates/agents/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` No new lint coverage added β these are template skills (no automated contract assertions), but they're verified by Layer 3 (a fresh onboarding now copies clean skills into the project). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 25, 2026
Closed
Closed
Supersedes the v0.2.0 placeholder tag (to be revoked on main-merge). ### Versions aligned - `.claude-plugin/plugin.json`: 0.3.2 β **0.1.0** - `mcp/trajectory-server/package.json`: 0.2.0 β **0.1.0** ### CHANGELOG.md Replaced the "Unreleased β first public release pending" placeholder with the comprehensive v0.1.0 entry summarizing everything that shipped across PRs #41 β #64: - bro-as-persona + planner + task gate doctrine - pr-reviewer-as-push-gate + git-push-guard.sh hook - Lego templates (6 agents in templates/agents/, plugin's agents/ empty) - planning-simple / planning-difficult split (was tmb_architect-workflow) - bundled SQLite trajectory MCP server (~30 tools) - requireRoles middleware on every workflow-state mutation - Direct Mode for trivial single-file edits - Three-layer test infrastructure (lint / MCP-integration / dogfood) - Documented latency budget in docs/PERFORMANCE.md - Documented "known not-done" linking the 4 main backlog items ### Post-merge release plan After this PR merges to dev + dev β main: 1. `git tag -d v0.2.0` 2. `git push origin :refs/tags/v0.2.0` 3. `gh release delete v0.2.0` 4. `git tag -a v0.1.0 main -m "..."` 5. `git push origin v0.1.0` 6. `gh release create v0.1.0 --notes-file CHANGELOG.md` Layer 1 + 2 tests still PASS. No code/skill/agent changes β only version bumps + release notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 25, 2026
ZaxShen
added a commit
that referenced
this pull request
Apr 25, 2026
PERFORMANCE.md was 2/3 PR-cycle ephemera (PR #63 baseline timeline, "changes shipped in #64" table, "verification pending" TODO) and 1/3 evergreen doctrine (target latency band, Tier 1/2/3 trim guidance, re-eval triggers). The ephemera was rotting on every perf cycle. Moved the evergreen 1/3 into CONTRIBUTING.md Β§ Performance. Deleted the doc. Filed #75 as a standing re-baseline issue β opens when a re-eval trigger fires, closes when the timing posts back to green. Updated references: - README.md β links to CONTRIBUTING.md#performance - CLAUDE.md β same - FILES.md β drops the doc from the file map - CHANGELOG.md β notes the relocation in v0.1.2 and updates the v0.1.0 historical references so they're not broken links Doctrine durable in CONTRIBUTING.md. Measurements ephemeral in issues + git. No more standing doc that grows stale. Closes the perf-doc-staleness side of the v0.1.2 cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
STATUS: DRAFT β proposals on Human review, no code shipped yet.
Follow-up to #63 (verified Layer 3 dogfood at ~12min wall-clock for a CLI todo, vs ~32s for pure Claude). This PR is the placeholder for the latency-reduction work; actual changes land after Zax approves the proposals in
TMB/bro/DISCUSSION.md(workspace-local β visible here only as the embedded summary below).Latency breakdown (from #63 Layer 3 verified run)
Five candidate optimizations (ranked by impact-to-risk ratio)
simpletriageProposed ship plan
What's on Zax's plate before this PR ships code
Full reasoning + arguments for/against per proposal:
TMB/bro/DISCUSSION.mdat workspace level.π€ Generated with Claude Code