Skip to content

πŸ›οΈ feat: Lego templates β€” plugin ships zero global agents#63

Merged
ZaxShen merged 1 commit into
devfrom
feat/lego-templates
Apr 25, 2026
Merged

πŸ›οΈ feat: Lego templates β€” plugin ships zero global agents#63
ZaxShen merged 1 commit into
devfrom
feat/lego-templates

Conversation

@ZaxShen
Copy link
Copy Markdown
Contributor

@ZaxShen ZaxShen commented Apr 25, 2026

Summary

Implements #60 β€” the Lego templates restructure.

Plugin ships ZERO global agents. Bro is the only persona (in CLAUDE.md). Every other agent (swe, pr-reviewer, architect, cto, ceo, pm) is a Lego template that bro copies into <project>/.claude/agents/ on demand, with explicit Human approval. Project customization happens by extending the agent's skills: array via tmb_skill-creator β€” never by editing the agent body.

Lego rule

Layer Mutability
Template body Immutable β€” bro copies verbatim, never edits
skills: array Additive β€” extended by tmb_skill-creator
Spawn prompt Ephemeral β€” fresh per call

What's in this PR

New templates (6 agents, ≀25 lines body each)

templates/agents/ β€” swe, pr-reviewer, architect, cto, ceo, pm. Each is identity + role + boundary + skills: []. The 30-line Lego cap is enforced by tests/lint/agent-line-budget.sh.

New skill templates (7 default skills)

templates/skills/ β€” swe-checklist, review-protocol, review-findings, code-quality, docs-conventions, git-conventions, naming-conventions. Bootstrap copies these into the project alongside swe + pr-reviewer.

tmb_ prefix on all 14 plugin protocol skills

Plus 2 NEW protocol skills:

  • tmb_bootstrap β€” first-run template copy (with Human approval)
  • tmb_skill-creator β€” generates project skills, extends agent skills: arrays (NEVER edits agent body)

tmb_agent-creator rewritten with template-copy mode (PRIMARY) + from-scratch fallback.

CLAUDE.md updates

  • Bootstrap-check added to bro's first-action chain
  • Code-touching chain references tmb_* skill names
  • Routing table now describes the template-copy-on-demand flow
  • New 'Templates shipped with the plugin' section enumerates all 6

MCP role enforcement (still in place from #59)

  • task_create_batch, issue_create, issue_close β†’ bro only
  • task_update_status β†’ bro + swe
  • validation_record β†’ pr-reviewer only
  • discussion_append β†’ bro + swe + pr-reviewer + any consultant

Layer 1 + 2 status: GREEN

  • Onboarding contract lint: PASS
  • Lego cap (≀30 lines on templates/agents/): PASS β€” all ≀25 lines
  • MCP server unit tests: 0 failures
  • MCP integration suite: 0 failures
  • Hook tests: PASS

Test plan

  • Layer 1 lint
  • Layer 2 MCP unit + integration + hook tests
  • Layer 3 dogfood β€” see tests/manual/scenarios.md. Specifically:
    • 1.1 (onboarding) β€” should NOW also trigger tmb_bootstrap after onboarding completes (project gets swe + pr-reviewer + default skills copied)
    • 2.1.bro (canonical simple-task smoke test) β€” should work end-to-end with the project-local swe + pr-reviewer
    • C.1 (consultant invocation) β€” should now trigger tmb_agent-creator template-copy mode for architect/cto/ceo/pm

Followup

πŸ€– Generated with Claude Code

Major restructure to the agent-factory model. The plugin now ships ONLY
bro (a persona in CLAUDE.md) and a set of templates + skills. Every other
agent β€” swe, pr-reviewer, architect, cto, ceo, pm β€” is a Lego template that
bro copies into <project>/.claude/agents/ on demand.

### Lego philosophy

> Agents are studs. Skills are bricks. Spawn-prompts are assembly
> instructions. You don't edit a Lego block β€” you snap blocks together.

Three-layer composition rule, never confused:

| Layer | Mutability |
|---|---|
| Template body (`templates/agents/<name>.md`) | **Immutable** β€” bro copies verbatim, never edits |
| `skills:` frontmatter array on the project copy | **Additive** β€” extended by `tmb_skill-creator`, never replaced |
| Spawn prompt | **Ephemeral** β€” fresh per Task-tool invocation |

### Templates shipped

`templates/agents/` β€” 6 minimal Lego blocks (each ≀25 lines body, enforced
by `tests/lint/agent-line-budget.sh` 30-line cap):

- swe.md β€” executor, atomic close
- pr-reviewer.md β€” pre-commit gate
- architect.md β€” system-design consultant (analysis-only)
- cto.md β€” technical strategy consultant
- ceo.md β€” product scope consultant
- pm.md β€” product strategy consultant

`templates/skills/` β€” 7 default skill bag copied alongside swe + pr-reviewer
during bootstrap (swe-checklist, review-protocol, review-findings,
code-quality, docs-conventions, git-conventions, naming-conventions).

### Plugin protocol skills β€” `tmb_` prefix on all 16

To prevent name collisions with project-local skills, every plugin-shipped
skill now has the `tmb_` prefix. Project-local skills stay unprefixed and
live under `<project>/.claude/skills/`. Renamed:

- agent-creator β†’ tmb_agent-creator (template-copy mode + scratch fallback)
- architect-workflow β†’ tmb_architect-workflow
- branch-id-proposal β†’ tmb_branch-id-proposal
- create-hook β†’ tmb_create-hook
- feedback-loop β†’ tmb_feedback-loop
- first-run-onboarding β†’ tmb_first-run-onboarding
- lazy-regen-check β†’ tmb_lazy-regen-check
- project-prescan β†’ tmb_project-prescan
- refresh-architecture β†’ tmb_refresh-architecture
- roundtable β†’ tmb_roundtable
- roundtable-cleanup β†’ tmb_roundtable-cleanup
- swe-spawn-workflow β†’ tmb_swe-spawn-workflow
- tmb-reonboard β†’ tmb_reonboard (hyphen β†’ underscore for consistency)
- validate-swe-output β†’ tmb_validate-swe-output

Plus two NEW protocol skills:

- **tmb_bootstrap** β€” first-run flow. Bro detects `.claude/agents/swe.md`
  missing β†’ AskUserQuestion approval β†’ copies templates verbatim into
  the project. Logs to ledger. Idempotent (skips files that already exist).
- **tmb_skill-creator** β€” generates a new project-local skill and extends
  the relevant agent's `skills:` array. NEVER edits the agent body β€” the
  only allowed mutation to a project agent is appending to its skills
  list. Hard rule.

### tmb_agent-creator: two modes

Rewritten with clear branching:

- **Template-copy mode (PRIMARY)** β€” when the requested name matches a
  shipped template (`templates/agents/<name>.md` exists), copy verbatim
  with Human approval. The 4 consultant templates (architect/cto/ceo/pm)
  always trigger this mode.
- **From-scratch mode (FALLBACK)** β€” when no template matches (e.g.
  `legal-reviewer`, `security-reviewer`), draft a Lego-shaped prompt
  with Human approval and write it.

Reserved names tightened to `bro`, `swe`, `pr-reviewer` (architect is no
longer reserved β€” it's a common consultant template).

### CLAUDE.md updates

- Added bootstrap-check to bro's first-action chain (after onboarding,
  before resume).
- Code-touching chain now references `tmb_*` skill names.
- Routing table explicitly says "check .claude/agents/<name>.md, if absent
  invoke tmb_agent-creator (template-copy if available, scratch otherwise)".
- New "Templates shipped with the plugin" section enumerates the 6 templates
  and their copy triggers.

### Layout

```
plugin/
β”œβ”€β”€ agents/                    ← EMPTY (.gitkeep only; bro is in CLAUDE.md)
β”œβ”€β”€ templates/
β”‚   β”œβ”€β”€ agents/{swe,pr-reviewer,architect,cto,ceo,pm}.md
β”‚   └── skills/{swe-checklist,review-*,code-quality,*-conventions,naming-conventions}/
β”œβ”€β”€ skills/                    ← all `tmb_*` plugin protocol skills
└── CLAUDE.md
```

### Tests

- `tests/lint/agent-line-budget.sh` rewritten β€” checks plugin/agents/ is
  empty and templates/agents/*.md within the 30-line Lego cap.
- `tests/lint/onboarding-skill-contract.sh` paths updated to `tmb_*`.
- All MCP integration + unit tests still pass after the renames.

### Layer 1 + 2 status: GREEN

```
Onboarding contract lint: PASS
Lego cap on templates/agents: PASS (all ≀25 lines)
MCP server unit tests: 0 failures
MCP integration suite: 0 failures
Hook tests: PASS
```

### Followup

- #61 β€” open question on agent-shared skill bags (per-agent listing wins for now)
- #62 β€” open question on splitting templates into a separate marketplace channel

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZaxShen
Copy link
Copy Markdown
Contributor Author

ZaxShen commented Apr 25, 2026

Layer 3 dogfood verification β€” end-to-end PASS

Walked the canonical scenario in fresh Mode B (`/tmp/tmb-smoke`) against this branch. First time the bro→swe→pr-reviewer chain has completed end-to-end with a clean lifecycle.

Trigger

Wiped `/tmp/tmb-smoke`, fresh `git init`, fresh Mode B session, fresh DB. Single ask: `@bro write a cli todo by Python`.

Trajectory log (from MCP DB monitor)

```
19:29:53 identity ZaxShen written
19:29:56 config branching_model="github-flow", pr_target="main"
19:29:56 config protected_branches=["main"] ← raw JSON array, NOT "[\"main\"]" β€” config_set defense fix verified
19:54:43 issue 1 created (objective: "Build a Python CLI todo app")
19:55:13 discussion 2 (note): "Beginning planning on branch_id feat/cli-todo, triage: simple"
← updated tmb_branch-id-proposal wording (was "Routed to architect" pre-cleanup)
19:55:30 ledger 1 (scope_gate_waived): "simple-triage personal CLI: defaulted to argparse + unittest..."
← simple-fast-lane waiver fired correctly
19:55:31 task 1 (feat/cli-todo, pending) created via task_create_batch
19:55:34 ledger 2 (planning_complete)
──────── 51s planning phase total ────────
19:59:10 ledger 3 (tmb_bootstrap_complete): "Copied 2 agent templates (swe, pr-reviewer) + 7 skill templates"
← bro detected missing .claude/agents/swe.md, invoked tmb_bootstrap, AskUserQuestion approved, copied verbatim
20:03:48 task 1 β†’ completed (SWE atomic close with commit_sha)
← worktree at .claude/worktrees/agent-acb891639c9f96486, committed:
── caeb93d ✨ feat(cli-todo): minimal argparse + JSON-store todo CLI
── files: todo.py, test_todo.py, README.md
20:06:44 validation 1 (task_id=1, attempt_n=1, verdict=pass) ← pr-reviewer signed off
20:06:44 task 1 β†’ closed (bro flipped status atomically)
20:08:11 issue 1 β†’ closed (bro called issue_close after all tasks closed)
```

Design checkpoints β€” all green

Checkpoint Status
Onboarding on fresh DB (identity + 3 config writes) βœ“
`protected_branches` stored as raw `["main"]` array (not double-encoded `"[\"main\"]"` β€” config_set defense) βœ“
Updated routing-note wording ("Beginning planning on..." not "Routed to architect on...") βœ“
Simple-fast-lane: no env probe, no ceremonial Q+A, defaults table picked βœ“
Scope-gate waiver with valid reason logged to ledger βœ“
`tmb_bootstrap` auto-invoked on missing `.claude/agents/swe.md` βœ“
Template copy verbatim (2 agents + 7 skills, no body edits) βœ“
Broβ†’swe spawn finds project-local agent βœ“
Worktree isolation + atomic commit + status='completed' βœ“
Broβ†’pr-reviewer spawn + `validation_record(verdict='pass')` βœ“
Bro flips task β†’ `closed` after pass verdict βœ“
Bro flips issue β†’ `closed` after all tasks closed βœ“
Zero `forbidden` events (MCP `requireRoles` enforcement clean) βœ“
Every state transition logged to ledger (replayable) βœ“

Lifecycle final state

```
issue.status: open β†’ closed
task.status: pending β†’ completed β†’ closed
verdict: pass (pr-reviewer)
ledger events: scope_gate_waived β†’ planning_complete β†’ tmb_bootstrap_complete
```

Every state machine ended in a terminal position. No orphan `running` tasks. No missing verdicts. No abandoned ledger entries.

Comparison to baselines

Run Total time Outcome
Original `bro→architect→swe` chain (pre-#58) 7+ minutes Hung at SWE spawn, never completed, Ctrl-C didn't interrupt
Pure Claude Code on the same ask (no plugin) ~32s Code generated, no audit, no governance
This run (this PR's Lego architecture) ~12min total (one-time bootstrap + chain), `~8.5min` projected for subsequent tasks in same project Closed cleanly with full audit + role enforcement

The runtime is meaningfully heavier than pure Claude β€” multi-agent + audit trail + gates aren't free β€” but the chain now actually completes, every transition is replayable from SQLite, and consultants can never write workflow state by construction. Subsequent tasks in the same project skip bootstrap entirely.

What this verifies for #60

  • βœ… Plugin ships zero global agents (`agents/` is empty)
  • βœ… `tmb_bootstrap` skill copies templates verbatim, no body edits
  • βœ… All `tmb_*` skill renames work β€” no name collisions, all references resolve
  • βœ… Bro's first-action chain correctly invokes `tmb_bootstrap` when `.claude/agents/swe.md` absent
  • βœ… Lego rule enforced in practice: agent files copied verbatim from templates, no edits
  • βœ… Layer 3 dogfood confirms what Layers 1 + 2 asserted statically

Open follow-ups (not blockers for this PR)

  • The 3.5min interactive bootstrap pause is wall-clock from the AskUserQuestion approval round-trip; that's user-bound, not chain-bound.
  • pr-reviewer ran in ~3min including bro's wrapper time; could potentially be tightened in a future iteration but doesn't block.
  • No `forbidden` events surfaced because the chain stayed in role; the negative-path tests in role-matrix.test.mjs cover the rejection case at Layer 2.

Recommend merging. Lego templates work end-to-end in dogfood.

@ZaxShen ZaxShen self-assigned this Apr 25, 2026
@ZaxShen ZaxShen merged commit 6715f89 into dev Apr 25, 2026
1 check passed
@ZaxShen ZaxShen deleted the feat/lego-templates branch April 25, 2026 03:14
ZaxShen added a commit that referenced this pull request Apr 25, 2026
…vial edits

Implements the Round 1 latency-reduction plan from #64's DISCUSSION.md.
Per Zax's review of the proposals (DISCUSSION.md in TMB workspace):

### Doctrine changes

**Bro is the task gate.** Bro auto-flips a task to `status='closed'` as
soon as SWE returns with `status='completed'` and a `commit_sha`. No
pr-reviewer at task close.

**pr-reviewer is the push gate.** It fires only at `git push` time over
the batch of unsigned commits about to ship. Triggered by the
`git-push-guard.sh` PreToolUse hook + the Human invoking the magic
phrase `@bro review before push`. Cost amortized across multiple tasks
per push instead of paid per task.

**Direct Mode for trivial single-file changes.** Bro is now allowed to
Edit a file directly when ALL of: single file, ≀3 lines diff, no public
API change, no test required, no docs/trustmybot/architecture/ touched.
Logged as `direct_mode_used` in the ledger. Anything bigger falls back
to the default chain.

### File changes

- **CLAUDE.md** β€” rewrote `## Role`, `## First-action chain`,
  `## Code-touching asks`, `## Routing`. Dropped the bootstrap-check
  step (onboarding now handles template copy inline). Added `## Direct
  Mode` and `## Push gate` sections. Catchphrase now bound to the push
  gate's pass verdict.

- **skills/tmb_architect-workflow/SKILL.md** β€” rewrote the main
  sequence: bro flips task β†’ closed immediately after SWE returns.
  No pr-reviewer step. Added explicit instruction: emit
  `task_create_batch` + `Task(swe)` + `ledger_log(planning_complete)`
  as multiple tool_use blocks in one assistant response so CC executes
  them concurrently. Same change to the simple-fast-lane handoff step.

- **skills/tmb_first-run-onboarding/SKILL.md** β€” added Step 5: silently
  copy templates/agents/{swe,pr-reviewer}.md + templates/skills/* into
  the project as part of onboarding's mandatory write sequence. No
  extra AskUserQuestion. Logs the same `tmb_bootstrap_complete` event
  for audit consistency. Frontmatter `allowed-tools` extended with
  `Read, Write, ledger_log`.

- **skills/tmb_bootstrap/SKILL.md** β€” relegated to RECOVERY skill.
  Triggers only on the rare edge case where identity is set but
  `.claude/agents/swe.md` is missing (e.g. user hand-deleted between
  sessions). Onboarding handles the normal path now.

- **templates/agents/swe.md** β€” dropped the chain-of-thought block.
  Removed `swe-checklist` from eager `skills:` (now lazy-load via Skill
  tool only when verification needs interpretation). Instructed to
  batch `task_get` + `git worktree add` in the first response so they
  run concurrently. Net 22 β†’ 21 lines.

- **templates/agents/pr-reviewer.md** β€” reframed as the push gate.
  Body explicitly notes it fires over a batch of unsigned tasks, not
  per individual task close. Skills array empty (project attaches
  review skills via tmb_skill-creator).

- **scripts/hooks/git-push-guard.sh (NEW)** β€” replaces
  `require-review-sign.sh`. PreToolUse on Bash matching `git push`.
  Uses `git log @{u}..HEAD` to determine commits being pushed; queries
  the trajectory DB for tasks with matching commit_sha that lack a
  `validation_attempts.verdict='pass'` row. Blocks with a structured
  message naming the specific unsigned tasks and instructing
  `@bro review before push`.

- **scripts/hooks/lib/query-task.sh** β€” `tmb_unsigned_tasks` updated:
  now queries `status='closed' AND commit_sha IS NOT NULL` (was
  `status='completed'`). Reflects the new doctrine where bro auto-
  closes immediately and pr-reviewer fires later.

- **scripts/hooks/require-review-sign.sh + test** β€” DELETED, replaced
  by git-push-guard.sh.

- **hooks/hooks.json** β€” swapped `require-review-sign.sh` β†’
  `git-push-guard.sh` in the PreToolUse Bash matcher.

- **docs/PERFORMANCE.md (NEW)** β€” records target, baseline (#63 12-min
  run), Tier 1/2/3 doctrine of what's safe to trim, and the changes
  shipped here. Re-evaluation triggers spelled out.

- **docs/architecture/FLOWS.md** β€” Flow 2 (simple task) sequence
  diagram redrawn for the new chain (parallel batched writes, bro
  closes inline, no pr-reviewer step). Flow 6 (PR review) reframed as
  the push gate flow with parallel pr-reviewer spawns over multiple
  unsigned tasks.

### Layer 1 + 2 status

```
Onboarding contract lint:           PASS
Lego cap (≀30 lines templates/):    PASS (max 26)
MCP server unit + integration:      PASS
Hook tests (git-guards, task-spec): PASS
```

The deleted require-review-sign.sh test file went away cleanly; git-
push-guard.sh has no dedicated test yet (Layer 3 dogfood will exercise
it via real `git push`).

### Expected impact (estimate, to be verified in Layer 3)

- Per-task time: ~12 min β†’ ~3–5 min on simple-triage path
- Direct Mode trivial edits: ~10–20s, near pure-Claude
- Per-push pr-reviewer pass: ~2–3 min flat regardless of unsigned-task count (parallel spawns)

Layer 3 verification next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request Apr 25, 2026
Mode A retest in /tmp/tmb-scratch landed the file copies correctly but
left no ledger row for tmb_bootstrap_complete. The chain still worked
end-to-end (~2:16 wall-clock, 5Γ— faster than the #63 baseline) but the
audit trail lost the "onboarding ran here" anchor that downstream skills
+ tests rely on.

Root cause: in the prior version of Step 5, the ledger_log call was
described after the user-visible status message, framed as "Then log
the copy event:" β€” bro's LLM treated that as a courtesy step instead of
mandatory and skipped it once the human-facing line was emitted.

### Changes

- **skills/tmb_first-run-onboarding** β€” Step 5 split into 5a (file copy),
  5b (REQUIRED ledger_log + ledger_list verify, framed as "not optional"
  with a dedicated header), 5c (status message). Mandatory MCP write
  sequence at the top of the skill renumbered to include all 6 items
  (was "ALL FOUR" β€” stale after Step 5 was added). Closing message now
  explicitly waits for ledger_list to confirm the audit row landed.
  Frontmatter allowed-tools extended with ledger_list.

- **tests/lint/onboarding-skill-contract.sh** β€” added 4 new assertions
  to catch this regression class:
  - skill must reference `tmb_bootstrap_complete`
  - skill must reference `ledger_log` (the write tool)
  - skill must reference `ledger_list` (the verify tool)
  - skill must contain the literal "not optional" so the LLM can't read
    the ledger step as courtesy

### Layer 1 + 2 status

```
Onboarding contract lint:           PASS (4 new assertions added)
Lego cap (≀30 lines templates/):    PASS
MCP server unit + integration:      PASS
Hook tests:                         PASS
```

Next Mode A run should produce a `tmb_bootstrap_complete` ledger row
within onboarding, before the closing message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request Apr 25, 2026
PERFORMANCE.md was 2/3 PR-cycle ephemera (PR #63 baseline timeline,
"changes shipped in #64" table, "verification pending" TODO) and
1/3 evergreen doctrine (target latency band, Tier 1/2/3 trim guidance,
re-eval triggers). The ephemera was rotting on every perf cycle.

Moved the evergreen 1/3 into CONTRIBUTING.md Β§ Performance. Deleted
the doc. Filed #75 as a standing re-baseline issue β€” opens when a
re-eval trigger fires, closes when the timing posts back to green.

Updated references:
- README.md β†’ links to CONTRIBUTING.md#performance
- CLAUDE.md β†’ same
- FILES.md β†’ drops the doc from the file map
- CHANGELOG.md β†’ notes the relocation in v0.1.2 and updates the
  v0.1.0 historical references so they're not broken links

Doctrine durable in CONTRIBUTING.md. Measurements ephemeral in issues +
git. No more standing doc that grows stale.

Closes the perf-doc-staleness side of the v0.1.2 cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant