Skip to content

πŸ“ dogfood setup: SCENARIOS.md + bun workspaces + doc fixes (closes #40, #42)#41

Merged
ZaxShen merged 55 commits into
devfrom
docs/40-dogfood-scenarios
Apr 25, 2026
Merged

πŸ“ dogfood setup: SCENARIOS.md + bun workspaces + doc fixes (closes #40, #42)#41
ZaxShen merged 55 commits into
devfrom
docs/40-dogfood-scenarios

Conversation

@ZaxShen
Copy link
Copy Markdown
Contributor

@ZaxShen ZaxShen commented Apr 24, 2026

Summary

One PR covering everything needed to actually run the dogfood flow from a fresh clone:

  1. docs/architecture/SCENARIOS.md β€” 30+ manual test scenarios mapped to the 9 flows in FLOWS.md. Each scenario has verbatim trigger prompts, prerequisites, expected behavior, and verification SQL. All four roundtable corner cases (explicit/implicit Γ— planners-present/absent) are covered. Closes docs: dogfood test scenarios β€” trigger prompts mapped to FLOWS.mdΒ #40.
  2. Bun workspaces at plugin root β€” root package.json declares mcp/trajectory-server and monitors as a workspace. One bun install at plugin root now resolves both subpackages and produces a single consolidated bun.lock. Closes Use Bun workspaces at plugin root + gitignore dogfood-generated pathsΒ #42.
  3. .gitignore dogfood paths β€” docs/trustmybot/ excluded so dogfood-from-inside-repo runs never pollute commits.
  4. .github/workflows/test.yml β€” simplified to three root-level steps (install / build / tests) now that workspaces work at root.
  5. docs/local-testing.md PLUGIN_PATH fix β€” replaced the copy-paste trap (/absolute/path/to/trustmybot-plugin) with export PLUGIN_PATH="$(pwd)" and a sanity-check echo. Hit during live dogfood: users were launching CC with the literal placeholder and getting zero plugins loaded.
  6. Manifest fixes (plugin.json, marketplace.json, hooks.json, .mcp.json) β€” every manifest validated cleanly against claude plugin validate; MCP server now actually loads.
  7. Monitor script fix + removal of monitors.json β€” stops the misleading "Monitor stream ended" noise; issue Reintroduce monitors/ as a proper long-running streaming daemonΒ #44 files the proper streaming rewrite.
  8. Rename gatekeeper β†’ bro β€” closes the handle half of User entry path to gatekeeper: enforce the routing + shorten the mention handle (@bro)Β #36; drops the vestigial bro_name schema column since bro is fixed plugin branding, not user-configurable.
  9. Brand voice: "Trust me bro, it works" β€” catchphrase wired into onboarding bookends + bro's prompt, gated behind real integration-test evidence (per-session discussion).
  10. No-yes-man discipline β€” bro escalates concerns via architect spawn prompt, never contradicts the Human directly; architect has explicit independent authority to surface concerns via discussion_append.

Why combined

Everything here is "make the dogfood flow runnable end-to-end" β€” test plan + infra + manifest correctness + agent behavior fixes surfaced by the dogfood itself. Splitting into 10 PRs would just create serial-merge churn with no review benefit.

Test plan

  • bun install --frozen-lockfile at plugin root resolves both subpackages.
  • bun run build at plugin root compiles the MCP server successfully.
  • bash tests/run-all.sh: 235 MCP + 16 hook + 4 agent-budget lint β€” all green.
  • claude plugin validate <path> passes for marketplace.json, plugin.json, hooks.json, .mcp.json.
  • Headless smoke: claude --plugin-dir ... -p "list tmb: agents" returns all 4 agents; plugin:tmb:trajectory-server: connected; config_set returns structured role-enforcement error as designed.
  • CI workflow matches local (single install at root, no per-package working-directory).
  • docs/trustmybot/ gitignored β€” confirmed no dogfood pollution.
  • grep -ri gatekeeper β†’ 0 hits; grep -ri bro_name β†’ 0 hits.

Follow-up issues surfaced during this PR

Closes #40
Closes #42

ZaxShen and others added 3 commits April 23, 2026 20:35
…oses #40)

For each workflow in FLOWS.md, the verbatim user prompt that should
trigger it + the observable expected behavior + the verification
query/file-check to confirm it landed. The manual test grid for
the plugin until automated agent tests exist.

What's covered (~30 scenarios across 9 flows)

- Flow 1 (onboarding): 4 β€” fresh-DB trigger, hold-and-resume of code-
  touching ask during onboarding, read-only ask interleaved, explicit
  re-onboard phrase
- Flow 2 (simple task): 3 β€” typo, comment, internal refactor with no
  API change
- Flow 3 (difficult task): 3 β€” new public API, schema/dependency
  change, cross-cutting concern
- Flow 4 (agent-creator): 4 β€” request, approve, refuse, reserved-name
  refusal
- Flow 5 (skill creation): 1 β€” mostly internal architect flow; noted
  as harder to deterministically trigger
- Flow 6 (PR review): 1 β€” auto-fired in flows 2/3; verification query
- Flow 7 (architecture regen): 4 β€” explicit phrase + 3 lazy variants
  (>25 commits / ≀25 commits / first-ever session)
- Flow 8 (SWE retry/escalation): 2 β€” single fail + retry, 3 fails +
  escalation
- Flow 9 (roundtable): 4 β€” the four corners
  β€’ 9.1 explicit magic word + β‰₯2 planners (canonical)
  β€’ 9.2 explicit magic word + only architect (skill escalates,
    proposes agent-creator)
  β€’ 9.3 implicit cross-domain question + β‰₯2 planners (architect
    detects, invokes roundtable without user request)
  β€’ 9.4 implicit cross-domain + only architect (architect proposes
    creating planners first via flow 4)

Format

Each scenario has: ID, prerequisites, trigger prompt (verbatim),
expected behavior (numbered observations), verification (SQL or
filesystem check), pass/fail checkbox.

Two cross-indices at the top: by trigger style (magic word /
implicit cross-domain / on-demand domain agent / etc.) and by flow.

Cross-links

- FLOWS.md header now references SCENARIOS.md as a companion doc
- docs/local-testing.md replaces its 10-row inline checklist with
  a one-paragraph pointer + a quick smoke-test table; full grid
  lives in SCENARIOS.md
- docs/architecture/FILES.md tree includes SCENARIOS.md
- README.md "Architecture reference" lists SCENARIOS.md

Closes #40

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
closes #42)

Two coupled fixes surfaced during a dogfood attempt that ran bun install
at the plugin repo root and failed β€” there was no root package.json.

Workspaces
- New plugin/package.json declares `workspaces: ["mcp/trajectory-server",
  "monitors"]`. One `bun install` at plugin root installs every
  subpackage; dedups `better-sqlite3` that both subpackages share.
- Root `scripts.build` runs `bun --filter='*' run build` β€” every
  workspace with a build script rebuilds. Root `scripts.test` runs
  `bash tests/run-all.sh`.
- Subpackage `bun.lock` files deleted (mcp/trajectory-server + monitors);
  replaced by a single root `bun.lock` (26 KB).

Gitignore dogfood-generated paths
- Add `docs/trustmybot/` to .gitignore. The plugin's own contributor
  docs live at `docs/architecture/`, not `docs/trustmybot/`. If a
  contributor dogfoods inside this repo (instead of a scratch project),
  architect/SWE would create `docs/trustmybot/snapshots/`,
  `docs/trustmybot/architecture/auto/`, etc. as if this were a user
  project β€” those artifacts should never land in commits.
- The plugin-root .gitignore's existing `.claude/` rule already covers
  the runtime DB + worktrees path.

Docs updated
- docs/local-testing.md Prereqs section: one-command `bun install` +
  `bun run build` at the plugin repo root. Two other places that said
  `cd plugin/mcp/trajectory-server && bun run build` trimmed the same
  way.

CI simplified
- .github/workflows/test.yml: three install/build/test steps that each
  set `working-directory: mcp/trajectory-server` collapsed into a
  single root-level `bun install --frozen-lockfile`, `bun run build`,
  and `bash tests/run-all.sh` (picks up MCP + hook + lint suites).

Verified
- `bun install` at plugin root installs 260 packages, produces one
  consolidated lockfile.
- `bun run build` at plugin root compiles MCP server successfully
  (monitors has no build; `bun --filter` skips it cleanly).
- `bash tests/run-all.sh`: 235 MCP + 16 hook + 4 agent-budget lint, all
  green.

Closes #42

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users were copy-pasting `/absolute/path/to/trustmybot-plugin` literally and
launching Claude Code with a bogus `--plugin-dir`, so no plugin loaded and
`@gatekeeper` had nothing to bind to.

Replace the placeholder with `$(pwd)` captured from the plugin repo root,
with a sanity-check echo and a fallback note for users who run from elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZaxShen ZaxShen changed the title πŸ“ docs: SCENARIOS.md β€” dogfood test plan with trigger prompts (closes #40) πŸ“ dogfood setup: SCENARIOS.md + bun workspaces + doc fixes (closes #40, #42) Apr 24, 2026
ZaxShen and others added 16 commits April 23, 2026 23:58
…h CC plugin spec

`claude plugin validate` surfaced three hard manifest violations that kept
--plugin-dir from loading any of our agents:

1. plugin.json
   - `repository` was an object; spec wants a string URL
   - `dependencies` had npm-style object form; spec wants an array (removed
     for now β€” no installed plugin uses deps yet, and our deps were soft refs)
   - `provides` was not in the spec at all (agents auto-discover from
     agents/ dir, same as every other plugin)
   - Added `homepage` and `keywords` to match the common shape

2. marketplace.json
   - Missing required `owner: { name }` object
   - `source` must end in `/`
   - Top-level `description` moved under `metadata.description`
   - Removed the plugin entry's `version` (version lives on plugin.json)

3. hooks/hooks.json
   - Must be wrapped in `{ "hooks": {...} }`; we had the event map at the root
   - Hook command paths now use `${CLAUDE_PLUGIN_ROOT}/...` for portability
     (matches superpowers and other first-party plugins)

Both `claude plugin validate` runs now pass. Full test suite unchanged:
235 MCP + 16 hook + 4 agent-budget lint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs were blocking the trajectory MCP server from ever loading:

1. Wrong top-level key β€” we had `"servers"`, Claude Code expects `"mcpServers"`
   (matches every official first-party plugin .mcp.json). The wrong key was a
   silent no-op: no error, no prompt, the server just never got spawned, so
   agents could never call issue_create / config_set / identity_set and no
   trajectory.db was ever created. Dogfood proved this out β€” `/mcp` listed
   zero plugin-owned servers.

2. Relative args path β€” `mcp/trajectory-server/dist/index.js` resolved
   against whatever cwd CC happened to be in, which is the scratch project,
   not the plugin clone. Now uses `${CLAUDE_PLUGIN_ROOT}` (the same
   substitution hooks.json now uses) so the launcher always finds the
   built entry regardless of where the user is.

Closes the "why doesn't trajectory.db exist" half of the dogfood thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
File had a node shebang but 0644 permissions, so CC's monitor subsystem
spawned it directly (via `command`) and got exit 126 / permission denied.
Every other plugin script (hooks/*.sh) has the executable bit recorded in
git; the monitor just slipped through.

chmod +x locally and `git update-index --chmod=+x` to bake the executable
bit into the tree so a fresh clone works without manual intervention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n repo

User hit this: ran  in the plugin clone thinking
it was the right place. Harmless (gitignored), but the instruction
didn't say which dir to run it in. Now spells it out and adds a
reassurance note for the plugin-clone case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bstitution caveat

Mode B referenced $PLUGIN_PATH inside a CC slash-command (/plugin marketplace
add $PLUGIN_PATH) but never exported it in its own step list, AND even if
exported CC's slash-commands don't expand shell vars β€” so users would paste
the literal string "$PLUGIN_PATH" and get a broken marketplace add.

Fix:
1. Add an explicit Step 1 that exports PLUGIN_PATH from the plugin repo
   (same pattern Mode A uses).
2. Tell the user to substitute the absolute path into the slash-command
   themselves β€” CC doesn't do shell interpolation inside /plugin commands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User wants to select-and-paste the whole setup at once instead of
hopping between three "Step N" chunks. Keep the two code fences (shell
vs. slash-commands) since they run in different shells, but drop the
stepped framing. Caveat about $PLUGIN_PATH not expanding in slash-commands
is now an inline note between the blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…board)

User's valid gripe: pasting the whole Mode B block means claude launches
and the TUI immediately scrolls the echo output off-screen before you can
copy the path into the slash-command.

Fix:
- Stash the path at /tmp/tmb-plugin-path so it survives the claude launch.
- macOS: also pipe into pbcopy so paste (Cmd-V) just works inside CC.
- Other platforms: mention `!cat /tmp/tmb-plugin-path` which CC interprets
  as an inline shell command β€” surfaces the path without leaving the session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User is tired of the copy-paste trap iterations. The 3-step version is
the lowest-cognitive-load option: copy the pwd output once with your
eyes, paste it twice (shell cd, CC slash-command). No file stash, no
clipboard, no shell-variable caveats inside CC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ritance

Mode B referenced $PLUGIN_PATH inside the `/plugin marketplace add` slash
command but never told the user to set the variable. Fix by adding the same
cd + export steps Mode A has, and a one-liner explaining that CC inherits
the shell environment so the slash command resolves $PLUGIN_PATH cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Our monitor script is a one-shot poller: reads new ledger rows, prints any
failed/escalated/retry-exhausted events, exits 0. CC's monitor subsystem
reports every process exit as "Monitor stream ended" in the chat, which
looks like a failure to users doing a dogfood run. Verified during live
testing today: the MCP server is βœ” connected and responding normally;
the "stream ended" message is pure UX noise from the monitor subsystem.

Since no other first-party plugin ships a monitors.json and `"when":"always"`
has no confirmed meaning in the current CC monitor API, remove the manifest
to stop the noise. Keep the .js script + package.json so we can revive
this as a proper long-running streaming daemon later.

File an issue to reintroduce with a real streaming implementation when the
monitor subsystem's contract is clearer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mode B had one code block with step headers as shell comments, which
reads as one monolithic copy-paste. Split into Step 1 (prose) β†’ bash
block, Step 2 (prose) β†’ bash block, so each step is visually distinct
and independently runnable.
… onboarding

bro is a branding choice, not a user setting. Prompting the Human to rename
their gatekeeper on every fresh project was both (a) extra ceremony on a
moment they just want to get through, and (b) a branding leak β€” half the
projects call the gatekeeper "alex", half call it "bro", and the onboarding
stops being a memorable shared UI.

Changes:
- first-run-onboarding: drop the "what would you like to call me" question.
  Keep the human_name question only. MCP call is now identity_set(human_name=…);
  gatekeeper_name falls through to the schema default of 'bro'.
- tmb-reonboard: symmetric β€” no rename prompt, no gatekeeper_name in the
  current/final summary, no gatekeeper_name in identity_set.
- Explicitly forbid inventing an honorific ("master", "boss", "friend")
  when the Human declines to share a name. Plain second-person is cleanest.
- FLOWS.md mermaid: drop the second name exchange from the onboarding
  sequence diagram.
- local-testing.md: clarify that gatekeeper_name stays at its schema default.

No MCP surface changes β€” identity_set still accepts gatekeeper_name if a
downstream script wants it; we just don't offer it via the onboarding UX.
All 235 MCP + 16 hook + 4 agent-budget lint tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Rationale (pre-release, no tech-debt concerns):**

The agent was named `gatekeeper` (role) but ran a persona called `bro` (brand).
That split caused recurring confusion:

- `@gatekeeper` was 11 characters to type vs. the 3-character persona the
  user actually saw in conversation (issue #36 Concern 2).
- Every document had to explain the two-name relationship.
- The `identity.gatekeeper_name` column treated bro's name as user-configurable
  branding, which is wrong: bro IS the brand. It's not a knob.
- Onboarding asked the Human "what would you like to call me?" β€” ceremony
  that burned a question on a non-choice.

Since the plugin is pre-release and has no migration concern, a clean
rename is cheaper than dragging two names forever.

**Changes:**

- Renamed `agents/gatekeeper.md` β†’ `agents/bro.md` (mention handle becomes
  `@bro`, which closes the handle half of issue #36).
- Bulk rename `gatekeeper β†’ bro`, `Gatekeeper β†’ Bro` across all prose,
  code identifiers, test names, MCP caller_role values, and KNOWN_ROLES.
- Dropped the `identity.bro_name` column (was `gatekeeper_name`) from the
  schema. `identity_get` / `identity_set` / `identity_reset` no longer
  expose it. Bro is a fixed string in prompts, not a DB field.
- Simplified the onboarding welcome message (`Hey, I'm bro β€” your <role>`
  had become `your bro` after naive rename, a redundancy; replaced with
  a clean self-introduction).

**Files touched:** 42 including agent prompts, every skill, all docs
(CLAUDE.md, README, CONTRIBUTING, ERD, FLOWS, FILES, SCENARIOS,
local-testing, CONFIG_KEYS), MCP server source, middleware, and every
test file that referenced the old name.

**Verification:**
- `grep -ri gatekeeper` in source: 0 hits.
- `grep -ri bro_name` in source: 0 hits.
- 235 MCP + 16 hook + 4 agent-budget lint tests pass.
- Headless smoke test: `claude --plugin-dir … -p "list tmb: agents"`
  returns `tmb:bro`, `tmb:architect`, `tmb:swe`, `tmb:pr-reviewer`.

Closes #36 handle concern (renaming part). Enforcement (routing) stays
open as a separate concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README tagline pulled up as a blockquote right under the H1 so anyone
  landing on the repo sees it first.
- Onboarding welcome opens with the catchphrase; onboarding closing signs
  off with it β€” the two natural bookends of the first conversation.
- bro's agent prompt gets a short Catchphrase section: deploy at completed
  hand-offs and confident routing calls; never on failures or clarifying
  questions. Swagger, not reassurance.

Catchphrase placement is deliberately thin β€” dilution kills a tagline, so
it shows up at ~2 predictable beats per session instead of being appended
to every response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"Trust me bro, it works" is only self-aware humor if it follows real
verification. Applied to unverified code it just becomes the exact
behavior the meme mocks β€” an AI agent claiming correctness with no
evidence. That undercuts the whole brand bet.

New rule: catchphrase is only earned on a code-delivery hand-off when
BOTH conditions hold:
  1. pr-reviewer recorded validation_record(verdict='pass') for every
     completed task in the issue.
  2. Integration tests (not unit-only, not lint) actually executed
     against the delivered change and passed.

If the project has no integration tests, the slogan is not earned
and bro just says "done".

Onboarding bookends remain as the only no-evidence use β€” there it's
a meta-tagline referencing the plugin itself, not a claim about
shipped code.

Removed the earlier "confident routing call" carve-out: routing
decisions are too soft to back with the slogan. Swagger only after
the tests run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…shback

Adds the third pillar of agent-harness design β€” productive dissent β€”
alongside the two we already implement (responsibility boundaries +
requirements alignment).

Two failure modes this avoids:
- Sycophantic agreement: bro capitulates to anything the Human says,
  plans proceed with known flaws.
- Direct contradiction: bro pushes back personally, triggers the
  defensiveness people naturally feel about their own plans,
  conversation stalls.

The structural fix is to separate intent-capture from intent-evaluation:

**bro.md** β€” added a fifth role bullet. Bro's job is faithful capture;
concerns go into the architect spawn prompt as `concern: <why>`, never
argued with the Human directly. Bro is not a debate partner.

**architect.md** β€” sharpened objective #2. Architect is the technical
check on the Human's plan, with independent authority. Concerns
(bro-forwarded or architect-originated) surface via discussion_append
BEFORE writing tasks. Explicitly: never rubber-stamp a plan the
architect believes is flawed.

The Human now sees two independent reads (theirs + architect's) routed
through a structural channel, not a confrontation. Triangulation, not
conflict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt param

Two compounding bugs meant every bro MCP call during dogfood was narrated,
never executed β€” '0 tool uses Β· 21.7k tokens' across the session:

1. **Subagents had no MCP tools in their registry.** CC plugin agents must
   explicitly list `mcp__<server>__<tool>` names in `tools:` frontmatter.
   Wildcards (`mcp__plugin_tmb_trajectory-server__*`) don't propagate to
   subagents. Tested both: wildcard produces empty tool list, explicit
   names produce the right registry.

2. **MCP inputSchema didn't declare `agent` parameter.** The server's
   requireRoles middleware reads `args['agent']` to enforce role-based
   access. When the LLM sent `{agent: 'bro', ...}`, the schema validator
   stripped it (wasn't declared) β†’ `caller_role: 'unknown'` β†’ forbidden.
   Tests passed because they bypass schema by calling handlers directly.

Fix:

- **agents/bro.md, architect.md, swe.md, pr-reviewer.md**: list all MCP
  tools each agent is actually allowed to call per server-side role policy,
  explicitly. Added one-line MCP Caller Identity section to each prompt
  instructing the LLM to include `agent='<role>'` in every call.
- **skills/{first-run-onboarding, tmb-reonboard, lazy-regen-check,
  project-prescan, branch-id-proposal, refresh-architecture}/SKILL.md**:
  extended `allowed-tools` with the specific MCP tools each skill needs.
- **mcp/trajectory-server/src/tools/index.ts**: new `decorateWithAgent()`
  helper injects `agent: enum[bro|architect|swe|pr-reviewer]` into every
  tool's inputSchema at registration time. One place, applies to all 36
  tools, no per-tool edits.

Verified live:
  identity_set(agent='bro', human_name='Zax') β†’
    {"human_name":"Zax","created_at":"2026-04-24T...","updated_at":"..."}
  identity_get(agent='bro') β†’
    {"human_name":"Zax","created_at":"2026-04-24T...","updated_at":"..."}

This unblocks PR #41's end-to-end dogfood: bro can now actually persist
onboarding state via MCP instead of narrating the calls as prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen and others added 3 commits April 24, 2026 12:05
…-RPC protocol

User feedback caught the real process failure: I had them dogfooding (Layer 3)
without Layer 2 in place, so they burned hours chasing bugs Layer 2 would
have found in milliseconds.

Adds a proper MCP integration layer:

**tests/mcp-integration/harness.mjs** β€” spawns `node dist/index.js` as a
subprocess, speaks the real JSON-RPC stdio protocol via the MCP SDK client,
exposes `listTools()` and `call(name, args)` helpers that return structured
ok/error/data so tests read like production call sites.

**tests/mcp-integration/schema-contract.test.mjs** β€” verifies every tool's
inputSchema declares the `agent` parameter with the four-role enum. This is
the test that would have caught the 18s / 0-tool-uses disaster in PR #41:
the schema validator was stripping `agent` because no tool declared it,
collapsing every call's caller_role to 'unknown'.

**tests/mcp-integration/role-matrix.test.mjs** β€” 10 tests covering every
tool that currently wraps its handler with `requireRoles`:
  - identity_set (bro only)
  - identity_reset (bro only)
  - config_set (bro, architect)
  - file_registry_upsert (architect, bro)
  - file_registry_delete (architect, bro)
  - issue_snapshot_md (architect, pr-reviewer)
  - discussion_append (workflow agents + scope)
  - architecture_regen (architect, bro, pr-reviewer)
  - regen_state_set (architect, bro, pr-reviewer)

Each test verifies three conditions per tool: missing `agent` β†’ forbidden,
wrong `agent` β†’ forbidden, right `agent` β†’ success.

**Wiring:**
- New suite runs from tests/run-all.sh between Layer 1 (unit) and Hook tests.
- `@modelcontextprotocol/sdk` added as root devDependency for the harness.
- CI picks it up via the existing `bun run test` entry point.

**Tool schema fix surfaced while wiring:**
Changed `decorateWithAgent()` property-spread order so the decorator's
`agent` definition takes precedence over tool-specific declarations. Several
tools (issues.ts, tasks.ts) had their own `agent: { type: 'string' }`
without the four-role enum, shadowing the canonical schema.

**Gaps surfaced (filed as separate issue):**
Only 9 of 36 tools currently wrap handlers with `requireRoles`. Tasks,
validations, issues, ledger, audit, and skill tools are all unprotected β€”
any caller including 'unknown' can write through them. The worst offender
is `validation_record`, which controls the push-block hook's unlock
condition. Scoped to a follow-up issue so this PR doesn't spiral.

Full suite now: 235 Layer 1 unit + 10 Layer 2 integration + 16 hook
+ 4 agent-budget lint, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… press-enter defaults

CC interactive input can't send empty input, so every \"press enter to keep
default\" in the skills was broken (only plan-mode accepts bare Enter).
Text multi-choice also forced digit-matching (\"1\" β†’ github-flow) and made
the flow feel 2010.

Switched both onboarding skills to use \`AskUserQuestion\` β€” a proper radio
form with per-option descriptions, a pre-selected default, and an auto-added
\`Other\` for free text.

**first-run-onboarding/SKILL.md**
- One batched \`AskUserQuestion\` call with 3 questions (name, branching,
  PR target) β€” Human answers all in one form, bro persists all via MCP.
- Branching labels lead with structure, brand in parens:
  - \"Trunk + feature branches (GitHub Flow)\"
  - \"Trunk + develop + releases (Git Flow)\"
  - \"Custom workflow\"
  This solves the \"github-flow vs gitflow β€” which is which?\" confusion.
- Name question offers \"Anonymous\" + pre-populated inferred name (if the
  Human introduced themselves inline); Other covers everything else.
- Custom branching path uses a second batched call for the protected-
  branches follow-up.

**tmb-reonboard/SKILL.md**
- Same batched flow. First option in each question is \`Keep \"<current>\"\`
  β€” click once to preserve, or pick an alternative.
- \"Anonymous\" on the name question maps to \`identity_reset\`.
- Dedupe logic for when the current value would duplicate a static option.

**agents/bro.md**
- Added \`AskUserQuestion\` to the \`tools:\` allowlist.

Closes half of #48 (the skills-level rewrite). The naming-clarity half is
captured in the new labels. Integration testing of the rendered UI is a
human dogfood step (Layer 3) β€” automated tests can't render radio UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reviewer happy paths

Filled in the real Layer 2 scope. The prior Layer 2 commit only shipped a
role-permission matrix (does role X work/fail on tool Y); it didn't match
the user's ask of "each agent runs each MCP for their responsibilities".

Four new test files, one per agent role, each exercising that role's
realistic end-to-end MCP sequence against a real spawned server:

**agent-bro-workflow.test.mjs**
- First-run onboarding: empty state β†’ identity_set β†’ config_set x3 β†’
  verify final state via identity_get + config_list.
- Reonboard rename: identity_reset then new identity_set; verify config
  is untouched.
- issue_resume with a known issue_id (session-start resume path).

**agent-architect-workflow.test.mjs**
- Simple-task flow: issue_create β†’ discussion_append (triage note) β†’
  task_create_batch β†’ task_get β†’ validation_history (empty) β†’ issue_close.
- Difficult-task flow: issue + 4-entry ADR-style discussion loop (note,
  question, answer, decision) with issue_get_with_discussions round-trip.
- Skill lifecycle: skill_register (curated draft) β†’ skill_promote
  (draft β†’ pending_review) with explicit from_status.

**agent-swe-workflow.test.mjs**
- Pickup β†’ running β†’ atomic close: task_get β†’ task_update_status(running)
  β†’ ledger_log (full required-arg set: issue_id, branch_id, from_node,
  event_type, summary) β†’ file_registry_upsert (documents current forbidden
  status per #50) β†’ audit_log β†’ task_update_status(completed) β†’ verify.
- validation_history read-access to own task.

**agent-pr-reviewer-workflow.test.mjs**
- Happy path: task_get (completed) β†’ validation_record(pass) β†’ history
  reflects 1 row with verdict=pass.
- Fail path: validation_record(fail) β†’ architect reads same history.
- Retry loop: 3 sequential attempts (fail/fail/pass) β†’ history returns
  all 3 in ascending attempt order.

**Bugs surfaced and fixed while writing the tests:**

- `ledger_log` requires (agent, issue_id, from_node, event_type, summary);
  my first draft only passed event_type+summary+branch_id and got a
  `Missing required arg: issue_id` β€” caught immediately by Layer 2.
- `issue_resume` requires issue_id (not just agent).
- `skill_promote` requires from_status (not just name+to_status).

These are the exact shape mismatches that would have manifested as
"0 tool uses" in interactive dogfood, burning more of your time. Layer 2
caught them in ms.

Coverage:
- Schema contract: 1 test (all 36 tools have `agent` in inputSchema).
- Role matrix: 9 tests (every currently-protected tool).
- Per-agent workflows: 11 tests (4 bro + 3 architect + 2 swe + 3 pr-reviewer
  β€” note: 2 for swe because file_registry_upsert is documented-as-forbidden).

Full suite: 235 Layer 1 unit + 21 Layer 2 integration + 16 hook +
4 agent-budget lint, all green.

Layer 1 decision: the pre-existing `remaining_tools.test.ts` covers audit /
validation / skills / reports with 11 cases β€” enough for adequate. Splitting
into dedicated files is cosmetic refactoring, not new coverage, so it's
deferred. Real Layer 1 gaps (e.g. issue_snapshot_md which writes to disk)
are exercised indirectly by Layer 2's role-matrix test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eam issues are stale

User caught me being wrong. Provided a screenshot showing AskUserQuestion
rendering live in an interactive session β€” the "Scope" chip + 6-option
radio form + Enter-to-select hint are the AskUserQuestion UI, invoked
from a plugin subagent.

My mistake chain:
1. Shipped AskUserQuestion without verifying it works in subagents.
2. Hit the text-fallback in live dogfood, concluded the tool was unavailable.
3. Tested via `claude -p` (headless mode) β€” tool wasn't in the registry.
4. Decided this confirmed the limitation. Shipped revert + filed #52.
5. Searched upstream issues β€” they said "not-planned" back in CC 2.0.56.
6. Doubled down on the revert, closed #52 as BLOCKED.

What I missed: `claude -p` is non-interactive. AskUserQuestion REQUIRES
an interactive UI to render. `-p` stripping the tool wasn't a subagent
limitation β€” it was a UI-mode restriction. And the upstream "not-planned"
state may have been reversed in more recent CC builds (user is on a
current build where it works interactively).

Restoring:
- agents/bro.md tools: AskUserQuestion back.
- agents/architect.md tools: AskUserQuestion added (user's screenshot
  suggests architect is the one rendering it for architecture-scope
  questions, which is the right home for the radio UI on design decisions).
- skills/first-run-onboarding/SKILL.md: back to batched AskUserQuestion
  call with the 3-question form (name, branching, PR target) + Step 3a
  for custom-protected-branches multiSelect follow-up. Kept the hardened
  MCP discipline from the revert: mandatory write sequence enumerated,
  no hallucinating rejections, post-write config_list verify gates the
  closing message.
- skills/tmb-reonboard/SKILL.md: back to batched form with
  `Keep "<current>"` as pre-selected first option.
- tests/manual/scenarios.md Flow 1: back to AskUserQuestion expectations.
- tests/manual/scenarios.md 1.4: reonboard back to AskUserQuestion.

The discipline fixes from the previous "revert" commit (mandatory write
sequence + config_list verify) are preserved β€” those are UI-independent
and were the RIGHT fix for the hallucinated-rejection bug we saw.

Tests green: 235 unit + 21 integration + 16 hook + 4 lint.

Next: reopen #52 with the corrected status β€” works interactively,
documented `-p` limitation as separate concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen and others added 23 commits April 24, 2026 14:04
…n_append

User confirmed AskUserQuestion works live in plugin subagents (screenshot
of architect rendering a scope-narrowing radio form for "I want to make
social media platform"). The discussion-form UX is the design intent,
and every Q/A round must persist to the discussions table so the
trajectory stays replayable.

Implementing the pattern:

**skills/architect-workflow/SKILL.md** β€” new "Interactive Alignment"
subsection under Discussion Phase. Describes:
- When to use AskUserQuestion (enumerable answers, 2-4 options, scope/
  tech/priority/approach)
- When to fall back to text discussion_append (open-ended "what constraints
  do you have" questions)
- The mandatory dual-persist pattern β€” every AskUserQuestion round
  followed by TWO discussion_append calls (kind='question' + kind='answer')
  in chronological order, using the existing valid kinds
- Guidance on batching (independent answers β†’ one AskUserQuestion call
  with up to 4 questions; dependent answers β†’ sequential)
- Closing the discussion with kind='decision' once aligned

This gives us the three-mode UX the user specified:
- simple β†’ bro β†’ architect β†’ swe (no discussion form; trivial template)
- difficult β†’ bro + architect with AskUserQuestion alignment loop β†’ swe
  (persists to discussions, renders in issue_report_md / issue_snapshot_md)
- out-of-scope β†’ roundtable (tracked as #53 for AskUserQuestion
  extension into flow 9)

**docs/architecture/FLOWS.md** β€” flow 3 mermaid sequence diagram updated
to show the AskUserQuestion loop + dual discussion_append per round.
Notes section explicitly documents the replayability guarantee: every
alignment decision lands in discussions as a Q+A pair so future sessions
can reconstruct the reasoning via issue_report_md.

No schema changes β€” 'question' and 'answer' are already valid kinds in
ALLOWED_KINDS (verified in tools/discussions.ts).

Tests: 235 unit + 21 integration + 16 hook + 4 lint β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…text

Live dogfood found two related bugs, same root cause β€” load-bearing Human
questions were being silently skipped:

**Bug 1: Onboarding auto-filled without showing the radio form.**

Bro wrote identity (Zax) + branching (github-flow) + pr_target (main) +
protected_branches (main) in 11 seconds with ZERO AskUserQuestion render.
Root cause: CC's environment context leaked `# userEmail zax.shen@gmail.com`
into bro's context, and auto-mode's "minimize interruptions" nudged bro's
LLM to write the inferred defaults directly. The Human never got to
confirm or change.

Fix in skills/first-run-onboarding/SKILL.md: explicit "Hard rule β€” always
render the form" section. AskUserQuestion is NOT skippable when:
- Environment leaked user info (inferred β‰  confirmed)
- Auto mode encourages action (onboarding isn't routine)
- The LLM thinks it knows the answer (branching is policy, not guess)

**Bug 2: Architect wrote `kind='decision'` without asking any question.**

User asked "build a Python CLI todo". Architect triaged as simple (no
docs/trustmybot/architecture/ impact), skipped the alignment loop per the
current "simple β†’ straight to tasks" rule, and unilaterally chose
stdlib-only + argparse + JSON persistence. ZERO kind='question' rows in
discussions. User correctly pointed out: "build a CLI todo" has massive
scope ambiguity β€” storage backend, lib choice, command surface, feature
scope β€” all of which the Human should pick, not the architect.

Fix in skills/architect-workflow/SKILL.md: add a "Scope-ambiguity gate"
inside Discussion Phase. Triggers in BOTH simple and difficult triage.
Before any kind='decision' row, every plan choice must be either (a)
explicitly answered by the Human via the dual-discussion_append pattern,
or (b) a choice the Human plausibly wouldn't care about. Auto-mode does
not waive the gate.

The gate enumerates categories that ALWAYS need explicit Human input:
storage backend, lib choice, command surface, feature scope, persistence
location. Architect can still decide implementation details (internal
variable names, stylistic choices) β€” just not interface-shaping ones.

Net effect: simple triage no longer means "architect auto-decides the
whole plan". Simple means "fewer/smaller tasks" (trivial template); the
alignment loop still runs whenever scope is ambiguous.

Tests: 235 unit + 21 integration + 16 hook + 4 lint β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cal reality

User: "What I expected is to check local python env first, then ask which
env management tools, etc. in the discussion form."

Current architect workflow said "explore the codebase before asking
questions", but didn't probe the ENVIRONMENT β€” which tools are installed,
which language versions are available, which project files already exist.
Result: architect would offer generic menus like "uv vs poetry vs pip"
even when only one is installed locally.

Fix: new "Environment Probe" step in architect-workflow's Discussion
Phase. Runs BEFORE the AskUserQuestion radio form. Uses Bash (read-only)
to detect:

- Language versions (python3, node, go, rustc)
- Python package managers (uv, poetry, pipenv, pip)
- Python project files (pyproject.toml, requirements.txt, setup.py, Pipfile)
- Node ecosystem (bun, pnpm, npm) + lockfiles
- Linters/formatters (ruff, black, biome, eslint)
- Test runners (pytest, vitest)
- Git state (remote, branch)

The probe informs the radio options β€” every option must be grounded in
what's actually detected. Explicit rules:
- Never offer an option that can't execute on the local machine.
- Never mark a tool "(Recommended)" unless it's detected AND fits the task.
- If .python-version exists β†’ skip Python version question.
- If pyproject.toml exists β†’ ask "reuse / new alongside / scrap & restart"
  instead of generic "what layout".

Example mapping table in the skill shows: "uv detected + Python β‰₯ 3.11"
β†’ radio form with `uv (detected, v0.5.x) (Recommended)` / pip+venv /
poetry. "No package manager" β†’ "Install uv / use pip / I'll handle it".

Probe findings persisted as a discussion_append(kind='note') row so
future sessions replay the environment context. Also updated the
Discussion Phase step list β€” probe is step 2, codebase exploration is
step 3, radio form is step 4.

Net effect: the discussion form the user envisioned β€” grounded,
machine-specific, not a generic decision tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Question contract

Live dogfood: bro rendered text questions ("1. Name to call you? (e.g.
Zax or Anonymous)") instead of AskUserQuestion, AND leaked the user's
inferred name ("Zax", from CC's environment-level userEmail) into the
question text as an example.

Two tightenings in first-run-onboarding/SKILL.md:

**Hard rule: AskUserQuestion mandatory, no text fallback.**

Previous wording said "AskUserQuestion is mandatory" but bro's LLM still
fell back to text. The skill now explicitly calls the text fallback a
contract violation and prescribes the exact recovery path when the tool
errors: read error β†’ retry once β†’ if still erroring, surface verbatim
to the Human and ask how to proceed. NO silent text fallback.

Added an explicit reason list for why NOT to skip the tool call:
- Env context leaked identity (inferred β‰  confirmed)
- Auto mode says "minimize interruptions" (onboarding IS the trust-model setup)
- LLM thinks it knows answers (branching is policy, not a guess)
- Text seems faster (it violates the scenarios contract)

**Context-leak rule: never surface inferred identity.**

CC subagents inherit the user's email via # userEmail env lines. The
skill now explicitly forbids:
- Putting an inferred name in the question text (no more "(e.g. Zax)")
- Pre-populating `Use "Zax"` based on email-derived guesses
- Writing identity_set('Zax') from email inference alone

The only case where pre-populating a `Use "<name>"` option is allowed:
the Human typed their name IN THIS interactive session (e.g. "I'm Alice"
in a prior turn). Env context alone doesn't count β€” it's external
metadata the Human hasn't confirmed.

Also updated the AskUserQuestion code comment in Step 1 to make this
rule visible at the option-construction site.

The radio form is where consent lives. Until the Human picks, we don't
know their name β€” regardless of what CC told us.

Tests: 235 unit + 21 integration + 16 hook + 4 lint β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User correctly called me out for asking them to test in interactive CC
sessions for things that should have been automated. The concerns are:

- Does the skill say AskUserQuestion is mandatory?
- Does the skill forbid text fallback?
- Does the skill warn against env-leak of inferred identity?
- Are labels β†’ canonical mappings complete?
- Are required MCP writes enumerated?
- Are architect's alignment + scope-gate + env-probe sections present?

All of the above are prompt-content assertions β€” zero LLM required.
Running them caught a real bug immediately: the skill had
`(e.g. "Zax"` in an example, which is exactly the pattern the rule
forbids (teaching the LLM to echo inferred identity). Fixed.

Added tests/lint/onboarding-skill-contract.sh:
- 13 positive assertions on first-run-onboarding (mandatory tool,
  text-fallback forbidden, env-leak rule, canonical mappings, MCP
  writes, never-narrate-rejection, in-session pre-populate gate)
- 2 negative assertions (no "e.g. Zax" literal in either escaped or
  unescaped form β€” prevents regression of the bug we just found)
- 3 assertions on tmb-reonboard (AskUserQuestion, Keep pattern,
  identity_reset path)
- 9 assertions on architect-workflow (AskUserQuestion, discussion_append
  persistence, kind=question + kind=answer, scope-ambiguity gate,
  environment probe, probe citations of uv + pyproject.toml, discipline
  rule against ghost options)

Wired into tests/run-all.sh after the agent-budget lint. Runs in ~50ms.

What this CANNOT test: whether bro's LLM actually chooses to call
AskUserQuestion at runtime. That's genuine Layer 3 (LLM judgment,
non-deterministic). But the static contract at least prevents the
category of regression where the skill prompt itself degrades.

Full suite: 235 Layer 1 unit + 21 Layer 2 integration + 16 hook
+ 4 agent-budget + 27 onboarding contract β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…estion, scope-gate bypass

Four bugs surfaced in user's interactive dogfood. Solving all.

**Bug 1: issue_create returned a ghost `issue_string_id` field.**

Main Claude saw `{id: 1, issue_string_id: 'iss_modfdaia_5466e041'}` and
used the UUID for subsequent discussion_append calls, which silently
failed (no matching row). Retried with `1` which worked. Net effect:
2 ghost MCP writes, user-visible confusion about which id is canonical.

Fix in mcp/trajectory-server/src/tools/issues.ts: remove the genId('iss')
line and the issue_string_id in the return. issue_create now returns the
redacted row with a single `id` field. genId import dropped.

Layer 2 test added: issue_create returns a single id (no issue_string_id
ghost field) β€” prevents regression.

**Bug 2: AskUserQuestion doesn't work in plugin subagents (re-confirmed).**

User's dogfood: main Claude narrated "AskUserQuestion isn't available in
subagents" and took over the form at top level. This contradicts my
earlier un-revert based on a single screenshot (which I now believe was
main Claude rendering, not a subagent).

Verdict: AskUserQuestion is main-Claude-only. For bro, main Claude
hoists the form (proven to work in onboarding). For architect mid-flow,
main Claude does NOT hoist β€” architect is on its own.

Fix in skills/architect-workflow/SKILL.md: Interactive Alignment section
reverted from AskUserQuestion pattern to **text-based Q+A** with the same
dual discussion_append persistence. Architect now emits numbered text
questions, waits for reply, persists Q+A verbatim.

**Bug 3: Scope-ambiguity gate silently bypassed by auto-mode.**

Live DB showed: discussions had entries 1-4 (intent/note/note/note-probe)
then row 5 was `kind='decision'` with body containing literal phrase
"auto-mode defaults, zero-dep". ZERO `kind='question'` rows β€” architect
unilaterally chose Python stdlib + argparse + JSON + etc.

Fix in skills/architect-workflow/SKILL.md: Scope-ambiguity gate
upgraded to HARD RULE. New contents:
- Marked HARD RULE in the heading itself
- Explicit "Auto-mode does NOT waive this gate" statement
- Specific trigger phrase: "If your response body would include phrases
  like 'auto-mode defaults' or 'defaulting to X since you didn't
  specify' β€” STOP. Ask first."
- Full worked example of correct sequence (8-row discussions table)
- Full worked example of violation (what the user just saw) marked RED FLAG
- Self-review instruction: before task_create_batch, re-read
  discussion_list; if you see a decision without a preceding question,
  you violated the gate.

**Bug 4: No test catches Bug 3 at CI time.**

Added lint coverage for the HARD-RULE discipline:
- `tests/lint/onboarding-skill-contract.sh` extended with 5 new checks:
  HARD RULE marker, "Auto-mode does NOT waive", "auto-mode defaults"
  call-out, RED FLAG violation example, text-Q requirement, and
  negative check that no text instructs architect to CALL
  AskUserQuestion (a subagent-blocked call).

What this DOESN'T catch: whether architect's LLM actually honors the
gate at runtime. That's genuine Layer 3 β€” flaky by nature, document-able
via scenarios.md. But the static contract regression class is closed.

Also added Layer 2 test: discussion_append chronology test. Verifies
schema-level chronological consistency (answer rows have preceding
question rows) so at least the protocol-level invariant is enforced.

All automated tests green: 235 Layer 1 unit + 23 Layer 2 integration
(two new) + 16 hook + 4 agent-budget + 32 lint assertions (5 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t-blocked

User dogfood again surfaced bro stuck at the onboarding form: AskUserQuestion
fails for subagents (per anthropics/claude-code#12890), so my last commit
saying "surface error to Human and ask how to proceed" caused main Claude
to render an unhelpful "abort / answer the 3 / take defaults" menu.

Reverting bro to the same text-Q+A pattern architect now uses. Consistent,
reliable, no subagent-AskUserQuestion dependency:

**skills/first-run-onboarding/SKILL.md** β€” full rewrite of the question UI:
- Removed AskUserQuestion code block + Map-labels-to-canonical table.
- Added explicit "MUST NOT call AskUserQuestion" prohibition with link to
  the upstream issue.
- Step 1 now emits welcome + 3 numbered questions in one chat message:
    1. What should I call you? Reply with a name or `anonymous`.
    2. How does your team branch? Reply with 1/2/3 (canonical labels).
    3. What's your PR target branch? Reply with `main`/`master`/`develop`/
       `trunk`/other.
- Step 2 parses the free-text reply against canonical map.
- Step 3a (custom branching) replaced with a CSV reply for protected.

**skills/tmb-reonboard/SKILL.md** β€” symmetric rewrite:
- Removed AskUserQuestion batched-question form.
- Step 2 lists current values + asks "which to change?" β€” reply chooses
  one of: name / branching / pr-target / protected / all / none.
- Each sub-question accepts the new value or `keep` sentinel.

**tests/lint/onboarding-skill-contract.sh** β€” updated to match the new
text-Q reality:
- first-run-onboarding: replaced "AskUserQuestion mandatory" + "no text
  fallback" assertions with "MUST NOT call AskUserQuestion" prohibition
  + "anonymous" + "comma-separated" + canonical-mapping checks.
- tmb-reonboard: same β€” checks for NEVER list, keep sentinel,
  identity_reset path.
- Removed assertions about Keep "<current>" pattern (was AUQ-specific)
  and IN-THIS-session pre-populate gate (was AUQ-specific).

The two remaining intentional AskUserQuestion mentions:
1. first-run-onboarding line: "MUST NOT call AskUserQuestion" β€” load-bearing prohibition.
2. tmb-reonboard NEVER list β€” same prohibition.

Tests: 235 unit + 23 integration + 16 hook + 4 lint + 30 contract β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported on retry: "Still no form!" after 232a90b switched bro to
text-only Q+A.

Re-examining the earlier working run (commit 036c81f era): bro's skill
*did* call AskUserQuestion, bro subagent couldn't execute it, and main
Claude transparently hoisted the form at its own level β€” rendering the
radio UI, collecting answers, passing them to bro which then persisted
via MCP. The user's "Onboarding works" report came from that flow.

My 232a90b commit killed the signal: by removing AskUserQuestion from
bro's skill entirely, main Claude had no reason to intercept and
rendered the text as-is. Net result: worse UX.

Un-reverting the skill changes from 232a90b restores the signal:
- first-run-onboarding/SKILL.md back to batched AskUserQuestion call
  with Hard Rule + context-leak protection
- tmb-reonboard/SKILL.md back to Keep-"<current>" first option pattern
- tests/lint/onboarding-skill-contract.sh back to AskUserQuestion
  assertions

Architect keeps the text-Q+A pattern (from 98e0f23) because main Claude
does NOT hoist for deep-spawned subagents the same way. This is asymmetric
but matches observed behavior.

What I learned the hard way: "AskUserQuestion doesn't work in subagents"
is technically true (per anthropics/claude-code#12890) but functionally
incomplete β€” main Claude compensates for direct-@-mention subagents like
bro. The skill should keep the aspirational AskUserQuestion call as the
signal main Claude uses to know "this agent needs interactive UI".

Four revert-cycles on this issue. Locking in the current state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User clarification: "bro should be CC's persona. After first @tmb:bro then
CC have the persona. Why not subagent? Because subagent has no enough
permissions."

Right. Subagents lack AskUserQuestion (anthropics/claude-code#12890) and
hit other permission gates. bro needs those for onboarding + alignment
with the Human. So bro IS main Claude; architect/swe/pr-reviewer stay
as true subagents with constrained tool surfaces.

Changes:

- **agents/bro.md** β€” DELETED. bro's prompt folded into plugin/CLAUDE.md
  under a "You are bro" section. Main Claude loads CLAUDE.md at session
  start and adopts bro's persona automatically.

- **plugin/CLAUDE.md** β€” rewritten:
  - Lead with the top/bottom split: "You ARE bro" for main Claude;
    subagent prompts override when architect/swe/pr-reviewer spawn.
  - Full bro persona inline: MCP caller identity, chain-of-thought,
    role, identity+onboarding, catchphrase, session-start chain,
    routing table, no-auto-action discipline, agent-creator flow,
    mode rules, communication style.
  - "Subagent roster" section after bro's persona β€” the three true
    subagents (architect/swe/pr-reviewer), their models, their
    override-by-project rules.
  - "Tool availability contract" explicitly differentiates: bro has
    full CC toolkit including AskUserQuestion; subagents don't.
  - Updated Decision flow to show "bro = main Claude".
  - Preserved workspace boundary + workflow files + source access
    control sections.

- **tests/manual/scenarios.md** β€” dropped `@tmb:bro` trigger prompts in
  favor of plain `hello` / `switch to gitflow`. Notes that bro IS main
  Claude, loaded via plugin/CLAUDE.md.

- **agent-budget lint** β€” automatically picks up bro.md removal (uses
  a find-glob, no manual list update needed).

No changes to subagent MCP role enforcement (architect/swe/pr-reviewer
all still declare `agent='bro'` in their tool calls when relevant β€”
that's fine, they're reporting their caller context). bro's role name
remains 'bro' in requireRoles(['bro']) checks.

Net side-effects, which together close open issues in principle:
- #36 (routing enforcement): bro IS main Claude β†’ every Human message
  routes through bro by architecture, not by convention.
- #46 (shorten @-handle): no @-mention needed; user types naturally.
- #47 (bro first-response latency): no subagent spawn overhead; bro
  responds as fast as main Claude normally does.

Tests all green: 235 Layer 1 + 23 Layer 2 + 16 hook + 3 agent-budget
(bro removed) + 30 contract lint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User concern: with bro as main Claude, users can still type @tmb:architect
or @tmb:pr-reviewer and bypass bro's triage / onboarding / branch-id flow.
swe already had a MANDATORY FIRST ACTION that rejects spawns without
task_id=<N>. Extending the same discipline to architect and pr-reviewer
so no subagent is reachable directly by the Human.

Rejection heuristic per agent β€” detect bro-originated spawn via routing
markers, reject otherwise:

- architect: requires at least one of `triage: simple|difficult`,
  `issue_id=`, `branch_id=`, or `concern:` in the spawn prompt. bro
  always includes at least `triage:` and usually a `branch_id` from the
  branch-id-proposal skill. Direct @-mention by Human has none.
- pr-reviewer: requires at least one of `task_id=<N>`, `issue_id=<N>`,
  or a bro-routed review-request marker. architect always spawns
  pr-reviewer with task_id after SWE completes.
- swe: already had this via `task_id=<N>` scan. Unchanged.

On rejection, each agent outputs a structured REJECTED line explaining
that they're subagents, not Human entry points, and pointing the Human
back to bro (just type the request β€” no @-mention needed).

Tightened the pr-reviewer overview block to recover the line budget
(200-line cap per agent) after adding the rejection section.

Tests: 235 Layer 1 + 23 Layer 2 + 16 hook + 3 agent-budget + 30 contract
β€” all green.

Addresses your concern #1: users can't accidentally or deliberately
bypass bro by @-mentioning subagents. All three now auto-reject direct
invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User's refined design: plugin enabled β‰  bro active. Main Claude should
stay as normal Claude Code until the Human addresses "bro". Once
triggered, stays in bro mode for the session.

Why this is better than the auto-persona model:

- "what's 2+2?" β†’ main Claude answers directly. No onboarding overhead,
  no pre-scan, no MCP call pollution.
- "bro, write a CLI todo" β†’ triggers full bro machinery. Onboarding (if
  first time), routing, subagent spawning, MCP state.
- "bro" is a natural vocative β€” humans address TMB by its persona name.
- Plugin sits dormant until invoked; backwards-compatible with regular
  CC workflows in the same session.

Changes:

- plugin/CLAUDE.md β€” rewrote the opening section:
  * New "Persona activation rule": listen for "bro" as trigger word in
    any form (vocative, @-mention, subject). Until triggered β†’ act as
    normal main Claude. After β†’ adopt persona for the rest of session.
  * "Subagent prompt precedence" section separated so architect/swe/
    pr-reviewer's own prompts still win when spawned via Task.
  * Kept all bro persona content below β€” unchanged, only the activation
    semantics flipped.

- tests/manual/scenarios.md β€” updated Flow 1 trigger prompts:
  * `bro hello` for fresh onboarding (1.1) β€” first mention of "bro"
    triggers the persona and onboarding in one shot.
  * `bro, switch to gitflow` for reonboard (1.4) β€” continuing in bro
    mode, addressing bro again is optional but natural.

No subagent changes (architect/swe/pr-reviewer still reject direct
Human invocation per 5ed659b).

Tests still green: 235 + 23 + 16 + 3 + 30.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Other-synonyms

User saw this live-rendered radio form:
  1. Anonymous
  2. I'll enter it  ← redundant; just redirects to Other
  3. Type something.

Two bugs:

1. **Placeholder options redirecting to Other.** AskUserQuestion
   auto-adds `Other` for free-text entry. Options like "I'll enter it"
   or "Type something else" are UI-confusing synonyms of Other and
   shouldn't be added by the skill. Main Claude was inventing them
   because the skill didn't explicitly forbid the pattern.

2. **Missing git-config identity source.** Only "Anonymous" + Other
   were offered. The local `git config --get user.name` is a perfect
   identity source the user already set themselves β€” should be
   pre-populated as option 2 when available.

Fixes in skills/first-run-onboarding/SKILL.md:

- New **Step 0: Probe local identity sources** β€” runs `git config
  --get user.name` before rendering the form. Caches result as
  `git_user_name`.
- **Step 1 AskUserQuestion options** β€” conditional second option:
  if `git_user_name` is non-empty, append `Use "<name>"` with
  description "Detected from git config user.name". Explicit comment
  forbids placeholder synonym options for Other.
- **Step 2 label mapping** β€” updated to parse `Use "<name>"` back
  into the raw name for identity_set.
- **Context-leak rule** β€” rewrote to distinguish allowed source
  (git config) from forbidden sources (CC `# userEmail`, $USER,
  whoami, filesystem paths, LDAP/SSO).

Lint updates:
- Require `git config --get user.name` and `Detected from git config`
  literals in the skill.
- Negative assertions: no "I'll enter it", no "Type something else"
  literals in the skill prompt (removed from guard language, rewrote
  the DO-NOT guidance without using those exact phrases).

Tests: 235 unit + 23 integration + 16 hook + 3 agent-budget +
33 contract (5 new) β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ym placeholders

User saw this rendered live:
  1. Yes, proceed (Recommended)
  2. Use simple triage instead β€” Skip architect, let bro/SWE implement directly
  3. Suggest different branch_id
  4. Type something.

Two bugs:

1. **Option 2 says "Skip architect"** β€” directly contradicts bro's
   "No bypass. Every code change routes through architect" rule. The
   triage label is a TEMPLATE-DEPTH knob, not an architect-bypass switch.
   Main Claude was inventing this option because the skill didn't have
   an explicit AskUserQuestion specification.

2. **Option 4 "Type something"** β€” same redundant Other-synonym we
   just fixed in first-run-onboarding. AskUserQuestion auto-adds Other
   for free text; placeholder synonyms confuse the UI.

Fix in skills/branch-id-proposal/SKILL.md:

- Added explicit `AskUserQuestion` block specifying the EXACT option
  set. The skill no longer leaves the radio form to LLM improvisation.
- Conditional options: Downgrade-to-simple (when current is difficult)
  OR Upgrade-to-difficult (when current is simple), mutually exclusive.
  Both still go through architect β€” they just adjust template depth.
- "Suggest different branch_id" routes to auto-Other for free text;
  validated against the existing regex; reject + re-ask on mismatch.
- Hard-rule callout: "No Skip-architect option. Architect is mandatory
  for every code change per bro's routing contract."
- Hard-rule callout: "Never add a placeholder option that just
  redirects to Other."
- Added AskUserQuestion to the skill's allowed-tools frontmatter
  (was missing β€” that's why bro was improvising rather than calling
  the tool with a strict spec).
- Handling-the-answer table makes the persist+spawn logic explicit
  per selection.

Lint: 6 new assertions on branch-id-proposal/SKILL.md:
- references AskUserQuestion (positive)
- contains explicit "No Skip architect" prohibition (positive)
- offers Downgrade-to-simple AND Upgrade-to-difficult (positive)
- forbids "Skip architect β€” let" wording (negative)
- forbids "Type something." placeholder literal (negative)

Tests: 235 unit + 23 integration + 16 hook + 3 agent-budget +
33+6 contract = 39 lint β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mode

Live dogfood: user typed "bro, write a todo cli" β€” main Claude wrote
todo.py directly without entering bro mode. No onboarding check, no
triage, no architect spawn, no MCP writes. The trigger word was
ignored.

Two probable causes:

1. CLAUDE.md changes don't take effect via /reload-plugins. CLAUDE.md
   is loaded into the session's system prompt at session START; mid-
   session reload may not pick it up. User needs a fresh session.

2. The previous trigger language was descriptive ("adopt the persona")
   rather than prescriptive ("YOU MUST"). Main Claude treated "bro" as
   casual addressing instead of a hard activation token.

Restructured CLAUDE.md to make the trigger rule impossible to miss:

- Top-of-file H1: "TMB PLUGIN β€” TRIGGER RULE (READ FIRST)" + subhead
  "YOU MUST FOLLOW THIS RULE BEFORE RESPONDING TO ANY USER MESSAGE".
  No room for the LLM to drift into "this is just guidance".

- Concrete examples of trigger phrases (the exact patterns we expect
  to see, including "bro, write a todo cli" β€” the live one that
  failed).

- 3-step protocol: announce "Entering bro mode." β†’ adopt persona β†’
  stay in mode for the session. The announcement makes activation
  observable to the Human; missing it surfaces the bug immediately.

- Hard contract section: "If you see 'bro' in a message and respond
  by directly writing source code (Write/Edit on todo.py, *.ts, *.py,
  etc.), you have violated the contract." Names the exact failure
  mode the Human just witnessed.

- Default-trigger rule: "If you're not sure whether 'bro' was a
  trigger or casual usage: assume trigger." False positives cost
  one extra MCP call; false negatives bypass the whole workflow.

Note for the user: CLAUDE.md changes need a fresh session to take
effect (not /reload-plugins). The next test should restart claude
fully before retrying "bro, write a todo cli".

Tests: all green (lint doesn't enforce CLAUDE.md content yet β€” out
of scope for this commit; could add later).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User clarification:
- @bro is what users should learn (documented).
- Bare-word "bro, do X" still works (graceful fallback, undocumented).
- @bro MUST trigger reliably.

Changes:

**README.md** β€” added a "How to use it" section right after Install that
shows the exact entry point: `@bro write a todo cli`. Explains that
addressing bro activates the persona for the rest of the session. First
trigger runs onboarding; subsequent code-touching asks route through the
full workflow; casual asks are handled by regular Claude Code without
TMB taking over.

Also rewrote the agent roster section:
- Stale "two agents globally + five editable placeholders" claim removed
  (left over from the v0.2 design).
- New table makes the persona/subagent split explicit: bro = main Claude
  persona; architect/swe/pr-reviewer = constrained subagents spawned
  via Task tool.
- Notes that subagents auto-reject direct @-mentions β€” talk to @bro,
  bro routes.

**plugin/CLAUDE.md** β€” restructured trigger-rule examples to lead with
the canonical `@bro` form. Bare-word and addressed forms ("bro,...",
"hey bro") kept as supported fallbacks but explicitly labeled
"undocumented". This way:
- Users who learn from README only know @bro β†’ the simplest mental model.
- Users who type "bro" naturally still get the persona β€” no failure mode.
- LLM gets a clear mental hierarchy: canonical example first, fallbacks
  after, all triggering the same activation.

No code or test changes β€” this is doc-only refinement of how the
trigger is presented to humans vs. honored by the LLM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User: "Shorten CLAUDE.md if this is in actual the prompt for bro only.
It has many optional content like ceo, cto, etc. Keep bro focus on its
job. Only let it handle the optional work when needed (we have skills,
rules)."

CLAUDE.md is loaded into main Claude's system prompt every triggered
session β€” every byte costs context. The previous version carried
runtime instructions AND extensive documentation in the same file.
Splitting:

**KEPT (essential bro runtime):**
- Trigger rule (top, mandatory) β€” unchanged
- Subagent precedence (1 line) β€” unchanged
- Role statement (concise)
- MCP caller identity rule
- First-action chain (3 steps: identity_get + config_get β†’ cache name
  β†’ issue_resume)
- Code-touching skill chain (lazy-regen-check β†’ project-prescan β†’
  triage β†’ branch-id-proposal β†’ architect spawn) β€” names skills,
  defers detail to skill files
- Direct ops list
- Routing (3 destinations: architect, pr-reviewer, agent-creator)
- Concerns-escalate rule
- Catchphrase rule (compressed to one paragraph)
- Communication style (compressed to one line)

**REMOVED (moved to skills / docs / subagent prompts):**
- Workspace boundary section (TMB-internal contributor info β€” belongs
  in CONTRIBUTING.md, not bro's runtime)
- Full workflow-files table (concise version retained at the bottom;
  full table now references mcp/.../CONFIG_KEYS.md and FLOWS.md)
- Persistence MCP details (already in mcp/trajectory-server/README.md)
- Source code access control section (already enforced by hook scripts
  and called out in architect.md / swe.md prompts)
- Full mode-rules section (covered by individual skills)
- Detailed no-auto-action discipline (covered by individual skills +
  bro's role statement)
- Full agent-creator flow protocol (lives in agent-creator skill)
- Per-Human-request routing table (compressed to 3 destinations + the
  agent-creator fallback)
- Full session-start triage heuristic block (compressed to 1 sentence)

**Reference pointers retained:**
- Subagent roster (concise 4-row table)
- Where state lives (5-bullet summary, references CONFIG_KEYS.md and
  FLOWS.md for detail)
- Code style (3 bullets)

The skills referenced from CLAUDE.md (first-run-onboarding,
tmb-reonboard, lazy-regen-check, project-prescan, branch-id-proposal,
refresh-architecture, agent-creator) all already exist and carry the
detail. CLAUDE.md just names them.

bro's prompt now is laser-focused on what bro does at runtime, not on
educating the LLM about the plugin's architecture.

Tests: all green. Trigger rule + subagent precedence intact; lint
unaffected (it doesn't enforce CLAUDE.md content).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User: "update swe to opus and I will manual test the workflow again".

agents/swe.md: model: sonnet β†’ opus
CLAUDE.md: subagent roster table updated to reflect new model.

Other models unchanged: architect (opus), pr-reviewer (opus). Only
remaining sonnet reference is in agent-creator/SKILL.md as the
DEFAULT suggestion for user-created domain agents (use opus only for
reasoning-heavy roles) β€” that's not the four core subagents and stays.

Tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live dogfood (8m ago): architect skipped the alignment loop AGAIN under
auto-mode, wrote kind='decision' + task_create_batch with zero
kind='question' rows. Same bug as 17824e9 + 419a1fa β€” prompt-level
discipline isn't enough.

Structural fix: move enforcement from the skill prompt to the MCP handler
where the LLM literally cannot bypass it.

**mcp/trajectory-server/src/tools/tasks.ts**:
- task_create_batch now queries discussions for the issue before inserting.
  If zero kind='question' rows exist AND no waiver is set, returns
  `{error: 'scope_gate_violation'}` with a clear explanation and
  `questions_found: 0`.
- New optional args: `waive_scope_gate: boolean` + `waive_scope_gate_reason:
  string` (min 10 chars). Lets architect explicitly bypass for truly
  trivial scope (typo fix, one-line doc change).
- Waiver reason is logged to the ledger as `scope_gate_waived` event
  for audit β€” pr-reviewer / Human reviewers can grep ledger for
  inappropriate waivers.
- inputSchema updated to document both waiver fields with guidance on
  when they're acceptable.

**skills/architect-workflow/SKILL.md**: updated the Scope-ambiguity gate
section to describe the structural enforcement + the waiver mechanism.
Notes: "Default to not waiving."

**Existing tests updated**: added the waiver to every existing
task_create_batch call in test fixtures (27 call sites in tasks.test.ts
+ 9 others across unit + integration test files). One perl pass via
`-0777 -pe` regex to inject `waive_scope_gate: true,
waive_scope_gate_reason: 'unit-test synthetic scope; gate not under
test'` right at the start of each args object. All 236 existing unit +
previous integration tests stay green.

**New Layer 2 scope-gate tests (4 cases)** in
tests/mcp-integration/scope-gate.test.mjs:
- rejects when 0 questions + no waiver (verifies the structural check)
- accepts when kind='question' row exists
- accepts when waiver + reason β‰₯10 chars
- rejects waiver with missing OR too-short reason

Now the live dogfood pattern that was bypassing the gate will fail at
the MCP layer: task_create_batch returns scope_gate_violation, architect
cannot create tasks, SWE cannot be spawned β€” forcing architect back to
the alignment loop or explicit waiver.

Tests: 236 unit + 27 integration (+4 new) + 16 hook + 4 budget + 39
contract lint β€” all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minor: the previous commit (140a355) rewrote the scope-gate section to
describe the MCP-level enforcement but dropped the 'HARD RULE' literal
the contract lint checks for. Restoring.

All tests green post-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t eager-skills

Benchmark: pure Claude solves 'write a todo cli by python' in 32 seconds.
TMB's bro→architect→swe chain on the SAME task took 5+ minutes and hung
uninterruptibly (#55). Architect emitted a 55k-token spec; SWE cold-start
to load that spec was the tail end of the hang.

Three structural fixes, all ship in this commit:

**1. Hard-cap spec_body at 8000 chars (mcp/trajectory-server/src/tools/tasks.ts)**

Was 64000. Reality: a spec longer than ~8k is almost always a sign the
task should be split. Over-long specs force SWE to spend its context
budget reading instead of coding. Architect gets a clear error citing
#55 and telling it to split or cite-don't-restate.

This is MCP-level, not prompt-level β€” architect cannot talk its way
around the cap. If a spec genuinely needs more, split into N tasks
via parent_branch_id dependency.

**2. Revert swe to sonnet (agents/swe.md, CLAUDE.md)**

Undoes commit e2ce240. Opus on swe was a quality bet that came at a
2-3x latency cost at SWE's cold-start. For the 90% of tasks that are
"implement this spec" mechanics, sonnet is indistinguishable from
opus in output. Architect stays on opus (planning + alignment), pr-
reviewer stays on opus (review depth). SWE back to sonnet.

**3. Trim architect's eager skill-load list (agents/architect.md)**

Was loading 6 skills on every spawn: architect-workflow,
swe-spawn-workflow, validate-swe-output, agent-creator, roundtable,
refresh-architecture. Removed the last three β€” they don't fire per
spawn:

- agent-creator is bro-invoked (when Human asks for a domain agent),
  not architect-invoked.
- roundtable fires only on explicit cross-domain decisions, rare.
- refresh-architecture is bro-invoked + auto on first code-touching
  ask, not per-architect-spawn.

Architect now loads 3 skills instead of 6, halving its context load.

**Skill docs update (skills/architect-workflow/SKILL.md)**

Added "Spec-body brevity rule (HARD CAP at 8000 chars, enforced by MCP)"
section explaining:
- Cite existing code/conventions, don't restate them.
- Reference industry-standard patterns by name, don't explain them.
- Over-long specs push chain latency into the minutes range (cites #55).
- Split into focused tasks via parent_branch_id when scope is genuinely
  large. Parallelizable via worktree isolation.

**Test fixes**

- `tasks.test.ts`: updated "rejects oversize spec_body" test from 64000
  to 8000. Added a new boundary test: spec_body at exactly 8000 chars
  must succeed.

Full suite: 237 unit (+1 boundary test) + 27 integration + 16 hook +
4 budget + 39 contract β€” all green.

What still can't be fixed from our side:
- Ctrl+C doesn't interrupt in-flight subagent inference (CC platform
  limitation). Issue #55 flags this for upstream filing. Meantime,
  `kill -9 <pid>` is the documented recovery.

Closes #55 on the parts we can fix. Filed upstream concerns remain
open on the CC issue tracker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- frontmatter: tool list ~40 β†’ 20 (drop AskUserQuestion + 15 rarely-used
  MCP tools: issue_get/list/get_phase/report_md, task_first_actionable,
  audit_log, ledger_list, file_registry_*, architecture_regen/regen_state_*,
  identity_get, config_*, skill_*). Each MCP schema was injected into the
  subagent system prompt at cold-start; dropping them shrinks the per-spawn
  context tax.
- frontmatter: eager skills 3 β†’ 1 (kept architect-workflow; swe-spawn-workflow
  + validate-swe-output now load on-demand via Skill tool β€” needed only at
  SWE handoff + SWE return, not during planning).
- architect-workflow: new "Simple Fast-Lane" section. Simple triage =
  narrow-scope ask. Architect picks defaults from a dimension β†’ default-choice
  table (argparse, stdlib unittest, ~/.<app>/<file>.json, single-user laptop
  scope), records them as explicit spec Assumptions, and calls
  task_create_batch with waive_scope_gate=true + a reason naming the
  defaults. No env probe, no Q+A round-trip.
- architect-workflow: env probe and full discussion phase explicitly scoped
  to the difficult path. Scope-ambiguity gate waiver's two legitimate uses
  (simple fast-lane + trivial doc/typo) spelled out.
- architect-workflow: escalate simple β†’ difficult triggers listed (multiple
  unrelated surfaces, architecture change, strategic defaults, spec >8k).

Target: simple-triage architect phase drops from 20 tool calls / 50k tokens /
2m50s to 3–5 tool calls / ≀20k tokens / ≀30s. Measured via Mode B retest.

Closes the architect-side of the 7m chain slowdown from #55. SWE-side
(MCP cold-start + CC platform subagent spawn latency) is separate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three workflow agents (architect, swe, pr-reviewer) grant access to the
same MCP server (plugin_tmb_trajectory-server). The fully-qualified per-tool
form made the tools: line repeat the server prefix 7–13 times per agent β€”
visually painful and a maintenance burden every time a tool is added.

Collapse to the bare server name ("mcp__plugin_tmb_trajectory-server"), which
CC treats as "allow every tool this server exposes". Each tools: line is now
one short line.

Tradeoff: each subagent now has schemas for all ~30 server tools in its
cold-start context, not just the ~13 specific ones I had picked for architect
in a4fe045. In exchange we get readable frontmatter and no drift when new
MCP tools ship. Net cold-start tax is small compared to the architect
simple-fast-lane work β€” most of that cost was in the ceremonial phase, not
schema registration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bro kept double-encoding the protected_branches config: passing
value="[\"main\"]" (a string containing JSON) instead of value=["main"]
(an actual JSON array). The MCP handler then JSON.stringify'd it a second
time, storing "[\"main\"]" in value_json. git-guards.sh does
`jq -e 'type == "array"'` on that value, sees a string, and blocks every
Bash call with "BLOCKED: TMB plugin_config key protected_branches is
malformed JSON" β€” freezing bro and any subagent that touches Bash.

Two-sided fix:

1. skills/first-run-onboarding: rename the wire guidance from
   "value=<JSON array>" (which LLMs read as 'pass a string containing
   JSON') to "value=<array of strings>" with an inline worked example
   (value=["main"]) and a one-liner explaining why pre-serializing is
   wrong. Adds the same clarity to the code example block.

2. mcp/trajectory-server config_set handler: detect when args.value is a
   string whose trimmed form starts with [ or { AND parses as valid JSON
   to an object/array, and reject with a helpful error naming the correct
   shape. Plain strings that happen to start with [ but aren't valid JSON
   fall through and are stored verbatim as before.

Repro: a session whose onboarding ran BEFORE this fix has a corrupt
value_json. Repair with sqlite3 or wipe .claude/tmb/trajectory.db and
re-run bro onboarding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZaxShen ZaxShen merged commit cc3718d into dev Apr 25, 2026
1 check passed
ZaxShen added a commit that referenced this pull request Apr 25, 2026
Supersedes the v0.2.0 placeholder tag (to be revoked on main-merge).

### Versions aligned

- `.claude-plugin/plugin.json`: 0.3.2 β†’ **0.1.0**
- `mcp/trajectory-server/package.json`: 0.2.0 β†’ **0.1.0**

### CHANGELOG.md

Replaced the "Unreleased β€” first public release pending" placeholder with
the comprehensive v0.1.0 entry summarizing everything that shipped across
PRs #41 β†’ #64:

- bro-as-persona + planner + task gate doctrine
- pr-reviewer-as-push-gate + git-push-guard.sh hook
- Lego templates (6 agents in templates/agents/, plugin's agents/ empty)
- planning-simple / planning-difficult split (was tmb_architect-workflow)
- bundled SQLite trajectory MCP server (~30 tools)
- requireRoles middleware on every workflow-state mutation
- Direct Mode for trivial single-file edits
- Three-layer test infrastructure (lint / MCP-integration / dogfood)
- Documented latency budget in docs/PERFORMANCE.md
- Documented "known not-done" linking the 4 main backlog items

### Post-merge release plan

After this PR merges to dev + dev β†’ main:

1. `git tag -d v0.2.0`
2. `git push origin :refs/tags/v0.2.0`
3. `gh release delete v0.2.0`
4. `git tag -a v0.1.0 main -m "..."`
5. `git push origin v0.1.0`
6. `gh release create v0.1.0 --notes-file CHANGELOG.md`

Layer 1 + 2 tests still PASS. No code/skill/agent changes β€” only version
bumps + release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZaxShen ZaxShen deleted the docs/40-dogfood-scenarios branch April 25, 2026 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant