π dogfood setup: SCENARIOS.md + bun workspaces + doc fixes (closes #40, #42)#41
Merged
Conversation
β¦oses #40) For each workflow in FLOWS.md, the verbatim user prompt that should trigger it + the observable expected behavior + the verification query/file-check to confirm it landed. The manual test grid for the plugin until automated agent tests exist. What's covered (~30 scenarios across 9 flows) - Flow 1 (onboarding): 4 β fresh-DB trigger, hold-and-resume of code- touching ask during onboarding, read-only ask interleaved, explicit re-onboard phrase - Flow 2 (simple task): 3 β typo, comment, internal refactor with no API change - Flow 3 (difficult task): 3 β new public API, schema/dependency change, cross-cutting concern - Flow 4 (agent-creator): 4 β request, approve, refuse, reserved-name refusal - Flow 5 (skill creation): 1 β mostly internal architect flow; noted as harder to deterministically trigger - Flow 6 (PR review): 1 β auto-fired in flows 2/3; verification query - Flow 7 (architecture regen): 4 β explicit phrase + 3 lazy variants (>25 commits / β€25 commits / first-ever session) - Flow 8 (SWE retry/escalation): 2 β single fail + retry, 3 fails + escalation - Flow 9 (roundtable): 4 β the four corners β’ 9.1 explicit magic word + β₯2 planners (canonical) β’ 9.2 explicit magic word + only architect (skill escalates, proposes agent-creator) β’ 9.3 implicit cross-domain question + β₯2 planners (architect detects, invokes roundtable without user request) β’ 9.4 implicit cross-domain + only architect (architect proposes creating planners first via flow 4) Format Each scenario has: ID, prerequisites, trigger prompt (verbatim), expected behavior (numbered observations), verification (SQL or filesystem check), pass/fail checkbox. Two cross-indices at the top: by trigger style (magic word / implicit cross-domain / on-demand domain agent / etc.) and by flow. Cross-links - FLOWS.md header now references SCENARIOS.md as a companion doc - docs/local-testing.md replaces its 10-row inline checklist with a one-paragraph pointer + a quick smoke-test table; full grid lives in SCENARIOS.md - docs/architecture/FILES.md tree includes SCENARIOS.md - README.md "Architecture reference" lists SCENARIOS.md Closes #40 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
closes #42) Two coupled fixes surfaced during a dogfood attempt that ran bun install at the plugin repo root and failed β there was no root package.json. Workspaces - New plugin/package.json declares `workspaces: ["mcp/trajectory-server", "monitors"]`. One `bun install` at plugin root installs every subpackage; dedups `better-sqlite3` that both subpackages share. - Root `scripts.build` runs `bun --filter='*' run build` β every workspace with a build script rebuilds. Root `scripts.test` runs `bash tests/run-all.sh`. - Subpackage `bun.lock` files deleted (mcp/trajectory-server + monitors); replaced by a single root `bun.lock` (26 KB). Gitignore dogfood-generated paths - Add `docs/trustmybot/` to .gitignore. The plugin's own contributor docs live at `docs/architecture/`, not `docs/trustmybot/`. If a contributor dogfoods inside this repo (instead of a scratch project), architect/SWE would create `docs/trustmybot/snapshots/`, `docs/trustmybot/architecture/auto/`, etc. as if this were a user project β those artifacts should never land in commits. - The plugin-root .gitignore's existing `.claude/` rule already covers the runtime DB + worktrees path. Docs updated - docs/local-testing.md Prereqs section: one-command `bun install` + `bun run build` at the plugin repo root. Two other places that said `cd plugin/mcp/trajectory-server && bun run build` trimmed the same way. CI simplified - .github/workflows/test.yml: three install/build/test steps that each set `working-directory: mcp/trajectory-server` collapsed into a single root-level `bun install --frozen-lockfile`, `bun run build`, and `bash tests/run-all.sh` (picks up MCP + hook + lint suites). Verified - `bun install` at plugin root installs 260 packages, produces one consolidated lockfile. - `bun run build` at plugin root compiles MCP server successfully (monitors has no build; `bun --filter` skips it cleanly). - `bash tests/run-all.sh`: 235 MCP + 16 hook + 4 agent-budget lint, all green. Closes #42 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users were copy-pasting `/absolute/path/to/trustmybot-plugin` literally and launching Claude Code with a bogus `--plugin-dir`, so no plugin loaded and `@gatekeeper` had nothing to bind to. Replace the placeholder with `$(pwd)` captured from the plugin repo root, with a sanity-check echo and a fallback note for users who run from elsewhere. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
β¦h CC plugin spec
`claude plugin validate` surfaced three hard manifest violations that kept
--plugin-dir from loading any of our agents:
1. plugin.json
- `repository` was an object; spec wants a string URL
- `dependencies` had npm-style object form; spec wants an array (removed
for now β no installed plugin uses deps yet, and our deps were soft refs)
- `provides` was not in the spec at all (agents auto-discover from
agents/ dir, same as every other plugin)
- Added `homepage` and `keywords` to match the common shape
2. marketplace.json
- Missing required `owner: { name }` object
- `source` must end in `/`
- Top-level `description` moved under `metadata.description`
- Removed the plugin entry's `version` (version lives on plugin.json)
3. hooks/hooks.json
- Must be wrapped in `{ "hooks": {...} }`; we had the event map at the root
- Hook command paths now use `${CLAUDE_PLUGIN_ROOT}/...` for portability
(matches superpowers and other first-party plugins)
Both `claude plugin validate` runs now pass. Full test suite unchanged:
235 MCP + 16 hook + 4 agent-budget lint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs were blocking the trajectory MCP server from ever loading:
1. Wrong top-level key β we had `"servers"`, Claude Code expects `"mcpServers"`
(matches every official first-party plugin .mcp.json). The wrong key was a
silent no-op: no error, no prompt, the server just never got spawned, so
agents could never call issue_create / config_set / identity_set and no
trajectory.db was ever created. Dogfood proved this out β `/mcp` listed
zero plugin-owned servers.
2. Relative args path β `mcp/trajectory-server/dist/index.js` resolved
against whatever cwd CC happened to be in, which is the scratch project,
not the plugin clone. Now uses `${CLAUDE_PLUGIN_ROOT}` (the same
substitution hooks.json now uses) so the launcher always finds the
built entry regardless of where the user is.
Closes the "why doesn't trajectory.db exist" half of the dogfood thread.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
File had a node shebang but 0644 permissions, so CC's monitor subsystem spawned it directly (via `command`) and got exit 126 / permission denied. Every other plugin script (hooks/*.sh) has the executable bit recorded in git; the monitor just slipped through. chmod +x locally and `git update-index --chmod=+x` to bake the executable bit into the tree so a fresh clone works without manual intervention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦n repo User hit this: ran in the plugin clone thinking it was the right place. Harmless (gitignored), but the instruction didn't say which dir to run it in. Now spells it out and adds a reassurance note for the plugin-clone case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦bstitution caveat Mode B referenced $PLUGIN_PATH inside a CC slash-command (/plugin marketplace add $PLUGIN_PATH) but never exported it in its own step list, AND even if exported CC's slash-commands don't expand shell vars β so users would paste the literal string "$PLUGIN_PATH" and get a broken marketplace add. Fix: 1. Add an explicit Step 1 that exports PLUGIN_PATH from the plugin repo (same pattern Mode A uses). 2. Tell the user to substitute the absolute path into the slash-command themselves β CC doesn't do shell interpolation inside /plugin commands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User wants to select-and-paste the whole setup at once instead of hopping between three "Step N" chunks. Keep the two code fences (shell vs. slash-commands) since they run in different shells, but drop the stepped framing. Caveat about $PLUGIN_PATH not expanding in slash-commands is now an inline note between the blocks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦board) User's valid gripe: pasting the whole Mode B block means claude launches and the TUI immediately scrolls the echo output off-screen before you can copy the path into the slash-command. Fix: - Stash the path at /tmp/tmb-plugin-path so it survives the claude launch. - macOS: also pipe into pbcopy so paste (Cmd-V) just works inside CC. - Other platforms: mention `!cat /tmp/tmb-plugin-path` which CC interprets as an inline shell command β surfaces the path without leaving the session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User is tired of the copy-paste trap iterations. The 3-step version is the lowest-cognitive-load option: copy the pwd output once with your eyes, paste it twice (shell cd, CC slash-command). No file stash, no clipboard, no shell-variable caveats inside CC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦ritance Mode B referenced $PLUGIN_PATH inside the `/plugin marketplace add` slash command but never told the user to set the variable. Fix by adding the same cd + export steps Mode A has, and a one-liner explaining that CC inherits the shell environment so the slash command resolves $PLUGIN_PATH cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Our monitor script is a one-shot poller: reads new ledger rows, prints any failed/escalated/retry-exhausted events, exits 0. CC's monitor subsystem reports every process exit as "Monitor stream ended" in the chat, which looks like a failure to users doing a dogfood run. Verified during live testing today: the MCP server is β connected and responding normally; the "stream ended" message is pure UX noise from the monitor subsystem. Since no other first-party plugin ships a monitors.json and `"when":"always"` has no confirmed meaning in the current CC monitor API, remove the manifest to stop the noise. Keep the .js script + package.json so we can revive this as a proper long-running streaming daemon later. File an issue to reintroduce with a real streaming implementation when the monitor subsystem's contract is clearer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mode B had one code block with step headers as shell comments, which reads as one monolithic copy-paste. Split into Step 1 (prose) β bash block, Step 2 (prose) β bash block, so each step is visually distinct and independently runnable.
β¦ onboarding
bro is a branding choice, not a user setting. Prompting the Human to rename
their gatekeeper on every fresh project was both (a) extra ceremony on a
moment they just want to get through, and (b) a branding leak β half the
projects call the gatekeeper "alex", half call it "bro", and the onboarding
stops being a memorable shared UI.
Changes:
- first-run-onboarding: drop the "what would you like to call me" question.
Keep the human_name question only. MCP call is now identity_set(human_name=β¦);
gatekeeper_name falls through to the schema default of 'bro'.
- tmb-reonboard: symmetric β no rename prompt, no gatekeeper_name in the
current/final summary, no gatekeeper_name in identity_set.
- Explicitly forbid inventing an honorific ("master", "boss", "friend")
when the Human declines to share a name. Plain second-person is cleanest.
- FLOWS.md mermaid: drop the second name exchange from the onboarding
sequence diagram.
- local-testing.md: clarify that gatekeeper_name stays at its schema default.
No MCP surface changes β identity_set still accepts gatekeeper_name if a
downstream script wants it; we just don't offer it via the onboarding UX.
All 235 MCP + 16 hook + 4 agent-budget lint tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Rationale (pre-release, no tech-debt concerns):** The agent was named `gatekeeper` (role) but ran a persona called `bro` (brand). That split caused recurring confusion: - `@gatekeeper` was 11 characters to type vs. the 3-character persona the user actually saw in conversation (issue #36 Concern 2). - Every document had to explain the two-name relationship. - The `identity.gatekeeper_name` column treated bro's name as user-configurable branding, which is wrong: bro IS the brand. It's not a knob. - Onboarding asked the Human "what would you like to call me?" β ceremony that burned a question on a non-choice. Since the plugin is pre-release and has no migration concern, a clean rename is cheaper than dragging two names forever. **Changes:** - Renamed `agents/gatekeeper.md` β `agents/bro.md` (mention handle becomes `@bro`, which closes the handle half of issue #36). - Bulk rename `gatekeeper β bro`, `Gatekeeper β Bro` across all prose, code identifiers, test names, MCP caller_role values, and KNOWN_ROLES. - Dropped the `identity.bro_name` column (was `gatekeeper_name`) from the schema. `identity_get` / `identity_set` / `identity_reset` no longer expose it. Bro is a fixed string in prompts, not a DB field. - Simplified the onboarding welcome message (`Hey, I'm bro β your <role>` had become `your bro` after naive rename, a redundancy; replaced with a clean self-introduction). **Files touched:** 42 including agent prompts, every skill, all docs (CLAUDE.md, README, CONTRIBUTING, ERD, FLOWS, FILES, SCENARIOS, local-testing, CONFIG_KEYS), MCP server source, middleware, and every test file that referenced the old name. **Verification:** - `grep -ri gatekeeper` in source: 0 hits. - `grep -ri bro_name` in source: 0 hits. - 235 MCP + 16 hook + 4 agent-budget lint tests pass. - Headless smoke test: `claude --plugin-dir β¦ -p "list tmb: agents"` returns `tmb:bro`, `tmb:architect`, `tmb:swe`, `tmb:pr-reviewer`. Closes #36 handle concern (renaming part). Enforcement (routing) stays open as a separate concern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README tagline pulled up as a blockquote right under the H1 so anyone landing on the repo sees it first. - Onboarding welcome opens with the catchphrase; onboarding closing signs off with it β the two natural bookends of the first conversation. - bro's agent prompt gets a short Catchphrase section: deploy at completed hand-offs and confident routing calls; never on failures or clarifying questions. Swagger, not reassurance. Catchphrase placement is deliberately thin β dilution kills a tagline, so it shows up at ~2 predictable beats per session instead of being appended to every response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"Trust me bro, it works" is only self-aware humor if it follows real
verification. Applied to unverified code it just becomes the exact
behavior the meme mocks β an AI agent claiming correctness with no
evidence. That undercuts the whole brand bet.
New rule: catchphrase is only earned on a code-delivery hand-off when
BOTH conditions hold:
1. pr-reviewer recorded validation_record(verdict='pass') for every
completed task in the issue.
2. Integration tests (not unit-only, not lint) actually executed
against the delivered change and passed.
If the project has no integration tests, the slogan is not earned
and bro just says "done".
Onboarding bookends remain as the only no-evidence use β there it's
a meta-tagline referencing the plugin itself, not a claim about
shipped code.
Removed the earlier "confident routing call" carve-out: routing
decisions are too soft to back with the slogan. Swagger only after
the tests run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦shback Adds the third pillar of agent-harness design β productive dissent β alongside the two we already implement (responsibility boundaries + requirements alignment). Two failure modes this avoids: - Sycophantic agreement: bro capitulates to anything the Human says, plans proceed with known flaws. - Direct contradiction: bro pushes back personally, triggers the defensiveness people naturally feel about their own plans, conversation stalls. The structural fix is to separate intent-capture from intent-evaluation: **bro.md** β added a fifth role bullet. Bro's job is faithful capture; concerns go into the architect spawn prompt as `concern: <why>`, never argued with the Human directly. Bro is not a debate partner. **architect.md** β sharpened objective #2. Architect is the technical check on the Human's plan, with independent authority. Concerns (bro-forwarded or architect-originated) surface via discussion_append BEFORE writing tasks. Explicitly: never rubber-stamp a plan the architect believes is flawed. The Human now sees two independent reads (theirs + architect's) routed through a structural channel, not a confrontation. Triangulation, not conflict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 24, 2026
Closed
β¦nt param
Two compounding bugs meant every bro MCP call during dogfood was narrated,
never executed β '0 tool uses Β· 21.7k tokens' across the session:
1. **Subagents had no MCP tools in their registry.** CC plugin agents must
explicitly list `mcp__<server>__<tool>` names in `tools:` frontmatter.
Wildcards (`mcp__plugin_tmb_trajectory-server__*`) don't propagate to
subagents. Tested both: wildcard produces empty tool list, explicit
names produce the right registry.
2. **MCP inputSchema didn't declare `agent` parameter.** The server's
requireRoles middleware reads `args['agent']` to enforce role-based
access. When the LLM sent `{agent: 'bro', ...}`, the schema validator
stripped it (wasn't declared) β `caller_role: 'unknown'` β forbidden.
Tests passed because they bypass schema by calling handlers directly.
Fix:
- **agents/bro.md, architect.md, swe.md, pr-reviewer.md**: list all MCP
tools each agent is actually allowed to call per server-side role policy,
explicitly. Added one-line MCP Caller Identity section to each prompt
instructing the LLM to include `agent='<role>'` in every call.
- **skills/{first-run-onboarding, tmb-reonboard, lazy-regen-check,
project-prescan, branch-id-proposal, refresh-architecture}/SKILL.md**:
extended `allowed-tools` with the specific MCP tools each skill needs.
- **mcp/trajectory-server/src/tools/index.ts**: new `decorateWithAgent()`
helper injects `agent: enum[bro|architect|swe|pr-reviewer]` into every
tool's inputSchema at registration time. One place, applies to all 36
tools, no per-tool edits.
Verified live:
identity_set(agent='bro', human_name='Zax') β
{"human_name":"Zax","created_at":"2026-04-24T...","updated_at":"..."}
identity_get(agent='bro') β
{"human_name":"Zax","created_at":"2026-04-24T...","updated_at":"..."}
This unblocks PR #41's end-to-end dogfood: bro can now actually persist
onboarding state via MCP instead of narrating the calls as prose.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 24, 2026
Closed
β¦-RPC protocol User feedback caught the real process failure: I had them dogfooding (Layer 3) without Layer 2 in place, so they burned hours chasing bugs Layer 2 would have found in milliseconds. Adds a proper MCP integration layer: **tests/mcp-integration/harness.mjs** β spawns `node dist/index.js` as a subprocess, speaks the real JSON-RPC stdio protocol via the MCP SDK client, exposes `listTools()` and `call(name, args)` helpers that return structured ok/error/data so tests read like production call sites. **tests/mcp-integration/schema-contract.test.mjs** β verifies every tool's inputSchema declares the `agent` parameter with the four-role enum. This is the test that would have caught the 18s / 0-tool-uses disaster in PR #41: the schema validator was stripping `agent` because no tool declared it, collapsing every call's caller_role to 'unknown'. **tests/mcp-integration/role-matrix.test.mjs** β 10 tests covering every tool that currently wraps its handler with `requireRoles`: - identity_set (bro only) - identity_reset (bro only) - config_set (bro, architect) - file_registry_upsert (architect, bro) - file_registry_delete (architect, bro) - issue_snapshot_md (architect, pr-reviewer) - discussion_append (workflow agents + scope) - architecture_regen (architect, bro, pr-reviewer) - regen_state_set (architect, bro, pr-reviewer) Each test verifies three conditions per tool: missing `agent` β forbidden, wrong `agent` β forbidden, right `agent` β success. **Wiring:** - New suite runs from tests/run-all.sh between Layer 1 (unit) and Hook tests. - `@modelcontextprotocol/sdk` added as root devDependency for the harness. - CI picks it up via the existing `bun run test` entry point. **Tool schema fix surfaced while wiring:** Changed `decorateWithAgent()` property-spread order so the decorator's `agent` definition takes precedence over tool-specific declarations. Several tools (issues.ts, tasks.ts) had their own `agent: { type: 'string' }` without the four-role enum, shadowing the canonical schema. **Gaps surfaced (filed as separate issue):** Only 9 of 36 tools currently wrap handlers with `requireRoles`. Tasks, validations, issues, ledger, audit, and skill tools are all unprotected β any caller including 'unknown' can write through them. The worst offender is `validation_record`, which controls the push-block hook's unlock condition. Scoped to a follow-up issue so this PR doesn't spiral. Full suite now: 235 Layer 1 unit + 10 Layer 2 integration + 16 hook + 4 agent-budget lint, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦ press-enter defaults CC interactive input can't send empty input, so every \"press enter to keep default\" in the skills was broken (only plan-mode accepts bare Enter). Text multi-choice also forced digit-matching (\"1\" β github-flow) and made the flow feel 2010. Switched both onboarding skills to use \`AskUserQuestion\` β a proper radio form with per-option descriptions, a pre-selected default, and an auto-added \`Other\` for free text. **first-run-onboarding/SKILL.md** - One batched \`AskUserQuestion\` call with 3 questions (name, branching, PR target) β Human answers all in one form, bro persists all via MCP. - Branching labels lead with structure, brand in parens: - \"Trunk + feature branches (GitHub Flow)\" - \"Trunk + develop + releases (Git Flow)\" - \"Custom workflow\" This solves the \"github-flow vs gitflow β which is which?\" confusion. - Name question offers \"Anonymous\" + pre-populated inferred name (if the Human introduced themselves inline); Other covers everything else. - Custom branching path uses a second batched call for the protected- branches follow-up. **tmb-reonboard/SKILL.md** - Same batched flow. First option in each question is \`Keep \"<current>\"\` β click once to preserve, or pick an alternative. - \"Anonymous\" on the name question maps to \`identity_reset\`. - Dedupe logic for when the current value would duplicate a static option. **agents/bro.md** - Added \`AskUserQuestion\` to the \`tools:\` allowlist. Closes half of #48 (the skills-level rewrite). The naming-clarity half is captured in the new labels. Integration testing of the rendered UI is a human dogfood step (Layer 3) β automated tests can't render radio UI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦reviewer happy paths Filled in the real Layer 2 scope. The prior Layer 2 commit only shipped a role-permission matrix (does role X work/fail on tool Y); it didn't match the user's ask of "each agent runs each MCP for their responsibilities". Four new test files, one per agent role, each exercising that role's realistic end-to-end MCP sequence against a real spawned server: **agent-bro-workflow.test.mjs** - First-run onboarding: empty state β identity_set β config_set x3 β verify final state via identity_get + config_list. - Reonboard rename: identity_reset then new identity_set; verify config is untouched. - issue_resume with a known issue_id (session-start resume path). **agent-architect-workflow.test.mjs** - Simple-task flow: issue_create β discussion_append (triage note) β task_create_batch β task_get β validation_history (empty) β issue_close. - Difficult-task flow: issue + 4-entry ADR-style discussion loop (note, question, answer, decision) with issue_get_with_discussions round-trip. - Skill lifecycle: skill_register (curated draft) β skill_promote (draft β pending_review) with explicit from_status. **agent-swe-workflow.test.mjs** - Pickup β running β atomic close: task_get β task_update_status(running) β ledger_log (full required-arg set: issue_id, branch_id, from_node, event_type, summary) β file_registry_upsert (documents current forbidden status per #50) β audit_log β task_update_status(completed) β verify. - validation_history read-access to own task. **agent-pr-reviewer-workflow.test.mjs** - Happy path: task_get (completed) β validation_record(pass) β history reflects 1 row with verdict=pass. - Fail path: validation_record(fail) β architect reads same history. - Retry loop: 3 sequential attempts (fail/fail/pass) β history returns all 3 in ascending attempt order. **Bugs surfaced and fixed while writing the tests:** - `ledger_log` requires (agent, issue_id, from_node, event_type, summary); my first draft only passed event_type+summary+branch_id and got a `Missing required arg: issue_id` β caught immediately by Layer 2. - `issue_resume` requires issue_id (not just agent). - `skill_promote` requires from_status (not just name+to_status). These are the exact shape mismatches that would have manifested as "0 tool uses" in interactive dogfood, burning more of your time. Layer 2 caught them in ms. Coverage: - Schema contract: 1 test (all 36 tools have `agent` in inputSchema). - Role matrix: 9 tests (every currently-protected tool). - Per-agent workflows: 11 tests (4 bro + 3 architect + 2 swe + 3 pr-reviewer β note: 2 for swe because file_registry_upsert is documented-as-forbidden). Full suite: 235 Layer 1 unit + 21 Layer 2 integration + 16 hook + 4 agent-budget lint, all green. Layer 1 decision: the pre-existing `remaining_tools.test.ts` covers audit / validation / skills / reports with 11 cases β enough for adequate. Splitting into dedicated files is cosmetic refactoring, not new coverage, so it's deferred. Real Layer 1 gaps (e.g. issue_snapshot_md which writes to disk) are exercised indirectly by Layer 2's role-matrix test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦eam issues are stale User caught me being wrong. Provided a screenshot showing AskUserQuestion rendering live in an interactive session β the "Scope" chip + 6-option radio form + Enter-to-select hint are the AskUserQuestion UI, invoked from a plugin subagent. My mistake chain: 1. Shipped AskUserQuestion without verifying it works in subagents. 2. Hit the text-fallback in live dogfood, concluded the tool was unavailable. 3. Tested via `claude -p` (headless mode) β tool wasn't in the registry. 4. Decided this confirmed the limitation. Shipped revert + filed #52. 5. Searched upstream issues β they said "not-planned" back in CC 2.0.56. 6. Doubled down on the revert, closed #52 as BLOCKED. What I missed: `claude -p` is non-interactive. AskUserQuestion REQUIRES an interactive UI to render. `-p` stripping the tool wasn't a subagent limitation β it was a UI-mode restriction. And the upstream "not-planned" state may have been reversed in more recent CC builds (user is on a current build where it works interactively). Restoring: - agents/bro.md tools: AskUserQuestion back. - agents/architect.md tools: AskUserQuestion added (user's screenshot suggests architect is the one rendering it for architecture-scope questions, which is the right home for the radio UI on design decisions). - skills/first-run-onboarding/SKILL.md: back to batched AskUserQuestion call with the 3-question form (name, branching, PR target) + Step 3a for custom-protected-branches multiSelect follow-up. Kept the hardened MCP discipline from the revert: mandatory write sequence enumerated, no hallucinating rejections, post-write config_list verify gates the closing message. - skills/tmb-reonboard/SKILL.md: back to batched form with `Keep "<current>"` as pre-selected first option. - tests/manual/scenarios.md Flow 1: back to AskUserQuestion expectations. - tests/manual/scenarios.md 1.4: reonboard back to AskUserQuestion. The discipline fixes from the previous "revert" commit (mandatory write sequence + config_list verify) are preserved β those are UI-independent and were the RIGHT fix for the hallucinated-rejection bug we saw. Tests green: 235 unit + 21 integration + 16 hook + 4 lint. Next: reopen #52 with the corrected status β works interactively, documented `-p` limitation as separate concern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦n_append User confirmed AskUserQuestion works live in plugin subagents (screenshot of architect rendering a scope-narrowing radio form for "I want to make social media platform"). The discussion-form UX is the design intent, and every Q/A round must persist to the discussions table so the trajectory stays replayable. Implementing the pattern: **skills/architect-workflow/SKILL.md** β new "Interactive Alignment" subsection under Discussion Phase. Describes: - When to use AskUserQuestion (enumerable answers, 2-4 options, scope/ tech/priority/approach) - When to fall back to text discussion_append (open-ended "what constraints do you have" questions) - The mandatory dual-persist pattern β every AskUserQuestion round followed by TWO discussion_append calls (kind='question' + kind='answer') in chronological order, using the existing valid kinds - Guidance on batching (independent answers β one AskUserQuestion call with up to 4 questions; dependent answers β sequential) - Closing the discussion with kind='decision' once aligned This gives us the three-mode UX the user specified: - simple β bro β architect β swe (no discussion form; trivial template) - difficult β bro + architect with AskUserQuestion alignment loop β swe (persists to discussions, renders in issue_report_md / issue_snapshot_md) - out-of-scope β roundtable (tracked as #53 for AskUserQuestion extension into flow 9) **docs/architecture/FLOWS.md** β flow 3 mermaid sequence diagram updated to show the AskUserQuestion loop + dual discussion_append per round. Notes section explicitly documents the replayability guarantee: every alignment decision lands in discussions as a Q+A pair so future sessions can reconstruct the reasoning via issue_report_md. No schema changes β 'question' and 'answer' are already valid kinds in ALLOWED_KINDS (verified in tools/discussions.ts). Tests: 235 unit + 21 integration + 16 hook + 4 lint β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦text Live dogfood found two related bugs, same root cause β load-bearing Human questions were being silently skipped: **Bug 1: Onboarding auto-filled without showing the radio form.** Bro wrote identity (Zax) + branching (github-flow) + pr_target (main) + protected_branches (main) in 11 seconds with ZERO AskUserQuestion render. Root cause: CC's environment context leaked `# userEmail zax.shen@gmail.com` into bro's context, and auto-mode's "minimize interruptions" nudged bro's LLM to write the inferred defaults directly. The Human never got to confirm or change. Fix in skills/first-run-onboarding/SKILL.md: explicit "Hard rule β always render the form" section. AskUserQuestion is NOT skippable when: - Environment leaked user info (inferred β confirmed) - Auto mode encourages action (onboarding isn't routine) - The LLM thinks it knows the answer (branching is policy, not guess) **Bug 2: Architect wrote `kind='decision'` without asking any question.** User asked "build a Python CLI todo". Architect triaged as simple (no docs/trustmybot/architecture/ impact), skipped the alignment loop per the current "simple β straight to tasks" rule, and unilaterally chose stdlib-only + argparse + JSON persistence. ZERO kind='question' rows in discussions. User correctly pointed out: "build a CLI todo" has massive scope ambiguity β storage backend, lib choice, command surface, feature scope β all of which the Human should pick, not the architect. Fix in skills/architect-workflow/SKILL.md: add a "Scope-ambiguity gate" inside Discussion Phase. Triggers in BOTH simple and difficult triage. Before any kind='decision' row, every plan choice must be either (a) explicitly answered by the Human via the dual-discussion_append pattern, or (b) a choice the Human plausibly wouldn't care about. Auto-mode does not waive the gate. The gate enumerates categories that ALWAYS need explicit Human input: storage backend, lib choice, command surface, feature scope, persistence location. Architect can still decide implementation details (internal variable names, stylistic choices) β just not interface-shaping ones. Net effect: simple triage no longer means "architect auto-decides the whole plan". Simple means "fewer/smaller tasks" (trivial template); the alignment loop still runs whenever scope is ambiguous. Tests: 235 unit + 21 integration + 16 hook + 4 lint β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦cal reality User: "What I expected is to check local python env first, then ask which env management tools, etc. in the discussion form." Current architect workflow said "explore the codebase before asking questions", but didn't probe the ENVIRONMENT β which tools are installed, which language versions are available, which project files already exist. Result: architect would offer generic menus like "uv vs poetry vs pip" even when only one is installed locally. Fix: new "Environment Probe" step in architect-workflow's Discussion Phase. Runs BEFORE the AskUserQuestion radio form. Uses Bash (read-only) to detect: - Language versions (python3, node, go, rustc) - Python package managers (uv, poetry, pipenv, pip) - Python project files (pyproject.toml, requirements.txt, setup.py, Pipfile) - Node ecosystem (bun, pnpm, npm) + lockfiles - Linters/formatters (ruff, black, biome, eslint) - Test runners (pytest, vitest) - Git state (remote, branch) The probe informs the radio options β every option must be grounded in what's actually detected. Explicit rules: - Never offer an option that can't execute on the local machine. - Never mark a tool "(Recommended)" unless it's detected AND fits the task. - If .python-version exists β skip Python version question. - If pyproject.toml exists β ask "reuse / new alongside / scrap & restart" instead of generic "what layout". Example mapping table in the skill shows: "uv detected + Python β₯ 3.11" β radio form with `uv (detected, v0.5.x) (Recommended)` / pip+venv / poetry. "No package manager" β "Install uv / use pip / I'll handle it". Probe findings persisted as a discussion_append(kind='note') row so future sessions replay the environment context. Also updated the Discussion Phase step list β probe is step 2, codebase exploration is step 3, radio form is step 4. Net effect: the discussion form the user envisioned β grounded, machine-specific, not a generic decision tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦Question contract
Live dogfood: bro rendered text questions ("1. Name to call you? (e.g.
Zax or Anonymous)") instead of AskUserQuestion, AND leaked the user's
inferred name ("Zax", from CC's environment-level userEmail) into the
question text as an example.
Two tightenings in first-run-onboarding/SKILL.md:
**Hard rule: AskUserQuestion mandatory, no text fallback.**
Previous wording said "AskUserQuestion is mandatory" but bro's LLM still
fell back to text. The skill now explicitly calls the text fallback a
contract violation and prescribes the exact recovery path when the tool
errors: read error β retry once β if still erroring, surface verbatim
to the Human and ask how to proceed. NO silent text fallback.
Added an explicit reason list for why NOT to skip the tool call:
- Env context leaked identity (inferred β confirmed)
- Auto mode says "minimize interruptions" (onboarding IS the trust-model setup)
- LLM thinks it knows answers (branching is policy, not a guess)
- Text seems faster (it violates the scenarios contract)
**Context-leak rule: never surface inferred identity.**
CC subagents inherit the user's email via # userEmail env lines. The
skill now explicitly forbids:
- Putting an inferred name in the question text (no more "(e.g. Zax)")
- Pre-populating `Use "Zax"` based on email-derived guesses
- Writing identity_set('Zax') from email inference alone
The only case where pre-populating a `Use "<name>"` option is allowed:
the Human typed their name IN THIS interactive session (e.g. "I'm Alice"
in a prior turn). Env context alone doesn't count β it's external
metadata the Human hasn't confirmed.
Also updated the AskUserQuestion code comment in Step 1 to make this
rule visible at the option-construction site.
The radio form is where consent lives. Until the Human picks, we don't
know their name β regardless of what CC told us.
Tests: 235 unit + 21 integration + 16 hook + 4 lint β all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User correctly called me out for asking them to test in interactive CC sessions for things that should have been automated. The concerns are: - Does the skill say AskUserQuestion is mandatory? - Does the skill forbid text fallback? - Does the skill warn against env-leak of inferred identity? - Are labels β canonical mappings complete? - Are required MCP writes enumerated? - Are architect's alignment + scope-gate + env-probe sections present? All of the above are prompt-content assertions β zero LLM required. Running them caught a real bug immediately: the skill had `(e.g. "Zax"` in an example, which is exactly the pattern the rule forbids (teaching the LLM to echo inferred identity). Fixed. Added tests/lint/onboarding-skill-contract.sh: - 13 positive assertions on first-run-onboarding (mandatory tool, text-fallback forbidden, env-leak rule, canonical mappings, MCP writes, never-narrate-rejection, in-session pre-populate gate) - 2 negative assertions (no "e.g. Zax" literal in either escaped or unescaped form β prevents regression of the bug we just found) - 3 assertions on tmb-reonboard (AskUserQuestion, Keep pattern, identity_reset path) - 9 assertions on architect-workflow (AskUserQuestion, discussion_append persistence, kind=question + kind=answer, scope-ambiguity gate, environment probe, probe citations of uv + pyproject.toml, discipline rule against ghost options) Wired into tests/run-all.sh after the agent-budget lint. Runs in ~50ms. What this CANNOT test: whether bro's LLM actually chooses to call AskUserQuestion at runtime. That's genuine Layer 3 (LLM judgment, non-deterministic). But the static contract at least prevents the category of regression where the skill prompt itself degrades. Full suite: 235 Layer 1 unit + 21 Layer 2 integration + 16 hook + 4 agent-budget + 27 onboarding contract β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦estion, scope-gate bypass
Four bugs surfaced in user's interactive dogfood. Solving all.
**Bug 1: issue_create returned a ghost `issue_string_id` field.**
Main Claude saw `{id: 1, issue_string_id: 'iss_modfdaia_5466e041'}` and
used the UUID for subsequent discussion_append calls, which silently
failed (no matching row). Retried with `1` which worked. Net effect:
2 ghost MCP writes, user-visible confusion about which id is canonical.
Fix in mcp/trajectory-server/src/tools/issues.ts: remove the genId('iss')
line and the issue_string_id in the return. issue_create now returns the
redacted row with a single `id` field. genId import dropped.
Layer 2 test added: issue_create returns a single id (no issue_string_id
ghost field) β prevents regression.
**Bug 2: AskUserQuestion doesn't work in plugin subagents (re-confirmed).**
User's dogfood: main Claude narrated "AskUserQuestion isn't available in
subagents" and took over the form at top level. This contradicts my
earlier un-revert based on a single screenshot (which I now believe was
main Claude rendering, not a subagent).
Verdict: AskUserQuestion is main-Claude-only. For bro, main Claude
hoists the form (proven to work in onboarding). For architect mid-flow,
main Claude does NOT hoist β architect is on its own.
Fix in skills/architect-workflow/SKILL.md: Interactive Alignment section
reverted from AskUserQuestion pattern to **text-based Q+A** with the same
dual discussion_append persistence. Architect now emits numbered text
questions, waits for reply, persists Q+A verbatim.
**Bug 3: Scope-ambiguity gate silently bypassed by auto-mode.**
Live DB showed: discussions had entries 1-4 (intent/note/note/note-probe)
then row 5 was `kind='decision'` with body containing literal phrase
"auto-mode defaults, zero-dep". ZERO `kind='question'` rows β architect
unilaterally chose Python stdlib + argparse + JSON + etc.
Fix in skills/architect-workflow/SKILL.md: Scope-ambiguity gate
upgraded to HARD RULE. New contents:
- Marked HARD RULE in the heading itself
- Explicit "Auto-mode does NOT waive this gate" statement
- Specific trigger phrase: "If your response body would include phrases
like 'auto-mode defaults' or 'defaulting to X since you didn't
specify' β STOP. Ask first."
- Full worked example of correct sequence (8-row discussions table)
- Full worked example of violation (what the user just saw) marked RED FLAG
- Self-review instruction: before task_create_batch, re-read
discussion_list; if you see a decision without a preceding question,
you violated the gate.
**Bug 4: No test catches Bug 3 at CI time.**
Added lint coverage for the HARD-RULE discipline:
- `tests/lint/onboarding-skill-contract.sh` extended with 5 new checks:
HARD RULE marker, "Auto-mode does NOT waive", "auto-mode defaults"
call-out, RED FLAG violation example, text-Q requirement, and
negative check that no text instructs architect to CALL
AskUserQuestion (a subagent-blocked call).
What this DOESN'T catch: whether architect's LLM actually honors the
gate at runtime. That's genuine Layer 3 β flaky by nature, document-able
via scenarios.md. But the static contract regression class is closed.
Also added Layer 2 test: discussion_append chronology test. Verifies
schema-level chronological consistency (answer rows have preceding
question rows) so at least the protocol-level invariant is enforced.
All automated tests green: 235 Layer 1 unit + 23 Layer 2 integration
(two new) + 16 hook + 4 agent-budget + 32 lint assertions (5 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦t-blocked User dogfood again surfaced bro stuck at the onboarding form: AskUserQuestion fails for subagents (per anthropics/claude-code#12890), so my last commit saying "surface error to Human and ask how to proceed" caused main Claude to render an unhelpful "abort / answer the 3 / take defaults" menu. Reverting bro to the same text-Q+A pattern architect now uses. Consistent, reliable, no subagent-AskUserQuestion dependency: **skills/first-run-onboarding/SKILL.md** β full rewrite of the question UI: - Removed AskUserQuestion code block + Map-labels-to-canonical table. - Added explicit "MUST NOT call AskUserQuestion" prohibition with link to the upstream issue. - Step 1 now emits welcome + 3 numbered questions in one chat message: 1. What should I call you? Reply with a name or `anonymous`. 2. How does your team branch? Reply with 1/2/3 (canonical labels). 3. What's your PR target branch? Reply with `main`/`master`/`develop`/ `trunk`/other. - Step 2 parses the free-text reply against canonical map. - Step 3a (custom branching) replaced with a CSV reply for protected. **skills/tmb-reonboard/SKILL.md** β symmetric rewrite: - Removed AskUserQuestion batched-question form. - Step 2 lists current values + asks "which to change?" β reply chooses one of: name / branching / pr-target / protected / all / none. - Each sub-question accepts the new value or `keep` sentinel. **tests/lint/onboarding-skill-contract.sh** β updated to match the new text-Q reality: - first-run-onboarding: replaced "AskUserQuestion mandatory" + "no text fallback" assertions with "MUST NOT call AskUserQuestion" prohibition + "anonymous" + "comma-separated" + canonical-mapping checks. - tmb-reonboard: same β checks for NEVER list, keep sentinel, identity_reset path. - Removed assertions about Keep "<current>" pattern (was AUQ-specific) and IN-THIS-session pre-populate gate (was AUQ-specific). The two remaining intentional AskUserQuestion mentions: 1. first-run-onboarding line: "MUST NOT call AskUserQuestion" β load-bearing prohibition. 2. tmb-reonboard NEVER list β same prohibition. Tests: 235 unit + 23 integration + 16 hook + 4 lint + 30 contract β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported on retry: "Still no form!" after 232a90b switched bro to text-only Q+A. Re-examining the earlier working run (commit 036c81f era): bro's skill *did* call AskUserQuestion, bro subagent couldn't execute it, and main Claude transparently hoisted the form at its own level β rendering the radio UI, collecting answers, passing them to bro which then persisted via MCP. The user's "Onboarding works" report came from that flow. My 232a90b commit killed the signal: by removing AskUserQuestion from bro's skill entirely, main Claude had no reason to intercept and rendered the text as-is. Net result: worse UX. Un-reverting the skill changes from 232a90b restores the signal: - first-run-onboarding/SKILL.md back to batched AskUserQuestion call with Hard Rule + context-leak protection - tmb-reonboard/SKILL.md back to Keep-"<current>" first option pattern - tests/lint/onboarding-skill-contract.sh back to AskUserQuestion assertions Architect keeps the text-Q+A pattern (from 98e0f23) because main Claude does NOT hoist for deep-spawned subagents the same way. This is asymmetric but matches observed behavior. What I learned the hard way: "AskUserQuestion doesn't work in subagents" is technically true (per anthropics/claude-code#12890) but functionally incomplete β main Claude compensates for direct-@-mention subagents like bro. The skill should keep the aspirational AskUserQuestion call as the signal main Claude uses to know "this agent needs interactive UI". Four revert-cycles on this issue. Locking in the current state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User clarification: "bro should be CC's persona. After first @tmb:bro then CC have the persona. Why not subagent? Because subagent has no enough permissions." Right. Subagents lack AskUserQuestion (anthropics/claude-code#12890) and hit other permission gates. bro needs those for onboarding + alignment with the Human. So bro IS main Claude; architect/swe/pr-reviewer stay as true subagents with constrained tool surfaces. Changes: - **agents/bro.md** β DELETED. bro's prompt folded into plugin/CLAUDE.md under a "You are bro" section. Main Claude loads CLAUDE.md at session start and adopts bro's persona automatically. - **plugin/CLAUDE.md** β rewritten: - Lead with the top/bottom split: "You ARE bro" for main Claude; subagent prompts override when architect/swe/pr-reviewer spawn. - Full bro persona inline: MCP caller identity, chain-of-thought, role, identity+onboarding, catchphrase, session-start chain, routing table, no-auto-action discipline, agent-creator flow, mode rules, communication style. - "Subagent roster" section after bro's persona β the three true subagents (architect/swe/pr-reviewer), their models, their override-by-project rules. - "Tool availability contract" explicitly differentiates: bro has full CC toolkit including AskUserQuestion; subagents don't. - Updated Decision flow to show "bro = main Claude". - Preserved workspace boundary + workflow files + source access control sections. - **tests/manual/scenarios.md** β dropped `@tmb:bro` trigger prompts in favor of plain `hello` / `switch to gitflow`. Notes that bro IS main Claude, loaded via plugin/CLAUDE.md. - **agent-budget lint** β automatically picks up bro.md removal (uses a find-glob, no manual list update needed). No changes to subagent MCP role enforcement (architect/swe/pr-reviewer all still declare `agent='bro'` in their tool calls when relevant β that's fine, they're reporting their caller context). bro's role name remains 'bro' in requireRoles(['bro']) checks. Net side-effects, which together close open issues in principle: - #36 (routing enforcement): bro IS main Claude β every Human message routes through bro by architecture, not by convention. - #46 (shorten @-handle): no @-mention needed; user types naturally. - #47 (bro first-response latency): no subagent spawn overhead; bro responds as fast as main Claude normally does. Tests all green: 235 Layer 1 + 23 Layer 2 + 16 hook + 3 agent-budget (bro removed) + 30 contract lint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User concern: with bro as main Claude, users can still type @tmb:architect or @tmb:pr-reviewer and bypass bro's triage / onboarding / branch-id flow. swe already had a MANDATORY FIRST ACTION that rejects spawns without task_id=<N>. Extending the same discipline to architect and pr-reviewer so no subagent is reachable directly by the Human. Rejection heuristic per agent β detect bro-originated spawn via routing markers, reject otherwise: - architect: requires at least one of `triage: simple|difficult`, `issue_id=`, `branch_id=`, or `concern:` in the spawn prompt. bro always includes at least `triage:` and usually a `branch_id` from the branch-id-proposal skill. Direct @-mention by Human has none. - pr-reviewer: requires at least one of `task_id=<N>`, `issue_id=<N>`, or a bro-routed review-request marker. architect always spawns pr-reviewer with task_id after SWE completes. - swe: already had this via `task_id=<N>` scan. Unchanged. On rejection, each agent outputs a structured REJECTED line explaining that they're subagents, not Human entry points, and pointing the Human back to bro (just type the request β no @-mention needed). Tightened the pr-reviewer overview block to recover the line budget (200-line cap per agent) after adding the rejection section. Tests: 235 Layer 1 + 23 Layer 2 + 16 hook + 3 agent-budget + 30 contract β all green. Addresses your concern #1: users can't accidentally or deliberately bypass bro by @-mentioning subagents. All three now auto-reject direct invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User's refined design: plugin enabled β bro active. Main Claude should
stay as normal Claude Code until the Human addresses "bro". Once
triggered, stays in bro mode for the session.
Why this is better than the auto-persona model:
- "what's 2+2?" β main Claude answers directly. No onboarding overhead,
no pre-scan, no MCP call pollution.
- "bro, write a CLI todo" β triggers full bro machinery. Onboarding (if
first time), routing, subagent spawning, MCP state.
- "bro" is a natural vocative β humans address TMB by its persona name.
- Plugin sits dormant until invoked; backwards-compatible with regular
CC workflows in the same session.
Changes:
- plugin/CLAUDE.md β rewrote the opening section:
* New "Persona activation rule": listen for "bro" as trigger word in
any form (vocative, @-mention, subject). Until triggered β act as
normal main Claude. After β adopt persona for the rest of session.
* "Subagent prompt precedence" section separated so architect/swe/
pr-reviewer's own prompts still win when spawned via Task.
* Kept all bro persona content below β unchanged, only the activation
semantics flipped.
- tests/manual/scenarios.md β updated Flow 1 trigger prompts:
* `bro hello` for fresh onboarding (1.1) β first mention of "bro"
triggers the persona and onboarding in one shot.
* `bro, switch to gitflow` for reonboard (1.4) β continuing in bro
mode, addressing bro again is optional but natural.
No subagent changes (architect/swe/pr-reviewer still reject direct
Human invocation per 5ed659b).
Tests still green: 235 + 23 + 16 + 3 + 30.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦ Other-synonyms User saw this live-rendered radio form: 1. Anonymous 2. I'll enter it β redundant; just redirects to Other 3. Type something. Two bugs: 1. **Placeholder options redirecting to Other.** AskUserQuestion auto-adds `Other` for free-text entry. Options like "I'll enter it" or "Type something else" are UI-confusing synonyms of Other and shouldn't be added by the skill. Main Claude was inventing them because the skill didn't explicitly forbid the pattern. 2. **Missing git-config identity source.** Only "Anonymous" + Other were offered. The local `git config --get user.name` is a perfect identity source the user already set themselves β should be pre-populated as option 2 when available. Fixes in skills/first-run-onboarding/SKILL.md: - New **Step 0: Probe local identity sources** β runs `git config --get user.name` before rendering the form. Caches result as `git_user_name`. - **Step 1 AskUserQuestion options** β conditional second option: if `git_user_name` is non-empty, append `Use "<name>"` with description "Detected from git config user.name". Explicit comment forbids placeholder synonym options for Other. - **Step 2 label mapping** β updated to parse `Use "<name>"` back into the raw name for identity_set. - **Context-leak rule** β rewrote to distinguish allowed source (git config) from forbidden sources (CC `# userEmail`, $USER, whoami, filesystem paths, LDAP/SSO). Lint updates: - Require `git config --get user.name` and `Detected from git config` literals in the skill. - Negative assertions: no "I'll enter it", no "Type something else" literals in the skill prompt (removed from guard language, rewrote the DO-NOT guidance without using those exact phrases). Tests: 235 unit + 23 integration + 16 hook + 3 agent-budget + 33 contract (5 new) β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦ym placeholders User saw this rendered live: 1. Yes, proceed (Recommended) 2. Use simple triage instead β Skip architect, let bro/SWE implement directly 3. Suggest different branch_id 4. Type something. Two bugs: 1. **Option 2 says "Skip architect"** β directly contradicts bro's "No bypass. Every code change routes through architect" rule. The triage label is a TEMPLATE-DEPTH knob, not an architect-bypass switch. Main Claude was inventing this option because the skill didn't have an explicit AskUserQuestion specification. 2. **Option 4 "Type something"** β same redundant Other-synonym we just fixed in first-run-onboarding. AskUserQuestion auto-adds Other for free text; placeholder synonyms confuse the UI. Fix in skills/branch-id-proposal/SKILL.md: - Added explicit `AskUserQuestion` block specifying the EXACT option set. The skill no longer leaves the radio form to LLM improvisation. - Conditional options: Downgrade-to-simple (when current is difficult) OR Upgrade-to-difficult (when current is simple), mutually exclusive. Both still go through architect β they just adjust template depth. - "Suggest different branch_id" routes to auto-Other for free text; validated against the existing regex; reject + re-ask on mismatch. - Hard-rule callout: "No Skip-architect option. Architect is mandatory for every code change per bro's routing contract." - Hard-rule callout: "Never add a placeholder option that just redirects to Other." - Added AskUserQuestion to the skill's allowed-tools frontmatter (was missing β that's why bro was improvising rather than calling the tool with a strict spec). - Handling-the-answer table makes the persist+spawn logic explicit per selection. Lint: 6 new assertions on branch-id-proposal/SKILL.md: - references AskUserQuestion (positive) - contains explicit "No Skip architect" prohibition (positive) - offers Downgrade-to-simple AND Upgrade-to-difficult (positive) - forbids "Skip architect β let" wording (negative) - forbids "Type something." placeholder literal (negative) Tests: 235 unit + 23 integration + 16 hook + 3 agent-budget + 33+6 contract = 39 lint β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦mode
Live dogfood: user typed "bro, write a todo cli" β main Claude wrote
todo.py directly without entering bro mode. No onboarding check, no
triage, no architect spawn, no MCP writes. The trigger word was
ignored.
Two probable causes:
1. CLAUDE.md changes don't take effect via /reload-plugins. CLAUDE.md
is loaded into the session's system prompt at session START; mid-
session reload may not pick it up. User needs a fresh session.
2. The previous trigger language was descriptive ("adopt the persona")
rather than prescriptive ("YOU MUST"). Main Claude treated "bro" as
casual addressing instead of a hard activation token.
Restructured CLAUDE.md to make the trigger rule impossible to miss:
- Top-of-file H1: "TMB PLUGIN β TRIGGER RULE (READ FIRST)" + subhead
"YOU MUST FOLLOW THIS RULE BEFORE RESPONDING TO ANY USER MESSAGE".
No room for the LLM to drift into "this is just guidance".
- Concrete examples of trigger phrases (the exact patterns we expect
to see, including "bro, write a todo cli" β the live one that
failed).
- 3-step protocol: announce "Entering bro mode." β adopt persona β
stay in mode for the session. The announcement makes activation
observable to the Human; missing it surfaces the bug immediately.
- Hard contract section: "If you see 'bro' in a message and respond
by directly writing source code (Write/Edit on todo.py, *.ts, *.py,
etc.), you have violated the contract." Names the exact failure
mode the Human just witnessed.
- Default-trigger rule: "If you're not sure whether 'bro' was a
trigger or casual usage: assume trigger." False positives cost
one extra MCP call; false negatives bypass the whole workflow.
Note for the user: CLAUDE.md changes need a fresh session to take
effect (not /reload-plugins). The next test should restart claude
fully before retrying "bro, write a todo cli".
Tests: all green (lint doesn't enforce CLAUDE.md content yet β out
of scope for this commit; could add later).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User clarification: - @bro is what users should learn (documented). - Bare-word "bro, do X" still works (graceful fallback, undocumented). - @bro MUST trigger reliably. Changes: **README.md** β added a "How to use it" section right after Install that shows the exact entry point: `@bro write a todo cli`. Explains that addressing bro activates the persona for the rest of the session. First trigger runs onboarding; subsequent code-touching asks route through the full workflow; casual asks are handled by regular Claude Code without TMB taking over. Also rewrote the agent roster section: - Stale "two agents globally + five editable placeholders" claim removed (left over from the v0.2 design). - New table makes the persona/subagent split explicit: bro = main Claude persona; architect/swe/pr-reviewer = constrained subagents spawned via Task tool. - Notes that subagents auto-reject direct @-mentions β talk to @bro, bro routes. **plugin/CLAUDE.md** β restructured trigger-rule examples to lead with the canonical `@bro` form. Bare-word and addressed forms ("bro,...", "hey bro") kept as supported fallbacks but explicitly labeled "undocumented". This way: - Users who learn from README only know @bro β the simplest mental model. - Users who type "bro" naturally still get the persona β no failure mode. - LLM gets a clear mental hierarchy: canonical example first, fallbacks after, all triggering the same activation. No code or test changes β this is doc-only refinement of how the trigger is presented to humans vs. honored by the LLM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User: "Shorten CLAUDE.md if this is in actual the prompt for bro only. It has many optional content like ceo, cto, etc. Keep bro focus on its job. Only let it handle the optional work when needed (we have skills, rules)." CLAUDE.md is loaded into main Claude's system prompt every triggered session β every byte costs context. The previous version carried runtime instructions AND extensive documentation in the same file. Splitting: **KEPT (essential bro runtime):** - Trigger rule (top, mandatory) β unchanged - Subagent precedence (1 line) β unchanged - Role statement (concise) - MCP caller identity rule - First-action chain (3 steps: identity_get + config_get β cache name β issue_resume) - Code-touching skill chain (lazy-regen-check β project-prescan β triage β branch-id-proposal β architect spawn) β names skills, defers detail to skill files - Direct ops list - Routing (3 destinations: architect, pr-reviewer, agent-creator) - Concerns-escalate rule - Catchphrase rule (compressed to one paragraph) - Communication style (compressed to one line) **REMOVED (moved to skills / docs / subagent prompts):** - Workspace boundary section (TMB-internal contributor info β belongs in CONTRIBUTING.md, not bro's runtime) - Full workflow-files table (concise version retained at the bottom; full table now references mcp/.../CONFIG_KEYS.md and FLOWS.md) - Persistence MCP details (already in mcp/trajectory-server/README.md) - Source code access control section (already enforced by hook scripts and called out in architect.md / swe.md prompts) - Full mode-rules section (covered by individual skills) - Detailed no-auto-action discipline (covered by individual skills + bro's role statement) - Full agent-creator flow protocol (lives in agent-creator skill) - Per-Human-request routing table (compressed to 3 destinations + the agent-creator fallback) - Full session-start triage heuristic block (compressed to 1 sentence) **Reference pointers retained:** - Subagent roster (concise 4-row table) - Where state lives (5-bullet summary, references CONFIG_KEYS.md and FLOWS.md for detail) - Code style (3 bullets) The skills referenced from CLAUDE.md (first-run-onboarding, tmb-reonboard, lazy-regen-check, project-prescan, branch-id-proposal, refresh-architecture, agent-creator) all already exist and carry the detail. CLAUDE.md just names them. bro's prompt now is laser-focused on what bro does at runtime, not on educating the LLM about the plugin's architecture. Tests: all green. Trigger rule + subagent precedence intact; lint unaffected (it doesn't enforce CLAUDE.md content). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User: "update swe to opus and I will manual test the workflow again". agents/swe.md: model: sonnet β opus CLAUDE.md: subagent roster table updated to reflect new model. Other models unchanged: architect (opus), pr-reviewer (opus). Only remaining sonnet reference is in agent-creator/SKILL.md as the DEFAULT suggestion for user-created domain agents (use opus only for reasoning-heavy roles) β that's not the four core subagents and stays. Tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live dogfood (8m ago): architect skipped the alignment loop AGAIN under auto-mode, wrote kind='decision' + task_create_batch with zero kind='question' rows. Same bug as 17824e9 + 419a1fa β prompt-level discipline isn't enough. Structural fix: move enforcement from the skill prompt to the MCP handler where the LLM literally cannot bypass it. **mcp/trajectory-server/src/tools/tasks.ts**: - task_create_batch now queries discussions for the issue before inserting. If zero kind='question' rows exist AND no waiver is set, returns `{error: 'scope_gate_violation'}` with a clear explanation and `questions_found: 0`. - New optional args: `waive_scope_gate: boolean` + `waive_scope_gate_reason: string` (min 10 chars). Lets architect explicitly bypass for truly trivial scope (typo fix, one-line doc change). - Waiver reason is logged to the ledger as `scope_gate_waived` event for audit β pr-reviewer / Human reviewers can grep ledger for inappropriate waivers. - inputSchema updated to document both waiver fields with guidance on when they're acceptable. **skills/architect-workflow/SKILL.md**: updated the Scope-ambiguity gate section to describe the structural enforcement + the waiver mechanism. Notes: "Default to not waiving." **Existing tests updated**: added the waiver to every existing task_create_batch call in test fixtures (27 call sites in tasks.test.ts + 9 others across unit + integration test files). One perl pass via `-0777 -pe` regex to inject `waive_scope_gate: true, waive_scope_gate_reason: 'unit-test synthetic scope; gate not under test'` right at the start of each args object. All 236 existing unit + previous integration tests stay green. **New Layer 2 scope-gate tests (4 cases)** in tests/mcp-integration/scope-gate.test.mjs: - rejects when 0 questions + no waiver (verifies the structural check) - accepts when kind='question' row exists - accepts when waiver + reason β₯10 chars - rejects waiver with missing OR too-short reason Now the live dogfood pattern that was bypassing the gate will fail at the MCP layer: task_create_batch returns scope_gate_violation, architect cannot create tasks, SWE cannot be spawned β forcing architect back to the alignment loop or explicit waiver. Tests: 236 unit + 27 integration (+4 new) + 16 hook + 4 budget + 39 contract lint β all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minor: the previous commit (140a355) rewrote the scope-gate section to describe the MCP-level enforcement but dropped the 'HARD RULE' literal the contract lint checks for. Restoring. All tests green post-fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
β¦t eager-skills Benchmark: pure Claude solves 'write a todo cli by python' in 32 seconds. TMB's broβarchitectβswe chain on the SAME task took 5+ minutes and hung uninterruptibly (#55). Architect emitted a 55k-token spec; SWE cold-start to load that spec was the tail end of the hang. Three structural fixes, all ship in this commit: **1. Hard-cap spec_body at 8000 chars (mcp/trajectory-server/src/tools/tasks.ts)** Was 64000. Reality: a spec longer than ~8k is almost always a sign the task should be split. Over-long specs force SWE to spend its context budget reading instead of coding. Architect gets a clear error citing #55 and telling it to split or cite-don't-restate. This is MCP-level, not prompt-level β architect cannot talk its way around the cap. If a spec genuinely needs more, split into N tasks via parent_branch_id dependency. **2. Revert swe to sonnet (agents/swe.md, CLAUDE.md)** Undoes commit e2ce240. Opus on swe was a quality bet that came at a 2-3x latency cost at SWE's cold-start. For the 90% of tasks that are "implement this spec" mechanics, sonnet is indistinguishable from opus in output. Architect stays on opus (planning + alignment), pr- reviewer stays on opus (review depth). SWE back to sonnet. **3. Trim architect's eager skill-load list (agents/architect.md)** Was loading 6 skills on every spawn: architect-workflow, swe-spawn-workflow, validate-swe-output, agent-creator, roundtable, refresh-architecture. Removed the last three β they don't fire per spawn: - agent-creator is bro-invoked (when Human asks for a domain agent), not architect-invoked. - roundtable fires only on explicit cross-domain decisions, rare. - refresh-architecture is bro-invoked + auto on first code-touching ask, not per-architect-spawn. Architect now loads 3 skills instead of 6, halving its context load. **Skill docs update (skills/architect-workflow/SKILL.md)** Added "Spec-body brevity rule (HARD CAP at 8000 chars, enforced by MCP)" section explaining: - Cite existing code/conventions, don't restate them. - Reference industry-standard patterns by name, don't explain them. - Over-long specs push chain latency into the minutes range (cites #55). - Split into focused tasks via parent_branch_id when scope is genuinely large. Parallelizable via worktree isolation. **Test fixes** - `tasks.test.ts`: updated "rejects oversize spec_body" test from 64000 to 8000. Added a new boundary test: spec_body at exactly 8000 chars must succeed. Full suite: 237 unit (+1 boundary test) + 27 integration + 16 hook + 4 budget + 39 contract β all green. What still can't be fixed from our side: - Ctrl+C doesn't interrupt in-flight subagent inference (CC platform limitation). Issue #55 flags this for upstream filing. Meantime, `kill -9 <pid>` is the documented recovery. Closes #55 on the parts we can fix. Filed upstream concerns remain open on the CC issue tracker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- frontmatter: tool list ~40 β 20 (drop AskUserQuestion + 15 rarely-used MCP tools: issue_get/list/get_phase/report_md, task_first_actionable, audit_log, ledger_list, file_registry_*, architecture_regen/regen_state_*, identity_get, config_*, skill_*). Each MCP schema was injected into the subagent system prompt at cold-start; dropping them shrinks the per-spawn context tax. - frontmatter: eager skills 3 β 1 (kept architect-workflow; swe-spawn-workflow + validate-swe-output now load on-demand via Skill tool β needed only at SWE handoff + SWE return, not during planning). - architect-workflow: new "Simple Fast-Lane" section. Simple triage = narrow-scope ask. Architect picks defaults from a dimension β default-choice table (argparse, stdlib unittest, ~/.<app>/<file>.json, single-user laptop scope), records them as explicit spec Assumptions, and calls task_create_batch with waive_scope_gate=true + a reason naming the defaults. No env probe, no Q+A round-trip. - architect-workflow: env probe and full discussion phase explicitly scoped to the difficult path. Scope-ambiguity gate waiver's two legitimate uses (simple fast-lane + trivial doc/typo) spelled out. - architect-workflow: escalate simple β difficult triggers listed (multiple unrelated surfaces, architecture change, strategic defaults, spec >8k). Target: simple-triage architect phase drops from 20 tool calls / 50k tokens / 2m50s to 3β5 tool calls / β€20k tokens / β€30s. Measured via Mode B retest. Closes the architect-side of the 7m chain slowdown from #55. SWE-side (MCP cold-start + CC platform subagent spawn latency) is separate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three workflow agents (architect, swe, pr-reviewer) grant access to the
same MCP server (plugin_tmb_trajectory-server). The fully-qualified per-tool
form made the tools: line repeat the server prefix 7β13 times per agent β
visually painful and a maintenance burden every time a tool is added.
Collapse to the bare server name ("mcp__plugin_tmb_trajectory-server"), which
CC treats as "allow every tool this server exposes". Each tools: line is now
one short line.
Tradeoff: each subagent now has schemas for all ~30 server tools in its
cold-start context, not just the ~13 specific ones I had picked for architect
in a4fe045. In exchange we get readable frontmatter and no drift when new
MCP tools ship. Net cold-start tax is small compared to the architect
simple-fast-lane work β most of that cost was in the ceremonial phase, not
schema registration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bro kept double-encoding the protected_branches config: passing
value="[\"main\"]" (a string containing JSON) instead of value=["main"]
(an actual JSON array). The MCP handler then JSON.stringify'd it a second
time, storing "[\"main\"]" in value_json. git-guards.sh does
`jq -e 'type == "array"'` on that value, sees a string, and blocks every
Bash call with "BLOCKED: TMB plugin_config key protected_branches is
malformed JSON" β freezing bro and any subagent that touches Bash.
Two-sided fix:
1. skills/first-run-onboarding: rename the wire guidance from
"value=<JSON array>" (which LLMs read as 'pass a string containing
JSON') to "value=<array of strings>" with an inline worked example
(value=["main"]) and a one-liner explaining why pre-serializing is
wrong. Adds the same clarity to the code example block.
2. mcp/trajectory-server config_set handler: detect when args.value is a
string whose trimmed form starts with [ or { AND parses as valid JSON
to an object/array, and reject with a helpful error naming the correct
shape. Plain strings that happen to start with [ but aren't valid JSON
fall through and are stored verbatim as before.
Repro: a session whose onboarding ran BEFORE this fix has a corrupt
value_json. Repair with sqlite3 or wipe .claude/tmb/trajectory.db and
re-run bro onboarding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
Apr 25, 2026
Supersedes the v0.2.0 placeholder tag (to be revoked on main-merge). ### Versions aligned - `.claude-plugin/plugin.json`: 0.3.2 β **0.1.0** - `mcp/trajectory-server/package.json`: 0.2.0 β **0.1.0** ### CHANGELOG.md Replaced the "Unreleased β first public release pending" placeholder with the comprehensive v0.1.0 entry summarizing everything that shipped across PRs #41 β #64: - bro-as-persona + planner + task gate doctrine - pr-reviewer-as-push-gate + git-push-guard.sh hook - Lego templates (6 agents in templates/agents/, plugin's agents/ empty) - planning-simple / planning-difficult split (was tmb_architect-workflow) - bundled SQLite trajectory MCP server (~30 tools) - requireRoles middleware on every workflow-state mutation - Direct Mode for trivial single-file edits - Three-layer test infrastructure (lint / MCP-integration / dogfood) - Documented latency budget in docs/PERFORMANCE.md - Documented "known not-done" linking the 4 main backlog items ### Post-merge release plan After this PR merges to dev + dev β main: 1. `git tag -d v0.2.0` 2. `git push origin :refs/tags/v0.2.0` 3. `gh release delete v0.2.0` 4. `git tag -a v0.1.0 main -m "..."` 5. `git push origin v0.1.0` 6. `gh release create v0.1.0 --notes-file CHANGELOG.md` Layer 1 + 2 tests still PASS. No code/skill/agent changes β only version bumps + release notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
One PR covering everything needed to actually run the dogfood flow from a fresh clone:
docs/architecture/SCENARIOS.mdβ 30+ manual test scenarios mapped to the 9 flows inFLOWS.md. Each scenario has verbatim trigger prompts, prerequisites, expected behavior, and verification SQL. All four roundtable corner cases (explicit/implicit Γ planners-present/absent) are covered. Closes docs: dogfood test scenarios β trigger prompts mapped to FLOWS.mdΒ #40.package.jsondeclaresmcp/trajectory-serverandmonitorsas a workspace. Onebun installat plugin root now resolves both subpackages and produces a single consolidatedbun.lock. Closes Use Bun workspaces at plugin root + gitignore dogfood-generated pathsΒ #42..gitignoredogfood paths βdocs/trustmybot/excluded so dogfood-from-inside-repo runs never pollute commits..github/workflows/test.ymlβ simplified to three root-level steps (install / build / tests) now that workspaces work at root.docs/local-testing.mdPLUGIN_PATH fix β replaced the copy-paste trap (/absolute/path/to/trustmybot-plugin) withexport PLUGIN_PATH="$(pwd)"and a sanity-check echo. Hit during live dogfood: users were launching CC with the literal placeholder and getting zero plugins loaded.claude plugin validate; MCP server now actually loads.monitors.jsonβ stops the misleading "Monitor stream ended" noise; issue Reintroduce monitors/ as a proper long-running streaming daemonΒ #44 files the proper streaming rewrite.gatekeeperβbroβ closes the handle half of User entry path to gatekeeper: enforce the routing + shorten the mention handle (@bro)Β #36; drops the vestigialbro_nameschema column since bro is fixed plugin branding, not user-configurable.discussion_append.Why combined
Everything here is "make the dogfood flow runnable end-to-end" β test plan + infra + manifest correctness + agent behavior fixes surfaced by the dogfood itself. Splitting into 10 PRs would just create serial-merge churn with no review benefit.
Test plan
bun install --frozen-lockfileat plugin root resolves both subpackages.bun run buildat plugin root compiles the MCP server successfully.bash tests/run-all.sh: 235 MCP + 16 hook + 4 agent-budget lint β all green.claude plugin validate <path>passes for marketplace.json, plugin.json, hooks.json, .mcp.json.claude --plugin-dir ... -p "list tmb: agents"returns all 4 agents;plugin:tmb:trajectory-server: connected;config_setreturns structured role-enforcement error as designed.working-directory).docs/trustmybot/gitignored β confirmed no dogfood pollution.grep -ri gatekeeperβ 0 hits;grep -ri bro_nameβ 0 hits.Follow-up issues surfaced during this PR
@tmb:broinvocation; test whether persona persists across sessionCloses #40
Closes #42