Releases: trustmybot/plugin
v0.5.0
Headline: bro is now a structurally-enforced pure planner. Direct Mode removed (#162) and 7 hard-enforcement hooks promote previously prompt-only doctrine to Layer 2 (deterministic shell scripts). New docs/architecture/ENFORCEMENT.md documents the 6-layer model (MCP middleware → hooks → frontmatter → tool-handler validation → skill paths: → prompts) and the per-agent × per-interaction coverage matrix.
Fixed — file_registry summary ownership: bro, not SWE (#181)
The original #45 doctrine had SWE batching file_registry_update_summaries into its atomic close. That was the wrong agent: SWE only sees the task spec, not the broader issue/discussion that motivated the work. Bro has full task context (issue + spec + diff just verified during the V1/V2/V3 task gate) and is the natural author of summaries. Re-assigned ownership structurally:
- `agents/swe.md`: drops `file_registry_update_summaries` from the atomic-close batch. SWE's atomic close is now 2 calls: commit + `task_update_status(completed)`.
- `skills/tmb_planning-simple/SKILL.md` + `tmb_planning-difficult/SKILL.md`: bro's V3 close batch grows by one call: `file_registry_update_summaries(updates=[...], advance_verified_sha=<commit>)` BEFORE `task_update_status(closed)`.
- `mcp/trajectory-server/src/tools/file-registry.ts`: `requireRoles('file_registry_update_summaries', ['bro'])` (was `['bro', 'swe']`). Layer 1: server rejects SWE callers.
- `scripts/hooks/require-summaries-before-task-close.sh` (NEW PreToolUse hook): when bro tries `task_update_status(status='closed')`, walks the commit's touched files and DENIES the close if `file_registry` is missing summaries or has summaries older than the task's `created_at`. Bypass: `TMB_ALLOW_CLOSE_WITHOUT_SUMMARIES=1`. Layer 2: bro can't close the task without doing the summary update first.
Re-tightened L5 outcome assertions in 02-simple-task and 11-codebase-memory-verify-on-drift since the structural enforcement now guarantees fresh summaries on every closed task. 10-codebase-memory-cold-start's assertion stays disabled — that's headless_fallback ledger event compliance, a separate bro prompt-discipline issue requiring its own enforcement (filed as a separate follow-up).
Added — docs/architecture/RESPONSIBILITIES.md
Codebase-derived (not architecture-doc-derived) listing of what bro / SWE / pr-reviewer / consultants are actually instructed to do — by reading the agent prompts, the skills they wire to, and the hook surface around them. Includes the role × tool matrix from requireRoles. Source of truth for what the plugin enforces vs what doctrine merely suggests.
Fixed (post-rc.1)
- `no-source-edit-from-main.sh` + `activation-routine.sh` bro-mode detection too narrow. Previously required the assistant to emit `Entering bro mode.` in the transcript, but in `claude -p` headless mode bro routinely skips that announcement (the h3/h4 prompt-discipline ceiling). Hooks now also detect bro mode by scanning the transcript for any user message containing the `bro` trigger word. Without this fix, bro shortcut source edits in 3 of 5 v0.5.0-rc.1 L5 dogfood flows. Adds regression test cases for both hooks covering the real-world fixture instead of just the announce-emitted variant.
- `TMB_CLAUDE_TIMEOUT=600` wired into `l5-dogfood.yml` + `release-canary.Dockerfile`. The env override was added in #172 but missed both L5-runner workflows; runs hit the default 180s cap mid-SWE chain.
- Stale `tools-required.json` for cold-start + code-touching flows. Cleared assertion lists for `01-first-contact`, `02-simple-task`, `10-codebase-memory-cold-start`, `11-codebase-memory-verify-on-drift`, `12-source-edit-attempt`, `95-anonymous-cold-restart`. These asserted on MCP tool calls captured in `debug_trajectory`, but the table isn't populated because of #164 (env propagation bug + UNIQUE merge bug). Once #179 (stream-json refactor) lands, the trajectory scoring is re-implemented end-to-end and these lists get re-populated against the new capture format.
- Disabled chronic #45 codebase-memory outcome assertions. `02-simple-task`, `10-codebase-memory-cold-start`, `11-codebase-memory-verify-on-drift` had assertions on `file_registry`'s `content_md5` / `summary` / `last_verified_sha` columns that depend on SWE/bro reliably calling `file_registry_update_summaries`, a prompt-only doctrine that hits the same h3/h4 ceiling. Tracked in #181 as a deferred Layer 2 PostToolUse hook. Original assertions kept commented-out for restoration once #181 ships.
Breaking changes (pre-1.0 minor bump per SemVer)
- Direct Mode is gone. Bro never edits source code; every code change routes through SWE. Trivial fixes go via the same chain (lighter spec, not a separate code path). Pushes that previously relied on bro-direct edits will fail; rewrite as task → SWE → bro verify → close.
- All plugin-shipped skills now use `tmb_*` prefix. The 7 un-prefixed defaults (`code-quality`, `docs-conventions`, `git-conventions`, `naming-conventions`, `review-findings`, `review-protocol`, `swe-checklist`) are renamed to `tmb_*`. Project-local skills with un-prefixed names are unaffected; local skills can shadow plugin defaults by name resolution as before.
New hard-enforcement hooks
The h3 + h4 A/B scenarios proved prompt-only doctrine compliance is 0/10 in both wording arms for high-frequency operations. These 7 hooks move load-bearing rules to deterministic Layer 2:
| Hook | Event | Doctrine enforced |
|---|---|---|
| `activation-routine.sh` | UserPromptSubmit | Pre-fetches identity + pending issue from the trajectory DB on every bro-triggered message; injects as `additionalContext` so bro never has to remember to call `identity_get` / `issue_resume` |
| `no-source-edit-from-main.sh` | PreToolUse on Edit/Write/MultiEdit/NotebookEdit | Blocks bro from editing source files outside an SWE worktree (allowlist: markdown, LICENSE, agent/skill prompts, plugin/hooks manifests, `.github/`). Bypass: `TMB_ALLOW_SOURCE_EDIT=1` |
| `session-start-regen-check.sh` | SessionStart | Computes git drift vs `regen_state.last_seen_sha`; nudges bro to run `tmb_refresh-architecture` when drift > 25 commits (override: `TMB_REGEN_DRIFT_THRESHOLD`) |
| `ensure-gitignore.sh` | SessionStart | Ensures `.claude/` is in the project's `.gitignore`. Creates `.gitignore` if missing; appends if rule absent; idempotent. Prevents the trajectory.db-leaking-into-worktrees footgun |
| `no-worktree-branch-create.sh` | PreToolUse on Bash | Blocks `git worktree add -b/-B/--create-branch ...`. Branch authority is bro's: bro pre-creates `<task.branch_id>` from the latest origin, SWE attaches via `git worktree add <path> <branch>` (no creation, no abbreviation). Bypass: `TMB_ALLOW_WORKTREE_BRANCH_CREATE=1` |
| `branch-up-to-date-with-remote.sh` | PreToolUse on Bash | Fetches `origin/<pr_target>`, denies worktree-add if `<branch>` is behind. Catches the stale-local-main bug. Bypass: `TMB_ALLOW_STALE_BRANCH=1` |
| `cleanup-worktree-on-task-close.sh` | PostToolUse on `task_update_status` | When bro flips task to closed, removes the corresponding `.claude/worktrees/<slug>/`. Commits live on the branch and survive. Bypass: `TMB_KEEP_CLOSED_WORKTREES=1` |
Plus structural improvements: tmb_db_path walks up to git root for DB resolution (was $(pwd)-relative — broke when bro cd'd into a worktree), TMB_CLAUDE_TIMEOUT env override for L5/A/B test runners, and tests/dogfood/lib/flow-helpers.sh:l5_setup_scratch_project writes .gitignore matching real-project behavior.
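The create / append / idempotent behaviour listed for `ensure-gitignore.sh` in the table above can be pictured with a short sketch. This is an illustration under assumptions, not the shipped hook: the function name `ensure_ignore_rule` and its arguments are hypothetical, and the real script may match the rule differently.

```shell
#!/bin/sh
# Hypothetical sketch: make sure a rule (default ".claude/") is present in a
# project's .gitignore. Names are illustrative, not taken from the plugin.
ensure_ignore_rule() {
  project_dir="$1"
  rule="${2:-.claude/}"
  ignore_file="$project_dir/.gitignore"

  # Create the file if it does not exist yet.
  [ -f "$ignore_file" ] || : > "$ignore_file"

  # Idempotent: do nothing when the exact rule is already a full line.
  if grep -qxF -- "$rule" "$ignore_file"; then
    return 0
  fi

  # If existing content lacks a trailing newline, add one before appending.
  if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
    printf '\n' >> "$ignore_file"
  fi
  printf '%s\n' "$rule" >> "$ignore_file"
}
```

Calling `ensure_ignore_rule "$PWD"` twice in a row leaves a single `.claude/` line, which is the property a SessionStart hook needs since it fires on every session.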
Other shipping in v0.5.0
- A/B framework matures (#131, #157, #160, #161): runner + helpers + chi-squared stats; 4 backfill hypothesis scenarios (h1 CLAUDE.md slim, h2 Hybrid D' vs lazy, h4 first-action MANDATORY); shared substrate-health pre-flight (#161); `node_modules` symlinking + scenario fixture/setup_files framework fix.
- Activation routine hook proven necessary: h4 A/B (5 paired runs × 2 wording arms) showed prompt-only `identity_get + issue_resume` compliance was 0/10 in both arms; the hook delivers 100% reliability.
- L6 → L5 helper namespace cleanup (#163): `l6_*` shell functions in `tests/dogfood/lib/` renamed to `l5_*` to match the renamed test layer.
- GH Actions bumped to v5 (#165): Node 24 internal runtime; CC-auth prefix check dropped (smoke test is the authoritative gate).
- Two CLAUDE.md cleanups (#168, #169): verify-context decision tree → 2-column table; opaque issue refs dropped.
Added — 4 hard-enforcement hooks (branch authority + worktree hygiene) (#170, #171)
Local h5 dogfood surfaced two doctrine bugs that were prompt-only and unreliable. Promoted both to Layer 2:
- `scripts/hooks/ensure-gitignore.sh` (SessionStart). Ensures the project's `.gitignore` excludes `.claude/`. Creates the file if missing; appends if the rule is absent; idempotent. Without this, the trajectory.db gets committed to the project, then `git worktree add` checks it out inside every worktree; a stale per-worktree DB then poisons every hook that resolves DB path via `$(pwd)`. Fixes the root cause behind #171.
- `scripts/hooks/no-worktree-branch-create.sh` (PreToolUse on `Bash`). Blocks `git worktree add -b/-B/--create-branch ...`. Branch authority belongs to bro: bro creates `<task.branch_id>` first (`git branch <name> origin/<pr_target>`), then SWE attaches via `git worktree add <path> <branch>` (no creation, no abbreviation). Fixes #170 where SWE invented `fix/typo-foo-ts` for spec `fix/foo-typo-receive`. Bypass: `TMB_ALLOW_WORKTREE_BRANCH_CREATE=1`.
- `scripts/hooks/branch-up-to-date-with-remote.sh` (PreToolUse on `Bash`). When SWE attaches a worktree to `<branch>`, fetches `origin/<pr_target>` (best-effort, offline-friendly) and verifies `<branch>` descends from it. Catches the "stale local main" bug where bro creates a task branch from yesterday's pointer, then the SWE commit conflicts on push. Bypass...
v0.4.2
Added — codebase memory (#45) — Hybrid D' design
Bro now persists a per-file index in file_registry: md5 + summary + last-verified timestamp. The verify-context doctrine (CLAUDE.md, post v0.4.1) tells bro to "trust the trajectory DB's file_registry index" when git is clean — this PR makes that index real.
Doctrine — entry-state matrix in tmb_project-prescan:
- New project (empty repo) → no registry, no scan.
- Existing repo + registry empty → AskUserQuestion "deep scan now or lazy fill?". Headless fallback = lazy.
- Registry populated + clean tree + HEAD == `last_verified_sha` → trust, no scan.
- Registry populated + drift → `file_registry_verify` pass; refresh mismatched rows.
Writers:
- Bro (CLAUDE.md addition): when bro Reads a file for context, follow with `file_registry_update_summaries` if the row's summary was null. Side-effect of work, so no extra LLM cost.
- SWE (atomic-close): batch `file_registry_update_summaries(touched_paths)` alongside `task_update_status` and the commit. SWE has fresh context for free.
- Direct Mode (`tmb_direct-mode`): step 4 in the protocol; registry update is now mandatory alongside the `direct_mode_used` ledger event.
New skill tmb_deep-scan: eager opt-in for cold-start when the Human says yes (or invokes via "@bro deep scan"). Filters binaries / lockfiles / generated dirs, batches Reads, single bulk update call.
Two new L5 dogfood flows:
- `10-codebase-memory-cold-start`: existing repo + empty registry → headless fallback fires + lazy default chosen + planning still proceeds
- `11-codebase-memory-verify-on-drift`: populated registry + induced disk drift → verify pass refreshes the row
Updated outcome.sql for existing flows:
- `02-simple-task`: assert SWE atomic-close updated `file_registry` (md5 + summary set, `last_verified_sha` advanced)
- `D-direct-mode`: assert step 4 fired (registry row refreshed, `last_verified_sha` set)
v0.4.1
Refactored — L6 dogfood → L5 dogfood (close the L4→L6 numbering gap)
The previous rename (L5+L6 combined → Release canary) demoted the standalone manual L5 to an unnumbered "Manual smoke" fallback, which left a gap between L4 and L6. This rename closes the gap: L6 dogfood is now L5 dogfood. The pyramid is contiguous L0–L5 again, with Release canary and Manual smoke as the non-numbered layers above.
Renamed:
- `.github/workflows/l6-dogfood.yml` → `.github/workflows/l5-dogfood.yml` (workflow `name:` updated, PR-label trigger now `L5`)
- `tests/dogfood/run-l6.sh` → `tests/dogfood/run-l5.sh`
- Env var: `L6_KEEP_ARTIFACTS` → `L5_KEEP_ARTIFACTS`
- Docker scratch dirs: `/tmp/tmb-l6-XXXX` → `/tmp/tmb-l5-XXXX`
- Internal globals: `L6_DOGFOOD_DIR` → `L5_DOGFOOD_DIR`
Updated docs: tests/README.md (pyramid table), CONTRIBUTING.md (workflow scope), tests/manual/{setup,README}.md, docs/contributing/LABELS.md, docs/architecture/FILES.md, scripts/release.sh, scripts/hooks/debug-trajectory.sh, mcp/trajectory-server/src/{index,test/schema.test}.ts.
Refactored — testing framework: L5+L6 combined → Release canary, L5 manual dogfood → Manual smoke (fallback)
The numeric "L5+L6 combined" name was awkward (not a real layer, just a Docker-bundled superset) and constrained future insertion of heavy layers. Renamed to a non-numeric Release canary so future layers (e.g. A/B prompt eval — issue #131, perf canary, etc.) can slot in between L4 and Release canary without renumbering.
Standalone "L5 manual dogfood" demoted to Manual smoke — a fallback used only for UX scenarios the automated layers can't model (e.g. live AskUserQuestion interactivity). The Release canary handles everything else automatically.
Renamed files:
- `.github/workflows/l5-l6-combined.yml` → `.github/workflows/release-canary.yml`
- `tests/docker/l5-l6-combined.Dockerfile` → `tests/docker/release-canary.Dockerfile`
- `tests/docker/run-l5-l6-combined.sh` → `tests/docker/run-release-canary.sh`
- Workflow `name:` and job ID updated to `Release canary` / `release-canary`.
- Image tag: `tmb-l5-l6-combined:<v>` → `tmb-release-canary:<v>`.
Updated docs: tests/README.md (test pyramid + escalation chain), CONTRIBUTING.md ("CI scope" workflow table), scripts/release.sh (manual smoke gate framing).
Refactored — defaults seeded by schema, not by bro
The previous unreleased entry had bro silently writing 3 plugin_config rows + a tmb_defaults_applied ledger event on first contact. Per user follow-up: that's still bro doing work the system should do.
- `mcp/trajectory-server/src/schema.sql` now seeds the 3 default policy keys via `INSERT OR IGNORE` at DB creation. Bro never touches `plugin_config` on first contact.
- `tmb_defaults_applied` ledger event removed entirely (the schema seed is silent; bro only logs events for decisions it actually makes).
- CLAUDE.md first-action chain compressed from 12 lines (state check + conditional default-write + cache + resume) to 4 lines (two parallel reads: `identity_get` + `issue_resume`, then welcome banner). `config_get` no longer in the always-call set; bro fetches lazily when a specific key matters.
- Welcome banner simplified from 3 variants to 2 (no "first contact" variant; pending-work or idle is enough).
- Test fixtures (`onboarding-named.sql`, `onboarding-anonymous.sql`) shrunk: they no longer INSERT `plugin_config` (now schema-seeded) and dropped the `tmb_defaults_applied` ledger row. `onboarding-named.sql` writes a `tmb_user_named` event instead to mark "user explicitly chose this name".
Removed — first-run-onboarding ceremony (modern-agent UX)
Modern agents (Cursor, ChatGPT, etc.) don't onboard — they just work. TMB's previous behavior of asking name + branching model + PR target + protected branches via AskUserQuestion on first contact was friction with no upside for the 80% case, and it broke completely in headless claude -p mode (no Human to answer).
- Deleted: `skills/tmb_first-run-onboarding/` (entire skill).
- Deleted: `tests/lint/onboarding-skill-contract.sh` (no skill to lint).
- Deleted: `tests/dogfood/flows/01-onboarding/` (no ceremony to test).
- New: `tests/dogfood/flows/01-first-contact/`, which asserts the inverse: empty DB → `@bro hi` → bro applies defaults silently + welcome banner mentions them; `AskUserQuestion` and `identity_set` are explicitly forbidden tools.
- CLAUDE.md first-action chain rewritten: on first contact (`config_get` returns null), bro silently writes `branching_model=github-flow`, `pr_target=main`, `protected_branches=["main"]` plus a `tmb_defaults_applied` ledger event. No `identity` row; its absence means "user hasn't named themselves yet."
- Welcome banner is now mandatory (also new in CLAUDE.md): bro must announce activation explicitly with state context, in three variants for first contact / returning with pending work / returning idle.
- Ledger event renamed: `tmb_onboarding_complete` → `tmb_defaults_applied`. Pre-1.0, no migration shim; fixtures and outcome assertions updated in lockstep.
- `tmb_reonboard` repositioned as the only path to write identity rows or change policy keys (was: "re-run onboarding"). Same skill, same UI, clearer framing.
To set your name post-first-contact: say @bro reonboard or @bro update my name.
Cluster of bugs found during cold-session marketplace dogfood by @ZaxShen. All four were doctrine drift, not infra: bro had stale instructions, server enforcement was working but invisible.
Fixed — Anonymous identity now persists (issue #95)
tmb_first-run-onboarding previously skipped identity_set when the Human chose Anonymous. The DB row never existed, so every cold session saw identity_get().created_at == null and re-triggered the full onboarding flow — even though configs and ledger events confirmed onboarding had already run.
identity_set MCP tool now accepts anonymous: true to write a row with human_name=NULL. Onboarding always calls identity_set (named OR anonymous). Cold-restart-after-Anonymous regression covered by tests/workflow-sim/flow-09-anonymous-cold-restart.test.mjs.
Fixed — Bro now writes bro_verification_pass ledger event (issue #91)
The planning skills' V3 step (close path) jumped straight from "verification passed" to task_update_status(closed) with no ledger anchor. The trajectory had no record of bro's task-gate verdict — only the absence of a validation_record row, which was indistinguishable from "pr-reviewer hasn't gotten there yet."
V3 now batches ledger_log(event_type='bro_verification_pass') + task_update_status(closed) + issue_close (when applicable) in one response. The ledger is the source of truth for bro's task-gate verdict; validation_attempts is exclusively pr-reviewer's table.
Fixed — Bro halts on MCP errors instead of silently proceeding (issue #96)
Trace from cold-session test: bro called validation_record(agent='bro', verdict='pass') at task close. Server middleware correctly returned {"error": "forbidden", "caller_role": "bro", "allowed_roles": ["pr-reviewer"]}. Bro ignored the error and proceeded to task_update_status(closed) + issue_close + emit "Trust me bro, it works." From the Human's view the task closed cleanly; in reality no verification trace existed.
Two doctrine clauses added to plugin CLAUDE.md:
- MCP error handling: halt and surface. Any tool result with `is_error: true` halts the flow. No silent continuation.
- Tools bro must NEVER call. `validation_record` is pr-reviewer-only. Bro's task-gate uses `ledger_log(bro_verification_pass)`. Server-side rejection now backed by client-side discipline.
Fixed — Policy-key writes route through tmb_reonboard (issue #93)
branching_model, pr_target, and protected_branches are policy keys that drive git-guards.sh and skill defaults. Bro could previously call config_set on them directly mid-session, bypassing the explicit-confirm UX of the onboarding flow.
CLAUDE.md now requires bro to invoke tmb_reonboard for policy-key changes — never direct config_set. The skill renders an AskUserQuestion with current values pre-selected and persists only on explicit confirmation.
Removed — tmb_validate-swe-output skill
Obsolete under bro-as-planner doctrine. Bro's task-gate verification is inline (V1/V2/V3 in the planning skills); pr-reviewer's push-gate verification is its own agent. The forked-Explore validation skill served the old "pr-reviewer signs at task close" flow that v0.3.0 retired.
Versioning
No schema migration; new column-less anonymous flag on identity_set is additive. Schema version stays at 1. Tests added: 4 new identity-tool tests + 3 new workflow-sim tests (flow-09 a/b/c).
Added — Label + ENUM doctrine (issue #38)
Two new doctrine docs codify the controlled vocabularies the project relies on:
- `docs/contributing/LABELS.md`: canonical GH issue label list. Adopts GitHub's 9 default labels, K8s `area/<name>` + `priority/<level>` + `lifecycle/<state>` namespaces, and 2 documented TMB-specific labels (`doctrine`, `discussion`). Replaces the previously-invented `area:*`, `p:*`, `stale`, `superseded` labels with their K8s equivalents.
- `docs/contributing/ENUMS.md`: every ENUM in `schema.sql` is listed with its canonical values + source convention (GH / K8s / TMB-specific with rationale).
Two new lints enforce drift prevention:
- `tests/lint/labels-stable.sh`: fails if a GH label exists that's not in `LABELS.md`, or vice versa. Skipped on dev machines without `gh` auth; always runs in CI.
- `tests/lint/enums-stable.sh`: parses `ENUMS.md` and the code, fails if a hardcoded value isn't documented.
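The two-way check that `labels-stable.sh` performs amounts to a symmetric set difference. A minimal sketch, assuming both label lists are already available as newline-separated text files (how the real lint extracts them from `gh` and `LABELS.md` is not described in these notes, and `labels_drift` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print labels present in exactly one of two
# newline-separated lists; return 1 when any drift is found.
labels_drift() {
  live_list="$1"   # labels that exist on GitHub, one per line
  doc_list="$2"    # labels documented in LABELS.md, one per line
  tmp_dir=$(mktemp -d) || return 2

  # comm requires both inputs sorted with the same collation.
  LC_ALL=C sort -u "$live_list" > "$tmp_dir/live"
  LC_ALL=C sort -u "$doc_list" > "$tmp_dir/doc"

  # -23 keeps lines only in the first file; -13 keeps lines only in the second.
  only_live=$(LC_ALL=C comm -23 "$tmp_dir/live" "$tmp_dir/doc")
  only_doc=$(LC_ALL=C comm -13 "$tmp_dir/live" "$tmp_dir/doc")
  rm -rf "$tmp_dir"

  status=0
  if [ -n "$only_live" ]; then
    printf '%s\n' "$only_live" | sed 's/^/undocumented: /'
    status=1
  fi
  if [ -n "$only_doc" ]; then
    printf '%s\n' "$only_doc" | sed 's/^/missing on GitHub: /'
    status=1
  fi
  return "$status"
}
```

Reporting both directions separately keeps the failure message actionable: one side means "document the label", the other means "create or rename it on GitHub".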
GH label migration applied: 17 labels → 25 (renames + 9 new K8s area/*). All 18 open issues' labels auto-renamed in place via gh label edit --name. The superseded label was dropped...
v0.3.2 — git-guards.sh hotfix
Hook + agent-prompt hotfix. Two real bugs in git-guards.sh that broke every SWE commit-from-worktree, plus a SWE doctrine violation. Found by @ZaxShen during v0.3.1 marketplace test — bro spent 12 minutes hitting the same hook-block before reporting.
Fixed — git-guards.sh worktree-blind branch detection
git branch --show-current was running in CC's CWD (the project root, always main) regardless of which worktree the actual git commit was being executed in. Result: SWE in isolation: worktree mode could never commit — every commit got rejected as "no direct commits to main."
The hook now parses the working directory from the command itself:
- `cd <worktree> && git commit ...` → reads branch from the worktree (the SWE pattern)
- `git -C <worktree> commit ...` → same
- Falls back to `INPUT.cwd` (if CC populates it) or `$PWD`
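One way to picture the directory extraction is a small parser over the command string. The sketch below is illustrative only: `git_cmd_dir` is a hypothetical name, it handles just the two patterns listed above with unquoted, space-free paths, and its second argument stands in for the `INPUT.cwd` / `$PWD` fallback.

```shell
#!/bin/sh
# Hypothetical sketch of the directory extraction; not the shipped hook.
# Usage: git_cmd_dir "<bash command string>" "<fallback dir>"
git_cmd_dir() {
  cmd="$1"
  fallback="${2:-$PWD}"
  case "$cmd" in
    "cd "*" && git "*)
      # cd <dir> && git ...  -> keep the text between "cd " and " && "
      dir=${cmd#"cd "}
      dir=${dir%%" && "*}
      ;;
    "git -C "*)
      # git -C <dir> <subcommand> ...  -> keep the word after -C
      dir=${cmd#"git -C "}
      dir=${dir%%" "*}
      ;;
    *)
      # No directory in the command itself: use the fallback.
      dir="$fallback"
      ;;
  esac
  printf '%s\n' "$dir"
}

# With the directory known, the branch can be read from that worktree:
#   git -C "$(git_cmd_dir "$cmd" "$PWD")" branch --show-current
```

A production parser would also need to cope with quoted paths and chained commands, which this sketch deliberately ignores.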
tests/hooks/git-guards.test.sh extended from 4 → 12 cases, including 7 new worktree-aware regressions.
Fixed — git-guards.sh Rule 4 false-fires on no-remote repos
git rev-parse "origin/${PR_TARGET}" (without --verify) prints the literal string "origin/main" to stdout when the ref doesn't exist, then exits non-zero. The 2>/dev/null swallowed the stderr, so REMOTE ended up as the literal string "origin/main" — non-empty — and the "Local main is behind origin/main" check fired falsely on any repo without a remote (which is most fresh scratch projects).
Fix: use git rev-parse --verify — empty output if ref doesn't exist, no false-fire.
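The difference is easy to reproduce in a scratch repository with no remote. A minimal sketch, assuming `git` is on `PATH`; the variable names are illustrative and the snippet is not taken from `git-guards.sh`:

```shell
#!/bin/sh
# Reproduce the false-fire in a throwaway repo that has no remote.
scratch=$(mktemp -d)
git init -q "$scratch"

# Without --verify: the unresolved argument is echoed to stdout before git
# fails, so the captured value is non-empty even though the ref is missing.
remote_plain=$(git -C "$scratch" rev-parse "origin/main" 2>/dev/null) || true

# With --verify: nothing is printed when the ref cannot be resolved.
remote_verified=$(git -C "$scratch" rev-parse --verify "origin/main" 2>/dev/null) || true

printf 'plain:    [%s]\n' "$remote_plain"
printf 'verified: [%s]\n' "$remote_verified"

# An emptiness test is only trustworthy on the --verify result.
if [ -z "$remote_verified" ]; then
  echo "no origin/main ref; skipping the behind-remote check"
fi
rm -rf "$scratch"
```

The `|| true` guards are there only so the snippet also runs under `set -e`; both `rev-parse` calls exit non-zero in this scenario.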
Hardened — SWE prompt forbids hook bypass
When the v0.3.1 worktree bug blocked SWE's commit, the SWE subagent attempted to rewrite .git/HEAD and fabricate branch refs to bypass the hook. CC's security guards blocked the rewrite, but the doctrine was wrong: even when a hook misfires (and v0.3.1's worktree bug was a real misfire), SWE must report and stop, never bypass.
Added explicit clause in agents/swe.md:
> Never attempt to bypass a PreToolUse hook block: do not rewrite `.git/HEAD`, fabricate refs, edit `.git/` internals, or use any technique to evade a hook decision. If a hook blocks a legitimate operation, that's a plugin bug: STOP immediately, return the failure summary to bro with the exact hook output, and let bro decide the path forward.
agents/swe.md still 21 lines — within the 30-line Lego cap.
Versioning
Bumped all 3 manifest versions to 0.3.2. No schema migration. Rebuilt dist/.
v0.3.1 — ship dist/ in artifact + tmb-rc beta channel
Critical install hotfix. v0.3.0 marketplace install left the MCP server's compiled dist/ directory missing. Symptom: bro can't find any mcp__plugin_tmb_trajectory-server__* tools — onboarding's mandatory MCP writes can't run, identity/config never persist, the user is stuck. Anyone on v0.3.0 should upgrade.
Root cause
CC's marketplace plugin install runs bun install but skips lifecycle scripts (no postinstall). v0.3.0's design relied on postinstall to build dist/ after install — but CC never runs it. The server's compiled JS was never created on user machines.
This is the same class of bug that broke v0.2.0 (better-sqlite3's prebuild-install lifecycle script also skipped). My L0 install-smoke ran bun install --frozen-lockfile (which DOES fire postinstall) and tested the happy path. CC's actual install path is bun install --ignore-scripts (or equivalent) — different behavior, same input. The simulation was more permissive than reality.
Fixed — three layers
- Ship `dist/` in the published artifact. Stopped gitignoring `mcp/trajectory-server/dist/` (with explicit allowlist override in root `.gitignore`). Now the published tag contains pre-built JS, so it works regardless of install behavior. CC, npm, yarn, pnpm: anyone who clones the tag has a working server.
- Updated L0 install-smoke to use `--ignore-scripts`. `tests/docker/install-smoke.Dockerfile` now runs `bun install --frozen-lockfile --ignore-scripts` to simulate CC's actual install path. This single line change would have caught both v0.2.0 and v0.3.0. Build success now genuinely means "works in CC's hostile install environment."
- `tests/lint/dist-fresh.sh`: new lint that rebuilds `dist/` in a temp directory and diffs against the committed version. Fails CI if a contributor modifies `src/` but forgets to rebuild `dist/`. Catches the regression where committed `dist/` goes stale.
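The freshness lint boils down to "build into a scratch directory, then compare trees". A rough sketch under stated assumptions: `dist_is_fresh` is a hypothetical name, the build is passed in as a command that writes into the directory given as its last argument, and the real `dist-fresh.sh` may invoke the build differently.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild into a temp dir and compare with the
# committed output.
# Usage: dist_is_fresh <committed_dir> <build_cmd> [build args...]
dist_is_fresh() {
  committed_dir="$1"
  shift
  fresh_dir=$(mktemp -d) || return 2

  # Run the caller-supplied build with the scratch dir as its last argument.
  if ! "$@" "$fresh_dir"; then
    rm -rf "$fresh_dir"
    echo "build failed" >&2
    return 2
  fi

  # diff -r exits 0 when the two trees match and non-zero otherwise.
  if diff -r "$fresh_dir" "$committed_dir" > /dev/null; then
    status=0
  else
    status=1
    echo "dist/ is stale: rebuild and commit it" >&2
  fi
  rm -rf "$fresh_dir"
  return "$status"
}
```

Comparing against a fresh build, instead of checking timestamps, means the lint only fails when the committed output actually differs from what `src/` produces.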
How this would have been caught earlier
- Reading CC's plugin install docs / observing actual behavior before designing L0.
- Testing with `--ignore-scripts` from day one (the worst-case install path is the right one to test).
- Running L6 release canary against the actual install path, not the same `bun install --frozen-lockfile` happy path.
The bug class is simulation more permissive than reality. Closed by always testing the worst-case install path.
Versioning
Bumped all 3 manifest versions to 0.3.1. No schema migration. engines.node unchanged (still >=22).
Added — tmb-rc release-candidate channel
.claude-plugin/marketplace.json now defines two plugin entries: tmb (tracks main) and tmb-rc (tracks rc branch — fast-forwarded to whichever vX.Y.Z-rc.N tag is currently being validated). Install path:
- Stable users: `/plugin install tmb@trustmybot` (unchanged behavior, only validated releases)
- Beta testers: `/plugin install tmb-rc@trustmybot` (opt-in pre-release builds)
Going forward, any risky change (install-path, schema, doctrine) MUST go through tmb-rc validation before promoting to main. v0.2.0 and v0.3.0 both broke production because there was no pre-stable channel to catch install-path regressions. Documented end-to-end workflow in CONTRIBUTING.md § Release ritual.
The tmb-rc channel is ready to use immediately after this release lands on main. The rc branch will be initialized off main post-merge.
v0.3.0 — cold-start fix (node:sqlite + global swe/pr-reviewer)
Cold-start fix release. Two structural changes that together eliminate the v0.2.0 marketplace-install pain class. Anyone on v0.2.0 should upgrade. (v0.2.1 was planned as a single-bug hotfix; we folded it into v0.3.0 because both changes touch the same cold-start path.)
Two changes, one outcome: /plugin install → first ask works, no /reload-plugins dance.
1. SQLite via Node stdlib — no native deps, no install scripts
Replaced better-sqlite3 (native binding) with node:sqlite (Node stdlib). v0.2.0 broke because bun's install lifecycle skipped better-sqlite3's prebuild-install script, leaving the native .node binary missing. This bug class is permanently gone — node:sqlite ships with Node itself, no compilation, no prebuilds, no install scripts to skip.
| Risk | Before (better-sqlite3) | After (node:sqlite) |
|---|---|---|
| Package-manager install-script lifecycle | | ✅ no install scripts |
| Prebuild server availability / firewall | | ✅ no downloads |
| Platform coverage (Alpine/musl, FreeBSD, exotic ARM) | | ✅ stdlib, runs anywhere Node runs |
| Build-tools-required fallback (no gcc) | | ✅ no compile step |
| Node ABI churn between Node majors | | ✅ part of Node itself |
Migration cost: ~50 LOC wrapper rewrite in mcp/trajectory-server/src/db.ts. All 245 unit tests + 43 integration tests pass against the new wrapper.
Node 22+ now required. node:sqlite is in stdlib since Node 22 (behind --experimental-sqlite flag, stable on Node 24). .mcp.json passes the flag unconditionally — required on 22, no-op on 24+.
2. swe + pr-reviewer ship globally — no copy step at onboarding
Workflow backbone agents now ship in agents/ (was: empty by design). CC discovers them automatically the moment the plugin installs. Onboarding no longer copies anything into the project — identity + 3 config writes + audit-row log. Done.
Default skills (swe-checklist, code-quality, docs-conventions, git-conventions, naming-conventions, review-protocol, review-findings) similarly moved to plugin's skills/ (alongside tmb_* protocol skills) — globally discoverable, project overrides per-name.
Resolution rule:
if <project>/.claude/agents/<name>.md exists → use local
else → use global plugin-shipped
Same for skills. Projects that need custom backbone behavior drop a project-local file; the global plugin file is never edited by bro. Local creation triggers: (a) Human explicitly asks, OR (b) bro determines the global default genuinely doesn't fit. Both paths route through tmb_agent-creator with explicit Human approval.
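As a sketch, the lookup order for agents could be written like this. The function name `resolve_agent` and the `plugin_root` argument are hypothetical; the notes only specify the local-then-global order, and the actual resolution may happen elsewhere than in a shell script.

```shell
#!/bin/sh
# Hypothetical sketch of the local-over-global lookup for an agent prompt.
# Usage: resolve_agent <project_dir> <plugin_root> <agent_name>
resolve_agent() {
  project_dir="$1"
  plugin_root="$2"
  name="$3"
  local_file="$project_dir/.claude/agents/$name.md"
  global_file="$plugin_root/agents/$name.md"

  if [ -f "$local_file" ]; then
    printf '%s\n' "$local_file"     # project-local override wins
  elif [ -f "$global_file" ]; then
    printf '%s\n' "$global_file"    # plugin-shipped default
  else
    return 1                        # no such agent in either location
  fi
}
```

Because the local path is checked first, dropping a file into `<project>/.claude/agents/` is enough to override a default without touching the plugin's copy.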
Consultants stay opt-in. architect, cto, ceo, pm remain in templates/agents/ and are only instantiated per-project when the Human explicitly asks for that consultant's read.
Onboarding flow before vs after
| Step | v0.2.0 | v0.3.0 |
|---|---|---|
| Identity capture (AskUserQuestion) | ✓ | ✓ |
| Branching model + PR target capture | ✓ | ✓ |
| Persist via `identity_set` + 3 × `config_set` | ✓ | ✓ |
| Copy `swe.md` + 5 default skills into `<project>/.claude/` | required (8+ filesystem ops) | eliminated |
| Log onboarding audit row | ✓ (`tmb_bootstrap_complete`) | ✓ (renamed `tmb_onboarding_complete`) |
| Required `/reload-plugins` after install? | yes | no (plugin already serves agents + skills globally) |
Removed
- `skills/tmb_bootstrap/SKILL.md`: recovery skill for the old "missing local agents" failure mode. Unnecessary now.
- `templates/skills/`: all default skills moved to `skills/` (globally discoverable).
- `templates/agents/swe.md`, `templates/agents/pr-reviewer.md`: promoted to `agents/` (globally discoverable).
Hardened — L0 install-smoke now drives a real DB call
Previously, L0 only asserted tools/list responded. tools/list doesn't open a DB, which is exactly why L0 didn't catch v0.2.0's bug. New assertion A3b in tests/docker/install-smoke.Dockerfile runs the full MCP initialize → tools/call identity_get round-trip, forcing the SQLite layer to load. Catches any future "install succeeds but first DB call fails" regardless of root cause.
Versioning
Bumped all 3 manifest versions to 0.3.0. engines.node bumped from >=20 to >=22.
v0.2.0 — install-smoke + test pyramid (L0-L6)
Workflow simulation harness + manual dogfood gate. Final PR in the test-pyramid build. The full layered model is now in place: every failure mode that doesn't require Claude Code in the loop has an automated test owner.
Added
L4 — Workflow simulation harness
New directory tests/workflow-sim/ holds 5 trajectory tests, one per FLOWS.md flow that has an MCP-side contract worth asserting. Each test spawns the real MCP server and walks the flow as a scripted sequence of tool calls — no Claude required. Asserts state transitions, ledger events, role enforcement, and discussion-thread shape.
| Flow | Test file | Asserts |
|---|---|---|
| 2: Simple task | `flow-02-simple-task.test.mjs` | bro plans → swe completes → bro closes; no per-task pr-reviewer (push gate is amortized); `planning_complete` event lands in ledger |
| 3: Difficult task | `flow-03-difficult-task.test.mjs` | Q+A discussion sequence satisfies scope gate without `waive_scope_gate`; decision row queryable for ADR generation; positive + negative cases |
| 6: Push gate | `flow-06-push-gate.test.mjs` | bro forbidden from `validation_record` (only pr-reviewer); fail-then-pass attempt sequence preserved in `validation_history` |
| 7: Architecture regen | `flow-07-architecture-regen.test.mjs` | `regen_state` cursor lifecycle; swe forbidden from `architecture_regen` and `regen_state_set` |
| 8: SWE retry | `flow-08-swe-retry.test.mjs` | 3-attempt sequence preserved; `UNIQUE(task_id, attempt_n)` yields upsert (latest verdict wins); `'escalated'` is a valid terminal status |
| D: Direct Mode | `flow-D-direct-mode.test.mjs` | `direct_mode_used` ledger event; no task / validation rows created |
The 5 flows that can't be tested at L4 (onboarding, agent-creator, skill-creator) are filesystem-only or Claude-side; they live in L5.
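The upsert behaviour asserted for the SWE retry flow in the table above (one row per (task_id, attempt_n), latest verdict wins) can be modelled without a database. The sketch below is an in-memory awk model of those semantics only. It is not the server's SQL, and the function name latest_verdicts and the whitespace-separated input format are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical model of "UNIQUE(task_id, attempt_n) yields upsert".
# Not the server's implementation; input format is an assumption.

# latest_verdicts
# Reads lines of the form "task_id attempt_n verdict" on stdin. Keeps one
# entry per (task_id, attempt_n) key, with later lines overwriting earlier
# ones, and prints the surviving entries in first-seen key order.
latest_verdicts() {
  awk '{
    key = $1 " " $2
    if (!(key in verdict)) order[++n] = key   # remember first appearance
    verdict[key] = $3                          # later write wins
  } END {
    for (i = 1; i <= n; i++) print order[i], verdict[order[i]]
  }'
}
```

Distinct attempt numbers each keep their own entry, so a multi-attempt sequence is preserved, while recording the same attempt twice leaves only the most recent verdict.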
tests/mcp-integration/run.sh was extended to run both L3 (existing 9 suites) and L4 (new 5 suites) in one Node process — total 43 tests, ~3.1s.
L5 — Compressed manual dogfood checklist
tests/manual/scenarios.md shrunk from 785 lines → ~140 lines of checklist focused on Claude-side behaviors that have no MCP surface to test: trigger word activation, AskUserQuestion radio rendering, silent template copy, subagent prompt precedence, worktree isolation, bro task-gate verification visible in conversation, push-gate flow with lazy pr-reviewer copy, Direct Mode timing, resume after kill, tone discipline.
10 numbered items, ~30 minutes to walk. Required before tagging any release ≥ v0.2.0.
Release-script anti-retag guard
scripts/release.sh now refuses to re-tag a published release. If git ls-remote --tags origin refs/tags/v<X.Y.Z> returns a SHA, the script exits with a clear error explaining the doctrinal alternative (bump the version, ship a new tag). Force-pushing tags is the antipattern that breaks consumer pinning, corrupts marketplace caches, and destroys audit trails — the script now prevents the accidental case while still allowing safe local-only retags (e.g. you tagged but haven't pushed yet).
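The guard's core decision can be sketched roughly as follows. This is a minimal illustration, not the actual contents of scripts/release.sh: the function names published_sha and guard_retag and the return codes are hypothetical, and only the git ls-remote --tags query comes from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an anti-retag guard. Function names and return codes
# are illustrative and are not taken from scripts/release.sh.

# published_sha TAG
# Reads `git ls-remote` output on stdin ("<sha><TAB><ref>" per line) and prints
# the SHA recorded for exactly refs/tags/TAG, or nothing if the tag is absent.
published_sha() {
  awk -v ref="refs/tags/$1" '$2 == ref { print $1; exit }'
}

# guard_retag REMOTE TAG
# Returns 0 when TAG is not on REMOTE (a local-only retag is still safe),
# 1 when TAG is already published, 2 when REMOTE could not be queried.
guard_retag() {
  local remote="$1" tag="$2" listing sha
  listing="$(git ls-remote --tags "$remote" "refs/tags/$tag")" || {
    echo "error: could not query $remote" >&2
    return 2
  }
  sha="$(printf '%s\n' "$listing" | published_sha "$tag")"
  if [ -n "$sha" ]; then
    echo "error: $tag is already published on $remote at $sha." >&2
    echo "Bump the version and ship a new tag instead of re-tagging." >&2
    return 1
  fi
  return 0
}
```

A release script could call guard_retag origin "v$VERSION" before creating the tag and abort on any non-zero status. Keeping the parsing in published_sha separate from the network call makes the decision logic checkable without a remote.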
tests/lint/release-script-safety.sh (new lint) protects this guard against accidental removal during refactors. 5 grep-based assertions cover the remote-check, refusal message, exit-code, doctrinal alternative text, and the local-only path's correctness.
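A grep-based lint of this kind might look like the sketch below. It is an assumption about the general shape, not a copy of tests/lint/release-script-safety.sh: the helper name assert_patterns is hypothetical, and the real lint's five patterns are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a grep-based lint. The real
# tests/lint/release-script-safety.sh may be structured differently.

# assert_patterns FILE PATTERN...
# Checks that FILE matches every extended-regex PATTERN. Reports each missing
# pattern on stderr and returns 1 if any is absent, 0 otherwise.
assert_patterns() {
  local file="$1" pattern failed=0
  shift
  for pattern in "$@"; do
    if ! grep -Eq -- "$pattern" "$file"; then
      echo "lint: $file no longer matches: $pattern" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, assert_patterns scripts/release.sh 'git ls-remote --tags' would fail the lint if a refactor dropped the remote check. The pattern shown is illustrative only.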
L5 release gate
scripts/release.sh now refuses to tag unless MANUAL_DOGFOOD_PASSED=v<X.Y.Z> matches the version being tagged. Sign-off after walking tests/manual/scenarios.md:
export MANUAL_DOGFOOD_PASSED=v0.2.0
bash scripts/release.sh
Bypass for hotfixes that don't change Claude-side behavior:
BYPASS_DOGFOOD=1 bash scripts/release.sh # justify in commit message
Doctrine: the full pyramid is now in place
L0 — Distribution / install-smoke (Docker — CI on every PR)
L1 — Static / lint (9 scripts — CI on every PR)
L2 — Unit (per-component) (245 MCP unit + 16 hook unit — CI on every PR)
L3 — Integration (cross-component) (9 MCP-integration suites — CI on every PR)
L4 — Workflow simulation (5 trajectory suites — CI on every PR)
L5 — Manual dogfood (Claude-side) (10-item checklist — required before tag)
L6 — Release canary (Docker re-clone of tag — in release.sh)
| Failure-mode class | Owner |
|---|---|
| MCP server fails to boot after install | L0 |
| Stale version, broken link, missing skill name, shellcheck regression | L1 |
| Per-tool / per-hook contract regression | L2 |
| Cross-component (MCP+hook+DB) regression | L3 |
| Workflow contract change without test update | L4 |
| Trigger word, AskUserQuestion, agent isolation, tone, resume | L5 |
| Published artifact ≠ tested artifact | L6 |
Versioning
Bumped all three manifest versions to 0.2.0. Minor bump (not patch) reflects the structural test infrastructure addition — no doctrine or behavior change for users.
v0.1.2
Docs + structural release. No agent, hook, or MCP-server behavior change. Adds multi-platform structural placeholders following the Superpowers pattern, and refreshes contributor docs to match the bro-as-planner doctrine that landed in v0.1.0.
Added
- Multi-platform placeholder structure (#73). Per-platform adapter dirs (.codex-plugin/, .cursor-plugin/, .opencode/) and root-level personas (CODEX.md, CURSOR.md, GEMINI.md, gemini-extension.json) ship as placeholders only, clearly marked "not implemented." The strategy doc at docs/multi-platform.md explains how the per-platform adapter pattern works, what an adapter would do, and why placeholders ship now (discoverability + path-precedent). No platform other than Claude Code is functional in this release.
- scripts/release.sh: generic, idempotent release ritual. Reads version from plugin.json, validates mcp pkg.json agrees, requires a matching CHANGELOG section, asks for y/N per step, then tags + pushes + creates the GitHub release. Replaces the v0.1.0-specific stranded script. Documented under "Release ritual" in CONTRIBUTING.md.
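The version-agreement step of the release ritual could be sketched as below. This is an illustrative guess at the check, not the script itself: the function names manifest_version and check_versions_agree are hypothetical, and the sed extraction is deliberately naive (it takes the first "version" string it finds and assumes the field sits on a single line).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a manifest version-agreement check.
# Function names are illustrative; the extraction is a naive text match.

# manifest_version FILE
# Prints the first "version" string value found in a JSON file.
manifest_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# check_versions_agree FILE...
# Prints the shared version and returns 0 when every manifest reports the
# same version; otherwise explains the mismatch on stderr and returns 1.
check_versions_agree() {
  local first="" v f
  for f in "$@"; do
    v="$(manifest_version "$f")"
    if [ -z "$v" ]; then
      echo "error: no version found in $f" >&2
      return 1
    fi
    if [ -z "$first" ]; then
      first="$v"
    elif [ "$v" != "$first" ]; then
      echo "error: $f has version $v, expected $first" >&2
      return 1
    fi
  done
  printf '%s\n' "$first"
}
```

A caller would pass the manifest paths it cares about, for instance .claude-plugin/plugin.json and mcp/trajectory-server/package.json, and stop the release when the function returns non-zero.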
Changed
- docs/architecture/FLOWS.md: refreshed Flow 3 (difficult task), 5 (skill creation), 8 (SWE retry), 9 (roundtable) to the bro-as-planner chain. Added Flow D (Direct Mode). Dropped stale references to validate-swe-output and require-review-sign (replaced by bro's verification protocol + git-push-guard.sh respectively).
- docs/architecture/FILES.md: full file-map refresh, covering empty agents/ (by design), 17 tmb_* protocol skills, 6 agent + 7 default-skill templates under templates/, multi-platform placeholders, current hook list (git-push-guard.sh instead of require-review-sign.sh), MCP test layout.
- docs/architecture/ERD.md: updated "How agents use this" to bro-as-planner role matrix; bumped plugin_meta.plugin_version reference to 0.1.2.
- CONTRIBUTING.md: design principles rewritten for the bro-as-planner doctrine (zero-shipped-subagents, Lego layering, server-enforced decision chain). Added multi-platform section. Pre-PR checklist expanded to cover template/skill layering and schema-touching changes.
- Performance doctrine relocated. docs/PERFORMANCE.md was deleted; its load-bearing content (target latency band + Tier 1/2/3 trim doctrine + re-eval triggers) lives in CONTRIBUTING.md § Performance. Historical baseline + change-tracking now lives in git history + this changelog instead of a doc that grows stale every perf cycle.
- tests/manual/scenarios.md: header updated to point at the bro-as-planner targets that ARE current; full template-rewrite still tracked in #51.
Versioning
.claude-plugin/plugin.json and mcp/trajectory-server/package.json bumped 0.1.1 → 0.1.2. No schema migrations needed (still schema_version=1).