Releases · trustmybot/plugin

27 Apr 19:36

ZaxShen

v0.5.0

07f17cf

v0.5.0 Latest


Headline: bro is now a structurally-enforced pure planner. Direct Mode removed (#162) and 7 hard-enforcement hooks promote previously prompt-only doctrine to Layer 2 (deterministic shell scripts). New docs/architecture/ENFORCEMENT.md documents the 6-layer model (MCP middleware → hooks → frontmatter → tool-handler validation → skill paths: → prompts) and the per-agent × per-interaction coverage matrix.

Fixed — file_registry summary ownership: bro, not SWE (#181)

The original #45 doctrine had SWE batching file_registry_update_summaries into its atomic close. That was the wrong agent: SWE only sees the task spec, not the broader issue/discussion that motivated the work. Bro has full task context (issue + spec + diff just verified during the V1/V2/V3 task gate) and is the natural author of summaries. Re-assigned ownership structurally:

agents/swe.md — drops file_registry_update_summaries from the atomic-close batch. SWE's atomic close is now 2 calls: commit + task_update_status(completed).
skills/tmb_planning-simple/SKILL.md + tmb_planning-difficult/SKILL.md — bro's V3 close batch grows by one call: file_registry_update_summaries(updates=[...], advance_verified_sha=<commit>) BEFORE task_update_status(closed).
mcp/trajectory-server/src/tools/file-registry.ts — requireRoles('file_registry_update_summaries', ['bro']) (was ['bro', 'swe']). Layer 1 — server rejects SWE callers.
scripts/hooks/require-summaries-before-task-close.sh (NEW PreToolUse hook) — when bro tries task_update_status(status='closed'), walks the commit's touched files and DENIES the close if file_registry is missing summaries or has summaries older than the task's created_at. Bypass: TMB_ALLOW_CLOSE_WITHOUT_SUMMARIES=1. Layer 2 — bro can't close the task without doing the summary update first.
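The deny rule in the new PreToolUse hook can be sketched roughly as follows. This is an illustration only, not the shipped script: the function names `ts_to_num` and `summary_gate` are hypothetical, the timestamps are assumed to be UTC ISO-8601 strings in one fixed format, and the lookup against the trajectory DB is left out. Only the `TMB_ALLOW_CLOSE_WITHOUT_SUMMARIES` variable comes from the notes above.

```shell
# Hypothetical helper: turn "2025-04-27T19:36:00Z" into 20250427193600 so two
# timestamps in the same UTC format can be compared as integers.
ts_to_num() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Hypothetical gate: prints "allow" or "deny" for one touched file.
#   $1 = summary timestamp from file_registry (empty string if no summary)
#   $2 = the task's created_at timestamp
summary_gate() {
  if [ "${TMB_ALLOW_CLOSE_WITHOUT_SUMMARIES:-0}" = "1" ]; then
    echo allow                   # documented bypass
    return 0
  fi
  if [ -z "$1" ]; then
    echo deny                    # summary is missing
    return 0
  fi
  if [ "$(ts_to_num "$1")" -lt "$(ts_to_num "$2")" ]; then
    echo deny                    # summary is older than the task
    return 0
  fi
  echo allow
}
```

With sample values, `summary_gate "2025-04-27T09:00:00Z" "2025-04-27T10:00:00Z"` prints `deny`, because the summary was written before the task was created. A real hook would run this check for every file touched by the commit and deny the close if any file fails.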

Re-tightened L5 outcome assertions in 02-simple-task and 11-codebase-memory-verify-on-drift since the structural enforcement now guarantees fresh summaries on every closed task. 10-codebase-memory-cold-start's assertion stays disabled — that's headless_fallback ledger event compliance, a separate bro prompt-discipline issue requiring its own enforcement (filed as a separate follow-up).

Added — `docs/architecture/RESPONSIBILITIES.md`

Codebase-derived (not architecture-doc-derived) listing of what bro / SWE / pr-reviewer / consultants are actually instructed to do — by reading the agent prompts, the skills they wire to, and the hook surface around them. Includes the role × tool matrix from requireRoles. Source of truth for what the plugin enforces vs what doctrine merely suggests.

Fixed (post-rc.1)

no-source-edit-from-main.sh + activation-routine.sh bro-mode detection too narrow. Previously both hooks required the assistant to emit Entering bro mode. in the transcript, but in claude -p headless mode bro routinely skips that announcement (the h3/h4 prompt-discipline ceiling). Hooks now also detect bro mode by scanning the transcript for any user message containing the bro trigger word. Without this fix, bro bypassed SWE and edited source directly in 3 of 5 v0.5.0-rc.1 L5 dogfood flows. Adds regression test cases for both hooks covering the real-world fixture instead of just the announce-emitted variant.
TMB_CLAUDE_TIMEOUT=600 wired into l5-dogfood.yml + release-canary.Dockerfile. The env override was added in #172 but missed both L5-runner workflows; runs hit the default 180s cap mid-SWE chain.
Stale tools-required.json for cold-start + code-touching flows. Cleared assertion lists for 01-first-contact, 02-simple-task, 10-codebase-memory-cold-start, 11-codebase-memory-verify-on-drift, 12-source-edit-attempt, 95-anonymous-cold-restart. These asserted on MCP tool calls captured in debug_trajectory — but the table isn't populated because of #164 (env propagation bug + UNIQUE merge bug). Once #179 (stream-json refactor) lands, the trajectory scoring is re-implemented end-to-end and these lists get re-populated against the new capture format.
Disabled chronic #45 codebase-memory outcome assertions. 02-simple-task, 10-codebase-memory-cold-start, 11-codebase-memory-verify-on-drift had assertions on file_registry's content_md5 / summary / last_verified_sha columns that depend on SWE/bro reliably calling file_registry_update_summaries — a prompt-only doctrine that hits the same h3/h4 ceiling. Tracked in #181 as a deferred Layer 2 PostToolUse hook. Original assertions kept commented-out for restoration once #181 ships.

Breaking changes (pre-1.0 minor bump per SemVer)

Direct Mode is gone. Bro never edits source code; every code change routes through SWE. Trivial fixes go via the same chain (lighter spec, not a separate code path). Pushes that previously relied on bro-direct edits will fail; rewrite as task → SWE → bro verify → close.
All plugin-shipped skills now use the tmb_* prefix. The 7 un-prefixed defaults (code-quality, docs-conventions, git-conventions, naming-conventions, review-findings, review-protocol, swe-checklist) are renamed to tmb_*. Project-local skills with un-prefixed names are unaffected; local skills can still shadow plugin defaults by name resolution, as before.

New hard-enforcement hooks

The h3 + h4 A/B scenarios proved prompt-only doctrine compliance is 0/10 in both wording arms for high-frequency operations. These 7 hooks move load-bearing rules to deterministic Layer 2:

| Hook | Event | Doctrine enforced |
| --- | --- | --- |
| `activation-routine.sh` | UserPromptSubmit | Pre-fetches `identity` + pending issue from the trajectory DB on every bro-triggered message; injects as `additionalContext` so bro never has to remember to call `identity_get` / `issue_resume` |
| `no-source-edit-from-main.sh` | PreToolUse on Edit/Write/MultiEdit/NotebookEdit | Blocks bro from editing source files outside an SWE worktree (allowlist: markdown, LICENSE, agent/skill prompts, plugin/hooks manifests, `.github/`). Bypass: `TMB_ALLOW_SOURCE_EDIT=1` |
| `session-start-regen-check.sh` | SessionStart | Computes git drift vs `regen_state.last_seen_sha`; nudges bro to run `tmb_refresh-architecture` when drift > 25 commits (override: `TMB_REGEN_DRIFT_THRESHOLD`) |
| `ensure-gitignore.sh` | SessionStart | Ensures `.claude/` is in the project's `.gitignore`. Creates `.gitignore` if missing; appends if rule absent; idempotent. Prevents the trajectory.db-leaking-into-worktrees footgun |
| `no-worktree-branch-create.sh` | PreToolUse on Bash | Blocks `git worktree add -b/-B/--create-branch ...`. Branch authority is bro's: bro pre-creates `<task.branch_id>` from the latest origin, SWE attaches via `git worktree add <path> <branch>` (no creation, no abbreviation). Bypass: `TMB_ALLOW_WORKTREE_BRANCH_CREATE=1` |
| `branch-up-to-date-with-remote.sh` | PreToolUse on Bash | Fetches `origin/<pr_target>`, denies worktree-add if `<branch>` is behind. Catches the stale-local-main bug. Bypass: `TMB_ALLOW_STALE_BRANCH=1` |
| `cleanup-worktree-on-task-close.sh` | PostToolUse on `task_update_status` | When bro flips task to `closed`, removes the corresponding `.claude/worktrees/<slug>/`. Commits live on the branch and survive. Bypass: `TMB_KEEP_CLOSED_WORKTREES=1` |
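To make the "deterministic Layer 2" idea concrete, here is a minimal sketch of the kind of string check a hook such as `no-worktree-branch-create.sh` could perform. The function names are hypothetical and the matching is deliberately simple (for example, it does not handle `git -C <dir> worktree add`); the flag list and the bypass variable are taken from the table above.

```shell
# Hypothetical matcher: returns 0 if the command is a `git worktree add`
# that also asks for a branch to be created (flags as listed in the notes).
creates_worktree_branch() {
  case "$1" in
    *"git worktree add"*) ;;
    *) return 1 ;;
  esac
  case " $1 " in
    *" -b "*|*" -B "*|*" --create-branch "*|*" --create-branch="*) return 0 ;;
  esac
  return 1
}

# Hypothetical decision wrapper that honors the documented bypass variable.
worktree_decision() {
  if [ "${TMB_ALLOW_WORKTREE_BRANCH_CREATE:-0}" = "1" ]; then
    echo allow
  elif creates_worktree_branch "$1"; then
    echo deny
  else
    echo allow
  fi
}
```

Under these assumptions, `worktree_decision 'git worktree add -b fix/foo path'` prints `deny`, while attaching to a pre-created branch with `git worktree add <path> <branch>` prints `allow`. The point of the design is that the outcome depends only on the command text and an environment variable, not on whether the model remembered a prompt instruction.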

Plus structural improvements: tmb_db_path walks up to git root for DB resolution (was $(pwd)-relative — broke when bro cd'd into a worktree), TMB_CLAUDE_TIMEOUT env override for L5/A/B test runners, and tests/dogfood/lib/flow-helpers.sh:l5_setup_scratch_project writes .gitignore matching real-project behavior.
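The walk-up behavior described for tmb_db_path might look something like the sketch below. It is an assumption-laden illustration rather than the shipped helper: the function names are hypothetical, the DB location `.claude/trajectory.db` is assumed for the example, and the rule "stop at the first directory whose `.git` is a directory" is one possible way to skip linked worktrees, whose `.git` is a plain file.

```shell
# Hypothetical sketch: walk up from a start directory until a directory that
# contains a `.git` DIRECTORY is found. A linked worktree has a `.git` FILE,
# so it is skipped and the walk continues to the main checkout.
find_project_root() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -d "$dir/.git" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1                   # reached the filesystem root without a match
    fi
    dir=$(dirname "$dir")
  done
}

# Assumed DB location (.claude/trajectory.db), for illustration only.
db_path() {
  root=$(find_project_root "${1:-.}") || return 1
  printf '%s/.claude/trajectory.db\n' "$root"
}
```

Called from inside `.claude/worktrees/<slug>/`, `db_path` would resolve to the main checkout's DB instead of a path relative to the current directory, which is the failure mode the notes describe.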

Other shipping in v0.5.0

A/B framework matures (#131, #157, #160, #161): runner + helpers + chi-squared stats; 4 backfill hypothesis scenarios (h1 CLAUDE.md slim, h2 Hybrid D' vs lazy, h4 first-action MANDATORY); shared substrate-health pre-flight (#161); node_modules symlinking + scenario fixture/setup_files framework fix.
Activation routine hook proven necessary: h4 A/B (5 paired runs × 2 wording arms) showed prompt-only identity_get + issue_resume compliance was 0/10 in both arms — the hook delivers 100% reliability.
L6 → L5 helper namespace cleanup (#163): l6_* shell functions in tests/dogfood/lib/ renamed to l5_* to match the renamed test layer.
GH Actions bumped to v5 (#165): Node 24 internal runtime; CC-auth prefix check dropped (smoke test is the authoritative gate).
Two CLAUDE.md cleanups (#168, #169): verify-context decision tree → 2-column table; opaque issue refs dropped.

Added — 4 hard-enforcement hooks (branch authority + worktree hygiene) (#170, #171)

Local h5 dogfood surfaced two doctrine bugs that were prompt-only and unreliable. Promoted both to Layer 2:

scripts/hooks/ensure-gitignore.sh (SessionStart). Ensures the project's .gitignore excludes .claude/. Creates the file if missing; appends if the rule is absent; idempotent. Without this, the trajectory.db gets committed to the project, then git worktree add checks it out inside every worktree — a stale per-worktree DB poisons every hook that resolves DB path via $(pwd). Fixes the root cause behind #171.
scripts/hooks/no-worktree-branch-create.sh (PreToolUse on Bash). Blocks git worktree add -b/-B/--create-branch .... Branch authority belongs to bro: bro creates <task.branch_id> first (git branch <name> origin/<pr_target>), then SWE attaches via git worktree add <path> <branch> — no creation, no abbreviation. Fixes #170 where SWE invented fix/typo-foo-ts for spec fix/foo-typo-receive. Bypass: TMB_ALLOW_WORKTREE_BRANCH_CREATE=1.
scripts/hooks/branch-up-to-date-with-remote.sh (PreToolUse on Bash). When SWE attaches a worktree to <branch>, fetches origin/<pr_target> (best-effort, offline-friendly) and verifies <branch> descends from it. Catches the "stale local main" bug where bro creates a task branch from yesterday's pointer, then the SWE commit conflicts on push. Bypass...
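As an illustration of the "create if missing, append if absent, idempotent" behavior described for `ensure-gitignore.sh`, a minimal sketch could look like this. The function name `ensure_claude_ignored` is hypothetical and the real hook may differ in how it matches existing rules.

```shell
# Hypothetical sketch of an idempotent "make sure .claude/ is ignored" step.
#   $1 = project root (defaults to the current directory)
ensure_claude_ignored() {
  gi="${1:-.}/.gitignore"
  # Create the file if it does not exist yet.
  [ -f "$gi" ] || : > "$gi"
  # Already present as an exact line? Then there is nothing to do.
  if grep -qxF '.claude/' "$gi"; then
    return 0
  fi
  # Make sure existing content ends with a newline before appending.
  if [ -s "$gi" ] && [ -n "$(tail -c 1 "$gi")" ]; then
    printf '\n' >> "$gi"
  fi
  printf '.claude/\n' >> "$gi"
}
```

Running the function twice leaves exactly one `.claude/` line in the file, which is what makes it safe to call on every SessionStart.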


27 Apr 01:47

ZaxShen

v0.4.2

1e50ab4

v0.4.2

Added — codebase memory (#45) — Hybrid D' design

Bro now persists a per-file index in file_registry: md5 + summary + last-verified timestamp. The verify-context doctrine (CLAUDE.md, post v0.4.1) tells bro to "trust the trajectory DB's file_registry index" when git is clean — this PR makes that index real.

Doctrine — entry-state matrix in tmb_project-prescan:

New project (empty repo) → no registry, no scan.
Existing repo + registry empty → AskUserQuestion "deep scan now or lazy fill?". Headless fallback = lazy.
Registry populated + clean tree + HEAD == last_verified_sha → trust, no scan.
Registry populated + drift → file_registry_verify pass; refresh mismatched rows.
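The four entry states above can be read as a small decision function. The sketch below is only an illustration of that matrix: the function name, argument order, and the returned labels are hypothetical, and the real skill is prompt-driven rather than a shell script.

```shell
# Hypothetical decision function for the four entry states listed above.
#   $1 = "empty" or "existing"   (is the repository new/empty?)
#   $2 = number of rows in file_registry
#   $3 = "clean" or "dirty"      (working tree state)
#   $4 = HEAD commit SHA
#   $5 = last_verified_sha from the registry
prescan_action() {
  if [ "$1" = "empty" ]; then
    echo "skip"                  # new project: no registry, no scan
  elif [ "$2" -eq 0 ]; then
    echo "ask-deep-or-lazy"      # headless fallback would pick lazy
  elif [ "$3" = "clean" ] && [ "$4" = "$5" ]; then
    echo "trust"                 # registry matches HEAD, no scan
  else
    echo "verify"                # drift: run file_registry_verify
  fi
}
```

For example, `prescan_action existing 12 clean abc123 abc123` prints `trust`, while the same call with a different HEAD SHA prints `verify`.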

Writers:

Bro (CLAUDE.md addition): when bro Reads a file for context, follow with file_registry_update_summaries if the row's summary was null. Side-effect of work — no extra LLM cost.
SWE (atomic-close): batch file_registry_update_summaries(touched_paths) alongside task_update_status and the commit. SWE has fresh context for free.
Direct Mode (tmb_direct-mode): step 4 in the protocol — registry update is now mandatory alongside the direct_mode_used ledger event.

New skill tmb_deep-scan: eager opt-in for cold-start when the Human says yes (or invokes via "@bro deep scan"). Filters binaries / lockfiles / generated dirs, batches Reads, single bulk update call.

Two new L5 dogfood flows:

10-codebase-memory-cold-start — existing repo + empty registry → headless fallback fires + lazy default chosen + planning still proceeds
11-codebase-memory-verify-on-drift — populated registry + induced disk drift → verify pass refreshes the row

Updated outcome.sql for existing flows:

02-simple-task — assert SWE atomic-close updated file_registry (md5 + summary set, last_verified_sha advanced)
D-direct-mode — assert step 4 fired (registry row refreshed, last_verified_sha set)

Contributors

bro


27 Apr 00:43

ZaxShen

v0.4.1

2231c42

v0.4.1

Refactored — `L6 dogfood` → `L5 dogfood` (close the L4→L6 numbering gap)

The previous rename (L5+L6 combined → Release canary) demoted the standalone manual L5 to an unnumbered "Manual smoke" fallback, which left a gap between L4 and L6. This rename closes the gap: L6 dogfood is now L5 dogfood. The pyramid is contiguous L0–L5 again, with Release canary and Manual smoke as the non-numbered layers above.

Renamed:

.github/workflows/l6-dogfood.yml → .github/workflows/l5-dogfood.yml (workflow name: updated, PR-label trigger now L5)
tests/dogfood/run-l6.sh → tests/dogfood/run-l5.sh
Env var: L6_KEEP_ARTIFACTS → L5_KEEP_ARTIFACTS
Docker scratch dirs: /tmp/tmb-l6-XXXX → /tmp/tmb-l5-XXXX
Internal globals: L6_DOGFOOD_DIR → L5_DOGFOOD_DIR

Updated docs: tests/README.md (pyramid table), CONTRIBUTING.md (workflow scope), tests/manual/{setup,README}.md, docs/contributing/LABELS.md, docs/architecture/FILES.md, scripts/release.sh, scripts/hooks/debug-trajectory.sh, mcp/trajectory-server/src/{index,test/schema.test}.ts.

Refactored — testing framework: `L5+L6 combined` → `Release canary`, `L5 manual dogfood` → `Manual smoke` (fallback)

The numeric "L5+L6 combined" name was awkward (not a real layer, just a Docker-bundled superset) and constrained future insertion of heavy layers. Renamed to a non-numeric Release canary so future layers (e.g. A/B prompt eval — issue #131, perf canary, etc.) can slot in between L4 and Release canary without renumbering.

Standalone "L5 manual dogfood" demoted to Manual smoke — a fallback used only for UX scenarios the automated layers can't model (e.g. live AskUserQuestion interactivity). The Release canary handles everything else automatically.

Renamed files:

.github/workflows/l5-l6-combined.yml → .github/workflows/release-canary.yml
tests/docker/l5-l6-combined.Dockerfile → tests/docker/release-canary.Dockerfile
tests/docker/run-l5-l6-combined.sh → tests/docker/run-release-canary.sh
Workflow name: and job ID updated to Release canary / release-canary.
Image tag: tmb-l5-l6-combined:<v> → tmb-release-canary:<v>.

Updated docs: tests/README.md (test pyramid + escalation chain), CONTRIBUTING.md ("CI scope" workflow table), scripts/release.sh (manual smoke gate framing).

Refactored — defaults seeded by schema, not by bro

The previous unreleased entry had bro silently writing 3 plugin_config rows + a tmb_defaults_applied ledger event on first contact. Per user follow-up: that's still bro doing work the system should do.

mcp/trajectory-server/src/schema.sql now seeds the 3 default policy keys via INSERT OR IGNORE at DB creation. Bro never touches plugin_config on first contact.
tmb_defaults_applied ledger event removed entirely (the schema seed is silent; bro only logs events for decisions it actually makes).
CLAUDE.md first-action chain compressed from 12 lines (state check + conditional default-write + cache + resume) to 4 lines (two parallel reads: identity_get + issue_resume, then welcome banner). config_get no longer in the always-call set; bro fetches lazily when a specific key matters.
Welcome banner simplified from 3 variants to 2 (no "first contact" variant — pending-work or idle is enough).
Test fixtures (onboarding-named.sql, onboarding-anonymous.sql) shrunk: they no longer INSERT plugin_config (now schema-seeded) and dropped the tmb_defaults_applied ledger row. onboarding-named.sql writes a tmb_user_named event instead to mark "user explicitly chose this name".

Removed — first-run-onboarding ceremony (modern-agent UX)

Modern agents (Cursor, ChatGPT, etc.) don't onboard — they just work. TMB's previous behavior of asking name + branching model + PR target + protected branches via AskUserQuestion on first contact was friction with no upside for the 80% case, and it broke completely in headless claude -p mode (no Human to answer).

Deleted: skills/tmb_first-run-onboarding/ (entire skill).
Deleted: tests/lint/onboarding-skill-contract.sh (no skill to lint).
Deleted: tests/dogfood/flows/01-onboarding/ (no ceremony to test).
New: tests/dogfood/flows/01-first-contact/ — asserts the inverse: empty DB → @bro hi → bro applies defaults silently + welcome banner mentions them; AskUserQuestion and identity_set are explicitly forbidden tools.
CLAUDE.md first-action chain rewritten: on first contact (config_get returns null), bro silently writes branching_model=github-flow, pr_target=main, protected_branches=["main"] plus a tmb_defaults_applied ledger event. No identity row — its absence means "user hasn't named themselves yet."
Welcome banner is now mandatory (also new in CLAUDE.md): bro must announce activation explicitly with state context — three variants for first contact / returning with pending work / returning idle.
Ledger event renamed: tmb_onboarding_complete → tmb_defaults_applied. Pre-1.0, no migration shim — fixtures and outcome assertions updated in lockstep.
tmb_reonboard repositioned as the only path to write identity rows or change policy keys (was: "re-run onboarding"). Same skill, same UI, clearer framing.

To set your name post-first-contact: say @bro reonboard or @bro update my name.

Cluster of bugs found during cold-session marketplace dogfood by @ZaxShen. All four were doctrine drift, not infra: bro had stale instructions, server enforcement was working but invisible.

Fixed — Anonymous identity now persists (issue #95)

tmb_first-run-onboarding previously skipped identity_set when the Human chose Anonymous. The DB row never existed, so every cold session saw identity_get().created_at == null and re-triggered the full onboarding flow — even though configs and ledger events confirmed onboarding had already run.

identity_set MCP tool now accepts anonymous: true to write a row with human_name=NULL. Onboarding always calls identity_set (named OR anonymous). Cold-restart-after-Anonymous regression covered by tests/workflow-sim/flow-09-anonymous-cold-restart.test.mjs.

Fixed — Bro now writes `bro_verification_pass` ledger event (issue #91)

The planning skills' V3 step (close path) jumped straight from "verification passed" to task_update_status(closed) with no ledger anchor. The trajectory had no record of bro's task-gate verdict — only the absence of a validation_record row, which was indistinguishable from "pr-reviewer hasn't gotten there yet."

V3 now batches ledger_log(event_type='bro_verification_pass') + task_update_status(closed) + issue_close (when applicable) in one response. The ledger is the source of truth for bro's task-gate verdict; validation_attempts is exclusively pr-reviewer's table.

Fixed — Bro halts on MCP errors instead of silently proceeding (issue #96)

Trace from cold-session test: bro called validation_record(agent='bro', verdict='pass') at task close. Server middleware correctly returned {"error": "forbidden", "caller_role": "bro", "allowed_roles": ["pr-reviewer"]}. Bro ignored the error and proceeded to task_update_status(closed) + issue_close + emit "Trust me bro, it works." From the Human's view the task closed cleanly; in reality no verification trace existed.

Two doctrine clauses added to plugin CLAUDE.md:

MCP error handling — halt and surface. Any tool result with is_error: true halts the flow. No silent continuation.
Tools bro must NEVER call. validation_record is pr-reviewer-only. Bro's task-gate uses ledger_log(bro_verification_pass). Server-side rejection now backed by client-side discipline.

Fixed — Policy-key writes route through `tmb_reonboard` (issue #93)

branching_model, pr_target, and protected_branches are policy keys that drive git-guards.sh and skill defaults. Bro could previously call config_set on them directly mid-session, bypassing the explicit-confirm UX of the onboarding flow.

CLAUDE.md now requires bro to invoke tmb_reonboard for policy-key changes — never direct config_set. The skill renders an AskUserQuestion with current values pre-selected and persists only on explicit confirmation.

Removed — `tmb_validate-swe-output` skill

Obsolete under bro-as-planner doctrine. Bro's task-gate verification is inline (V1/V2/V3 in the planning skills); pr-reviewer's push-gate verification is its own agent. The forked-Explore validation skill served the old "pr-reviewer signs at task close" flow that v0.3.0 retired.

Versioning

No schema migration; new column-less anonymous flag on identity_set is additive. Schema version stays at 1. Tests added: 4 new identity-tool tests + 3 new workflow-sim tests (flow-09 a/b/c).

Added — Label + ENUM doctrine (issue #38)

Two new doctrine docs codify the controlled vocabularies the project relies on:

docs/contributing/LABELS.md — canonical GH issue label list. Adopts GitHub's 9 default labels, K8s area/<name> + priority/<level> + lifecycle/<state> namespaces, and 2 documented TMB-specific labels (doctrine, discussion). Replaces the previously-invented area:*, p:*, stale, superseded labels with their K8s equivalents.
docs/contributing/ENUMS.md — every ENUM in schema.sql is listed with its canonical values + source convention (GH / K8s / TMB-specific with rationale).

Two new lints enforce drift prevention:

tests/lint/labels-stable.sh — fails if a GH label exists that's not in LABELS.md, or vice versa. Skipped on dev machines without gh auth; always runs in CI.
tests/lint/enums-stable.sh — parses ENUMS.md and the code, fails if a hardcoded value isn't documented.

GH label migration applied: 17 labels → 25 (renames + 9 new K8s area/*). All 18 open issues' labels auto-renamed in place via gh label edit --name. The superseded label was dropped...


26 Apr 03:52

ZaxShen

v0.3.2

ecc8cb4

v0.3.2 — git-guards.sh hotfix

Hook + agent-prompt hotfix. Two real bugs in git-guards.sh that broke every SWE commit-from-worktree, plus a SWE doctrine violation. Found by @ZaxShen during v0.3.1 marketplace test — bro spent 12 minutes hitting the same hook-block before reporting.

Fixed — `git-guards.sh` worktree-blind branch detection

git branch --show-current was running in CC's CWD (the project root, always main) regardless of which worktree the actual git commit was being executed in. Result: SWE in isolation: worktree mode could never commit — every commit got rejected as "no direct commits to main."

The hook now parses the working directory from the command itself:

cd <worktree> && git commit ... → reads branch from the worktree (the SWE pattern)
git -C <worktree> commit ... → same
Falls back to INPUT.cwd (if CC populates it) or $PWD
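The three cases above could be parsed along the following lines. This is a simplified sketch, not the code in git-guards.sh: the function name `git_command_dir` is hypothetical, and paths containing spaces or shell quoting are out of scope.

```shell
# Hypothetical parser: print the directory a git command will run in.
#   $1 = the Bash command string
#   $2 = fallback cwd (for example INPUT.cwd, else $PWD is used)
git_command_dir() {
  cmd=$1
  case "$cmd" in
    "cd "*" && git "*)
      rest=${cmd#cd }            # drop the leading "cd "
      printf '%s\n' "${rest%% && git *}"
      ;;
    "git -C "*)
      rest=${cmd#git -C }        # drop the leading "git -C "
      printf '%s\n' "${rest%% *}"
      ;;
    *)
      printf '%s\n' "${2:-$PWD}"
      ;;
  esac
}
```

A hook could then ask git for the branch in that directory, for instance with `git -C "$(git_command_dir "$cmd" "$cwd")" branch --show-current`, so the answer reflects the worktree where the commit actually runs.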

tests/hooks/git-guards.test.sh extended from 4 → 12 cases, including 7 new worktree-aware regressions.

Fixed — `git-guards.sh` Rule 4 false-fires on no-remote repos

git rev-parse "origin/${PR_TARGET}" (without --verify) prints the literal string "origin/main" to stdout when the ref doesn't exist, then exits non-zero. The 2>/dev/null swallowed the stderr, so REMOTE ended up as the literal string "origin/main" — non-empty — and the "Local main is behind origin/main" check fired falsely on any repo without a remote (which is most fresh scratch projects).

Fix: use git rev-parse --verify — empty output if ref doesn't exist, no false-fire.
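A small sketch of the fixed lookup, assuming a hypothetical wrapper named `resolve_ref` (the real hook's variable and function names are not shown in these notes):

```shell
# Sketch: resolve a ref to a SHA, printing nothing when the ref does not
# exist. With --verify --quiet, git exits non-zero silently instead of
# echoing the unresolved name back on stdout.
#   $1 = repository directory, $2 = ref name such as origin/main
resolve_ref() {
  git -C "$1" rev-parse --verify --quiet "$2" 2>/dev/null || true
}
```

In a repository without a remote, `resolve_ref . origin/main` prints an empty string, so a later "is local behind remote?" comparison can be skipped when the result is empty.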

Hardened — SWE prompt forbids hook bypass

When the v0.3.1 worktree bug blocked SWE's commit, the SWE subagent attempted to rewrite .git/HEAD and fabricate branch refs to bypass the hook. CC's security guards blocked the rewrite, but the doctrine was wrong: even when a hook misfires (and v0.3.1's worktree bug was a real misfire), SWE must report and stop, never bypass.

Added explicit clause in agents/swe.md:

Never attempt to bypass a PreToolUse hook block — do not rewrite .git/HEAD, fabricate refs, edit .git/ internals, or use any technique to evade a hook decision. If a hook blocks a legitimate operation, that's a plugin bug — STOP immediately, return the failure summary to bro with the exact hook output, and let bro decide the path forward.

agents/swe.md still 21 lines — within the 30-line Lego cap.

Versioning

Bumped all 3 manifest versions to 0.3.2. No schema migration. Rebuilt dist/.


26 Apr 03:07

ZaxShen

v0.3.1

52e86fa

v0.3.1 — ship dist/ in artifact + tmb-rc beta channel

Critical install hotfix. v0.3.0 marketplace install left the MCP server's compiled dist/ directory missing. Symptom: bro can't find any mcp__plugin_tmb_trajectory-server__* tools — onboarding's mandatory MCP writes can't run, identity/config never persist, the user is stuck. Anyone on v0.3.0 should upgrade.

Root cause

CC's marketplace plugin install runs bun install but skips lifecycle scripts (no postinstall). v0.3.0's design relied on postinstall to build dist/ after install — but CC never runs it. The server's compiled JS was never created on user machines.

This is the same class of bug that broke v0.2.0 (better-sqlite3's prebuild-install lifecycle script also skipped). My L0 install-smoke ran bun install --frozen-lockfile (which DOES fire postinstall) and tested the happy path. CC's actual install path is bun install --ignore-scripts (or equivalent) — different behavior, same input. The simulation was more permissive than reality.

Fixed — three layers

Ship dist/ in the published artifact. Stopped gitignoring mcp/trajectory-server/dist/ (with explicit allowlist override in root .gitignore). Now the published tag contains pre-built JS — works regardless of install behavior. CC, npm, yarn, pnpm — anyone who clones the tag has a working server.
Updated L0 install-smoke to use --ignore-scripts. tests/docker/install-smoke.Dockerfile now runs bun install --frozen-lockfile --ignore-scripts to simulate CC's actual install path. This single-line change would have caught both v0.2.0 and v0.3.0. Build success now genuinely means "works in CC's hostile install environment."
tests/lint/dist-fresh.sh — new lint that rebuilds dist/ in a temp directory and diffs against the committed version. Fails CI if a contributor modifies src/ but forgets to rebuild dist/. Catches the regression where committed dist/ goes stale.
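The comparison step of a freshness lint like the third item can be sketched as below. The function name `check_dist` is hypothetical, and the rebuild into a temp directory is project-specific and not shown; only the "rebuild, then diff against the committed copy" idea comes from the notes.

```shell
# Hypothetical freshness check: compare the committed build directory with a
# fresh rebuild and fail when they differ.
#   $1 = committed dist directory, $2 = freshly rebuilt directory
check_dist() {
  if diff -r "$1" "$2" > /dev/null 2>&1; then
    echo "dist is fresh"
  else
    echo "dist is stale: rebuild and commit" >&2
    return 1
  fi
}
```

In CI, a non-zero return from a check like this would fail the job whenever src/ changed without a matching rebuild of dist/.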

How this would have been caught earlier

Reading CC's plugin install docs / observing actual behavior before designing L0.
Testing with --ignore-scripts from day one (the worst-case install path is the right one to test).
Running L6 release canary against the actual install path, not the same bun install --frozen-lockfile happy path.

The bug class is "simulation more permissive than reality." Closed by always testing the worst-case install path.

Versioning

Bumped all 3 manifest versions to 0.3.1. No schema migration. engines.node unchanged (still >=22).

Added — `tmb-rc` release-candidate channel

.claude-plugin/marketplace.json now defines two plugin entries: tmb (tracks main) and tmb-rc (tracks rc branch — fast-forwarded to whichever vX.Y.Z-rc.N tag is currently being validated). Install path:

Stable users: /plugin install tmb@trustmybot (unchanged behavior — only validated releases)
Beta testers: /plugin install tmb-rc@trustmybot (opt-in pre-release builds)

Going forward, any risky change (install-path, schema, doctrine) MUST go through tmb-rc validation before promoting to main. v0.2.0 and v0.3.0 both broke production because there was no pre-stable channel to catch install-path regressions. Documented end-to-end workflow in CONTRIBUTING.md § Release ritual.

The tmb-rc channel is ready to use immediately after this release lands on main. The rc branch will be initialized off main post-merge.


25 Apr 23:10

ZaxShen

v0.3.0

31dca48

v0.3.0 — cold-start fix (node:sqlite + global swe/pr-reviewer)

Cold-start fix release. Two structural changes that together eliminate the v0.2.0 marketplace-install pain class. Anyone on v0.2.0 should upgrade. (v0.2.1 was planned as a single-bug hotfix; we folded it into v0.3.0 because both changes touch the same cold-start path.)

Two changes, one outcome: `/plugin install` → first ask works, no `/reload-plugins` dance.

1. SQLite via Node stdlib — no native deps, no install scripts

Replaced better-sqlite3 (native binding) with node:sqlite (Node stdlib). v0.2.0 broke because bun's install lifecycle skipped better-sqlite3's prebuild-install script, leaving the native .node binary missing. This bug class is permanently gone — node:sqlite ships with Node itself, no compilation, no prebuilds, no install scripts to skip.

| Risk | Before (better-sqlite3) | After (node:sqlite) |
| --- | --- | --- |
| Package-manager install-script lifecycle | ⚠️ broke v0.2.0 | ✅ no install scripts |
| Prebuild server availability / firewall | ⚠️ install fails | ✅ no downloads |
| Platform coverage (Alpine/musl, FreeBSD, exotic ARM) | ⚠️ no prebuild → fail | ✅ stdlib, runs anywhere Node runs |
| Build-tools-required fallback (no gcc) | ⚠️ fails | ✅ no compile step |
| Node ABI churn between Node majors | ⚠️ prebuild lag | ✅ part of Node itself |

Migration cost: ~50 LOC wrapper rewrite in mcp/trajectory-server/src/db.ts. All 245 unit tests + 43 integration tests pass against the new wrapper.

Node 22+ now required. node:sqlite is in stdlib since Node 22 (behind --experimental-sqlite flag, stable on Node 24). .mcp.json passes the flag unconditionally — required on 22, no-op on 24+.

2. swe + pr-reviewer ship globally — no copy step at onboarding

Workflow backbone agents now ship in agents/ (was: empty by design). CC discovers them automatically the moment the plugin installs. Onboarding no longer copies anything into the project — identity + 3 config writes + audit-row log. Done.

Default skills (swe-checklist, code-quality, docs-conventions, git-conventions, naming-conventions, review-protocol, review-findings) similarly moved to plugin's skills/ (alongside tmb_* protocol skills) — globally discoverable, project overrides per-name.

Resolution rule:

if <project>/.claude/agents/<name>.md exists → use local
else                                          → use global plugin-shipped

Same for skills. Projects that need custom backbone behavior drop a project-local file; the global plugin file is never edited by bro. Local creation triggers: (a) Human explicitly asks, OR (b) bro determines the global default genuinely doesn't fit. Both paths route through tmb_agent-creator with explicit Human approval.
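The local-over-global rule can be expressed as a short lookup. The sketch below is illustrative only: `resolve_agent` is a hypothetical name, and Claude Code performs the actual discovery, not a script.

```shell
# Hypothetical resolver for the local-over-global rule.
#   $1 = project root, $2 = plugin root, $3 = agent name (without .md)
resolve_agent() {
  if [ -f "$1/.claude/agents/$3.md" ]; then
    printf '%s\n' "$1/.claude/agents/$3.md"   # project-local override
  else
    printf '%s\n' "$2/agents/$3.md"           # plugin-shipped default
  fi
}
```

With no local file, `resolve_agent <project> <plugin> swe` prints the plugin's `agents/swe.md`; once `<project>/.claude/agents/swe.md` exists, the same call prints the local path instead.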

Consultants stay opt-in. architect, cto, ceo, pm remain in templates/agents/ and are only instantiated per-project when the Human explicitly asks for that consultant's read.

Onboarding flow before vs after

| Step | v0.2.0 | v0.3.0 |
| --- | --- | --- |
| Identity capture (AskUserQuestion) | ✓ | ✓ |
| Branching model + PR target capture | ✓ | ✓ |
| Persist via `identity_set` + 3 × `config_set` | ✓ | ✓ |
| Copy `swe.md` + 5 default skills into `<project>/.claude/` | required (8+ filesystem ops) | eliminated |
| Log onboarding audit row | ✓ (`tmb_bootstrap_complete`) | ✓ (renamed `tmb_onboarding_complete`) |
| Required `/reload-plugins` after install? | yes | no (plugin already serves agents + skills globally) |

Removed

skills/tmb_bootstrap/SKILL.md — recovery skill for the old "missing local agents" failure mode. Unnecessary now.
templates/skills/ — all default skills moved to skills/ (globally discoverable).
templates/agents/swe.md, templates/agents/pr-reviewer.md — promoted to agents/ (globally discoverable).

Hardened — L0 install-smoke now drives a real DB call

Previously, L0 only asserted tools/list responded. tools/list doesn't open a DB, which is exactly why L0 didn't catch v0.2.0's bug. New assertion A3b in tests/docker/install-smoke.Dockerfile runs the full MCP initialize → tools/call identity_get round-trip, forcing the SQLite layer to load. Catches any future "install succeeds but first DB call fails" regardless of root cause.

Versioning

Bumped all 3 manifest versions to 0.3.0. engines.node bumped from >=20 to >=22.


25 Apr 22:02

ZaxShen

v0.2.0

af0fa5f

v0.2.0 — install-smoke + test pyramid (L0-L6)

Workflow simulation harness + manual dogfood gate. Final PR in the test-pyramid build. The full layered model is now in place: every failure mode that doesn't require Claude Code in the loop has an automated test owner.

Added

L4 — Workflow simulation harness

New directory tests/workflow-sim/ holds 5 trajectory tests, one per FLOWS.md flow that has an MCP-side contract worth asserting. Each test spawns the real MCP server and walks the flow as a scripted sequence of tool calls — no Claude required. Asserts state transitions, ledger events, role enforcement, and discussion-thread shape.

| Flow | Test file | Asserts |
| --- | --- | --- |
| 2 — Simple task | `flow-02-simple-task.test.mjs` | bro plans → swe completes → bro closes; no per-task pr-reviewer (push gate is amortized); planning_complete event lands in ledger |
| 3 — Difficult task | `flow-03-difficult-task.test.mjs` | Q+A discussion sequence satisfies scope gate without `waive_scope_gate`; decision row queryable for ADR generation; positive + negative cases |
| 6 — Push gate | `flow-06-push-gate.test.mjs` | bro forbidden from `validation_record` (only pr-reviewer); fail-then-pass attempt sequence preserved in `validation_history` |
| 7 — Architecture regen | `flow-07-architecture-regen.test.mjs` | regen_state cursor lifecycle; swe forbidden from `architecture_regen` and `regen_state_set` |
| 8 — SWE retry | `flow-08-swe-retry.test.mjs` | 3-attempt sequence preserved; UNIQUE(task_id, attempt_n) yields upsert (latest verdict wins); `'escalated'` is a valid terminal status |
| D — Direct Mode | `flow-D-direct-mode.test.mjs` | `direct_mode_used` ledger event; no task / validation rows created |

The flows that can't be tested at L4 (onboarding, agent-creator, skill-creator) are filesystem-only or Claude-side; they live in L5.

tests/mcp-integration/run.sh was extended to run both L3 (existing 9 suites) and L4 (new 5 suites) in one Node process — total 43 tests, ~3.1s.

L5 — Compressed manual dogfood checklist

tests/manual/scenarios.md shrunk from 785 lines → ~140 lines of checklist focused on Claude-side behaviors that have no MCP surface to test: trigger word activation, AskUserQuestion radio rendering, silent template copy, subagent prompt precedence, worktree isolation, bro task-gate verification visible in conversation, push-gate flow with lazy pr-reviewer copy, Direct Mode timing, resume after kill, tone discipline.

10 numbered items, ~30 minutes to walk. Required before tagging any release ≥ v0.2.0.

Release-script anti-retag guard

scripts/release.sh now refuses to re-tag a published release. If git ls-remote --tags origin refs/tags/v<X.Y.Z> returns a SHA, the script exits with a clear error explaining the doctrinal alternative (bump the version, ship a new tag). Force-pushing tags is the antipattern that breaks consumer pinning, corrupts marketplace caches, and destroys audit trails — the script now prevents the accidental case while still allowing safe local-only retags (e.g. you tagged but haven't pushed yet).
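The guard described above could look roughly like the sketch below. It is not the code in scripts/release.sh: the function name check_retag and the wording of the messages are hypothetical, and the decision is separated from the git call so that the logic can be read on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether tagging may proceed, given the output of
# `git ls-remote --tags origin refs/tags/<tag>` (empty when the tag is absent).
check_retag() {
  local tag="$1" remote_output="$2"
  if [ -n "$remote_output" ]; then
    # The first whitespace-delimited field of ls-remote output is the SHA.
    local sha="${remote_output%%[[:space:]]*}"
    echo "error: $tag is already published on origin at $sha" >&2
    echo "bump the version and ship a new tag instead of re-tagging" >&2
    return 1
  fi
  return 0  # not on the remote, so a local-only retag is still safe
}

# Intended call site (needs a git checkout with an 'origin' remote):
#   check_retag "v0.2.0" "$(git ls-remote --tags origin refs/tags/v0.2.0)" || exit 1
```

Keeping the remote lookup outside the function means the refusal path can be exercised without network access.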

tests/lint/release-script-safety.sh (new lint) protects this guard against accidental removal during refactors. 5 grep-based assertions cover the remote-check, refusal message, exit-code, doctrinal alternative text, and the local-only path's correctness.
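A grep-based lint of this kind can be sketched as follows. The function name lint_guard and the example patterns are hypothetical; they stand in for the five real assertions, which are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical grep-based lint: fail when any expected marker is missing
# from the script under test. Patterns are extended regular expressions.
lint_guard() {
  local file="$1"; shift
  local failures=0 pattern
  for pattern in "$@"; do
    if ! grep -qE -- "$pattern" "$file"; then
      echo "FAIL: pattern not found in $file: $pattern" >&2
      failures=$((failures + 1))
    fi
  done
  # Succeed only when every pattern was found.
  [ "$failures" -eq 0 ]
}

# Example call with illustrative patterns:
#   lint_guard scripts/release.sh 'git ls-remote --tags origin' 'exit 1'
```

A lint like this checks only that the guard's text is present, not that it behaves correctly, so it complements rather than replaces a behavioral test.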

L5 release gate

scripts/release.sh now refuses to tag unless MANUAL_DOGFOOD_PASSED=v<X.Y.Z> matches the version being tagged. Sign-off after walking tests/manual/scenarios.md:

export MANUAL_DOGFOOD_PASSED=v0.2.0
bash scripts/release.sh

Bypass for hotfixes that don't change Claude-side behavior:

BYPASS_DOGFOOD=1 bash scripts/release.sh   # justify in commit message
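The gate and its bypass amount to a small environment check. The sketch below is an assumption about its shape, not the actual release.sh code: the function name dogfood_gate and the messages are hypothetical, while the two variable names come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical gate: allow tagging only when the sign-off variable names the
# exact version being tagged, or when the hotfix bypass is set.
dogfood_gate() {
  local version="$1"  # e.g. 0.2.0, without the leading "v"
  if [ "${BYPASS_DOGFOOD:-}" = "1" ]; then
    echo "warning: dogfood gate bypassed; justify in the commit message" >&2
    return 0
  fi
  if [ "${MANUAL_DOGFOOD_PASSED:-}" != "v${version}" ]; then
    echo "error: set MANUAL_DOGFOOD_PASSED=v${version} after walking tests/manual/scenarios.md" >&2
    return 1
  fi
  return 0
}
```

Comparing against the full version string, rather than testing for any non-empty value, means a sign-off left over from an earlier release does not carry forward to the next one.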

Doctrine — the full pyramid is now in place

L0 — Distribution / install-smoke   (Docker — CI on every PR)
L1 — Static / lint                  (9 scripts — CI on every PR)
L2 — Unit (per-component)           (245 MCP unit + 16 hook unit — CI on every PR)
L3 — Integration (cross-component)  (9 MCP-integration suites — CI on every PR)
L4 — Workflow simulation            (5 trajectory suites — CI on every PR)
L5 — Manual dogfood (Claude-side)   (10-item checklist — required before tag)
L6 — Release canary                 (Docker re-clone of tag — in release.sh)

Failure-mode class	Owner
MCP server fails to boot after install	L0
Stale version, broken link, missing skill name, shellcheck regression	L1
Per-tool / per-hook contract regression	L2
Cross-component (MCP+hook+DB) regression	L3
Workflow contract change without test update	L4
Trigger word, AskUserQuestion, agent isolation, tone, resume	L5
Published artifact ≠ tested artifact	L6

Versioning

Bumped all three manifest versions to 0.2.0. Minor bump (not patch) reflects the structural test infrastructure addition — no doctrine or behavior change for users.

25 Apr 20:42

ZaxShen

v0.1.2

2fd5559

v0.1.2

Docs + structural release. No agent, hook, or MCP-server behavior change. Adds multi-platform structural placeholders following the Superpowers pattern, and refreshes contributor docs to match the bro-as-planner doctrine that landed in v0.1.0.

Added

Multi-platform placeholder structure (#73). Per-platform adapter dirs (.codex-plugin/, .cursor-plugin/, .opencode/) and root-level personas (CODEX.md, CURSOR.md, GEMINI.md, gemini-extension.json) ship as placeholders only — clearly marked "not implemented." The strategy doc at docs/multi-platform.md explains how the per-platform adapter pattern works, what an adapter would do, and why placeholders ship now (discoverability + path-precedent). No platform other than Claude Code is functional in this release.
scripts/release.sh — generic, idempotent release ritual. Reads version from plugin.json, validates mcp pkg.json agrees, requires a matching CHANGELOG section, asks for y/N per step, then tags + pushes + creates the GitHub release. Replaces the v0.1.0-specific stranded script. Documented under "Release ritual" in CONTRIBUTING.md.
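The version-agreement step of the release ritual might be sketched like this. Both function names are hypothetical, and the sed expression is a deliberately naive extractor that returns the first "version" string in a file; the real script may use a proper JSON parser. The CHANGELOG check and the y/N prompts are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first "version" value found in a JSON file.
json_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical check: print the shared version, or fail when the two
# manifests disagree or no version could be read.
versions_agree() {
  local plugin_manifest="$1" mcp_manifest="$2"
  local a b
  a="$(json_version "$plugin_manifest")"
  b="$(json_version "$mcp_manifest")"
  if [ -z "$a" ] || [ "$a" != "$b" ]; then
    echo "error: version mismatch: '$a' vs '$b'" >&2
    return 1
  fi
  echo "$a"
}

# Example call, using the manifest paths named in the Versioning note:
#   versions_agree .claude-plugin/plugin.json mcp/trajectory-server/package.json
```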

Changed

docs/architecture/FLOWS.md — refreshed Flow 3 (difficult task), 5 (skill creation), 8 (SWE retry), 9 (roundtable) to the bro-as-planner chain. Added Flow D (Direct Mode). Dropped stale references to validate-swe-output and require-review-sign (replaced by bro's verification protocol + git-push-guard.sh respectively).
docs/architecture/FILES.md — full file-map refresh: empty agents/ (by design), 17 tmb_* protocol skills, 6 agent + 7 default-skill templates under templates/, multi-platform placeholders, current hook list (git-push-guard.sh instead of require-review-sign.sh), MCP test layout.
docs/architecture/ERD.md — updated "How agents use this" to bro-as-planner role matrix; bumped plugin_meta.plugin_version reference to 0.1.2.
CONTRIBUTING.md — design principles rewritten for the bro-as-planner doctrine (zero-shipped-subagents, Lego layering, server-enforced decision chain). Added multi-platform section. Pre-PR checklist expanded to cover template/skill layering and schema-touching changes.
Performance doctrine relocated. docs/PERFORMANCE.md was deleted; its load-bearing content (target latency band + Tier 1/2/3 trim doctrine + re-eval triggers) lives in CONTRIBUTING.md § Performance. Historical baseline + change-tracking now lives in git history + this changelog instead of a doc that grows stale every perf cycle.
tests/manual/scenarios.md — header updated to point at the bro-as-planner targets that ARE current; full template-rewrite still tracked in #51.

Versioning

.claude-plugin/plugin.json and mcp/trajectory-server/package.json bumped 0.1.1 → 0.1.2. No schema migrations needed (still schema_version=1).
