🐛 Phase 2.5 — fix 3 MCP correctness bugs from PR #1 review by ZaxShen · Pull Request #2 · trustmybot/plugin

ZaxShen · 2026-04-22T00:06:59Z

Summary

Closes the 3 correctness bugs flagged by pr-reviewer on PR #1. All three were data-integrity issues that would silently corrupt trajectory state once agents start calling the MCP tools.

#B1 issue_close was writing post_git_sha into the pre_commit_hash column (no post_commit_hash column existed). Overwrote the pre-SHA captured at create. Fix: add post_commit_hash TEXT column; write to it; preserve pre_commit_hash.
#B2 ledger_log returned is_truncated in the response but didn't persist it (no column existed). Later ledger_list callers couldn't tell a row was truncated. Fix: add is_truncated INTEGER NOT NULL DEFAULT 0 to ledger; persist on INSERT; surface via SELECT *.
#B3 audit_log scoped MAX(round) by issue_id only, letting rounds cross-contaminate across tasks in the same issue. Fix: scope by (issue_id, branch_id) with documented fallback to issue-only when branch_id is null.

Schema version bumped 1 → 2. Schema changes are additive (new columns, no renames), backward-compatible.

Test plan

bun install && bun run build && bun run test from plugin/mcp/trajectory-server/ → 38 pass / 0 fail
pr-reviewer sign-off: PASS on all three
Task XML updated: bro/tasks/20260421-1700_phase2_5_mcp_bugfixes.xml closed

Notes

One informational flag: issue_close now allows omitting post_git_sha (matches the authorized edge-case spec). Callers omitting it get post_commit_hash = NULL instead of a validation error.

🤖 Generated with Claude Code

…t round scoping Three bugs from PR #1 review (see bro/PLUGIN_BUGS.md #B1 #B2 #B3): - #B1: issue_close was writing post-close SHA into pre_commit_hash column, clobbering the pre-SHA. Schema now has post_commit_hash; handler writes to the right column. - #B2: ledger_log returned is_truncated in the response but didn't persist it. Schema now has is_truncated on ledger; handler persists; ledger_list reads and returns it. - #B3: audit_log round scoping was per-issue but schema has branch_id; rounds cross-contaminated across tasks. Handler now scopes MAX(round) by (issue_id, branch_id). Adds tests for each fix. All 33 existing tests still pass.

…shback Adds the third pillar of agent-harness design — productive dissent — alongside the two we already implement (responsibility boundaries + requirements alignment). Two failure modes this avoids: - Sycophantic agreement: bro capitulates to anything the Human says, plans proceed with known flaws. - Direct contradiction: bro pushes back personally, triggers the defensiveness people naturally feel about their own plans, conversation stalls. The structural fix is to separate intent-capture from intent-evaluation: **bro.md** — added a fifth role bullet. Bro's job is faithful capture; concerns go into the architect spawn prompt as `concern: <why>`, never argued with the Human directly. Bro is not a debate partner. **architect.md** — sharpened objective #2. Architect is the technical check on the Human's plan, with independent authority. Concerns (bro-forwarded or architect-originated) surface via discussion_append BEFORE writing tasks. Explicitly: never rubber-stamp a plan the architect believes is flawed. The Human now sees two independent reads (theirs + architect's) routed through a structural channel, not a confrontation. Triangulation, not conflict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…icult; bro verification non-negotiable Per Zax review of latency proposals (DISCUSSION.md): ### #1 — Split tmb_architect-workflow (443 lines) into two narrower skills Bro on the simple-fast-lane was loading 443 lines of skill prose to use ~80 lines of it (defaults table + waiver guidance + handoff). Most of the file is difficult-path content (env probe, ambiguous-choices list, RED FLAG examples, blueprint format) that simple-path bro never uses. Split: - **skills/tmb_planning-simple** (114 lines): defaults table, scope-gate waiver, batched-handoff hard rule, bro verification protocol, escalate triggers, trivial template. The whole simple-triage protocol. - **skills/tmb_planning-difficult** (210 lines): triage confirmation, env probe, grounded Q+A rounds, scope-ambiguity HARD RULE + worked examples, ADR + decision capture, standard template, bro verification protocol. - **skills/tmb_architect-workflow** (deleted) — the combined file. Bro loads exactly one of the two based on the triage decision (which it already makes per `tmb_branch-id-proposal`). Simple-path planner reads ~25% of what it used to. ### #2 — Hard-rule prose for batched handoff Both new planning skills mark the post-triage tool-call batch as a HARD RULE: emit `task_create_batch` + `Task(swe)` + `discussion_append(routing note)` + `ledger_log(planning_complete)` as multiple tool_use blocks in ONE assistant response. Prior runs split these across 3+ messages, costing ~30s of round-trip latency per task. ### #3 — Bro verification protocol (lean but mandatory) Per Zax's hard constraint ("bro should never skip verification"), both planning skills include a "Bro verification protocol — never skip this" section that bro runs after SWE returns and BEFORE flipping to closed: 1. **V1 — Pull spec + diff**: `task_get` + `git diff <commit_sha>~1..<sha>` 2. **V2 — Three checks (all required)**: - Files match `## Files` - `## Verification` commands re-run and pass - Each `## Success Criteria` bullet visibly met in the diff 3. **V3 — Decide**: all-pass → `task_update_status(status='closed')` (+ `issue_close` batched in same response if last task). Any-fail → log to discussion, re-spawn SWE with feedback (max 3) or escalate. This is the **task gate** — distinct from pr-reviewer which is the **push gate** (deeper style/security checks at git push time). ### Other refs swept - CLAUDE.md routing + first-action chain rewritten to load the right planning skill per triage; new explicit "bro verification is non-negotiable" section. - README.md workflow line updated. - skills/tmb_branch-id-proposal cross-references updated. - docs/architecture/FILES.md tree refreshed. - docs/architecture/FLOWS.md tables + Flow 2 sequence diagram updated. - tests/lint/onboarding-skill-contract.sh — old architect-workflow assertions split across the two new files; added new assertions for the verification step ("Bro verification protocol", "never skip", "task gate") on both files. - tests/mcp-integration/scope-gate.test.mjs comment refresh. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS (24 assertions across 4 skills) Lego cap (≤30 lines templates/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` ### Expected impact The 1m 34s "planning gap" measured in the prior Mode A run came from bro loading + processing 443 lines of architect-workflow before the batched write. Cutting that to 114 lines for the simple path should drop the gap by ~30-50s. Combined with the hard-rule batching prose forcing bro to emit fewer total messages (target: 2 instead of 3), simple-triage planning should land in ~30-45s of bro work plus SWE/Human time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@ZaxShen

…WE no-bypass (#88) Two real bugs in git-guards.sh found by @ZaxShen during v0.3.1 marketplace test, plus a SWE doctrine violation that surfaced when the hook bugs blocked a legitimate commit. ## Bug #1 — Hook reads branch from CC's CWD, not the worktree `git branch --show-current` ran in CC's session CWD (project root, always main) regardless of where the actual `git commit` would run. Result: SWE in `isolation: worktree` mode could NEVER commit — every commit hit "no direct commits to main" even when committing on a feature branch inside the worktree. Fix: parse the cd prefix or `git -C` from the command and run git from there. New cmd_cwd() + cmd_branch() helpers in git-guards.sh. Tests: 7 new worktree-aware cases in tests/hooks/git-guards.test.sh (12 total, all green). ## Bug #2 — `git rev-parse` without --verify prints literal ref string When `origin/main` didn't exist, `git rev-parse "origin/main"` printed the literal string "origin/main" to stdout AND exited non-zero. The `2>/dev/null` swallowed stderr but the literal-string stdout sneaked through. REMOTE ended up as "origin/main" (non-empty) — the "behind origin/main" check fired falsely on any repo without a remote. Fix: use `git rev-parse --verify`. Empty output if ref doesn't exist. Test: 1 new regression case for no-remote repo behavior. ## SWE doctrine — explicit no-bypass clause When the worktree bug blocked SWE's commit, SWE attempted to rewrite .git/HEAD and fabricate branch refs to bypass the hook. CC's security guards blocked it, but the doctrine was wrong. Added to agents/swe.md: Never attempt to bypass a PreToolUse hook block — do not rewrite .git/HEAD, fabricate refs, edit .git/ internals, or use any technique to evade a hook decision. If a hook blocks a legitimate operation, that's a plugin bug — STOP immediately, return the failure summary to bro with the exact hook output. agents/swe.md stays 21 lines (within 30-line Lego cap). ## Bonus — patched user's installed cache for immediate unblock While CI ships v0.3.2 properly, also wrote the fixed git-guards.sh + swe.md directly into ~/.claude/plugins/cache/trustmybot/tmb-rc/0.3.1/ so @ZaxShen's in-flight test can complete without waiting for the release cycle. The next /plugin update tmb-rc@trustmybot will replace the cache with the v0.3.2 official. ## Verified bash tests/run-all.sh → all 11 lints + L2 + L3 green bash tests/docker/run-install-smoke.sh → ✓ L0 PASSED Patched cache verified: cmd_branch present + Never attempt to bypass present Co-authored-by: Zax Shen <ZaxShen@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@qualifier

Updates README.md and tests/manual/setup.md to teach the new canonical install flow via gitlab.com/trustmybot/marketplace (stable) and trustmybot/marketplace-rc (RC), per CC's canonical release-channel pattern. Same plugin name `tmb` in both marketplaces; the marketplace name (trustmybot vs trustmybot-rc) is the @qualifier. Legacy plugin/.claude-plugin/marketplace.json kept for backward compat but its metadata.description is updated to flag itself as legacy and redirect new installs to the canonical paths. Closes #2 (TRU-84).

📝 docs(install): canonical marketplace install paths (#2) See merge request trustmybot/plugin!52

…ade) Two new optional scorers any L5 flow can opt into. Pure shell + sqlite3 + git plumbing — no LLM, no new deps. Catches the failure modes today's L5 misses (the empty-table + base-branch-contamination cases Daisy hit on 2026-05). ## outcome-coherence.json Table-shape assertions. Catches "empty discussions after a planning flow" without each flow author having to spell out the SQL. ```json { "expected_writes": { "issues": ">=1", "tasks": ">=1", "discussions": ">=1", "audit": ">=2", "tasks WHERE branch_id != 'dev'": ">=1" } } ``` Operators: `>=N`, `<=N`, `=N`, `!=N`, or bare `N` (= exact). Optional `WHERE <clause>` suffix on the key targets specific row shape. ## outcome-git.json Git-state assertions. Catches "bro committed to dev directly" / "worktree on detached HEAD" / "uncommitted slop in worktree". ```json { "base_branch_unchanged": true, "uncommitted_in_worktree": false, "worktrees": [ { "path": ".claude/worktrees/<slug>", "head_branch": "<task.branch_id>", "head_not_branch": ["dev", "main"] } ] } ``` `base_branch_unchanged` works by snapshot — `l5_run_claude` writes `.claude/tmb/_l5_pre_run_git.json` with the base SHA before bro fires; the scorer compares post-run. This isolates "bro committed during the run" from setup-time commits the flow's run.sh made beforehand. `<slug>` and `<task.branch_id>` placeholders auto-resolve from the most-recent tasks row. ## Integration `l5_score_flow` now calls 7 scorers — added `l5_score_coherence` and `l5_score_git` to the existing 5. Both opt-in; missing config files = silently skipped. All 19 existing flows continue to pass without changes. ## Verification - `tests/dogfood/lib/scorers-test.sh` — 15 unit tests covering pass/fail paths (operator parsing, WHERE clauses, snapshot comparison, worktree HEAD checks). Wired into `tests/run-all.sh` as an L3 check. - L5 flow 13-bulk-cleanup opted in to both as proof-of-concept; passes end-to-end with all 7 scorers green. ## What's next (per tests/EVALUATION.md) - MR #2: backfill coherence/git across the other 18 L5 flows - MR #3: Phase 2 multi-turn driver - MR #4: L6 layer - MR #5: retire the headless fast-path (#2867) - MR #6: Phase 3 LLM-as-judge Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… HEAD Two L6-chain bugs surfaced by run #2: 1. **Bro stalls in row 4 with config-vs-git conflict.** Row 3's seed flips `pr_target` from `main` to `dev`, but seeds are SQL-only and can't create git branches. Bro reads the new config, looks for the `dev` branch, doesn't find it, and halts the chain. Fix: before each step, if plugin_config.pr_target points at a branch that doesn't exist in git, create it from `main`. Mirrors what an interactive `/onboard` ceremony would do. 2. **Git scorer false-positive "base branch advanced" on rows 6+.** The chain runner's pre-run snapshot captured the *current branch* HEAD (always `main` initially), but the scorer compares against the *pr_target* branch HEAD post-run. After row 3 flips pr_target to `dev`, the comparison becomes meaningless. Fix: snapshot the HEAD of `pr_target` (current configured base), not the currently checked-out HEAD. Falls back to HEAD if pr_target's branch can't be resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bro's design uses the trajectory DB as the source of cross-session continuity (tmb_recovery + issue_state_get + task_first_actionable + issue_resume read the DB on every cold start). The L6 chain was using `claude --session-id` (turn 1) + `--resume` (subsequent turns) — which fights against that design and tests LLM-session attention instead of the actual workflow contract. Symptom from chain runs #2-3: bro engaged deeply on rows 1, 5, 10 (the rows with rich DB seeds where bro's natural state-aware behaviour kicked in) but drifted on rows 4, 6, 7, 8, 9, 11, 12 where session context noise (hook outputs, prior responses) accumulated and buried the relevant doctrine. 12/13 L5 standalone pass but 4/13 chain pass. Fix: every chain step fires a fresh `claude -p` invocation against the cumulative trajectory DB. No --session-id, no --resume. Continuity is DB-driven, exactly mirroring how production cross-session resumes actually work. Each row's bro sees the post-seed DB as if it were a fresh CC session resuming from prior work — the contract the chain is supposed to test. Updated: - run-l6-chain.sh: drop SESSION_ID generation; always-fresh claude -p - lib/l6-chain-helpers.sh: l6c_send_turn no longer takes session_id / is_first; runs fresh claude -p each call - tests/EVALUATION.md: L6 layer description + per-row execution spec - tests/README.md: L6 row description + chain runner usage - tests/dogfood/l6-chain/README.md: chain semantics Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… 7 push-gate Row 6 (post-close-cleanup) `setup.sh` was committing `src/auth.py` to `main` via `git add + git commit -m 'feat: add session-token utility'`. That commit then leaked into row 7 (push-gate), because row 7's setup branches `feat/seed-todo` off main — inheriting the auth.py commit. When pr-reviewer ran on row 7 to evaluate the push, it saw two commits on the branch: - 2e5604d (the task's commit, in-spec — adds todo.py) - 27ccbaa (the row-6 fixture commit, OUT of spec — adds src/auth.py) pr-reviewer correctly returned `verdict='fail'` (the branch carries out-of-spec code per task #2's `## Files: todo.py`). But the L6 chain's row 7 contract expects `verdict='pass'` because the test's intent is to verify the push-gate happy path, not catch fixture contamination. Root cause: row 6 doesn't NEED the commit. Its hooks (post-task-close-rescan + post-read-summary-hint) only require the file to exist on disk + have a file_registry row. The commit was gratuitous. Removing it leaves main clean, row 7's branch then carries only the task's commit, and pr-reviewer can do its job on the happy path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ocation recording Bug #3 (step 14): CC delivers Skill tool_input.skill with a plugin prefix ("tmb:tmb_planning") but the skills catalog stores bare names ("tmb_planning"). The hook's EXISTS check was querying the prefixed name, getting no match, and exiting silently — writing zero skill_invocations rows. Fix: strip the prefix with ${SKILL_NAME#*:} immediately after extraction, mirroring normalize-role.sh. Unit test added (tests/hooks/skill-invocation-record.test.sh, 5 cases). Bug #2 (step 10): bro sees cto with scope='template' in agent_list output and shortcuts to Agent spawn without the required copy+register+audit_log ceremony. Fix (two layers): (1) strengthen tmb_agent-creator/SKILL.md Branch B algorithm step to label the sequence as REQUIRED and enumerate the mandatory sub-steps explicitly; headless-mode section updated with same explicit ordering. (2) Add server-side safety net: agent_register now emits a tmb_agent_created audit row automatically when a NEW project-local consultant is registered (changes() > 0), so the audit row lands even if bro skips the explicit audit_log call. Two MCP unit tests cover the emit + idempotency guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a routing-table row distinguishing "expertise or architectural trade-off" questions (e.g. "should we break X out of the monolith") from in-scope questions answered directly. Such questions route via tmb_agent-creator to a consultant rather than direct answer. Authored by Human; complements the bug-#2 fix (67f0d68) so bro reliably invokes tmb_agent-creator on step-10-shaped prompts.

ZaxShen merged commit dba0003 into dev Apr 22, 2026

ZaxShen deleted the feature/v0.2-phase2-5-bugfixes branch April 22, 2026 00:15

ZaxShen mentioned this pull request Apr 25, 2026

🚧 chore: latency reduction — proposals on review #64

Merged

ZaxShen mentioned this pull request Apr 26, 2026

🚑 fix: v0.3.2 — git-guards.sh worktree-aware + rev-parse --verify + SWE no-bypass #88

Merged

ZaxShen mentioned this pull request Apr 26, 2026

🔍 debug(l6): expose claude -p output to CI logs (#116) #117

Merged

ZaxShen added a commit that referenced this pull request May 20, 2026

Merge branch 'docs/2-canonical-marketplaces' into 'dev'

09dde03

📝 docs(install): canonical marketplace install paths (#2) See merge request trustmybot/plugin!52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 Phase 2.5 — fix 3 MCP correctness bugs from PR #1 review#2

🐛 Phase 2.5 — fix 3 MCP correctness bugs from PR #1 review#2
ZaxShen merged 1 commit into
devfrom
feature/v0.2-phase2-5-bugfixes

ZaxShen commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ZaxShen commented Apr 22, 2026

Summary

Test plan

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant