π Phase 2.5 β fix 3 MCP correctness bugs from PR #1 review#2
Merged
Conversation
β¦t round scoping Three bugs from PR #1 review (see bro/PLUGIN_BUGS.md #B1 #B2 #B3): - #B1: issue_close was writing post-close SHA into pre_commit_hash column, clobbering the pre-SHA. Schema now has post_commit_hash; handler writes to the right column. - #B2: ledger_log returned is_truncated in the response but didn't persist it. Schema now has is_truncated on ledger; handler persists; ledger_list reads and returns it. - #B3: audit_log round scoping was per-issue but schema has branch_id; rounds cross-contaminated across tasks. Handler now scopes MAX(round) by (issue_id, branch_id). Adds tests for each fix. All 33 existing tests still pass.
ZaxShen
added a commit
that referenced
this pull request
Apr 24, 2026
β¦shback Adds the third pillar of agent-harness design β productive dissent β alongside the two we already implement (responsibility boundaries + requirements alignment). Two failure modes this avoids: - Sycophantic agreement: bro capitulates to anything the Human says, plans proceed with known flaws. - Direct contradiction: bro pushes back personally, triggers the defensiveness people naturally feel about their own plans, conversation stalls. The structural fix is to separate intent-capture from intent-evaluation: **bro.md** β added a fifth role bullet. Bro's job is faithful capture; concerns go into the architect spawn prompt as `concern: <why>`, never argued with the Human directly. Bro is not a debate partner. **architect.md** β sharpened objective #2. Architect is the technical check on the Human's plan, with independent authority. Concerns (bro-forwarded or architect-originated) surface via discussion_append BEFORE writing tasks. Explicitly: never rubber-stamp a plan the architect believes is flawed. The Human now sees two independent reads (theirs + architect's) routed through a structural channel, not a confrontation. Triangulation, not conflict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
Apr 25, 2026
β¦icult; bro verification non-negotiable Per Zax review of latency proposals (DISCUSSION.md): ### #1 β Split tmb_architect-workflow (443 lines) into two narrower skills Bro on the simple-fast-lane was loading 443 lines of skill prose to use ~80 lines of it (defaults table + waiver guidance + handoff). Most of the file is difficult-path content (env probe, ambiguous-choices list, RED FLAG examples, blueprint format) that simple-path bro never uses. Split: - **skills/tmb_planning-simple** (114 lines): defaults table, scope-gate waiver, batched-handoff hard rule, bro verification protocol, escalate triggers, trivial template. The whole simple-triage protocol. - **skills/tmb_planning-difficult** (210 lines): triage confirmation, env probe, grounded Q+A rounds, scope-ambiguity HARD RULE + worked examples, ADR + decision capture, standard template, bro verification protocol. - **skills/tmb_architect-workflow** (deleted) β the combined file. Bro loads exactly one of the two based on the triage decision (which it already makes per `tmb_branch-id-proposal`). Simple-path planner reads ~25% of what it used to. ### #2 β Hard-rule prose for batched handoff Both new planning skills mark the post-triage tool-call batch as a HARD RULE: emit `task_create_batch` + `Task(swe)` + `discussion_append(routing note)` + `ledger_log(planning_complete)` as multiple tool_use blocks in ONE assistant response. Prior runs split these across 3+ messages, costing ~30s of round-trip latency per task. ### #3 β Bro verification protocol (lean but mandatory) Per Zax's hard constraint ("bro should never skip verification"), both planning skills include a "Bro verification protocol β never skip this" section that bro runs after SWE returns and BEFORE flipping to closed: 1. **V1 β Pull spec + diff**: `task_get` + `git diff <commit_sha>~1..<sha>` 2. **V2 β Three checks (all required)**: - Files match `## Files` - `## Verification` commands re-run and pass - Each `## Success Criteria` bullet visibly met in the diff 3. **V3 β Decide**: all-pass β `task_update_status(status='closed')` (+ `issue_close` batched in same response if last task). Any-fail β log to discussion, re-spawn SWE with feedback (max 3) or escalate. This is the **task gate** β distinct from pr-reviewer which is the **push gate** (deeper style/security checks at git push time). ### Other refs swept - CLAUDE.md routing + first-action chain rewritten to load the right planning skill per triage; new explicit "bro verification is non-negotiable" section. - README.md workflow line updated. - skills/tmb_branch-id-proposal cross-references updated. - docs/architecture/FILES.md tree refreshed. - docs/architecture/FLOWS.md tables + Flow 2 sequence diagram updated. - tests/lint/onboarding-skill-contract.sh β old architect-workflow assertions split across the two new files; added new assertions for the verification step ("Bro verification protocol", "never skip", "task gate") on both files. - tests/mcp-integration/scope-gate.test.mjs comment refresh. ### Layer 1 + 2 status ``` Onboarding contract lint: PASS (24 assertions across 4 skills) Lego cap (β€30 lines templates/): PASS MCP server unit + integration: PASS Hook tests: PASS ``` ### Expected impact The 1m 34s "planning gap" measured in the prior Mode A run came from bro loading + processing 443 lines of architect-workflow before the batched write. Cutting that to 114 lines for the simple path should drop the gap by ~30-50s. Combined with the hard-rule batching prose forcing bro to emit fewer total messages (target: 2 instead of 3), simple-triage planning should land in ~30-45s of bro work plus SWE/Human time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
Apr 26, 2026
β¦WE no-bypass (#88) Two real bugs in git-guards.sh found by @ZaxShen during v0.3.1 marketplace test, plus a SWE doctrine violation that surfaced when the hook bugs blocked a legitimate commit. ## Bug #1 β Hook reads branch from CC's CWD, not the worktree `git branch --show-current` ran in CC's session CWD (project root, always main) regardless of where the actual `git commit` would run. Result: SWE in `isolation: worktree` mode could NEVER commit β every commit hit "no direct commits to main" even when committing on a feature branch inside the worktree. Fix: parse the cd prefix or `git -C` from the command and run git from there. New cmd_cwd() + cmd_branch() helpers in git-guards.sh. Tests: 7 new worktree-aware cases in tests/hooks/git-guards.test.sh (12 total, all green). ## Bug #2 β `git rev-parse` without --verify prints literal ref string When `origin/main` didn't exist, `git rev-parse "origin/main"` printed the literal string "origin/main" to stdout AND exited non-zero. The `2>/dev/null` swallowed stderr but the literal-string stdout sneaked through. REMOTE ended up as "origin/main" (non-empty) β the "behind origin/main" check fired falsely on any repo without a remote. Fix: use `git rev-parse --verify`. Empty output if ref doesn't exist. Test: 1 new regression case for no-remote repo behavior. ## SWE doctrine β explicit no-bypass clause When the worktree bug blocked SWE's commit, SWE attempted to rewrite .git/HEAD and fabricate branch refs to bypass the hook. CC's security guards blocked it, but the doctrine was wrong. Added to agents/swe.md: Never attempt to bypass a PreToolUse hook block β do not rewrite .git/HEAD, fabricate refs, edit .git/ internals, or use any technique to evade a hook decision. If a hook blocks a legitimate operation, that's a plugin bug β STOP immediately, return the failure summary to bro with the exact hook output. agents/swe.md stays 21 lines (within 30-line Lego cap). ## Bonus β patched user's installed cache for immediate unblock While CI ships v0.3.2 properly, also wrote the fixed git-guards.sh + swe.md directly into ~/.claude/plugins/cache/trustmybot/tmb-rc/0.3.1/ so @ZaxShen's in-flight test can complete without waiting for the release cycle. The next /plugin update tmb-rc@trustmybot will replace the cache with the v0.3.2 official. ## Verified bash tests/run-all.sh β all 11 lints + L2 + L3 green bash tests/docker/run-install-smoke.sh β β L0 PASSED Patched cache verified: cmd_branch present + Never attempt to bypass present Co-authored-by: Zax Shen <ZaxShen@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
Updates README.md and tests/manual/setup.md to teach the new canonical install flow via gitlab.com/trustmybot/marketplace (stable) and trustmybot/marketplace-rc (RC), per CC's canonical release-channel pattern. Same plugin name `tmb` in both marketplaces; the marketplace name (trustmybot vs trustmybot-rc) is the @qualifier. Legacy plugin/.claude-plugin/marketplace.json kept for backward compat but its metadata.description is updated to flag itself as legacy and redirect new installs to the canonical paths. Closes #2 (TRU-84).
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
π docs(install): canonical marketplace install paths (#2) See merge request trustmybot/plugin!52
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
β¦ade)
Two new optional scorers any L5 flow can opt into. Pure shell + sqlite3 +
git plumbing β no LLM, no new deps. Catches the failure modes today's L5
misses (the empty-table + base-branch-contamination cases Daisy hit on
2026-05).
## outcome-coherence.json
Table-shape assertions. Catches "empty discussions after a planning
flow" without each flow author having to spell out the SQL.
```json
{
"expected_writes": {
"issues": ">=1",
"tasks": ">=1",
"discussions": ">=1",
"audit": ">=2",
"tasks WHERE branch_id != 'dev'": ">=1"
}
}
```
Operators: `>=N`, `<=N`, `=N`, `!=N`, or bare `N` (= exact). Optional
`WHERE <clause>` suffix on the key targets specific row shape.
## outcome-git.json
Git-state assertions. Catches "bro committed to dev directly" /
"worktree on detached HEAD" / "uncommitted slop in worktree".
```json
{
"base_branch_unchanged": true,
"uncommitted_in_worktree": false,
"worktrees": [
{ "path": ".claude/worktrees/<slug>",
"head_branch": "<task.branch_id>",
"head_not_branch": ["dev", "main"] }
]
}
```
`base_branch_unchanged` works by snapshot β `l5_run_claude` writes
`.claude/tmb/_l5_pre_run_git.json` with the base SHA before bro fires;
the scorer compares post-run. This isolates "bro committed during the
run" from setup-time commits the flow's run.sh made beforehand.
`<slug>` and `<task.branch_id>` placeholders auto-resolve from the
most-recent tasks row.
## Integration
`l5_score_flow` now calls 7 scorers β added `l5_score_coherence` and
`l5_score_git` to the existing 5. Both opt-in; missing config files =
silently skipped. All 19 existing flows continue to pass without
changes.
## Verification
- `tests/dogfood/lib/scorers-test.sh` β 15 unit tests covering pass/fail
paths (operator parsing, WHERE clauses, snapshot comparison, worktree
HEAD checks). Wired into `tests/run-all.sh` as an L3 check.
- L5 flow 13-bulk-cleanup opted in to both as proof-of-concept; passes
end-to-end with all 7 scorers green.
## What's next (per tests/EVALUATION.md)
- MR #2: backfill coherence/git across the other 18 L5 flows
- MR #3: Phase 2 multi-turn driver
- MR #4: L6 layer
- MR #5: retire the headless fast-path (#2867)
- MR #6: Phase 3 LLM-as-judge
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
β¦ HEAD Two L6-chain bugs surfaced by run #2: 1. **Bro stalls in row 4 with config-vs-git conflict.** Row 3's seed flips `pr_target` from `main` to `dev`, but seeds are SQL-only and can't create git branches. Bro reads the new config, looks for the `dev` branch, doesn't find it, and halts the chain. Fix: before each step, if plugin_config.pr_target points at a branch that doesn't exist in git, create it from `main`. Mirrors what an interactive `/onboard` ceremony would do. 2. **Git scorer false-positive "base branch advanced" on rows 6+.** The chain runner's pre-run snapshot captured the *current branch* HEAD (always `main` initially), but the scorer compares against the *pr_target* branch HEAD post-run. After row 3 flips pr_target to `dev`, the comparison becomes meaningless. Fix: snapshot the HEAD of `pr_target` (current configured base), not the currently checked-out HEAD. Falls back to HEAD if pr_target's branch can't be resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
Bro's design uses the trajectory DB as the source of cross-session continuity (tmb_recovery + issue_state_get + task_first_actionable + issue_resume read the DB on every cold start). The L6 chain was using `claude --session-id` (turn 1) + `--resume` (subsequent turns) β which fights against that design and tests LLM-session attention instead of the actual workflow contract. Symptom from chain runs #2-3: bro engaged deeply on rows 1, 5, 10 (the rows with rich DB seeds where bro's natural state-aware behaviour kicked in) but drifted on rows 4, 6, 7, 8, 9, 11, 12 where session context noise (hook outputs, prior responses) accumulated and buried the relevant doctrine. 12/13 L5 standalone pass but 4/13 chain pass. Fix: every chain step fires a fresh `claude -p` invocation against the cumulative trajectory DB. No --session-id, no --resume. Continuity is DB-driven, exactly mirroring how production cross-session resumes actually work. Each row's bro sees the post-seed DB as if it were a fresh CC session resuming from prior work β the contract the chain is supposed to test. Updated: - run-l6-chain.sh: drop SESSION_ID generation; always-fresh claude -p - lib/l6-chain-helpers.sh: l6c_send_turn no longer takes session_id / is_first; runs fresh claude -p each call - tests/EVALUATION.md: L6 layer description + per-row execution spec - tests/README.md: L6 row description + chain runner usage - tests/dogfood/l6-chain/README.md: chain semantics Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
β¦ 7 push-gate Row 6 (post-close-cleanup) `setup.sh` was committing `src/auth.py` to `main` via `git add + git commit -m 'feat: add session-token utility'`. That commit then leaked into row 7 (push-gate), because row 7's setup branches `feat/seed-todo` off main β inheriting the auth.py commit. When pr-reviewer ran on row 7 to evaluate the push, it saw two commits on the branch: - 2e5604d (the task's commit, in-spec β adds todo.py) - 27ccbaa (the row-6 fixture commit, OUT of spec β adds src/auth.py) pr-reviewer correctly returned `verdict='fail'` (the branch carries out-of-spec code per task #2's `## Files: todo.py`). But the L6 chain's row 7 contract expects `verdict='pass'` because the test's intent is to verify the push-gate happy path, not catch fixture contamination. Root cause: row 6 doesn't NEED the commit. Its hooks (post-task-close-rescan + post-read-summary-hint) only require the file to exist on disk + have a file_registry row. The commit was gratuitous. Removing it leaves main clean, row 7's branch then carries only the task's commit, and pr-reviewer can do its job on the happy path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
β¦ocation recording Bug #3 (step 14): CC delivers Skill tool_input.skill with a plugin prefix ("tmb:tmb_planning") but the skills catalog stores bare names ("tmb_planning"). The hook's EXISTS check was querying the prefixed name, getting no match, and exiting silently β writing zero skill_invocations rows. Fix: strip the prefix with ${SKILL_NAME#*:} immediately after extraction, mirroring normalize-role.sh. Unit test added (tests/hooks/skill-invocation-record.test.sh, 5 cases). Bug #2 (step 10): bro sees cto with scope='template' in agent_list output and shortcuts to Agent spawn without the required copy+register+audit_log ceremony. Fix (two layers): (1) strengthen tmb_agent-creator/SKILL.md Branch B algorithm step to label the sequence as REQUIRED and enumerate the mandatory sub-steps explicitly; headless-mode section updated with same explicit ordering. (2) Add server-side safety net: agent_register now emits a tmb_agent_created audit row automatically when a NEW project-local consultant is registered (changes() > 0), so the audit row lands even if bro skips the explicit audit_log call. Two MCP unit tests cover the emit + idempotency guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
Adds a routing-table row distinguishing "expertise or architectural trade-off" questions (e.g. "should we break X out of the monolith") from in-scope questions answered directly. Such questions route via tmb_agent-creator to a consultant rather than direct answer. Authored by Human; complements the bug-#2 fix (67f0d68) so bro reliably invokes tmb_agent-creator on step-10-shaped prompts.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the 3 correctness bugs flagged by pr-reviewer on PR #1. All three were data-integrity issues that would silently corrupt trajectory state once agents start calling the MCP tools.
issue_closewas writingpost_git_shainto thepre_commit_hashcolumn (nopost_commit_hashcolumn existed). Overwrote the pre-SHA captured at create. Fix: addpost_commit_hash TEXTcolumn; write to it; preservepre_commit_hash.ledger_logreturnedis_truncatedin the response but didn't persist it (no column existed). Laterledger_listcallers couldn't tell a row was truncated. Fix: addis_truncated INTEGER NOT NULL DEFAULT 0toledger; persist on INSERT; surface viaSELECT *.audit_logscopedMAX(round)byissue_idonly, letting rounds cross-contaminate across tasks in the same issue. Fix: scope by(issue_id, branch_id)with documented fallback to issue-only whenbranch_idis null.Schema version bumped 1 β 2. Schema changes are additive (new columns, no renames), backward-compatible.
Test plan
bun install && bun run build && bun run testfromplugin/mcp/trajectory-server/β 38 pass / 0 failbro/tasks/20260421-1700_phase2_5_mcp_bugfixes.xmlclosedNotes
One informational flag:
issue_closenow allows omittingpost_git_sha(matches the authorized edge-case spec). Callers omitting it getpost_commit_hash = NULLinstead of a validation error.π€ Generated with Claude Code