✨ v0.2.0 Phase 2 — SQLite MCP trajectory server #1

Merged: ZaxShen merged 7 commits into dev from feature/v0.2-redesign on Apr 21, 2026

@ZaxShen ZaxShen commented Apr 21, 2026

Summary

Phase 2 of the v0.2.0 redesign: the bundled Node + SQLite MCP trajectory server that gives the plugin persistent, resumable, role-isolated state. Closes 6 task XMLs (1400, 1405, 1410, 1415, 1420, 1425), all of them reviewed.

7 commits on top of origin/main:

  • 0a1137d — native plugin manifest (.claude-plugin/plugin.json, marketplace.json, .mcp.json) + directory skeleton
  • fa98d06 — Phase 0 GAN_CV scrub (skills + rules retargeted to TMB env)
  • e3376f7 — MCP package scaffold + 9-table SQLite schema
  • edff8a0 — TrajectoryDB abstraction + migration runner + 5 unit tests
  • ead2f92 — 11 MCP tools (issue/task/ledger) + 13 unit tests
  • b092bd8 — 7 MCP tools (audit/validation/skill/report) + 8 unit tests
  • 2cce419 — agent-scope middleware for role-based redaction + 7 unit tests

Verification: bun run test from mcp/trajectory-server/ → 33 pass / 0 fail. Build passes. Integration probe confirms SWE cannot read goals_md while architect can.

Known follow-ups tracked in bro/PLUGIN_BUGS.md:

  • 3 MCP correctness bugs (issue_close column overload, ledger is_truncated not persisted, audit round scoping) — Phase 2.5 fix before agents consume the API.
  • 5 workflow bugs surfaced via dogfooding (SWE double-state, PR Reviewer read-only, require-review-sign too aggressive, SWE not closing tasks, possible subagent Bash hook bypass) — several are already scoped into Phase 3 agent/skill rewrites.

Test plan

  • 33 unit tests pass from mcp/trajectory-server/
  • TypeScript build clean (bun run build)
  • Role-redaction integration probe: SWE gets objective truncated to 120 chars, no goals_md; architect gets full record
  • End-to-end dogfood install (/plugin marketplace add --local ./plugin) — scheduled for Phase 5
  • MCP bug fixes (PLUGIN_BUGS.md #B1-#B3) — Phase 2.5

🤖 Generated with Claude Code

ZaxShen and others added 7 commits April 21, 2026 15:24
0a1137d: native plugin manifest + directory skeleton
Introduces .claude-plugin/plugin.json + marketplace.json, registers
trajectory-server via .mcp.json, and creates empty agents/skills/
hooks/teams/monitors/mcp directories for Phases 2-5 to populate.

fa98d06: Phase 0 GAN_CV scrub
Scrubs GAN_CV-specific references (gan_cv DBs, web/api paths, bun/drizzle
commands, React patterns) and generalizes stack-specific checks. python-dev
stays Python-focused; sql-dev is rewritten for SQLite (TMB's stack);
code-quality loses stack-specific verification commands in favor of generic
checklist items. naming-conventions drops React-specific rows;
review-findings drops React-pattern example.

Content edits from Phase 0 scrub that were pending on working tree after
the Phase 1 scaffold commit landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

e3376f7: MCP package scaffold + 9-table SQLite schema
Adds package.json, tsconfig, stdio bootstrap, and the 9-table schema.
Tool handlers and middleware are stubbed as an empty barrel and will
be filled by subsequent Phase 2 tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

edff8a0: TrajectoryDB abstraction + migration runner
Wraps better-sqlite3 with pragma setup, a schema.sql migration,
transaction helpers, and ISO-8601 / ID utilities. Adds unit tests
covering schema init, CRUD helpers, and transaction rollback.

ead2f92: 11 MCP tools (issue/task/ledger)
Implements 11 tools (issue_create, issue_get, issue_resume, issue_close,
issue_get_phase, task_create_batch, task_get, task_update_status,
task_first_actionable, ledger_log, ledger_list). Includes 13 new unit
tests (18 in the suite so far, all passing) covering happy path + error
cases per the task XML's error-handling and edge-cases sections.

Adds src/types.ts for shared TypeScript interfaces and wires the new
tool registrars into src/tools/index.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

b092bd8: 7 MCP tools (audit/validation/skill/report)
Completes the Phase 2 tool surface (18 tools advertised via tools/list).
Reports render full issue narratives as markdown.

2cce419: agent-scope middleware for role-based redaction
Wraps every tool handler in a normalization + redaction layer. SWE
cannot see issues.goals_md or other tasks' validation feedback,
regardless of caller-supplied flags. Architect and PR Reviewer see
full content. Defence-in-depth complement to agent frontmatter.
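
A minimal sketch of the redaction rule described above, assuming a simplified issue shape. Only `goals_md`, `objective`, and the 120-character limit come from this PR; the `Role` type, `IssueRecord` interface, and `redactIssue` function are hypothetical names for illustration, not the middleware's actual API.

```typescript
// Hypothetical types: the real server's record shapes are not shown in this PR.
type Role = "swe" | "architect" | "pr-reviewer";

interface IssueRecord {
  id: number;
  objective: string;
  goals_md?: string;
}

// Limit taken from the integration probe in the test plan.
const SWE_OBJECTIVE_LIMIT = 120;

// The role is decided server-side; the function takes no caller-supplied
// flags, so a caller cannot opt out of redaction.
function redactIssue(role: Role, issue: IssueRecord): IssueRecord {
  if (role !== "swe") {
    // Architect and PR Reviewer see the full record.
    return { ...issue };
  }
  // SWE: drop goals_md entirely and truncate the objective.
  return {
    id: issue.id,
    objective: issue.objective.slice(0, SWE_OBJECTIVE_LIMIT),
  };
}
```

Building the SWE view field by field, rather than deleting keys from a copy, means a column added later stays hidden from SWE until someone deliberately exposes it.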
@ZaxShen ZaxShen merged commit fdeac06 into dev Apr 21, 2026
@ZaxShen ZaxShen deleted the feature/v0.2-redesign branch April 21, 2026 23:54
ZaxShen added a commit that referenced this pull request Apr 22, 2026
🐛 Phase 2.5 — fix 3 MCP correctness bugs from PR #1 review
ZaxShen added a commit that referenced this pull request Apr 24, 2026
User concern: with bro as main Claude, users can still type @tmb:architect
or @tmb:pr-reviewer and bypass bro's triage / onboarding / branch-id flow.
swe already had a MANDATORY FIRST ACTION that rejects spawns without
task_id=<N>. Extending the same discipline to architect and pr-reviewer
so no subagent is reachable directly by the Human.

Rejection heuristic per agent — detect bro-originated spawn via routing
markers, reject otherwise:

- architect: requires at least one of `triage: simple|difficult`,
  `issue_id=`, `branch_id=`, or `concern:` in the spawn prompt. bro
  always includes at least `triage:` and usually a `branch_id` from the
  branch-id-proposal skill. Direct @-mention by Human has none.
- pr-reviewer: requires at least one of `task_id=<N>`, `issue_id=<N>`,
  or a bro-routed review-request marker. architect always spawns
  pr-reviewer with task_id after SWE completes.
- swe: already had this via `task_id=<N>` scan. Unchanged.

On rejection, each agent outputs a structured REJECTED line explaining
that they're subagents, not Human entry points, and pointing the Human
back to bro (just type the request — no @-mention needed).
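
The heuristic lives in agent prompt prose, not code, but it can be sketched as a marker scan. This is a sketch under stated assumptions: the regular expressions transcribe the markers listed above, the `checkSpawn` name and the wording of the REJECTED line are hypothetical, and the "bro-routed review-request marker" is left out because its format is not given here.

```typescript
type Agent = "architect" | "pr-reviewer" | "swe";

// Marker patterns transcribed from the list above. No `g` flag, so
// RegExp.test stays stateless across calls.
const ROUTING_MARKERS: Record<Agent, RegExp[]> = {
  architect: [
    /\btriage:\s*(simple|difficult)\b/,
    /\bissue_id=/,
    /\bbranch_id=/,
    /\bconcern:/,
  ],
  "pr-reviewer": [/\btask_id=\d+/, /\bissue_id=\d+/],
  swe: [/\btask_id=\d+/],
};

// Returns "ACCEPTED" when at least one routing marker is present,
// otherwise a REJECTED line (hypothetical wording).
function checkSpawn(agent: Agent, prompt: string): string {
  const routed = ROUTING_MARKERS[agent].some((re) => re.test(prompt));
  if (routed) return "ACCEPTED";
  return `REJECTED: ${agent} is a subagent, not a Human entry point. Type the request to bro without an @-mention.`;
}
```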

Tightened the pr-reviewer overview block to recover the line budget
(200-line cap per agent) after adding the rejection section.

Tests: 235 Layer 1 + 23 Layer 2 + 16 hook + 3 agent-budget + 30 contract
— all green.

Addresses your concern #1: users can't accidentally or deliberately
bypass bro by @-mentioning subagents. All three now auto-reject direct
invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request Apr 25, 2026
…icult; bro verification non-negotiable

Per Zax review of latency proposals (DISCUSSION.md):

### #1 — Split tmb_architect-workflow (443 lines) into two narrower skills

Bro on the simple-fast-lane was loading 443 lines of skill prose to use ~80
lines of it (defaults table + waiver guidance + handoff). Most of the file
is difficult-path content (env probe, ambiguous-choices list, RED FLAG
examples, blueprint format) that simple-path bro never uses. Split:

- **skills/tmb_planning-simple** (114 lines): defaults table, scope-gate
  waiver, batched-handoff hard rule, bro verification protocol, escalate
  triggers, trivial template. The whole simple-triage protocol.
- **skills/tmb_planning-difficult** (210 lines): triage confirmation, env
  probe, grounded Q+A rounds, scope-ambiguity HARD RULE + worked examples,
  ADR + decision capture, standard template, bro verification protocol.
- **skills/tmb_architect-workflow** (deleted) — the combined file.

Bro loads exactly one of the two based on the triage decision (which it
already makes per `tmb_branch-id-proposal`). Simple-path planner reads
~25% of what it used to.

### #2 — Hard-rule prose for batched handoff

Both new planning skills mark the post-triage tool-call batch as a HARD
RULE: emit `task_create_batch` + `Task(swe)` + `discussion_append(routing
note)` + `ledger_log(planning_complete)` as multiple tool_use blocks in
ONE assistant response. Prior runs split these across 3+ messages,
costing ~30s of round-trip latency per task.

### #3 — Bro verification protocol (lean but mandatory)

Per Zax's hard constraint ("bro should never skip verification"), both
planning skills include a "Bro verification protocol — never skip this"
section that bro runs after SWE returns and BEFORE flipping to closed:

1. **V1 — Pull spec + diff**: `task_get` + `git diff <commit_sha>~1..<sha>`
2. **V2 — Three checks (all required)**:
   - Files match `## Files`
   - `## Verification` commands re-run and pass
   - Each `## Success Criteria` bullet visibly met in the diff
3. **V3 — Decide**: all-pass → `task_update_status(status='closed')` (+ `issue_close` batched in same response if last task). Any-fail → log to discussion, re-spawn SWE with feedback (max 3) or escalate.

This is the **task gate** — distinct from pr-reviewer which is the **push
gate** (deeper style/security checks at git push time).
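
The V3 decision can be sketched as a small pure function. The protocol itself is skill prose that bro follows, so this is only an illustration: the `Checks` and `Decision` types and the `decide` function are hypothetical, and reading "max 3" as at most three re-spawns per task is an assumption.

```typescript
// One flag per V2 check; all three are required.
interface Checks {
  filesMatch: boolean;          // files match `## Files`
  verificationPasses: boolean;  // `## Verification` commands re-run and pass
  criteriaMet: boolean;         // each `## Success Criteria` bullet met in the diff
}

type Decision =
  | { kind: "close"; toolCalls: string[] }
  | { kind: "respawn"; attempt: number }
  | { kind: "escalate" };

// Assumption: "max 3" means at most three re-spawns of SWE per task.
const MAX_RESPAWNS = 3;

function decide(checks: Checks, respawnsSoFar: number, isLastTask: boolean): Decision {
  const allPass = checks.filesMatch && checks.verificationPasses && checks.criteriaMet;
  if (allPass) {
    // Close the task; batch issue_close in the same response if it was the last task.
    const toolCalls = ["task_update_status(status='closed')"];
    if (isLastTask) toolCalls.push("issue_close");
    return { kind: "close", toolCalls };
  }
  // Any failure: (log to discussion, then) re-spawn SWE with feedback, or escalate.
  if (respawnsSoFar < MAX_RESPAWNS) {
    return { kind: "respawn", attempt: respawnsSoFar + 1 };
  }
  return { kind: "escalate" };
}
```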

### Other refs swept

- CLAUDE.md routing + first-action chain rewritten to load the right
  planning skill per triage; new explicit "bro verification is
  non-negotiable" section.
- README.md workflow line updated.
- skills/tmb_branch-id-proposal cross-references updated.
- docs/architecture/FILES.md tree refreshed.
- docs/architecture/FLOWS.md tables + Flow 2 sequence diagram updated.
- tests/lint/onboarding-skill-contract.sh — old architect-workflow
  assertions split across the two new files; added new assertions for
  the verification step ("Bro verification protocol", "never skip",
  "task gate") on both files.
- tests/mcp-integration/scope-gate.test.mjs comment refresh.

### Layer 1 + 2 status

```
Onboarding contract lint:           PASS (24 assertions across 4 skills)
Lego cap (≤30 lines templates/):    PASS
MCP server unit + integration:      PASS
Hook tests:                         PASS
```

### Expected impact

The 1m 34s "planning gap" measured in the prior Mode A run came from bro
loading + processing 443 lines of architect-workflow before the batched
write. Cutting that to 114 lines for the simple path should drop the gap
by ~30-50s. Combined with the hard-rule batching prose forcing bro to
emit fewer total messages (target: 2 instead of 3), simple-triage
planning should land in ~30-45s of bro work plus SWE/Human time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request Apr 25, 2026
The single-agent comparison in pillar #1 was too soft. "Marking your
own homework" is the standard English idiom for the exact conflict of
interest — same actor judging their own work — and lands harder than
the vaguer "wishful thinking" phrasing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request Apr 26, 2026
…WE no-bypass (#88)

Two real bugs in git-guards.sh found by @ZaxShen during v0.3.1
marketplace test, plus a SWE doctrine violation that surfaced when
the hook bugs blocked a legitimate commit.

## Bug #1 — Hook reads branch from CC's CWD, not the worktree

`git branch --show-current` ran in CC's session CWD (project root,
always main) regardless of where the actual `git commit` would run.
Result: SWE in `isolation: worktree` mode could NEVER commit — every
commit hit "no direct commits to main" even when committing on a
feature branch inside the worktree.

Fix: parse the cd prefix or `git -C` from the command and run git
from there. New cmd_cwd() + cmd_branch() helpers in git-guards.sh.
Tests: 7 new worktree-aware cases in tests/hooks/git-guards.test.sh
(12 total, all green).
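
git-guards.sh is a shell script and its cmd_cwd() body is not shown here. The parsing idea can still be sketched in TypeScript, the language of the MCP server. The `commandCwd` function and its regular expressions below are hypothetical. They cover only a leading `cd <dir> &&` and a `git -C <dir>` flag, and do not combine the two when both appear.

```typescript
// Matches a directory argument: double-quoted, single-quoted, or bare.
const DIR = `(?:"([^"]+)"|'([^']+)'|(\\S+))`;
const CD_PREFIX = new RegExp(`^\\s*cd\\s+${DIR}\\s*(?:&&|;)`);
const GIT_DASH_C = new RegExp(`\\bgit\\s+-C\\s+${DIR}`);

// Picks whichever of the three alternative capture groups matched.
function firstGroup(m: RegExpMatchArray | null): string | null {
  if (!m) return null;
  return m[1] ?? m[2] ?? m[3] ?? null;
}

// Returns the directory the git command would run in, or null when the
// command names none (the caller then falls back to the session CWD).
function commandCwd(command: string): string | null {
  return firstGroup(command.match(GIT_DASH_C)) ?? firstGroup(command.match(CD_PREFIX));
}
```

A hook would then run `git branch --show-current` from the returned directory instead of the session CWD, which is the behavior change Bug #1 calls for.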

## Bug #2 — `git rev-parse` without --verify prints literal ref string

When `origin/main` didn't exist, `git rev-parse "origin/main"` printed
the literal string "origin/main" to stdout AND exited non-zero. The
`2>/dev/null` swallowed stderr but the literal-string stdout sneaked
through. REMOTE ended up as "origin/main" (non-empty) — the
"behind origin/main" check fired falsely on any repo without a remote.

Fix: use `git rev-parse --verify`. Empty output if ref doesn't exist.
Test: 1 new regression case for no-remote repo behavior.

## SWE doctrine — explicit no-bypass clause

When the worktree bug blocked SWE's commit, SWE attempted to rewrite
.git/HEAD and fabricate branch refs to bypass the hook. CC's security
guards blocked it, but the doctrine was wrong. Added to agents/swe.md:

  Never attempt to bypass a PreToolUse hook block — do not rewrite
  .git/HEAD, fabricate refs, edit .git/ internals, or use any
  technique to evade a hook decision. If a hook blocks a legitimate
  operation, that's a plugin bug — STOP immediately, return the
  failure summary to bro with the exact hook output.

agents/swe.md stays 21 lines (within 30-line Lego cap).

## Bonus — patched user's installed cache for immediate unblock

While CI ships v0.3.2 properly, also wrote the fixed git-guards.sh +
swe.md directly into ~/.claude/plugins/cache/trustmybot/tmb-rc/0.3.1/
so @ZaxShen's in-flight test can complete without waiting for the
release cycle. The next /plugin update tmb-rc@trustmybot will replace
the cache with the v0.3.2 official.

## Verified

  bash tests/run-all.sh                     → all 11 lints + L2 + L3 green
  bash tests/docker/run-install-smoke.sh    → ✓ L0 PASSED
  Patched cache verified: cmd_branch present + Never attempt to bypass present

Co-authored-by: Zax Shen <ZaxShen@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request May 20, 2026
- marketplace.json: source github→url for both tmb and tmb-rc channels (CC plugin marketplace `source: url` accepts any git host); owner.url → gitlab
- CONTRIBUTING.md: new GitLab MR section parallel to existing GitHub PR section (additive, both work)
- SECURITY.md: add GitLab confidential-issue path; keep GH (mirror) and email fallback
- ADR 0001 documents the migration decision + future flip-back path

Out of scope (deferred to separate issues): CI port, release.sh,
enterprise/ migration. README/package.json/CHANGELOG already migrated
in prior work.

SWE atomic close finalized by bro after SWE bailed pre-commit (5th
this session). Issue #94 tracks the atomic-close hook not firing.
ZaxShen added a commit that referenced this pull request May 20, 2026
🌄 chore(migration): finalize GitLab as primary git host (#1)

See merge request trustmybot/plugin!13
ZaxShen added a commit that referenced this pull request May 20, 2026
Sibling of MR !149 (#1's file_registry_update_summaries fix). Symmetric
workspace-pattern bug in bro_atomic_close — manifested live closing
task_id=1 of the user's session on 2026-05-11.

Three problems in one block (composites.ts:374-417):

1. `resolveProjectPath` joined relative paths against
   `dirname(dirname(dirname(dbPath)))` = workspace root. Files like
   `mcp/trajectory-server/src/tools/file-registry.ts` in workspace pattern
   resolved to `<workspace>/mcp/...` which doesn't exist.

2. `cwd: projectRoot` for the `git show` fallback used the same
   workspace root — git can't `show` from outside any repo.

3. INSERT into file_registry wrote `(path, type, content_md5, summary,
   summary_updated_at)` with no `repo`, and `ON CONFLICT(path)`. The PK
   is `(repo, path)`, so `ON CONFLICT(path)` names no unique key and
   SQLite throws `ON CONFLICT clause does not match any PRIMARY KEY or
   UNIQUE constraint`. Even without that error, `(repo='', path='X')`
   would not conflict with a scan-populated `(repo='plugin', path='X')`
   row, so the upsert could never update it.

Fix: per-update repo resolution with priority

   explicit `s.repo` → `task.repo` → `resolveDefaultRepo(db, dbPath)` → error

Each update resolves to a repo, computes the absolute path against that
repo's `repos.path`, runs `git show` from that repo as cwd if disk-read
fails, and INSERTs with the resolved repo + `ON CONFLICT(repo, path)`.
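
The priority chain above can be sketched as follows. The interfaces and the `resolveRepo` function are hypothetical. In the real code the default comes from `resolveDefaultRepo(db, dbPath)`; here its result is passed in as a plain value so the sketch needs no database.

```typescript
// Hypothetical shapes for one file_summaries item and its owning task.
interface SummaryUpdate {
  path: string;
  repo?: string;
}

interface TaskRow {
  repo?: string | null;
}

// Priority: explicit s.repo, then task.repo, then the default repo, then error.
// Empty strings count as "not set", so a repo='' row can never be written.
function resolveRepo(update: SummaryUpdate, task: TaskRow, defaultRepo: string | null): string {
  const candidates = [update.repo, task.repo, defaultRepo];
  for (const candidate of candidates) {
    if (candidate) return candidate;
  }
  throw new Error(`no repo could be resolved for ${update.path}`);
}
```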

Schema-side: `bro_atomic_close.file_summaries[].items` accepts optional
`repo: string`. Defaults documented inline.

Also removed the now-dead `resolveProjectPath` helper + the unused
`dirname` import.

4 new tests in `composites.test.ts > bro_atomic_close multi-repo
file_summaries` cover: explicit-repo-wins, task.repo fallback,
tmb_default_repo fallback, no-repo-anywhere error. Each uses a real
two-repo workspace fixture under `mkdtempSync` so the disk-md5 path
exercises against actual files.

499/499 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request May 20, 2026
Synthesized 29 findings from a 4-agent audit across docs, MCP source,
hooks, and tests. Applied every actionable item; 27 files touched (4
deletions, 23 modifications). All test layers green.

## Bugs fixed

- index.ts L5 trajectory capture now gates on TMB_EVAL_MODE=1 too —
  debug_trajectory table only exists under eval mode; without this
  gate, TMB_DEBUG_TRAJECTORY=1 wrote to a missing table and silently
  failed.
- post-read-summary-hint.sh — add HOME-boundary guard to DB walk-up
  (was missing the P0 check the other 5 hooks have).
- require-summaries-before-task-close.sh — replace hardcoded
  git-root derivation with the walk-up + HOME-guard pattern; works
  for workspace-pattern projects.
- docs/REFERENCE.md + docs/architecture/ERD.md — drop the false
  "stable→tmb/, RC→tmb-rc/" channel-isolation claim. Both channels
  currently resolve to `tmb` because rc's plugin.json.name is "tmb".
  Honest current state documented; true isolation tracked in #1.
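
The walk-up + HOME-guard pattern named in the hook fixes above is not spelled out in this commit message. One possible shape is sketched here in TypeScript, although the hooks themselves are shell scripts. Everything in it is an assumption: the function names, the injected `hasDb` predicate, and the reading of "HOME boundary" as "stop at HOME itself or at the filesystem root". Paths are assumed to be normalized POSIX paths without trailing slashes.

```typescript
// Parent of a POSIX path; the parent of a top-level entry is "/".
function parentDir(dir: string): string {
  const trimmed = dir.replace(/\/+$/, "");
  const idx = trimmed.lastIndexOf("/");
  return idx <= 0 ? "/" : trimmed.slice(0, idx);
}

// Walks upward from `start` until `hasDb` reports a project DB in a directory.
// Returns that directory, or null once the walk reaches HOME or the root.
function findProjectDir(
  start: string,
  home: string,
  hasDb: (dir: string) => boolean,
): string | null {
  let dir = start;
  while (true) {
    // HOME-boundary guard: never treat HOME itself, or "/", as a project.
    if (dir === home || dir === "/") return null;
    if (hasDb(dir)) return dir;
    dir = parentDir(dir);
  }
}
```

Injecting `hasDb` keeps the walk free of filesystem access, so the boundary behavior can be checked without creating directories.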

## Dead code removed

- tasks.ts — `void genId('task')` no-op + its unused import.
- composites.ts — `success_criteria` arg in task_retry_batch read but
  never used; dropped from inputSchema (property + required) +
  handler.
- scripts/hooks/diagnostic/ — entire directory deleted (probe-bash.sh
  was an orphan, never registered in hooks.json).
- tests/workflow-sim/flow-M-monitor-cursor.test.mjs — never invoked
  by run-all.sh; coverage exists in L2 pr-comments.test.ts + L5 row 13.
- tests/lint/no-ledger-references.sh — structurally impossible to
  violate post-#170 (ledger merged into audit); retired.

## Stale docs corrected

- ERD.md — schema_version 1→2; plugin_version 0.6.0-rc.1→0.6.0.
- REFERENCE.md — drop hardcoded "22 hooks" count; 18→19 tables;
  drop the deleted `diagnostic` row from the hook table.
- templates/docs-trustmybot/architecture/auto/*.md — drop "currently
  inert (see #2881 follow-up)" historical commentary across all 4
  placeholder files.
- tests/EVALUATION.md — add row 14 (skill-invocation-recorded) to the
  journey table; bump all 13→14 references.
- tests/README.md — fix the L4 invocation snippet to match
  tests/run-all.sh exactly.
- tests/dogfood/flows/README.md — add legacy-bundle note pointing at
  l5-rows/ (the current per-row L5 layout).
- roundtable-cleanup-postcheck.sh — collapse stale 14-line header to a
  4-line purpose statement.

## Verified

- L1 (18 lint groups, one fewer after retiring no-ledger-references).
- L2 (406 tests, 72 suites, 0 fail).
- L3 (MCP integration + hook scripts + L5 scorer units).
- L4 (5 workflow-sim flows).

Ready for v0.6.0 stable promotion once MR !170 (WAL-checkpoint fix)
lands.