v0.2.0 Phase 5: legacy tree removed, README + CLAUDE.md finalized, install.sh deprecated #5
Merged
README documents /plugin marketplace add + /plugin install as the install path. install.sh reduced to a 5-line deprecation stub that prints the new commands and exits 1. Ready for v0.3 removal.
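A deprecation stub of the kind described here might look roughly like the sketch below. It is not the actual `install.sh`: the wording of the notice and the `<marketplace-source>` placeholder are assumptions, and only the `tmb@trustmybot` install target appears elsewhere in this PR. It is written as a function so it can be sourced and checked; a standalone stub would end with `exit 1` instead of `return 1`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a deprecation stub; not the real install.sh.
# <marketplace-source> is a placeholder: the source does not show that argument.
deprecated_install() {
  {
    echo "install.sh is deprecated and is slated for removal in v0.3."
    echo "Install from inside Claude Code instead:"
    echo "  /plugin marketplace add <marketplace-source>"
    echo "  /plugin install tmb@trustmybot"
  } >&2
  return 1  # non-zero so scripted callers notice the failure
}
```

Printing to stderr and returning non-zero keeps any old automation that still calls the script from silently succeeding.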
Task 1540's spec was written before the gatekeeper-roster correction (see bro/PLUGIN_BUGS.md #D1). SWE faithfully followed the stale spec, which advertised a 5-fixed-global roster (secretary/architect/swe/pr-reviewer/prompt-engineer). That regresses the corrected README shipped in commit eb27be4. This commit restores README to the corrected model: 2 global agents (gatekeeper + prompt-engineer), 5 project placeholders (ceo/cto/architect/swe/pr-reviewer), and on-demand agents. Task 1540's install.sh deprecation stub change is kept as-is (that part was correct).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…v0.2
The 10-agent roster + old hook settings are fully superseded by the native plugin layout (plugin/agents/, plugin/skills/, plugin/hooks/). CLAUDE.md now describes the 5 core agents with the agent-creator capability for domain agents.
…gineer global, 5 placeholders seeded per project)
Task 1535's CLAUDE.md rewrite used a stale 5-fixed-global roster (secretary/architect/swe/pr-reviewer/prompt-engineer + secretary hybrid-named "gatekeeper"). Per the corrected model in bro/PLUGIN_BUGS.md #D1:
- Global tier (plugin ships): gatekeeper, prompt-engineer.
- Project tier (seeded per project): ceo, cto, architect, swe, pr-reviewer.
- On-demand: gatekeeper drives agent-creator flow with explicit user approval per new agent.
- pm/gtm/designer NOT in plugin (TMB team internal only).
Adds a Persistence section describing the bundled SQLite trajectory MCP.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
Apr 24, 2026
…s: [X]'
Node `I` had `skills: [X]` inside an unquoted Mermaid label, which the parser read as nested square brackets and bailed on. The whole flow #5 (Skill Creation) failed to render on GitHub.
- Wrapped every label with literal/risky characters in double quotes (Mermaid spec for safe label content).
- Escaped the inner brackets in node I to HTML entities (for `[` and `]`) so the YAML frontmatter syntax still reads correctly.
- Also escaped `<` as its HTML entity in node B's "fires < 20% of sessions?" since an unquoted `<` can be parsed as opening an HTML tag.
No semantic change: same flow, now actually renderable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
…nts (#5)
l5_run_claude (flow-helpers.sh) and l5_run_arm (ab-helpers.sh) now invoke `claude --output-format stream-json --include-hook-events --include-partial-messages --verbose -p <prompt>` and pipe the JSONL output to <project>/trajectory.jsonl. The previous text-grep capture path is replaced by structured per-message events:
- assistant.message.usage gives token counts including main-thread (no more #135 lower-bound).
- assistant.message.content[].name (where type == 'tool_use') gives tool calls directly, with no more debug_trajectory dependency.
A slim stderr summary line replaces the per-line text echo (assistant_msgs + duration_ms shown for log triage).
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
l5_score_cost, l5_score_trajectory_required, l5_score_trajectory_forbidden
now read from <project>/trajectory.jsonl via jq instead of querying
agent_runs / debug_trajectory.
- cost: sums assistant.message.usage.{input,output}_tokens across the JSONL.
Captures BOTH main-thread and subagent tokens, so the #135 caveat from
TRU-76's stopgap is resolved.
- trajectory_required + trajectory_forbidden: extract tool_use names from
assistant content blocks. No env-coupling on TMB_DEBUG_TRAJECTORY.
- outcome: unchanged (SQL on the DB per spec).
Closes #5 (TRU-82). Supersedes the agent_runs cost source from #4 and
the main-thread capture follow-up from #135.
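As a rough illustration of the jq-based scoring described above, the sketch below sums tokens and lists tool calls from a `trajectory.jsonl` file. It is not the project's scorer code: the function names are hypothetical, and the event shape (one JSON object per line with `type == "assistant"` and a nested `message`) is inferred from the field paths named in the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and the event shape
# ({"type":"assistant","message":{...}}) is inferred from the commit text.

# Sum input + output tokens over every assistant event in a JSONL file.
sum_tokens() {
  jq -s '[.[] | select(.type == "assistant") | .message.usage
          | (.input_tokens // 0) + (.output_tokens // 0)] | add // 0' "$1"
}

# Print one tool name per line, in call order.
tool_names() {
  jq -r 'select(.type == "assistant") | .message.content[]?
         | select(.type == "tool_use") | .name' "$1"
}

# Succeed only if the named tool was called at least once.
tool_was_called() {
  tool_names "$1" | grep -Fqx -- "$2"
}
```

A "required" check would then be `tool_was_called trajectory.jsonl Read`, and a "forbidden" check its negation. Events of other types are skipped by the `select`, so they do not affect the totals.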
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
feat(test): switch L5/A/B capture to stream-json + jq scorers (#5)
See merge request trustmybot/plugin!54
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
…+ retry-on-failed doctrine
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
fix(153 #5-7): planning-difficult Step 0 + prescan greenfield arch + retry doctrine
See merge request trustmybot/plugin!104
ZaxShen
added a commit
that referenced
this pull request
May 20, 2026
…ade)
Two new optional scorers any L5 flow can opt into. Pure shell + sqlite3 +
git plumbing; no LLM, no new deps. Catches the failure modes today's L5
misses (the empty-table + base-branch-contamination cases Daisy hit on
2026-05).
## outcome-coherence.json
Table-shape assertions. Catches "empty discussions after a planning
flow" without each flow author having to spell out the SQL.
```json
{
"expected_writes": {
"issues": ">=1",
"tasks": ">=1",
"discussions": ">=1",
"audit": ">=2",
"tasks WHERE branch_id != 'dev'": ">=1"
}
}
```
Operators: `>=N`, `<=N`, `=N`, `!=N`, or bare `N` (= exact). Optional
`WHERE <clause>` suffix on the key targets specific row shape.
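The operator grammar and the `WHERE` suffix could be handled along the lines of the sketch below. This is an illustration under assumptions, not the actual `l5_score_coherence` code: both function names are hypothetical, and the key is interpolated into SQL only because the config file is treated as trusted test input.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the project's scorer.

# check_count ACTUAL SPEC: succeed if ACTUAL satisfies SPEC
# (">=N", "<=N", "=N", "!=N", or bare "N" meaning exact match).
check_count() {
  local actual="$1" spec="$2" op n
  case "$spec" in
    '>='*) op='-ge'; n="${spec:2}" ;;
    '<='*) op='-le'; n="${spec:2}" ;;
    '!='*) op='-ne'; n="${spec:2}" ;;
    '='*)  op='-eq'; n="${spec:1}" ;;
    *)     op='-eq'; n="$spec" ;;
  esac
  # Reject anything that is not a plain non-negative integer.
  [[ "$n" =~ ^[0-9]+$ ]] || { echo "bad expectation: $spec" >&2; return 2; }
  [ "$actual" "$op" "$n" ]
}

# build_count_sql KEY: turn "table" or "table WHERE clause" into a COUNT query.
build_count_sql() {
  local key="$1"
  case "$key" in
    *' WHERE '*)
      printf 'SELECT COUNT(*) FROM %s WHERE %s;\n' \
        "${key%% WHERE *}" "${key#* WHERE }" ;;
    *)
      printf 'SELECT COUNT(*) FROM %s;\n' "$key" ;;
  esac
}
```

With these, one assertion reduces to `check_count "$(sqlite3 "$db" "$(build_count_sql "$key")")" "$spec"`, where `$db`, `$key`, and `$spec` come from the flow's database path and one entry of `expected_writes`.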
## outcome-git.json
Git-state assertions. Catches "bro committed to dev directly" /
"worktree on detached HEAD" / "uncommitted slop in worktree".
```json
{
"base_branch_unchanged": true,
"uncommitted_in_worktree": false,
"worktrees": [
{ "path": ".claude/worktrees/<slug>",
"head_branch": "<task.branch_id>",
"head_not_branch": ["dev", "main"] }
]
}
```
`base_branch_unchanged` works by snapshot: `l5_run_claude` writes
`.claude/tmb/_l5_pre_run_git.json` with the base SHA before bro fires;
the scorer compares post-run. This isolates "bro committed during the
run" from setup-time commits the flow's run.sh made beforehand.
`<slug>` and `<task.branch_id>` placeholders auto-resolve from the
most-recent tasks row.
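The snapshot-then-compare idea can be sketched with plain git plumbing as below. The function names are hypothetical and the snapshot is simplified to a plain-text file holding one SHA; the real snapshot is the JSON file named above, whose schema is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the plain-text snapshot are hypothetical.

# Record the tip of the base branch before the run starts.
snapshot_base() {  # usage: snapshot_base REPO BRANCH SNAPSHOT_FILE
  git -C "$1" rev-parse --verify --quiet "refs/heads/$2" > "$3"
}

# Succeed only if the base branch still points at the recorded commit.
base_branch_unchanged() {  # usage: base_branch_unchanged REPO BRANCH SNAPSHOT_FILE
  local now
  now="$(git -C "$1" rev-parse --verify --quiet "refs/heads/$2")" || return 2
  [ "$now" = "$(cat "$3")" ]
}

# Succeed only if the worktree has no uncommitted or untracked changes.
worktree_clean() {  # usage: worktree_clean WORKTREE
  [ -z "$(git -C "$1" status --porcelain)" ]
}
```

Taking the snapshot immediately before the agent runs is what separates commits made by the flow's own setup from commits made during the run.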
## Integration
`l5_score_flow` now calls 7 scorers: `l5_score_coherence` and
`l5_score_git` join the existing 5. Both are opt-in; a flow with no config
file for a scorer skips it silently. All 19 existing flows continue to pass
without changes.
## Verification
- `tests/dogfood/lib/scorers-test.sh`: 15 unit tests covering pass/fail
paths (operator parsing, WHERE clauses, snapshot comparison, worktree
HEAD checks). Wired into `tests/run-all.sh` as an L3 check.
- L5 flow 13-bulk-cleanup opted in to both as proof-of-concept; passes
end-to-end with all 7 scorers green.
## What's next (per tests/EVALUATION.md)
- MR #2: backfill coherence/git across the other 18 L5 flows
- MR #3: Phase 2 multi-turn driver
- MR #4: L6 layer
- MR #5: retire the headless fast-path (#2867)
- MR #6: Phase 3 LLM-as-judge
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Phase 5: the v0.2.0 close-out. Removes the legacy `.claude/` tree, demotes `install.sh` to a deprecation stub, and finalizes the docs to the corrected two-tier roster model.

## Commits
- … `plugin/.claude/` tree (31 files: 10 agents + 7 rules + 11 skills + 2 skills-gallery + settings.json) and rewrite CLAUDE.md (per task 1535 spec; same stale-roster issue)

## What this PR ships
- … `.claude-plugin/plugin.json`, `agents/`, `templates/agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `monitors/`, `mcp/trajectory-server/`)
- … `.claude/` tree fully removed
- `install.sh` deprecation stub points users to `/plugin marketplace add` + `/plugin install`

## Important: symlink consequence
Deleting `plugin/.claude/` breaks the `TMB/.claude → plugin/.claude` symlink that the TMB workspace was using to dogfood. Post-merge action required:

This is the live dogfood install (originally task 1545, deferred to a human-driven step since it requires interactive `/plugin install` commands).

## Deferred
- … `/plugin install`, which subagents can't drive.

## Test plan
- `ls plugin/.claude/` returns "No such file" (legacy tree gone)
- `cat plugin/CLAUDE.md` shows two-tier roster (gatekeeper + prompt-engineer global; ceo/cto/architect/swe/pr-reviewer placeholders; on-demand)
- `bash plugin/install.sh` exits 1 with deprecation notice
- `cat plugin/README.md` shows corrected install commands and roster model
- `/plugin install tmb@trustmybot` from a fresh project + spawn `/gatekeeper` + complete one trivial workflow loop; please do this in a fresh session post-merge

Generated with Claude Code