
🎉 v0.2.0 Phase 5: legacy tree removed, README + CLAUDE.md finalized, install.sh deprecated #5

Merged
ZaxShen merged 4 commits into dev from feature/v0.2-phase5
Apr 22, 2026

Conversation


@ZaxShen ZaxShen commented Apr 22, 2026

Summary

Phase 5 is the v0.2.0 close-out. It removes the legacy .claude/ tree, demotes install.sh to a deprecation stub, and finalizes the docs to the corrected two-tier roster model.

Commits

  • c96e6d7 rewrite README + stub install.sh (per task 1540 spec, but the README content was rewritten with a stale roster; see next commit)
  • d16bef8 restore README to corrected 2+5+on-demand roster: task 1540's spec was written before the #D1 roster correction landed, and SWE faithfully followed the stale spec; this commit reverts the README portion only and keeps the install.sh stub
  • f0567c3 delete legacy plugin/.claude/ tree (31 files: 10 agents + 7 rules + 11 skills + 2 skills-gallery + settings.json) and rewrite CLAUDE.md (per task 1535 spec; same stale-roster issue)
  • dbab7d3 correct CLAUDE.md to two-tier roster: gatekeeper + prompt-engineer global, ceo/cto/architect/swe/pr-reviewer as project placeholders, on-demand via agent-creator. Adds Persistence section.

What this PR ships

  • ✅ Plugin in 100% native Claude Code 2026 layout (.claude-plugin/plugin.json, agents/, templates/agents/, skills/<name>/SKILL.md, hooks/hooks.json, monitors/, mcp/trajectory-server/)
  • ✅ Legacy .claude/ tree fully removed
  • ✅ README and CLAUDE.md aligned to the two-tier roster model
  • ✅ install.sh deprecation stub points users to /plugin marketplace add + /plugin install

Important: symlink consequence

Deleting plugin/.claude/ breaks the TMB/.claude → plugin/.claude symlink that the TMB workspace was using to dogfood. Post-merge action required:

# In a fresh terminal, from TMB workspace:
rm /Users/Zax/Git/GitHub/TMB/.claude
cd /Users/Zax/Git/GitHub/TMB/plugin
# (start a fresh claude session)
# Then in Claude Code:
/plugin marketplace add ./plugin       # local-path install
/plugin install tmb@trustmybot
/reload-plugins

This is the live dogfood install (originally task 1545, deferred to a human-driven step since it requires interactive /plugin install commands).

Deferred

  • Task 1545 (formal dogfood smoke test): deferred to the manual user action above. Spawning SWE to do it would have required interactive /plugin install, which subagents can't drive.

Test plan

  • ls plugin/.claude/ returns "No such file" (legacy tree gone)
  • cat plugin/CLAUDE.md shows two-tier roster (gatekeeper + prompt-engineer global; ceo/cto/architect/swe/pr-reviewer placeholders; on-demand)
  • bash plugin/install.sh exits 1 with deprecation notice
  • cat plugin/README.md shows corrected install commands and roster model
  • Manual: /plugin install tmb@trustmybot from a fresh project + spawn /gatekeeper + complete one trivial workflow loop. Please do this in a fresh session post-merge.

🤖 Generated with Claude Code

ZaxShen and others added 4 commits April 21, 2026 17:51
README documents /plugin marketplace add + /plugin install as the
install path. install.sh reduced to a 5-line deprecation stub that
prints the new commands and exits 1. Ready for v0.3 removal.
Task 1540's spec was written before the gatekeeper-roster correction
(see bro/PLUGIN_BUGS.md #D1). SWE faithfully followed the stale spec
which advertised a 5-fixed-global roster (secretary/architect/swe/
pr-reviewer/prompt-engineer). That regresses the corrected README
shipped in commit eb27be4.

Restoring README to the corrected 2-global (gatekeeper +
prompt-engineer) + 5-project-placeholder (ceo/cto/architect/swe/
pr-reviewer) + on-demand model. Keep 1540's install.sh deprecation
stub change as-is (that part was correct).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…v0.2

The 10-agent roster + old hook settings are fully superseded by the
native plugin layout (plugin/agents/, plugin/skills/, plugin/hooks/).
CLAUDE.md now describes the 5 core agents with the agent-creator
capability for domain agents.
…gineer global, 5 placeholders seeded per project)

Task 1535's CLAUDE.md rewrite used a stale 5-fixed-global roster
(secretary/architect/swe/pr-reviewer/prompt-engineer + secretary
hybrid-named "gatekeeper"). Per the corrected model in
bro/PLUGIN_BUGS.md #D1:

- Global tier (plugin ships): gatekeeper, prompt-engineer.
- Project tier (seeded per project): ceo, cto, architect, swe, pr-reviewer.
- On-demand: gatekeeper drives agent-creator flow with explicit user
  approval per new agent.
- pm/gtm/designer NOT in plugin (TMB team internal only).

Adds a Persistence section describing the bundled SQLite trajectory MCP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZaxShen ZaxShen merged commit b883288 into dev Apr 22, 2026
@ZaxShen ZaxShen deleted the feature/v0.2-phase5 branch April 22, 2026 06:52
ZaxShen added a commit that referenced this pull request Apr 24, 2026
…s: [X]'

Node `I` had `skills: [X]` inside an unquoted Mermaid label, which the
parser read as nested square brackets and bailed on. The whole flow
#5 (Skill Creation) failed to render on GitHub.

- Wrapped every label with literal/risky characters in double quotes
  (Mermaid spec for safe label content).
- Escaped the inner brackets in node I to HTML entities (&#91; / &#93;)
  so the YAML frontmatter syntax still reads correctly.
- Also escaped < as &lt; in node B's "fires < 20% of sessions?" since
  unquoted < can be parsed as opening an HTML tag.

No semantic change: same flow, now actually renderable.
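
The quoting and escaping rules listed in this commit message can be sketched as a small helper. This is a Python illustration only, not part of the commit: the function name `quote_mermaid_label` and the replacement table are hypothetical, and the table covers just the three characters the message names (`[`, `]`, `<`).

```python
# Hypothetical helper: name and table are illustrative, covering only the
# characters the commit message mentions.
_ENTITY_MAP = {
    "[": "&#91;",
    "]": "&#93;",
    "<": "&lt;",
}


def quote_mermaid_label(text: str) -> str:
    """Escape risky characters, then wrap the label in double quotes."""
    escaped = "".join(_ENTITY_MAP.get(ch, ch) for ch in text)
    return f'"{escaped}"'


if __name__ == "__main__":
    print(quote_mermaid_label("skills: [X]"))
    print(quote_mermaid_label("fires < 20% of sessions?"))
```

A label that itself contains a double quote would need further handling; that case is not covered here.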

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request May 20, 2026
…nts (#5)

l5_run_claude (flow-helpers.sh) and l5_run_arm (ab-helpers.sh) now invoke
`claude --output-format stream-json --include-hook-events --include-partial-messages
--verbose -p <prompt>` and pipe the JSONL output to <project>/trajectory.jsonl.

The previous text-grep capture path is replaced by structured per-message
events: assistant.message.usage gives token counts including main-thread
(no more #135 lower-bound). assistant.message.content[].name (where type ==
'tool_use') gives tool calls directly, with no more debug_trajectory dependency.

Slim stderr summary line replaces the per-line text echo (assistant_msgs +
duration_ms shown for log triage).
ZaxShen added a commit that referenced this pull request May 20, 2026
l5_score_cost, l5_score_trajectory_required, l5_score_trajectory_forbidden
now read from <project>/trajectory.jsonl via jq instead of querying
agent_runs / debug_trajectory.

- cost: sums assistant.message.usage.{input,output}_tokens across the JSONL.
  Captures BOTH main-thread and subagent tokens, so the #135 caveat from
  TRU-76's stopgap is resolved.
- trajectory_required + trajectory_forbidden: extract tool_use names from
  assistant content blocks. No env-coupling on TMB_DEBUG_TRAJECTORY.
- outcome: unchanged (SQL on the DB per spec).
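
The actual scorers are shell functions that call jq; the following Python sketch only illustrates the same extraction logic. It assumes each JSONL line is an object of the form `{"type": "assistant", "message": {...}}`, inferred from the `assistant.message.usage` and `assistant.message.content[]` paths quoted above. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Set


def _assistant_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the `message` object of every assistant event (assumed shape)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "assistant":
            yield event.get("message") or {}


def total_tokens(lines: Iterable[str]) -> int:
    """Sum input_tokens + output_tokens over all assistant messages."""
    total = 0
    for message in _assistant_messages(lines):
        usage = message.get("usage") or {}
        total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total


def tool_names(lines: Iterable[str]) -> Set[str]:
    """Collect the names of all tool_use content blocks."""
    names = set()
    for message in _assistant_messages(lines):
        for block in message.get("content") or []:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                names.add(block.get("name"))
    return names


def trajectory_ok(lines: Iterable[str], required=(), forbidden=()) -> bool:
    """True if every required tool was used and no forbidden tool was."""
    used = tool_names(list(lines))
    return set(required) <= used and not (set(forbidden) & used)
```

Events of any other type are skipped, so non-assistant lines in the stream do not affect either score in this sketch.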

Closes #5 (TRU-82). Supersedes the agent_runs cost source from #4 and
the main-thread capture follow-up from #135.
ZaxShen added a commit that referenced this pull request May 20, 2026
🛠️ feat(test): switch L5/A/B capture to stream-json + jq scorers (#5)

See merge request trustmybot/plugin!54
ZaxShen added a commit that referenced this pull request May 20, 2026
…+ retry-on-failed doctrine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ZaxShen added a commit that referenced this pull request May 20, 2026
📝 fix(153 #5-7): planning-difficult Step 0 + prescan greenfield arch + retry doctrine

See merge request trustmybot/plugin!104
ZaxShen added a commit that referenced this pull request May 20, 2026
…ade)

Two new optional scorers any L5 flow can opt into. Pure shell + sqlite3 +
git plumbing: no LLM, no new deps. Catches the failure modes today's L5
misses (the empty-table + base-branch-contamination cases Daisy hit on
2026-05).

## outcome-coherence.json

Table-shape assertions. Catches "empty discussions after a planning
flow" without each flow author having to spell out the SQL.

```json
{
  "expected_writes": {
    "issues":      ">=1",
    "tasks":       ">=1",
    "discussions": ">=1",
    "audit":       ">=2",
    "tasks WHERE branch_id != 'dev'": ">=1"
  }
}
```

Operators: `>=N`, `<=N`, `=N`, `!=N`, or bare `N` (= exact). Optional
`WHERE <clause>` suffix on the key targets specific row shape.
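
To make the operator and `WHERE` semantics concrete, here is a minimal Python sketch. The shipped scorer is shell + sqlite3, so this is not its implementation; the function names (`matches`, `count_rows`, `check_coherence`) and the regular expressions are assumptions made for illustration. The clause is interpolated into SQL directly, which is only acceptable because the config is a trusted test fixture.

```python
import re
import sqlite3
from typing import Dict, List

# ">=N", "<=N", "=N", "!=N", or bare "N" (treated as exact match).
_EXPECTATION = re.compile(r"^\s*(>=|<=|!=|=)?\s*(\d+)\s*$")
# "table" or "table WHERE <clause>".
_KEY = re.compile(r"^\s*(\w+)(?:\s+WHERE\s+(.+?))?\s*$", re.IGNORECASE)


def matches(expectation: str, count: int) -> bool:
    """Evaluate a row count against one expectation string."""
    m = _EXPECTATION.match(expectation)
    if m is None:
        raise ValueError(f"bad expectation: {expectation!r}")
    op, n = m.group(1) or "=", int(m.group(2))
    return {
        ">=": count >= n,
        "<=": count <= n,
        "=": count == n,
        "!=": count != n,
    }[op]


def count_rows(conn: sqlite3.Connection, key: str) -> int:
    """Count rows for a key, honouring an optional WHERE suffix."""
    m = _KEY.match(key)
    if m is None:
        raise ValueError(f"bad key: {key!r}")
    table, clause = m.group(1), m.group(2)
    sql = f"SELECT COUNT(*) FROM {table}"
    if clause:
        sql += f" WHERE {clause}"  # trusted fixture input only
    return conn.execute(sql).fetchone()[0]


def check_coherence(conn: sqlite3.Connection,
                    expected_writes: Dict[str, str]) -> List[str]:
    """Return the keys whose row counts violate their expectation."""
    return [key for key, expectation in expected_writes.items()
            if not matches(expectation, count_rows(conn, key))]
```

An empty list from `check_coherence` means every assertion held; a non-empty list names the failing keys, for example an empty `discussions` table after a planning flow.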

## outcome-git.json

Git-state assertions. Catches "bro committed to dev directly" /
"worktree on detached HEAD" / "uncommitted slop in worktree".

```json
{
  "base_branch_unchanged":   true,
  "uncommitted_in_worktree": false,
  "worktrees": [
    { "path": ".claude/worktrees/<slug>",
      "head_branch": "<task.branch_id>",
      "head_not_branch": ["dev", "main"] }
  ]
}
```

`base_branch_unchanged` works by snapshot: `l5_run_claude` writes
`.claude/tmb/_l5_pre_run_git.json` with the base SHA before bro fires;
the scorer compares post-run. This isolates "bro committed during the
run" from setup-time commits the flow's run.sh made beforehand.

`<slug>` and `<task.branch_id>` placeholders auto-resolve from the
most-recent tasks row.
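
The snapshot comparison and worktree HEAD checks can be sketched as pure functions. This is a Python illustration under assumptions, not the shell scorer itself: the names `WorktreeExpectation`, `base_branch_unchanged`, and `worktree_failures` are hypothetical, and the caller is assumed to have already read the SHAs and the current branch from git (a `None` branch standing in for a detached HEAD).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorktreeExpectation:
    """One entry of the `worktrees` list, after placeholder resolution."""
    path: str
    head_branch: Optional[str] = None
    head_not_branch: List[str] = field(default_factory=list)


def base_branch_unchanged(pre_run_sha: str, post_run_sha: str) -> bool:
    """The base branch is unchanged if its SHA matches the pre-run snapshot."""
    return pre_run_sha == post_run_sha


def worktree_failures(expect: WorktreeExpectation,
                      actual_head: Optional[str]) -> List[str]:
    """Return human-readable failures for one worktree (empty means pass).

    `actual_head` is the branch checked out in the worktree, or None when
    HEAD is detached (an assumption about how the caller reports it).
    """
    if actual_head is None:
        return [f"{expect.path}: detached HEAD"]
    failures = []
    if expect.head_branch is not None and actual_head != expect.head_branch:
        failures.append(
            f"{expect.path}: on {actual_head}, expected {expect.head_branch}")
    if actual_head in expect.head_not_branch:
        failures.append(f"{expect.path}: on forbidden branch {actual_head}")
    return failures
```

Keeping the git reads outside these functions makes the pass/fail logic testable without a repository, which is one plausible way to structure unit tests for such checks.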

## Integration

`l5_score_flow` now calls 7 scorers β€” added `l5_score_coherence` and
`l5_score_git` to the existing 5. Both opt-in; missing config files =
silently skipped. All 19 existing flows continue to pass without
changes.
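
The opt-in behaviour described above (a missing config file means the scorer is silently skipped) might look like the following Python sketch. `run_optional_scorers` and the `"skipped"`/`"pass"`/`"fail"` result strings are hypothetical; the real `l5_score_flow` is a shell function.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_optional_scorers(
    flow_dir: Path,
    scorers: Dict[str, Callable[[dict], bool]],
) -> Dict[str, str]:
    """Run each scorer only if its config file exists in the flow directory."""
    results = {}
    for config_name, scorer in scorers.items():
        config_path = flow_dir / config_name
        if not config_path.is_file():
            results[config_name] = "skipped"  # opt-in: no config, no check
            continue
        config = json.loads(config_path.read_text())
        results[config_name] = "pass" if scorer(config) else "fail"
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flow = Path(tmp)
        # Only the git config is present, so coherence is skipped.
        (flow / "outcome-git.json").write_text('{"base_branch_unchanged": true}')
        print(run_optional_scorers(flow, {
            "outcome-coherence.json": lambda cfg: True,
            "outcome-git.json": lambda cfg: cfg["base_branch_unchanged"],
        }))
```

Treating "skipped" as distinct from "pass" keeps existing flows green while still letting a summary show which flows have not opted in yet.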

## Verification

- `tests/dogfood/lib/scorers-test.sh`: 15 unit tests covering pass/fail
  paths (operator parsing, WHERE clauses, snapshot comparison, worktree
  HEAD checks). Wired into `tests/run-all.sh` as an L3 check.
- L5 flow 13-bulk-cleanup opted in to both as proof-of-concept; passes
  end-to-end with all 7 scorers green.

## What's next (per tests/EVALUATION.md)

- MR #2: backfill coherence/git across the other 18 L5 flows
- MR #3: Phase 2 multi-turn driver
- MR #4: L6 layer
- MR #5: retire the headless fast-path (#2867)
- MR #6: Phase 3 LLM-as-judge

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>