🔍 debug(l6): expose claude -p output to CI logs (#116) by ZaxShen · Pull Request #117 · trustmybot/plugin

ZaxShen · 2026-04-26T18:35:36Z

Diagnostic PR for #116. Doesn't fix the L6 failure — gathers info to identify which layer breaks.

Problem (#116)

First L6 run on dev produced 0 trajectory rows + 0 tokens for all 4 wired flows. Runner suppressed claude's output via `>/dev/null` — we have no idea what claude was doing.

What this PR adds

Strip output suppression from 16 flow scripts (`l6_run_claude ... >/dev/null` → `l6_run_claude ...`)
Wrap claude output with `[claude]` prefix on CI's stderr
Pre-flight diagnostic steps in `run-l6.sh`:
- `claude --version` (binary present?)
- `claude -p "say hello in one word"` (auth works in -p mode?)
- `claude --plugin-dir <root> -p "say hi in one word"` (plugin loads in -p?)
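For illustration, the `[claude]` prefixing could look roughly like the sketch below. The function name `prefix_claude_output` is hypothetical and the real `l6_run_claude` may be implemented differently. This version merges the wrapped command's stdout and stderr, tags each line, writes the result to stderr, and preserves the command's exit status without relying on bash-only features.

```shell
# Hypothetical sketch, not the repo's l6_run_claude.
# Runs "$@", sends every output line to stderr prefixed with "[claude] ",
# and returns the wrapped command's own exit status.
prefix_claude_output() {
  # fd 3 carries the exit status out of the pipeline into the command
  # substitution; the tagged output goes to the caller's stderr.
  pco_status=$(
    {
      { "$@" 2>&1; echo "$?" >&3; } | sed 's/^/[claude] /' >&2
    } 3>&1
  )
  return "$pco_status"
}
```

A flow script would then call something like `prefix_claude_output claude -p "..."` instead of redirecting to `/dev/null`, so the CI log shows what claude printed and the step still fails when claude fails.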

What we expect to learn

After CI runs this on dev, the logs will show:

If pre-flight 1 fails → `claude` not installed in CI
If pre-flight 2 fails → auth broken (token format / not recognized)
If pre-flight 2 succeeds + pre-flight 3 fails → `--plugin-dir` doesn't work in -p mode
If all three succeed but flows still fail → bro persona trigger not engaging in -p mode
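The mapping from pre-flight result to diagnosis can be sketched as a short ladder that stops at the first failing step. This is an assumption about how the checks might be sequenced, not the actual contents of `run-l6.sh`. `CLAUDE_BIN`, `PLUGIN_ROOT`, and `run_preflights` are hypothetical names introduced here; only the three `claude` invocations come from this PR.

```shell
# Hypothetical sketch of the pre-flight ladder; names are illustrative.
CLAUDE_BIN=${CLAUDE_BIN:-claude}
PLUGIN_ROOT=${PLUGIN_ROOT:-.}

# Prints a one-line diagnosis on stdout and returns the number of the
# first failing pre-flight (0 if all three pass). Command output goes
# to stderr so it still lands in the CI log.
run_preflights() {
  if ! "$CLAUDE_BIN" --version >&2; then
    echo "pre-flight 1 failed: claude not installed in CI"
    return 1
  fi
  if ! "$CLAUDE_BIN" -p "say hello in one word" >&2; then
    echo "pre-flight 2 failed: auth broken in -p mode"
    return 2
  fi
  if ! "$CLAUDE_BIN" --plugin-dir "$PLUGIN_ROOT" -p "say hi in one word" >&2; then
    echo "pre-flight 3 failed: --plugin-dir does not work in -p mode"
    return 3
  fi
  echo "pre-flights passed: if flows still fail, suspect the bro persona trigger in -p mode"
  return 0
}
```

Stopping at the first failure keeps the cost near zero when an early layer is broken, since the later steps that would consume tokens never run.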

Each finding has a different fix.

Cost

~$0 if any pre-flight fails (no completion). ~$0.30 if all 3 pre-flights succeed + 4 wired flows run.

After this

Based on the diagnostic output, a follow-up PR (or a comment on #116) will fix the broken layer. This may lead to:

L6 v3 with a different invocation pattern
Falling back to L5+L6 combined Docker (which uses marketplace install path, not `--plugin-dir`)
Documenting that L6 requires interactive mode

🤖 Generated with Claude Code

First L6 run on dev (run 24963880924) failed because all 4 wired flows produced 0 trajectory rows + 0 tokens. The runner suppressed claude's stdout (`>/dev/null`) so we had no diagnostic info.

This PR:

1. Strips `>/dev/null` redirect from all 16 flow scripts so claude's output flows to CI logs
2. Updates l6_run_claude to wrap output with [claude] prefix on stderr
3. Adds pre-flight diagnostics in run-l6.sh that test claude in isolation BEFORE the flow tests:
   a. `claude --version` (binary present?)
   b. `claude -p "say hello in one word"` (basic auth works in -p mode?)
   c. `claude --plugin-dir <root> -p "say hi in one word"` (plugin loads in -p mode, no bro engagement?)

After CI runs this, we'll see which layer breaks. Common possibilities:

- Auth: token wasn't recognized in -p mode
- -p mode: doesn't engage @bro persona trigger
- Plugin loading: --plugin-dir incompatible with -p
- MCP: server didn't spawn in -p mode

Cost: still ~$0 if all layers fail (no tokens consumed). Up to ~$0.30 if pre-flight succeeds and flows run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#117) The test "silent no-op when workspace not detected" assumed walk-up failure as the only path to no-workspace, but #113's sentinel resolver (at ~/.claude/tmb-active-workspace) added a second path that the test didn't account for. Fix: isolate HOME to a tmpdir for this case.
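A minimal sketch of the HOME isolation described above, assuming the test can be run as a command under a wrapper. The helper name `with_isolated_home` is hypothetical and the real test harness may do this differently. The idea is that with `HOME` pointing at an empty temporary directory, `~/.claude/tmb-active-workspace` cannot exist, so only the walk-up path can decide whether a workspace is detected.

```shell
# Hypothetical sketch; not the repo's actual test helper.
# Runs "$@" with HOME set to a fresh, empty temporary directory,
# removes the directory afterwards, and returns the command's status.
with_isolated_home() {
  iso_home=$(mktemp -d) || return 1
  iso_status=0
  # Subshell so the HOME override never leaks into the caller.
  ( HOME=$iso_home; export HOME; "$@" ) || iso_status=$?
  rm -rf "$iso_home"
  return "$iso_status"
}
```

For example, `with_isolated_home sh -c 'test ! -e "$HOME/.claude/tmb-active-workspace"'` succeeds regardless of what exists in the real home directory.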

ZaxShen merged commit 8a079b6 into dev Apr 26, 2026
2 checks passed

ZaxShen deleted the debug/116-claude-p-headless-diagnostics branch April 26, 2026 18:37

ZaxShen added a commit that referenced this pull request May 20, 2026

Merge branch 'fix/117-session-log-capture-test-home-isolation' into '…

7ee6854

…dev' 🐛 fix(tests): HOME isolation for session-log-capture silent no-op test (#117) See merge request trustmybot/plugin!36

ZaxShen merged 1 commit into dev from debug/116-claude-p-headless-diagnostics
