Skip to content

πŸ” debug(l6): expose claude -p output to CI logs (#116)#117

Merged
ZaxShen merged 1 commit into
devfrom
debug/116-claude-p-headless-diagnostics
Apr 26, 2026
Merged

πŸ” debug(l6): expose claude -p output to CI logs (#116)#117
ZaxShen merged 1 commit into
devfrom
debug/116-claude-p-headless-diagnostics

Conversation

@ZaxShen
Copy link
Copy Markdown
Contributor

@ZaxShen ZaxShen commented Apr 26, 2026

Diagnostic PR for #116. Doesn't fix the L6 failure β€” gathers info to identify which layer breaks.

Problem (#116)

First L6 run on dev produced 0 trajectory rows + 0 tokens for all 4 wired flows. Runner suppressed claude's output via `>/dev/null` β€” we have no idea what claude was doing.

What this PR adds

  1. Strip output suppression from 16 flow scripts (`l6_run_claude ... >/dev/null` β†’ `l6_run_claude ...`)
  2. Wrap claude output with `[claude]` prefix on CI's stderr
  3. Pre-flight diagnostic steps in `run-l6.sh`:
    • `claude --version` (binary present?)
    • `claude -p "say hello in one word"` (auth works in -p mode?)
    • `claude --plugin-dir -p "say hi in one word"` (plugin loads in -p?)

What we expect to learn

After CI runs this on dev, the logs will show:

Each finding has a different fix.

Cost

~$0 if any pre-flight fails (no completion). ~$0.30 if all 3 pre-flights succeed + 4 wired flows run.

After this

Based on the diagnostic output, follow-up PR (or comment on #116) to fix the broken layer. May lead to:

  • L6 v3 with a different invocation pattern
  • Falling back to L5+L6 combined Docker (which uses marketplace install path, not `--plugin-dir`)
  • Documenting that L6 requires interactive mode

πŸ€– Generated with Claude Code

First L6 run on dev (run 24963880924) failed because all 4 wired flows
produced 0 trajectory rows + 0 tokens. The runner suppressed claude's
stdout (`>/dev/null`) so we had no diagnostic info.

This PR:
1. Strips `>/dev/null` redirect from all 16 flow scripts so claude's
   output flows to CI logs
2. Updates l6_run_claude to wrap output with [claude] prefix on stderr
3. Adds pre-flight diagnostics in run-l6.sh that test claude in
   isolation BEFORE the flow tests:
   a. `claude --version` (binary present?)
   b. `claude -p "say hello in one word"` (basic auth works in -p mode?)
   c. `claude --plugin-dir <root> -p "say hi in one word"` (plugin
      loads in -p mode, no bro engagement?)

After CI runs this, we'll see which layer breaks. Common possibilities:
- Auth: token wasn't recognized in -p mode
- -p mode: doesn't engage @bro persona trigger
- Plugin loading: --plugin-dir incompatible with -p
- MCP: server didn't spawn in -p mode

Cost: still ~$0 if all layers fail (no tokens consumed). Up to ~$0.30
if pre-flight succeeds and flows run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZaxShen ZaxShen merged commit 8a079b6 into dev Apr 26, 2026
2 checks passed
@ZaxShen ZaxShen deleted the debug/116-claude-p-headless-diagnostics branch April 26, 2026 18:37
ZaxShen added a commit that referenced this pull request May 20, 2026
#117)

The test "silent no-op when workspace not detected" assumed walk-up
failure as the only path to no-workspace, but #113's sentinel resolver
(at ~/.claude/tmb-active-workspace) added a second path that the test
didn't account for. Fix: isolate HOME to a tmpdir for this case.
ZaxShen added a commit that referenced this pull request May 20, 2026
…dev'

πŸ› fix(tests): HOME isolation for session-log-capture silent no-op test (#117)

See merge request trustmybot/plugin!36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant