🧪 feat(tests): L5+L6 combined Docker harness, replaces manual L5 (#112) #113
Merged
Replaces manual L5 dogfood with an automated Docker image that simulates CC's marketplace install path THEN runs L6 deterministic-trajectory flows against the marketplace-installed plugin. Catches BOTH:

- Install-path bugs (L0's surface: dist/, native bindings, MCP cold spawn)
- Workflow doctrine bugs (L6's surface: does bro do the right thing?)

In ONE Docker build. Release-only (token-heavy: ~$1-3 per full run).

## What's new

- tests/docker/l5-l6-combined.Dockerfile
  - bun install --ignore-scripts (CC's actual install behavior)
  - Stages plugin at ~/.claude/plugins/cache/trustmybot/tmb/<version>/
  - Hard install assertions (dist/, schema.sql present)
  - npm install -g @anthropic-ai/claude-code
  - Runs tests/dogfood/run-l6.sh inside container with token from BuildKit secret
- tests/docker/run-l5-l6-combined.sh: local wrapper
  - With token: full L0 + L6 run
  - Without token: install-only check (L0 piece), L6 skipped cleanly
- .github/workflows/l5-l6-combined.yml: release-only CI
  - Triggers: tag pushes (every release), workflow_dispatch
  - Token passed as Docker BuildKit secret, NOT baked into image layers
  - Soft-fails if secret absent

## Why release-only

Per user direction: token-heavy tests run per-release, NOT per-PR. Cost is amortized across releases (one run per tag), trading per-run cost for elimination of the 30-45 min manual L5 dogfood per release.

## Caveats / next steps

- Verify the marketplace cache layout matches CC's actual Linux behavior (might differ slightly from macOS) on first CI run
- After this lands, manual L5 scenarios.md can be reduced to UX-only items

## Vision

User direction (2026-04-26): "automating 99% test cases across all layers by deterministic methods, leading to minimizing our manual tests and making the testing framework standard."

L0 + L1-L4 + L6 + L5+L6-combined = automated coverage of every regression class we've ever shipped. Manual L5 stays as a thin layer for genuine UX judgment that automation can't capture.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
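The flow described in this commit (hard install assertions first, then L6 only when a token is available) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of `tests/docker/run-l5-l6-combined.sh`: the function names and messages are hypothetical, and because the commit does not say where `schema.sql` sits inside the plugin, the sketch searches for it instead of assuming a path.

```shell
#!/bin/sh
# Sketch only: helper names and messages are hypothetical.

# L0 piece: hard install assertions on the staged plugin directory.
assert_install_layout() {
  ail_dir="$1"
  [ -d "$ail_dir/dist" ] || { echo "FAIL: dist/ missing in $ail_dir"; return 1; }
  # schema.sql location is not stated in the commit, so look for it anywhere
  # under the plugin directory (assumption).
  ail_schema=$(find "$ail_dir" -type f -name schema.sql | head -n 1)
  [ -n "$ail_schema" ] || { echo "FAIL: schema.sql missing in $ail_dir"; return 1; }
  echo "install OK: $ail_dir"
}

# L6 piece: run the flows only when a token is present, otherwise skip cleanly.
run_l6_if_token() {
  if [ -z "$1" ]; then
    echo "SKIP: no token, L6 flows not run"
    return 0
  fi
  echo "RUN: tests/dogfood/run-l6.sh"
  # The real invocation would go here; it is left out of this sketch.
}

# Combined check: an install failure stops everything before L6 is considered.
combined_check() {
  assert_install_layout "$1" || return 1
  run_l6_if_token "${2:-}"
}
```

Calling `combined_check "$plugin_dir" "$CLAUDE_CODE_OAUTH_TOKEN"` with an empty token exercises only the install assertions, which mirrors the "without token" mode listed above.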
ZaxShen added a commit that referenced this pull request on May 20, 2026
#113) Subagents (pr-reviewer, swe in worktree) inherit cwd=~ and lack TRAJECTORY_DB_PATH env, so walk-up fails to find the workspace DB. Writes a sentinel at ~/.claude/tmb-active-workspace at SessionStart pointing at the workspace path. tmb_db_path reads the sentinel as a priority resolver before walk-up. Layer 2 fix from #113's three-layer plan. Layer 1 (ToolSearch in spawn prompt) was invalidated empirically; Layer 3 (upstream filing) remains queued.
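A rough sketch of the sentinel-first resolution this commit describes is shown below. The sentinel path and `TRAJECTORY_DB_PATH` come from the commit message; the DB location inside a workspace (`.tmb/trajectory.db`), the function names, and the choice to honor the environment variable before the sentinel are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only. DB_RELATIVE_PATH and the env-first ordering are assumptions.
SENTINEL_FILE="${SENTINEL_FILE:-$HOME/.claude/tmb-active-workspace}"
DB_RELATIVE_PATH=".tmb/trajectory.db"   # hypothetical location inside a workspace

# SessionStart side: record the active workspace path in the sentinel file.
write_sentinel() {
  mkdir -p "$(dirname "$SENTINEL_FILE")"
  printf '%s\n' "$1" > "$SENTINEL_FILE"
}

# Resolver side: env var, then sentinel, then walk up from the current directory.
resolve_db_path() {
  if [ -n "${TRAJECTORY_DB_PATH:-}" ]; then
    printf '%s\n' "$TRAJECTORY_DB_PATH"
    return 0
  fi
  if [ -r "$SENTINEL_FILE" ]; then
    rdp_ws=$(head -n 1 "$SENTINEL_FILE")
    if [ -f "$rdp_ws/$DB_RELATIVE_PATH" ]; then
      printf '%s\n' "$rdp_ws/$DB_RELATIVE_PATH"
      return 0
    fi
  fi
  # Walk-up: this is the step that fails for subagents started with cwd=~.
  rdp_dir=$PWD
  while :; do
    if [ -f "$rdp_dir/$DB_RELATIVE_PATH" ]; then
      printf '%s\n' "$rdp_dir/$DB_RELATIVE_PATH"
      return 0
    fi
    if [ "$rdp_dir" = "/" ]; then
      break
    fi
    rdp_dir=$(dirname "$rdp_dir")
  done
  return 1
}
```

Because the sentinel is consulted before the walk-up, a process whose working directory is outside the workspace can still find the DB, provided the SessionStart hook has written the sentinel.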
ZaxShen added a commit that referenced this pull request on May 20, 2026
🔥 fix(hooks): sentinel-file env propagation for subagent DB resolution (#113) See merge request trustmybot/plugin!33
ZaxShen added a commit that referenced this pull request on May 20, 2026
…undation) New library at scripts/lib/sqlite3-fallback.sh exposes 6 wrapper functions covering the most-used MCP write tools (validation_record, task_update_status, discussion_append, ledger_log, issue_close, file_registry_update_summaries). Each wrapper validates role, writes via sqlite3 directly, and logs a synthetic ledger row (event_type=mcp_unavailable_fallback_invoked) for audit integrity. Formalizes the pattern bro has been using manually since #97/#113 subagent-MCP-availability issues surfaced. Doctrine shift from "writes stay blocked under fallback" to "writes via fallback are sanctioned with audit trail" — see #100 discussion id=53 for the architectural Q+A.
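The wrapper pattern described here (validate the role, write through `sqlite3`, and record an audit row in the same step) might look roughly like the sketch below. It is not the code in `scripts/lib/sqlite3-fallback.sh`: the table names, column names, function names, and the allowed-role list are all hypothetical, and only the `event_type` value is taken from the commit message.

```shell
#!/bin/sh
# Sketch only: schema and role list are hypothetical.
ALLOWED_ROLES="bro swe pr-reviewer"   # assumed role set, for illustration

# Wrap a value in single quotes, doubling embedded quotes for SQLite.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Build the SQL for one fallback write plus its synthetic ledger row.
validation_record_sql() {
  vr_role="$1"; vr_task="$2"; vr_result="$3"
  case " $ALLOWED_ROLES " in
    *" $vr_role "*) ;;
    *) echo "role not permitted: $vr_role" >&2; return 1 ;;
  esac
  # One transaction: the write and its audit row land together or not at all.
  cat <<EOF
BEGIN;
INSERT INTO validations (task_id, role, result) VALUES ($(sql_quote "$vr_task"), $(sql_quote "$vr_role"), $(sql_quote "$vr_result"));
INSERT INTO ledger (event_type, role, detail) VALUES ('mcp_unavailable_fallback_invoked', $(sql_quote "$vr_role"), 'validation_record');
COMMIT;
EOF
}

# Execute the write against a database file using the sqlite3 CLI.
validation_record_fallback() {
  vr_db="$1"; shift
  vr_sql=$(validation_record_sql "$@") || return 1
  printf '%s\n' "$vr_sql" | sqlite3 "$vr_db"
}
```

Keeping SQL generation separate from execution makes the audit guarantee easy to inspect: every statement batch that reaches `sqlite3` already contains its ledger row.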
ZaxShen added a commit that referenced this pull request on May 20, 2026
…#118 + #119) Two tightly-coupled fixes from 2026-04-29 L0-L5 verification:

- #118: scripts/hooks/write-active-workspace-sentinel.sh was committed at mode 100644 (sibling hooks are 100755). Production regression from #113. Chmod fix.
- #119: L0 install-smoke executable-check used (echo FAIL && exit 1) in a subshell, so build kept going past the FAIL. Replace with set -e + brace-group { ...; exit 1; } so exit propagates.

Without #119, #118 would have shipped silently (the build didn't fail when the bug existed). Pairing them keeps the verification chain honest going forward.
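The difference between the two shapes in #119 can be reproduced in isolation. In the sketch below, each function body runs in its own subshell that stands in for the build script; the function names and the "build continued" message are hypothetical.

```shell
#!/bin/sh
# Buggy shape: exit 1 only terminates the inner ( ... ) subshell, so the
# surrounding script prints FAIL and then carries on.
buggy_check() (
  set +e   # models a script running without errexit
  [ -x "$1" ] || (echo "FAIL: $1 is not executable" && exit 1)
  echo "build continued"
)

# Fixed shape: a brace group runs in the script's own shell (here, the
# function's subshell), so exit 1 stops it before the next command.
fixed_check() (
  set -e
  [ -x "$1" ] || { echo "FAIL: $1 is not executable"; exit 1; }
  echo "build continued"
)
```

Run against a file at mode 644 (the #118 situation), `buggy_check` prints the FAIL line yet still returns success, while `fixed_check` returns a non-zero status and never reaches the final `echo`.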
Closes #112.
Summary
Replaces manual L5 dogfood for everything except UX-only verification. Builds a Docker image that:

- Simulates CC's marketplace install path (`bun install --ignore-scripts` → stages at `~/.claude/plugins/cache/trustmybot/tmb/<version>/`)
- Runs L6 deterministic-trajectory flows against the marketplace-installed plugin

Catches BOTH install-path bugs AND workflow doctrine bugs in ONE Docker build.
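Per user direction

> "automating 99% test cases across all layers by deterministic methods, leading to minimizing our manual tests and making the testing framework standard."

This PR is the load-bearing piece of that vision.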
Files
- `tests/docker/l5-l6-combined.Dockerfile`: combined install + claude + L6 flows
- `tests/docker/run-l5-l6-combined.sh`: local convenience wrapper (BuildKit secret for token)
- `.github/workflows/l5-l6-combined.yml`: release-only CI (tag pushes + manual dispatch)

Token security
`CLAUDE_CODE_OAUTH_TOKEN` is passed via a Docker BuildKit secret (mounted at `/run/secrets/cc_token`), NOT baked into image layers. Standard Docker secret pattern.

When this runs
- Tag pushes (`v*`)
- `workflow_dispatch`
- `dev`/`main`: not triggered (release-only)

When secret is absent
Workflow soft-fails: the L0 install piece runs, and the L6 piece skips with a `::warning::` notice. Forks / external PRs don't go red.

Coverage matrix after this lands
Test plan
- `workflow_dispatch` from any branch

Caveats
- Assumes `~/.claude/plugins/cache/trustmybot/tmb/<version>/` is the exact layout CC expects on Linux. Might need adjustment.
- Assumes `claude -p` works in headless Docker: the same unverified assumption as in "Automate L5 dogfood as L6: deterministic-trajectory tests in Docker with real Claude Code" (#108).

Next steps after merge
- Update `tests/manual/scenarios.md` to remove items now covered by automation

🤖 Generated with Claude Code