From 9d9713f43c0152fa400d33de5394a535aacba6f3 Mon Sep 17 00:00:00 2001
From: Zax Shen
Date: Sun, 26 Apr 2026 02:12:57 -0700
Subject: [PATCH 1/2] =?UTF-8?q?=F0=9F=A7=AA=20feat(tests):=20L6=20determin?=
 =?UTF-8?q?istic-trajectory=20dogfood=20+=20opt-in=20capture=20(#108)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Manual L5 dogfood was the release bottleneck. L6 automates it: pre-seed DB → run real `claude -p` → assert that the MCP/tool sequence matches the expected sequence from FLOWS.md.

## What's new

**Schema** — `debug_trajectory` table (15th table). Off by default — populated only when env `TMB_DEBUG_TRAJECTORY=1`, so nothing is recorded in production. Schema version stays at 1 (additive).

**Capture** —
- MCP server writes a row per MCP call when env is set (src/index.ts wrapper)
- New PreToolUse hook `scripts/hooks/debug-trajectory.sh` (matcher: "*") writes rows for non-MCP calls (Bash, Read, Write, Edit, Task, Skill)

**Test infra** at `tests/dogfood/`:
- `run-l6.sh` runner
- `lib/flow-helpers.sh` shared helpers
- 16 flow scripts in `flows/` (4 fully wired, 12 scaffolded)
- `fixtures/` pre-seed SQL (empty, onboarding-named, onboarding-anonymous)
- `expected/` expected-trajectory files

**4 wired flows**: 01-onboarding, 02-simple-task, D-direct-mode (with hard invariants on no-task-spawn + direct_mode_used event), and 95-anonymous-cold-restart (with assertions that no re-onboarding writes happen, guarding against the #95 regression).

**12 scaffolded flows** auto-skip until their expected-trajectory file is authored. The pattern is copy/paste; each follow-up is ~30 lines.

**CI** at `.github/workflows/l6-dogfood.yml` — triggers on tag pushes, PRs labeled `L6`, and manual dispatch. Soft-fails when the CLAUDE_CODE_OAUTH_TOKEN secret is absent (forks won't break). Uploads trajectory dumps on failure.

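
The core L6 check described above (recorded trajectory vs. expected sequence, plus hard invariants such as D-direct-mode's no-task-spawn rule) can be sketched in TypeScript as follows. This is a hypothetical illustration, not the shipped `l6_assert_trajectory` helper in `tests/dogfood/lib/flow-helpers.sh`, which may match differently. It assumes rows are read from `debug_trajectory` and ordered by `step_n`, that an expected file lists one call name per line (skipping blank lines and `#` comments is an extra assumption), and that matching is an ordered subsequence. `TrajectoryRow`, `parseExpected`, `matchTrajectory`, and `forbiddenCalls` are illustrative names.

```typescript
// Hypothetical sketch of an L6-style trajectory check; not the shipped
// l6_assert_trajectory helper. Names and matching rules are assumptions.

// Subset of a debug_trajectory row needed for matching.
interface TrajectoryRow {
  stepN: number;         // debug_trajectory.step_n
  toolOrMcpName: string; // debug_trajectory.tool_or_mcp_name
}

interface MatchResult {
  ok: boolean;
  // Expected entries from the first one that could not be matched onward.
  missing: string[];
}

// Parse an expected-trajectory file: one call name per line.
// Skipping blank lines and '#' comments is an assumption about the format.
function parseExpected(text: string): string[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith('#'));
}

// Ordered-subsequence match: every expected name must appear in step order;
// unrelated calls in between are tolerated (the real helper may be stricter).
function matchTrajectory(rows: TrajectoryRow[], expected: string[]): MatchResult {
  const ordered = [...rows].sort((a, b) => a.stepN - b.stepN);
  let next = 0;
  for (const row of ordered) {
    if (next < expected.length && row.toolOrMcpName === expected[next]) {
      next++;
    }
  }
  return { ok: next === expected.length, missing: expected.slice(next) };
}

// Hard invariant: return every recorded call whose name is forbidden,
// e.g. ['Task'] for D-direct-mode's no-task-spawn rule.
function forbiddenCalls(rows: TrajectoryRow[], forbidden: string[]): string[] {
  const banned = new Set(forbidden);
  return rows.map((r) => r.toolOrMcpName).filter((callName) => banned.has(callName));
}
```

In L6 terms, the rows would come from something like `SELECT step_n, tool_or_mcp_name FROM debug_trajectory ORDER BY step_n` on the scratch DB after `claude -p` exits. An ordered-subsequence match tolerates incidental Read/Bash calls that vary from run to run; an exact-sequence comparison would be the stricter alternative.
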
**Stale doctrine cleanup** (audit):
- Onboarding skill: fixed `tmb_bootstrap_complete` → `tmb_onboarding_complete`
- Agent-creator skill: dropped tmb_bootstrap ref (skill is gone)
- Plugin CLAUDE.md: removed retirement-in-progress note for tmb_bootstrap

## Unverified assumption (flagged in #108)

Not yet verified: how AskUserQuestion behaves in `claude -p` (headless) mode. If the form auto-fails in headless mode, the onboarding flow trajectory will be shorter than expected; that's a real signal to file as a follow-up.

## Tests

2 new schema unit tests (table presence + columns + index). All L1-L4 green. L0 will run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/l6-dogfood.yml | 72 +++++++++++
 CHANGELOG.md | 42 +++++++
 CLAUDE.md | 2 -
 hooks/hooks.json | 10 ++
 mcp/trajectory-server/dist/index.js | 34 +++++-
 mcp/trajectory-server/dist/index.js.map | 2 +-
 mcp/trajectory-server/dist/schema.sql | 21 ++++
 mcp/trajectory-server/dist/test/db.test.js | 3 +-
 .../dist/test/db.test.js.map | 2 +-
 .../dist/test/schema.test.js | 30 ++++-
 .../dist/test/schema.test.js.map | 2 +-
 mcp/trajectory-server/src/index.ts | 42 ++++++-
 mcp/trajectory-server/src/schema.sql | 21 ++++
 mcp/trajectory-server/src/test/db.test.ts | 3 +-
 mcp/trajectory-server/src/test/schema.test.ts | 42 ++++++-
 scripts/hooks/debug-trajectory.sh | 62 ++++++++++
 skills/tmb_agent-creator/SKILL.md | 2 +-
 skills/tmb_first-run-onboarding/SKILL.md | 3 +-
 tests/README.md | 61 +++++++---
 tests/dogfood/expected/01-onboarding.txt | 7 ++
 tests/dogfood/expected/02-simple-task.txt | 8 ++
 .../expected/95-anonymous-cold-restart.txt | 3 +
 tests/dogfood/expected/D-direct-mode.txt | 6 +
 tests/dogfood/fixtures/empty.sql | 2 +
 .../dogfood/fixtures/onboarding-anonymous.sql | 18 +++
 tests/dogfood/fixtures/onboarding-named.sql | 18 +++
 tests/dogfood/flows/01-onboarding.test.sh | 26 ++++
 tests/dogfood/flows/02-simple-task.test.sh | 21 ++++
 tests/dogfood/flows/03-difficult-task.test.sh | 27 +++++
 tests/dogfood/flows/04-agent-creator.test.sh | 27 +++++
 tests/dogfood/flows/05-skill-creation.test.sh | 27 +++++
 tests/dogfood/flows/06-push-gate.test.sh | 27 +++++
 .../flows/07-architecture-regen.test.sh | 27 +++++
 tests/dogfood/flows/08-swe-retry.test.sh | 27 +++++
 tests/dogfood/flows/09-roundtable.test.sh | 27 +++++
 tests/dogfood/flows/32-team-config.test.sh | 27 +++++
 tests/dogfood/flows/92-base-branch.test.sh | 27 +++++
 tests/dogfood/flows/94-arch-bootstrap.test.sh | 27 +++++
 .../flows/95-anonymous-cold-restart.test.sh | 32 +++++
 tests/dogfood/flows/96-halt-on-error.test.sh | 27 +++++
 tests/dogfood/flows/C-consultant.test.sh | 27 +++++
 tests/dogfood/flows/D-direct-mode.test.sh | 40 ++++++
 tests/dogfood/lib/flow-helpers.sh | 114 ++++++++++++++++++
 tests/dogfood/run-l6.sh | 73 +++++++++++
 44 files changed, 1120 insertions(+), 28 deletions(-)
 create mode 100644 .github/workflows/l6-dogfood.yml
 create mode 100755 scripts/hooks/debug-trajectory.sh
 create mode 100644 tests/dogfood/expected/01-onboarding.txt
 create mode 100644 tests/dogfood/expected/02-simple-task.txt
 create mode 100644 tests/dogfood/expected/95-anonymous-cold-restart.txt
 create mode 100644 tests/dogfood/expected/D-direct-mode.txt
 create mode 100644 tests/dogfood/fixtures/empty.sql
 create mode 100644 tests/dogfood/fixtures/onboarding-anonymous.sql
 create mode 100644 tests/dogfood/fixtures/onboarding-named.sql
 create mode 100755 tests/dogfood/flows/01-onboarding.test.sh
 create mode 100755 tests/dogfood/flows/02-simple-task.test.sh
 create mode 100755 tests/dogfood/flows/03-difficult-task.test.sh
 create mode 100755 tests/dogfood/flows/04-agent-creator.test.sh
 create mode 100755 tests/dogfood/flows/05-skill-creation.test.sh
 create mode 100755 tests/dogfood/flows/06-push-gate.test.sh
 create mode 100755 tests/dogfood/flows/07-architecture-regen.test.sh
 create mode 100755 tests/dogfood/flows/08-swe-retry.test.sh
 create mode 100755 tests/dogfood/flows/09-roundtable.test.sh
 create mode 100755 tests/dogfood/flows/32-team-config.test.sh
 create mode 100755 tests/dogfood/flows/92-base-branch.test.sh
 create mode 100755 tests/dogfood/flows/94-arch-bootstrap.test.sh
 create mode 100755 tests/dogfood/flows/95-anonymous-cold-restart.test.sh
 create mode 100755 tests/dogfood/flows/96-halt-on-error.test.sh
 create mode 100755 tests/dogfood/flows/C-consultant.test.sh
 create mode 100755 tests/dogfood/flows/D-direct-mode.test.sh
 create mode 100644 tests/dogfood/lib/flow-helpers.sh
 create mode 100755 tests/dogfood/run-l6.sh

diff --git a/.github/workflows/l6-dogfood.yml b/.github/workflows/l6-dogfood.yml
new file mode 100644
index 00000000..2d927807
--- /dev/null
+++ b/.github/workflows/l6-dogfood.yml
@@ -0,0 +1,72 @@
+name: L6 dogfood (deterministic trajectory)
+
+# L6 runs real Claude Code through pre-seeded TMB flows and asserts the
+# resulting MCP/tool trajectory matches FLOWS.md. Issue #108.
+#
+# Triggers:
+#   - manual via workflow_dispatch (always available)
+#   - tag pushes (every release gets a green/red signal)
+#   - PR labeled `L6` (opt-in for risky doctrine changes)
+#
+# Skipped on forks where the secret isn't available — the secret-presence
+# check fails-soft instead of breaking the run.
+#
+# Security note: untrusted PR-injected strings (titles, bodies, etc.) are
+# never interpolated into shell commands. Only secrets and trusted runner
+# inputs flow into `run:` blocks via env vars.
+
+on:
+  workflow_dispatch:
+  push:
+    tags:
+      - 'v*'
+  pull_request:
+    types: [labeled]
+
+jobs:
+  l6-dogfood:
+    if: ${{ github.event_name != 'pull_request' || github.event.label.name == 'L6' }}
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Verify secret is present
+        env:
+          TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+        run: |
+          if [ -z "$TOKEN" ]; then
+            echo "::warning::CLAUDE_CODE_OAUTH_TOKEN repo secret not set — L6 cannot run."
+            echo "Add the secret in Settings → Secrets and variables → Actions."
+            exit 0
+          fi
+          echo "Secret present."
+
+      - name: Setup Node 22
+        uses: actions/setup-node@v4
+        with:
+          node-version: '22'
+
+      - name: Setup Bun
+        uses: oven-sh/setup-bun@v2
+
+      - name: Install plugin deps + build dist/
+        run: bun install --frozen-lockfile
+
+      - name: Install Claude Code CLI
+        run: |
+          npm install -g @anthropic-ai/claude-code
+          claude --version
+
+      - name: Run L6 dogfood flows
+        env:
+          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+        run: bash tests/dogfood/run-l6.sh
+
+      - name: Upload trajectory dumps on failure
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: l6-trajectory-dumps
+          path: /tmp/tmb-l6-*/
+          retention-days: 7
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6790f71a..1137ce5f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -131,6 +131,48 @@ Net: 25 labels → 18. All open issues auto-relabeled in place.
 
 This is the **second** label migration in the v0.4.1 pre-stable window. Acceptable because no public consumers depend on the names yet — the rc channel hasn't promoted to stable.
 
+### Added — L6 deterministic-trajectory tests + opt-in debug_trajectory schema (issue #108)
+
+Manual L5 dogfood was the release bottleneck. L6 automates it by pre-seeding DB state, running real `claude -p`, and asserting the resulting MCP/tool trajectory matches the expected sequence from `FLOWS.md`. New layer in the test pyramid; existing L0–L5 unchanged.
+
+**New schema table** `debug_trajectory` (15th table):
+- Columns: `session_id`, `step_n`, `kind` (`mcp_call`/`tool_use`), `agent`, `tool_or_mcp_name`, `args_json`, `result_json`, `is_error`, `created_at`
+- **Off by default — populated only when env `TMB_DEBUG_TRAJECTORY=1`.** Nothing is recorded in production.
+- Schema version stays at 1 (additive change).
+
+**Capture wiring**:
+- MCP server (`src/index.ts`) writes a row per MCP tool call when env is set
+- New PreToolUse hook `scripts/hooks/debug-trajectory.sh` (`matcher: "*"`) writes a row per non-MCP tool call (Bash/Read/Write/Edit/Task/Skill)
+
+**Test infrastructure**:
+- `tests/dogfood/run-l6.sh` runner — checks env + tools, dispatches to flow scripts
+- `tests/dogfood/lib/flow-helpers.sh` — shared helpers (`l6_setup_scratch_project`, `l6_seed_db`, `l6_run_claude`, `l6_assert_trajectory`)
+- `tests/dogfood/flows/` — 16 flow scripts (4 fully wired, 12 scaffolded)
+- `tests/dogfood/fixtures/` — pre-seed SQL (empty, onboarding-named, onboarding-anonymous)
+- `tests/dogfood/expected/` — expected-trajectory files (one MCP/tool call per line)
+
+**4 fully wired flows** (have expected-trajectory files):
+- `01-onboarding` — first-run identity + config writes
+- `02-simple-task` — code-touching ask → triage simple → SWE spawn
+- `D-direct-mode` — ≤3-line typo fix → Edit + commit, no SWE spawn (with hard invariant assertions)
+- `95-anonymous-cold-restart` — regression for #95; cold session must skip re-onboarding
+
+**12 scaffolded flows** (auto-skip until their expected-trajectory file is authored): `03-difficult-task`, `04-agent-creator`, `05-skill-creation`, `06-push-gate`, `07-architecture-regen`, `08-swe-retry`, `09-roundtable`, `C-consultant`, `32-team-config`, `92-base-branch`, `94-arch-bootstrap`, `96-halt-on-error`.
+
+**CI workflow** `.github/workflows/l6-dogfood.yml`:
+- Triggers: tag pushes, PRs labeled `L6`, manual dispatch
+- Soft-fails when the `CLAUDE_CODE_OAUTH_TOKEN` secret is absent (fork runs won't go red)
+- Uploads trajectory dumps as artifacts on failure
+
+**Stale doctrine cleanup** (per the migration audit):
+- Onboarding skill: fixed event_type from stale `tmb_bootstrap_complete` → `tmb_onboarding_complete`; dropped reference to "file copies" (swe + pr-reviewer ship globally)
+- Agent-creator skill: dropped `tmb_bootstrap` reference (skill is gone in v0.3.0+)
+- Plugin CLAUDE.md: removed the "tmb_bootstrap is being retired" sentence (it's already retired)
+
+**Unverified assumption flagged in the issue**: how `AskUserQuestion` behaves in `claude -p` (headless) mode. If the form auto-fails in headless mode, that surfaces as a trajectory-shorter-than-expected failure on the onboarding flow — a real signal to address.
+
+2 new schema tests (table presence + columns + index). All L1-L4 green.
+
 ---
 
 ## v0.3.2 — 2026-04-25
diff --git a/CLAUDE.md b/CLAUDE.md
index 49d94cc4..0a6cacdd 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -85,8 +85,6 @@ Other (non-policy) `plugin_config` keys may be written directly when the Human a
 2. **Cache human_name** — use it when addressing the Human if set. Otherwise plain second-person; no honorifics.
 3. **Resume check** — call `issue_resume(agent='bro')` to detect unfinished work.
 
-There is no edge case for "swe.md missing" anymore — `swe` ships globally. The legacy `tmb_bootstrap` skill (recovery for hand-deleted local agents) is now unnecessary in v0.3.0+ and is being retired.
-
 ## Code-touching asks (in addition to first-action chain)
 
 Default chain (most asks):
diff --git a/hooks/hooks.json b/hooks/hooks.json
index ab412204..a92dad45 100644
--- a/hooks/hooks.json
+++ b/hooks/hooks.json
@@ -37,6 +37,16 @@
             "timeout": 5
           }
         ]
+      },
+      {
+        "matcher": "*",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/hooks/debug-trajectory.sh",
+            "timeout": 3
+          }
+        ]
       }
     ]
   }
diff --git a/mcp/trajectory-server/dist/index.js b/mcp/trajectory-server/dist/index.js
index 2b555691..0e2fa130 100644
--- a/mcp/trajectory-server/dist/index.js
+++ b/mcp/trajectory-server/dist/index.js
@@ -15,13 +15,45 @@ registerTools(server, db);
 server.setRequestHandler(ListToolsRequestSchema, async () => ({
     tools: toolDefinitions,
 }));
+// L6 trajectory capture (issue #108). Active only when TMB_DEBUG_TRAJECTORY=1.
+// Session ID is per-server-spawn — covers a single `claude -p` invocation.
+const debugTrajectoryEnabled = process.env['TMB_DEBUG_TRAJECTORY'] === '1';
+const debugSessionId = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
+let debugStepCounter = 0;
+function maybeRecordTrajectory(toolName, args, result) {
+    if (!debugTrajectoryEnabled)
+        return;
+    try {
+        const agentName = args?.agent ?? null;
+        const argsJson = JSON.stringify(args ?? {}).slice(0, 4000);
+        const firstContent = result.content?.[0];
+        const resultText = firstContent && typeof firstContent.text === 'string' ? firstContent.text : '';
+        const resultJson = JSON.stringify({ text: resultText.slice(0, 4000) });
+        db.run(`INSERT INTO debug_trajectory
+         (session_id, step_n, kind, agent, tool_or_mcp_name, args_json, result_json, is_error, created_at)
+       VALUES (?, ?, 'mcp_call', ?, ?, ?, ?, ?, datetime('now'))`, [
+            debugSessionId,
+            ++debugStepCounter,
+            agentName,
+            toolName,
+            argsJson,
+            resultJson,
+            result.isError ? 1 : 0,
+        ]);
+    }
+    catch {
+        // Trajectory capture must never break the actual tool call.
+ } +} server.setRequestHandler(CallToolRequestSchema, async (request) => { const { name, arguments: args } = request.params; const handler = toolHandlers[name]; if (!handler) { throw new Error(`Unknown tool: ${name}`); } - return handler(args ?? {}); + const result = await handler(args ?? {}); + maybeRecordTrajectory(name, args, result); + return result; }); process.on('SIGINT', () => { db.close(); diff --git a/mcp/trajectory-server/dist/index.js.map b/mcp/trajectory-server/dist/index.js.map index 33943212..91f175e1 100644 --- a/mcp/trajectory-server/dist/index.js.map +++ b/mcp/trajectory-server/dist/index.js.map @@ -1 +1 @@ -{"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,SAAS,EAAE,MAAM,SAAS,CAAC;AACpC,OAAO,EAAE,MAAM,EAAE,MAAM,2CAA2C,CAAC;AACnE,OAAO,EAAE,oBAAoB,EAAE,MAAM,2CAA2C,CAAC;AACjF,OAAO,EACL,sBAAsB,EACtB,qBAAqB,GACtB,MAAM,oCAAoC,CAAC;AAC5C,OAAO,EAAE,eAAe,EAAE,YAAY,EAAE,aAAa,EAAE,MAAM,kBAAkB,CAAC;AAChF,OAAO,EAAE,YAAY,EAAE,aAAa,EAAE,MAAM,SAAS,CAAC;AAEtD,MAAM,MAAM,GAAG,aAAa,EAAE,CAAC;AAC/B,IAAI,MAAM,KAAK,UAAU,EAAE,CAAC;IAC1B,SAAS,CAAC,IAAI,CAAC,OAAO,CAAC,MAAM,CAAC,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAC;AACvD,CAAC;AAED,MAAM,EAAE,GAAG,IAAI,YAAY,CAAC,MAAM,CAAC,CAAC;AAEpC,MAAM,MAAM,GAAG,IAAI,MAAM,CACvB,EAAE,IAAI,EAAE,mBAAmB,EAAE,OAAO,EAAE,OAAO,EAAE,EAC/C,EAAE,YAAY,EAAE,EAAE,KAAK,EAAE,EAAE,EAAE,EAAE,CAChC,CAAC;AAEF,aAAa,CAAC,MAAM,EAAE,EAAE,CAAC,CAAC;AAE1B,MAAM,CAAC,iBAAiB,CAAC,sBAAsB,EAAE,KAAK,IAAI,EAAE,CAAC,CAAC;IAC5D,KAAK,EAAE,eAAe;CACvB,CAAC,CAAC,CAAC;AAEJ,MAAM,CAAC,iBAAiB,CAAC,qBAAqB,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE;IAChE,MAAM,EAAE,IAAI,EAAE,SAAS,EAAE,IAAI,EAAE,GAAG,OAAO,CAAC,MAAM,CAAC;IACjD,MAAM,OAAO,GAAG,YAAY,CAAC,IAAI,CAAC,CAAC;IACnC,IAAI,CAAC,OAAO,EAAE,CAAC;QACb,MAAM,IAAI,KAAK,CAAC,iBAAiB,IAAI,EAAE,CAAC,CAAC;IAC3C,CAAC;IACD,OAAO,OAAO,CAAC,IAAI,IAAI,EAAE,CAAC,CAAC;AAC7B,CAAC,CAAC,CAAC;AAEH,OAAO,CAAC,EAAE,CAAC,QAAQ,EAAE,GAAG,EAAE;IACxB,EAAE,CAAC,KAAK,EAAE,CAAC;IACX
,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;AAClB,CAAC,CAAC,CAAC;AAEH,OAAO,CAAC,EAAE,CAAC,SAAS,EAAE,GAAG,EAAE;IACzB,EAAE,CAAC,KAAK,EAAE,CAAC;IACX,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;AAClB,CAAC,CAAC,CAAC;AAEH,MAAM,SAAS,GAAG,IAAI,oBAAoB,EAAE,CAAC;AAC7C,MAAM,MAAM,CAAC,OAAO,CAAC,SAAS,CAAC,CAAC;AAEhC,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,uBAAuB,MAAM,KAAK,CAAC,CAAC"} \ No newline at end of file +{"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,SAAS,EAAE,MAAM,SAAS,CAAC;AACpC,OAAO,EAAE,MAAM,EAAE,MAAM,2CAA2C,CAAC;AACnE,OAAO,EAAE,oBAAoB,EAAE,MAAM,2CAA2C,CAAC;AACjF,OAAO,EACL,sBAAsB,EACtB,qBAAqB,GACtB,MAAM,oCAAoC,CAAC;AAC5C,OAAO,EAAE,eAAe,EAAE,YAAY,EAAE,aAAa,EAAE,MAAM,kBAAkB,CAAC;AAChF,OAAO,EAAE,YAAY,EAAE,aAAa,EAAE,MAAM,SAAS,CAAC;AAEtD,MAAM,MAAM,GAAG,aAAa,EAAE,CAAC;AAC/B,IAAI,MAAM,KAAK,UAAU,EAAE,CAAC;IAC1B,SAAS,CAAC,IAAI,CAAC,OAAO,CAAC,MAAM,CAAC,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAC;AACvD,CAAC;AAED,MAAM,EAAE,GAAG,IAAI,YAAY,CAAC,MAAM,CAAC,CAAC;AAEpC,MAAM,MAAM,GAAG,IAAI,MAAM,CACvB,EAAE,IAAI,EAAE,mBAAmB,EAAE,OAAO,EAAE,OAAO,EAAE,EAC/C,EAAE,YAAY,EAAE,EAAE,KAAK,EAAE,EAAE,EAAE,EAAE,CAChC,CAAC;AAEF,aAAa,CAAC,MAAM,EAAE,EAAE,CAAC,CAAC;AAE1B,MAAM,CAAC,iBAAiB,CAAC,sBAAsB,EAAE,KAAK,IAAI,EAAE,CAAC,CAAC;IAC5D,KAAK,EAAE,eAAe;CACvB,CAAC,CAAC,CAAC;AAEJ,+EAA+E;AAC/E,2EAA2E;AAC3E,MAAM,sBAAsB,GAAG,OAAO,CAAC,GAAG,CAAC,sBAAsB,CAAC,KAAK,GAAG,CAAC;AAC3E,MAAM,cAAc,GAAG,GAAG,IAAI,CAAC,GAAG,EAAE,IAAI,IAAI,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,EAAE,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,EAAE,CAAC;AACjF,IAAI,gBAAgB,GAAG,CAAC,CAAC;AAEzB,SAAS,qBAAqB,CAC5B,QAAgB,EAChB,IAAa,EACb,MAA+D;IAE/D,IAAI,CAAC,sBAAsB;QAAE,OAAO;IACpC,IAAI,CAAC;QACH,MAAM,SAAS,GAAI,IAAuC,EAAE,KAAK,IAAI,IAAI,CAAC;QAC1E,MAAM,QAAQ,GAAG,IAAI,CAAC,SAAS,CAAC,IAAI,IAAI,EAAE,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,IAAI,CAAC,CAAC;QAC3D,MAAM,YAAY,GAAG,MAAM,CAAC,OAAO,EAAE,CAAC,CAAC,CAAmC,CAAC;QAC3E,MAAM,UAAU,GACd,YAAY,IAAI,OAAO,YAAY,CAAC,IAAI,KAAK,QAAQ,CAAC,CAAC,CAAC,YAAY,CAAC,IAAI,CAAC,
CAAC,CAAC,EAAE,CAAC;QACjF,MAAM,UAAU,GAAG,IAAI,CAAC,SAAS,CAAC,EAAE,IAAI,EAAE,UAAU,CAAC,KAAK,CAAC,CAAC,EAAE,IAAI,CAAC,EAAE,CAAC,CAAC;QACvE,EAAE,CAAC,GAAG,CACJ;;iEAE2D,EAC3D;YACE,cAAc;YACd,EAAE,gBAAgB;YAClB,SAAS;YACT,QAAQ;YACR,QAAQ;YACR,UAAU;YACV,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC;SACvB,CACF,CAAC;IACJ,CAAC;IAAC,MAAM,CAAC;QACP,4DAA4D;IAC9D,CAAC;AACH,CAAC;AAED,MAAM,CAAC,iBAAiB,CAAC,qBAAqB,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE;IAChE,MAAM,EAAE,IAAI,EAAE,SAAS,EAAE,IAAI,EAAE,GAAG,OAAO,CAAC,MAAM,CAAC;IACjD,MAAM,OAAO,GAAG,YAAY,CAAC,IAAI,CAAC,CAAC;IACnC,IAAI,CAAC,OAAO,EAAE,CAAC;QACb,MAAM,IAAI,KAAK,CAAC,iBAAiB,IAAI,EAAE,CAAC,CAAC;IAC3C,CAAC;IACD,MAAM,MAAM,GAAG,MAAM,OAAO,CAAC,IAAI,IAAI,EAAE,CAAC,CAAC;IACzC,qBAAqB,CAAC,IAAI,EAAE,IAAI,EAAE,MAAM,CAAC,CAAC;IAC1C,OAAO,MAAM,CAAC;AAChB,CAAC,CAAC,CAAC;AAEH,OAAO,CAAC,EAAE,CAAC,QAAQ,EAAE,GAAG,EAAE;IACxB,EAAE,CAAC,KAAK,EAAE,CAAC;IACX,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;AAClB,CAAC,CAAC,CAAC;AAEH,OAAO,CAAC,EAAE,CAAC,SAAS,EAAE,GAAG,EAAE;IACzB,EAAE,CAAC,KAAK,EAAE,CAAC;IACX,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;AAClB,CAAC,CAAC,CAAC;AAEH,MAAM,SAAS,GAAG,IAAI,oBAAoB,EAAE,CAAC;AAC7C,MAAM,MAAM,CAAC,OAAO,CAAC,SAAS,CAAC,CAAC;AAEhC,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,uBAAuB,MAAM,KAAK,CAAC,CAAC"} \ No newline at end of file diff --git a/mcp/trajectory-server/dist/schema.sql b/mcp/trajectory-server/dist/schema.sql index b092a371..86c08aa3 100644 --- a/mcp/trajectory-server/dist/schema.sql +++ b/mcp/trajectory-server/dist/schema.sql @@ -164,3 +164,24 @@ CREATE TABLE IF NOT EXISTS regen_state ( last_seen_sha TEXT, notes TEXT NOT NULL DEFAULT '' ); + +-- L6 deterministic-trajectory test infrastructure (issue #108). +-- Populated ONLY when env TMB_DEBUG_TRAJECTORY=1. Off by default — zero +-- overhead in production. The L6 test runner pre-seeds DB state, runs +-- claude -p with the env set, then asserts the resulting trajectory +-- matches an expected sequence from FLOWS.md. 
+CREATE TABLE IF NOT EXISTS debug_trajectory (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  session_id TEXT NOT NULL,
+  step_n INTEGER NOT NULL,
+  kind TEXT NOT NULL, -- 'mcp_call' | 'tool_use' | 'agent_thinking'
+  agent TEXT, -- 'bro' | 'swe' | 'pr-reviewer' | NULL
+  tool_or_mcp_name TEXT NOT NULL, -- e.g. 'mcp__plugin_tmb_trajectory-server__identity_get' or 'Bash'
+  args_json TEXT NOT NULL DEFAULT '{}',
+  result_json TEXT NOT NULL DEFAULT '{}',
+  is_error INTEGER NOT NULL DEFAULT 0,
+  created_at TEXT NOT NULL DEFAULT (datetime('now'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_debug_trajectory_session
+  ON debug_trajectory(session_id, step_n);
diff --git a/mcp/trajectory-server/dist/test/db.test.js b/mcp/trajectory-server/dist/test/db.test.js
index 8dfc7c89..312de2ad 100644
--- a/mcp/trajectory-server/dist/test/db.test.js
+++ b/mcp/trajectory-server/dist/test/db.test.js
@@ -3,7 +3,7 @@ import assert from 'node:assert/strict';
 import { tempDB } from './helpers.js';
 import { nowISO, genId } from '../db.js';
 describe('TrajectoryDB', () => {
-    it('opens an in-memory DB and verifies all 14 tables exist with schema_version=1', () => {
+    it('opens an in-memory DB and verifies all 15 tables exist with schema_version=1', () => {
         const db = tempDB();
         const expectedTables = [
             'issues',
@@ -20,6 +20,7 @@ describe('TrajectoryDB', () => {
             'plugin_config',
             'identity',
             'regen_state',
+            'debug_trajectory',
         ];
         const rows = db.all("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name");
         const actualNames = rows.map((r) => r.name).sort();
diff --git a/mcp/trajectory-server/dist/test/db.test.js.map b/mcp/trajectory-server/dist/test/db.test.js.map
index 7d684ead..f9837d68 100644
--- a/mcp/trajectory-server/dist/test/db.test.js.map
+++ b/mcp/trajectory-server/dist/test/db.test.js.map
@@ -1 +1 @@
-{"version":3,"file":"db.test.js","sourceRoot":"","sources":["../../src/test/db.test.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,EAAE,EAAE,MAAM,WAAW,CAAC;AACzC,OAAO,MAAM,MAAM,oBAAoB,CAAC;AACxC,OAAO,EAAE,MAAM,EAAE,MAAM,cAAc,CAAC;AACtC,OAAO,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,UAAU,CAAC;AAEzC,QAAQ,CAAC,cAAc,EAAE,GAAG,EAAE;IAC5B,EAAE,CAAC,8EAA8E,EAAE,GAAG,EAAE;QACtF,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,cAAc,GAAG;YACrB,QAAQ;YACR,OAAO;YACP,QAAQ;YACR,OAAO;YACP,qBAAqB;YACrB,QAAQ;YACR,aAAa;YACb,kBAAkB;YAClB,aAAa;YACb,aAAa;YACb,eAAe;YACf,eAAe;YACf,UAAU;YACV,aAAa;SACd,CAAC;QAEF,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,8FAA8F,CAC/F,CAAC;QACF,MAAM,WAAW,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC;QACnD,MAAM,cAAc,GAAG,CAAC,GAAG,cAAc,CAAC,CAAC,IAAI,EAAE,CAAC;QAElD,MAAM,CAAC,SAAS,CAAC,WAAW,EAAE,cAAc,CAAC,CAAC;QAE9C,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,gDAAgD,CACjD,CAAC;QACF,MAAM,CAAC,EAAE,CAAC,IAAI,KAAK,SAAS,EAAE,+BAA+B,CAAC,CAAC;QAC/D,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,cAAc,EAAE,CAAC,CAAC,CAAC;QAErC,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,0EAA0E,EAAE,GAAG,EAAE;QAClF,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,MAAM,GAAG,GAAG,MAAM,EAAE,CAAC;QAErB,EAAE,CAAC,GAAG,CACJ;8BACwB,EACxB,CAAC,SAAS,EAAE,SAAS,EAAE,YAAY,EAAE,GAAG,EAAE,GAAG,CAAC,CAC/C,CAAC;QACF,EAAE,CAAC,GAAG,CACJ;8BACwB,EACxB,CAAC,SAAS,EAAE,SAAS,EAAE,YAAY,EAAE,GAAG,EAAE,GAAG,CAAC,CAC/C,CAAC;QAEF,MAAM,MAAM,GAAG,EAAE,CAAC,GAAG,CACnB,qDAAqD,EACrD,CAAC,SAAS,CAAC,CACZ,CAAC;QACF,MAAM,CAAC,EAAE,CAAC,MAAM,KAAK,SAAS,CAAC,CAAC;QAChC,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QACrC,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,WAAW,EAAE,SAAS,CAAC,CAAC;QAE5C,MAAM,GAAG,GAAG,EAAE,CAAC,GAAG,CAChB,uCAAuC,CACxC,CAAC;QACF,MAAM,CAAC,KAAK,CAAC,GAAG,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAC5B,MAAM,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QACrC,MAAM,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QAErC,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,wCAAwC,
EAAE,GAAG,EAAE;QAChD,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,MAAM,GAAG,GAAG,MAAM,EAAE,CAAC;QAErB,MAAM,CAAC,MAAM,CAAC,GAAG,EAAE;YACjB,EAAE,CAAC,WAAW,CAAC,GAAG,EAAE;gBAClB,EAAE,CAAC,GAAG,CACJ;kCACwB,EACxB,CAAC,gBAAgB,EAAE,oBAAoB,EAAE,YAAY,EAAE,GAAG,EAAE,GAAG,CAAC,CACjE,CAAC;gBACF,MAAM,IAAI,KAAK,CAAC,iBAAiB,CAAC,CAAC;YACrC,CAAC,CAAC,CAAC;QACL,CAAC,EAAE,iBAAiB,CAAC,CAAC;QAEtB,MAAM,GAAG,GAAG,EAAE,CAAC,GAAG,CAChB,wCAAwC,EACxC,CAAC,gBAAgB,CAAC,CACnB,CAAC;QACF,MAAM,CAAC,KAAK,CAAC,GAAG,EAAE,SAAS,EAAE,qCAAqC,CAAC,CAAC;QAEpE,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,iDAAiD,EAAE,GAAG,EAAE;QACzD,MAAM,GAAG,GAAG,MAAM,EAAE,CAAC;QACrB,MAAM,CAAC,KAAK,CAAC,GAAG,EAAE,6CAA6C,CAAC,CAAC;IACnE,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gFAAgF,EAAE,GAAG,EAAE;QACxF,MAAM,GAAG,GAAG,KAAK,CAAC,IAAI,CAAC,EAAE,MAAM,EAAE,GAAG,EAAE,EAAE,GAAG,EAAE,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,CAAC;QAC5D,KAAK,MAAM,EAAE,IAAI,GAAG,EAAE,CAAC;YACrB,MAAM,CAAC,EAAE,CAAC,EAAE,CAAC,UAAU,CAAC,MAAM,CAAC,EAAE,qCAAqC,EAAE,EAAE,CAAC,CAAC;QAC9E,CAAC;QACD,MAAM,MAAM,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC;QAC5B,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,EAAE,GAAG,EAAE,4BAA4B,CAAC,CAAC;IAC/D,CAAC,CAAC,CAAC;AACL,CAAC,CAAC,CAAC"} \ No newline at end of file 
+{"version":3,"file":"db.test.js","sourceRoot":"","sources":["../../src/test/db.test.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,EAAE,EAAE,MAAM,WAAW,CAAC;AACzC,OAAO,MAAM,MAAM,oBAAoB,CAAC;AACxC,OAAO,EAAE,MAAM,EAAE,MAAM,cAAc,CAAC;AACtC,OAAO,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,UAAU,CAAC;AAEzC,QAAQ,CAAC,cAAc,EAAE,GAAG,EAAE;IAC5B,EAAE,CAAC,8EAA8E,EAAE,GAAG,EAAE;QACtF,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,cAAc,GAAG;YACrB,QAAQ;YACR,OAAO;YACP,QAAQ;YACR,OAAO;YACP,qBAAqB;YACrB,QAAQ;YACR,aAAa;YACb,kBAAkB;YAClB,aAAa;YACb,aAAa;YACb,eAAe;YACf,eAAe;YACf,UAAU;YACV,aAAa;YACb,kBAAkB;SACnB,CAAC;QAEF,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,8FAA8F,CAC/F,CAAC;QACF,MAAM,WAAW,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC;QACnD,MAAM,cAAc,GAAG,CAAC,GAAG,cAAc,CAAC,CAAC,IAAI,EAAE,CAAC;QAElD,MAAM,CAAC,SAAS,CAAC,WAAW,EAAE,cAAc,CAAC,CAAC;QAE9C,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,gDAAgD,CACjD,CAAC;QACF,MAAM,CAAC,EAAE,CAAC,IAAI,KAAK,SAAS,EAAE,+BAA+B,CAAC,CAAC;QAC/D,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,cAAc,EAAE,CAAC,CAAC,CAAC;QAErC,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,0EAA0E,EAAE,GAAG,EAAE;QAClF,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,MAAM,GAAG,GAAG,MAAM,EAAE,CAAC;QAErB,EAAE,CAAC,GAAG,CACJ;8BACwB,EACxB,CAAC,SAAS,EAAE,SAAS,EAAE,YAAY,EAAE,GAAG,EAAE,GAAG,CAAC,CAC/C,CAAC;QACF,EAAE,CAAC,GAAG,CACJ;8BACwB,EACxB,CAAC,SAAS,EAAE,SAAS,EAAE,YAAY,EAAE,GAAG,EAAE,GAAG,CAAC,CAC/C,CAAC;QAEF,MAAM,MAAM,GAAG,EAAE,CAAC,GAAG,CACnB,qDAAqD,EACrD,CAAC,SAAS,CAAC,CACZ,CAAC;QACF,MAAM,CAAC,EAAE,CAAC,MAAM,KAAK,SAAS,CAAC,CAAC;QAChC,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QACrC,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,WAAW,EAAE,SAAS,CAAC,CAAC;QAE5C,MAAM,GAAG,GAAG,EAAE,CAAC,GAAG,CAChB,uCAAuC,CACxC,CAAC;QACF,MAAM,CAAC,KAAK,CAAC,GAAG,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAC5B,MAAM,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QACrC,MAAM,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QAErC,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE
,CAAC,wCAAwC,EAAE,GAAG,EAAE;QAChD,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,MAAM,GAAG,GAAG,MAAM,EAAE,CAAC;QAErB,MAAM,CAAC,MAAM,CAAC,GAAG,EAAE;YACjB,EAAE,CAAC,WAAW,CAAC,GAAG,EAAE;gBAClB,EAAE,CAAC,GAAG,CACJ;kCACwB,EACxB,CAAC,gBAAgB,EAAE,oBAAoB,EAAE,YAAY,EAAE,GAAG,EAAE,GAAG,CAAC,CACjE,CAAC;gBACF,MAAM,IAAI,KAAK,CAAC,iBAAiB,CAAC,CAAC;YACrC,CAAC,CAAC,CAAC;QACL,CAAC,EAAE,iBAAiB,CAAC,CAAC;QAEtB,MAAM,GAAG,GAAG,EAAE,CAAC,GAAG,CAChB,wCAAwC,EACxC,CAAC,gBAAgB,CAAC,CACnB,CAAC;QACF,MAAM,CAAC,KAAK,CAAC,GAAG,EAAE,SAAS,EAAE,qCAAqC,CAAC,CAAC;QAEpE,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,iDAAiD,EAAE,GAAG,EAAE;QACzD,MAAM,GAAG,GAAG,MAAM,EAAE,CAAC;QACrB,MAAM,CAAC,KAAK,CAAC,GAAG,EAAE,6CAA6C,CAAC,CAAC;IACnE,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gFAAgF,EAAE,GAAG,EAAE;QACxF,MAAM,GAAG,GAAG,KAAK,CAAC,IAAI,CAAC,EAAE,MAAM,EAAE,GAAG,EAAE,EAAE,GAAG,EAAE,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,CAAC;QAC5D,KAAK,MAAM,EAAE,IAAI,GAAG,EAAE,CAAC;YACrB,MAAM,CAAC,EAAE,CAAC,EAAE,CAAC,UAAU,CAAC,MAAM,CAAC,EAAE,qCAAqC,EAAE,EAAE,CAAC,CAAC;QAC9E,CAAC;QACD,MAAM,MAAM,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC;QAC5B,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,EAAE,GAAG,EAAE,4BAA4B,CAAC,CAAC;IAC/D,CAAC,CAAC,CAAC;AACL,CAAC,CAAC,CAAC"} \ No newline at end of file diff --git a/mcp/trajectory-server/dist/test/schema.test.js b/mcp/trajectory-server/dist/test/schema.test.js index 1e48c608..6d93edf7 100644 --- a/mcp/trajectory-server/dist/test/schema.test.js +++ b/mcp/trajectory-server/dist/test/schema.test.js @@ -2,7 +2,7 @@ import { describe, it } from 'node:test'; import assert from 'node:assert/strict'; import { tempDB } from './helpers.js'; describe('schema — current table set, default values, constraints', () => { - it('fresh DB contains all 14 tables', () => { + it('fresh DB contains all 15 tables', () => { const db = tempDB(); const expectedTables = [ 'issues', @@ -19,6 +19,7 @@ describe('schema — current table set, default values, constraints', () => { 'plugin_config', 'identity', 'regen_state', + 'debug_trajectory', ]; const rows = 
db.all("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"); const actualNames = rows.map((r) => r.name).sort(); @@ -78,6 +79,33 @@ describe('schema — current table set, default values, constraints', () => { assert.equal(rows.length, 0); db.close(); }); + it('debug_trajectory has zero rows on init (issue #108)', () => { + const db = tempDB(); + const rows = db.all('SELECT * FROM debug_trajectory'); + assert.equal(rows.length, 0); + db.close(); + }); + it('debug_trajectory has expected columns + index (issue #108)', () => { + const db = tempDB(); + const cols = db.all('PRAGMA table_info(debug_trajectory)'); + const colNames = cols.map((c) => c.name).sort(); + assert.deepEqual(colNames, [ + 'agent', + 'args_json', + 'created_at', + 'id', + 'is_error', + 'kind', + 'result_json', + 'session_id', + 'step_n', + 'tool_or_mcp_name', + ]); + const indexes = db.all("SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='debug_trajectory'"); + const indexNames = indexes.map((i) => i.name); + assert.ok(indexNames.includes('idx_debug_trajectory_session'), 'session-step index must exist for L6 reads'); + db.close(); + }); it('identity CHECK constraint rejects a second row with id != 1', () => { const db = tempDB(); const now = new Date().toISOString(); diff --git a/mcp/trajectory-server/dist/test/schema.test.js.map b/mcp/trajectory-server/dist/test/schema.test.js.map index 1da469a7..c314ea12 100644 --- a/mcp/trajectory-server/dist/test/schema.test.js.map +++ b/mcp/trajectory-server/dist/test/schema.test.js.map @@ -1 +1 @@ 
-{"version":3,"file":"schema.test.js","sourceRoot":"","sources":["../../src/test/schema.test.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,EAAE,EAAE,MAAM,WAAW,CAAC;AACzC,OAAO,MAAM,MAAM,oBAAoB,CAAC;AACxC,OAAO,EAAE,MAAM,EAAE,MAAM,cAAc,CAAC;AAEtC,QAAQ,CAAC,yDAAyD,EAAE,GAAG,EAAE;IACvE,EAAE,CAAC,iCAAiC,EAAE,GAAG,EAAE;QACzC,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,cAAc,GAAG;YACrB,QAAQ;YACR,OAAO;YACP,QAAQ;YACR,OAAO;YACP,qBAAqB;YACrB,QAAQ;YACR,aAAa;YACb,kBAAkB;YAClB,aAAa;YACb,aAAa;YACb,eAAe;YACf,eAAe;YACf,UAAU;YACV,aAAa;SACd,CAAC;QAEF,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,8FAA8F,CAC/F,CAAC;QACF,MAAM,WAAW,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC;QACnD,MAAM,CAAC,SAAS,CAAC,WAAW,EAAE,CAAC,GAAG,cAAc,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC;QAE1D,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gDAAgD,EAAE,GAAG,EAAE;QACxD,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,gDAAgD,CACjD,CAAC;QACF,MAAM,CAAC,EAAE,CAAC,IAAI,KAAK,SAAS,EAAE,kCAAkC,CAAC,CAAC;QAClE,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,cAAc,EAAE,CAAC,CAAC,CAAC;QAErC,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,4DAA4D,EAAE,GAAG,EAAE;QACpE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAA8C,0BAA0B,CAAC,CAAC;QAC7F,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,WAAW,CAAC,CAAC;QAC1D,MAAM,CAAC,EAAE,CAAC,QAAQ,KAAK,SAAS,EAAE,sCAAsC,CAAC,CAAC;QAC1E,MAAM,CAAC,KAAK,CAAC,QAAQ,CAAC,UAAU,EAAE,IAAI,EAAE,wCAAwC,CAAC,CAAC;QAElF,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,6DAA6D,EAAE,GAAG,EAAE;QACrE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,wCAAwC,CACzC,CAAC;QACF,MAAM,MAAM,GAAG,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,SAAS,CAAC,CAAC;QACtD,MAAM,CAAC,EAAE,CAAC,MAAM,KAAK,SAAS,EAAE,2BAA2B,CAAC,CAAC;QAC7D,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,CAAC,WAAW,EAAE,EAAE,SAAS,EAAE,yBAAyB,CAAC,CAAC;QAC9E,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,OAAO,EAAE,CAAC,EAAE,0B
AA0B,CAAC,CAAC;QAE5D,MAAM,GAAG,GAAG,EAAE,CAAC,GAAG,CAChB,8CAA8C,CAC/C,CAAC;QACF,MAAM,EAAE,GAAG,GAAG,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,SAAS,CAAC,CAAC;QACjD,MAAM,CAAC,EAAE,CAAC,EAAE,KAAK,SAAS,EAAE,iCAAiC,CAAC,CAAC;QAC/D,MAAM,CAAC,KAAK,CAAC,EAAE,CAAC,KAAK,EAAE,OAAO,CAAC,CAAC;QAChC,MAAM,CAAC,KAAK,CAAC,EAAE,CAAC,EAAE,EAAE,IAAI,CAAC,CAAC;QAE1B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gCAAgC,EAAE,GAAG,EAAE;QACxC,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,wBAAwB,CAAC,CAAC;QAC9C,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,qCAAqC,EAAE,GAAG,EAAE;QAC7C,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,6BAA6B,CAAC,CAAC;QACnD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,mCAAmC,EAAE,GAAG,EAAE;QAC3C,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,2BAA2B,CAAC,CAAC;QACjD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,qCAAqC,EAAE,GAAG,EAAE;QAC7C,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,6BAA6B,CAAC,CAAC;QACnD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,6DAA6D,EAAE,GAAG,EAAE;QACrE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,MAAM,GAAG,GAAG,IAAI,IAAI,EAAE,CAAC,WAAW,EAAE,CAAC;QAErC,EAAE,CAAC,GAAG,CACJ,yFAAyF,EACzF,CAAC,GAAG,EAAE,GAAG,CAAC,CACX,CAAC;QAEF,MAAM,CAAC,MAAM,CACX,GAAG,EAAE;YACH,EAAE,CAAC,GAAG,CACJ,uFAAuF,EACvF,CAAC,GAAG,EAAE,GAAG,CAAC,CACX,CAAC;QACJ,CAAC,EACD,yBAAyB,CAC1B,CAAC;QAEF,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;AACL,CAAC,CAAC,CAAC"} \ No newline at end of file 
+{"version":3,"file":"schema.test.js","sourceRoot":"","sources":["../../src/test/schema.test.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,EAAE,EAAE,MAAM,WAAW,CAAC;AACzC,OAAO,MAAM,MAAM,oBAAoB,CAAC;AACxC,OAAO,EAAE,MAAM,EAAE,MAAM,cAAc,CAAC;AAEtC,QAAQ,CAAC,yDAAyD,EAAE,GAAG,EAAE;IACvE,EAAE,CAAC,iCAAiC,EAAE,GAAG,EAAE;QACzC,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,cAAc,GAAG;YACrB,QAAQ;YACR,OAAO;YACP,QAAQ;YACR,OAAO;YACP,qBAAqB;YACrB,QAAQ;YACR,aAAa;YACb,kBAAkB;YAClB,aAAa;YACb,aAAa;YACb,eAAe;YACf,eAAe;YACf,UAAU;YACV,aAAa;YACb,kBAAkB;SACnB,CAAC;QAEF,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,8FAA8F,CAC/F,CAAC;QACF,MAAM,WAAW,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC;QACnD,MAAM,CAAC,SAAS,CAAC,WAAW,EAAE,CAAC,GAAG,cAAc,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC;QAE1D,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gDAAgD,EAAE,GAAG,EAAE;QACxD,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,gDAAgD,CACjD,CAAC;QACF,MAAM,CAAC,EAAE,CAAC,IAAI,KAAK,SAAS,EAAE,kCAAkC,CAAC,CAAC;QAClE,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,cAAc,EAAE,CAAC,CAAC,CAAC;QAErC,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,4DAA4D,EAAE,GAAG,EAAE;QACpE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAA8C,0BAA0B,CAAC,CAAC;QAC7F,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,WAAW,CAAC,CAAC;QAC1D,MAAM,CAAC,EAAE,CAAC,QAAQ,KAAK,SAAS,EAAE,sCAAsC,CAAC,CAAC;QAC1E,MAAM,CAAC,KAAK,CAAC,QAAQ,CAAC,UAAU,EAAE,IAAI,EAAE,wCAAwC,CAAC,CAAC;QAElF,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,6DAA6D,EAAE,GAAG,EAAE;QACrE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CACjB,wCAAwC,CACzC,CAAC;QACF,MAAM,MAAM,GAAG,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,SAAS,CAAC,CAAC;QACtD,MAAM,CAAC,EAAE,CAAC,MAAM,KAAK,SAAS,EAAE,2BAA2B,CAAC,CAAC;QAC7D,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,CAAC,WAAW,EAAE,EAAE,SAAS,EAAE,yBAAyB,CAAC,CAAC;QAC9E,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,OAAO,EAAE
,CAAC,EAAE,0BAA0B,CAAC,CAAC;QAE5D,MAAM,GAAG,GAAG,EAAE,CAAC,GAAG,CAChB,8CAA8C,CAC/C,CAAC;QACF,MAAM,EAAE,GAAG,GAAG,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,KAAK,SAAS,CAAC,CAAC;QACjD,MAAM,CAAC,EAAE,CAAC,EAAE,KAAK,SAAS,EAAE,iCAAiC,CAAC,CAAC;QAC/D,MAAM,CAAC,KAAK,CAAC,EAAE,CAAC,KAAK,EAAE,OAAO,CAAC,CAAC;QAChC,MAAM,CAAC,KAAK,CAAC,EAAE,CAAC,EAAE,EAAE,IAAI,CAAC,CAAC;QAE1B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gCAAgC,EAAE,GAAG,EAAE;QACxC,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,wBAAwB,CAAC,CAAC;QAC9C,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,qCAAqC,EAAE,GAAG,EAAE;QAC7C,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,6BAA6B,CAAC,CAAC;QACnD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,mCAAmC,EAAE,GAAG,EAAE;QAC3C,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,2BAA2B,CAAC,CAAC;QACjD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,qCAAqC,EAAE,GAAG,EAAE;QAC7C,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,6BAA6B,CAAC,CAAC;QACnD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,qDAAqD,EAAE,GAAG,EAAE;QAC7D,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAC,gCAAgC,CAAC,CAAC;QACtD,MAAM,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC,CAAC;QAE7B,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,4DAA4D,EAAE,GAAG,EAAE;QACpE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QAEpB,MAAM,IAAI,GAAG,EAAE,CAAC,GAAG,CAAmB,qCAAqC,CAAC,CAAC;QAC7E,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,IAAI,EAAE,CAAC;QAChD,MAAM,CAAC,SAAS,CAAC,QAAQ,EAAE;YACzB,OAAO;YACP,WAAW;YACX,YAAY;YACZ,IAAI;YACJ,UAAU;YACV,MAAM;YACN,aAAa;YACb,YAAY;YACZ,Q
AAQ;YACR,kBAAkB;SACnB,CAAC,CAAC;QAEH,MAAM,OAAO,GAAG,EAAE,CAAC,GAAG,CACpB,mFAAmF,CACpF,CAAC;QACF,MAAM,UAAU,GAAG,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;QAC9C,MAAM,CAAC,EAAE,CACP,UAAU,CAAC,QAAQ,CAAC,8BAA8B,CAAC,EACnD,4CAA4C,CAC7C,CAAC;QAEF,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,6DAA6D,EAAE,GAAG,EAAE;QACrE,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,MAAM,GAAG,GAAG,IAAI,IAAI,EAAE,CAAC,WAAW,EAAE,CAAC;QAErC,EAAE,CAAC,GAAG,CACJ,yFAAyF,EACzF,CAAC,GAAG,EAAE,GAAG,CAAC,CACX,CAAC;QAEF,MAAM,CAAC,MAAM,CACX,GAAG,EAAE;YACH,EAAE,CAAC,GAAG,CACJ,uFAAuF,EACvF,CAAC,GAAG,EAAE,GAAG,CAAC,CACX,CAAC;QACJ,CAAC,EACD,yBAAyB,CAC1B,CAAC;QAEF,EAAE,CAAC,KAAK,EAAE,CAAC;IACb,CAAC,CAAC,CAAC;AACL,CAAC,CAAC,CAAC"} \ No newline at end of file diff --git a/mcp/trajectory-server/src/index.ts b/mcp/trajectory-server/src/index.ts index fece8ffc..773bef0a 100644 --- a/mcp/trajectory-server/src/index.ts +++ b/mcp/trajectory-server/src/index.ts @@ -27,13 +27,53 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: toolDefinitions, })); +// L6 trajectory capture (issue #108). Active only when TMB_DEBUG_TRAJECTORY=1. +// Session ID is per-server-spawn — covers a single `claude -p` invocation. +const debugTrajectoryEnabled = process.env['TMB_DEBUG_TRAJECTORY'] === '1'; +const debugSessionId = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; +let debugStepCounter = 0; + +function maybeRecordTrajectory( + toolName: string, + args: unknown, + result: { content?: ReadonlyArray; isError?: boolean }, +): void { + if (!debugTrajectoryEnabled) return; + try { + const agentName = (args as { agent?: string } | undefined)?.agent ?? null; + const argsJson = JSON.stringify(args ?? {}).slice(0, 4000); + const firstContent = result.content?.[0] as { text?: unknown } | undefined; + const resultText = + firstContent && typeof firstContent.text === 'string' ? 
firstContent.text : ''; + const resultJson = JSON.stringify({ text: resultText.slice(0, 4000) }); + db.run( + `INSERT INTO debug_trajectory + (session_id, step_n, kind, agent, tool_or_mcp_name, args_json, result_json, is_error, created_at) + VALUES (?, ?, 'mcp_call', ?, ?, ?, ?, ?, datetime('now'))`, + [ + debugSessionId, + ++debugStepCounter, + agentName, + toolName, + argsJson, + resultJson, + result.isError ? 1 : 0, + ], + ); + } catch { + // Trajectory capture must never break the actual tool call. + } +} + server.setRequestHandler(CallToolRequestSchema, async (request) => { const { name, arguments: args } = request.params; const handler = toolHandlers[name]; if (!handler) { throw new Error(`Unknown tool: ${name}`); } - return handler(args ?? {}); + const result = await handler(args ?? {}); + maybeRecordTrajectory(name, args, result); + return result; }); process.on('SIGINT', () => { diff --git a/mcp/trajectory-server/src/schema.sql b/mcp/trajectory-server/src/schema.sql index b092a371..86c08aa3 100644 --- a/mcp/trajectory-server/src/schema.sql +++ b/mcp/trajectory-server/src/schema.sql @@ -164,3 +164,24 @@ CREATE TABLE IF NOT EXISTS regen_state ( last_seen_sha TEXT, notes TEXT NOT NULL DEFAULT '' ); + +-- L6 deterministic-trajectory test infrastructure (issue #108). +-- Populated ONLY when env TMB_DEBUG_TRAJECTORY=1. Off by default — zero +-- overhead in production. The L6 test runner pre-seeds DB state, runs +-- claude -p with the env set, then asserts the resulting trajectory +-- matches an expected sequence from FLOWS.md. +CREATE TABLE IF NOT EXISTS debug_trajectory ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + session_id TEXT NOT NULL, + step_n INTEGER NOT NULL, + kind TEXT NOT NULL, -- 'mcp_call' | 'tool_use' | 'agent_thinking' + agent TEXT, -- 'bro' | 'swe' | 'pr-reviewer' | NULL + tool_or_mcp_name TEXT NOT NULL, -- e.g. 
'mcp__plugin_tmb_trajectory-server__identity_get' or 'Bash' + args_json TEXT NOT NULL DEFAULT '{}', + result_json TEXT NOT NULL DEFAULT '{}', + is_error INTEGER NOT NULL DEFAULT 0, + created_at TEXT NOT NULL DEFAULT (datetime('now')) +); + +CREATE INDEX IF NOT EXISTS idx_debug_trajectory_session + ON debug_trajectory(session_id, step_n); diff --git a/mcp/trajectory-server/src/test/db.test.ts b/mcp/trajectory-server/src/test/db.test.ts index 85ddc3e0..fc92c38d 100644 --- a/mcp/trajectory-server/src/test/db.test.ts +++ b/mcp/trajectory-server/src/test/db.test.ts @@ -4,7 +4,7 @@ import { tempDB } from './helpers.js'; import { nowISO, genId } from '../db.js'; describe('TrajectoryDB', () => { - it('opens an in-memory DB and verifies all 14 tables exist with schema_version=1', () => { + it('opens an in-memory DB and verifies all 15 tables exist with schema_version=1', () => { const db = tempDB(); const expectedTables = [ @@ -22,6 +22,7 @@ describe('TrajectoryDB', () => { 'plugin_config', 'identity', 'regen_state', + 'debug_trajectory', ]; const rows = db.all<{ name: string }>( diff --git a/mcp/trajectory-server/src/test/schema.test.ts b/mcp/trajectory-server/src/test/schema.test.ts index ca00df3e..b925c732 100644 --- a/mcp/trajectory-server/src/test/schema.test.ts +++ b/mcp/trajectory-server/src/test/schema.test.ts @@ -3,7 +3,7 @@ import assert from 'node:assert/strict'; import { tempDB } from './helpers.js'; describe('schema — current table set, default values, constraints', () => { - it('fresh DB contains all 14 tables', () => { + it('fresh DB contains all 15 tables', () => { const db = tempDB(); const expectedTables = [ @@ -21,6 +21,7 @@ describe('schema — current table set, default values, constraints', () => { 'plugin_config', 'identity', 'regen_state', + 'debug_trajectory', ]; const rows = db.all<{ name: string }>( @@ -113,6 +114,45 @@ describe('schema — current table set, default values, constraints', () => { db.close(); }); + it('debug_trajectory has zero rows on 
init (issue #108)', () => { + const db = tempDB(); + + const rows = db.all('SELECT * FROM debug_trajectory'); + assert.equal(rows.length, 0); + + db.close(); + }); + + it('debug_trajectory has expected columns + index (issue #108)', () => { + const db = tempDB(); + + const cols = db.all<{ name: string }>('PRAGMA table_info(debug_trajectory)'); + const colNames = cols.map((c) => c.name).sort(); + assert.deepEqual(colNames, [ + 'agent', + 'args_json', + 'created_at', + 'id', + 'is_error', + 'kind', + 'result_json', + 'session_id', + 'step_n', + 'tool_or_mcp_name', + ]); + + const indexes = db.all<{ name: string }>( + "SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='debug_trajectory'", + ); + const indexNames = indexes.map((i) => i.name); + assert.ok( + indexNames.includes('idx_debug_trajectory_session'), + 'session-step index must exist for L6 reads', + ); + + db.close(); + }); + it('identity CHECK constraint rejects a second row with id != 1', () => { const db = tempDB(); const now = new Date().toISOString(); diff --git a/scripts/hooks/debug-trajectory.sh b/scripts/hooks/debug-trajectory.sh new file mode 100755 index 00000000..6b37d979 --- /dev/null +++ b/scripts/hooks/debug-trajectory.sh @@ -0,0 +1,62 @@ +#!/usr/bin/env bash +# L6 trajectory capture for non-MCP tool calls (issue #108). +# +# Active only when env TMB_DEBUG_TRAJECTORY=1. Writes one row per tool +# call (Bash/Read/Write/Edit/Task/Skill) to the debug_trajectory table. +# MCP tool calls are captured by the server itself in src/index.ts — +# this hook covers everything else. +# +# Never blocks the tool call. On any error, exits 0 silently — capture +# failures must not break the user's session. +set -uo pipefail + +[ "${TMB_DEBUG_TRAJECTORY:-0}" = "1" ] || exit 0 + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=scripts/hooks/lib/query-task.sh +. 
"$SCRIPT_DIR/lib/query-task.sh" 2>/dev/null || true + +# Resolve the trajectory DB path the same way the MCP server does. +# Prefer env override, else /.claude//trajectory.db. +DB_PATH="${TRAJECTORY_DB_PATH:-}" +if [ -z "$DB_PATH" ]; then + PLUGIN_NAME="tmb" + if [ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -f "${CLAUDE_PLUGIN_ROOT}/.claude-plugin/plugin.json" ]; then + PLUGIN_NAME=$(jq -r '.name // "tmb"' "${CLAUDE_PLUGIN_ROOT}/.claude-plugin/plugin.json" 2>/dev/null || echo "tmb") + fi + DB_PATH="$PWD/.claude/$PLUGIN_NAME/trajectory.db" +fi + +[ -f "$DB_PATH" ] || exit 0 + +INPUT=$(cat) +TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // .tool // empty' 2>/dev/null) +[ -n "$TOOL_NAME" ] || exit 0 + +# Skip MCP tool calls — the server captures those itself with full result data. +case "$TOOL_NAME" in + mcp__*) exit 0 ;; +esac + +# Truncate args to 4KB. Pass through jq for safe escaping. +ARGS_JSON=$(echo "$INPUT" | jq -c '.tool_input // {}' 2>/dev/null | head -c 4000) +[ -n "$ARGS_JSON" ] || ARGS_JSON='{}' + +# Use a session id that's stable per CC session if CC provides one; +# fall back to the day so all calls in one test run share an id. +SESSION_ID="${CLAUDE_SESSION_ID:-$(date +%Y%m%d-%H)}" + +# Compute next step_n in this session — use COALESCE so first call gets 1. 
+sqlite3 "$DB_PATH" <<SQL 2>/dev/null || true
+INSERT INTO debug_trajectory (session_id, step_n, kind, tool_or_mcp_name, args_json, created_at)
+VALUES (
+  '$SESSION_ID',
+  COALESCE((SELECT MAX(step_n) FROM debug_trajectory WHERE session_id='$SESSION_ID'), 0) + 1,
+  'tool_use',
+  '$TOOL_NAME',
+  json('$ARGS_JSON'),
+  datetime('now')
+);
+SQL
+
+exit 0
diff --git a/skills/tmb_agent-creator/SKILL.md b/skills/tmb_agent-creator/SKILL.md
index 0b082643..e70714e3 100644
--- a/skills/tmb_agent-creator/SKILL.md
+++ b/skills/tmb_agent-creator/SKILL.md
@@ -36,7 +36,7 @@ User-created agents default to **consultant** scope: they advise, return analysi
 | `ceo.md` | Product-scope consultant — prioritization, business framing | ~21 |
 | `pm.md` | Product-strategy consultant — user-need framing, success metrics | ~21 |

-(`swe.md` and `pr-reviewer.md` also live as templates but they're handled by `tmb_bootstrap`, not this skill.)
+(`swe.md` and `pr-reviewer.md` ship globally in the plugin's `agents/` dir — no template copy needed. This skill handles consultants only.)

 If the Human's request matches a shipped template name → **template-copy mode**. Otherwise → **from-scratch mode**.

diff --git a/skills/tmb_first-run-onboarding/SKILL.md b/skills/tmb_first-run-onboarding/SKILL.md
index d992ba5a..868681b7 100644
--- a/skills/tmb_first-run-onboarding/SKILL.md
+++ b/skills/tmb_first-run-onboarding/SKILL.md
@@ -72,8 +72,7 @@ Onboarding completes ONLY after ALL of the following have succeeded AND a final
 2. `config_set(agent='bro', key='branching_model', value=)` — `value` is a string, e.g. `value="github-flow"`.
 3. `config_set(agent='bro', key='pr_target', value=)` — `value` is a string, e.g. `value="main"`.
 4. `config_set(agent='bro', key='protected_branches', value=)` — `value` is a **raw JSON array**, e.g. `value=["main"]`. Do NOT pass `value="[\"main\"]"` (a pre-serialized string). The MCP server calls `JSON.stringify(value)` on what you pass; if you pre-serialize, the DB stores a string and every downstream hook that expects an array breaks.
The MCP server calls `JSON.stringify(value)` on what you pass; if you pre-serialize, the DB stores a string and every downstream hook that expects an array breaks. -5. Read+Write file copies for the executor + swe-side skills (Step 5 below — 1 agent file, 5 skill files). -6. `ledger_log(agent='bro', event_type='tmb_bootstrap_complete', summary='...')` — **non-optional audit-trail row.** Without this, the trajectory loses the "onboarding ran here" anchor; future skills + tests assume it exists. +5. `ledger_log(agent='bro', event_type='tmb_onboarding_complete', summary='...')` — **non-optional audit-trail row.** Without this, the trajectory loses the "onboarding ran here" anchor; future skills + tests assume it exists. (No file copies — `swe`, `pr-reviewer`, and 7 default skills ship globally with the plugin.) **Never narrate a rejection** — only report what the MCP tool actually returned. **Never skip a write** because you think it might fail. If a call errors, retry it. If it keeps erroring, surface the exact error to the Human and ask whether to retry or abort. diff --git a/tests/README.md b/tests/README.md index f3b17237..ffbbd517 100644 --- a/tests/README.md +++ b/tests/README.md @@ -2,15 +2,19 @@ Everything test-related for the plugin — how to run, what each layer covers, when to add a test where, and the full manual-test catalog. -## Three layers +## Layered test pyramid -Each catches a different class of bug; skipping any layer means shipping a bug the others cannot see. +Each layer catches a different class of bug; skipping any layer means shipping a bug the others cannot see. 

 | Layer | What | Where | Catches |
 |---|---|---|---|
-| **1 — Unit** | Handler logic, synthetic args; no LLM, no protocol | `mcp/trajectory-server/src/test/*.test.ts` | Handler bugs, constraint violations, return-shape drift |
-| **2 — Integration** | Real server subprocess + JSON-RPC stdio; schema contract, role matrix, per-agent workflow | [`mcp-integration/*.test.mjs`](./mcp-integration/) | Schema drift, missing `agent` param, protocol plumbing, role-enforcement gaps, cross-tool workflow bugs |
-| **3 — Dogfood** | Human-driven interactive Claude Code session | [`manual/`](./manual/) | UX regressions, agent prompt drift, routing decisions, anything that depends on real LLM judgment |
+| **L0** | Install-smoke (Docker `bun install --ignore-scripts`) | [`docker/install-smoke.Dockerfile`](./docker/) | dist/ shipping, prebuild, MCP server cold-spawn — caught v0.2.0 + v0.3.0 |
+| **L1** | Lint (version sync, link check, dist freshness, etc.) | [`lint/*.sh`](./lint/) | Stale CHANGELOG, broken links, version drift, doctrine doc parity |
+| **L2** | Unit — handler logic, synthetic args; no LLM, no protocol | `mcp/trajectory-server/src/test/*.test.ts` | Handler bugs, constraint violations, return-shape drift |
+| **L3** | Integration — real server subprocess + JSON-RPC stdio | [`mcp-integration/*.test.mjs`](./mcp-integration/), [`hooks/*.sh`](./hooks/) | Schema drift, missing `agent` param, protocol plumbing, role enforcement |
+| **L4** | Workflow simulation — MCP-only multi-step flows (no real Claude) | [`workflow-sim/*.test.mjs`](./workflow-sim/) | Workflow contract bugs at the MCP-call level |
+| **L5** | Manual dogfood — human-driven interactive Claude Code session | [`manual/`](./manual/) | UX regressions only catchable with a human |
+| **L6** | **Deterministic-trajectory dogfood — pre-seeded DB + `claude -p` + assert MCP/tool sequence** (issue #108) | [`dogfood/`](./dogfood/) | Doctrine drift between FLOWS.md and reality, agent-prompt regressions, cold-start behavior |

 **Golden rule:** *Layer N green does not imply Layer N+1 green.* Layer 1 passed with 235 tests while a critical bug sat in production — the MCP schema stripped the `agent` parameter on every call, collapsing all role checks to `caller_role: 'unknown'`. Layer 2 would have caught that at the wire level in milliseconds. Always run all three before tagging a release.

@@ -19,15 +23,23 @@ Each catches a different class of bug; skipping any layer means shipping a bug t
 ```
 tests/
 ├── README.md           ← (you are here) framework + operational
-├── run-all.sh          ← orchestrator — runs every automated suite
-├── mcp-integration/    ← Layer 2 — real server subprocess + JSON-RPC
-├── hooks/              ← hook script tests
-├── lint/               ← agent-prompt budget + related linters
+├── run-all.sh          ← orchestrator — runs L0-L4
+├── docker/             ← L0 install-smoke
+├── lint/               ← L1 lints (version sync, links, doctrine docs)
+├── mcp-integration/    ← L3 real server subprocess + JSON-RPC
+├── hooks/              ← L3 hook script tests
+├── workflow-sim/       ← L4 MCP-only multi-step workflow tests
 ├── lib/                ← shared shell-assert helpers
-└── manual/             ← Layer 3 — human-run against a real Claude Code session
-    ├── README.md
-    ├── setup.md
-    └── scenarios.md
+├── manual/             ← L5 human-run against a real Claude Code session
+│   ├── README.md
+│   ├── setup.md
+│   └── scenarios.md
+└── dogfood/            ← L6 deterministic-trajectory tests (issue #108)
+    ├── run-l6.sh
+    ├── lib/flow-helpers.sh
+    ├── flows/.test.sh
+    ├── fixtures/.sql
+    └── expected/.txt
 ```

 Layer 1 (MCP unit tests) lives at `mcp/trajectory-server/src/test/` — colocated with the source it tests, following the convention used elsewhere in that package.
@@ -57,10 +69,31 @@ bash tests/hooks/run.sh
 bash tests/lint/agent-line-budget.sh
 ```

-## Run the manual suite (Layer 3)
+## Run the manual suite (L5)

 See [`manual/README.md`](./manual/README.md) — setup, scenarios, and what to do when a scenario fails.
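The L6 row in the table above says each flow asserts that the captured MCP/tool sequence matches an expected file. The real comparison is the `l6_assert_trajectory` helper, presumably defined in `tests/dogfood/lib/flow-helpers.sh`, which this excerpt does not include. The TypeScript below is therefore only a minimal sketch under stated assumptions. It takes rows already read from `debug_trajectory` and renders them as the `kind:tool_or_mcp_name` lines used by `tests/dogfood/expected/*.txt`. It then checks that the expected lines appear in order. Two points are assumptions: whether the real helper demands an exact match or, as here, tolerates extra calls between expected entries; and the names `TrajectoryRow`, `formatTrajectory`, `parseExpected` and `firstUnmatched`, which are hypothetical.

```typescript
// Hypothetical row shape: the subset of debug_trajectory columns this sketch needs.
interface TrajectoryRow {
  id: number; // AUTOINCREMENT, so it reflects global insertion order
  kind: string; // 'mcp_call' | 'tool_use' | ...
  tool_or_mcp_name: string;
}

// Render rows as `kind:name` lines (the expected-file format), ordered by id
// so rows written by the MCP server and by the hook interleave correctly.
function formatTrajectory(rows: readonly TrajectoryRow[]): string[] {
  return [...rows]
    .sort((a, b) => a.id - b.id)
    .map((row) => `${row.kind}:${row.tool_or_mcp_name}`);
}

// Parse an expected-trajectory file body: one entry per non-blank line.
function parseExpected(text: string): string[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Assumed semantics: every expected entry must appear in `actual`, in order;
// extra actual entries in between are tolerated. Returns the index of the
// first expected entry that could not be matched, or -1 on a full match.
function firstUnmatched(actual: readonly string[], expected: readonly string[]): number {
  let cursor = 0;
  for (let j = 0; j < expected.length; j++) {
    while (cursor < actual.length && actual[cursor] !== expected[j]) {
      cursor++;
    }
    if (cursor === actual.length) {
      return j;
    }
    cursor++; // consume the matched entry
  }
  return -1;
}

// Usage with made-up rows (not real captured data); the expected text mirrors
// tests/dogfood/expected/95-anonymous-cold-restart.txt.
const PREFIX = 'mcp__plugin_tmb_trajectory-server__';
const sampleRows: TrajectoryRow[] = [
  { id: 3, kind: 'tool_use', tool_or_mcp_name: 'Read' },
  { id: 1, kind: 'mcp_call', tool_or_mcp_name: `${PREFIX}identity_get` },
  { id: 2, kind: 'mcp_call', tool_or_mcp_name: `${PREFIX}config_get` },
  { id: 4, kind: 'mcp_call', tool_or_mcp_name: `${PREFIX}issue_resume` },
];
const expectedText = [
  `mcp_call:${PREFIX}identity_get`,
  `mcp_call:${PREFIX}config_get`,
  `mcp_call:${PREFIX}issue_resume`,
  '',
].join('\n');

const missing = firstUnmatched(formatTrajectory(sampleRows), parseExpected(expectedText));
console.log(missing === -1 ? 'trajectory matches' : `missing expected entry #${missing}`);
```

The sketch orders rows by `id` rather than `step_n` on purpose. The MCP server tags its rows with a per-spawn `debugSessionId` and its own counter. The PreToolUse hook instead uses `CLAUDE_SESSION_ID` (or a date-hour fallback) with a separate `MAX(step_n) + 1`. As a result, `step_n` values from the two writers are not comparable. An in-order subsequence check also cannot express negative invariants such as the D-direct-mode "no Task spawn" rule, so those need a separate assertion.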

+## Run L6 dogfood (deterministic-trajectory tests)
+
+L6 drives real Claude Code through pre-seeded TMB workflows and asserts that the MCP/tool sequence matches the one documented in FLOWS.md (issue #108).
+
+```bash
+# One-time: set the headless auth token
+export CLAUDE_CODE_OAUTH_TOKEN=""
+
+# Run all flows
+bash tests/dogfood/run-l6.sh
+
+# Run a single flow by name substring
+bash tests/dogfood/run-l6.sh onboarding
+```
+
+Each flow lives in `tests/dogfood/flows/.test.sh`. Expected trajectories are `tests/dogfood/expected/.txt` (one MCP/tool call per line, prefixed `mcp_call:` or `tool_use:`). Pre-seed SQL fixtures live in `tests/dogfood/fixtures/.sql`.
+
+To add a new flow: copy an existing `flows/*.test.sh`, pick a fixture (or write one), then capture the expected sequence by running once with `TMB_DEBUG_TRAJECTORY=1` and reading the `debug_trajectory` table.
+
+CI runs L6 on tag pushes, on PRs labeled `L6`, and on manual dispatch. The workflow at `.github/workflows/l6-dogfood.yml` skips without failing when the `CLAUDE_CODE_OAUTH_TOKEN` secret is unset, so forks don't break.
+
 ## Which layer does a new test belong in?

 ```
diff --git a/tests/dogfood/expected/01-onboarding.txt b/tests/dogfood/expected/01-onboarding.txt
new file mode 100644
index 00000000..5f288a29
--- /dev/null
+++ b/tests/dogfood/expected/01-onboarding.txt
@@ -0,0 +1,7 @@
+mcp_call:mcp__plugin_tmb_trajectory-server__identity_get
+mcp_call:mcp__plugin_tmb_trajectory-server__config_get
+mcp_call:mcp__plugin_tmb_trajectory-server__identity_set
+mcp_call:mcp__plugin_tmb_trajectory-server__config_set
+mcp_call:mcp__plugin_tmb_trajectory-server__config_set
+mcp_call:mcp__plugin_tmb_trajectory-server__config_set
+mcp_call:mcp__plugin_tmb_trajectory-server__ledger_log
diff --git a/tests/dogfood/expected/02-simple-task.txt b/tests/dogfood/expected/02-simple-task.txt
new file mode 100644
index 00000000..03f71305
--- /dev/null
+++ b/tests/dogfood/expected/02-simple-task.txt
@@ -0,0 +1,8 @@
+mcp_call:mcp__plugin_tmb_trajectory-server__identity_get
+mcp_call:mcp__plugin_tmb_trajectory-server__config_get
+mcp_call:mcp__plugin_tmb_trajectory-server__issue_resume
+mcp_call:mcp__plugin_tmb_trajectory-server__issue_create
+mcp_call:mcp__plugin_tmb_trajectory-server__discussion_append
+mcp_call:mcp__plugin_tmb_trajectory-server__task_create_batch
+tool_use:Task
+mcp_call:mcp__plugin_tmb_trajectory-server__ledger_log
diff --git a/tests/dogfood/expected/95-anonymous-cold-restart.txt b/tests/dogfood/expected/95-anonymous-cold-restart.txt
new file mode 100644
index 00000000..43af9185
--- /dev/null
+++ b/tests/dogfood/expected/95-anonymous-cold-restart.txt
@@ -0,0 +1,3 @@
+mcp_call:mcp__plugin_tmb_trajectory-server__identity_get
+mcp_call:mcp__plugin_tmb_trajectory-server__config_get
+mcp_call:mcp__plugin_tmb_trajectory-server__issue_resume
diff --git a/tests/dogfood/expected/D-direct-mode.txt b/tests/dogfood/expected/D-direct-mode.txt
new file mode 100644
index 00000000..2e74c882
--- /dev/null
+++ b/tests/dogfood/expected/D-direct-mode.txt
@@ -0,0 +1,6 @@
+mcp_call:mcp__plugin_tmb_trajectory-server__identity_get
+mcp_call:mcp__plugin_tmb_trajectory-server__config_get
+mcp_call:mcp__plugin_tmb_trajectory-server__issue_resume
+tool_use:Edit
+tool_use:Bash
+mcp_call:mcp__plugin_tmb_trajectory-server__ledger_log
diff --git a/tests/dogfood/fixtures/empty.sql b/tests/dogfood/fixtures/empty.sql
new file mode 100644
index 00000000..b55cada2
--- /dev/null
+++ b/tests/dogfood/fixtures/empty.sql
@@ -0,0 +1,2 @@
+-- No-op fixture: schema only, no data. Used for first-run / onboarding flows
+-- where the entire point is bro detecting the empty state.
diff --git a/tests/dogfood/fixtures/onboarding-anonymous.sql b/tests/dogfood/fixtures/onboarding-anonymous.sql
new file mode 100644
index 00000000..c9b4c8eb
--- /dev/null
+++ b/tests/dogfood/fixtures/onboarding-anonymous.sql
@@ -0,0 +1,18 @@
+-- Onboarding completed but the user picked Anonymous.
+-- Per #95 fix: identity row exists with human_name=NULL, created_at non-null.
+-- The first-action chain should see this and skip re-onboarding.
+
+INSERT INTO identity (id, human_name, created_at, updated_at)
+VALUES (1, NULL, datetime('now'), datetime('now'));
+
+INSERT INTO plugin_config (key, value_json, updated_at) VALUES
+  ('branching_model', '"github-flow"', datetime('now')),
+  ('pr_target', '"main"', datetime('now')),
+  ('protected_branches', '["main"]', datetime('now'));
+
+INSERT INTO ledger (issue_id, branch_id, from_node, event_type, summary, created_at)
+VALUES (
+  0, NULL, 'bro', 'tmb_onboarding_complete',
+  'Test fixture — anonymous identity, github-flow.',
+  datetime('now')
+);
diff --git a/tests/dogfood/fixtures/onboarding-named.sql b/tests/dogfood/fixtures/onboarding-named.sql
new file mode 100644
index 00000000..70cdb85a
--- /dev/null
+++ b/tests/dogfood/fixtures/onboarding-named.sql
@@ -0,0 +1,18 @@
+-- Onboarding completed with a named identity ("Test User").
+-- Use this fixture to skip past first-run onboarding for any flow that
+-- needs a clean post-onboarding state.
+
+INSERT INTO identity (id, human_name, created_at, updated_at)
+VALUES (1, 'Test User', datetime('now'), datetime('now'));
+
+INSERT INTO plugin_config (key, value_json, updated_at) VALUES
+  ('branching_model', '"github-flow"', datetime('now')),
+  ('pr_target', '"main"', datetime('now')),
+  ('protected_branches', '["main"]', datetime('now'));
+
+INSERT INTO ledger (issue_id, branch_id, from_node, event_type, summary, created_at)
+VALUES (
+  0, NULL, 'bro', 'tmb_onboarding_complete',
+  'Test fixture — identity Test User, github-flow, main protected.',
+  datetime('now')
+);
diff --git a/tests/dogfood/flows/01-onboarding.test.sh b/tests/dogfood/flows/01-onboarding.test.sh
new file mode 100755
index 00000000..796ffc3e
--- /dev/null
+++ b/tests/dogfood/flows/01-onboarding.test.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+# L6 flow 01 — First-Run Onboarding (FLOWS.md §1)
+#
+# Pre-state: empty DB (no identity, no config).
+# Trigger: @bro hi
+# Expected: identity_get + config_get probes return null → bro invokes
+# tmb_first-run-onboarding skill → AskUserQuestion → identity_set +
+# 3x config_set + ledger_log(tmb_onboarding_complete).
+#
+# NOTE: how AskUserQuestion behaves in `claude -p` mode is unverified. If
+# the form auto-fails or returns empty in headless mode, this flow's
+# trajectory will be SHORTER than expected. That's a real signal — file
+# as a follow-up issue.
+
+set -uo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+. "$HERE/../lib/flow-helpers.sh"
+
+PROJECT=$(l6_setup_scratch_project)
+trap 'l6_cleanup_project "$PROJECT"' EXIT
+
+l6_seed_db "$PROJECT" "empty"
+
+l6_run_claude "$PROJECT" "@bro hi" >/dev/null
+
+l6_assert_trajectory "$PROJECT" "$L6_DOGFOOD_DIR/expected/01-onboarding.txt"
diff --git a/tests/dogfood/flows/02-simple-task.test.sh b/tests/dogfood/flows/02-simple-task.test.sh
new file mode 100755
index 00000000..35faf86e
--- /dev/null
+++ b/tests/dogfood/flows/02-simple-task.test.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+# L6 flow 02 — Simple Task (FLOWS.md §2)
+#
+# Pre-state: onboarding complete (named identity).
+# Trigger: @bro write a python cli todo
+# Expected: bro detects code-touching ask → triages simple → creates
+# issue + task → spawns SWE → batches ledger_log(planning_complete).
+# Spec body assertion is out of scope here (covered by L4).
+
+set -uo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+. "$HERE/../lib/flow-helpers.sh"
+
+PROJECT=$(l6_setup_scratch_project)
+trap 'l6_cleanup_project "$PROJECT"' EXIT
+
+l6_seed_db "$PROJECT" "onboarding-named"
+
+l6_run_claude "$PROJECT" "@bro write a python cli todo" >/dev/null
+
+l6_assert_trajectory "$PROJECT" "$L6_DOGFOOD_DIR/expected/02-simple-task.txt"
diff --git a/tests/dogfood/flows/03-difficult-task.test.sh b/tests/dogfood/flows/03-difficult-task.test.sh
new file mode 100755
index 00000000..c64f33f3
--- /dev/null
+++ b/tests/dogfood/flows/03-difficult-task.test.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+# L6 flow 03-difficult-task — FLOWS.md §3 — Difficult Task (architecture-touching)
+#
+# SCAFFOLD — fill in the expected-trajectory file before enabling.
+# Until then this test is a no-op skip.
+#
+# Pre-state: onboarding-named
+# Trigger: @bro design a new auth subsystem with OAuth + session token storage
+
+set -uo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+. "$HERE/../lib/flow-helpers.sh"
+
+EXPECTED="$L6_DOGFOOD_DIR/expected/03-difficult-task.txt"
+if [ ! -f "$EXPECTED" ]; then
+  echo " ⊘ skip: 03-difficult-task expected-trajectory not yet authored"
+  exit 0
+fi
+
+PROJECT=$(l6_setup_scratch_project)
+trap 'l6_cleanup_project "$PROJECT"' EXIT
+
+l6_seed_db "$PROJECT" "onboarding-named"
+
+l6_run_claude "$PROJECT" "@bro design a new auth subsystem with OAuth + session token storage" >/dev/null
+
+l6_assert_trajectory "$PROJECT" "$EXPECTED"
diff --git a/tests/dogfood/flows/04-agent-creator.test.sh b/tests/dogfood/flows/04-agent-creator.test.sh
new file mode 100755
index 00000000..a4fedbd2
--- /dev/null
+++ b/tests/dogfood/flows/04-agent-creator.test.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+# L6 flow 04-agent-creator — FLOWS.md §4 — On-demand consultant agent creation
+#
+# SCAFFOLD — fill in the expected-trajectory file before enabling.
+# Until then this test is a no-op skip.
+#
+# Pre-state: onboarding-named
+# Trigger: @bro get the cto's read on whether to use OAuth
+
+set -uo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+. "$HERE/../lib/flow-helpers.sh"
+
+EXPECTED="$L6_DOGFOOD_DIR/expected/04-agent-creator.txt"
+if [ ! -f "$EXPECTED" ]; then
+  echo " ⊘ skip: 04-agent-creator expected-trajectory not yet authored"
+  exit 0
+fi
+
+PROJECT=$(l6_setup_scratch_project)
+trap 'l6_cleanup_project "$PROJECT"' EXIT
+
+l6_seed_db "$PROJECT" "onboarding-named"
+
+l6_run_claude "$PROJECT" "@bro get the cto's read on whether to use OAuth" >/dev/null
+
+l6_assert_trajectory "$PROJECT" "$EXPECTED"
diff --git a/tests/dogfood/flows/05-skill-creation.test.sh b/tests/dogfood/flows/05-skill-creation.test.sh
new file mode 100755
index 00000000..3e65eec0
--- /dev/null
+++ b/tests/dogfood/flows/05-skill-creation.test.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+# L6 flow 05-skill-creation — FLOWS.md §5 — On-demand skill creation
+#
+# SCAFFOLD — fill in the expected-trajectory file before enabling.
+# Until then this test is a no-op skip.
+# +# Pre-state: onboarding-named +# Trigger: @bro create a skill for FastAPI conventions + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/05-skill-creation.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 05-skill-creation expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro create a skill for FastAPI conventions" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/06-push-gate.test.sh b/tests/dogfood/flows/06-push-gate.test.sh new file mode 100755 index 00000000..9b1b7682 --- /dev/null +++ b/tests/dogfood/flows/06-push-gate.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 06-push-gate — FLOWS.md §6 — Push gate / PR review +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro review before push + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/06-push-gate.txt" +if [ ! 
-f "$EXPECTED" ]; then + echo " ⊘ skip: 06-push-gate expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro review before push" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/07-architecture-regen.test.sh b/tests/dogfood/flows/07-architecture-regen.test.sh new file mode 100755 index 00000000..cc46bb46 --- /dev/null +++ b/tests/dogfood/flows/07-architecture-regen.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 07-architecture-regen — FLOWS.md §7 — Architecture regen +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro refresh architecture docs + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/07-architecture-regen.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 07-architecture-regen expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro refresh architecture docs" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/08-swe-retry.test.sh b/tests/dogfood/flows/08-swe-retry.test.sh new file mode 100755 index 00000000..24f96504 --- /dev/null +++ b/tests/dogfood/flows/08-swe-retry.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 08-swe-retry — FLOWS.md §8 — SWE retry / escalation +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro the swe failed, retry with feedback + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. 
"$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/08-swe-retry.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 08-swe-retry expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro the swe failed, retry with feedback" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/09-roundtable.test.sh b/tests/dogfood/flows/09-roundtable.test.sh new file mode 100755 index 00000000..1efc47d9 --- /dev/null +++ b/tests/dogfood/flows/09-roundtable.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 09-roundtable — FLOWS.md §9 — Roundtable multi-agent deliberation +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro convene cto and architect on the OAuth question + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/09-roundtable.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 09-roundtable expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro convene cto and architect on the OAuth question" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/32-team-config.test.sh b/tests/dogfood/flows/32-team-config.test.sh new file mode 100755 index 00000000..3503ff4d --- /dev/null +++ b/tests/dogfood/flows/32-team-config.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 32-team-config — Issue #32 — Onboarding pre-selects from .claude/tmb/config.json +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. 
+# +# Pre-state: empty +# Trigger: @bro hi + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/32-team-config.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 32-team-config expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "empty" + +l6_run_claude "$PROJECT" "@bro hi" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/92-base-branch.test.sh b/tests/dogfood/flows/92-base-branch.test.sh new file mode 100755 index 00000000..deef13d8 --- /dev/null +++ b/tests/dogfood/flows/92-base-branch.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 92-base-branch — Issue #92 — base-branch confirm with remote +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro write a python cli todo + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/92-base-branch.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 92-base-branch expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro write a python cli todo" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/94-arch-bootstrap.test.sh b/tests/dogfood/flows/94-arch-bootstrap.test.sh new file mode 100755 index 00000000..3c2ec6a6 --- /dev/null +++ b/tests/dogfood/flows/94-arch-bootstrap.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 94-arch-bootstrap — Issue #94 — auto-bootstrap docs/ on small projects +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. 
+# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro write a python cli todo + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/94-arch-bootstrap.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: 94-arch-bootstrap expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro write a python cli todo" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/95-anonymous-cold-restart.test.sh b/tests/dogfood/flows/95-anonymous-cold-restart.test.sh new file mode 100755 index 00000000..03d7f513 --- /dev/null +++ b/tests/dogfood/flows/95-anonymous-cold-restart.test.sh @@ -0,0 +1,32 @@ +#!/usr/bin/env bash +# L6 flow #95 — Anonymous cold-restart regression +# +# Pre-state: identity row exists with human_name=NULL (Anonymous), config done. +# Trigger: @bro hi (cold session) +# Expected: identity_get returns row with non-null created_at → bro skips +# onboarding → calls issue_resume → greets the Human in plain +# second-person. No identity_set, no config_set, no AskUserQuestion. + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-anonymous" + +l6_run_claude "$PROJECT" "@bro hi" >/dev/null + +l6_assert_trajectory "$PROJECT" "$L6_DOGFOOD_DIR/expected/95-anonymous-cold-restart.txt" || exit 1 + +# Critical: NO re-onboarding writes should occur.
+ROWS_IDENTITY_SET=$(sqlite3 "$PROJECT/.claude/tmb/trajectory.db" \ + "SELECT COUNT(*) FROM debug_trajectory WHERE tool_or_mcp_name LIKE '%identity_set%'") +[ "$ROWS_IDENTITY_SET" = "0" ] || { echo " ✗ #95 regression: bro called identity_set on cold restart (got $ROWS_IDENTITY_SET)" >&2; exit 1; } + +ROWS_CONFIG_SET=$(sqlite3 "$PROJECT/.claude/tmb/trajectory.db" \ + "SELECT COUNT(*) FROM debug_trajectory WHERE tool_or_mcp_name LIKE '%config_set%'") +[ "$ROWS_CONFIG_SET" = "0" ] || { echo " ✗ #95 regression: bro called config_set on cold restart (got $ROWS_CONFIG_SET)" >&2; exit 1; } + +echo " ✓ #95 regression locked: cold restart with Anonymous skips re-onboarding" diff --git a/tests/dogfood/flows/96-halt-on-error.test.sh b/tests/dogfood/flows/96-halt-on-error.test.sh new file mode 100755 index 00000000..67d8b6eb --- /dev/null +++ b/tests/dogfood/flows/96-halt-on-error.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow 96-halt-on-error — Issue #96 — bro halts on MCP forbidden errors +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro mark task 1 as validated + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/96-halt-on-error.txt" +if [ ! 
-f "$EXPECTED" ]; then + echo " ⊘ skip: 96-halt-on-error expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro mark task 1 as validated" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/C-consultant.test.sh b/tests/dogfood/flows/C-consultant.test.sh new file mode 100755 index 00000000..cd9ddc4e --- /dev/null +++ b/tests/dogfood/flows/C-consultant.test.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# L6 flow C-consultant — FLOWS.md §C — Consultant invocation +# +# SCAFFOLD — fill in the expected-trajectory file before enabling. +# Until then this test is a no-op skip. +# +# Pre-state: onboarding-named +# Trigger: @bro get pm's view on the new feature + +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +EXPECTED="$L6_DOGFOOD_DIR/expected/C-consultant.txt" +if [ ! -f "$EXPECTED" ]; then + echo " ⊘ skip: C-consultant expected-trajectory not yet authored" + exit 0 +fi + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +l6_run_claude "$PROJECT" "@bro get pm's view on the new feature" >/dev/null + +l6_assert_trajectory "$PROJECT" "$EXPECTED" diff --git a/tests/dogfood/flows/D-direct-mode.test.sh b/tests/dogfood/flows/D-direct-mode.test.sh new file mode 100755 index 00000000..19eeb742 --- /dev/null +++ b/tests/dogfood/flows/D-direct-mode.test.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# L6 flow D — Direct Mode (FLOWS.md §D) +# +# Pre-state: onboarding complete + a README.md to typo-fix. +# Trigger: @bro fix typo "recieve" → "receive" in README.md +# Expected: bro detects ≤3-line single-file scope → Direct Mode → +# Edit + git commit + ledger_log(direct_mode_used). NO task_create_batch, +# NO Task spawn. 
+ +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +. "$HERE/../lib/flow-helpers.sh" + +PROJECT=$(l6_setup_scratch_project) +trap 'l6_cleanup_project "$PROJECT"' EXIT + +l6_seed_db "$PROJECT" "onboarding-named" + +# Plant the typo so bro has something to fix. +( cd "$PROJECT" || exit 1; echo "We recieve patches via PRs." > README.md + git add . && git commit -qm "chore: add typo'd line" ) + +l6_run_claude "$PROJECT" "@bro fix the typo 'recieve' to 'receive' in README.md" >/dev/null + +l6_assert_trajectory "$PROJECT" "$L6_DOGFOOD_DIR/expected/D-direct-mode.txt" || exit 1 + +# Additional invariants for Direct Mode (ENUMS.md): +ROWS_TASK_BATCH=$(sqlite3 "$PROJECT/.claude/tmb/trajectory.db" \ + "SELECT COUNT(*) FROM debug_trajectory WHERE tool_or_mcp_name LIKE '%task_create_batch%'") +[ "$ROWS_TASK_BATCH" = "0" ] || { echo " ✗ Direct Mode must NOT call task_create_batch (got $ROWS_TASK_BATCH)" >&2; exit 1; } + +ROWS_TASK_SPAWN=$(sqlite3 "$PROJECT/.claude/tmb/trajectory.db" \ + "SELECT COUNT(*) FROM debug_trajectory WHERE tool_or_mcp_name = 'Task'") +[ "$ROWS_TASK_SPAWN" = "0" ] || { echo " ✗ Direct Mode must NOT spawn SWE via Task (got $ROWS_TASK_SPAWN)" >&2; exit 1; } + +EVENT_DIRECT=$(sqlite3 "$PROJECT/.claude/tmb/trajectory.db" \ + "SELECT COUNT(*) FROM ledger WHERE event_type = 'direct_mode_used'") +[ "$EVENT_DIRECT" = "1" ] || { echo " ✗ Direct Mode must log exactly one direct_mode_used event (got $EVENT_DIRECT)" >&2; exit 1; } + +echo " ✓ Direct Mode invariants verified: no task_create_batch, no Task spawn, one direct_mode_used" diff --git a/tests/dogfood/lib/flow-helpers.sh b/tests/dogfood/lib/flow-helpers.sh new file mode 100644 index 00000000..c2ceedfd --- /dev/null +++ b/tests/dogfood/lib/flow-helpers.sh @@ -0,0 +1,114 @@ +#!/usr/bin/env bash +# Shared helpers for L6 flow scripts. Source this from tests/dogfood/flows/*.test.sh.
+ +set -uo pipefail + +# l6_setup_scratch_project: creates a fresh mktemp scratch dir, +# initializes git, sets test identity. Returns the absolute path on stdout. +l6_setup_scratch_project() { + local dir + dir=$(mktemp -d -t tmb-l6-XXXX) + ( + cd "$dir" || exit 1 + git init -q -b main + git config user.email l6@l6.test + git config user.name "L6 Test" + echo "init" > README.md + git add . && git commit -qm init + mkdir -p .claude/tmb + ) + echo "$dir" +} + +# l6_seed_db : applies a SQL fixture to the +# project's trajectory.db. Fixture must exist at tests/dogfood/fixtures/.sql. +l6_seed_db() { + local dir="$1" fixture="$2" + local fixture_path="$L6_DOGFOOD_DIR/fixtures/${fixture}.sql" + if [ ! -f "$fixture_path" ]; then + printf " ✗ fixture not found: %s\n" "$fixture_path" >&2 + return 1 + fi + local schema_path="$PLUGIN_ROOT/mcp/trajectory-server/src/schema.sql" + sqlite3 "$dir/.claude/tmb/trajectory.db" < "$schema_path" + sqlite3 "$dir/.claude/tmb/trajectory.db" < "$fixture_path" +} + +# l6_run_claude : runs `claude -p` against the prompt +# in the project, with TMB_DEBUG_TRAJECTORY=1, plugin loaded via --plugin-dir. +# Prints the last 50 output lines; always returns 0 (`|| true` swallows claude/timeout failures). +l6_run_claude() { + local dir="$1" prompt="$2" + ( + cd "$dir" || exit 1 + export TMB_DEBUG_TRAJECTORY=1 + export CLAUDE_CODE_OAUTH_TOKEN="${CLAUDE_CODE_OAUTH_TOKEN}" + timeout 180 claude --plugin-dir "$PLUGIN_ROOT" -p "$prompt" 2>&1 | tail -50 || true + ) +} + +# l6_assert_trajectory : reads the recorded +# trajectory and verifies every line in appears as a +# substring in the actual sequence (in order). Returns 0 if all matched. +# +# Expected file format (one line per expected step): +# mcp_call:mcp__plugin_tmb_trajectory-server__identity_get +# mcp_call:mcp__plugin_tmb_trajectory-server__config_get +# tool_use:Bash +# ... 
+# +# Allows extra steps to appear between expected lines (subset match in order). +l6_assert_trajectory() { + local dir="$1" expected_file="$2" + if [ !
-f "$expected_file" ]; then + printf " ✗ expected file missing: %s\n" "$expected_file" >&2 + return 1 + fi + local actual + actual=$(sqlite3 "$dir/.claude/tmb/trajectory.db" \ + "SELECT kind || ':' || tool_or_mcp_name FROM debug_trajectory ORDER BY id") + + if [ -z "$actual" ]; then + printf " ✗ no trajectory rows recorded — TMB_DEBUG_TRAJECTORY may not be wired\n" >&2 + return 1 + fi + + # Walk both lists; expected must be a subset-in-order of actual. + local idx=0 + local found=0 + local expected_lines=() + while IFS= read -r line || [ -n "$line" ]; do + [ -n "$line" ] && expected_lines+=("$line") + done < "$expected_file" + + while IFS= read -r actual_line; do + if [ "$idx" -ge "${#expected_lines[@]}" ]; then + break + fi + if [[ "$actual_line" == *"${expected_lines[$idx]}"* ]]; then + idx=$((idx + 1)) + found=$((found + 1)) + fi + done <<< "$actual" + + if [ "$idx" -eq "${#expected_lines[@]}" ]; then + printf " ✓ matched %d/%d expected steps\n" "$found" "${#expected_lines[@]}" + return 0 + else + printf " ✗ matched %d/%d expected steps; missing: %s\n" \ + "$found" "${#expected_lines[@]}" "${expected_lines[$idx]}" >&2 + printf " --- actual trajectory (first 30) ---\n" >&2 + echo "$actual" | head -30 >&2 + return 1 + fi +} + +# l6_cleanup_project : removes the scratch directory. +l6_cleanup_project() { + local dir="$1" + [ -n "$dir" ] && [ -d "$dir" ] && rm -rf "$dir" +} + +# Initialize globals used by helpers. +L6_DOGFOOD_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +export L6_DOGFOOD_DIR diff --git a/tests/dogfood/run-l6.sh b/tests/dogfood/run-l6.sh new file mode 100755 index 00000000..cfb87b2e --- /dev/null +++ b/tests/dogfood/run-l6.sh @@ -0,0 +1,73 @@ +#!/usr/bin/env bash +# L6 deterministic-trajectory test runner (issue #108). +# +# Drives real Claude Code through TMB workflows by pre-seeding DB state +# (skipping past AskUserQuestion forms), then asserting the resulting +# MCP/tool trajectory matches a flow's expected sequence from FLOWS.md.
+# +# Usage: +# bash tests/dogfood/run-l6.sh # all flows +# bash tests/dogfood/run-l6.sh onboarding # one flow by name +# +# Requirements: +# - CLAUDE_CODE_OAUTH_TOKEN env var (CC's headless auth token) +# - claude in PATH +# - sqlite3, jq, timeout (GNU coreutils) +# +# Each flow lives in tests/dogfood/flows/.test.sh and is +# self-contained: pre-seed → invoke → assert. Flows run in mktemp scratch +# dirs — no Docker needed; CI runners are already isolated VMs. + +set -uo pipefail + +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PLUGIN_ROOT="$(cd "$HERE/../.." && pwd)" +FILTER="${1:-}" + +if [ -z "${CLAUDE_CODE_OAUTH_TOKEN:-}" ]; then + printf "❌ CLAUDE_CODE_OAUTH_TOKEN not set.\n" + printf " For local runs: export CLAUDE_CODE_OAUTH_TOKEN=...\n" + printf " For CI: configure as a repo secret (Settings → Secrets).\n" + exit 1 +fi + +for cmd in claude sqlite3 jq timeout; do + if ! command -v "$cmd" >/dev/null 2>&1; then + printf "❌ %s not found in PATH.\n" "$cmd" + exit 1 + fi +done + +PASS=0 +FAIL=0 +FAILED_FLOWS=() + +for flow_script in "$HERE/flows"/*.test.sh; do + [ -e "$flow_script" ] || continue + flow_name=$(basename "$flow_script" .test.sh) + + if [ -n "$FILTER" ] && [[ "$flow_name" != *"$FILTER"* ]]; then + continue + fi + + printf "\n=== L6 flow: %s ===\n" "$flow_name" + + if PLUGIN_ROOT="$PLUGIN_ROOT" \ + CLAUDE_CODE_OAUTH_TOKEN="$CLAUDE_CODE_OAUTH_TOKEN" \ + bash "$flow_script"; then + printf " ✓ %s passed\n" "$flow_name" + PASS=$((PASS + 1)) + else + printf " ✗ %s failed\n" "$flow_name" + FAIL=$((FAIL + 1)) + FAILED_FLOWS+=("$flow_name") + fi +done + +printf "\n========================================\n" +printf "L6 dogfood: %d passed, %d failed\n" "$PASS" "$FAIL" + +if [ "$FAIL" -gt 0 ]; then + printf "Failed flows: %s\n" "${FAILED_FLOWS[*]}" + exit 1 +fi From 32964819ca12b2799f2016053a64c0cc2797a1fb Mon Sep 17 00:00:00 2001 From: Zax Shen Date: Sun, 26 Apr 2026 02:14:54 -0700 Subject: [PATCH 2/2] =?UTF-8?q?=F0=9F=94=A7=20chore(tests):=20add=20inspec?=
=?UTF-8?q?t-trajectory.sh=20helper=20for=20L6=20scaffold-fill?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When authoring an expected-trajectory file for a new flow, the workflow is: run the flow once with TMB_DEBUG_TRAJECTORY=1, then read the debug_trajectory table. This script does step 2 cleanly — finds the right DB (handles channel isolation), prints in the L6-expected format ready to paste into tests/dogfood/expected/.txt. Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/dogfood/inspect-trajectory.sh | 45 +++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100755 tests/dogfood/inspect-trajectory.sh diff --git a/tests/dogfood/inspect-trajectory.sh b/tests/dogfood/inspect-trajectory.sh new file mode 100755 index 00000000..029adbff --- /dev/null +++ b/tests/dogfood/inspect-trajectory.sh @@ -0,0 +1,45 @@ +#!/usr/bin/env bash +# Inspect a debug_trajectory dump in the format L6 expects. +# +# Use this when authoring an expected-trajectory file for a new flow: +# 1. Run the flow once with TMB_DEBUG_TRAJECTORY=1 +# 2. Run this script against the resulting DB +# 3. Copy the output to tests/dogfood/expected/.txt +# +# Usage: +# bash tests/dogfood/inspect-trajectory.sh +# bash tests/dogfood/inspect-trajectory.sh /tmp/tmb-l6-XXXX +# bash tests/dogfood/inspect-trajectory.sh # uses $PWD + +set -uo pipefail + +PROJECT="${1:-$PWD}" +DB_PATH="$PROJECT/.claude/tmb/trajectory.db" + +if [ ! -f "$DB_PATH" ]; then + # Fall back to channel-isolated path + for candidate in "$PROJECT/.claude/tmb-rc/trajectory.db" "$PROJECT/.claude"/*/trajectory.db; do + [ -f "$candidate" ] && DB_PATH="$candidate" && break + done +fi + +if [ ! -f "$DB_PATH" ]; then + echo "❌ no trajectory.db found under $PROJECT/.claude/" >&2 + exit 1 +fi + +ROW_COUNT=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM debug_trajectory" 2>/dev/null || echo 0) + +if [ "$ROW_COUNT" = "0" ]; then + echo "⊘ no debug_trajectory rows recorded." 
>&2 + echo " Was TMB_DEBUG_TRAJECTORY=1 set when claude ran?" >&2 + exit 1 +fi + +echo "# Trajectory from $DB_PATH ($ROW_COUNT rows)" +echo "# Copy lines below into tests/dogfood/expected/.txt" +echo "# (Edit out any setup/teardown calls that aren't part of the flow under test.)" +echo + +sqlite3 "$DB_PATH" \ + "SELECT kind || ':' || tool_or_mcp_name FROM debug_trajectory ORDER BY id"
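The ordered-subsequence rule that `l6_assert_trajectory` applies can be exercised without Claude Code or SQLite. The sketch below is illustrative only and assumes bash. `subseq_match` is a hypothetical name that does not exist in this patch, and it takes both lists as newline-separated strings rather than reading `debug_trajectory` and an expected file. It mirrors the helper's loop: each expected line must occur, in order, as a substring of a later trajectory row, and unrelated rows may sit in between.

```shell
#!/usr/bin/env bash
# Sketch only: a standalone version of the ordered-subsequence check used by
# l6_assert_trajectory. `subseq_match` is a hypothetical name; it takes both
# lists as newline-separated strings instead of reading SQLite and a file.
set -uo pipefail

# subseq_match EXPECTED ACTUAL
#   Returns 0 when every non-empty EXPECTED line is a substring of some ACTUAL
#   line, in order. Unmatched ACTUAL lines between matches are allowed, and
#   each ACTUAL line can satisfy at most one EXPECTED line. On failure, prints
#   the first EXPECTED line that was never matched and returns 1.
subseq_match() {
  local expected="$1" actual="$2"
  local -a want=()
  local line
  local idx=0

  # Collect expected steps, skipping blank lines (as the helper does).
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      want+=("$line")
    fi
  done <<< "$expected"

  # Single forward pass over the actual trajectory.
  while IFS= read -r line; do
    if [ "$idx" -ge "${#want[@]}" ]; then
      break
    fi
    # The quoted pattern is matched literally, so '*' or '?' in a step name
    # are not treated as glob wildcards.
    if [[ "$line" == *"${want[$idx]}"* ]]; then
      idx=$((idx + 1))
    fi
  done <<< "$actual"

  if [ "$idx" -eq "${#want[@]}" ]; then
    return 0
  fi
  printf 'missing: %s\n' "${want[$idx]}"
  return 1
}

actual=$'mcp_call:mcp__plugin_tmb_trajectory-server__identity_get\ntool_use:Read\nmcp_call:mcp__plugin_tmb_trajectory-server__config_get\ntool_use:Bash'

# Extra steps (tool_use:Read) between expected ones are fine.
subseq_match $'identity_get\nconfig_get' "$actual" && echo "in order: ok"
# Order matters: this prints "missing: identity_get".
subseq_match $'config_get\nidentity_get' "$actual" || true
```

The check is a single forward pass, so one trajectory row can satisfy only one expected line. Listing the same step twice in an expected file therefore requires it to occur twice in the recorded trajectory. The reverse case is weaker: an empty expected list matches any non-empty trajectory. The helper's loop behaves the same way, so an accidentally empty expected file would pass with 0/0 steps.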