ci: add wheels-bot Claude automation suite (triage, research, review, propose-fix)#2518

Merged
bpamiri merged 1 commit into develop from claude/automated-pr-workflow-Zt17m
May 9, 2026

bpamiri (Collaborator) commented May 9, 2026

Summary

Adds a five-stage Claude-powered GitHub bot for wheels-dev/wheels, modeled on Bun's public Claude workflows (oven-sh/bun/.github/workflows/claude-*.yml + .claude/commands/). The bot is dormant by default — no workflow runs until a repo admin sets vars.WHEELS_BOT_ENABLED='true'.

The five stages:

| Stage | Trigger | Model | Output |
|---|---|---|---|
| Triage | issue opened | Sonnet | Comment classifying as bug / framework-design / other (+ confidence on bug path) |
| Cross-framework research | bot triage marker framework-design | Opus | Comparison of Rails / Laravel / Django / Phoenix / Spring Boot / +1 with a Wheels-idiomatic API sketch (+ confidence) |
| Propose Fix | high-confidence triage OR research marker (or workflow_dispatch) | Opus | TDD-mandatory draft PR; gated by bot-tdd-gate.yml |
| Reviewer A | PR opened/synchronize | Sonnet | Single PR review with line comments + verdict |
| Reviewer B | Reviewer A submits a review | Sonnet | PR comment critiquing A for sycophancy / false positives / missed issues. Loop cap = 3. |

Plus a daily cron (bot-auto-close.yml) that closes stale cannot-reproduce triages after 14 days.

Files

New (17):

  • .claude/commands/_shared-rails.md — common safety rails included in every prompt
  • .claude/commands/{triage-issue,research-frameworks,propose-fix,review-pr,review-the-review,auto-close-stale-triage}.md
  • .github/workflows/bot-{triage,research,propose-fix,tdd-gate,review-a,review-b,auto-close}.yml
  • .github/actions/wheels-bot-skip-check/action.yml — central kill-switch + marker check
  • .github/actions/setup-wheels-test-env/action.yml — composite action lifted from pr.yml:30-164 (LuCLI + Lucee + SQLite + Playwright)
  • docs/contributing/wheels-bot.md — operator handbook

Modified (3):

  • CLAUDE.md — new §"Wheels Bot" subsection
  • CONTRIBUTING.md: [skip-claude] opt-out + bot-output legibility
  • .github/pull_request_template.md — opt-out hint at the bottom

Safety controls

  • vars.WHEELS_BOT_ENABLED — repo variable, must be 'true' for any workflow to run. Default unset = dormant.
  • [skip-claude] label or title token — per-issue/PR opt-out, checked by every workflow.
  • HTML-comment markers (wheels-bot:<stage>:<key>) prevent duplicate runs across retries.
  • Bot identity is a custom GitHub App (wheels-bot[bot]) — push permissions scoped to bot/** and fix/bot-*/** via repo ruleset (set up by admins as part of activation).
  • bot-tdd-gate.yml hard-rejects bot PRs that don't include both a spec change and an implementation change. Human PRs bypass automatically.
  • Propose-fix prompt auto-downgrades on sensitive areas (security, migrations, deploy, DI) and posts wheels-bot:fix-held:<issue> instead of opening a PR.
  • Research prompt auto-downgrades when the dominant framework pattern conflicts with a CLAUDE.md anti-pattern (capped at medium) or when frameworks disagree (capped at low) — only high-confidence research auto-fires the fix-PR stage.
  • Reviewer B is capped at 3 rounds; round 4 emits a terminal "no further iterations" comment.
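The first three controls above combine into a single skip decision. The following is a minimal sketch of that decision, not the contents of wheels-bot-skip-check/action.yml: the function name `should_skip`, its argument order, and the exact `<!-- wheels-bot:... -->` comment syntax are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the skip decision; the real composite action may differ.
# Args: enabled-flag, title, comma-separated labels, marker key.
# Existing comment bodies are read from stdin.
should_skip() {
  local enabled="$1" title="$2" labels="$3" marker="$4"

  # Kill switch: anything other than the literal string 'true' means dormant.
  if [ "$enabled" != "true" ]; then
    echo "skip: bot disabled"; return 0
  fi

  # Per-issue/PR opt-out via title token or label.
  case "$title" in
    *"[skip-claude]"*) echo "skip: title opt-out"; return 0 ;;
  esac
  case ",$labels," in
    *",skip-claude,"*) echo "skip: label opt-out"; return 0 ;;
  esac

  # Idempotency: a prior comment already carries this marker (assumed syntax).
  if grep -qF "<!-- wheels-bot:${marker} -->"; then
    echo "skip: marker present"; return 0
  fi

  echo "run"
}

# Example: enabled, no opt-out, no earlier marker in the comments.
printf 'an unrelated comment\n' | should_skip true "Fix typo" "docs" "review-a:99:abc123"
```

Checking the kill switch first keeps a disabled bot from reading any comments at all, which matches the "exits 0 immediately" expectation in the kill-switch test.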

Activation steps (post-merge, admins only)

  1. Create the GitHub App wheels-bot[bot] at github.com/settings/apps/new under wheels-dev. Permissions: Contents R/W, Issues R/W, Pull Requests R/W, Metadata R. No webhooks. Install on this repo only.
  2. Add secrets: WHEELS_BOT_APP_ID, WHEELS_BOT_PRIVATE_KEY. Confirm ANTHROPIC_API_KEY is already present.
  3. Create labels skip-claude, cannot-reproduce in the repo.
  4. Add a repo ruleset allowing the App identity to push only to bot/** and fix/bot-*/**.
  5. Update develop branch protection: add Bot PR TDD Gate to required checks, require 1 approving review from wheels-dev/maintainers.
  6. Set vars.WHEELS_BOT_ENABLED='true' to activate.

Phased rollout — see docs/contributing/wheels-bot.md § "Operating the bot". Phase 1 (Reviewer A only) is the smallest first cut; promote subsequent stages one at a time.

Test plan

  • Phase 1 (Reviewer A) — open a trivial test PR against develop, confirm bot-review-a.yml runs and posts a review with the wheels-bot:review-a marker. Push a second commit, confirm cancel-in-progress supersedes the first run.
  • Phase 2 (Triage) — open a test issue with a CFML repro, confirm bot-triage.yml brings up Lucee + SQLite via the composite action and lands a triage comment with confidence + marker. Re-open the issue, confirm the marker check skips a duplicate run.
  • Phase 3 (Reviewer B) — wait for a Reviewer A review, confirm bot-review-b.yml fires only on bot-authored reviews. Have A re-review, confirm round counter increments. Confirm cap at round 3.
  • Phase 4a (Fix-PR, bug path): workflow_dispatch first; verify spec + implementation both committed, bot-tdd-gate.yml passes, commitlint + fast-test pass, PR is --draft. Then enable auto-fire on triage-confidence:high.
  • Phase 4b (Research + framework-design fix path): workflow_dispatch against historical issues; verify research lands an accurate comparison table + Wheels-idiomatic API sketch with explicit confidence. After ≥ 5 supervised runs, extend bot-propose-fix.yml's trigger to fire on research-confidence:high.
  • Phase 5 (Auto-close) — manual cron run, confirm 14-day stale cannot-reproduce triages close politely.
  • Kill-switch test — set WHEELS_BOT_ENABLED=false, open a PR, confirm bot-review-a.yml exits 0 immediately.

Notes

  • All YAML validated locally with yaml.safe_load.
  • Existing docs-validation.yml orchestrator is intentionally not reused — it is a stateful batch runner; these bots are one-shot per event.
  • pr.yml's required-checks contract is undisturbed — the test-env composite action references its prelude by extraction, not modification.

https://claude.ai/code/session_01F5Ev5XsFMzLncPCZ43hjVP


Generated by Claude Code

… propose-fix)

Adds a five-stage Claude-powered GitHub bot modeled on Bun's public
workflows (oven-sh/bun/.github/workflows/claude-*.yml + .claude/commands/):

- Triage (Sonnet): classifies issues as bug / framework-design / other,
  reproduces bugs against tools/test-local.sh, emits confidence markers.
- Cross-framework research (Opus): for framework-design issues, fans out
  to Rails / Laravel / Django / Phoenix / Spring Boot docs and proposes a
  Wheels-idiomatic path with auto-downgrade rules.
- Propose Fix (Opus): TDD-mandatory draft PR; gated by bot-tdd-gate.yml
  which hard-rejects bot PRs without spec + implementation changes.
- Reviewer A (Sonnet): single PR review with line comments and verdict.
- Reviewer B (Sonnet): critiques A for sycophancy, false positives, and
  missed issues; loop cap = 3 rounds.

Plus a daily cron (bot-auto-close.yml) that closes stale cannot-reproduce
triages after 14 days.

All workflows gated on vars.WHEELS_BOT_ENABLED=='true' (default unset, so
this PR is dormant until an admin opts in). Per-issue/PR opt-out via the
[skip-claude] label or title token. Bot identity is a custom GitHub App
(wheels-bot[bot]); secrets WHEELS_BOT_APP_ID + WHEELS_BOT_PRIVATE_KEY
must be added before enabling.

Setup composite action (setup-wheels-test-env) extracts the LuCLI + Lucee
+ SQLite + Playwright prelude from pr.yml so triage and propose-fix can
reuse it without duplicating ~130 lines.

Documentation: docs/contributing/wheels-bot.md (operator handbook),
CLAUDE.md §"Wheels Bot" (quick reference), CONTRIBUTING.md (opt-out
mechanics).

https://claude.ai/code/session_01F5Ev5XsFMzLncPCZ43hjVP

github-actions bot added the docs label May 9, 2026
bpamiri marked this pull request as ready for review May 9, 2026 15:13
bpamiri merged commit 3c9823f into develop May 9, 2026
6 checks passed
bpamiri deleted the claude/automated-pr-workflow-Zt17m branch May 9, 2026 15:13

Comment on lines +11 to +13
concurrency:
  group: wheels-bot-review-a-${{ github.event.pull_request.number }}-${{ github.event.pull_request.head.sha }}
  cancel-in-progress: true

🔴 The concurrency group key on bot-review-a.yml line 12 includes ${{ github.event.pull_request.head.sha }}, but cancel-in-progress: true only cancels runs that share the same group key. Because every push to a PR produces a new head SHA, the second push lands in a different concurrency group and the in-flight run for the previous SHA is never cancelled — both runs proceed in parallel. This directly contradicts the test plan's assertion ("Push a second commit, confirm cancel-in-progress supersedes the first run") and wastes Sonnet --max-turns 25 review-A runs plus their cascading Reviewer-B fan-out. Fix: drop -${{ github.event.pull_request.head.sha }} from the group key so successive pushes share a group and the older run is cancelled. The marker check on line 44 already handles same-SHA idempotency, so SHA-specificity belongs there, not in the concurrency key.

Extended reasoning...

What's broken

bot-review-a.yml lines 11-13:

concurrency:
  group: wheels-bot-review-a-${{ github.event.pull_request.number }}-${{ github.event.pull_request.head.sha }}
  cancel-in-progress: true

GitHub Actions cancel-in-progress only cancels runs that share the same concurrency group key. Including head.sha in the key means each new commit produces a new group, so the previous run is not a sibling of the new run and is never cancelled. The two runs proceed in parallel until both complete.

The PR's own test plan asserts the opposite

From the PR description, Phase 1 test plan:

Push a second commit, confirm cancel-in-progress supersedes the first run.

That assertion will fail as written. A reviewer running it will see two parallel bot-review-a jobs (one for each SHA), not a cancellation.

Step-by-step proof

  1. Developer pushes commit abc123 to PR #99. GitHub Actions queues a run with concurrency.group = wheels-bot-review-a-99-abc123. The run begins (Sonnet, up to 25 turns).
  2. Two minutes later, developer pushes commit def456 to the same PR. GitHub Actions queues a run with concurrency.group = wheels-bot-review-a-99-def456.
  3. The two group keys differ (...-abc123 vs ...-def456). cancel-in-progress: true only cancels in-progress runs whose group matches the new run's group. There is no in-progress run in group ...-def456, so nothing gets cancelled.
  4. Both runs complete. Reviewer A posts two reviews — one anchored to abc123 (now stale) and one anchored to def456.

The marker check on line 44 (wheels-bot:review-a:${pr}:${sha}) prevents the same SHA from being re-reviewed if the workflow retries, but it does not prevent two different SHAs from being reviewed in parallel. The marker and the concurrency group are addressing different problems.
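The proof can be replayed with a small model of the group-key comparison. This is only a sketch: `would_cancel` is a hypothetical stand-in for the scheduler's rule that a new run cancels an in-progress run only when both share an identical group key, and the PR number and SHAs are the illustrative ones used in the steps.

```shell
#!/usr/bin/env bash
# Sketch: model which in-progress run a new run would cancel.
# group_key_with_sha mirrors the current key; group_key_pr_only drops head.sha.
group_key_with_sha() { echo "wheels-bot-review-a-$1-$2"; }
group_key_pr_only()  { echo "wheels-bot-review-a-$1"; }

# A new run cancels an in-progress run only when both keys are identical.
would_cancel() {
  if [ "$1" = "$2" ]; then echo "cancelled"; else echo "parallel"; fi
}

# Two pushes to PR 99 with the SHA in the key: different groups.
would_cancel "$(group_key_with_sha 99 abc123)" "$(group_key_with_sha 99 def456)"  # prints "parallel"

# Same two pushes with a PR-only key: one shared group.
would_cancel "$(group_key_pr_only 99)" "$(group_key_pr_only 99)"                  # prints "cancelled"
```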

Impact

  • CI cost regression: every push during an in-flight run double-spends Sonnet --max-turns 25 on Reviewer A, and each Reviewer A review fans out into a Reviewer B run (capped at 3 rounds). On an active PR with rapid pushes, the spend compounds.
  • Stale reviews: the older run posts a review against an outdated SHA, which is exactly the scenario cancel-in-progress was added to prevent.
  • Documentation lies: a contributor running the test plan in good faith will see the assertion fail and not know whether it's the test or the workflow.

Fix

Drop head.sha from the concurrency group key:

concurrency:
  group: wheels-bot-review-a-${{ github.event.pull_request.number }}
  cancel-in-progress: true

This way successive pushes to the same PR collide into the same group, and cancel-in-progress: true cancels the older run as the test plan promises. Same-SHA idempotency is already handled by the line-44 marker check, which is the correct place for SHA-specificity.

Note for context: bot-review-b.yml line 12 also includes ${{ github.event.review.id }} in its group key, but pairs it with cancel-in-progress: false — internally consistent (each review event spawns its own lane, no cancellation intended). Only review-a has the contradiction.

Comment on lines +8 to +11
permissions:
  contents: read
  checks: write

🔴 When a workflow declares an explicit permissions: block, GitHub Actions defaults every unlisted scope to none — not the workflow default. This block lists only contents: read and checks: write, so pull-requests is implicitly none. The verify step then runs gh pr diff "$PR_NUMBER" --name-only (with set -euo pipefail), which calls GET /repos/{o}/{r}/pulls/{n}/files and requires pull-requests: read. Result: every bot PR's TDD gate will fail-closed with a 403 auth error rather than running the intended spec+impl check, defeating the gate's purpose. Fix: add pull-requests: read to the permissions block.

Extended reasoning...

What is wrong

bot-tdd-gate.yml (lines 8-11) declares:

permissions:
  contents: read
  checks: write

Per GitHub's documented behavior for GITHUB_TOKEN permissions, once permissions: is specified explicitly at the workflow or job level, every unlisted scope defaults to none — not to the repo/workflow default. So pull-requests is implicitly none here.

The verify step (line 42) runs:

env:
  GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
  set -euo pipefail
  changed=$(gh pr diff "$PR_NUMBER" --name-only)

gh pr diff … --name-only is backed by GET /repos/{owner}/{repo}/pulls/{pull_number}/files, which requires pull-requests: read. With the scope set to none, the API returns 403 Resource not accessible by integration, gh exits non-zero, the $(...) command-substitution fails, set -e propagates, and the entire gate hard-fails.

Why this matters

The gate is the central enforcement mechanism behind the bot's TDD invariant — it is supposed to reject bot PRs that lack either a spec change or an implementation change. With this misconfiguration, every real bot PR will fail the gate with a confusing auth error rather than running the intended spec/impl validation. Humans seeing the failure will see "Bot PR TDD Gate: failed" with a permissions error rather than the descriptive error messages designed for this gate (lines 56-60, 64-68). Worse, the gate is fail-closed, so this masks whether the bot is actually following TDD.

The bug is silent in the sense that the workflow is dormant by default (vars.WHEELS_BOT_ENABLED), so it won't surface until activation. But the moment the bot opens its first real PR, the gate will misfire.

Cross-check: every other PR-interacting workflow in this repo grants the scope

.github/workflows/label.yml:16:                  pull-requests: write
.github/workflows/compat-matrix.yml:517,547:      pull-requests: write
.github/workflows/refresh-packages-baseline.yml:29: pull-requests: write
.github/workflows/web-deploy.yml:27:              pull-requests: write
.github/workflows/generate-changelog.yml:42:      pull-requests: write
.github/workflows/snapshot.yml:11:                pull-requests: write
.github/workflows/docs-validation.yml:66:         pull-requests: write

Only this new gate omits it — a copy-paste omission, not a deliberate hardening choice. (Note: those workflows use write because they post comments; this gate only needs read.)

Step-by-step proof

  1. Bot opens PR #N from branch fix/bot-1234-foo against develop.
  2. bot-tdd-gate.yml triggers on pull_request: opened.
  3. Step "Decide if this PR is bot-authored" sets is_bot=true (head ref starts with fix/bot-).
  4. Step "Verify bot PR contains spec + implementation changes" runs.
  5. gh pr diff "$PR_NUMBER" --name-only issues GET /repos/wheels-dev/wheels/pulls/N/files with the GITHUB_TOKEN whose pull-requests scope is none.
  6. GitHub returns 403 {"message":"Resource not accessible by integration"}.
  7. gh exits with code 1; the command substitution changed=$(...) propagates the failure under set -e.
  8. The step fails before any of the spec/impl logic runs. The check named "Bot PR TDD Gate" reports failure with no actionable error message.
  9. Branch protection blocks merge for the wrong reason.
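Steps 6 through 8 hinge on how strict mode treats a failed command substitution. The sketch below reproduces only that shell behaviour: `fake_gh` is a hypothetical stand-in for the real `gh pr diff` call returning a 403, and the script runs in a separate `bash -c` process so that `set -e` behaves as it would in a workflow step.

```shell
#!/usr/bin/env bash
# Sketch of the verify step's failure path; fake_gh is a hypothetical stand-in
# for `gh pr diff "$PR_NUMBER" --name-only` when the token lacks pull-requests: read.
gate_script='
set -euo pipefail
fake_gh() { echo "HTTP 403: Resource not accessible by integration" >&2; return 1; }
changed=$(fake_gh)
echo "spec/impl check ran"
'

# Run in a fresh process so strict mode is not suppressed by the caller context.
run_gate() { bash -c "$gate_script" 2>/dev/null; }

if run_gate; then
  echo "gate logic ran"
else
  echo "gate aborted before the spec/impl check"
fi
```

Because the assignment `changed=$(...)` takes the exit status of the substitution, the script stops there and none of the descriptive spec/impl messages can be emitted.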

Fix

permissions:
  contents: read
  pull-requests: read
  checks: write

Minor follow-up worth considering: checks: write is declared but no Checks API calls are made in the workflow body — it can probably be dropped. But that's a tidiness nit; the blocker is the missing pull-requests: read.

claude_args: |
  --model claude-opus-4-7
  --max-turns 60
  --allowedTools "Bash(gh:*),Bash(git:*),Bash(bash tools/test-local.sh*),Bash(curl:*),Read,Edit,Write,Grep,Glob"

🔴 The --allowedTools list at .github/workflows/bot-propose-fix.yml:106 grants Bash(git:*) — a wildcard that permits every git subcommand including push, push --force, reset --hard, config, rebase, checkout -B, etc. This contradicts the rails in .claude/commands/_shared-rails.md (Tool restrictions) which assert git is read-only and that "the caller workflow handles branch creation and pushes," and undermines the dedicated "Push branch" step at lines 109-117 that is supposed to be the only push path. Tighten to enumerated subcommands (e.g. Bash(git status),Bash(git diff:*),Bash(git log:*),Bash(git show:*),Bash(git grep:*),Bash(git add:*),Bash(git commit:*),Bash(git checkout:*)) — matching what the sibling review/research workflows do.

Extended reasoning...

The contradiction. The shared rails file (.claude/commands/_shared-rails.md lines 9-13) tells the model: "Git operations: read-only only - git status, git log, git diff, git show, git grep. Never git push, git config, git checkout -B on shared branches, git reset --hard, git --force, or any subcommand that rewrites history. The caller workflow handles branch creation and pushes when applicable." The propose-fix command file (.claude/commands/propose-fix.md) further says "Do not use --amend or --force," and the workflow has a dedicated "Push branch" step at .github/workflows/bot-propose-fix.yml lines 109-117 that owns the actual git push. But the --allowedTools value at line 106 is Bash(git:*), a bare wildcard that covers every git subcommand, including the ones the rails forbid.

Why the prompt rails are not enough. Prompt text is advisory; --allowedTools is the runtime enforcement boundary. The Claude Code action evaluates the allowlist before invoking the shell, so anything listed there is reachable regardless of what the prompt says. A model that decides to git push directly (to ship faster, due to misreading the instructions, or because of prompt injection from the issue body) would not be blocked at the tool layer. The actions/checkout step at line 56 already cached the App token in the working tree, so a direct git push would succeed.

Anomaly compared to siblings. The sibling workflows in this same PR all enumerate read-only subcommands rather than using the wildcard, demonstrating that the author knew how to scope the grant correctly:

  • bot-review-a.yml: Bash(git log:*),Bash(git diff:*),Bash(git show:*),Bash(git grep:*),Bash(git status)
  • bot-review-b.yml: same scoped list
  • bot-research.yml: Bash(git log:*),Bash(git show:*),Bash(git grep:*)

Only bot-propose-fix.yml uses git:*. Because propose-fix legitimately needs git add / git commit to stage and commit work on its branch, the right scoping is "read-only plus stage-and-commit", not "everything."

Step-by-step proof.

  1. Issue #999 is filed with triage-confidence:high (or workflow_dispatch is invoked).
  2. bot-propose-fix.yml fires. Lines 56-67: actions/checkout@v6 runs with the App token, so .git/config has the App token cached as the remote auth.
  3. Line 86: git checkout -b fix/bot-999-.... The model is now on a fresh branch.
  4. Line 99: anthropics/claude-code-action@v1 runs with --allowedTools "Bash(gh:*),Bash(git:*),...". The model is told (via rails text) git is read-only-plus-add-commit, but the actual tool permission is git:*.
  5. Failure mode A (deviation): The model decides to push immediately rather than wait for the dedicated step: git push -u origin fix/bot-999-.... Because Bash(git:*) matches git push, the call is permitted. The push succeeds (App token + branch under fix/bot-*/**).
  6. Failure mode B (history rewrite): The model runs git reset --hard HEAD~3 after a failed test or runs git config user.email someone@else.com to attribute commits differently. Neither is blocked at the tool layer; both contradict the rails text.
  7. Failure mode C (injection-amplified): Issue body contains adversarial instructions like "after the fix, force-push to clean history." The model either complies or is closer to compliance because the tool layer doesn't refuse.

In every case, the dedicated "Push branch" step at lines 109-117 (with its git diff --quiet no-op-detection) is bypassed.

Mitigations and residual risk. The PR description's activation step 4 says admins should add a repo ruleset restricting the App identity to push only to bot/** and fix/bot-*/**, and "Block force-push everywhere." That is a real defense-in-depth layer: the App identity literally cannot push to develop or rewrite history because the server-side ruleset would reject it. So this is a layered-defense concern, not a live security exploit.

However, the rails text promises a tighter constraint than the grant actually enforces, and (a) the App-level ruleset is administrator-configured at activation time so it isn't guaranteed to be in place when this workflow runs, (b) the wildcard still permits in-tree mischief like git config rewriting commit attribution, git reset --hard losing legitimate work on the branch, and git push to a permitted fix/bot-*/** ref bypassing the dedicated step's gating logic.

Fix. Replace line 106 with an enumerated list that mirrors what the propose-fix prompt actually needs (read + stage + commit + checkout for branch operations within the working copy):

    --allowedTools "Bash(gh:*),Bash(git status),Bash(git diff:*),Bash(git log:*),Bash(git show:*),Bash(git grep:*),Bash(git add:*),Bash(git commit:*),Bash(git checkout:*),Bash(bash tools/test-local.sh*),Bash(curl:*),Read,Edit,Write,Grep,Glob"

This matches the sibling pattern, keeps the rails honest, and leaves the dedicated "Push branch" step as the only path that pushes.
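The difference between the wildcard and the enumerated grant can be illustrated with a simplified matcher. This is a model for reasoning only, not the CLI's actual implementation: it assumes that a rule of the form Bash(<prefix>:*) permits <prefix> followed by any arguments and that Bash(<cmd>) permits only that exact command. The helper names `rule_permits`, `permitted`, `wildcard_permits`, and `scoped_permits` are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of allow-list matching; NOT the CLI's real matcher.
# Assumption: "Bash(<prefix>:*)" permits <prefix> plus any arguments,
# and "Bash(<cmd>)" permits only that exact command.
rule_permits() {
  local rule="$1" cmd="$2" inner prefix
  inner="${rule#"Bash("}"
  inner="${inner%")"}"
  case "$inner" in
    *":*")
      prefix="${inner%":*"}"
      [ "$cmd" = "$prefix" ] || [ "${cmd#"$prefix "}" != "$cmd" ]
      ;;
    *) [ "$cmd" = "$inner" ] ;;
  esac
}

# Returns 0 if any rule in the remaining args permits the command.
permitted() {
  local cmd="$1" rule
  shift
  for rule in "$@"; do
    if rule_permits "$rule" "$cmd"; then return 0; fi
  done
  return 1
}

# The current grant versus the enumerated grant proposed in the Fix.
wildcard_permits() { permitted "$1" 'Bash(git:*)'; }
scoped_permits() {
  permitted "$1" 'Bash(git status)' 'Bash(git diff:*)' 'Bash(git log:*)' \
    'Bash(git show:*)' 'Bash(git grep:*)' 'Bash(git add:*)' \
    'Bash(git commit:*)' 'Bash(git checkout:*)'
}

for cmd in "git push --force origin develop" "git commit -m fix" "git status"; do
  if wildcard_permits "$cmd"; then w=yes; else w=no; fi
  if scoped_permits "$cmd"; then s=yes; else s=no; fi
  echo "$cmd: wildcard=$w scoped=$s"
done
```

Under these assumptions only the wildcard admits the push, while both grants still admit the stage-and-commit operations the prompt needs.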

Comment on lines +158 to +160
- [ ] At least one file outside `tests/`, `vendor/wheels/tests/`,
`.ai/`, `CHANGELOG.md`, `docs/` is new or modified (the
implementation)

🔴 The propose-fix self-check at step 12 lists only 5 exclusion paths (tests/, vendor/wheels/tests/, .ai/, CHANGELOG.md, docs/) but bot-tdd-gate.yml line 63 excludes 7 — also web/ and .github/. So a bot PR that touches only a spec plus a web/sites/guides/.../<page>.mdx (which step 9 of the same prompt explicitly endorses for user-visible behavior changes) self-clears step 12, then is hard-rejected by the gate with "Bot PR has tests but no implementation". Sync the bullet at lines 158-160 with the gate's regex (and consider also fixing the gate's error message at line 67, which is tests/, .ai/, docs/, and CHANGELOG.md — even further from its own regex).

Extended reasoning...

What the bug is

bot-tdd-gate.yml and .claude/commands/propose-fix.md are supposed to enforce the same TDD invariant: every bot PR must contain at least one spec change and at least one implementation change. The gate is the enforcement; the prompt's step-12 self-check is the bot's pre-flight.

Their exclusion lists have drifted:

  • Gate (bot-tdd-gate.yml line 63) excludes 7 path prefixes from impl_changes:
    ^(tests/|vendor/wheels/tests/|\.ai/|CHANGELOG\.md|docs/|web/|\.github/)
    
  • Prompt (propose-fix.md lines 158-160) lists only 5:

    At least one file outside tests/, vendor/wheels/tests/, .ai/, CHANGELOG.md, docs/ is new or modified (the implementation)

The prompt is missing web/ and .github/. The gate's own error message at line 67 also drifts the other direction — it says outside tests/, .ai/, docs/, and CHANGELOG.md, missing vendor/wheels/tests/, web/, and .github/.

How it manifests — step-by-step proof

  1. A user-visible bug gets triaged with high confidence; bot-propose-fix.yml fires.
  2. The bot follows step 5: writes a failing spec under vendor/wheels/tests/specs/<layer>/.
  3. The bot follows step 7: implements the fix… but the fix is something user-visible, like updating a documented validation message or a form-helper output. Step 9 explicitly says: "If user-visible behavior changed: update web/sites/guides/src/content/docs/v4-0-0-snapshot/<area>/<page>.mdx".
  4. Imagine the bot mis-classifies the MDX update as the "implementation" change (a reasonable confusion since step 9 lists it under "Update supporting docs" alongside CFML files). Or imagine an issue whose only resolution genuinely is an MDX clarification.
  5. Bot reaches step 12 self-check. Looking at the diff:
    • vendor/wheels/tests/specs/<layer>/<x>Spec.cfc: counts toward "spec" ✓
    • web/sites/guides/.../<page>.mdx: not in the prompt's exclusion list, so the bot considers it the "implementation" ✓
    • All boxes ticked. Bot opens the PR.
  6. bot-tdd-gate.yml runs. Its regex does exclude web/, so impl_changes is empty. Gate fails with: "Bot PR has tests but no implementation."
  7. Bot run wasted; the PR sits broken until a human intervenes.

The same trap exists for .github/: step 7 only forbids .github/workflows/pr.yml, not other .github/ paths (e.g. .github/pull_request_template.md, which this very PR modifies). A bot edit there would self-clear step 12 then trip the gate.

Why existing code doesn't prevent it

The gate is the only line of defense, and its error message also lies — saying the implementation must live outside 4 paths when the regex actually excludes 7. So a confused bot reading the gate's failure message would be told to look outside tests/, .ai/, docs/, CHANGELOG.md, but adding a web/ change still wouldn't fix the failure. There's no shared source for the exclusion list — three places (gate regex, gate error message, prompt self-check bullet) each have their own copy.

Impact

Bot is dormant by default (vars.WHEELS_BOT_ENABLED), and the worst case is a rejected bot PR — which is what the gate exists for. So this is not catastrophic. But it directly degrades bot-PR success rate on a path the prompt explicitly endorses, and the inconsistency means the gate doesn't actually enforce the prompt's discipline as claimed in the prompt's own "TDD invariant" note ("the prompt-level discipline below is enforced by code, so don't skip steps"). The gate's "enforced by code" promise is broken when the prompt and code disagree on what's being enforced.

How to fix

Minimal: update the bullet at propose-fix.md lines 158-160 to list all 7 paths (tests/, vendor/wheels/tests/, .ai/, CHANGELOG.md, docs/, web/, .github/), and update the gate's error message at bot-tdd-gate.yml line 67 to match. Better: extract the exclusion list into a single source — e.g., a small script under tools/ci/ that the gate sources and the prompt cites by reference rather than restating. That kills the drift class entirely.
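One way the "single source" option could look is sketched below. The regex is copied from the gate line quoted above; everything else is an assumption for illustration: the function names `impl_changes` and `non_impl_paths`, the idea of deriving the human-readable list from the regex, and the example file paths.

```shell
#!/usr/bin/env bash
# Hypothetical tools/ci helper: one definition of the "not implementation" paths.
# The regex is the one quoted from bot-tdd-gate.yml line 63.
NON_IMPL_REGEX='^(tests/|vendor/wheels/tests/|\.ai/|CHANGELOG\.md|docs/|web/|\.github/)'

# Reads changed file paths on stdin, prints those that count as implementation.
impl_changes() {
  grep -Ev "$NON_IMPL_REGEX" || true
}

# Human-readable list derived from the same regex, for error messages and prompts.
non_impl_paths() {
  printf '%s\n' "$NON_IMPL_REGEX" |
    sed -e 's/^\^(//' -e 's/)$//' -e 's/\\//g' -e 's/|/, /g'
}

# Example: a spec plus an MDX page (illustrative paths) yields no implementation change.
changed=$(printf '%s\n' \
  'vendor/wheels/tests/specs/model/FooSpec.cfc' \
  'web/sites/guides/page.mdx' | impl_changes)
if [ -z "$changed" ]; then
  echo "no implementation change outside: $(non_impl_paths)"
fi
```

Generating the error text from the regex means the gate message and the prompt bullet cannot drift from the pattern that is actually enforced.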

claude_args: |
  --model claude-opus-4-7
  --max-turns 40
  --allowedTools "Bash(gh:*),Bash(git log:*),Bash(git show:*),Bash(git grep:*),WebFetch,WebSearch,Read,Grep,Glob"

🔴 Step 3 of research-frameworks.md instructs the model to "Use the Agent tool to launch 6 parallel sub-agents" for the cross-framework fan-out, but bot-research.yml's --allowedTools list does not include Task (the tool that launches sub-agents). claude-code-action enforces --allowedTools as an explicit allow-list, so the prompt's documented core mechanism cannot run as designed. Either add Task to the workflow's allow-list or rewrite step 3 to do sequential WebFetch calls in a single agent.

Extended reasoning...

What the bug is

.claude/commands/research-frameworks.md step 3 (lines 43–44) is explicit:

Use the Agent tool to launch 6 parallel sub-agents (or fewer if fewer frameworks are relevant). Each agent gets: …Each agent uses WebFetch against the canonical docs and returns its structured summary.

But .github/workflows/bot-research.yml line 71 sets:

--allowedTools "Bash(gh:*),Bash(git log:*),Bash(git show:*),Bash(git grep:*),WebFetch,WebSearch,Read,Grep,Glob"

Task (the Claude Code tool that launches sub-agents — the one referred to as "the Agent tool" in the prompt) is not listed.

Why existing code does not save us

claude-code-action@v1 passes --allowedTools straight through to the Claude CLI as an allow-list. Tools not explicitly granted are denied in non-interactive (--max-turns) runs, because the model cannot prompt a human to approve them. The shared rails the prompt cites also reinforce this: _shared-rails.md says "No write-side network tools unless the caller workflow's --allowed-tools explicitly grants them", and the same principle applies to Task.

Step-by-step proof

  1. Issue opens, triage classifies it framework-design, posts the trigger marker.
  2. bot-research.yml fires, generates an App token, runs the skip-check, then invokes claude-code-action@v1 with the allow-list above.
  3. The model loads /research-frameworks <issue> and follows step 3, attempting to call Task to launch sub-agent #1 (Rails).
  4. The CLI rejects the call because Task is not in --allowedTools. With --max-turns 40 the run is non-interactive, so there is no human-approval fallback.
  5. Best case: the model falls back to sequential WebFetch calls in a single agent — degraded behavior that contradicts the documented design and likely runs into the turn budget given six frameworks × multiple URLs each. Worst case: the model errors out before posting the comment, and the issue gets a partial or no research comment despite the workflow consuming Opus quota and CI minutes.

Impact

The "parallel sub-agent fan-out across 6 frameworks" is the named central mechanism of the research stage in the PR description, docs/contributing/wheels-bot.md ("Launches parallel sub-agents to look up how each of …"), and the prompt itself. With the bot dormant by default the blast radius is bounded today, but the moment an admin sets WHEELS_BOT_ENABLED=true the research stage will not behave as documented on its first real invocation.

How to fix

Pick one:

  • Add Task to the allow-list in .github/workflows/bot-research.yml line 71:

    --allowedTools "Bash(gh:*),Bash(git log:*),Bash(git show:*),Bash(git grep:*),WebFetch,WebSearch,Read,Grep,Glob,Task"

    This is the lowest-friction fix and preserves the prompt's parallel-fan-out design. It may also warrant raising --max-turns 40 if sub-agents themselves need turn budget.

  • Rewrite step 3 of research-frameworks.md to do sequential WebFetch calls in a single agent (and update docs/contributing/wheels-bot.md and the PR description accordingly so the documented mechanism matches the actual implementation).
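Whichever option is chosen, this class of mismatch could be caught before activation with a small drift check. The sketch below is hypothetical: `tool_allowed` is an illustrative helper that assumes allow-list entries are comma-separated and contain no commas themselves, which holds for the list quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: is a tool the prompt depends on present in the allow-list?
# Assumes entries are comma-separated and contain no commas themselves.
tool_allowed() {
  local allow_list="$1" tool="$2"
  case ",${allow_list}," in
    *",${tool},"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Allow-list as quoted from bot-research.yml line 71.
current='Bash(gh:*),Bash(git log:*),Bash(git show:*),Bash(git grep:*),WebFetch,WebSearch,Read,Grep,Glob'

if tool_allowed "$current" "Task"; then
  echo "Task allowed"
else
  echo "Task missing: the step 3 fan-out cannot launch sub-agents"
fi
```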

Comment on lines +8 to +11
This page is for humans interacting with the bot. For the design rationale,
see the plan at `/root/.claude/plans/i-just-watched-a-polymorphic-plum.md` (or
its archived copy in the repo when published). For the framework's general
contribution rules, see [`CONTRIBUTING.md`](../../CONTRIBUTING.md).

🟡 docs/contributing/wheels-bot.md line 9 references /root/.claude/plans/i-just-watched-a-polymorphic-plum.md, which is an absolute path on the PR author's local Claude Code install — it will not exist on any reader's filesystem, and the diff does not commit that plan into the repo. The hedge "or its archived copy in the repo when published" itself admits the breadcrumb is dead the moment this doc lands. Fix: either commit the plan into the repo and link the canonical path, or simply remove the line.

Extended reasoning...

What the bug is

In docs/contributing/wheels-bot.md (the new operator handbook), the lead-in paragraph contains a pointer to a design plan:

This page is for humans interacting with the bot. For the design rationale,
see the plan at `/root/.claude/plans/i-just-watched-a-polymorphic-plum.md` (or
its archived copy in the repo when published).

/root/.claude/plans/... is an absolute path that exists only on the PR author's local Claude Code install (/root/.claude/ is the per-user state directory used by Claude Code). It is not a path that will resolve on any reader's machine: not on a contributor's laptop, not on CI, not in the rendered docs site.

Why the existing escape hatch does not save it

The parenthetical "or its archived copy in the repo when published" is meant to be a fallback, but it is itself an admission that the breadcrumb is currently dead: the archived copy is not in the diff. A grep of the PR's changed-files list confirms only docs/contributing/wheels-bot.md is added under docs/; no plan file was committed alongside it. So neither half of the "X or Y" lookup resolves.

Impact

This is not a correctness or security issue; it is a documentation quality / professionalism issue:

  1. Readers following the link find nothing and lose trust in the rest of the doc.
  2. A leaked /root/... path signals "machine-generated, not proofread", exactly the impression a brand-new bot/automation suite should avoid in its own operator handbook.
  3. The whimsical plan filename (i-just-watched-a-polymorphic-plum.md) reinforces that this was a session-local artifact never meant for committed prose.

Step-by-step proof

  1. Reader clones the repo at this PR's SHA.
  2. Reader opens docs/contributing/wheels-bot.md and sees: "see the plan at /root/.claude/plans/i-just-watched-a-polymorphic-plum.md".
  3. Reader runs cat /root/.claude/plans/i-just-watched-a-polymorphic-plum.md → file does not exist (and won't, on any non-PR-author machine).
  4. Reader checks the parenthetical fallback: git ls-files | grep -i polymorphic-plum or find . -name '*polymorphic-plum*' → no results in the repo either.
  5. Both lookup paths fail; the breadcrumb is dead.

How to fix

Two options, either is fine:

  • Drop the breadcrumb. Replace the two-sentence paragraph with just "This page is for humans interacting with the bot. For the framework's general contribution rules, see CONTRIBUTING.md." The body of the doc already explains the design.
  • Commit the plan and link the committed path. Move the plan into something like docs/superpowers/plans/2026-05-09-wheels-bot.md (mirroring the existing docs/superpowers/plans/ convention referenced in CLAUDE.md) and rewrite the line to see [docs/superpowers/plans/2026-05-09-wheels-bot.md](...).

Dropping the line is the lower-risk fix; it doesn't require deciding whether the plan is in shape for public consumption.
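To keep this class of leak from recurring, the docs could be scanned for machine-local paths. This is a rough sketch under stated assumptions: `has_local_paths` is a hypothetical name, nothing like it exists in the PR, and the pattern only covers /root/ and /home/<user>/ prefixes.

```shell
#!/usr/bin/env bash
# Hypothetical lint (not in the PR): prints file:line:text for every reference
# to a path under /root/ or /home/<user>/ and succeeds when at least one is found.
has_local_paths() {
  # -r recurse into directories, -n show line numbers, -E extended regex
  grep -rnE '/root/|/home/[^/[:space:]]+/' -- "$@"
}

# Example use from the repository root (left commented out so the sketch
# has no side effects when sourced):
#   if has_local_paths docs/; then echo "machine-local paths found in docs/"; fi
```

The function follows grep's exit status, so a CI step would need to invert it to fail the build when a match is printed.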

Comment on lines +27 to +30
is_bot=true
elif [[ "$PR_HEAD" =~ ^bot/ || "$PR_HEAD" =~ ^fix/bot- ]]; then
is_bot=true
fi

🟡 The TDD gate's branch-fallback regex on .github/workflows/bot-tdd-gate.yml:28 only matches ^bot/ and ^fix/bot-, but .claude/commands/_shared-rails.md lines 48-49 sanction two bot branch patterns: fix/bot-<issue>-<slug> or feature/bot-<slug>. A bot PR opened on a feature/bot-* branch by a non-App identity (manual testing, future workflow variants) would set is_bot=false and silently bypass the spec+impl requirement. Either expand the regex to ^bot/|^fix/bot-|^feature/bot- or strike feature/bot-<slug> from the rails to single-source the contract.

Extended reasoning...

The bug. .claude/commands/_shared-rails.md lines 48-49 explicitly declare two valid bot branch patterns: fix/bot-<issue>-<slug> or feature/bot-<slug>. The TDD gate that enforces the spec+impl requirement on bot PRs uses two signals to decide is_bot in .github/workflows/bot-tdd-gate.yml lines 26-30: a PR_AUTHOR == 'wheels-bot[bot]' check (line 26) and a branch-pattern fallback (line 28). The branch fallback only matches ^bot/ or ^fix/bot-; the documented feature/bot-* pattern is not in the regex.

Why the fallback exists. The PR_AUTHOR check covers the realistic path today (every bot-authored PR runs through line 26 first), so the immediate practical impact is small. But the branch regex was added as defense-in-depth for cases where the App identity isn't the author: manual workflow_dispatch testing where a maintainer pushes to a bot branch, future automation variants that don't run as the App, or anyone reading _shared-rails.md and pushing to the documented feature/bot-* branch expecting the gate to fire. In every one of those scenarios the gate silently turns into a no-op.

Step-by-step proof. (1) A maintainer reads _shared-rails.md line 49, which says feature/bot-<slug> is a sanctioned bot-branch pattern. (2) They push a test PR on branch feature/bot-foo from their personal account to validate a future automation variant. (3) bot-tdd-gate.yml line 23 reads PR_HEAD=feature/bot-foo. Line 26 fails (PR_AUTHOR is the human, not wheels-bot[bot]). Line 28 evaluates 'feature/bot-foo' =~ ^bot/ → false, and 'feature/bot-foo' =~ ^fix/bot- → false. (4) is_bot=false is written. (5) Lines 36-77 are all conditioned on steps.classify.outputs.is_bot == 'true', so the entire spec+impl enforcement is skipped. The PR passes the gate without containing any spec changes, contradicting the contract documented in _shared-rails.md.

Why existing code doesn't prevent it. Three mitigations narrow the blast radius but don't close the hole: (a) the PR_AUTHOR check on line 26 catches every PR opened by the App identity; (b) the activation-step ruleset (PR description step 4) only allows the App to push to bot/** and fix/bot-*/**, so the App itself can't currently use feature/bot-*; (c) bot-propose-fix.yml line 80 hardcodes the branch name as fix/bot-${ISSUE_NUMBER}-${slug}. None of these are durable: (a) doesn't help when the author is a human, (b) is admin-managed config that can be expanded if someone implements a future feature/bot-* workflow per the rails, and (c) is one workflow, whereas _shared-rails.md is the contract that future workflows will read.

Impact. Bypassed TDD gate on feature/bot-* branches. The contract is single-sourced poorly: _shared-rails.md lists two branch patterns, the gate enforces one. The gap will materialize the moment any bot-related automation grows a feature/bot-* branch, and since the rails sanction it, that's a likely path for design-track features that aren't bug fixes.

Fix. Trivial; pick one of:

    # Option A: expand the gate regex (preferred if feature/bot-* is intended)
    elif [[ "$PR_HEAD" =~ ^bot/ || "$PR_HEAD" =~ ^fix/bot- || "$PR_HEAD" =~ ^feature/bot- ]]; then

Or strike feature/bot-<slug> from _shared-rails.md line 49 so the rails describe what the gate actually enforces. Either way, single-source the contract.
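The branch fallback with Option A applied can be exercised in isolation. This is a sketch only: `is_bot_branch` is a hypothetical wrapper (the workflow inlines the test in its classify step), and the sample branch names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the gate's branch fallback with Option A applied.
# Succeeds when the PR head branch matches one of the three bot patterns.
is_bot_branch() {
  local head="$1"
  [[ "$head" =~ ^bot/ || "$head" =~ ^fix/bot- || "$head" =~ ^feature/bot- ]]
}

# Illustrative branch names; the last one is an ordinary human feature branch.
for branch in bot/demo fix/bot-123-slug feature/bot-foo feature/foo; do
  if is_bot_branch "$branch"; then
    echo "$branch: is_bot=true"
  else
    echo "$branch: is_bot=false"
  fi
done
```

With the original two-pattern regex, feature/bot-foo would fall into the is_bot=false branch, which is the bypass described above.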

bpamiri added a commit that referenced this pull request May 9, 2026
…action

Composite actions cannot read the `vars` context — that lookup is only
available in workflow files. The skip-check action's internal kill-switch
check failed every time the action was invoked, with:

  Unrecognized named-value: 'vars'.
  Located at position 1 within expression: vars.WHEELS_BOT_ENABLED

This silently broke from PR #2518 onwards but only surfaced after the
kill switch was activated and Reviewer A actually fired on a real PR.

The kill switch is already enforced at the job level via
`if: vars.WHEELS_BOT_ENABLED == 'true'` in every wheels-bot workflow,
so the action's internal check was redundant as well as broken.

Removed:
- `WHEELS_BOT_ENABLED` env binding (couldn't resolve)
- The `if [[ ... != "true" ]]; then skip=true ...` block (redundant)

Description updated to note that the kill switch lives at the job
level, not in this action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bpamiri added a commit that referenced this pull request May 9, 2026
…_bots) (#2520)

* fix(cli): escape # in test help URL fragment

Module.cfc:3723 prints a help URL containing the unescaped fragment
"testing#testing-against-different-engines". Lucee's parser interprets
unescaped # in CFScript string literals as expression delimiters and
crashes the file's compilation with "Invalid Syntax Closing [#] not
found" when no closing # is found.

This crashed Module.cfc compilation, which broke `wheels new` and
the Wheels Snapshots smoke test pipeline (build / Smoke Test Installed
Distribution). Every push to develop has been failing this gate since
PR #2517 introduced the line.

CFML rule: ## inside a string literal outputs a literal #. The fix is
a one-character change.

See CLAUDE.md "# escape gotcha" for the same bug class historical
context.

* ci: remove unreachable vars.WHEELS_BOT_ENABLED check from skip-check action

Composite actions cannot read the `vars` context — that lookup is only
available in workflow files. The skip-check action's internal kill-switch
check failed every time the action was invoked, with:

  Unrecognized named-value: 'vars'.
  Located at position 1 within expression: vars.WHEELS_BOT_ENABLED

This silently broke from PR #2518 onwards but only surfaced after the
kill switch was activated and Reviewer A actually fired on a real PR.

The kill switch is already enforced at the job level via
`if: vars.WHEELS_BOT_ENABLED == 'true'` in every wheels-bot workflow,
so the action's internal check was redundant as well as broken.

Removed:
- `WHEELS_BOT_ENABLED` env binding (couldn't resolve)
- The `if [[ ... != "true" ]]; then skip=true ...` block (redundant)

Description updated to note that the kill switch lives at the job
level, not in this action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: add allowed_bots to all wheels-bot claude-code-action invocations

The anthropics/claude-code-action defaults to blocking workflow runs
initiated by non-human actors (any GitHub Bot identity). Without an
explicit allowlist, every bot-triggered workflow fails fast with:

  Action failed with error: Workflow initiated by non-human actor:
  wheels-bot (type: Bot). Add bot to allowed_bots list or use '*' to
  allow all bots.

This surfaced first on bot-review-b.yml — Reviewer B is triggered by
wheels-bot[bot] submitting a Reviewer A review, and the action blocked
on the bot initiator. Same shape would hit bot-research, bot-propose-fix,
and bot-auto-close (cron actor is github-actions[bot]) as the rollout
progresses through Phases 2-5.

Adding `allowed_bots: 'wheels-bot[bot],github-actions[bot]'` to all six
wheels-bot workflows. Specific allowlist (not '*') because the repo is
public — '*' would let any external GitHub App invoke the action with
prompts they could influence (per the action's docs/security.md).

The two identities allowed:
- wheels-bot[bot]: our App's bot identity
- github-actions[bot]: GitHub's own actor for scheduled and
  workflow-internal triggers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>