pr-auto-merge.yml race: enables auto-merge with class=standard default before pr-classify.yml labels risk:blocked #76

## Summary

`pr-auto-merge.yml` evaluates and enables auto-merge in parallel with `pr-classify.yml`, and treats a missing `risk:*` label as `class=standard` (the default). When the classifier later applies `risk:blocked`, the resulting `labeled` event is authored by `GITHUB_TOKEN`, and GitHub's anti-loop protection does not start new workflow runs for events created with that token. As a result, `pr-auto-merge.yml` never re-evaluates and never revokes auto-merge. The PR auto-merges once required checks clear, despite being correctly labeled `risk:blocked`.

`claude-author-automerge.yml` has its own regex-based check that DOES correctly refuse on blocked paths, but it runs in parallel with `pr-auto-merge.yml`, not after it. By the time it refuses, `pr-auto-merge.yml` has already enabled auto-merge.

## Reproduction (from whois-api-llc/wxa-jake-ai#539)

PR added `.github/workflows/auto-tag.yml` — should have been `risk:blocked` per the caller's `.github/risk-paths.yml`. Event timeline:

```
17:19:25Z  pull_request:opened — pr-auto-merge.yml, claude-author-automerge.yml, pr-classify.yml all START
17:19:30Z  pr-auto-merge.yml reads PR labels → none → "class=standard" (default)
17:19:31Z  pr-auto-merge.yml: "Profile: standard (class=standard, trailer=standard, ai=true)"
17:19:31Z  pr-auto-merge.yml calls `gh pr merge --auto --squash` → AUTO-MERGE ENABLED
17:19:33Z  GitHub records `auto_squash_enabled` event
17:19:34Z  claude-author-automerge.yml: "Risk-tier match — auto-merge blocked" (refused, but TOO LATE)
17:19:48Z  pr-classify.yml applies label `risk:blocked` (17 sec after pr-auto-merge made its decision)
           ↑ This labeled event was authored by GITHUB_TOKEN → GitHub anti-loop-suppresses
              workflow re-runs → pr-auto-merge.yml does NOT re-evaluate → no revocation.
17:22:48Z  CI clears → auto-merge fires → PR merged with `risk:blocked` label still applied
```

[Workflow log evidence for the bad decision](https://github.com/whois-api-llc/wxa-jake-ai/actions/runs/26367760966) — search for the line `Profile: standard (class=standard, trailer=standard, ai=true)`.

Verification of no re-run:

```bash
gh run list --workflow=pr-auto-merge.yml --limit 30 \
  --json databaseId,createdAt,event,headSha \
  --jq '[.[] | select(.headSha=="9e12b019f1f16087b4a7e568299d5078adf24ea4")]'
# Returns exactly 1 run, at 17:19:25Z
```

## Root cause

`pr-auto-merge.yml`, around line 70 of the reusable workflow:

```bash
class=$(gh pr view "$PR" --json labels --jq '[.labels[].name | select(startswith("risk:"))][0] // ""' | sed 's/^risk://')
[ -z "$class" ] && class="standard"
```

The `[ -z "$class" ] && class="standard"` default makes the workflow PERMISSIVE when the classifier hasn't yet finished. Combined with the typical Claude-author flow (which auto-adds `Auto-Merge-Risk: low` or has `standard` in the trailer), this opens a ~15-second window where any `risk:blocked` PR gets auto-merge enabled.

The intended re-evaluation mechanism — the `labeled` trigger — is defeated by GitHub's anti-loop policy because the label is applied by `GITHUB_TOKEN`.
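To make the permissive default concrete, the sketch below separates "no `risk:*` label yet" from an explicitly assigned class. It is an illustration under assumptions, not the reusable workflow's code: `resolve_class` and the `pending` value are hypothetical names, and the caller would still have to decide what to do with `pending`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads label names (one per line) on stdin and prints
# the risk class. Prints "pending" when no risk:* label exists yet, instead
# of silently assuming "standard".
resolve_class() {
  local label
  while IFS= read -r label; do
    case "$label" in
      risk:*)
        # Strip the "risk:" prefix, mirroring the sed call in the original.
        printf '%s\n' "${label#risk:}"
        return 0
        ;;
    esac
  done
  printf 'pending\n'
}
```

One possible call site, assuming `$PR` holds the PR number: `gh pr view "$PR" --json labels --jq '.labels[].name' | resolve_class`.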

## Real-world impact

**Any Claude-authored PR with `Auto-Merge-Risk: standard` or `low` that touches a `risk:blocked` path will auto-merge once required checks clear.** Blocked paths include:

- `Dockerfile`, `docker-compose*.{yml,yaml}`
- `.github/workflows/**`, `.github/risk-paths.yml`, `.github/CODEOWNERS`
- `**/.env`, `**/.env.production`, `**/secrets*`
- (Plus any caller-specific additions to `risk-paths.yml` `blocked:` list)
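As a rough illustration of how the default list above could be checked against a changed path, here is a minimal sketch. `is_blocked_path` is a hypothetical helper, and the glob semantics are an assumption (root-only `Dockerfile` and `docker-compose*`, any-depth `.env` and `secrets*`); the actual matching rules live in `risk-paths.yml` and the classifier and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the given repo-relative path falls under
# the default blocked list, 1 otherwise. Caller-specific additions are ignored.
is_blocked_path() {
  local path="$1"
  local base="${path##*/}"   # file name without its directories

  # Anything under .github/workflows/, plus the two named .github files.
  case "$path" in
    .github/workflows/*|.github/risk-paths.yml|.github/CODEOWNERS) return 0 ;;
  esac

  # **/.env, **/.env.production, **/secrets* : match on the file name at any depth.
  case "$base" in
    .env|.env.production|secrets*) return 0 ;;
  esac

  # Dockerfile and docker-compose*.{yml,yaml} : assumed to apply at the repo root only.
  case "$path" in
    */*) ;;  # nested path, not at the root
    Dockerfile|docker-compose*.yml|docker-compose*.yaml) return 0 ;;
  esac

  return 1
}
```

For the PR in this report, `is_blocked_path ".github/workflows/auto-tag.yml"` succeeds, which is the outcome the classifier eventually reached.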

In whois-api-llc/wxa-jake-ai#539 the merged PR was benign — it was a workflow file I (Claude) wrote and approved. But the same gap would let through a malicious or buggy workflow change just as easily.

Affected fleet: every repo that uses both `pr-auto-merge.yml` and `pr-classify.yml` callers. Per `topcoder1/dotclaude`, that's 33 in `topcoder1/*` and 11 in `whois-api-llc/*` as of 2026-05-02.

## Recommended fix

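Convert `pr-auto-merge.yml` to trigger on `workflow_run` (`workflows: [PR Risk Classifier (reusable)]`, `types: [completed]`) instead of on `pull_request` directly. Combined with a check that the classifier run concluded successfully, this guarantees the classifier has finished and applied its label before `pr-auto-merge.yml` evaluates. Note that `workflow_run` matches on the name of a workflow in the same repository, so the value must match the `name:` of the classifier workflow as it runs in each caller repo.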

Sketch:

```yaml
on:
  workflow_run:
    workflows: ["PR Risk Classifier (reusable)"]
    types: [completed]
  pull_request_review:
    types: [submitted]
  check_suite:
    types: [completed]   # keep for the post-CI fire when checks finish

jobs:
  evaluate:
    # workflow_run events lack github.event.pull_request — extract PR from the triggering workflow's run.
    if: >-
      github.event_name != 'workflow_run' ||
      github.event.workflow_run.conclusion == 'success'
    ...
```

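The PR-resolution step (`Resolve PR number`) already branches for non-PR events, so the change is mostly the `on:` block plus a small tweak to read the PR from `github.event.workflow_run.pull_requests`. That array is empty for PRs opened from forks, so a fallback lookup (for example by head SHA) may be needed there.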
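A minimal sketch of that tweak, reading the PR number from the event payload that Actions writes to `$GITHUB_EVENT_PATH`. The function name `resolve_pr_from_workflow_run` is hypothetical, and the sketch assumes `jq` is available on the runner; it is not the existing `Resolve PR number` step.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the number of the first PR attached to a
# workflow_run event payload, or nothing if the payload lists no PR.
# The payload path defaults to $GITHUB_EVENT_PATH, which Actions sets.
resolve_pr_from_workflow_run() {
  local event_file="${1:-$GITHUB_EVENT_PATH}"
  # "// empty" turns a missing or null number into no output at all,
  # so the caller can test the result with [ -z "$pr" ].
  jq -r '.workflow_run.pull_requests[0].number // empty' "$event_file"
}
```

A caller could then run `pr=$(resolve_pr_from_workflow_run)` and skip evaluation when `$pr` is empty rather than fall back to a permissive default.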

### Alternative fixes considered + rejected

1. **Change the permissive default from `class=standard` to `class=blocked`.** Safer in this race window, but it would block ALL PRs evaluated in that window, which defeats the point of auto-merge. It is also the wrong answer for most PRs, since the classifier catches up seconds later.
2. **Add a polling loop at the top of `pr-auto-merge.yml` that waits for a `risk:*` label.** Works, but adds 5-30 sec of latency on every PR and is brittle when the classifier times out.
3. **Apply a `risk:pending` label proactively on `pull_request:opened` BEFORE the classifier runs, and treat it as `blocked` until replaced.** Requires a new tiny workflow plus label discipline. Equivalent in safety to option (1) above.
4. **Require a PAT for the classifier's `gh pr edit --add-label` call so the `labeled` event triggers re-runs.** Workable, but every caller has to configure the PAT, which adds friction. It also puts a PAT into a workflow that doesn't otherwise need one.

The `workflow_run` chain is the cleanest because it removes the race instead of mitigating it.

## Verification plan post-fix

1. Synthetic test: open a PR in a test repo (or `topcoder1/ci-workflows` itself) that touches `.github/workflows/**` with `Auto-Merge-Risk: standard` trailer + claude/* branch. Confirm auto-merge is NEVER enabled (not enabled-then-revoked, just never enabled).
2. Negative test: same PR with `Auto-Merge-Risk: high` or no trailer — confirm auto-merge stays off (regression-check the existing happy path).
3. Latency: measure end-to-end time from PR open to auto-merge enabled on a typical `risk:standard` PR — should not regress more than ~1× the classifier runtime (since we're serializing where we used to parallelize).
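For step 1, "never enabled" can be checked against the PR's timeline rather than its current state. The sketch below counts `auto_squash_enabled` events (the event name recorded in the reproduction above) in timeline JSON read from stdin. `count_auto_merge_enables` is a hypothetical helper, it assumes `jq` is installed, and it only covers the squash variant.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one or more JSON arrays of timeline events on
# stdin (one array per page) and prints how many auto_squash_enabled events
# they contain. A result of 0 means auto-merge was never enabled via squash.
count_auto_merge_enables() {
  # -s slurps all pages into one array of arrays; "add" concatenates them.
  jq -s 'add // [] | map(select(.event == "auto_squash_enabled")) | length'
}

# Possible usage from inside the test repo's checkout, with $PR set to the
# PR number (gh fills in {owner} and {repo} from the current repository):
#   gh api --paginate "repos/{owner}/{repo}/issues/$PR/timeline" | count_auto_merge_enables
```

Checking the timeline instead of the PR's current auto-merge state distinguishes "never enabled" from "enabled, then revoked".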

## Suggested severity

**P1.** This is not a bug in userspace; it is a bug in the safety gate. Privilege-escalation paths are unprotected for ~15 sec on every PR. Most PRs are either not Claude-authored or not `risk:blocked`, but the intersection is non-zero and the impact is total (any workflow file change merges silently).

## Background

Filed by Claude after investigating the auto-merge of `wxa-jake-ai#539`. Lesson candidate: this is the inverse of the 2026-05-04 lesson (`wxa_vpn#250`) about over-classifying PRs — that lesson said "trust the bot's regex." This case shows the bot's classifier has a startup-race where the regex is correct but isn't read in time. Lesson refinement: **trust the bot's regex AFTER the classifier has finished running.** The fix above makes that distinction architectural instead of probabilistic.
