Skip to content

ci: add convergence loop with address-review + senior-advisor stages#2540

Merged
bpamiri merged 1 commit intodevelopfrom
peter/bot-convergence-loop
May 10, 2026
Merged

ci: add convergence loop with address-review + senior-advisor stages#2540
bpamiri merged 1 commit intodevelopfrom
peter/bot-convergence-loop

Conversation

@bpamiri
Copy link
Copy Markdown
Collaborator

@bpamiri bpamiri commented May 10, 2026

Summary

Closes the gap between the analytical reviewers (A and B) and actual code change with three architectural additions: an A↔B convergence loop, a new Stage 8: Address Review, and a new Stage 9: Senior Advisor for deadlock resolution.

The new flow

PR opens (or new SHA pushed)
    ↓
Reviewer A reviews → Reviewer B critiques A
    ↓
B decides: aligned with A?
    ├─ Aligned + verdict approve  → converged-approve   → loop ends
    ├─ Aligned + verdict changes  → converged-changes   → Stage 8 ↓
    └─ Not aligned → only round marker → A responds → B re-critiques
                                       (loop, cap 10)
                                              ↓
                                       At cap → :terminal marker
                                              ↓
                                       Stage 9: Senior Advisor (Opus) ↓
                                              ↓
                                       advisor's verdict → converged-* marker
                                       converged-approve → loop ends
                                       converged-changes → Stage 8 ↓
    ↓
Stage 8: Address Review reads consensus, applies changes, pushes new SHA
    ↓
New SHA → fresh Reviewer A → convergence loop restarts on new state
    ↓
Eventually: converged-approve OR outer-cap of 5 implementation rounds

What's new

1. Reviewer A↔B convergence loop

  • .github/workflows/bot-review-a.yml modified — adds an issue_comment trigger and dispatching logic. Pull-request events fire /review-pr (existing); comment events from wheels-bot[bot] matching a non-converged wheels-bot:review-b: marker fire the new /respond-to-critique prompt.
  • .claude/commands/respond-to-critique.md NEW — Reviewer A's response-mode prompt. Engages with B's critique by conceding or defending each finding (with evidence).
  • .claude/commands/review-the-review.md modified — adds the convergence verdict logic per round. Round cap raised from 3 to 10 per SHA.

2. Stage 8: Address Review (Opus, code-modifying)

  • .github/workflows/bot-address-review.yml NEW — fires on wheels-bot:converged-changes: markers. Mirrors propose-fix's shape (Opus model, 60-min timeout, 1000-turn cap, broad allowlist with test runner). Checks out the PR's existing branch, commits and pushes to it.
  • .claude/commands/address-review.md NEW — reads consensus from A↔B exchange, applies changes. Auto-downgrade safety on sensitive areas (security, migrations, deploy, DI). Branch-aware scope (fix/bot-* vs docs/bot-*). Outer-loop cap: 5 implementations per PR.

3. Stage 9: Senior Advisor (Opus, deadlock resolver) — the cherry on top

  • .github/workflows/bot-advisor.yml NEW — fires on B's :terminal marker (cap reached without convergence). Opus + 30-min timeout + read-only allowlist (no writes — just analyzes and posts a verdict).
  • .claude/commands/advise-on-deadlock.md NEW — reads the full A↔B exchange + the disputed code + canonical references; rules on each disputed point with concrete evidence; issues a tie-breaking verdict (approve or changes) that emits one of the existing convergence markers.

The advisor is the only stage running Opus on a non-coding task — its job is pure reasoning depth to break A↔B deadlocks. Without it, deadlocked PRs sit indefinitely waiting on humans. With it, even disagreement-prone reviews resolve automatically.

4. docs/contributing/wheels-bot.md updated

  • Eight stages → nine stages
  • Stage 6 (Reviewer A) — describes both initial-review and response-mode triggers
  • Stage 7 (Reviewer B) — describes the convergence arbiter logic + 10-round cap + advisor handoff at terminal
  • Stage 8 (Address Review) NEW
  • Stage 9 (Senior Advisor) NEW
  • Markers table — gains 6 new entries (review-a-response, converged-approve, converged-changes, address-review, address-held, advisor)
  • Day-to-day Phase 4 paragraph — updated for all new auto-fire stages

Model selection rationale

Stage Model Why
Reviewer A — initial Sonnet Pure analytical
Reviewer A — respond Sonnet Still analytical
Reviewer B — critique + arbitrate Sonnet Convergence judgment
Address Review Opus Modifies real code
Senior Advisor Opus Adjudicates code disputes; reasoning depth required

Per your call: all coding-adjacent stages use Opus. Sonnet for analysis, Opus for action and deep judgment.

Cost expectations per PR

Scenario Cost
Smooth path (1 round of A+B, converged-approve) ~$1.20
Typical (1-2 inner rounds, 1 implementation) ~$10–15
Worst case before advisor (10 rounds × 5 implementations, no deadlock) ~$90
Worst case with advisor (deadlock at round 10, advisor breaks it, 5 implementations) ~$95

Real safety net is WHEELS_BOT_ENABLED=false (kill switch) and the workflow timeout-minutes (5–60 min). The bounded round caps mean a single PR can't bankrupt the budget.

Test plan

After merge, the next bot PR exercises the new flow naturally. Watch for:

  • Initial Reviewer A review fires on PR open (unchanged behavior)
  • Reviewer B's critique includes the ### Convergence section + emits the right convergence marker
  • If converged-changes fires, bot-address-review.yml runs, applies changes, pushes a commit
  • New commit triggers fresh Reviewer A on pull_request: synchronize
  • If A's response is needed, A submits a state-COMMENT review with the respond-to-critique format
  • B re-critiques A's response, emits new convergence verdict
  • If 10 rounds pass without convergence: bot-advisor.yml fires, posts a verdict, emits converged-*
  • Loop terminates on converged-approve OR outer-loop cap hit (5 implementations)
  • bot-tdd-gate.yml continues to skip on docs/bot-* branches
  • All five new auto-fire conditions still respect the kill switch

Coordinated repo-level changes

  • The wheels-bot push scope ruleset (16174270) currently allows pushes to bot/**, fix/bot-*/**, and docs/bot-*/**. No new scope needed — address-review pushes to the same branches propose-fix and write-docs already use.

Net effect

Before tonight: bot triages issues. Reviewers analyze PRs. Human translates feedback into edits.

After this PR: bot triages → fixes → reviews → iterates to consensus → applies consensus → reviews again → loops to converge. Even deadlocks resolve via Opus tie-breaker. The bot is now end-to-end iterative, with bounded effort and human-as-final-merge-gate.

🤖 Generated with Claude Code

@github-actions github-actions Bot added the docs label May 10, 2026
Closes the gap between analytical reviewers (A and B) and actual code
change. Before this PR, A's review and B's critique produced
analytical context for the human to act on — but the human had to
translate that context into code edits manually. The loop was open.

Three architectural additions close it:

## 1. Reviewer A↔B convergence loop

Reviewer A and Reviewer B now back-and-forth until they align on a
recommendation:

- bot-review-a.yml gains a second trigger (issue_comment) and a new
  /respond-to-critique prompt for response mode. The response is a
  review (state=COMMENT), which triggers B's next round.
- review-the-review.md (B's prompt) gains the convergence verdict
  per round:
  - converged-approve — aligned on no changes needed
  - converged-changes — aligned on changes needed (triggers Stage 8)
  - (no marker) — not aligned, A responds in next round
- Inner loop cap: 10 rounds per SHA (was 3 for B). Bun-published
  demos show A↔B going to 100 rounds; 10 is conservative and
  respects "alignment is the goal."

## 2. Stage 8: Address Review (Opus, code-modifying)

New workflow bot-address-review.yml + new prompt /address-review.

Fires on wheels-bot:converged-changes:<pr>:<sha> markers. Reads the
consensus, applies changes to the PR's existing branch, pushes new
commits. The new SHA triggers fresh Reviewer A → loop restarts.

Coding stage: Opus model, broad allowlist with test runner, mirrors
propose-fix's setup. Branch-aware scope: fix/bot-* allows code+tests,
docs/bot-* allows doc paths only.

Outer-loop cap: 5 implementations per PR. Combined with B's 10-round
inner cap, max bot effort is bounded at 5 × 10 = 50 review rounds
before human intervention.

## 3. Stage 9: Senior Advisor (Opus, deadlock resolver)

New workflow bot-advisor.yml + new prompt /advise-on-deadlock.

Fires on B's :terminal marker (cap reached without convergence).
Reads the full A↔B exchange, the disputed code, and canonical
references; rules on each disputed point with concrete evidence;
issues a tie-breaking verdict (approve or changes) that drops back
into the existing convergence flow.

This is the only stage running Opus on a non-coding task. Cost is
justified because the advisor's verdict overrides the deadlock — it
must be right.

Without the advisor, deadlocked PRs would just sit waiting for human
intervention. With it, even disagreement-prone reviews resolve
automatically. The advisor sits as a sibling to address-review:
both fire from B's output but on different markers (converged-
changes vs :terminal).

## What still doesn't change

- WHEELS_BOT_ENABLED=false kill switch — still the active-incident
  safety net
- Required approving review on develop's ruleset — still the merge
  gate
- All bot PRs remain --draft; humans mark ready when ready
- Author-identity checks on every auto-fire if: block — still
  load-bearing

## Model selection

- Reviewer A (initial + respond): Sonnet (analytical)
- Reviewer B (critique + arbitrate): Sonnet (analytical)
- Address Review: Opus (modifies code)
- Senior Advisor: Opus (adjudicates code-related disputes)

Per the user's call: all coding-adjacent stages use Opus.

## Why convergence-then-implement (not implement-on-every-critique)

The earlier sketch fired address-review on every Reviewer B comment,
applying findings A flagged but B might dispute. Wasted work at best,
regressive at worst. This design waits for genuine alignment (or the
advisor's tie-breaking verdict) before any code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bpamiri bpamiri force-pushed the peter/bot-convergence-loop branch from ff2ea95 to 9c4dae6 Compare May 10, 2026 06:06
@bpamiri bpamiri changed the title ci: add convergence loop with address-review stage ci: add convergence loop with address-review + senior-advisor stages May 10, 2026
Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot -- Reviewer A

Full review submitted. See summary below.

Correctness

[C-1] bot-review-b.yml NOT updated -- skip check blocks B rounds 2-10 (CRITICAL)

The existing skip-check in bot-review-b.yml uses marker-pattern wheels-bot:review-b:PR:SHA: (with trailing colon). After B posts round 1, that marker exists. When As COMMENT review triggers B for round 2, the skip check fires and B exits. Rounds 2-10 never happen. The convergence loop breaks after round 1.

Fix: Update bot-review-b.yml to pass marker-pattern empty string, relying on the concurrency group (keyed on review.id) plus prompt-level round counting for dedup. Alternatively embed review.id in the marker name.

[C-2] Idempotency condition in review-the-review.md step 1 is always false (HIGH)

The condition comment count on current SHA equals the review-id compares a small integer (1-10) to a large GitHub ID (e.g. 2083714294). Can never be true. Dedup is inert.

Fix: Check for the literal review-id value in existing B comment bodies.

Conventions

[CV-1] address-held marker is SHA-agnostic -- permanently blocks PR after one hold (MODERATE)

marker-pattern: wheels-bot:address-held:PR has no SHA. Once fired on SHA1, all future address-review runs on this PR are blocked even after new commits with different sensitive areas.

Fix: Add SHA to the pattern. The prompt already emits address-held:PR:SHA.

[CV-2] skip-claude label not respected in response mode (MODERATE)

skip-check step has if: github.event_name == pull_request. For issue_comment events (A response mode), skip-check never runs. Adding skip-claude mid-loop stops initial reviews but not the A-B back-and-forth.

Fix: Invoke skip-check unconditionally; pass empty marker-pattern for response mode.

Minor

[M-1] git log @{u}..HEAD 2>/dev/null may silently drop a push (LOW)

If @{u} fails, 2>/dev/null makes the output empty, -z is true, exit 0 without pushing. Use git push --no-tags origin HEAD unconditionally instead.

Docs / Commits / Security

docs/contributing/wheels-bot.md update is accurate. Commit ff2ea95 passes commitlint. Bot-identity check present on all new triggers. No secrets in diff.

Required before merge:

C-1 Critical: bot-review-b.yml -- skip check blocks rounds 2-10
C-2 High: review-the-review.md -- dedup condition always false
CV-1 Moderate: bot-address-review.yml -- address-held blocks all future SHA runs
CV-2 Moderate: bot-review-a.yml -- skip-claude label ignored in response mode
M-1 Low: bot-address-review.yml -- @{u} may silently drop push

@bpamiri bpamiri merged commit d7c3a66 into develop May 10, 2026
5 checks passed
@bpamiri bpamiri deleted the peter/bot-convergence-loop branch May 10, 2026 06:13
@wheels-bot
Copy link
Copy Markdown
Contributor

wheels-bot Bot commented May 10, 2026

Wheels Bot — Reviewer B (round 1)

Reviewer A's submission is a bare placeholder: body reads `## Wheels Bot -- Reviewer A placeholder` with a marker and nothing else. State is `CHANGES_REQUESTED` but no findings, no line comments, and no analysis were produced. This is a failed/aborted run, not a review. There is nothing substantive to critique — the rest of this comment focuses on what A missed and on two correctness issues in the diff itself.

Sycophancy

None detected — there is no content to be sycophantic in.

False positives

None detected — no claims were made.

Missed issues

1. A's own review marker has the wrong SHA.
The review was submitted on commit `9c4dae6e63a6fd8845248a92ef2902b4cac230e0` (GitHub's `commit_id` field confirms this), but the marker in the body reads:
```

```
Downstream idempotency checks match on the SHA inside the marker. Any check asking "did Reviewer A already run on the current SHA?" will miss this review entirely and fire a duplicate run.

2. Broken idempotency condition in `.claude/commands/review-the-review.md` (new text, diff lines ~487-490).
The replacement idempotency check reads:

"If the most recent matching comment has the current head SHA AND the comment count on the current SHA already equals the review-id you're processing (a precise dedup), exit silently."

The review-id passed as an argument is a GitHub review ID — a large integer like `4258986177`. No PR will ever accumulate that many Reviewer B comments. The second condition is permanently false, which makes the compound `AND` permanently false. The result is that the idempotency guard never fires on this SHA: B will re-run every time the workflow triggers without checking whether it already commented. This is a regression from the old single-condition check.

3. Concurrency regression in `.github/workflows/bot-review-a.yml` (diff lines ~884-888).
The old concurrency group was:
```
group: wheels-bot-review-a-{pr}-{sha}
cancel-in-progress: true
```
The new group is:
```
group: wheels-bot-review-a-{pr-or-issue}
cancel-in-progress: false
```
Removing the SHA from the key collapses all runs for a given PR into one slot. With `cancel-in-progress: false`, a new push while an in-flight review is running will queue behind it rather than cancel it. If the in-flight run is slow (initial review on a large diff), the response-mode run will block; when it finally starts, it operates on stale context. More concretely: rapid pushes during the convergence loop can cause a response-mode run to finish after a subsequent initial-review run, posting an out-of-order round marker that confuses the round counter.

Verdict alignment

`CHANGES_REQUESTED` with zero findings is self-contradictory. A PR cannot be blocked for unspecified reasons.

Convergence

Not aligned — there is no reviewable content from A to align with. Running the respond-to-critique loop against an empty placeholder will produce nothing useful. A human should re-trigger Reviewer A via `workflow_dispatch` on `bot-review-a.yml` for a fresh run on this SHA, and investigate why the original run produced only the placeholder (likely a turn-cap or timeout in the Claude Code action).

Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer A

TL;DR: This PR wires three architectural additions (A↔B convergence loop, Stage 8 Address Review, Stage 9 Senior Advisor) into the bot pipeline. The design is sound and the docs/commit are clean. However, the same five correctness issues flagged in the previous review at SHA ff2ea95cd1 are still present in the squashed commit — most critically, the convergence loop is broken at round 2 because bot-review-b.yml's skip check was not updated, and the dedup condition in review-the-review.md is structurally inert. Requesting changes on the unresolved correctness findings; the PR is already merged, so these are filed as immediate follow-up.


Correctness

[C-1] bot-review-b.yml skip check blocks convergence rounds 2-10 — loop breaks after round 1 (CRITICAL, unresolved from previous review)

bot-review-b.yml was not modified in this PR. Its skip check still reads:

# .github/workflows/bot-review-b.yml line 50
marker-pattern: 'wheels-bot:review-b:${{ github.event.pull_request.number }}:${{ github.event.pull_request.head.sha }}:'

After round 1, B posts a comment containing <!-- wheels-bot:review-b:<pr>:<sha>:1 -->. When A's COMMENT-state response triggers pull_request_review: submitted → B fires for round 2. The skip check greps comments for the pattern wheels-bot:review-b:<pr>:<sha>: — which is a prefix of the round-1 marker — finds a match, sets skip=true, and B exits. Round 2 never runs. The entire convergence loop collapses after the first B critique.

Fix: change the pattern to something that only fires when a genuine "all rounds done for this SHA" state is reached — e.g. pass an empty marker-pattern and rely on the prompt-level idempotency, or key the pattern on the specific round number being dispatched.


[C-2] review-the-review.md dedup condition is always false — idempotency is inert (HIGH, unresolved from previous review)

.claude/commands/review-the-review.md, step 1 (line 44–45 of the new file):

- If the most recent matching comment has the **current head SHA**
  **AND** the comment count on the current SHA already equals the
  review-id you're processing (a precise dedup), exit silently.

review-id (the ${{ github.event.review.id }} argument passed on line 65 of bot-review-b.yml) is a large GitHub integer ID, e.g. 2083714294. The comment count on the current SHA is a small integer (1–10). These values can never be equal. The dedup condition is always false and provides no protection against double-posting.

Fix: the idempotency check should look for the literal marker string wheels-bot:review-b:<pr>:<sha>:<round> in existing comments, not compare a count to a review-id.


Conventions

[CV-1] address-held skip pattern is SHA-agnostic — permanently blocks PR after one hold (MODERATE, unresolved from previous review)

.github/workflows/bot-address-review.yml line 89:

marker-pattern: 'wheels-bot:address-held:${{ env.PR_NUMBER }}'

The prompt emits <!-- wheels-bot:address-held:<pr>:<sha> --> (SHA included). But the skip check searches for wheels-bot:address-held:<pr> (no SHA). Once a hold fires on any SHA, every future address-review run on this PR exits — even after the author commits new code that moves the sensitive area out of scope.

Fix: include the SHA in the pattern: 'wheels-bot:address-held:${{ env.PR_NUMBER }}:${{ steps.pr.outputs.sha }}'.


[CV-2] [skip-claude] label not respected in A response mode (MODERATE, unresolved from previous review)

.github/workflows/bot-review-a.yml line 67 (new):

- name: Skip check (initial review only — response mode handles its own idempotency)
  if: github.event_name == 'pull_request'
  id: gate

The skip-check step (which checks for [skip-claude] label and [wip] title) runs only on pull_request events. For issue_comment events (A response mode), the step is skipped entirely, leaving steps.gate.outputs.skip unset. The subsequent "Run Reviewer A" gate reads:

if: github.event_name == 'issue_comment' || steps.gate.outputs.skip == 'false'

For issue_comment events, the first clause is always true — Reviewer A fires regardless of any [skip-claude] label. Adding the label mid-loop stops the initial review trigger but not the convergence back-and-forth.

Fix: remove the if: github.event_name == 'pull_request' guard from the skip-check step; let it run unconditionally. Update the "Run Reviewer A" condition to steps.gate.outputs.skip == 'false' for both event types.


Minor

[M-1] git log @{u}..HEAD 2>/dev/null silently drops push on upstream failure (LOW, unresolved from previous review)

.github/workflows/bot-address-review.yml line 128:

if [ -z "$(git log @{u}..HEAD --oneline 2>/dev/null)" ]; then
  echo "No new commits — nothing to push."
  exit 0
fi

If @{u} fails (no tracking branch configured — possible since the checkout used a branch ref but git push origin HEAD hasn't run yet), 2>/dev/null swallows the error, the subshell outputs nothing, -z is true, and the workflow exits 0 without pushing. Bot commits land locally and are silently lost.

Fix: use git push --no-tags origin HEAD unconditionally after confirming the Claude run produced commits, or check for new commits via git diff --stat HEAD^ HEAD (no upstream needed).


[N-1] bot-advisor.yml uses python3 for JSON parsing inconsistently with the rest of the pipeline (LOW, new)

.github/workflows/bot-advisor.yml lines 818–820:

ref=$(echo "$info" | python3 -c "import json,sys; print(json.load(sys.stdin)['headRefName'])")
sha=$(echo "$info" | python3 -c "import json,sys; print(json.load(sys.stdin)['headRefOid'])")

Every other step in this and sibling workflows uses gh pr view --json … -q '.field'. bot-address-review.yml (the sibling workflow added in the same PR) already uses that form. python3 is available on ubuntu-latest so this doesn't break — it's an inconsistency that makes the advisor step more fragile and harder to read.

Fix: replace with two gh pr view calls:

ref=$(gh pr view "$PR_NUMBER" --repo wheels-dev/wheels --json headRefName -q '.headRefName')
sha=$(gh pr view "$PR_NUMBER" --repo wheels-dev/wheels --json headRefOid  -q '.headRefOid')

Commits

ci: add convergence loop with address-review + senior-advisor stages — type ci is in the allowlist, no scope (valid), subject ≤ 100 chars, not ALL-CAPS. Commit body is thorough. Passes commitlint.


Required before the follow-up PR:

ID Severity Finding
C-1 Critical bot-review-b.yml — skip check blocks rounds 2-10, breaking the loop
C-2 High review-the-review.md — dedup condition always false, idempotency inert
CV-1 Moderate bot-address-review.ymladdress-held pattern SHA-agnostic, permanent block
CV-2 Moderate bot-review-a.yml[skip-claude] ignored in response mode
M-1 Low bot-address-review.yml — silent push failure via 2>/dev/null
N-1 Low bot-advisor.ymlpython3 JSON parsing instead of gh --jq

Copy link
Copy Markdown
Contributor

@wheels-bot wheels-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer A response (round 1)

TL;DR: B's round 1 was written against the placeholder/aborted run, not against the full review I submitted at 06:19Z — three minutes after B's comment. Even so, B's three findings are all worth engaging with directly: I concede the wrong-SHA marker (it broke downstream idempotency and the full review corrects it), I confirm the idempotency-condition finding (I flagged the same issue as C-2 independently), and I partially concede the concurrency finding (real design concern, but B's stated failure mode is imprecise). My updated position adds the concurrency issue as a new finding and maintains CHANGES_REQUESTED as a follow-up tracker.


Conceded points

  • Wrong SHA in A's marker (B finding 1). B is correct. The placeholder review was submitted on commit 9c4dae6e63a6fd8845248a92ef2902b4cac230e0 but its marker reads <!-- wheels-bot:review-a:2540:ff2ea95cd1d7cfa96abaf135706df101bc4817b7 -->. Any downstream skip-check matching against the current SHA in the marker body would miss this review entirely and fire a duplicate initial-review run. This is now corrected in my full review (marker: <!-- wheels-bot:review-a:2540:9c4dae6e63a6fd8845248a92ef2902b4cac230e0 -->), but the wrong-SHA placeholder persists in the review history and could confuse tools that scan all reviews.

  • Broken idempotency condition in review-the-review.md (B finding 2, confirmed as C-2 in my full review). B is correct. Step 1 of the new .claude/commands/review-the-review.md text conditions exit-silently on comment_count_on_sha == review_id. review_id is a large GitHub integer (e.g. 2083714294); the comment count is 1–10. These can never be equal. The dedup guard is permanently inert. I flagged this independently in my full review as C-2 (High). Full alignment with B here.

  • Concurrency design concern (B finding 3, partial). B correctly identifies that dropping the SHA from the concurrency key groups both initial-review runs and response-mode runs for a given PR into a single slot. There is a real design risk here, though B's stated failure mode is imprecise (see defended points). The concern was not in my full review and is a legitimate miss; I am adding it as a finding below.


Defended points

  • Ordering claim in B's concurrency finding is imprecise. B states: "rapid pushes during the convergence loop can cause a response-mode run to finish after a subsequent initial-review run, posting an out-of-order round marker." Under GitHub Actions' actual cancel-in-progress: false semantics, at most one run queues behind the in-progress run. If a third run arrives while the second is queued, the second (queued) run is silently canceled and the third takes its place. The runs therefore never execute out of order — the worse outcome is that the response-mode run (second) is dropped entirely when a new push (third) comes in. The real failure mode is a silent cancellation, not an ordering inversion.

  • Verdict alignment: B's CHANGES_REQUESTED/zero-findings critique applied to the placeholder, not to my full review. My full review (submitted at 06:19:36Z, three minutes after B's round 1) lists six findings (C-1 through N-1) with explicit severity ratings and a summary table, and uses CHANGES_REQUESTED because the correctness issues are unresolved even in the merged squash commit. That pairing is internally consistent.


New finding added from B's input

[N-2] Concurrency group drops SHA — response-mode runs silently canceled by rapid pushes (LOW-MODERATE)

.github/workflows/bot-review-a.yml:

# Old
group: wheels-bot-review-a-${{ github.event.pull_request.number }}-${{ github.event.pull_request.head.sha }}
cancel-in-progress: true

# New
group: wheels-bot-review-a-${{ github.event.pull_request.number || github.event.issue.number }}
cancel-in-progress: false

With cancel-in-progress: false, GitHub queues at most one run behind the in-progress run. A third run silently cancels the waiting second. Scenario: (1) initial review on SHA1 is in progress, (2) B's round-1 comment triggers a response-mode run → queues, (3) author pushes a fixup commit → triggers initial review on SHA2, which silently cancels the queued response-mode run. The convergence loop stalls: A never responds to round 1, B fires round 2 targeting A's initial review instead of A's round-1 response.

The SHA was dropped because github.event.pull_request.head.sha is unavailable on issue_comment events. A possible fix: for issue_comment events, resolve the current head SHA via gh pr view early in the job and include it in a run-name or output, but concurrency groups cannot reference step outputs — they must use event context. An alternative is to keep the SHA-keyed group for the PR-event path and accept that response-mode runs (issue_comment path) have no SHA in the group key, tolerating the rare silent-cancellation for that path only.


Updated verdict

Still request-changes — the PR is already merged, so these are tracked as required follow-up items. The updated finding list is: C-1 (Critical), C-2 (High), CV-1 (Moderate), CV-2 (Moderate), M-1 (Low), N-1 (Low), N-2 (Low-Moderate).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant