
QVAC-18612 infra: gate every secret-bearing workflow with label-gate#1997

Merged
Proletter merged 13 commits into main from feat/QVAC-18612-label-gate-fanout on May 13, 2026



Proletter commented May 12, 2026

What problem does this PR solve?

How does it solve it?

  • Adds a label-gate job at the top of every secret-bearing workflow. The job runs the local ./.github/actions/label-gate composite action, which fails closed on any unrecognised event and on any PR-context event whose verified label was not applied by a trusted actor (stripping the label on misuse, per QVAC-18608 fix(label-gate): strip label when non-trusted user applies it #1978). The action defaults to the three trusted teams (qvac-internal-dev, qvac-internal-merge, qvac-internal-release), and the per-workflow gate jobs do not override teams: or users:, so the trust policy lives in exactly one place (.github/actions/label-gate/action.yml).
  • Patches every secret-bearing job in those workflows to depend on label-gate and to AND its if: with needs.label-gate.outputs.authorised == 'true'. Existing authorize-pr checks are preserved alongside the new gate as belt-and-suspenders during the staged migration called for in the task spec; authorize-pr removal lands in a follow-up.
  • Mechanics: the migration was generated by a throwaway script (added for review, then removed from the branch) that used ruamel.yaml purely to identify line ranges and surrounding YAML semantics; all edits are line-based to preserve comments, ordering, quoting, and indentation exactly. Detection heuristic for a "secret-bearing" job: an explicit environment:, OR any ${{ secrets.<X> }} other than GITHUB_TOKEN, OR secrets: inherit on a workflow_call, OR a non-empty per-call secrets: mapping. Jobs hard-disabled with if: false are skipped (gating them is a no-op). The two approval-machinery workflows (see below) are skipped via an explicit exemption.
  • 108 workflow files modified by the fan-out (the two approval-machinery workflows below are intentionally untouched; total target was 110). Every secret-bearing job in the repo now passes through label-gate before any secret-touching step runs.
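To make the detection heuristic concrete, here is a minimal Python sketch of the four rules applied to one job that has already been parsed into plain dicts and lists (for example with ruamel.yaml). It is not the removed migration script: the function names are hypothetical, and it only recognises the secrets.<NAME> dot syntax inside ${{ }} (bracket syntax such as secrets['NAME'] is not handled).

```python
import re
from typing import Any, Iterator

# One ${{ ... }} interpolation; non-greedy so adjacent expressions stay separate.
EXPR_RE = re.compile(r"\$\{\{(.*?)\}\}", re.DOTALL)
# A secrets.<NAME> reference inside an expression (dot syntax only).
SECRET_REF_RE = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string (mapping keys included) in a parsed YAML tree."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(key)
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def referenced_secrets(job: dict) -> set[str]:
    """Collect every secret name referenced anywhere in the job."""
    names: set[str] = set()
    for text in iter_strings(job):
        for expr in EXPR_RE.findall(text):
            names.update(SECRET_REF_RE.findall(expr))
    return names


def is_secret_bearing(job: dict) -> bool:
    """Apply the four detection rules listed in the PR description."""
    if "environment" in job:
        return True
    if referenced_secrets(job) - {"GITHUB_TOKEN"}:
        return True
    secrets = job.get("secrets")
    if secrets == "inherit" and "uses" in job:  # reusable workflow call
        return True
    if isinstance(secrets, dict) and secrets:  # explicit per-call mapping
        return True
    return False
```

Scanning every string in the job, rather than only step env: blocks, is what lets a single rule catch secrets used in if:, with:, env:, or run: alike.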

Approval-machinery exemption

Two workflows are intentionally NOT gated and remain byte-identical to main:

  • .github/workflows/approval-worker.yml
  • .github/workflows/approval-check-worker.yml

These are the tier-based approval bot itself, triggered by issue_comment / pull_request_review to compute the Tier-based Approval Check status check. Gating them with label-gate produces a deadlock:

  • The PR cannot get the required Tier-based Approval Check status until it has the verified label.
  • In normal flow verified is applied after human review.
  • So verified would never be requested (no signal to reviewers that the PR is ready), and even if it were applied, the approval bot would only re-evaluate on the next review/comment event.

These workflows are part of the gate machinery itself and must run ungated, the same way authorize-pr runs ungated.

if: false jobs

Jobs with a hard-disabled if: false (e.g. the notify job in docs-deploy-notify.yml) are deliberately not rewritten — the gate guard would have clobbered the inline explanation comment for zero behavioural change. If a workflow's only secret-bearing job is hard-disabled, the file is left untouched (no orphan label-gate job inserted).
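Both the approval-machinery exemption and the if: false rule are selection steps that run before any edit is made. A minimal sketch, assuming jobs are already parsed into dicts and the secret-bearing predicate is passed in; EXEMPT_WORKFLOWS and gateable_jobs are illustrative names, not necessarily those used by the script. An empty result means the file is left untouched, so no orphan label-gate job is inserted.

```python
from pathlib import PurePosixPath
from typing import Callable

# The approval machinery stays ungated to avoid the deadlock described above.
EXEMPT_WORKFLOWS = frozenset({"approval-worker.yml", "approval-check-worker.yml"})


def gateable_jobs(
    workflow_path: str,
    jobs: dict,
    is_secret_bearing: Callable[[dict], bool],
) -> list:
    """Return ids of jobs to gate; an empty list means leave the file untouched."""
    if PurePosixPath(workflow_path).name in EXEMPT_WORKFLOWS:
        return []
    selected = []
    for job_id, job in jobs.items():
        # `if: false` loads as the YAML bool False: a permanent disable,
        # so wrapping it in the gate guard would change nothing.
        if job.get("if") is False:
            continue
        if is_secret_bearing(job):
            selected.append(job_id)
    return selected
```

Note that only the YAML bool is treated as a hard disable; a string such as "${{ false }}" would still be selected and gated, which is harmless.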

Net workflow scope

  • approval-worker.yml + approval-check-worker.yml → byte-identical to main (deadlock prevention).
  • docs-post-merge-sync.yml + docs-release-pipeline.yml → gated (consume DOCS_SYNC_PAT / AI_AUGMENT_API_KEY).
  • docs-deploy-notify.yml → untouched (only secret-bearing job is if: false).
  • Total gated workflows: 108 (out of an originally-scoped 110, minus the 2 approval workflows).

yamlfmt cleanup (commit cf1311f1, narrowed by 4a8700ff and 15949925)

The label-gate fan-out touches .github/workflows/*onnx*.yml, which matches on-pr-onnx.yml's paths: filter. That workflow runs yamlfmt v0.17.0 against the entire repo using the default (repo-root) workdir, so any pre-existing yamlfmt drift in the workflows would block this PR from merging. The cleanup was therefore bundled in, scoped to the workflows directory only.

  • cf1311f1: ran yamlfmt v0.17.0 with -formatter retain_line_breaks_single=true across .github/workflows/ and .github/actions/.
  • 4a8700ff: reverts the composite-action portion of cf1311f1. The retain_line_breaks_single=true formatter inserts #magic___^_^___line markers throughout collapsed folded scalars and folds multi-line description: blocks into long single lines, which is review noise unrelated to the gate change.
  • 15949925: reverts a 4-file packages/transcription-whispercpp/ sweep that had been included for the same yamlfmt-CI reason. Touching package paths triggers expensive build pipelines wholly unrelated to this PR.

Composite-action and packages drift remains pre-existing on main and will be addressed in a dedicated yamlfmt cleanup PR. Trade-off: the on-pr-onnx yamlfmt sub-check may flag the unchanged drift in those areas — identical to pre-existing CI noise on main, not a regression introduced by this PR.

Net diff scope: 123 files, all under .github/workflows/. Zero composite actions, zero packages, zero scripts.

How was it tested?

  • actionlint clean against the full .github/workflows/ tree post-migration. Remaining warnings (shellcheck-reported issues, runner labels, the composite-action type key, workflow-call signature mismatches) all exist on origin/main, verified by diffing the warning sets. Zero new lint warnings were introduced by the migration or the yamlfmt cleanup.
  • Idempotency: re-running the (since-removed) migration script on the migrated tree exits 0 with zero MIGRATED lines.
  • Spot-checked every diff pattern manually: simple Pattern A (trigger-docs-translation-nmtcpp.yml), Pattern B with peer authorize-pr (publish-sdk.yml, on-pr-bci-whispercpp.yml), reusable workflow_call with secrets: inherit (test-android-sdk.yml, cpp-lint.yaml), folded if: >- blocks (trigger-reusable-lib-cli.yml, benchmark-ocr-onnx.yml), if: always() cleanup paths (test-android-sdk.yml's cleanup-device-farm), and the inline-step authorize-pr outlier (on-pr-test-sdk.yml — the script gates the downstream run-tests workflow_call correctly without touching the inline step).
  • if: ${{ ... }}-wrapping bug caught and fixed during validation. GHA evaluates if: <bare> && ${{ <expr> }} as a string literal (always-true). The script now unwraps the outer ${{ }} from the original expression before composing, so the resulting if: is fully bare.
  • Live canary PR QVAC-18612 infra: repurpose vulkaninfo as label-gate safety canary #1971 (vulkaninfo.yml) currently authorises and skips correctly under the rebased label-gate behaviour. Full live validation against THIS PR will use the 5-step matrix (no-label deny / label authorise / push-while-labeled / unlabel-then-push deny / full actionlint).
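The if: composition that the validation caught can be sketched as a small Python helper. This is a hypothetical illustration, not the script's actual render_if_with_gate: it strips a single outer ${{ }} wrapper, refuses any expression that still contains an interpolation (the always-true string case), maps YAML bool values, and, as a design choice of this sketch, parenthesises the original expression so operators such as || keep their meaning once the gate is ANDed in.

```python
GATE = "needs.label-gate.outputs.authorised == 'true'"


def unwrap_expression(expr: str) -> str:
    """Remove one outer ${{ }} wrapper; reject anything still interpolated."""
    text = expr.strip()
    if text.startswith("${{") and text.endswith("}}"):
        text = text[3:-2].strip()
    # `a && ${{ b }}` (or two wrapped halves) would be evaluated as a
    # non-empty string, i.e. always true, so refuse to compose it.
    if "${{" in text or "}}" in text:
        raise ValueError(f"partially interpolated if: {expr!r}")
    return text


def compose_gated_if(existing) -> str:
    """Build the new if: value for a gated job (bare expression, no ${{ }})."""
    if existing is None:  # the job had no if: key at all
        return GATE
    if isinstance(existing, bool):  # YAML `if: true` / `if: false`
        return GATE if existing else "false"
    return f"{GATE} && ({unwrap_expression(str(existing))})"
```

For if: always() cleanup paths the composed string still contains always(), so GitHub Actions does not add its implicit success() check; the job keeps its run-after-failure behaviour but only once the gate has authorised.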

Action pinning

  • actions/checkout: each new label-gate job pins to de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2 — the same SHA every other workflow in this repo already uses. No version drift.

Permissions changes

  • Scope: per-job, only on the new label-gate job in each of the 108 gated workflow files.
  • Before: n/a (the job did not exist; the rest of each workflow's permissions: blocks are unchanged).
  • After:
    permissions:
      contents: read
      pull-requests: write
  • Justification: contents: read for the sparse-checkout of .github/actions/label-gate; pull-requests: write because the gate strips the verified label when a non-trusted actor applies it (per QVAC-18608 fix(label-gate): strip label when non-trusted user applies it #1978). Workflows that already declare a top-level permissions: block keep that block untouched — the per-job grant on label-gate does not propagate to other jobs.

Known pre-existing CI noise (not introduced by this PR)

  • Tier-based Approval Check failing — the bot reports ❌ Tier 1 requirements not met. Need: 1 Team Member (0/1) + 1 TL/Management (0/1). This is the expected state for a PR with zero approvals; it will go green once reviewers approve.
  • 6 CodeQL critical alerts flagged on this PR for .github/actions/run-lint-and-unit-tests/action.yaml (actions/untrusted-checkout/critical, ×4) and .github/workflows/publish-sdk.yml (actions/artifact-poisoning/critical, ×2). All 6 are pre-existing open alerts on main (created 2026-04-20 and 2026-05-01) — verified via gh api repos/tetherto/qvac/code-scanning/alerts. CodeQL re-fires them on this PR because the surrounding lines moved (yamlfmt + label-gate insertion shifted line numbers); the underlying alerts will resolve as code-scanning duplicates against main once a reviewer attests. This PR introduces zero new CodeQL findings.

Proletter and others added 2 commits May 12, 2026 10:40
Throwaway helper that gates every secret-bearing job in a workflow on
the local `label-gate` composite action. Used to generate the diff in
the next commit; committed for reviewer reproducibility (apply HEAD~1,
run the script on the same target list, diff against HEAD).

Detection heuristic ("secret-bearing job"):
  - explicit `environment:`, OR
  - any `${{ secrets.<X> }}` other than `GITHUB_TOKEN`, OR
  - `secrets: inherit` on a `workflow_call`, OR
  - a non-empty per-call `secrets:` mapping.

Implementation notes:
  - ruamel.yaml is used only to identify line ranges and YAML semantics;
    all edits are line-based to preserve comments, quoting, indentation,
    and ordering exactly.
  - Idempotent: re-runs detect the existing `label-gate:` job and skip.
  - Folded `if: >-` scalars are collapsed to a single line; original
    `${{ ... }}`-wrapped expressions are unwrapped before composing
    (GHA evaluates `if: <bare> && ${{ <expr> }}` as an always-true
    string literal).
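Collapsing a folded if: >- block amounts to joining its indented continuation lines with single spaces. For an expression this is equivalent as long as no string literal spans a line break. A minimal line-based sketch under stated assumptions: continuation lines are indented deeper than the if: key, the header is exactly `if: >-`, and the result must still be a valid YAML plain scalar. The helper names are hypothetical, and the function raises rather than guessing when quoting would be needed.

```python
# Characters a YAML plain scalar may not start with (indicator characters).
UNSAFE_START = set("!&*[]{},|>'\"%@`#")


def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip(" "))


def collapse_folded_if(lines: list, start: int) -> list:
    """Replace an `if: >-` block starting at lines[start] with a one-line if:."""
    header = lines[start]
    if header.strip() != "if: >-":
        raise ValueError(f"line {start} is not an `if: >-` header")
    key_indent = indent_of(header)
    parts, end = [], start + 1
    for i in range(start + 1, len(lines)):
        stripped = lines[i].strip()
        if not stripped:
            continue  # blank lines inside or right after the block
        if indent_of(lines[i]) <= key_indent:
            break  # back at the key's level: the block has ended
        parts.append(stripped)
        end = i + 1  # one past the last content line of the block
    value = " ".join(parts)
    # Refuse values that would not survive as a YAML plain scalar.
    if not value or value[0] in UNSAFE_START or ": " in value or " #" in value:
        raise ValueError(f"cannot emit {value!r} as a plain scalar")
    return lines[:start] + [" " * key_indent + "if: " + value] + lines[end:]
```

Blank lines after the block are preserved because the replaced range stops at the last content line.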

Co-authored-by: Cursor <cursoragent@cursor.com>
Inserts a `label-gate` job at the top of `jobs:` in every secret-bearing
workflow and updates each downstream secret-bearing job to require
`needs: [..., label-gate]` and `if: needs.label-gate.outputs.authorised
== 'true' && <existing>`.
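The needs: update has to cope with the three shapes GitHub Actions accepts: absent, a single job id string, or a list. A minimal sketch with hypothetical helper names that appends label-gate at most once and renders the flow-sequence form shown above:

```python
GATE_JOB = "label-gate"


def add_gate_need(needs) -> list:
    """Return the job's new needs: list with label-gate appended once."""
    if needs is None:
        current = []
    elif isinstance(needs, str):  # `needs: build`
        current = [needs]
    else:  # `needs: [build, test]` or a block sequence
        current = list(needs)
    if GATE_JOB not in current:
        current.append(GATE_JOB)
    return current


def render_needs(needs: list) -> str:
    """Render as a flow sequence; job ids are safe unquoted YAML scalars."""
    return "needs: [" + ", ".join(needs) + "]"
```

Appending rather than prepending keeps existing dependencies (such as a peer authorize-pr job) in their original order, which keeps the diff minimal.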

110 workflow files migrated via the throwaway scripts/migrate_label_gate.py
introduced in the previous commit. Net: 2,809 insertions, 695 deletions.

Coverage: every job in this repo that sets `environment:`, references
`${{ secrets.<X> }}` (other than `GITHUB_TOKEN`), uses `secrets: inherit`
on a `workflow_call`, or maps secrets explicitly into a reusable
workflow now passes through `label-gate` before any secret-touching step
runs.

Pre-existing `authorize-pr` peer jobs (16 workflows) are preserved
alongside the new gate; both must authorise for downstream jobs to run
(belt-and-suspenders during the staged migration). Removal of the
authorize-pr layer lands in a follow-up.

actionlint clean post-migration; zero new warnings introduced.
Idempotent: running the migration script again is a no-op.

Co-authored-by: Cursor <cursoragent@cursor.com>
Proletter and others added 4 commits May 12, 2026 11:16
CodeQL py/redos flagged `(?:\s*\n|\s*#.*\n)*` in the secrets-block
detector as exponentially backtracking on long sequences of blank
lines (the two alternatives both consume `\n`, giving the engine
ambiguous parses).

Replace `\s*` with `[ \t]*` (horizontal whitespace only) so each
iteration of the outer `(?:...)*` group consumes exactly one line and
the alternatives no longer overlap. Linear in input size; semantically
identical for the YAML inputs the script targets.
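A small Python check of the fixed fragment in isolation. BLANK_OR_COMMENT_LINES and the helper are hypothetical names; the surrounding detector regex is not shown here. One caveat worth knowing: [ \t]* does not match \r, so on a CRLF file a blank line "\r\n" is not consumed, whereas the old \s* version did consume it.

```python
import re

# Fixed fragment from the commit: each iteration of the outer group consumes
# exactly one blank line or one comment line, because [ \t]* cannot cross "\n".
BLANK_OR_COMMENT_LINES = re.compile(r"(?:[ \t]*\n|[ \t]*#.*\n)*")


def skip_blank_and_comment_lines(text: str, pos: int) -> int:
    """Return the offset just past any blank or comment lines starting at pos."""
    # The pattern can match the empty string, so match() never returns None.
    return BLANK_OR_COMMENT_LINES.match(text, pos).end()
```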

Verified: rerunning the script on the post-migration tree is still a
no-op (same target set, same diff).

Co-authored-by: Cursor <cursoragent@cursor.com>
…lse jobs

Two correctness fixes (plus one bonus fix) to scripts/migrate_label_gate.py
and the resulting workflow set:

1. EXEMPT_WORKFLOWS for `approval-worker.yml` + `approval-check-worker.yml`.
   These workflows compute the tier-based approval status (issue_comment +
   pull_request_review). Gating them with label-gate creates a deadlock:
   - The PR cannot get the required `Tier-based Approval Check` status
     until it has the `verified` label.
   - In the typical flow `verified` is applied AFTER human review, so the
     approval signal would never fire and the PR would be unmergeable.
   These two files are part of the gate machinery itself and must run
   ungated, the same way `authorize-pr` runs ungated.

2. Skip jobs with `if: false`. They are an explicit, permanent disable
   (see `docs-deploy-notify.yml`'s `notify` job). Wrapping the literal
   in our gate guard would clobber the inline explanation comment for
   zero behavioural change. If no other job in the file is gateable,
   the file is now a no-op (no orphan label-gate job inserted).

3. Bonus fix in render_if_with_gate: handle YAML bool `if:` values
   instead of raising ValueError. (Surfaced by docs-deploy-notify.)

Net workflow scope vs the previous fan-out commit:
  - approval-worker.yml + approval-check-worker.yml -> reverted to main
  - docs-post-merge-sync.yml + docs-release-pipeline.yml -> newly gated
    (these consume DOCS_SYNC_PAT / AI_AUGMENT_API_KEY and were missed
    by the previous detection pass)
  - docs-deploy-notify.yml -> intentionally left untouched (only
    secret-bearing job is `if: false`)

Total gated workflows: still 110.

Co-authored-by: Cursor <cursoragent@cursor.com>
Run `yamlfmt -formatter retain_line_breaks_single=true` on all workflow
and composite-action YAML files to bring the tree in line with what the
`.github/actions/yamlfmt` action enforces in CI (yamlfmt v0.17.0).

The drift was pre-existing on `main` -- the formatter wants to collapse
multi-line folded scalars (`>-`) for short `description:` blocks into
single-line plain scalars, reduce the two spaces before inline comments
to one, etc. None of these changes are introduced by the label-gate
fan-out; they were hidden because most existing PRs only touch a single
package and the yamlfmt check runs scoped to that package's `workdir`.

This fan-out PR touches `.github/workflows/on-pr-onnx.yml` (matches the
`*onnx*` path filter on `on-pr-onnx.yml`), which calls yamlfmt with the
default repo-root `workdir`, so the drift becomes a hard failure in CI
for #1997. Folding the cleanup in here unblocks the gate-fanout merge.

Scope:
  - 66 workflows under .github/workflows/
  - 20 composite actions under .github/actions/
  - Pure formatting; no semantic changes.

Co-authored-by: Cursor <cursoragent@cursor.com>
… configs

Same yamlfmt v0.17.0 sweep applied to the 4 benchmark client configs
under packages/transcription-whispercpp/. Pure inline-comment spacing
fixes (`"  # comment"` -> `" # comment"`).

Bundled in this PR (rather than a separate cleanup PR) because the
on-pr-onnx.yml CI job runs yamlfmt against the whole repo, so it would
otherwise block the label-gate fan-out merge.

Co-authored-by: Cursor <cursoragent@cursor.com>
github-actions Bot commented May 12, 2026

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

Proletter and others added 5 commits May 12, 2026 12:13
Drops the 20 composite-action files from the yamlfmt cleanup in
cf1311f. The yamlfmt v0.17 `retain_line_breaks_single=true` formatter
inserts `#magic___^_^___line` comment markers throughout collapsed
folded scalars and folds multi-line description blocks into very long
single lines, which is pure review noise on this PR -- none of those
composite actions are touched by the label-gate fan-out itself.

Reviewers should see only:
  - .github/workflows/ -- the actual fan-out (label-gate insertion +
    workflow-level yamlfmt cleanup, since on-pr-onnx CI runs yamlfmt
    against `.` and the workflow drift would block merge).
  - scripts/migrate_label_gate.py -- the throwaway migration script.

Composite-action drift remains pre-existing on `main` (it has been
there since at least 2026-04-20). It can be addressed in a dedicated
yamlfmt cleanup PR scoped to .github/actions/, where reviewers can
make an informed choice about the magic-line markers without it being
mixed into a security-critical gate change.

Trade-off: on-pr-onnx.yml's repo-wide yamlfmt sanity-check may flag
the unchanged composite-action drift again -- this is identical to
the pre-existing CI noise that already exists on main and is not a
regression introduced by this PR.

Co-authored-by: Cursor <cursoragent@cursor.com>
- `scripts/migrate_label_gate.py`: removed. The script was added for
  reviewer reproducibility but is throwaway tooling that does not belong
  in the codebase. The 110 workflow edits stand on their own diff.
- `.github/workflows/approval-worker.yml` +
  `.github/workflows/approval-check-worker.yml`: re-reverted to
  origin/main. The yamlfmt cleanup commit cf1311f had re-touched both
  (CRLF -> LF on approval-worker.yml; trailing-whitespace strip on 4
  blank lines in approval-check-worker.yml's inline script). Both are
  the gate machinery itself and must not appear in this PR's diff to
  avoid implying any behavioural change.

After this commit the PR diff vs main is exactly:
  - 108 .github/workflows/*.yml -- label-gate fan-out (110 originally
    intended, minus the 2 approval workflows now exempted)
  - workflow-only yamlfmt cleanup overlapping with the above
  - 4 packages/transcription-whispercpp/ benchmark configs
  - 0 composite actions
  - 0 throwaway scripts

Co-authored-by: Cursor <cursoragent@cursor.com>
Reverts the 4 transcription-whispercpp benchmark config files added
in b09c8b9. Touching package source paths triggers the
`on-pr-transcription-whispercpp.yml` build pipeline (and any other
workflow that watches `packages/transcription-whispercpp/**`),
which is expensive and wholly unrelated to the label-gate fan-out.

Trade-off: the on-pr-onnx yamlfmt sub-check (which scans the whole
repo) may flag the unchanged benchmark-config drift again -- this is
identical to pre-existing CI noise on `main` and not a regression
introduced by this PR. The drift will be addressed in a separate
yamlfmt cleanup PR.

After this commit the PR diff is exclusively under .github/workflows/.

Co-authored-by: Cursor <cursoragent@cursor.com>
Pins `actions/checkout` on every inserted label-gate job to
`ref: ${{ github.event.repository.default_branch }}` so the gate
code is always loaded from the trusted base, never from a
`pull_request` merge commit (which contains the PR's tree for
same-repo branch PRs).

Threat model
============

For `pull_request_target` events GITHUB_REF resolves to the base ref
(main), and the previous default checkout was already safe. For
`pull_request` events triggered by a same-repo branch (the typical
internal-team flow), GITHUB_REF resolves to `refs/pull/<n>/merge` --
i.e. PR HEAD merged with base. An internal user with branch-create
rights could push a branch that modifies
`.github/actions/label-gate/src/gate.mjs` to always return
`{ authorised: true }`, open a PR, and every gated workflow's
secret-bearing jobs would run with full secrets despite no `verified`
label being applied.

This is the same class of bypass the Tanstack supply-chain breach
exploited (compromised actor pushes code that runs in a trusted CI
context). Without a pin, label-gate's whole purpose -- requiring
`qvac-internal-release` approval before secrets are reachable from
PR contexts -- is bypassable by any user with branch-push permission.

Fix
===

Pin `ref: ${{ github.event.repository.default_branch }}` on the
sparse-checkout step. The default-branch indirection (vs hardcoding
`main`) keeps the pin correct if the repo's default branch ever
changes.

For `pull_request_target` workflows the pin is a no-op (default
already resolves to the base ref). For `pull_request` workflows it
is the actual fix. For trusted events (push, workflow_dispatch,
schedule, release, workflow_call) the gate short-circuits before any
checkout content matters anyway.

Scope
=====

110 inserted label-gate jobs across .github/workflows/, all idempotent
+1-line additions of the `ref:` line inside the existing `with:`
block. No other change. Mechanical sed-class diff.
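The ref: insertion is a line-level edit inside an existing with: block. A minimal sketch, assuming it is applied to the lines of a single checkout step shaped like the label-gate sparse checkout; the helper name is hypothetical, and it places ref: as the first key of the block, which may differ from where the actual diff put it. Re-running it on its own output is a no-op, in line with the idempotency described above.

```python
REF_LINE = "ref: ${{ github.event.repository.default_branch }}"


def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip(" "))


def add_ref_to_with_block(step: list) -> list:
    """Insert REF_LINE into a checkout step's with: block unless ref: is set."""
    for i, line in enumerate(step):
        if line.strip() == "with:":
            with_indent = indent_of(line)
            break
    else:
        raise ValueError("step has no with: block")
    # The with: body is every following line indented deeper than `with:`.
    body_end = i + 1
    while body_end < len(step) and (
        not step[body_end].strip() or indent_of(step[body_end]) > with_indent
    ):
        body_end += 1
    body = [line for line in step[i + 1 : body_end] if line.strip()]
    if any(line.strip().startswith("ref:") for line in body):
        return list(step)  # already pinned: re-running is a no-op
    child_indent = indent_of(body[0]) if body else with_indent + 2
    return step[: i + 1] + [" " * child_indent + REF_LINE] + step[i + 1 :]
```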

Co-authored-by: Cursor <cursoragent@cursor.com>
kinsta Bot commented May 13, 2026

Preview deployments for qvac-docs-staging ⚡️

| Status | Branch preview | Commit preview |
|---|---|---|
| 🔁 Deploying... | N/A | N/A |

Commit: dbe3c850fe48e30834a16aa3f53df23cae56df2b

Deployment ID: d152d008-44e1-4297-bab2-c591352f12a5

Static site name: qvac-docs-staging-fazwv

Proletter (Contributor Author) commented:

/review


5 participants