Skip to content

fix: batch encryption-key rotation into one git-sync job#9355

Merged
rubenfiszel merged 11 commits into
mainfrom
ruben/win-1994-fix-httpsgithubcomwindmill-labswindmillissues9344
May 29, 2026
Merged

fix: batch encryption-key rotation into one git-sync job#9355
rubenfiszel merged 11 commits into
mainfrom
ruben/win-1994-fix-httpsgithubcomwindmill-labswindmillissues9344

Conversation

@rubenfiszel
Copy link
Copy Markdown
Contributor

@rubenfiszel rubenfiszel commented May 27, 2026

Summary

Workspace encryption key rotation re-encrypts every secret variable
inside one transaction, but the post-commit git-sync dispatch only fired
a DeployedObject::Key event for encryption_key.yaml. Repos with
Secrets sync enabled were left with the old ciphertexts on disk until
each variable was individually deployed again.

This PR:

  • Adds handle_deployment_metadata_batch to windmill-git-sync (EE
    implementation in the companion PR, OSS no-op shim). It looks up the
    repo settings once, runs each DeployedObject through the existing
    path/type filters, and pushes one git-sync job per
    (repo, debounce_key) bucket carrying every item in the items
    array.
    • Workspace-wide mode: one job for the whole batch.
    • Promotion + group_by_folder: one job per folder.
    • Promotion + per-item: one job per item (same as today, but no
      reliance on the debouncer's 5–15s merge window).
  • Sets skip_secret once per repo whenever the batch contains at least
    one Variable, so the hub script knows whether to write decrypted
    values for any of the included secrets.
  • Switches set_encryption_key to build a single batch
    [Key, Variable(p1), Variable(p2), …] and dispatch it through the
    new entry point.

The single-object handle_deployment_metadata path is untouched.

Companion EE PR

windmill-ee-private branch: ruben/win-1994-fix-httpsgithubcomwindmill-labswindmillissues9344
HEAD: 108b0e2257d2df18b043802f88e945f9d3f0edf1

Test plan

  • cargo check --features enterprise,private — clean (3 unrelated pre-existing warnings only)
  • New integration test: cargo test -p windmill-api-integration-tests --test workspace_encryption_key_git_sync --features enterprise,private
    • Configures Variable + Secret + Key in include_type and inserts 3 secret variables
    • POSTs to /workspaces/encryption_key with a new key
    • Asserts exactly one deployment-callback job is queued whose items array contains the key entry + all 3 variable entries with skip_secret=false
  • Existing tests still green: workspaces, workspace_dependencies_git_sync

Fixes WIN-1994
Fixes #9344

🤖 Generated with Claude Code

When changing a workspace encryption key, the secret variables get
re-encrypted with the new key, but the git sync was only dispatched for
the encryption_key.yaml metadata file. Repos with Secrets sync enabled
were left with stale ciphertexts until the next per-variable deployment.

Now, after the transaction commits, we also dispatch a Variable git sync
event for each re-encrypted secret so the new encrypted values are
pushed to the configured repos. Errors are logged but don't roll back
the key rotation.

Fixes WIN-1994
Fixes #9344

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 27, 2026

Deploying windmill with  Cloudflare Pages  Cloudflare Pages

Latest commit: 5c6efd1
Status: ✅  Deploy successful!
Preview URL: https://089b09bf.windmill.pages.dev
Branch Preview URL: https://ruben-win-1994-fix-httpsgith.windmill.pages.dev

View logs

@rubenfiszel rubenfiszel marked this pull request as ready for review May 27, 2026 20:30
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 27, 2026

Claude finished @rubenfiszel's task in 2m 30s —— View job


Verdict

Mergeable, but should ideally address nits: dead error-handling branch + minor path.clone() duplication.

The fix is correct: when an encryption key rotates and skip_reencrypt is false, each secret variable is re-encrypted in-tx and a DeployedObject::Variable git-sync event is now fired per path after commit. The parent_path: Some(path.clone()) shape and skip_db_insert: true match the existing variable update/delete patterns in backend/windmill-store/src/variables.rs (lines 498, 652, 822, 1171), so the new ciphertexts will reach repos that have Secrets sync enabled.

Findings

P2 — dead if let Err(e) = … arm around handle_deployment_metadata (backend/windmill-api-workspaces/src/workspaces.rs:3477-3496)

handle_deployment_metadata is fire-and-forget in both flavors:

  • EE (backend/windmill-git-sync/src/git_sync_ee.rs:339-395): spawns a tokio::spawn and returns Ok(()) synchronously, logging any internal errors itself.
  • OSS (backend/windmill-git-sync/src/git_sync_oss.rs:12-24): returns Ok(()) immediately.

So the Err branch and the tracing::error!("Failed to trigger git sync for re-encrypted variable …") line are unreachable today, and the comment "Errors are logged but not surfaced so the key rotation itself isn't rolled back" suggests semantics the function doesn't actually have. Every other call site in the file (tx.commit()handle_deployment_metadata(...).await?) uses .await? for symmetry. Suggest dropping the match and following the same pattern:

for path in reencrypted_secret_paths {
    handle_deployment_metadata(
        &authed.email,
        &authed.username,
        &db,
        &w_id,
        DeployedObject::Variable { path: path.clone(), parent_path: Some(path) },
        Some("Variable re-encrypted after workspace key rotation".to_string()),
        true,
        None,
    )
    .await?;
}

This is consistent with AGENTS.md: "Don't add error handling, fallbacks, or validation for scenarios that can't happen."

P2 — redundant path.clone() (backend/windmill-api-workspaces/src/workspaces.rs:3483)

The for path in reencrypted_secret_paths loop consumes the Vec, so path is owned and not reused after the call. path: path.clone(), parent_path: Some(path.clone()) does two allocations where one would do — pass Some(path) as the second arg.

AGENTS.md compliance

  • Rust patterns from rust-backend skill: follows existing handle_deployment_metadata shape, no banned patterns in the diff.
  • The "Don't add error handling … for scenarios that can't happen" rule is the relevant one for the dead-arm nit above.
  • No pub fn introduced; auth check (require_super_admin) was already in place upstream.

Test coverage

  • Backend (Rust): no unit tests added. The function is a multi-step handler that talks to the DB, the crypt helpers, and handle_deployment_metadata, so a unit test is awkward without integration scaffolding. The existing test suite doesn't cover set_encryption_key, so this is consistent with the rest of the codebase but worth flagging.
  • Manual verification: the PR description's checklist is unchecked. Before merge, exercise the path the bug actually affects: in a workspace with Git Sync + Secrets sync enabled, rotate the encryption key from /#superadmin-settings (or the workspace settings page), then confirm in the configured git repo that (a) encryption_key.yaml updates and (b) each secret variable's value.yaml (or equivalent) is re-pushed with the new ciphertext. Spot-check that decrypting one of the re-pushed ciphertexts with the new key returns the original cleartext.

  • Read AGENTS.md and the changed file
  • Inspect the diff against origin/main
  • Look for bugs / race conditions / missed events / ordering issues
  • Validate AGENTS.md compliance
  • Assess test coverage and post review
    · Branch

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Re-trigger cubic

@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

Good to merge


What this PR does

When a workspace encryption key is rotated, all secret variables are re-encrypted inside a single transaction. Before this change, only an encryption_key.yaml git-sync event was dispatched — repos with Secrets sync enabled retained stale ciphertexts until a variable was individually deployed. This PR collects each re-encrypted secret's path during the transaction and dispatches a DeployedObject::Variable event per path after commit.

Findings

No blocking issues or nits.

The diff is clean and well-contained:

  • Transaction safety: reencrypted_secret_paths is populated inside the same transaction (&mut *tx) that performs the re-encryption, and the deployment-metadata calls fire only after tx.commit().await?. No stale-state window.
  • parent_path convention: The call uses parent_path: Some(path.clone()), which matches the existing convention for variable updates, deletes, and renames elsewhere in the codebase (e.g., windmill-store/src/variables.rs:652, :822).
  • Error strategy: Variable-sync failures are logged via tracing::error! but never propagated — matching the PR's stated design that git-sync errors should not roll back the key rotation itself. The pre-existing Key event above it uses .await? (which is a no-op in practice since handle_deployment_metadata spawns the real work and immediately returns Ok(())), so both paths are safe.
  • skip_db_insert: true: Consistent with all other DeployedObject::Variable calls in the codebase — avoids polluting the deployment history with synthetic re-encryption events.
  • No new auth surface: The change is entirely inside an existing handler (set_encryption_key) that was already gated by workspace admin access.

Test coverage

This is a backend handler enhancement with no new utility functions to unit-test and no API surface change. Integration-level testing of the git-sync dispatch loop would require a full git-sync EE stack, which is impractical for a unit test.

Manual verification needed (as the PR's test plan already outlines): In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm that one git-sync job is dispatched per re-encrypted secret variable (plus one for encryption_key.yaml), and that the remote repo receives the new ciphertexts.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

Good to merge

No issues found. Checked for bugs, security issues, performance regressions, and AGENTS.md compliance.

Test coverage

No automated tests were added in this diff. For this backend git-sync behavior, the main manual verification is rotating a workspace encryption key with Git Sync configured for variables and secrets, then confirming the resulting sync job includes the re-encrypted secret variable ciphertexts. There is no separate frontend surface changed by this PR.

Workspace encryption key rotation now re-encrypts every secret variable
and then dispatches a single batched git-sync job carrying the Key event
plus one Variable item per re-encrypted secret. Repos with Secrets sync
enabled receive every new ciphertext in one commit instead of nothing
(previously only `encryption_key.yaml` was pushed) — and instead of N
separate jobs the debouncer might or might not merge.

Wires through the new `handle_deployment_metadata_batch` entry point
added in the companion EE PR; OSS has a no-op shim so the build stays
green.

Adds an integration test (`workspace_encryption_key_git_sync`) asserting
that rotating the key with 3 secret variables in scope produces exactly
one deployment-callback job whose `items` array contains the Key event
+ all 3 variable entries and `skip_secret=false`.

Fixes WIN-1994
Fixes #9344

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenfiszel rubenfiszel changed the title fix: trigger git sync for re-encrypted secrets on encryption key change [ee] fix: batch encryption-key rotation into one git-sync job May 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

Good to merge


What this PR does

When a workspace encryption key is rotated, all secret variables are re-encrypted inside a single transaction. Before this change, only an encryption_key.yaml git-sync event was dispatched — repos with Secrets sync enabled retained stale ciphertexts until a variable was individually deployed. This PR collects each re-encrypted secret's path during the transaction and, after commit, dispatches them as a single batched git-sync job via handle_deployment_metadata_batch.

Findings

No blocking issues or nits.

The diff is well-contained across two commits:

  1. Initial fix (7ad22140) — tracked re-encrypted paths and dispatched individual events per path.
  2. Batch refinement (446ff676) — switched from individual handle_deployment_metadata calls to a batched handle_deployment_metadata_batch entrypoint, so all items (Key + Variables) flow through a single git-sync job per repo. This also addressed the earlier Claude Code review's P2 nit about dead error-handling branches.

Key points from the current state of the diff:

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that performs re-encryption), and the batch dispatch fires only after tx.commit().await?. No stale-state window.
  • parent_path convention: DeployedObject::Variable { path: path.clone(), parent_path: Some(path) } matches the established pattern for variable deletions and renames in windmill-store/src/variables.rs (lines 652 and 822 use parent_path: Some(path.to_string()) with the same value for both fields). The path.clone() is necessary here because path moves into Some(path) and is also needed for the path field.
  • OSS gating: Both git_sync_oss.rs and git_sync_ee.rs provide handle_deployment_metadata_batch. The OSS version is a #[cfg(not(feature = "private"))] no-op stub returning Ok(()). The EE version is linked via ee-repo-ref.txt (updated to commit 108b0e2). lib.rs re-exports are correctly feature-gated.
  • No new auth surface: The change is inside set_encryption_key, which already gates on require_super_admin.
  • No new pub fn: All changes are internal plumbing within existing functions.

AGENTS.md compliance

  • Rust patterns follow existing handle_deployment_metadata shapes in the codebase. No banned patterns.
  • No frontend changes — the Svelte skill does not apply.
  • No default_value / $bindable anti-patterns in scope.

Test coverage

  • Backend (Rust): A comprehensive integration test is added at backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs (271 lines). It sets up git-sync config with Secrets sync enabled, inserts 3 secret variables, rotates the encryption key via the API, and asserts that the resulting deployment-callback job carries exactly the Key entry + all 3 re-encrypted variable paths in a single batch, with skip_secret: false. The test is gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and uses the existing fixtures("base") and migrations = "../migrations" conventions consistent with all other integration tests in the crate.

  • Manual verification (from the PR's unchecked test plan): In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one git-sync job is dispatched per re-encrypted secret variable plus one for encryption_key.yaml, and that the remote repo receives the new ciphertexts.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/windmill-api-workspaces/src/workspaces.rs">

<violation number="1" location="backend/windmill-api-workspaces/src/workspaces.rs:3474">
P2: Do not propagate post-commit git-sync batch errors; log them instead so key rotation success is not reported as a failed request.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment on lines +3474 to 3482
handle_deployment_metadata_batch(
&authed.email,
&authed.username,
&db,
&w_id,
windmill_git_sync::DeployedObject::Key { key_type: "encryption_key".to_string() },
batch,
Some("Encryption key updated".to_string()),
false,
None,
)
.await?;
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Do not propagate post-commit git-sync batch errors; log them instead so key rotation success is not reported as a failed request.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/windmill-api-workspaces/src/workspaces.rs, line 3474:

<comment>Do not propagate post-commit git-sync batch errors; log them instead so key rotation success is not reported as a failed request.</comment>

<file context>
@@ -3458,43 +3461,26 @@ async fn set_encryption_key(
+        batch.push(DeployedObject::Variable { path: path.clone(), parent_path: Some(path) });
+    }
+
+    handle_deployment_metadata_batch(
         &authed.email,
         &authed.username,
</file context>
Suggested change
handle_deployment_metadata_batch(
&authed.email,
&authed.username,
&db,
&w_id,
windmill_git_sync::DeployedObject::Key { key_type: "encryption_key".to_string() },
batch,
Some("Encryption key updated".to_string()),
false,
None,
)
.await?;
if let Err(e) = handle_deployment_metadata_batch(
&authed.email,
&authed.username,
&db,
&w_id,
batch,
Some("Encryption key updated".to_string()),
)
.await
{
tracing::error!(
"Failed to trigger git sync batch after workspace key rotation for {}: {}",
w_id,
e
);
}
Fix with Cubic

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

cc @rubenfiszel

Should address issues before merging: git sync skipped without debouncing support

Found 1 issue:

  1. [P1] backend/windmill-api-workspaces/src/workspaces.rs:3474 - Key rotation now always dispatches through the new batch path, but the EE batch implementation skips the repo entirely when the sync script is older than 28103 or MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING is false (backend/windmill-git-sync/src/git_sync_ee.rs:761). The previous single-object handle_deployment_metadata path still queued a legacy non-debounced callback in that case (backend/windmill-git-sync/src/git_sync_ee.rs:1458), so existing workspaces with older sync scripts at least pushed encryption_key.yaml. With this PR, rotating the key can return 200 while queuing no git-sync job at all, leaving the repo stale. Preserve a fallback, or fall back to per-object dispatch when batch debouncing is unavailable.

Test coverage

Backend coverage improved with an enterprise/private integration test for the debounced happy path: one callback job containing the key plus all re-encrypted secret variables. The missing coverage is the compatibility path above: a repo using an older sync script, or an instance where sync-job debouncing is not available.

Manual verification before merge should include rotating a workspace encryption key with Git Sync enabled on an older sync script and confirming a deployment callback is still queued and the repo receives the updated encryption_key.yaml plus re-encrypted secret variable ciphertexts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

cc @rubenfiszel

Should address issues before merging: git sync skipped without debouncing support

Found 1 issue:

  1. [P1] backend/windmill-api-workspaces/src/workspaces.rs:3474 - Key rotation now always uses handle_deployment_metadata_batch, but the current EE batch implementation still skips the repo entirely when the sync script is older than 28103 or sync-job debouncing is unavailable (backend/windmill-git-sync/src/git_sync_ee.rs:753-760). The single-object path still has a non-debounced fallback (backend/windmill-git-sync/src/git_sync_ee.rs:1190-1204), so this regresses existing workspaces with older sync scripts: rotating the key can return 200 while queuing no git-sync job at all. Fall back to per-object dispatch or preserve a legacy non-debounced path for the batch case.

Test coverage

Backend coverage now includes the enterprise/private happy path where one debounced callback job carries the key plus all re-encrypted variables. The missing coverage is the compatibility path above: an older sync script or workers without sync-job debouncing support.

Manual verification before merge should include rotating a workspace encryption key with Git Sync enabled on an older sync script and confirming a deployment callback is still queued and the repo receives the updated encryption_key.yaml plus re-encrypted secret variable ciphertexts.

@github-actions
Copy link
Copy Markdown
Contributor

Now I have all the context I need. Let me write the final review.

Pi Review

cc @rubenfiszel

Should address issues before merging: batch path silently drops git sync for repos lacking debouncing support

Findings

P1 — handle_deployment_metadata_batch_inner skips repos without debouncing, erasing the encryption_key sync that the old path always delivered (backend/windmill-git-sync/src/git_sync_ee.rs:753-761)

Before this PR, set_encryption_key dispatched a single DeployedObject::Key through handle_deployment_metadatahandle_deployment_metadata_inner. That inner function checks debouncing support, but when it's unavailable it falls through to an else branch (line ~1192) that passes flat (path_type, path, parent_path, commit_msg) args — still queuing a deployment callback job, just without the newer items-array debouncing format. All repos got at least encryption_key.yaml pushed.

After this PR, set_encryption_key builds a [Key, Variable1, Variable2, …] batch and dispatches through handle_deployment_metadata_batchhandle_deployment_metadata_batch_inner. That function checks:

let supports_debouncing = workspace_git_repo.is_script_meets_min_version(28103)?
    && MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING.met().await;
if !supports_debouncing {
    tracing::warn!("Batch git sync requires sync-job debouncing support …; skipping batch dispatch");
    continue;  // repo completely skipped — no job queued
}

There is no fallback else branch. For any repo whose sync script is older than min-version 28103 or whose workers don't meet MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING, the key rotation returns HTTP 200 but queues zero git-sync jobs — neither encryption_key.yaml nor the re-encrypted secret variables. This is a regression vs. the pre-PR behavior where those repos at least received the key update.

The integration test (workspace_encryption_key_git_sync.rs) only exercises the happy path with a script in folder 28103, so CI won't catch this.

Suggested approach: restore a fallback when debouncing is unavailable. Options, in order of preference:

  1. Fall back to per-object handle_deployment_metadata calls for non-debouncing repos (exactly what the old path did), so encryption-key rotation still produces a git-sync job for the key (and, if backwards compatibility for the secret-variable re-encryption matters, one per variable).
  2. If the intent is that the batch-only path is acceptable for newer users but the old encryption_key event must not be lost, at minimum push a single DeployedObject::Key through the existing handle_deployment_metadata codepath before or alongside the batch call when debouncing isn't supported.

AGENTS.md compliance

No pub fn violations, no svelte changes, no $bindable anti-patterns. The rust-backend patterns (tx safety, error handling via tokio::spawn fire-and-forget) are consistent with the existing codebase.

Test coverage

  • Backend (Rust): A thorough 271-line integration test covers the happy path: setup with sync script under f/28103/, 3 secret variables, key rotation via the API, and assertion of one batched deployment-callback job carrying the key + all 3 variables with skip_secret=false. The test is gated on enterprise + private features and follows existing test conventions.
    • Missing: no test for the non-debouncing fallback path. If the suggested fallback is added, a companion test with a sync script outside f/28103/ would prevent future regressions.
  • Frontend: no changes.
  • Manual verification: In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one git-sync job is dispatched carrying every re-encrypted variable plus the encryption key entry, and that the remote repo receives the new ciphertexts. Additionally, verify the same flow on a workspace whose sync script predates min-version 28103 — a deployment callback should still be queued (not silently skipped).

@rubenfiszel
Copy link
Copy Markdown
Contributor Author

/ai review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 28, 2026

Claude finished @rubenfiszel's task in 3m 29s —— View job


Review: PR #9355

Verdict: Request changes — the batch-only dispatch introduces a real regression for repos without sync-job debouncing support (the P1 Codex and Pi have flagged across the last three review rounds is still unaddressed by the ee-repo-ref bump in 88dd417).

The core fix is correct and well-scoped: reencrypted_secret_paths is collected inside the same &mut *tx that re-encrypts, the batch is built after tx.commit().await?, and the DeployedObject::Variable { path, parent_path: Some(path) } shape matches the existing variable convention. The OSS shim + lib.rs feature gating are correct. The problem is purely in how the batch entrypoint behaves when debouncing isn't available.


P1 (blocking) — key rotation can now queue zero git-sync jobs on older sync scripts

backend/windmill-api-workspaces/src/workspaces.rs:3474

Before this PR, set_encryption_key dispatched DeployedObject::Key through handle_deployment_metadatahandle_deployment_metadata_inner, which has an else fallback that queues a legacy (non-debounced) deployment callback when debouncing is unavailable. So every repo with Git Sync enabled received at least encryption_key.yaml.

After this PR, everything flows through handle_deployment_metadata_batchhandle_deployment_metadata_batch_inner, which (per the EE impl cited by Codex/Pi at git_sync_ee.rs:753-761) does:

let supports_debouncing = workspace_git_repo.is_script_meets_min_version(28103)?
    && MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING.met().await;
if !supports_debouncing {
    tracing::warn!("... skipping batch dispatch");
    continue; // repo skipped entirely — no job queued
}

There is no else branch. For any repo whose effective sync script is older than hub version 28103 (is_script_meets_min_version, windmill-common/src/workspaces.rs:254) or whose workers don't meet MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING (1.602.0, windmill-common/src/min_version.rs:8), the rotation returns HTTP 200 while pushing nothing — neither the key nor the re-encrypted ciphertexts. That's strictly worse than pre-PR behavior, where the key at least synced.

Note: git_sync_ee.rs is not in this OSS checkout (it's pinned by SHA 4b222bf in ee-repo-ref.txt), so I can't read the inner directly. This analysis rests on the OSS-visible single→batch switch, the integration test only covering f/28103, and three independent reviewer citations of the same lines. Please confirm against the EE branch before merging.

Suggested fix: restore a fallback in the EE handle_deployment_metadata_batch_inner for non-debouncing repos — iterate the batch and dispatch each item via the existing single-object handle_deployment_metadata path. That preserves both the key sync and the new per-variable re-encryption sync for older scripts. Fix this →


P2 — post-commit batch error propagated as a request failure

backend/windmill-api-workspaces/src/workspaces.rs:3474 (cubic also flagged this)

handle_deployment_metadata_batch(...).await? runs after tx.commit().await?. If the batch entrypoint does any synchronous fallible work (it looks up repo settings before dispatching, per the PR description), an error here returns 500 even though the key rotation already committed — the user sees a failed request for a successful rotation.

Caveat: the old code also used .await? here, and the single-object EE path spawns its work and returns Ok(()) immediately (so it effectively never errored). Whether this is a new risk depends on whether the batch entrypoint similarly spawns or does the settings lookup inline. If inline, prefer logging over propagating:

if let Err(e) = handle_deployment_metadata_batch(...).await {
    tracing::error!("Failed to trigger git sync batch after key rotation for {w_id}: {e}");
}

Test coverage

The new integration test (workspace_encryption_key_git_sync.rs) is solid for the happy path — it asserts a single callback job carrying the key entry + all 3 variable paths with skip_secret=false. But it anchors the sync script at f/28103/... (line 184), i.e. exactly the min-version that enables debouncing, so CI never exercises the regression path above. A companion test with a script folder below 28103 (asserting a callback is still queued) would lock in the fix.


  • Read AGENTS.md & changed files
  • Inspect diff against origin/main
  • Verify the P1 (non-debouncing fallback regression) — still present after 88dd417
  • Check OSS shim / lib.rs gating & batch dispatch — correct
  • Assess test coverage — happy path only, regression path untested
  • Post review
    · Branch

Adds a regression test exercising a workspace whose sync script predates
hub version 28103: the rotation must still queue a legacy-format
deployment-callback job per item (encryption_key + each re-encrypted
secret) instead of silently skipping the repo. Bumps ee-repo-ref to the
EE fallback fix.

Addresses the P1 raised in the PR review (Codex/Pi/Claude): batch path
dropped git sync entirely for repos without sync-job debouncing support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rubenfiszel
Copy link
Copy Markdown
Contributor Author

Addressed the review findings:

P1 (blocking) — batch path skipped repos without debouncing support → key rotation queued zero jobs on older sync scripts. Fixed in the EE handle_deployment_metadata_batch_inner (companion PR, now at 760dd3c). When supports_debouncing is false it no longer continues past the repo; instead it falls back to the legacy one-job-per-item format (flat path_type/path/parent_path/commit_msg args, default debouncing settings) — identical to what the single-object path's non-debounced else branch produces. So a repo on a sync script older than hub 28103 (or workers below 1.602.0) again receives a git-sync job for encryption_key.yaml plus one per re-encrypted secret.

Added a regression test test_encryption_key_rotation_falls_back_without_debouncing that anchors the sync script in f/28000/ (below 28103), rotates the key, and asserts the rotation queues key + N variable legacy-format jobs (asserting items is absent and each job carries flat path_type/path). Both this and the happy-path test pass, as do the existing workspace_dependencies_git_sync tests.

Also extracted a shared push_git_sync_callback helper so the debounced and legacy branches don't duplicate the long windmill_queue::push signature.

P2 — post-commit handle_deployment_metadata_batch(...).await? propagating as a request failure. This is a no-op in practice: like the single-object handle_deployment_metadata used at every other call site, the batch entrypoint clones its inputs, tokio::spawns the real work (which logs its own errors), and returns Ok(()) synchronously — there's no inline fallible work before the spawn, so ? can't surface a post-commit error. Kept .await? for consistency with the surrounding call sites rather than adding an if let Err guard for a branch that can't be taken (per AGENTS.md "don't add error handling for scenarios that can't happen").

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

cc @rubenfiszel
Should address issues before merging: missing SQLx offline cache for new integration test queries

Found 1 issue:

  1. [P1] backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs:119 adds new sqlx::query! / query_as! calls, including this insert and the deployment-callback select at line 154, but the PR does not update backend/.sqlx. Backend CI runs cargo test --all with SQLX_OFFLINE=true, so these uncached checked queries will fail macro expansion in CI. This also violates the repo rule in AGENTS.md: 4. **Validate**: After every change, run the appropriate checks per docs/validation.md; docs/validation.md then requires ./update_sqlx.sh after code changes. Please regenerate and commit the SQLx cache entries.

Test coverage

Backend: the PR adds integration coverage for both batched git-sync dispatch and the legacy fallback path, which is the right layer for this behavior. The remaining blocker is making those tests compile under the repository’s offline SQLx CI setup.

Manual verification still worth doing before merge: rotate a workspace encryption key with Git Sync configured for variables, secrets, and keys, then confirm the queued deployment callback includes the key plus every re-encrypted secret variable and that the target repo receives the updated ciphertexts.

…tpsgithubcomwindmill-labswindmillissues9344

# Conflicts:
#	backend/ee-repo-ref.txt
@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for new integration test queries

Found 1 issue:

  1. [P1] backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs:117 adds new sqlx::query! / query_as! macros, including the secret-variable insert here and the deployment-callback select at line 151, but the PR does not add matching backend/.sqlx cache files. I checked the diff and there are no .sqlx changes, and the existing cache does not contain these query strings. CI with SQLx offline cache will fail macro expansion for these checked queries. This also violates the repo validation rule: 4. **Validate**: After every change, run the appropriate checks per docs/validation.md; docs/validation.md then says: After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache.

Test coverage

Backend coverage is now at the right layer: the PR adds enterprise/private integration tests for both the batched git-sync path and the legacy non-debounced fallback path. The remaining blocker is making those tests compile under the repository’s SQLx cache requirements.

Manual verification before merge should still cover rotating a workspace encryption key with Git Sync configured for variables, secrets, and keys, then confirming the queued callback contains the encryption key plus every re-encrypted secret and that the target repo receives the updated ciphertexts.

@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for new integration test queries

Findings

P1 — query!/query_as! macros in the new integration test lack SQLx offline cache entries (backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs)

The new test file uses compile-time-checked SQL macros that require .sqlx/ offline cache entries when SQLX_OFFLINE=true. CI is configured with SQLX_OFFLINE: true (.github/workflows/backend-test.yml:240), and cargo test --all (which includes enterprise,private features) will fail to expand these macros.

Two queries are unaccounted for:

  1. insert_secret_variablessqlx::query! with:

    INSERT INTO variable (workspace_id, path, value, is_secret, description, extra_perms, account)
    VALUES ($1, $2, $3, true, '', '{}'::jsonb, NULL)
    ON CONFLICT (workspace_id, path) DO UPDATE SET value = EXCLUDED.value

    No existing cache entry matches this exact column list (extra_perms is not in any cached INSERT).

  2. wait_for_deployment_callbackssqlx::query_as! with:

    SELECT j.id, j.args
    FROM v2_job j
    JOIN v2_job_queue q ON j.id = q.id
    WHERE j.runnable_path = $1
      AND j.kind = 'deploymentcallback'
      AND j.workspace_id = 'test-workspace'
    ORDER BY j.created_at DESC

    The closest cached query (0d4f28ca) selects j.id, j.runnable_path, j.args, j.kind::text and omits the j.workspace_id clause — a different hash.

(The setup_git_sync_config query! for UPDATE workspace_settings SET git_sync = $1 WHERE workspace_id = $2 is already cached at ec4f70d9.)

This also violates AGENTS.md step 4 (docs/validation.md): "After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache."

Fix: run ./update_sqlx.sh from backend/ and commit the resulting .sqlx/ changes.

AGENTS.md compliance

The code changes are clean:

  • reencrypted_secret_paths is populated inside the &mut *tx transaction block, and the batch dispatch fires after tx.commit().await? — no stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) } matches the existing convention for variable updates/deletes elsewhere in the codebase.
  • OSS handle_deployment_metadata_batch shim + lib.rs feature-gated re-exports are correct.
  • No $bindable violations, no new pub fn, no frontend changes.
  • The prior P1 about batch dispatch skipping repos without debouncing has been resolved: the EE companion PR (now at 760dd3c) adds a fallback, and the regression test test_encryption_key_rotation_falls_back_without_debouncing covers the legacy path.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover both the debounced happy path (single job carrying key + all 3 re-encrypted variables) and the legacy non-debouncing fallback (one job per item for key + 2 variables). The tests are gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and follow existing #[sqlx::test] conventions. The only blocker is the missing SQLx cache entries described above.

  • Manual verification (from the PR's unchecked test plan): In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one git-sync job is dispatched carrying every re-encrypted variable plus the encryption key entry, and that the remote repo receives the new ciphertexts. Additionally, confirm the same flow works on a workspace with an older sync script (pre-28103) — per-legacy-item jobs should still be queued.

@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for 2 new checked queries in integration test

Findings

P1 — query!/query_as! macros in the new integration test lack backend/.sqlx/ offline cache entries (backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs)

The new test file uses compile-time-checked sqlx::query! and sqlx::query_as! macros, but the PR diff contains zero changes under backend/.sqlx/. CI runs cargo test --all with SQLX_OFFLINE=true (per .github/workflows/backend-test.yml:240), so these macros will fail expansion.

Two queries need new cache entries:

  1. insert_secret_variables at line 127 — sqlx::query! with:

    INSERT INTO variable (workspace_id, path, value, is_secret, description, extra_perms, account)
    VALUES ($1, $2, $3, true, '', '{}'::jsonb, NULL)
    ON CONFLICT (workspace_id, path) DO UPDATE SET value = EXCLUDED.value

    No existing cache entry includes the extra_perms column in an INSERT. The 9 existing INSERT cache entries all use column lists without extra_perms (query-1e426c8a, query-295a8807, query-2b34ae32, etc.).

  2. wait_for_deployment_callbacks at line 154 — sqlx::query_as!(DeploymentCallbackJob, ...) with:

    SELECT j.id, j.args
    FROM v2_job j
    JOIN v2_job_queue q ON j.id = q.id
    WHERE j.runnable_path = $1
      AND j.kind = 'deploymentcallback'
      AND j.workspace_id = 'test-workspace'
    ORDER BY j.created_at DESC

    The closest existing cache entry (query-0d4f28ca) selects j.id, j.runnable_path, j.args, j.kind::text and omits the j.workspace_id clause — a different hash.

  3. (The setup_git_sync_config query! for UPDATE workspace_settings SET git_sync = $1 WHERE workspace_id = $2 is already cached at query-ec4f70d9.)

This violates AGENTS.md step 4, which defers to docs/validation.md: "After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache."

Fix: run ./update_sqlx.sh from backend/ and commit the new .sqlx/ cache files.

Code quality assessment

The logic changes are solid:

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx, and the batch dispatch fires after tx.commit().await? — no stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary here because path is consumed by Some(path) in the second field and still needed for the path field.
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is correctly gated behind #[cfg(not(feature = "private"))] and returns Ok(()) immediately. The lib.rs re-exports are correctly feature-gated per cfg(feature).
  • No new public surface: the change is internal to set_encryption_key, which was already gated by require_super_admin. No new pub fn introduced.
  • Prior P1 resolved: the batch dispatch fallback for repos without debouncing support has been addressed in the EE companion PR (per the prior discussion and the regression test test_encryption_key_rotation_falls_back_without_debouncing).

No AGENTS.md style violations, no $bindable anti-patterns, no rust-backend skill violations.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover both the debounced happy path (f/28103/ sync script, one batched job with key + 3 variables) and the legacy non-debouncing fallback (f/28000/ sync script, one job per item). The tests are correctly gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and follow existing #[sqlx::test] conventions. The only blocker is the SQLx cache entries described above.

  • Frontend: no changes.

  • Manual verification (from the PR's unchecked test plan): In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key at /api/w/{workspace}/workspaces/encryption_key and confirm one git-sync job is dispatched per repo carrying the encryption_key entry plus every re-encrypted secret variable, with skip_secret=false, and that the remote repo receives the new ciphertexts. Additionally, verify the legacy fallback path on a workspace with an older sync script (pre-hub-version 28103) — per-item jobs should still be queued.

@rubenfiszel
Copy link
Copy Markdown
Contributor Author

git_sync_e2e failure is pre-existing, not caused by this PR

The failing test is test_workspace_fork_creates_branch ("Expected a wm-fork branch in the repo after forking, got: ['origin/HEAD', 'origin/main']"). I reproduced it locally and bisected it:

  • On this branch (local EE build + Gitea + license): the fork test fails.
  • On origin/main's git-sync code (reverted my git_sync_ee.rs / workspaces.rs / git_sync_oss.rs / lib.rs to origin/main, rebuilt, re-ran the exact same test): it fails identically.

So the failure is independent of this PR. It's normally invisible because git-sync-test.yml is path-gated and the git_sync_e2e job is skipped on essentially every main push (main rarely touches backend/windmill-git-sync/**); this PR changes git-sync files, so the suite actually runs and surfaces the pre-existing breakage. Only this one test fails — every other git-sync e2e test passes.

Root cause (for a separate fix/issue): the fork-branch sync job is queued with correct args (only_create_branch: true, workspace_id: wm-fork-…, repo_url_resource_path, the "Creating branch" item) and the job completes successfully, but the hub sync script (hub/28236/sync-script-to-git-repo-windmill) creates no wm-fork/... branch in the repo. That's in the fork-branch-creation path (handle_fork_branch_creationobj=None), which this PR does not touch — the args it produces are byte-identical to main.

Recommend tracking the fork-branch e2e failure separately; it should not block the encryption-key git-sync change here.

…tpsgithubcomwindmill-labswindmillissues9344

# Conflicts:
#	backend/ee-repo-ref.txt
@github-actions
Copy link
Copy Markdown
Contributor

I have all the context needed. Let me write the final review.

Pi Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for 2 new checked queries in integration test

Findings

P1 — query!/query_as! macros in new integration test lack backend/.sqlx/ offline cache entries (backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs:127,154)

The new test file uses compile-time-checked SQL macros (sqlx::query! and sqlx::query_as!), but the PR diff contains zero changes under backend/.sqlx/. CI runs cargo test --all with SQLX_OFFLINE=true (per .github/workflows/backend-test.yml:240), so these macros will fail expansion in CI.

Two queries need new cache entries:

  1. insert_secret_variables at line 127 — sqlx::query!:

    INSERT INTO variable (workspace_id, path, value, is_secret, description, extra_perms, account)
    VALUES ($1, $2, $3, true, '', '{}'::jsonb, NULL)
    ON CONFLICT (workspace_id, path) DO UPDATE SET value = EXCLUDED.value

    No existing .sqlx/ entry includes extra_perms in an INSERT column list. The 9 existing variable-INSERT cache entries use column lists without extra_perms.

  2. wait_for_deployment_callbacks at line 154 — sqlx::query_as!(DeploymentCallbackJob, ...):

    SELECT j.id, j.args
    FROM v2_job j
    JOIN v2_job_queue q ON j.id = q.id
    WHERE j.runnable_path = $1
      AND j.kind = 'deploymentcallback'
      AND j.workspace_id = 'test-workspace'
    ORDER BY j.created_at DESC

    The closest cached query (query-0d4f28ca) selects j.id, j.runnable_path, j.args, j.kind::text and omits the j.workspace_id clause — a different hash.

(The setup_git_sync_config query! for UPDATE workspace_settings SET git_sync = $1 WHERE workspace_id = $2 is already cached at query-ec4f70d9.)

This violates docs/validation.md as referenced by AGENTS.md step 4: "After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache."

Fix: run ./update_sqlx.sh from backend/ and commit the resulting .sqlx/ cache files.

Code quality assessment

The logic changes are correct and well-structured:

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that performs re-encryption), and the batch dispatch fires only after tx.commit().await?. No stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary — path is consumed by Some(path) in the parent_path field and still needed for the path field. Not redundant.
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is #[cfg(not(feature = "private"))] and returns Ok(()) immediately. The lib.rs re-exports are correctly feature-gated per cfg(feature).
  • No new public surface: the change is internal to set_encryption_key, which was already gated by require_super_admin. No new pub fn introduced.
  • Prior P1 resolved: the batch dispatch fallback for repos without debouncing support has been addressed in the EE companion PR (dc4888d in ee-repo-ref.txt), and the regression test test_encryption_key_rotation_falls_back_without_debouncing covers the legacy per-item fallback path with sync script f/28000/.

No AGENTS.md style violations, no $bindable anti-patterns, no rust-backend skill violations.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover both the debounced happy path (f/28103/ sync script, one batched job with key + 3 variables, skip_secret=false) and the legacy non-debouncing fallback (f/28000/ sync script, one job per item, flat args with no items array). Both are gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and follow #[sqlx::test] conventions. The only blocker is the missing SQLx cache entries described above.
  • Frontend: no changes.
  • Manual verification (from the PR's unchecked test plan): In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one batched git-sync job is dispatched carrying the key entry plus every re-encrypted variable, and the remote repo receives the new ciphertexts. Also confirm the same flow with an older sync script (pre-28103) still queues per-item legacy jobs.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for new integration test queries

Found 1 issue:

  1. [P1] backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs:117 adds new sqlx::query! / query_as! macros, but the PR does not add matching backend/.sqlx cache entries. The new secret-variable insert at line 117 and deployment-callback select at line 151 are not present in the existing cache, and the diff contains no .sqlx changes. CI runs backend tests with SQLX_OFFLINE=true, so these checked queries will fail macro expansion. This also violates the AGENTS.md validation rule: 4. **Validate**: After every change, run the appropriate checks per docs/validation.md; docs/validation.md then requires: After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache.

Test coverage

Backend coverage is otherwise well targeted: the PR adds enterprise/private integration tests for both the batched git-sync path and the legacy non-debounced fallback path. The remaining blocker is making those tests compile under the repository’s SQLx offline-cache CI setup.

Manual verification should still cover rotating a workspace encryption key with Git Sync configured for variables, secrets, and keys, then confirming the queued callback includes the encryption key plus every re-encrypted secret variable and that the target repo receives the updated ciphertexts.

@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for 2 new checked queries in integration test

Findings

P1 — sqlx::query!/sqlx::query_as! macros in the new integration test lack backend/.sqlx/ offline cache entries (backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs:117,152)

The new test uses compile-time-checked SQL macros, but the PR diff contains zero changes under backend/.sqlx/. CI runs backend tests with SQLX_OFFLINE=true (.github/workflows/backend-test.yml:240), so these macros will fail expansion.

Two queries need new cache entries:

  1. insert_secret_variables (line 117) — sqlx::query!:

    INSERT INTO variable (workspace_id, path, value, is_secret, description, extra_perms, account)
    VALUES ($1, $2, $3, true, '', '{}'::jsonb, NULL)
    ON CONFLICT (workspace_id, path) DO UPDATE SET value = EXCLUDED.value

    None of the 9 existing variable-INSERT cache entries include extra_perms in the column list — the existing entries use subsets like (workspace_id, path, value, is_secret, account, is_oauth). This query produces a different SQL hash and needs its own cache file.

  2. wait_for_deployment_callbacks (line 152) — sqlx::query_as!(DeploymentCallbackJob, ...):

    SELECT j.id, j.args
    FROM v2_job j
    JOIN v2_job_queue q ON j.id = q.id
    WHERE j.runnable_path = $1
      AND j.kind = 'deploymentcallback'
      AND j.workspace_id = 'test-workspace'
    ORDER BY j.created_at DESC

    The closest existing cache entry (query-0d4f28ca) selects j.id, j.runnable_path, j.args, j.kind::text and omits the j.workspace_id clause — different SQL, different hash.

  3. The setup_git_sync_config query! (UPDATE workspace_settings SET git_sync = $1 WHERE workspace_id = $2) is already cached at query-ec4f70d9, so it's fine.

This violates docs/validation.md as referenced by AGENTS.md step 4: "After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache."

Fix: run ./update_sqlx.sh from backend/ and commit the resulting .sqlx/ cache files.

Code quality assessment

The logic changes are correct and well-structured:

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that performs re-encryption), and handle_deployment_metadata_batch fires only after tx.commit().await?. No stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary — path is consumed by Some(path) in the parent_path field and still needed for the path field. This matches the existing convention for variable updates/deletes elsewhere in the codebase.
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is correctly gated behind #[cfg(not(feature = "private"))] and returns Ok(()) immediately. The lib.rs re-exports are correctly feature-gated per cfg(feature).
  • No new public surface: the change is internal to set_encryption_key, which was already gated by require_super_admin. No new pub fn introduced.
  • Prior P1 resolved: the batch dispatch fallback for repos without debouncing support has been addressed in the EE companion PR (ee-repo-ref bumped to dc4888d), and the regression test test_encryption_key_rotation_falls_back_without_debouncing covers the legacy per-item fallback path with sync script f/28000/.

No AGENTS.md style violations, no $bindable anti-patterns, no rust-backend skill violations.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover both the debounced happy path (f/28103/ sync script, one batched job carrying the key + 3 re-encrypted variables with skip_secret=false) and the legacy non-debouncing fallback (f/28000/ sync script, one flat-format job per item, asserting no items array). Both are gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and follow #[sqlx::test] conventions. The only blocker is the missing SQLx cache described above.

  • Manual verification (from the PR's unchecked test plan): In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one git-sync job is dispatched carrying every re-encrypted variable plus the encryption key entry, and that the remote repo receives the new ciphertexts. Additionally, verify the same flow on a workspace whose sync script predates hub version 28103 — legacy per-item jobs should still be queued for the key + each variable.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

cc @rubenfiszel

Should address issues before merging: missing SQLx offline cache for new integration test queries

Found 1 issue:

  1. [P1] backend/windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs:117 adds new sqlx::query! / query_as! macros, but the PR does not add matching backend/.sqlx cache entries. The secret-variable insert at line 117 and deployment-callback select at line 151 are not present in the existing cache, and the diff contains no .sqlx changes. CI runs backend tests with SQLX_OFFLINE=true, so these checked queries will fail macro expansion. This also violates the AGENTS.md validation rule: 4. **Validate**: After every change, run the appropriate checks per docs/validation.md; docs/validation.md then requires: After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache.

Test coverage

Backend coverage is otherwise well targeted: the PR adds enterprise/private integration tests for both the batched git-sync path and the legacy non-debounced fallback path. The remaining blocker is making those tests compile under the repository’s SQLx offline-cache CI setup.

Manual verification should still cover rotating a workspace encryption key with Git Sync configured for variables, secrets, and keys, then confirming the queued callback includes the encryption key plus every re-encrypted secret variable and that the target repo receives the updated ciphertexts.

The cargo_test CI job builds with SQLX_OFFLINE=true; the two new
sqlx::query!/query_as! calls in
windmill-api-integration-tests/tests/workspace_encryption_key_git_sync.rs
had no cached entries, failing the build with E0282. Regenerated and
added only the two new query caches (no EE/feature cache loss).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Now I have all the context. Let me write the final review.

Pi Review

Good to merge


What this PR does

When a workspace encryption key is rotated, all secret variables are re-encrypted inside a single transaction. Before this change, only an encryption_key.yaml git-sync event was dispatched — repos with Secrets sync enabled retained stale ciphertexts until a variable was individually deployed. This PR collects each re-encrypted secret's path during the transaction and, after commit, dispatches them as a single batched git-sync job via handle_deployment_metadata_batch.

Findings

No blocking issues or nits.

The diff is well-contained across 7 commits, and all issues raised in prior review rounds have been addressed:

  • P1 (debouncing fallback) — resolved: The EE companion (dc4888d in ee-repo-ref.txt) now falls back to per-item legacy dispatch when is_script_meets_min_version(28103) is false or MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING is unmet. The regression test test_encryption_key_rotation_falls_back_without_debouncing covers this path with a f/28000/ sync script, asserting the correct number of legacy-format jobs are queued.
  • P1 (missing SQLx offline cache) — resolved: Commit c84b28060 added two cache files under backend/.sqlx/ covering the new INSERT INTO variable (query-b876f26c…) and SELECT j.id, j.args FROM v2_job … (query-bf601a91…) queries in the integration test. The pre-existing query-ec4f70d9… already covers the UPDATE workspace_settings SET git_sync query.

Code quality assessment:

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that performs re-encryption), and handle_deployment_metadata_batch fires only after tx.commit().await?. No stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary — path is consumed by Some(path) in the parent_path field and still needed for the path field. This matches the existing convention for variable updates/deletes elsewhere in the codebase (windmill-store/src/variables.rs lines 652, 822).
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is correctly gated behind #[cfg(not(feature = "private"))] and returns Ok(()) immediately. The lib.rs re-exports are correctly feature-gated per cfg(feature).
  • No new public surface: the change is internal to set_encryption_key, which was already gated by require_super_admin. No new pub fn introduced.

AGENTS.md compliance

  • No $bindable anti-patterns, no frontend changes, no new pub fn.
  • Rust patterns follow existing handle_deployment_metadata shapes in the codebase.
  • Validation: docs/validation.md → SQLx cache was regenerated (./update_sqlx.sh) per the commit message. cargo check --features enterprise,private is green per the PR body.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover both the debounced happy path (f/28103/ sync script, one batched job with key + 3 variables, skip_secret=false) and the legacy non-debouncing fallback (f/28000/ sync script, one job per item, flat args with no items array). Both are correctly gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and follow existing #[sqlx::test] conventions. The SQLx offline cache is present for all query!/query_as! macros. Existing tests (workspaces, workspace_dependencies_git_sync) remain green per the PR body.
  • Frontend: no changes.
  • Manual verification: In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one git-sync job is dispatched carrying the encryption key plus every re-encrypted secret variable, and that the remote repo receives the new ciphertexts. Additionally, verify on a workspace with an older sync script (pre-28103) that one deployment callback per item is still queued.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

Good to merge

No issues found. Checked for bugs, security issues, performance regressions, prior Codex findings, AGENTS.md compliance, and the current head diff.

Test coverage

Backend coverage is included via the new enterprise/private integration tests for both the batched git-sync path and the non-debouncing legacy fallback, with SQLx offline cache entries added for the new queries.

Manual verification before merge should rotate a workspace encryption key with Git Sync and Secrets sync enabled, then confirm the configured repo receives encryption_key.yaml plus the re-encrypted secret variable ciphertexts. No frontend surface is changed.

@github-actions
Copy link
Copy Markdown
Contributor

I have thoroughly reviewed the full diff, the EE implementation of handle_deployment_metadata_batch_inner, and confirmed all prior review findings are resolved. Here is the final review:

Pi Review

Good to merge


What this PR does

When a workspace encryption key is rotated, all secret variables are re-encrypted inside a single transaction. Before this change, only an encryption_key.yaml git-sync event was dispatched — repos with Secrets sync enabled retained stale ciphertexts until a variable was individually deployed. This PR collects each re-encrypted secret's path during the transaction and, after commit, dispatches them as a single batched git-sync job via handle_deployment_metadata_batch.

Prior issues — all resolved

  • P1 (debouncing fallback) — resolved in the EE companion PR (dc4888d in ee-repo-ref.txt). When is_script_meets_min_version(28103) is false or MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING is unmet, handle_deployment_metadata_batch_inner (lines ~840-894 of git_sync_ee.rs) falls back to per-item legacy dispatch — pushing one job per item with flat path_type/path/parent_path/commit_msg args, matching the single-object path's non-debounced branch. The integration test test_encryption_key_rotation_falls_back_without_debouncing covers this with f/28000/.
  • P1 (missing SQLx offline cache) — resolved in commit c84b28060. Two cache files (query-b876f26c… and query-bf601a91…) cover the new INSERT INTO variable and deployment-callback SELECT queries.

Code quality

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that re-encrypts), and handle_deployment_metadata_batch fires only after tx.commit().await?. No stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary — path is consumed by Some(path) in the second field and still needed for the path field. Matches the existing convention for variable updates/deletes elsewhere in the codebase.
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is correctly gated behind #[cfg(not(feature = "private"))] and returns Ok(()) immediately.
  • lib.rs re-exports: both EE and OSS re-exports include handle_deployment_metadata_batch, correctly feature-gated.
  • EE implementation: handle_deployment_metadata_batch clones inputs, tokio::spawns the real work (tallies changes per object, then delegates to handle_deployment_metadata_batch_inner), and returns Ok(()) — matching the existing single-object handle_deployment_metadata fire-and-forget pattern. The inner function correctly filters by path/type, buckets items by debounce_key, sets skip_secret per-repo based on effective_include_types, and chooses between debounced batch jobs and legacy per-item jobs.
  • No new public surface: the change is internal to set_encryption_key, which was already gated by require_super_admin. No new pub fn introduced.

AGENTS.md compliance

  • No $bindable anti-patterns, no frontend changes, no new pub fn.
  • Rust patterns follow existing handle_deployment_metadata shapes in the codebase.
  • Validation (docs/validation.md): SQLx cache regenerated via ./update_sqlx.sh. The PR body confirms cargo check --features enterprise,private is green.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover the debounced happy path (f/28103/, one batched job with key + 3 variables, skip_secret=false) and the legacy non-debouncing fallback (f/28000/, per-item jobs with flat args, no items array). Both are gated behind #[cfg(all(feature = "enterprise", feature = "private"))], follow #[sqlx::test] conventions, and have matching SQLx offline cache entries. Existing tests (workspaces, workspace_dependencies_git_sync) remain green per the PR body.
  • Frontend: no changes.
  • Manual verification: In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm one git-sync job is dispatched carrying the encryption key plus every re-encrypted secret variable, and that the remote repo receives the new ciphertexts. Also verify on a workspace whose sync script predates hub version 28103 that per-item legacy-format jobs are still queued for the key and each re-encrypted variable.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

Good to merge.

No issues found. Checked the current head diff, AGENTS.md/REVIEW.md policy, the prior non-debouncing finding, and the added SQLx cache entries.

Test coverage

Backend coverage is well targeted: the PR adds enterprise/private integration tests for both the debounced batched git-sync path and the legacy non-debouncing fallback, with SQLx offline cache entries now present for the new checked queries.

Manual verification before merge should rotate a workspace encryption key with Git Sync configured for keys, variables, and secrets, then confirm the repo receives encryption_key.yaml plus the re-encrypted secret variable ciphertexts. No frontend surface is changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

Good to merge


What this PR does

When a workspace encryption key is rotated, all secret variables are re-encrypted inside a single transaction. Before this change, only an encryption_key.yaml git-sync event was dispatched — repos with Secrets sync enabled retained stale ciphertexts until a variable was individually deployed. This PR collects each re-encrypted secret's path during the transaction and, after commit, dispatches them as a single batched git-sync job via handle_deployment_metadata_batch.

Prior issues — all resolved

  • P1 (debouncing fallback) — The EE companion PR (ee-repo-ref at 08e3b9b) now falls back to per-item legacy dispatch when is_script_meets_min_version(28103) is false or MIN_VERSION_SUPPORTS_SYNC_JOBS_DEBOUNCING is unmet. The regression test test_encryption_key_rotation_falls_back_without_debouncing covers this with a f/28000/ sync script, asserting per-item legacy-format jobs are queued.
  • P1 (missing SQLx offline cache) — Resolved: two new .sqlx/ cache files cover the INSERT INTO variable (query-b876f26c…) and SELECT j.id, j.args FROM v2_job … (query-bf601a91…) queries in the integration test.

Code quality assessment

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that re-encrypts), and handle_deployment_metadata_batch fires only after tx.commit().await?. No stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary — path is consumed by Some(path) in the parent_path field and still needed for the path field. Matches the existing convention for variable updates/deletes elsewhere in the codebase.
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is correctly gated behind #[cfg(not(feature = "private"))] and returns Ok(()) immediately.
  • lib.rs re-exports: both EE and OSS re-exports include handle_deployment_metadata_batch, correctly feature-gated per cfg(feature).
  • Auth: set_encryption_key gates on require_super_admin at line 3388 — unchanged by this PR. The new batch dispatch is internal to that function. No new public surface.
  • Input validation: the existing WORKSPACE_KEY_REGEXP check (alphanumeric, 64 chars) on request.new_key is unchanged.

AGENTS.md compliance

  • No $bindable anti-patterns, no frontend changes, no new pub fn with missing auth.
  • Rust patterns follow existing handle_deployment_metadata shapes in the codebase.
  • docs/validation.md → SQLx cache was regenerated per ./update_sqlx.sh.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover the batched happy path (f/28103/ sync script, one job with key + 3 variables, skip_secret=false) and the legacy non-debouncing fallback (f/28000/ sync script, per-item legacy-format jobs, no items array). Both are gated behind #[cfg(all(feature = "enterprise", feature = "private"))] and follow #[sqlx::test] conventions. The SQLx offline cache entries are present for all query!/query_as! macros.
  • Frontend: no changes.
  • Manual verification: In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm the configured repo receives one git-sync job carrying encryption_key.yaml plus every re-encrypted secret variable with correct ciphertexts. On a workspace whose sync script predates hub version 28103, confirm per-item legacy jobs are still queued rather than silently skipped.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

Good to merge

No issues found. The prior P1 is addressed: the EE batch path now falls back to legacy per-item jobs when sync-job debouncing is unavailable, and the added regression test covers that path.

Test coverage

Backend coverage looks appropriate: the PR adds enterprise/private integration tests for both the batched debouncing path and the older-script fallback path. I did not rerun the test suite during this read-only review.

Manual verification before merge should rotate a workspace encryption key with Git Sync + Secrets sync enabled and confirm the repo receives encryption_key.yaml plus re-encrypted secret variables, including on a sync script older than 28103.

@rubenfiszel
Copy link
Copy Markdown
Contributor Author

/ai review this PR

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 29, 2026

Claude finished @rubenfiszel's task in 2m 58s —— View job


Review: PR #9355

Verdict: Good to merge. The fix is correct and well-scoped, and every blocking issue raised across the prior review rounds is now resolved. The one still-open inline comment (cubic P2 on the post-commit .await?) is a judgment call that I agree with the author's resolution on — see below.


Correctness

  • Transaction safetyreencrypted_secret_paths is populated inside the same &mut *tx that re-encrypts each secret (workspaces.rs:3455), and the batch is built + dispatched only after tx.commit().await? (:3459). No stale-state window: the ciphertexts are committed before the git-sync events fire.
  • Batch shape[Key, Variable(p1), Variable(p2), …] is built at workspaces.rs:3468-3472. DeployedObject::Key/Variable match the enum (lib.rs:33,49), and Variable { path: path.clone(), parent_path: Some(path) } matches the established variable convention elsewhere in the codebase. The path.clone() is necessary, not redundant — path is moved into Some(path) and still needed for the path field.
  • OSS shim + gatinghandle_deployment_metadata_batch in git_sync_oss.rs is #[cfg(not(feature = "private"))] and returns Ok(()). Its signature (email, created_by, db, w_id, Vec<DeployedObject>, Option<String>) matches the call site arg-for-arg. lib.rs re-exports the new symbol on both the private and not(private) arms (lib.rs:15-23). Correct.
  • No new public surface — change is internal to set_encryption_key, already gated by require_super_admin (:3388); WORKSPACE_KEY_REGEXP validation unchanged.

Prior P1s — both resolved

  • Debouncing fallbackee-repo-ref.txt now at 08e3b9b. The EE handle_deployment_metadata_batch_inner falls back to legacy per-item dispatch when the sync script predates hub 28103 / debouncing is unavailable, instead of continue-ing past the repo. Covered by the new test_encryption_key_rotation_falls_back_without_debouncing (sync script f/28000/), which asserts key + N flat-format jobs are queued and that no items array is present.
  • SQLx offline cache — both new checked queries now have cache files. I verified the hashes are genuine (not stale/hand-edited): SHA-256 of each cached query string equals its filename hash.
    • query-b876f26c… → the INSERT INTO variable (… extra_perms, account) in insert_secret_variables
    • query-bf601a91… → the SELECT j.id, j.args … kind = 'deploymentcallback' in wait_for_deployment_callbacks
    • The UPDATE workspace_settings SET git_sync query was already cached (query-ec4f70d9…).

Residual discussion point (non-blocking)

cubic P2 — post-commit handle_deployment_metadata_batch(...).await? (workspaces.rs:3474). The concern is that an error after tx.commit() would surface as a 500 for an already-committed rotation. I agree with @rubenfiszel's decision to keep .await?:

  • The OSS shim provably returns Ok(()) with no fallible work, so OSS builds can never hit this.
  • Every other call site in this file dispatches the fire-and-forget single-object handle_deployment_metadata with .await?; keeping the batch call symmetric is the right call, and wrapping it in if let Err for a branch that can't be taken would itself violate AGENTS.md "don't add error handling for scenarios that can't happen."
  • Caveat: this rests on the EE handle_deployment_metadata_batch cloning its inputs and tokio::spawn-ing before any fallible DB work (as the author confirmed). I can't verify that directly — git_sync_ee.rs is pinned by SHA and not in this OSS checkout — but it's consistent with the sibling single-object entrypoint.

Test coverage

Two integration tests gated behind #[cfg(all(feature = "enterprise", feature = "private"))], both following #[sqlx::test] conventions: the batched happy path (f/28103/, one job carrying key + 3 variables, skip_secret=false) and the legacy fallback (f/28000/, per-item flat-format jobs). This is the right layer for the behavior. No frontend changes.

Note: I reviewed against the OSS-visible diff only; git_sync_ee.rs is pinned by ee-repo-ref and not present here, so the EE-side analysis relies on the integration tests + the author's description. I did not run a full cargo compile (the PR reports cargo check --features enterprise,private clean).


  • Read AGENTS.md & changed files
  • Inspect diff against origin/main
  • Verify prior P1s resolved — debouncing fallback (test present) + SQLx cache (hashes verified)
  • Check OSS shim / lib.rs gating & batch dispatch — correct
  • Assess test coverage — happy path + fallback path both covered
  • Post review
    · Branch

@github-actions
Copy link
Copy Markdown
Contributor

Pi Review

Good to merge


What this PR does

When a workspace encryption key is rotated, all secret variables are re-encrypted inside a single transaction. Before this change, only an encryption_key.yaml git-sync event was dispatched — repos with Secrets sync enabled retained stale ciphertexts until variables were individually deployed. This PR collects each re-encrypted secret's path during the transaction and, after commit, dispatches a single batched git-sync job carrying the key plus every re-encrypted variable via handle_deployment_metadata_batch.

Prior issues — all resolved

  • P1 (debouncing fallback): The EE companion PR (ee-repo-ref at 08e3b9b) now falls back to per-item legacy dispatch when is_script_meets_min_version(28103) is false. The regression test test_encryption_key_rotation_falls_back_without_debouncing covers this with a f/28000/ sync script, asserting per-item legacy-format jobs are queued.
  • P1 (missing SQLx offline cache): Two new .sqlx/ cache files were added in commit c84b28060, covering the INSERT INTO variable (query-b876f26c…) and SELECT j.id, j.args FROM v2_job … (query-bf601a91…) queries in the integration test.

Code quality assessment

  • Transaction safety: reencrypted_secret_paths is populated inside &mut *tx (the same transaction that performs re-encryption), and handle_deployment_metadata_batch fires only after tx.commit().await?. No stale-state window.
  • DeployedObject::Variable { path: path.clone(), parent_path: Some(path) }: the path.clone() is necessary — path is consumed by Some(path) in the parent_path field and still needed for the path field. Matches the existing convention for variable updates/deletes elsewhere in windmill-store/src/variables.rs.
  • OSS shim: handle_deployment_metadata_batch in git_sync_oss.rs is correctly gated behind #[cfg(not(feature = "private"))] and returns Ok(()) immediately.
  • lib.rs re-exports: both #[cfg(feature = "private")] and #[cfg(not(feature = "private"))] paths include handle_deployment_metadata_batch, correctly feature-gated.
  • Auth: set_encryption_key gates on require_super_admin at line 3388 — unchanged by this PR. The batch dispatch is internal to that function. No new public surface.
  • Input validation: the existing WORKSPACE_KEY_REGEXP check (alphanumeric, 64 chars) on request.new_key is unchanged.

AGENTS.md compliance

  • No $bindable anti-patterns, no frontend changes, no new pub fn with missing auth.
  • Rust patterns follow existing handle_deployment_metadata shapes in the codebase.
  • AGENTS.md step 4 → docs/validation.md: "After all code changes are done, run ./update_sqlx.sh from backend/ to regenerate the offline query cache." The .sqlx/ cache files are present and match the queries in the integration test.

Test coverage

  • Backend (Rust): Two comprehensive integration tests cover the debounced happy path (f/28103/ sync script, one batched job with key + 3 variables, skip_secret=false) and the legacy non-debouncing fallback (f/28000/ sync script, per-item legacy-format jobs, no items array). Both are gated behind #[cfg(all(feature = "enterprise", feature = "private"))], follow #[sqlx::test] conventions, and have matching SQLx offline cache entries. Existing tests (workspaces, workspace_dependencies_git_sync) remain green per the PR body.
  • Frontend: no changes.
  • Manual verification: In a workspace with Git Sync + Secrets sync enabled, rotate the encryption key and confirm the configured repo receives one git-sync job carrying encryption_key.yaml plus every re-encrypted secret variable with correct ciphertexts. On a workspace whose sync script predates hub version 28103, confirm per-item legacy jobs are still queued rather than silently skipped.

@github-actions
Copy link
Copy Markdown
Contributor

Codex Review

Good to merge

No issues found. Checked the current head diff, prior review discussion, AGENTS.md/REVIEW.md compliance, and the EE batch fallback reflected by the updated ee-repo-ref.

Test coverage

Backend coverage is appropriate: the PR adds enterprise/private integration tests for both the batched git-sync path and the legacy non-debouncing fallback, and the SQLx offline cache entries for the new checked queries are now present.

I did not rerun the test suite during this read-only review. Manual verification before merge should rotate a workspace encryption key with Git Sync + Secrets sync enabled and confirm the repo receives encryption_key.yaml plus the re-encrypted secret variable ciphertexts, including on a sync script older than 28103. No frontend surface is changed.

@rubenfiszel rubenfiszel changed the title [ee] fix: batch encryption-key rotation into one git-sync job fix: batch encryption-key rotation into one git-sync job May 29, 2026
@rubenfiszel rubenfiszel merged commit 04a0897 into main May 29, 2026
15 checks passed
@rubenfiszel rubenfiszel deleted the ruben/win-1994-fix-httpsgithubcomwindmill-labswindmillissues9344 branch May 29, 2026 05:12
@github-actions github-actions Bot locked and limited conversation to collaborators May 29, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Workspace re-encryption does not trigger Git sync

1 participant