fix: add admission control to block finalizer removal not made by the operator service account #2575
Dependency Review ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Claude finished @assafgi's task in 2m 26s.
Claude Code Review
Summary: The PR wires a raw admission handler that rejects UPDATEs which drop protected finalizers unless the caller is the operator's service account. Findings were posted inline and are summarised below.
Recommendation: address #1 and #2 before merging; both materially affect the correctness and security of the guard. The rest can be follow-ups.
Graphite Automations: "Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR (05/31/26). 2 reviewers were added to this PR based on Anton Bykov's automation.
// Check whether the caller is the operator's own SA.
saPrefix := fmt.Sprintf("system:serviceaccount:%s:", config.Config.OperatorPodNamespace)
if strings.HasPrefix(req.UserInfo.Username, saPrefix) {
	return admission.Allowed("")
}
Bug — bypass is too permissive. This prefix check matches any service account in the operator's namespace, not just the operator's own SA. The operator namespace also hosts (at minimum) {{prefix}}-maintenance (see charts/weka-operator/templates/maintenance_service_account.yaml), the CSI controller SA, and the CSI node SA. Anything running under those identities could remove the protected finalizers and defeat the guard.
Match the exact operator SA instead, e.g.:
expected := fmt.Sprintf("system:serviceaccount:%s:%s",
	config.Config.OperatorPodNamespace, config.Config.OperatorServiceAccountName)
if req.UserInfo.Username == expected {
	return admission.Allowed("")
}

(OperatorServiceAccountName would need to be plumbed through env the same way MaintenanceSaName already is; see internal/config/env.go:392.)
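A minimal, dependency-free sketch of the exact-match check, assuming the namespace and SA name are passed in from config.Config.OperatorPodNamespace and the proposed OperatorServiceAccountName. The helper names are hypothetical. It also fails closed when either value is empty, which covers the unset-env concern raised further down. If k8s.io/apiserver is already a dependency, serviceaccount.MakeUsername(namespace, name) builds the same username string.

```go
package main

import "fmt"

// serviceAccountUsername builds the username the API server assigns to a
// ServiceAccount token: "system:serviceaccount:<namespace>:<name>".
func serviceAccountUsername(namespace, name string) string {
	return fmt.Sprintf("system:serviceaccount:%s:%s", namespace, name)
}

// isOperatorCaller reports whether username is exactly the operator's own
// ServiceAccount. It fails closed: if the namespace or SA name is empty
// (e.g. an unset env var), no caller is treated as the operator.
func isOperatorCaller(username, namespace, saName string) bool {
	if namespace == "" || saName == "" {
		return false
	}
	return username == serviceAccountUsername(namespace, saName)
}
```

With exact matching, other identities in the same namespace (the maintenance SA, the CSI SAs) no longer pass the bypass.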
// Non-operator caller removed a protected finalizer — deny.
for _, f := range removed {
	return admission.Denied(
		fmt.Sprintf("removal of finalizer %q is not allowed; it is managed by the weka-operator", f))
}

return admission.Allowed("")
Bug: dead code / wrong message when multiple finalizers are removed. The for body returns unconditionally on the first iteration, so the loop never advances past removed[0], and, assuming the empty case returns earlier in the handler, the trailing return admission.Allowed("") on line 68 is unreachable. If a caller strips both weka.weka.io/finalizer and do-not-delete.weka.io/unsafe in a single PATCH, only one of them is mentioned in the deny message.
Suggested change, replacing:
// Non-operator caller removed a protected finalizer — deny.
for _, f := range removed {
	return admission.Denied(
		fmt.Sprintf("removal of finalizer %q is not allowed; it is managed by the weka-operator", f))
}
return admission.Allowed("")
with:
// Non-operator caller removed a protected finalizer — deny.
return admission.Denied(fmt.Sprintf(
	"removal of finalizer(s) %v is not allowed; they are managed by the weka-operator",
	removed))
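If the deny message should quote each finalizer individually and stay stable regardless of the order in the object, the %v formatting could be swapped for a small helper. A minimal sketch; denyMessage is a hypothetical name, and the handler would call it as admission.Denied(denyMessage(removed)).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// denyMessage builds one rejection message naming every protected finalizer
// the request tried to remove, not just the first. The input is copied and
// sorted so the message does not depend on the object's finalizer order.
func denyMessage(removed []string) string {
	names := append([]string(nil), removed...)
	sort.Strings(names)
	quoted := make([]string, len(names))
	for i, f := range names {
		quoted[i] = fmt.Sprintf("%q", f)
	}
	return fmt.Sprintf("removal of finalizer(s) %s is not allowed; they are managed by the weka-operator",
		strings.Join(quoted, ", "))
}
```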
}

// removedFinalizers returns protected finalizers present in old but absent in new.
func removedFinalizers(old, new []string) []string {
Minor: new shadows Go's builtin new. Rename to newer/current to keep the linter quiet and the call site readable.
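For reference, a sketch of removedFinalizers with the parameter renamed to current. The protected set is an assumption built from the two finalizer names mentioned in this review (weka.weka.io/finalizer and do-not-delete.weka.io/unsafe); the PR's actual list may differ.

```go
package main

// protectedFinalizers is an assumed set of guarded finalizers.
var protectedFinalizers = map[string]bool{
	"weka.weka.io/finalizer":       true,
	"do-not-delete.weka.io/unsafe": true,
}

// removedFinalizers returns protected finalizers present in old but absent in
// current. The parameter is named current so it does not shadow the builtin new.
func removedFinalizers(old, current []string) []string {
	present := make(map[string]bool, len(current))
	for _, f := range current {
		present[f] = true
	}
	var removed []string
	for _, f := range old {
		if protectedFinalizers[f] && !present[f] {
			removed = append(removed, f)
			present[f] = true // report each finalizer once even if old lists it twice
		}
	}
	return removed
}
```

Building a set from current keeps the check linear in the number of finalizers rather than scanning current once per entry in old.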
}

// Check whether the caller is the operator's own SA.
saPrefix := fmt.Sprintf("system:serviceaccount:%s:", config.Config.OperatorPodNamespace)
config.Config.OperatorPodNamespace is loaded via os.Getenv("POD_NAMESPACE") (internal/config/env.go:386) — there's no getEnvOrFail, so if the env is unset the prefix collapses to "system:serviceaccount::". That's defense-in-depth in the wrong direction: combined with the namespace-only check below, an unset env effectively whitelists anything whose username happens to start with system:serviceaccount:: (including arguably some malformed identities). Worth either failing fast at startup if empty, or doing an explicit empty-check here and denying.
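A sketch of the fail-fast option. requireEnv is a hypothetical helper that returns an error for an unset or empty variable, with the lookup injected so it can be tested without touching the process environment. The project's existing getEnvOrFail presumably plays a similar role at startup; the shape here is an assumption, not its actual implementation.

```go
package main

import (
	"fmt"
	"os"
)

// lookupFunc matches the signature of os.LookupEnv so tests can inject values.
type lookupFunc func(key string) (string, bool)

// requireEnv returns the value of key, or an error if it is unset or empty.
// Failing at startup avoids ever building the prefix "system:serviceaccount::".
func requireEnv(lookup lookupFunc, key string) (string, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return "", fmt.Errorf("required environment variable %s is unset or empty", key)
	}
	return v, nil
}

// processEnv is the production lookup backed by the real environment.
var processEnv lookupFunc = os.LookupEnv
```

At startup the operator would call requireEnv(processEnv, "POD_NAMESPACE") and exit on error, so the webhook never runs with an empty namespace.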
{
	Name:                    "protect-finalizers.wekacluster.weka.io",
	AdmissionReviewVersions: []string{"v1"},
	SideEffects:             &sideEffects,
	FailurePolicy:           &failurePolicy,
	TimeoutSeconds:          &timeoutSeconds,
	ClientConfig: admissionregistrationv1.WebhookClientConfig{
		Service: &admissionregistrationv1.ServiceReference{
			Namespace: m.namespace,
			Name:      m.config.ServiceName,
			Path:      &finalizerPath,
		},
		CABundle: caBundle,
	},
	Rules: []admissionregistrationv1.RuleWithOperations{
		{
			Operations: []admissionregistrationv1.OperationType{
				admissionregistrationv1.Update,
			},
			Rule: admissionregistrationv1.Rule{
				APIGroups:   []string{"weka.weka.io"},
				APIVersions: []string{"v1alpha1"},
				Resources:   []string{"wekaclusters"},
			},
		},
	},
},
{
	Name:                    "protect-finalizers.wekaclient.weka.io",
	AdmissionReviewVersions: []string{"v1"},
	SideEffects:             &sideEffects,
	FailurePolicy:           &failurePolicy,
	TimeoutSeconds:          &timeoutSeconds,
	ClientConfig: admissionregistrationv1.WebhookClientConfig{
		Service: &admissionregistrationv1.ServiceReference{
			Namespace: m.namespace,
			Name:      m.config.ServiceName,
			Path:      &finalizerPath,
		},
		CABundle: caBundle,
	},
	Rules: []admissionregistrationv1.RuleWithOperations{
		{
			Operations: []admissionregistrationv1.OperationType{
				admissionregistrationv1.Update,
			},
			Rule: admissionregistrationv1.Rule{
				APIGroups:   []string{"weka.weka.io"},
				APIVersions: []string{"v1alpha1"},
				Resources:   []string{"wekaclients"},
			},
		},
	},
},
{
	Name:                    "protect-finalizers.wekacontainer.weka.io",
	AdmissionReviewVersions: []string{"v1"},
	SideEffects:             &sideEffects,
	FailurePolicy:           &failurePolicy,
	TimeoutSeconds:          &timeoutSeconds,
	ClientConfig: admissionregistrationv1.WebhookClientConfig{
		Service: &admissionregistrationv1.ServiceReference{
			Namespace: m.namespace,
			Name:      m.config.ServiceName,
			Path:      &finalizerPath,
		},
		CABundle: caBundle,
	},
	Rules: []admissionregistrationv1.RuleWithOperations{
		{
			Operations: []admissionregistrationv1.OperationType{
				admissionregistrationv1.Update,
			},
			Rule: admissionregistrationv1.Rule{
				APIGroups:   []string{"weka.weka.io"},
				APIVersions: []string{"v1alpha1"},
				Resources:   []string{"wekacontainers"},
			},
		},
	},
},
Availability risk worth calling out. All three finalizer-protection webhooks use failurePolicy: Fail and have no objectSelector escape hatch (by design; the comment on line 465 is explicit). The combination means that if the webhook pod is unreachable, every UPDATE to wekacluster/wekaclient/wekacontainer objects is rejected, including the operator's own spec and metadata updates such as finalizer removal. Writes through the /status subresource are not intercepted, since the rules list only the main resources and not wekaclusters/status etc. (assuming the CRDs enable the status subresource). The only documented recovery path is the operator-level enableAdmissionControl: false toggle (CleanupIfExists), which is a deliberate operator action.
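This is probably the intended tradeoff, but please confirm the runbook covers it; in particular, a webhook-down scenario during a cluster teardown will block the operator's own deletion flow, not just unauthorized actors. If the risk is unacceptable, note that a namespaceSelector matches on the namespace of the object being changed, not of the caller, so excluding the operator's namespace only helps for CRs that live there. A separate Ignore entry for the operator SA likewise needs a way to select on the caller; matchConditions (a CEL expression on request.userInfo.username, GA since Kubernetes 1.30) provides that, and can also simply exclude the operator SA from the Fail entries.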
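A sketch of the matchConditions approach, assuming a cluster where webhook match conditions are available (GA in Kubernetes 1.30). matchCondition below is a local stand-in for admissionregistrationv1.MatchCondition (which has Name and Expression fields) so the snippet compiles on its own; the condition name is hypothetical. When the expression evaluates to false, the API server skips the webhook for that request, so the operator's own updates would not depend on webhook availability while every other caller still hits failurePolicy: Fail.

```go
package main

import "fmt"

// matchCondition mirrors the Name/Expression fields of
// admissionregistrationv1.MatchCondition; it is a local stand-in so the
// sketch compiles without k8s.io/api.
type matchCondition struct {
	Name       string
	Expression string
}

// skipOperatorSA returns a CEL condition that is false for requests made by
// the operator's own ServiceAccount, so the webhook is not called for them.
func skipOperatorSA(namespace, saName string) matchCondition {
	return matchCondition{
		Name: "exclude-operator-sa",
		Expression: fmt.Sprintf("request.userInfo.username != 'system:serviceaccount:%s:%s'",
			namespace, saName),
	}
}
```

The handler's own exact-SA check would stay in place as defense in depth; the condition only decides whether the webhook is consulted at all.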
Three near-identical webhook entries also beg for a small loop — for _, res := range []string{"wekaclusters", "wekaclients", "wekacontainers"} would cut ~60 lines without losing clarity.
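A sketch of that loop, using a trimmed local stand-in (webhookSpec) rather than admissionregistrationv1.ValidatingWebhook so it runs without Kubernetes dependencies. In the real manager.go the loop body would also set the shared fields (AdmissionReviewVersions, SideEffects, FailurePolicy, TimeoutSeconds, the ClientConfig pointing at finalizerPath, and the Update operation).

```go
package main

import (
	"fmt"
	"strings"
)

// webhookSpec is a trimmed stand-in for admissionregistrationv1.ValidatingWebhook,
// keeping only the fields that differ between the three entries.
type webhookSpec struct {
	Name      string
	Group     string
	Version   string
	Resources []string
}

// finalizerWebhookSpecs builds one entry per protected CRD in a loop instead
// of three hand-copied literals, following the existing
// "protect-finalizers.<singular>.weka.io" naming pattern.
func finalizerWebhookSpecs() []webhookSpec {
	var specs []webhookSpec
	for _, res := range []string{"wekaclusters", "wekaclients", "wekacontainers"} {
		specs = append(specs, webhookSpec{
			Name:      fmt.Sprintf("protect-finalizers.%s.weka.io", strings.TrimSuffix(res, "s")),
			Group:     "weka.weka.io",
			Version:   "v1alpha1",
			Resources: []string{res},
		})
	}
	return specs
}
```

Deriving the singular with strings.TrimSuffix works for these three resource names but is not a general pluralization rule; an explicit resource/kind table is safer if more CRDs are added.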
5293e20 to da86c0d
6ab69e3 to 599708c
…e operator service account
599708c to c9d3508
Claude finished @assafgi's task in 1m 28s.
Claude Code Re-Review (post-update)
Summary: The latest revision addresses almost all of the previous review feedback. Nice turnaround.
Verification
Remaining notes (non-blocking)
Recommendation: the blocking correctness/security issues are resolved. Adding the unit test would be the last thing I'd want before merge given this guards a deletion-protection boundary, but it's reasonable as a fast follow-up.
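A sketch of the cases such a unit test might cover. allowUpdate is a hypothetical pure function standing in for the handler's decision logic (decoding the AdmissionRequest is left out), and the operator username, the other usernames, and the protected finalizer set are assumed values. In a real _test.go file, the loop in failingCases would become t.Run(c.name, ...) with t.Errorf.

```go
package main

// operatorUser is an assumed operator SA username for the examples below.
const operatorUser = "system:serviceaccount:ops:weka-operator"

// protected is an assumed set of guarded finalizers.
var protected = map[string]bool{
	"weka.weka.io/finalizer":       true,
	"do-not-delete.weka.io/unsafe": true,
}

// allowUpdate reports whether an UPDATE going from old to current finalizers
// should be admitted. Removing a protected finalizer is allowed only for the
// exact operator identity, and never when that identity is empty.
func allowUpdate(username, operator string, old, current []string) bool {
	kept := make(map[string]bool, len(current))
	for _, f := range current {
		kept[f] = true
	}
	for _, f := range old {
		if protected[f] && !kept[f] {
			return operator != "" && username == operator
		}
	}
	return true
}

type finalizerCase struct {
	name         string
	username     string
	old, current []string
	wantAllowed  bool
}

var cases = []finalizerCase{
	{"operator removes protected finalizer", operatorUser,
		[]string{"weka.weka.io/finalizer"}, nil, true},
	{"sibling SA in operator namespace removes protected finalizer",
		"system:serviceaccount:ops:weka-operator-maintenance",
		[]string{"weka.weka.io/finalizer"}, nil, false},
	{"user strips both protected finalizers", "kubernetes-admin",
		[]string{"weka.weka.io/finalizer", "do-not-delete.weka.io/unsafe"}, nil, false},
	{"user removes an unprotected finalizer", "kubernetes-admin",
		[]string{"weka.weka.io/finalizer", "example.com/other"},
		[]string{"weka.weka.io/finalizer"}, true},
	{"user reorders finalizers", "kubernetes-admin",
		[]string{"weka.weka.io/finalizer", "do-not-delete.weka.io/unsafe"},
		[]string{"do-not-delete.weka.io/unsafe", "weka.weka.io/finalizer"}, true},
}

// failingCases returns the names of cases whose outcome differs from wantAllowed.
func failingCases() []string {
	var failed []string
	for _, c := range cases {
		if allowUpdate(c.username, operatorUser, c.old, c.current) != c.wantAllowed {
			failed = append(failed, c.name)
		}
	}
	return failed
}
```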
Pull request overview
This PR adds an admission-control safeguard to prevent removal of operator-managed finalizers unless the request is made by the operator’s own Kubernetes ServiceAccount, strengthening protection against forced cleanup by other identities.
Changes:
- Introduces a new validating admission webhook endpoint (/protect-finalizers) that denies protected finalizer removal by non-operator service accounts.
- Extends the ValidatingWebhookConfiguration to include finalizer-protection webhooks for Weka CRDs without an objectSelector escape hatch.
- Adds a new environment/config value for the operator ServiceAccount name and wires it via the Helm chart.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/config/env.go | Adds config/env binding for operator ServiceAccount name used for webhook caller identity verification. |
| internal/admission/manager.go | Adds /protect-finalizers webhook path and appends new finalizer-protection webhooks into the VWC. |
| internal/admission/finalizer_protection.go | Implements and registers the admission handler that enforces finalizer-removal restrictions. |
| cmd/manager/main.go | Registers the new finalizer-protection webhook when admission is enabled. |
| charts/weka-operator/templates/manager.yaml | Injects WEKA_OPERATOR_SERVICE_ACCOUNT_NAME into the manager Deployment env. |
Config.BindAddress.Metrics = getEnvOrFail("OPERATOR_METRICS_BIND_ADDRESS")
Config.BindAddress.HealthProbe = getEnvOrFail("HEALTH_PROBE_BIND_ADDRESS")
Config.MaintenanceSaName = getEnvOrFail("WEKA_OPERATOR_MAINTENANCE_SA_NAME")
Config.OperatorServiceAccountName = getEnvOrFail("WEKA_OPERATOR_SERVICE_ACCOUNT_NAME")
Config.OcpCompatibility.DriverToolkitSecretName = getEnvOrFail("WEKA_OCP_PULL_SECRET")
No description provided.