feat: Use RequiresRepublish for secret rotation #1622
base: main
Hi @dargudear-google. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Approach broadly LGTM! IIUC, once this PR is submitted, auto-rotation won't work for k8s clusters < 1.21. What is the process to announce and manage this breaking change?
Discussed in the last community call: we should announce that "For clusters < 1.21, please use v1.4.5 or earlier."
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main    #1622      +/-   ##
==========================================
- Coverage   35.83%   32.07%   -3.77%
==========================================
  Files          63       57        -6
  Lines        3759     3838       +79
==========================================
- Hits         1347     1231      -116
- Misses       2268     2501      +233
+ Partials      144      106       -38
/ok-to-test
Force-pushed from aa6ffcb to 8d332b7.
First pass.
In the bats tests where the sleep duration or timeout was increased, could you re-evaluate whether the increase is necessary? If it is, could you add a comment explaining why, so we keep that context?
Force-pushed from 9ca55a9 to a7f4796.
Force-pushed from acf1519 to 23e8f92.
/test pull-secrets-store-csi-driver-e2e-gcp
Tested the newly added e2e test for the GCP provider. @aramase, can you please take a look?
@aramase, are you still planning on reviewing this PR? I'd like to see this functionality get in.
Yes, we're currently working on the next release of the driver. I'll review this week after the release is complete.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull request has been approved by: dargudear-google, jainsuyogj
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Another pass.
if isRemountRequest {
	// Mask error until fix available for https://github.com/kubernetes/kubernetes/issues/121271
	return &csi.NodePublishVolumeResponse{}, nil
}
In the remount request scenario, reaching this point means the rotation succeeded (the files in the mount were updated). However, if we failed to update the SecretProviderClassPodStatus, the state is inconsistent: the SPCPS still reflects the old versions of the objects, while the files in the mount are newer.
This inconsistency is temporary and should go away once kubernetes/kubernetes#121271 is fixed.
@dargudear-google: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest-required
Yes, this will need to be merged before we can merge #1755. We want to include both changes in the next minor release. I still need to review the recent updates. I'm currently in the middle of the Kubernetes v1.34 enhancements freeze but will take a look soon.
I think my main suggestion would be to read the CSIDriver into a local cache and have NodePublishVolume() read the requiresRepublish value from that cache. That way, users can avoid restarting the CSI driver if they change the value.
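A minimal sketch of the read side of that suggestion, assuming a watch on the CSIDriver object (not shown) calls Set whenever spec.requiresRepublish changes. republishCache and shouldRotate are hypothetical names; an atomic value is used so NodePublishVolume() can read it concurrently without locking.

```go
package main

import "sync/atomic"

// republishCache is a hypothetical holder for the CSIDriver's
// spec.requiresRepublish value, shared by the watcher and NodePublishVolume.
type republishCache struct {
	requiresRepublish atomic.Bool
}

// Set records the latest value observed on the CSIDriver object.
func (c *republishCache) Set(v bool) { c.requiresRepublish.Store(v) }

// Enabled returns the most recently observed value. It is safe to call
// concurrently with Set, so a changed value takes effect without a restart.
func (c *republishCache) Enabled() bool { return c.requiresRepublish.Load() }

// shouldRotate is a hypothetical check NodePublishVolume could make: only a
// republish of an already-mounted volume counts as a rotation, and only
// while the cached requiresRepublish value is true.
func shouldRotate(c *republishCache, isRemountRequest bool) bool {
	return isRemountRequest && c.Enabled()
}
```

Compared with a value fixed once in main.go, the cache is read on every request, so toggling requiresRepublish only requires the watch to deliver the update.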
tokenRequests:
{{- toYaml .Values.tokenRequests | nindent 2 }}
requiresRepublish: {{ .Values.requiresRepublish }}
Can you remove the hard-coded entry above this, since it's conditional here?
@@ -239,6 +239,10 @@ tokenRequests: []
# - audience: aud1
# - audience: aud2

# To set the requiresRepublish which can be used to refresh the mounted secret periodically
# refer to https://kubernetes-csi.github.io/docs/token-requests.html for more details.
# Supported only for Kubernetes v1.20+
K8s 1.20 has long since been deprecated. Can we scrub references to it now?
@aramase WDYT?
@@ -190,14 +201,7 @@ func (ns *nodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublis
// and send it to the provider in the parameters.
Can you remove this comment block? It's no longer needed.
// rotationConfig stores the information required to rotate the secrets.
type rotationConfig struct {
	enabled              bool
	rotationPollInterval time.Duration
Is this really a poll interval? It seems more like a cache duration: NodePublishVolume() should not act on a volume again any sooner than this interval.
@@ -47,7 +44,7 @@ type nodeServer struct {
	// This should be used sparingly and only when the client does not fit the use case.
	reader          client.Reader
	providerClients *PluginClientBuilder
	tokenClient     *k8s.TokenClient
	rotationConfig  *rotationConfig
What do you think of using a (wrapped) K8s client to dynamically read this value off the CSIDriver, instead of having a rotationConfig that isn't thread-safe or configurable?
Right now, if this is enabled and a user changes requiresRepublish in their driver configuration, they'll have to restart every CSI driver pod to pick up the change, because you're setting this value once in main.go.
In my PR, I created a CSIDriver client that watches just the one CSIDriver resource, and NodePublishVolume() reads from its local cache.
Remove the rotation controller and rely exclusively on RequiresRepublish for secret rotation.