
feat: Use RequiresRepublish for secret rotation #1622


Open: dargudear-google wants to merge 13 commits into main


dargudear-google
Contributor

Remove the rotation controller and rely exclusively on RequiresRepublish for secret rotation.
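With requiresRepublish: true on the CSIDriver object, kubelet calls NodePublishVolume again periodically for volumes that are already mounted, which lets the driver refresh secrets in place instead of running its own rotation controller. The sketch below models that loop with the standard library only. fakeProvider, nodeServer, and publish are hypothetical stand-ins rather than the driver's real types, and treating any call for an already-tracked target path as a remount is an assumption made for illustration.

```go
package main

import "fmt"

// fakeProvider is a hypothetical stand-in for an external secrets provider.
// currentVersion is the secret version the provider would return right now.
type fakeProvider struct {
	currentVersion string
}

// nodeServer is a hypothetical, simplified node server that remembers which
// secret version is currently written to each target path.
type nodeServer struct {
	provider *fakeProvider
	mounted  map[string]string // target path -> secret version on disk
}

// publish models one NodePublishVolume call. The first call for a target is a
// fresh mount; later calls (issued periodically by kubelet when
// requiresRepublish is true) are treated as remounts that refresh the files.
func (ns *nodeServer) publish(target string) (remount bool) {
	_, remount = ns.mounted[target]
	// Fetch the latest content from the provider and "write" it to the mount.
	ns.mounted[target] = ns.provider.currentVersion
	return remount
}

func main() {
	p := &fakeProvider{currentVersion: "v1"}
	ns := &nodeServer{provider: p, mounted: map[string]string{}}

	// Hypothetical target path used only for this illustration.
	target := "/var/lib/kubelet/pods/example/volumes/secrets"
	fmt.Println(ns.publish(target), ns.mounted[target])

	p.currentVersion = "v2" // the secret is rotated in the external store
	fmt.Println(ns.publish(target), ns.mounted[target])
}
```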

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 1, 2024
@k8s-ci-robot
Contributor

Hi @dargudear-google. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 1, 2024
@dargudear-google dargudear-google changed the title from "Use in secret rotation" to "Use RequiresRepublish in secret rotation." Sep 7, 2024
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 7, 2024
@dargudear-google dargudear-google marked this pull request as ready for review September 10, 2024 13:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 10, 2024
@k8s-ci-robot k8s-ci-robot requested a review from ritazh September 10, 2024 13:30

@amitmodak amitmodak left a comment


Approach broadly LGTM! IIUC, once this PR is submitted, auto-rotation won't work for k8s clusters < 1.21. What is the process to announce and manage this breaking change?

@dargudear-google
Contributor Author


We discussed in the last community call that we should publish the following note: "For clusters < 1.21, please use v1.4.5 or earlier."
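The compatibility note amounts to a version check on the cluster. A minimal sketch, assuming the cluster version is available as a gitVersion string such as the one returned by the API server's /version endpoint; supportsRepublishRotation is a hypothetical helper, and the 1.21 threshold and the v1.4.5 fallback are taken from this thread rather than from the driver's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsRepublishRotation reports whether a cluster at gitVersion (for
// example "v1.21.3" or "v1.29.2-gke.1521000") is new enough for
// RequiresRepublish-based rotation. The 1.21 threshold comes from this PR's
// discussion; it is not read from any Kubernetes API.
func supportsRepublishRotation(gitVersion string) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(gitVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", gitVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", gitVersion, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", gitVersion, err)
	}
	return major > 1 || (major == 1 && minor >= 21), nil
}

func main() {
	for _, v := range []string{"v1.20.15", "v1.21.0", "v1.29.2-gke.1521000"} {
		ok, err := supportsRepublishRotation(v)
		switch {
		case err != nil:
			fmt.Println(v, "error:", err)
		case ok:
			fmt.Println(v, "-> RequiresRepublish rotation supported")
		default:
			fmt.Println(v, "-> use driver v1.4.5 or earlier for auto-rotation")
		}
	}
}
```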

@codecov-commenter

codecov-commenter commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 32.55814% with 58 lines in your changes missing coverage. Please review.

Project coverage is 32.07%. Comparing base (87f51ec) to head (76e6600).
Report is 195 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...rollers/secretproviderclasspodstatus_controller.go    13.15%    31 Missing and 2 partials ⚠️
pkg/secrets-store/nodeserver.go                          58.06%    10 Missing and 3 partials ⚠️
pkg/secrets-store/secrets-store.go                       20.00%    8 Missing ⚠️
pkg/secrets-store/utils.go                               50.00%    2 Missing and 1 partial ⚠️
cmd/secrets-store-csi-driver/main.go                     0.00%     1 Missing ⚠️
@@            Coverage Diff             @@
##             main    #1622      +/-   ##
==========================================
- Coverage   35.83%   32.07%   -3.77%     
==========================================
  Files          63       57       -6     
  Lines        3759     3838      +79     
==========================================
- Hits         1347     1231     -116     
- Misses       2268     2501     +233     
+ Partials      144      106      -38     


@nilekhc
Contributor

nilekhc commented Oct 3, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 3, 2024
@dargudear-google dargudear-google force-pushed the rotation branch 7 times, most recently from aa6ffcb to 8d332b7 October 7, 2024 09:20
Contributor

@nilekhc nilekhc left a comment


First pass.

In the bats tests where the sleep duration or timeout was increased, could you re-evaluate whether the increase is necessary? If it is, could you add a note so we have the context?

@dargudear-google
Contributor Author

/test pull-secrets-store-csi-driver-e2e-gcp

@dargudear-google
Contributor Author

Tested the newly added e2e test for the GCP provider. @aramase, can you please take a look?

@micahhausler
Member

@aramase, are you still planning on reviewing this PR? I'd like to see this functionality get in.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2025
@aramase
Member

aramase commented Apr 8, 2025


Yes, we're currently working on the next release of the driver. I'll review this week after the release is complete.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dargudear-google, jainsuyogj
Once this PR has been reviewed and has the lgtm label, please ask for approval from aramase. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@aramase aramase left a comment


Another pass.

Comment on lines +258 to +261
	if isRemountRequest {
		// Mask error until fix available for https://github.com/kubernetes/kubernetes/issues/121271
		return &csi.NodePublishVolumeResponse{}, nil
	}
Member


In the remount request scenario, reaching this point means the rotation succeeded (the files were updated in the mount). However, if we failed to update the SecretProviderClassPodStatus, the state is inconsistent: the SPCPS still reflects the old versions of the objects, while the files in the mount are newer.

Contributor Author


This inconsistency is temporary and should go away once kubernetes/kubernetes#121271 is fixed.

@enj enj moved this to Subprojects - Needs Triage in SIG Auth Apr 29, 2025
@enj enj added this to SIG Auth Apr 29, 2025
@aramase aramase moved this from Subprojects - Needs Triage to Changes Requested in SIG Auth Apr 29, 2025
@dargudear-google dargudear-google requested a review from aramase May 14, 2025 18:29
@k8s-ci-robot
Contributor

k8s-ci-robot commented May 14, 2025

@dargudear-google: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                    Commit    Details   Required   Rerun command
release-secrets-store-csi-driver-e2e-azure   50ac633   link      true       /test release-secrets-store-csi-driver-e2e-azure

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dargudear-google
Contributor Author

/retest-required

@ThirdEyeSqueegee

Hi @aramase, are y'all still expecting to include this and #1755 in the next release? We'd love to see both these features make it into the driver to support our use case.

@aramase
Member

aramase commented Jun 11, 2025


Yes, this will need to be merged before we can merge #1755. We want to include both changes in the next minor release. I still need to review the recent updates. I'm currently in the middle of the Kubernetes v1.34 enhancements freeze but will take a look soon.

Member

@micahhausler micahhausler left a comment


I think my main suggestion would be to read the CSIDriver into a local cache and have NodePublishVolume() read the requiresRepublish value from that cache. That way, users can avoid restarting the CSI driver if they change the value.

tokenRequests:
{{- toYaml .Values.tokenRequests | nindent 2 }}
requiresRepublish: {{ .Values.requiresRepublish }}
Member


Can you remove the hard-coded entry above this, since it's conditional here?

@@ -239,6 +239,10 @@ tokenRequests: []
# - audience: aud1
# - audience: aud2

# To set the requiresRepublish which can be used to refresh the mounted secret periodically
# refer to https://kubernetes-csi.github.io/docs/token-requests.html for more details.
# Supported only for Kubernetes v1.20+
Member


K8s 1.20 has long since been deprecated. Can we scrub references to it now?

Contributor Author


@aramase WDYT?

@@ -190,14 +201,7 @@ func (ns *nodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublis
// and send it to the provider in the parameters.
Member


Can you remove this comment block? It's no longer needed.

// rotationConfig stores the information required to rotate the secrets.
type rotationConfig struct {
	enabled              bool
	rotationPollInterval time.Duration
Member


Is this really a poll interval? It seems more like a cache time: NodePublishVolume() won't act again any sooner than this duration.

@@ -47,7 +44,7 @@ type nodeServer struct {
	// This should be used sparingly and only when the client does not fit the use case.
	reader          client.Reader
	providerClients *PluginClientBuilder
	tokenClient     *k8s.TokenClient
	rotationConfig  *rotationConfig
Member


Instead of having a rotationConfig that isn't thread-safe or configurable, what do you think of using a (wrapped) K8s client to dynamically read this value off the CSIDriver?

Right now, if a user changes requiresRepublish in their driver configuration, they'll have to restart every CSI driver pod for the change to take effect (when this is enabled), because you're setting this value once in main.go.

In my PR I created a CSIDriver client that would just watch the one CSIDriver resource, and NodePublishVolume() would read from the local cache.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.