
KEP 5339 - ClusterProfile Credentials external providers #5338


Open · corentone wants to merge 12 commits into master

Conversation

corentone

  • Introduce a way to load and use credentials out of the controller tree when using ClusterProfile
  • Issue link: N/A

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 23, 2025
@k8s-ci-robot k8s-ci-robot requested review from JeremyOT and skitt May 23, 2025 20:41
@corentone corentone changed the title KEP XXX - ClusterProfile Credentials external providers KEP 5338 - ClusterProfile Credentials external providers May 23, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 23, 2025
@k8s-ci-robot
Contributor

Hi @corentone. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 23, 2025
@corentone corentone changed the title KEP 5338 - ClusterProfile Credentials external providers KEP 5339 - ClusterProfile Credentials external providers May 23, 2025

In order to populate the Cluster object that the exec provider requires, we standardize the following properties.

Property: `multicluster.sigs.io/execprovider-cluster`
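For illustration only, here is a minimal sketch of how a consuming library might turn this property into an exec-based client configuration. The JSON encoding of the property value and the function and parameter names are assumptions made for this sketch, not something defined by the KEP; the client-go types and `ExecConfig` fields are the real ones from KEP 541.

// Hedged sketch (not the KEP's actual library): turn the value of the
// multicluster.sigs.io/execprovider-cluster property into a rest.Config that
// delegates credentials to an exec plugin. The JSON encoding of the property
// and the function/parameter names are assumptions for illustration only.
package clusterprofilecreds

import (
	"encoding/json"

	clientauthenticationv1 "k8s.io/client-go/pkg/apis/clientauthentication/v1"
	"k8s.io/client-go/rest"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// restConfigFromProperty parses the (assumed) JSON-encoded Cluster object
// carried by the property and wires it into an exec-based rest.Config.
func restConfigFromProperty(propertyValue, pluginBinary string) (*rest.Config, error) {
	var cluster clientauthenticationv1.Cluster
	if err := json.Unmarshal([]byte(propertyValue), &cluster); err != nil {
		return nil, err
	}
	return &rest.Config{
		Host: cluster.Server,
		TLSClientConfig: rest.TLSClientConfig{
			CAData:     cluster.CertificateAuthorityData,
			Insecure:   cluster.InsecureSkipTLSVerify,
			ServerName: cluster.TLSServerName,
		},
		ExecProvider: &clientcmdapi.ExecConfig{
			APIVersion:         "client.authentication.k8s.io/v1",
			Command:            pluginBinary,
			ProvideClusterInfo: true, // client-go passes the cluster info to the plugin via KUBERNETES_EXEC_INFO
			InteractiveMode:    clientcmdapi.NeverExecInteractiveMode,
		},
	}, nil
}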


Hi corentone! This information would be exposed to all readers of the Cluster Profile object, which I fear might incur some security concerns?


Another part is that different plugins might require different cluster-specific information -> would this be a free-form property?


Each plugin would also need some application-specific information to make the authentication work. That is obviously not ideal to expose directly on the Cluster Profile; I am wondering if we could/should formulate a way for application-specific information to be discovered as well.

Author

Regarding privacy of the endpoint, I'm wondering if it is really something we should hide. I don't think we should tbh. ClusterProfiles would still be protected by RBAC so I think the endpoint is not a sensitive thing to store.

Regarding the cluster-specific info, could you share more details? I do suggest the ClusterManager provides the full data struct that would contain all the fields that are in the Cluster object defined above. Do you think we should allow modifications to it?

My assumption is that the plugin would be installed or set up in such a way that it can use the pod's volumes, service accounts, etc. So it would have access to the KSA in particular, and to other env vars that may help identify the controller.


My concern is mostly that, Cluster Profile objects are something that many different agents would access; and a lot of such agents do not use the API for authentication (they could be doing scheduling work, observability work, etc.). RBAC can help, but the thing is, with the current RBAC rules we only have object-level granularity so if the user grants an agent access to Cluster Profiles, it will have access to the auth-related data, even if the agent does not need it. Using object references, on the other hand, would allow users to set RBAC rules for the auth data specifically. Though I do agree too that data such as API server address/CA bundle are not as sensitive as tokens.


As for the cluster-specific and application-specific data, we might have gone over this briefly on the Slack channel; basically, depending on how the authn workflow is implemented (there are obviously a variety of options), each plugin would need its own set of inputs.

For example:

  • for a plugin that, let's just assume, can magically (and securely) acquire a service account token out of thin air, it would need the following items to complete the auth flow:

    • (cluster-specific info.): API server address and CA bundle
    • (application-specific info.): the service account name that represents the application (that invokes the plugin)
  • for a plugin that uses federated identities, it would need:

    • (cluster-specific info.): API server address, CA bundle, expected audience for token exchange, and some platform specific information (on AKS -> tenant ID, on GKE -> workload identity provider full ID)
    • (application-specific info.): (on AKS -> client ID)
  • for a plugin that uses SPIRE, it would need:

    • (cluster-specific info.): API server address, CA bundle, expected audience
    • (application-specific info.): spiffe ID

For flexibility reasons, we might need to use something free-form, I assume? Of course we could just assume that everything is read from local setup (mounted volumes, configMaps, etc.), but it kind of defeats the benefits of having cluster profiles (esp. for the cluster-specific information part). If we could have application-specific information linked with Cluster Profile objects (this one must be done via object references, for obvious reasons), it could further simplify the flow, I guess.

@michaelawyu michaelawyu May 28, 2025

This might be more relevant to another comment I posted, but I am wondering if we could have something like (modifying the Cluster Profile API):

...
spec:
  authProviders:
  - type: aksoidc
    clusterInfo:
      # we could do either inline (arbitrary map)...
      caData: ...
      server: ...
      audience: ...
      tenantID: ...
      # or use object refs
      objectRef: ....
    targetApplications:
    - name: argo
      applicationInfo:
        objectRef: ...
    - name: multi-kueue
      applicationInfo:
        objectRef: ...

This way each type of plugin could work (exec-typed or libraries, not suggesting that this KEP should touch the latter, though), and it saves users the trouble of having to mount a bunch of information manually just for auth purposes, especially considering that the info would be cluster-specific. RBAC could be applied to the ref'd objects for better security (and less clutter?).

Of course the example is just an initial thought for demonstration purposes.

Author

Application-specific info is provided by the controller when setting up the plugin, never by the ClusterProfile. The ClusterProfile is not aware of the applications using it; it only says what "plugin types" work with it.
Now, yes, the application will have to provide that data to the plugin. That will likely be done via volumes, ConfigMaps, env vars, etc. Given the plugin is exec'ed, it has access to the same data as the controller itself (volumes/FS, environment, KSA...).
IMO, having application-specific information is part of what doomed the previous KEP on credentials and we can't make that mistake again. It would require the ClusterProfile to know about all the applications that can use it and how they are set up. I don't see that approach working.

The clusterprofile needs to provide all the cluster-specific information (endpoints and connectivity as well as plugin types and pointers to what authN mechanism is recommended).

Let me do another pass at the KEP to provide a new ClusterProfile example as well as a Controller example.
FWIW, for GKE, it would leverage federated identity, which can be obtained just by calling the metadata server, so the ClusterProfile doesn't need to provide us with anything.
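As an aside, the plugin side of this flow is the standard KEP 541 exec protocol. A minimal sketch follows, assuming a plugin that reads the cluster info the controller passes via KUBERNETES_EXEC_INFO and returns a token obtained from some platform-specific flow (e.g. a metadata server); getPlatformToken is a hypothetical stand-in, not a real API.

// Minimal sketch of the plugin side of the KEP 541 exec flow.
package main

import (
	"encoding/json"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientauthenticationv1 "k8s.io/client-go/pkg/apis/clientauthentication/v1"
)

// getPlatformToken stands in for whatever credential flow the plugin implements
// (workload identity, SPIRE, metadata server, ...). Hypothetical helper.
func getPlatformToken(cluster *clientauthenticationv1.Cluster) (string, time.Time) {
	return "redacted-token", time.Now().Add(30 * time.Minute)
}

func main() {
	// Cluster info (server, CA data, ...) arrives from the calling controller.
	var in clientauthenticationv1.ExecCredential
	_ = json.Unmarshal([]byte(os.Getenv("KUBERNETES_EXEC_INFO")), &in)

	token, expiry := getPlatformToken(in.Spec.Cluster)

	// The plugin answers on stdout with an ExecCredential status.
	out := clientauthenticationv1.ExecCredential{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "client.authentication.k8s.io/v1",
			Kind:       "ExecCredential",
		},
		Status: &clientauthenticationv1.ExecCredentialStatus{
			Token:               token,
			ExpirationTimestamp: &metav1.Time{Time: expiry},
		},
	}
	_ = json.NewEncoder(os.Stdout).Encode(&out)
}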

@michaelawyu michaelawyu May 29, 2025

Having the application-specific info via object refs could help simplify the process, but I would understand if it's something that we want to leave for the application itself to set up, i.e., taking an opinionated approach on the API part and making sure that any sensitive info cannot be added. To me personally, the reason why the last push wasn't successful is mostly that we were trying to encourage a practice (passing tokens across cluster boundaries) that has security concerns (even though it is commonly used) 😞.

Looking forward to seeing the new pass. It would be really, really nice if we could:

  • use neutral names on the API level so that no limits are implicitly imposed on how the flow is implemented (of course for the KEP, and the current efforts in the WG, I totally understand that we are taking an exec-plugin-focused approach, but on the API side we probably shouldn't enforce this?)
  • have a field that allows placement of arbitrary cluster-specific authentication info, as different authn workflows might have different requirements (though some fields might be very commonly used). I fear that not all platforms would handle the flow as GKE does (metadata server), especially those with a hybrid environment.

Author

I do think we should enforce exec for now, at least for the first pass. Then we can see how people react and bake in some flows or allow people to bake in their own flow.
Regarding the other flows, that's the beauty of it: the ClusterManager could write multiple flows for the same CP. Also, the plugin type allows picking different binaries if needed for the same set of config.

The plugin is designated by the ClusterProfile, which specifies the type of provider that is expected to be called.
The ClusterProfile has a property to designate the plugin.

Property: `multicluster.sigs.io/execprovider-name`


Hi corentone! I assume that here we implicitly require that one cluster could have only one provider?

Author

Yes, but the controller could decide what plugin to use, which also means it could use a different binary for a given provider name.

For example, the cluster could be GKE, but as a controller I could use the official-gke-fleet plugin, the official-gke-as-user plugin, or I could write my own gke-custom-plugin mechanism.
Any cluster seen as GKE would use the chosen plugin.

@michaelawyu

I am wondering if we should, since the provider might not necessarily be exec-plugin based, use more neutral names for the flow. It also feels a bit like the feature might warrant its own field in the API rather than living as extensible properties...

@corentone
Author

I am wondering if we should, since the provider might not necessarily be exec-plugin based, use more neutral names for the flow. It also feels a bit like the feature might warrant its own field in the API rather than living as extensible properties...

I do think it should be only exec-based; that way we have a single path for now and directly start out of tree.
I do agree on the dedicated field, that's a good point. Let's see what others lean towards and I'll make the changes.

Member

@stlaz stlaz left a comment

It's not clear to me how the proposed library retrieves configuration for the exec credential plugins it is supposed to use. Is the expectation for the plugins to be configured in the controller's default kubeconfig?

There are likely to be subtle differences for each controller (e.g. client ID/secret in OIDC) but the base would be the same.

Comment on lines 259 to 260
to attach a different binary name or path for the binary. It is expected that the library will have a mapping from
its supported plugin names to the expected binary to call. It is expected that the mapping is provided via a flag on the controller.
Member

Wouldn't mapping binaries make this vendor/implementation specific? It's probably not useful to just hardcode the binary if you don't know how it's configured, right? I.e. hardcoding a binary only specifies the Command part of the ExecConfig.

Author

I added a clarification between the string and the plugin.

The string mapped to a plugin is the "credentials type"; the controller then has to set up a plugin for that "credentials type".

A credentials type could be google, azure, or direct, and the controller would have to plug in the right plugin (there may be multiple that work for a given credentials type).

For google, I'm guessing one could use workload identity and therefore use a plugin called google-with-wi, which would read local files/env to use WI as a google credential. That same google identity could map to a google-user plugin which would use an interactive login?
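To make the "mapping provided via a flag on the controller" idea concrete, here is a hedged sketch of how a controller could accept a credentials-type-to-binary mapping on the command line. The flag name and the "type=path" format are illustrative assumptions, not something defined by this KEP.

// Sketch: a repeatable flag mapping credentials types to exec plugin binaries.
package main

import (
	"flag"
	"fmt"
	"strings"
)

// pluginFlag collects repeated --credentials-plugin=type=/path/to/binary values.
type pluginFlag map[string]string

func (p pluginFlag) String() string { return fmt.Sprint(map[string]string(p)) }

func (p pluginFlag) Set(value string) error {
	parts := strings.SplitN(value, "=", 2)
	if len(parts) != 2 {
		return fmt.Errorf("expected type=/path/to/binary, got %q", value)
	}
	p[parts[0]] = parts[1]
	return nil
}

func main() {
	plugins := pluginFlag{}
	flag.Var(plugins, "credentials-plugin", "map a credentials type to an exec plugin binary (repeatable)")
	flag.Parse()

	// Example invocation (paths are illustrative):
	//   controller --credentials-plugin=google=/usr/local/bin/google-with-wi \
	//              --credentials-plugin=azure=/usr/local/bin/azure-wi
	fmt.Println("credentials plugins:", plugins)
}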

@corentone
Author

I closed discussions that I think I addressed. Please reopen if you feel I didn't address them, or open a new comment if you have new questions following the changes. Thanks!

@mikeshng
Contributor

mikeshng commented Jun 2, 2025

CC @zhujian7 if you want to take a look. Thanks. POC work in here: kubernetes-sigs/cluster-inventory-api#17

@lauralorenz
Contributor

Triage notes: still looking for more feedback at this point; please continue the conversation on existing comments. @corentone is intending to look into a POC. This is close to being something we want hard lgtms and approves on, so please look at it with that energy in mind.

Member

@stlaz stlaz left a comment

I took a little step back to think about it outside what's suggested in this KEP.

I'm curious - what would be an example of a controller that should run outside its target cluster?


Today, we usually use a kubeconfig to represent client settings to connect to a cluster, though in this case we cannot use any kind of credential references because there shall be no shared creds, and so using its hollowed-out form like you're suggesting here seems correct.

The part I do not like too much though is that we're referencing some plugins that we do not define at all - neither their configuration, nor their delivery into the end pods.

It is very likely that a ClusterManager knows both the spoke cluster's endpoint information and the proper way to authenticate to it from the outside. It would feel natural that the ClusterManager would therefore play a role in the credential exchange for these spoke clusters, be it as a mediator (providing an endpoint to exchange a controller's identity for the target cluster's creds using a defined API) or as the authoritative source for remote cluster authentication configuration (publishing it in the ClusterProfile in some shape or form), including instructions for retrieval of the exec plugin binary (e.g. image name and mount path for Image Volumes).

As per one of the discussions in this PR, the latter suggestion, which would still use the exec creds plugins, might not work too well since delivering the binary into the pods might be tricky.

@corentone
Author

@stlaz

I'm curious - what would be an example of a controller that should run outside its target cluster?

I'm not sure what you mean.

  • If you mean, the ClusterProfile points to a different target cluster.

This is the most common case. For example, we would have a bunch of cluster profiles in a hub cluster, and the central controller on the hub cluster would write resources to the target clusters... which it doesn't run on. Hence, to achieve that, it needs credentials.

  • If you mean, the ClusterProfile is accessed OUTSIDE of a cluster, for example, when debugging a controller that uses ClusterProfile (and maybe reading CPs from a hub cluster the user has access to). I think that's where the pattern described in the KEP comes in handy; we may be able to reuse the plugins that are designed for user kubeconfigs.

Member

@bsalamat bsalamat left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bsalamat, corentone
Once this PR has been reviewed and has the lgtm label, please assign jeremyot for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// InsecureSkipTLSVerify skips the validity check for the server's certificate.
// This will make your HTTPS connections insecure.
// +optional
InsecureSkipTLSVerify bool
InsecureSkipTLSVerify bool `json:"insecure-skip-tls-verify,omitempty"`
// CAData contains PEM-encoded certificate authority certificates.
// If empty, system roots should be used.
// +listType=atomic
Contributor

We cannot put this into a CRD because of the listType setting here.

Author

why?

// Example:
//
//	+listType=atomic
//
// Atomic lists will be entirely replaced when updated. This tag may be
// used on any type of list (struct, scalar, ...).
//
// Using this tag will generate the following OpenAPI extension:
//
//	"x-kubernetes-list-type": "atomic"

What's wrong with this? I think it applies only to the CertificateAuthorityData field.

Contributor

@qiujian16 qiujian16 Jun 10, 2025

The field is a []byte, not a list or scalar. CRD generation will fail with the error `must apply listType to an array, found string`.

Author

Ah, bummer. Maybe we could check with the original library; this may be a miss on their end?

Member

See the comment in the original file:

To ensure that this struct contains everything someone would need to communicate
with a kubernetes cluster (just like they would via a kubeconfig), the fields
should shadow "k8s.io/client-go/tools/clientcmd/api/v1".Cluster, with the exception
of CertificateAuthority, since CA data will always be passed to the plugin as bytes.

The data type exported to clients is https://github.com/kubernetes/kubernetes/blob/a34c07971b610eb33908a743eadb4c61beeecc50/staging/src/k8s.io/client-go/pkg/apis/clientauthentication/v1/types.go#L78 which doesn’t have CertificateAuthorityData.

Author

https://github.com/kubernetes/client-go/blob/master/tools/clientcmd/api/types.go#L69-L106

Then we should use this one :) Hopefully it works for CRDs.
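For reference, a hypothetical illustration of how the CA field could be declared so that controller-gen accepts it: a []byte serializes as a base64 string in CRDs, so no +listType marker is applied (per the error quoted above). The type and field names here are illustrative, not the KEP's API.

// Hypothetical illustration only, not part of the KEP.
package v1alpha1

// ClusterAccessConfig is an illustrative type name.
type ClusterAccessConfig struct {
	// Server is the address of the kubernetes cluster (https://hostname:port).
	Server string `json:"server"`

	// CertificateAuthorityData contains PEM-encoded certificate authority
	// certificates. If empty, system roots should be used.
	// +optional
	CertificateAuthorityData []byte `json:"certificate-authority-data,omitempty"`
}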

credentialProviders map[string]CredentialsConfig // mapping of credentials types to their config. In some cases the cluster may recognize different identity types and they may have different endpoints or TLS config.
// +listType=map
// +listMapKey=name
credentialProviders []CredentialsConfig // mapping of credentials types to their config. In some cases the cluster may recognize different identity types and they may have different endpoints or TLS config.
Contributor

Since the clusterProfile spec has a clusterManager field, a clusterProfile can only be owned by a single clusterManager. Do we still need this to be a list here, such that the controller has to select one of the configs from it? Or should it be the clusterManager's job to provide a single config?

Author

I think even if it's a single ClusterManager, there could be multiple paths to reach the cluster.
For example, a GKE cluster could be accessed via GKE Connect Gateway (the fleet endpoint) or via GKFE (the normal endpoint). Another example would be a cluster that is accessible from on-prem (but requires a proxy and special login, etc.) with another path via a proxy that does special checks. Also, in case of transitions, a platform admin may want to offer alternative endpoints and a way to slowly migrate.

In general it doesn't cost us much to support a list, and I think it adds flexibility.

The library implementation flow is expected to be as follows (see the sketch after this list):

1. Build the endpoint details of the cluster by reading properties of the ClusterProfile
2. Call the CredentialsExternalProviders, following the same flow defined in [KEP 541](https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/541-external-credential-providers/README.md)
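A hedged sketch of how a controller might consume such a library follows. The import paths and the exact signature of BuildConfigFromClusterProfile are assumptions based on the name discussed in this review, not a published API.

// Hedged sketch of calling the proposed library from a controller.
package controller

import (
	"k8s.io/client-go/kubernetes"

	// Hypothetical import paths for the cluster-inventory-api types and the
	// credentials library proposed by this KEP.
	cpv1alpha1 "sigs.k8s.io/cluster-inventory-api/apis/v1alpha1"
	cpcreds "sigs.k8s.io/cluster-inventory-api/pkg/credentials"
)

// clientForProfile follows the flow above: the library reads the endpoint
// details and credentials-type properties from the ClusterProfile, then runs
// the matching exec plugin per KEP 541 to produce a usable rest.Config.
func clientForProfile(profile *cpv1alpha1.ClusterProfile) (kubernetes.Interface, error) {
	cfg, err := cpcreds.BuildConfigFromClusterProfile(profile)
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}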
Contributor

Similar question as @stlaz: since kubectl has a built-in exec mechanism, I wonder if it's enough to just return the exec config in a kubeconfig file? The library basically runs the code from the auth package; I wonder if there is any advantage to this approach?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2025
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.


* Provide a library for controllers to obtain credentials for a cluster represented by a ClusterProfile
* Allow cluster managers to provide a method to obtain credentials that doesn't require being embedded into the controller code and recompiling.
Contributor

I wonder how the controller can call the library without "recompiling"? AKA, who is going to call BuildConfigFromClusterProfile?

Contributor

Also, I wonder if this is actually a goal or just good to have.

Author

Imagine you are a controller writer and you want to share your controller, for example ArgoCD or MultiKueue.
You don't want to compile the controller for ALL providers. You want to ship a single Docker image (or binary version/package).

If recompilation were required, any divergence in credentials providers would require forking, editing, and rebuilding any controller you want to use. I know that I would definitely not want to have to recompile ArgoCD to install it on my GKE Fleet, nor do I want to make PRs to add GKE support to ArgoCD.

@skitt
Member

skitt commented Jun 17, 2025

/retest

@corentone
Author

/retest

@k8s-ci-robot
Contributor

@corentone: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@skitt
Member

skitt commented Jun 18, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 18, 2025