Use container registry for authenticating kubernetes image pull #3771

base: develop

Conversation
Loving this, because it (almost) hits all the right notes:
- docker credentials might be temporary, especially if they are coming from service connectors. Your solution always fetches the latest credentials and patches the existing secret instead of configuring the secret once and then forgetting about it. This is one of the challenges I feared we'd hit with this feature and you nailed it.
- you might have different credentials for different pipelines, so you're not attaching the credentials to the service account, you're attaching them directly to the pods. This offers a nice isolation where secrets are only used by the pods that need them.
- the secrets are only needed when the pod starts, then you can forget about them. This is what's still missing: these secrets will pile up (each of them uses the pipeline run ID/name, right?) until you have thousands of secrets. You need to clean up the secrets that are no longer in use - e.g. the k8s orchestrator pod could clean up after itself.
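A minimal sketch of that cleanup idea, assuming secrets are named after the pipeline run ID (the naming scheme and helper name here are hypothetical, not the actual implementation):

```python
from typing import List


def stale_pull_secrets(
    secret_names: List[str], active_run_ids: List[str]
) -> List[str]:
    """Return image pull secrets whose run ID is no longer active.

    Assumes secrets are named '<prefix>-<run_id>',
    e.g. 'image-pull-secret-1234'.
    """
    active = set(active_run_ids)
    return [
        name
        for name in secret_names
        if name.rsplit("-", 1)[-1] not in active
    ]


# The orchestrator pod could then delete each stale secret via
# kubernetes.client.CoreV1Api.delete_namespaced_secret(name, namespace)
# right before it exits.
```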
Aside from this, there are some design concerns that Claude didn't get right.
src/zenml/integrations/kubernetes/service_connectors/kubernetes_service_connector.py (outdated, resolved)

src/zenml/integrations/kubernetes/orchestrators/manifest_utils.py (outdated, resolved)
```python
if registry_uri in ("docker.io", "index.docker.io"):
    registry_server = "https://index.docker.io/v1/"
else:
    # Extract just the domain part for better image matching
    if registry_uri.startswith(("http://", "https://")):
        registry_server = registry_uri
    else:
        # For URIs like 'registry.onstackit.cloud/zenml', use just the domain
        domain = registry_uri.split("/")[0]
        registry_server = f"https://{domain}"
```
Same here: this should be part of the base container registry class, maybe merge it into _get_container_registry_credentials.
The rule of thumb is basically this:
- if the code changes from one container registry flavor to another, it belongs in the container registry base class, or better yet, the base class + implementations, if it's something that a subclass might override
- if the code is specific to kubernetes, it belongs here
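A sketch of what that base class + override split could look like. Note this is illustrative only: `registry_server_uri` is an assumed property name, and the real `BaseContainerRegistry` in ZenML has a different constructor and configuration model.

```python
class BaseContainerRegistry:
    """Simplified stand-in for ZenML's container registry base class."""

    def __init__(self, uri: str) -> None:
        self.config_uri = uri

    @property
    def registry_server_uri(self) -> str:
        """Server URI used for registry login, derived from the URI."""
        if self.config_uri.startswith(("http://", "https://")):
            return self.config_uri
        # For URIs like 'registry.example.com/project', keep only the domain.
        return f"https://{self.config_uri.split('/')[0]}"


class DockerHubContainerRegistry(BaseContainerRegistry):
    @property
    def registry_server_uri(self) -> str:
        # DockerHub always authenticates against this fixed endpoint,
        # regardless of how the registry URI is configured.
        return "https://index.docker.io/v1/"
```

With this split, the kubernetes orchestrator only asks the active registry for `registry_server_uri` and never special-cases individual flavors.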
src/zenml/integrations/kubernetes/orchestrators/manifest_utils.py (outdated, resolved)
```python
        secret_manifests: List of Kubernetes secret manifests to create
        local_object_references: List of V1LocalObjectReference objects for imagePullSecrets
    """
    credentials = _get_container_registry_credentials()
```
There is a cleaner way to do this from a design perspective. Let me explain the thought process:
- first, you have to ask yourself "what is the main area of concern for this utility Python module?". The answer is "building k8s manifests and nothing else". This should make it oblivious to things like stacks and ZenML Client(s). Basically, if your design is a tree of components linked by dependencies, this module should be a low-level leaf: it should know nothing about anything else in your design and therefore has no external dependencies (manifested here as imports). Or if you use the layered design / onion analogy, this is an outermost layer, with no dependencies.
- next, you realize this new utility function needs container registry credentials. So how do you get them here?
- option 1. (currently implemented): call out to the container registry and fetch the credentials. But this makes the module dependent on inner details like the Client or active stack and that spaghettifies the design and its dependency tree and we want to avoid that.
- option 2. (recommended): use a "dependency injection" - have the other inner components that use this utility module do the grunt work of extracting the container credentials and injecting them into this function.
agreed 100%
I found the local path function there as well that violates these design principles, so I will be fixing that too
You need to do the research to figure this out.
On a more general note, your change means that the kubernetes orchestrator and step operator are now broken for all users where the service account does not have permissions to create secrets. This feature should probably be configurable; whether it's enabled by default is debatable I think.
docs/book/component-guide/service-connectors/connector-types/docker-service-connector.md (resolved)
src/zenml/integrations/kubernetes/orchestrators/kubernetes_orchestrator_entrypoint.py (outdated, resolved)
```diff
@@ -108,7 +259,9 @@ def build_pod_manifest(
     env: Optional[Dict[str, str]] = None,
     mount_local_stores: bool = False,
     owner_references: Optional[List[k8s_client.V1OwnerReference]] = None,
+    auto_generate_image_pull_secrets: bool = True,
 ) -> k8s_client.V1Pod:
```
I don't think this is the right place to put this. The purpose of this function is to create a configurable pod manifest. In my opinion, you should be able to pass previously generated image pull secrets into this function, they should not be created as part of it.
Adding on top of what @stefannica already mentioned: Those Docker credentials are short-lived and expire eventually. Currently, you're creating them client-side, which means they should be valid at least until the orchestrator pod spins up. However, let's say I have a sequential pipeline with three steps, each of them taking 20 hours to complete. When we get to running the third step, those credentials might already be expired. I think the solution for that would be to create a new (or refresh the existing) secret right before each step pod gets created.
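One possible shape for that refresh, sketched as a create-or-patch helper the orchestrator could call right before each step pod. The helper name is an assumption; `api` stands in for a `kubernetes.client.CoreV1Api` instance, whose `create_namespaced_secret` and `patch_namespaced_secret` methods are used here.

```python
from typing import Any, Dict


def create_or_refresh_secret(
    api: Any, namespace: str, body: Dict[str, Any]
) -> str:
    """Create the secret, or patch it with fresh data if it already exists.

    Called with freshly fetched registry credentials right before each
    step pod is created, so long-running pipelines never pull with an
    expired secret.
    """
    name = body["metadata"]["name"]
    try:
        api.create_namespaced_secret(namespace=namespace, body=body)
        return "created"
    except Exception as e:
        # kubernetes.client.rest.ApiException carries a .status attribute;
        # 409 means the secret already exists, so refresh it in place.
        if getattr(e, "status", None) == 409:
            api.patch_namespaced_secret(
                name=name, namespace=namespace, body=body
            )
            return "patched"
        raise
```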
@schustmi don't we already need permission to create secrets to store the ZenML API token as a secret?
For the same reason as I mentioned here, I made this an optional feature and even disabled it by default. So for most people (which haven't explicitly enabled this), their service accounts for now do not require those permissions.
…y' of github.com:zenml-io/zenml into feature/populate-imagepullsecret-from-container-registry
```python
    )
    if secret:
        return secret.username, secret.password
```
@stefannica according to your comment the connector should be checked first, but in the credentials implementation it was the other way around - let me know if I am missing something or if it made sense to flip here
```python
class DockerHubContainerRegistry(BaseContainerRegistry):
    """Container registry implementation for DockerHub."""

    @property
    def registry_server_uri(self) -> str:
        """Get the DockerHub registry server URI.

        DockerHub requires authentication against the specific
        'https://index.docker.io/v1/' endpoint regardless of how
        the registry URI is configured.

        Returns:
            The DockerHub registry server URI.
        """
        return "https://index.docker.io/v1/"
```
I'm a bit unsure on this one as container registries seem like the ugly duckling among our stack components with very little actual implementation of the base classes.
```diff
@@ -339,6 +356,16 @@ def finalize_run(node_states: Dict[str, NodeStatus]) -> None:
     except k8s_client.rest.ApiException as e:
         logger.error(f"Error cleaning up secret {secret_name}: {e}")

+    # Clean up old imagePullSecrets to prevent accumulation
+    # Only clean up for non-scheduled runs to avoid interfering with running schedules
```
@schustmi you mentioned that these are temporary - what would I need to do for scheduled pipelines?
I tested it on my use case and things worked - once we both have some time I would take you up on your offer to test together in more detail @stefannica
Describe changes
Claude and I implemented the missing feature that allows ZenML to make appropriate credentials available to the kubernetes pods for image pulling.
I have validated that this works for my specific use case - I'm happy to iterate on this and to discuss decisions.
I am also not sure how this will affect legacy customers that might be setting image pull secrets explicitly.
Pre-requisites
Please ensure you have done the following:
The branch is based on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.

Types of changes