Conversation

@stefannica stefannica commented Apr 21, 2022

Describe changes

This PR updates the Secret Manager implementation to make it more flexible:

  • allows integrations and user code to more easily register their own secret schemas
  • mounts the local secret manager inside the local Kubeflow, so it can be accessed by steps if needed (e.g. this is required to deploy models in Seldon Core with a local Kubeflow orchestrator and secret manager, but with a remote Seldon Core installation and an AWS S3 artifact store)
  • other minor improvements to the secret manager CLI to take into account optional keys

As a result of unifying how the paths of local stack components are mounted inside local Kubeflow containers, the PR also updates the MLflow model deployer so that it can be used with a local Kubeflow orchestrator.

The final goal of this PR is to update the Seldon Core model deployer so that it configures Kubernetes secrets automatically from ZenML secrets. The model deployer component now accepts a new --secret configuration attribute that points to a ZenML secret that contains credentials needed to access the AWS S3, GCS or Azure Blob storage artifact store. Three secret schemas are also included to help with that.

Implementation Details

Formalizing stack components with locally persisted state

The BaseStackComponent class gets a new property that all stack components with local folders (the default Artifact Store, the default Metadata Store, and the local Secrets Manager) need to implement to provide the path of the local folder where their state is persisted:

    @property
    def local_path(self) -> Optional[str]:
        ...

This property is then used in the Kubeflow orchestrator to determine which local folders need to be mounted in the Kubeflow pipeline containers when the local Kubeflow orchestrator is used. The end result is that the local Secrets Manager folder is now automatically mounted in Kubeflow and can be accessed from pipeline steps. This is important for pipeline steps like the Seldon Core model deployment step that must access a ZenML secret with credentials needed to authenticate the Seldon Core model servers to the Artifact Store.
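The way the orchestrator consumes this property can be sketched as follows. This is a minimal illustration, not ZenML's actual orchestrator code: the `StackComponent` stand-in and the `collect_volume_mounts` helper are hypothetical names.

```python
# Hypothetical sketch: an orchestrator collecting the local paths of stack
# components and turning them into container volume mounts.
from typing import Dict, List, Optional


class StackComponent:
    """Minimal stand-in for a stack component with optional local state."""

    def __init__(self, name: str, local_path: Optional[str] = None) -> None:
        self.name = name
        self.local_path = local_path


def collect_volume_mounts(components: List[StackComponent]) -> Dict[str, str]:
    """Map each component's local folder to the same path in the container.

    Mounting at an identical path means component configuration (which
    stores absolute paths) keeps working inside the pipeline container.
    """
    mounts: Dict[str, str] = {}
    for component in components:
        if component.local_path:
            mounts[component.local_path] = component.local_path
    return mounts


components = [
    StackComponent("local_secrets_manager", "/home/user/.config/zenml/secrets"),
    StackComponent("kubeflow_orchestrator", None),  # no local state to mount
]
print(collect_volume_mounts(components))
```

Components without local state simply return `None` and are skipped when the mounts are computed.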

Allowing registration of secret schemas from integrations

Following the same approach used for stack component classes, BaseSecretSchema now declares its schema type as a string ClassVar attribute:

from abc import ABC
from typing import ClassVar
from pydantic import BaseModel

class BaseSecretSchema(BaseModel, ABC):
    name: str
    TYPE: ClassVar[str]

This allows secret schemas to be more easily registered from any part of the code, ZenML integration or otherwise:

from typing import ClassVar, Optional

from zenml.secret import register_secret_schema_class
from zenml.secret.base_secret import BaseSecretSchema

AWS_SECRET_SCHEMA_TYPE = "aws"

@register_secret_schema_class
class AWSSecretSchema(BaseSecretSchema):

    TYPE: ClassVar[str] = AWS_SECRET_SCHEMA_TYPE

    aws_access_key_id: str
    aws_secret_access_key: str
    aws_session_token: Optional[str]
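The registration mechanism can be sketched end-to-end with a stdlib-only stand-in. Note the assumptions: a dataclass replaces pydantic's BaseModel, and the registry dict and decorator body here are illustrative, not ZenML's actual `secret_schema_class_registry` code.

```python
from abc import ABC
from dataclasses import dataclass
from typing import ClassVar, Dict, Optional, Type


class BaseSecretSchema(ABC):
    """Stand-in base class; the real one is a pydantic BaseModel."""

    TYPE: ClassVar[str]


# Module-level registry mapping type strings to schema classes.
SECRET_SCHEMA_REGISTRY: Dict[str, Type[BaseSecretSchema]] = {}


def register_secret_schema_class(
    cls: Type[BaseSecretSchema],
) -> Type[BaseSecretSchema]:
    """Decorator that indexes a schema class by its TYPE class variable."""
    SECRET_SCHEMA_REGISTRY[cls.TYPE] = cls
    return cls


@register_secret_schema_class
@dataclass
class AWSSecretSchema(BaseSecretSchema):
    name: str
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_session_token: Optional[str] = None
    TYPE: ClassVar[str] = "aws"


# Any code that knows the type string can now resolve the schema class:
schema_cls = SECRET_SCHEMA_REGISTRY["aws"]
secret = schema_cls(
    name="my_aws_secret",
    aws_access_key_id="AKIA-EXAMPLE",  # illustrative placeholder values
    aws_secret_access_key="not-a-real-key",
)
print(type(secret).__name__)
```

Because registration happens as a side effect of the decorator, an integration only needs to be imported for its schemas to become resolvable by type string; the core registry never has to import integration modules.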

Seldon Core authentication through ZenML secrets

The Seldon Core Model Deployer gets a new secret configuration attribute that points to a ZenML secret. The Model Deployer then converts that ZenML secret into a Kubernetes secret. The Seldon Core integration also provides three secret schemas, one for each Artifact Store flavor: AWS, GCP, and Azure.
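The ZenML-secret-to-Kubernetes-secret conversion can be illustrated with a small sketch. The manifest shape (base64-encoded values under `data`) is standard Kubernetes; the function name and the flat key/value input format are assumptions, not the actual integration code.

```python
# Illustrative sketch: build a Kubernetes Secret manifest from a flat
# key/value secret, base64-encoding the values as the Secret API requires.
import base64
from typing import Any, Dict


def zenml_secret_to_k8s_manifest(
    secret_name: str, contents: Dict[str, str]
) -> Dict[str, Any]:
    """Build a Kubernetes Secret manifest from a flat key/value secret."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in contents.items()
        },
    }


manifest = zenml_secret_to_k8s_manifest(
    "seldon-init-container-secret",  # hypothetical secret name
    {"aws_access_key_id": "AKIA-EXAMPLE", "aws_secret_access_key": "not-a-real-key"},
)
print(manifest["metadata"]["name"])
```

A manifest like this can then be applied to the cluster so that the Seldon Core model servers' init containers can authenticate to the Artifact Store.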

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table.
  • I have added tests to cover my changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@github-actions github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels Apr 21, 2022
@stefannica stefannica force-pushed the feature/deployers/eng-779-allow-secretmgr-in-containers branch from 5152c0a to 137c644 Compare April 21, 2022 13:26
@stefannica stefannica force-pushed the feature/deployers/eng-779-allow-secretmgr-in-containers branch from 81dc84b to 894fceb Compare April 21, 2022 18:33

@safoinme safoinme left a comment

LGTM!!

from zenml.secret import register_secret_schema_class
from zenml.secret.base_secret import BaseSecretSchema

SELDON_S3_SECRET_SCHEMA_TYPE = "seldon_s3"
Contributor

Just a question about names: shouldn't it be Rclone instead of seldon, i.e. RCLONE_S3_SECRET_SCHEMA_TYPE?

Contributor Author

I would like to keep rclone as an implementation detail and not expose ZenML users to it if possible. To be honest, I am not too happy with naming this seldon either, given that most of these schemas are very similar to other ways of configuring AWS/GCP/Azure credentials for other libraries and tools. ZenML should have a unified way of handling authentication for "the big 3". I'm hoping that's something we can look into later.

from zenml.enums import SecretSchemaType
from zenml.secret import register_secret_schema_class
from zenml.secret.base_secret import BaseSecretSchema

Contributor

It's not possible to use this schema unless the AWS integration is installed (boto3). Shouldn't schemas be something more general that isn't tied to this integration?

Contributor Author

Same as my other answer: yes, it should be more general. ZenML should provide a unified authentication/credentials management API for the 3 hyperscalers at least (AWS, Google and Azure), and the integrations should implement adapters that convert from the ZenML API to their own specific methods of handling authentication. However, that will be a bigger effort not strictly required for what I'm solving with this PR. Here I'm fixing what I think was a minor design issue that we had: the core ZenML code (src/zenml/secret/secret_schema_class_registry.py in this case) must not depend on integration code (see the from zenml.integrations.aws.secret_schemas import AWSSecretSchema line I removed). It must always be the other way around.

If we really need the AWS schema to be part of the ZenML core, then we need to move it outside of the AWS integration space.


@schustmi schustmi left a comment

@stefannica I haven't finished my review on this yet (will continue right after I post this comment), however I thought I'd quickly ask for one clarification in the meantime: Spinning up the local k3d cluster and mounting the volume had to be done in two steps when we first implemented it:

  1. Mount the volume into the K3D cluster (inside local_deployment_utils.create_k3d_cluster)
  2. Mount the volume inside the kubeflow pod (which is what your PR handles)

The volume mounted to the K3D cluster right now is the entire global config directory (which I guess is where all the artifact store/metadata store/deployer/secret manager files are located?). As soon as someone creates a custom StackComponent, it seems they could point local_path to anything, but it won't work.
Is this something to maybe point out in the local_path docstring (or even validate that it's inside the global config directory and raise an error otherwise?) or do you have any ideas/plans to support arbitrary directories?


if isinstance(metadata_store, SQLiteMetadataStore):
    for stack_comp in stack.components.values():
        local_path = stack_comp.local_path
Contributor

I saw this loop in at least two places, do you think it makes sense to sort of aggregate them in a

@property
def local_paths(self) -> Set[str]

or similar on the Stack?

Contributor Author

This looked like a good idea initially, but as I started to implement it I realized that I also need the components associated with the paths, for logging related tasks.

push_docker_image,
)

# if the orchestrator is not running in a local k3d cluster,
Contributor

As the kubernetes context is completely arbitrary, it might be

  • pointing to a local k3d cluster
  • or even more likely: unspecified but still pointing to a non-local cluster (even I sometimes run pipelines and just set the kubectl context right before)

I don't really have any better solution to differentiate between local/remote and I think the current implementation is fine, just wanted to mention it

Contributor Author

As discussed offline, this is going to change in my next PR that implements a new convention for what constitutes "local" Kubeflow orchestrators.


@schustmi schustmi left a comment

Awesome changes, can't wait to integrate that into the experiment tracking components!

@stefannica
Contributor Author

> @stefannica I haven't finished my review on this yet (will continue right after I post this comment), however I thought I'd quickly ask for one clarification in the meantime: Spinning up the local k3d cluster and mounting the volume had to be done in two steps when we first implemented it:
>
>   1. Mount the volume into the K3D cluster (inside local_deployment_utils.create_k3d_cluster)
>   2. Mount the volume inside the kubeflow pod (which is what your PR handles)
>
> The volume mounted to the K3D cluster right now is the entire global config directory (which I guess is where all the artifact store/metadata store/deployer/secret manager files are located?). As soon as someone creates a custom StackComponent it seems that they could point local_path to anything but it won't work. Is this something to maybe point out in the local_path docstring (or even validate that it's inside the global config directory and raise an error otherwise?) or do you have any ideas/plans to support arbitrary directories?

We agreed to enforce the convention that local paths are always relative to the ZenML global configuration directory. This is documented as a docstring now in the local_path method and also validated when the Kubeflow container mounts are computed.
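A minimal sketch of that validation convention, assuming a known global config directory; the `GLOBAL_CONFIG_DIR` constant and function name are illustrative, not ZenML's actual code:

```python
# Reject any component-local path that falls outside the ZenML global
# configuration directory before computing container mounts.
import os

GLOBAL_CONFIG_DIR = "/home/user/.config/zenml"  # assumed location


def validate_local_path(local_path: str) -> str:
    """Raise if the path is not inside the global config directory."""
    config_dir = os.path.realpath(GLOBAL_CONFIG_DIR)
    resolved = os.path.realpath(local_path)
    # commonpath collapses to a shorter prefix if `resolved` escapes
    # the config directory (including via ".." segments).
    if os.path.commonpath([config_dir, resolved]) != config_dir:
        raise ValueError(
            f"Local path '{local_path}' is outside the ZenML config "
            f"directory '{config_dir}' and cannot be mounted into the "
            "local Kubeflow containers."
        )
    return resolved
```

Resolving with `realpath` before comparing prefixes keeps `..` tricks and symlinks from escaping the mounted directory.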

@stefannica stefannica merged commit c74fa07 into develop Apr 25, 2022
@stefannica stefannica deleted the feature/deployers/eng-779-allow-secretmgr-in-containers branch April 25, 2022 18:35