Conversation

@stefannica stefannica commented Apr 21, 2022

Describe changes

This PR updates the Secret Manager implementation to make it more flexible:

  • allows integrations and user code to more easily register their own secret schemas
  • mounts the local secret manager inside the local Kubeflow, so it can be accessed by steps if needed (e.g. this is required to deploy models in Seldon Core with a local Kubeflow orchestrator and secret manager, but with a remote Seldon Core installation and an AWS S3 artifact store)
  • other minor improvements to the secret manager CLI to take into account optional keys

As a result of unifying how the paths of local stack components are mounted inside local Kubeflow containers, the PR also updates the MLflow model deployer so that it can be used with a local Kubeflow orchestrator.

The final goal of this PR is to update the Seldon Core model deployer so that it configures Kubernetes secrets automatically from ZenML secrets. The model deployer component now accepts a new --secret configuration attribute that points to a ZenML secret that contains credentials needed to access the AWS S3, GCS or Azure Blob storage artifact store. Three secret schemas are also included to help with that.

Implementation Details

Formalizing stack components with locally persisted state

The BaseStackComponent class gets a new property that all stack components with local folders (the default Artifact Store, the default Metadata Store, and the local Secrets Manager) need to implement to provide the path of the local folder where their state is persisted:

    @property
    def local_path(self) -> Optional[str]:
        ...

This property is then used in the Kubeflow orchestrator to determine which local folders need to be mounted in the Kubeflow pipeline containers when the local Kubeflow orchestrator is used. The end result is that the local Secrets Manager folder is now automatically mounted in Kubeflow and can be accessed from pipeline steps. This is important for pipeline steps like the Seldon Core model deployment step that must access a ZenML secret with credentials needed to authenticate the Seldon Core model servers to the Artifact Store.
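The way the orchestrator consumes this property can be sketched as follows. This is a minimal illustration, not ZenML's actual orchestrator code: the `StackComponent` stand-in and the `collect_volume_mounts` helper are hypothetical names.

```python
# Hypothetical sketch: an orchestrator collecting the local paths of stack
# components and turning them into container volume mounts.
from typing import Dict, List, Optional


class StackComponent:
    """Minimal stand-in for a stack component with optional local state."""

    def __init__(self, name: str, local_path: Optional[str] = None) -> None:
        self.name = name
        self.local_path = local_path


def collect_volume_mounts(components: List[StackComponent]) -> Dict[str, str]:
    """Map each component's local folder to the same path in the container.

    Mounting at an identical path means component configuration (which
    stores absolute paths) keeps working inside the pipeline container.
    """
    mounts: Dict[str, str] = {}
    for component in components:
        if component.local_path:
            mounts[component.local_path] = component.local_path
    return mounts


components = [
    StackComponent("local_secrets_manager", "/home/user/.config/zenml/secrets"),
    StackComponent("kubeflow_orchestrator", None),  # no local state to mount
]
print(collect_volume_mounts(components))
```

Components without local state simply return `None` and are skipped when the mounts are computed.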

Allowing registration of secret schemas from integrations

Following the same approach used for stack component classes, BaseSecretSchema now declares its schema type as a string ClassVar attribute:

from abc import ABC
from typing import ClassVar
from pydantic import BaseModel

class BaseSecretSchema(BaseModel, ABC):
    name: str
    TYPE: ClassVar[str]

This allows secret schemas to be more easily registered from any part of the code, ZenML integration or otherwise:

from typing import ClassVar, Optional

from zenml.secret import register_secret_schema_class
from zenml.secret.base_secret import BaseSecretSchema

AWS_SECRET_SCHEMA_TYPE = "aws"

@register_secret_schema_class
class AWSSecretSchema(BaseSecretSchema):

    TYPE: ClassVar[str] = AWS_SECRET_SCHEMA_TYPE

    aws_access_key_id: str
    aws_secret_access_key: str
    aws_session_token: Optional[str]
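The registration mechanism can be sketched end-to-end with a stdlib-only stand-in. Note the assumptions: a dataclass replaces pydantic's BaseModel, and the registry dict and decorator body here are illustrative, not ZenML's actual `secret_schema_class_registry` code.

```python
from abc import ABC
from dataclasses import dataclass
from typing import ClassVar, Dict, Optional, Type


class BaseSecretSchema(ABC):
    """Stand-in base class; the real one is a pydantic BaseModel."""

    TYPE: ClassVar[str]


# Module-level registry mapping type strings to schema classes.
SECRET_SCHEMA_REGISTRY: Dict[str, Type[BaseSecretSchema]] = {}


def register_secret_schema_class(
    cls: Type[BaseSecretSchema],
) -> Type[BaseSecretSchema]:
    """Decorator that indexes a schema class by its TYPE class variable."""
    SECRET_SCHEMA_REGISTRY[cls.TYPE] = cls
    return cls


@register_secret_schema_class
@dataclass
class AWSSecretSchema(BaseSecretSchema):
    name: str
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_session_token: Optional[str] = None
    TYPE: ClassVar[str] = "aws"


# Any code that knows the type string can now resolve the schema class:
schema_cls = SECRET_SCHEMA_REGISTRY["aws"]
secret = schema_cls(
    name="my_aws_secret",
    aws_access_key_id="AKIA-EXAMPLE",  # illustrative placeholder values
    aws_secret_access_key="not-a-real-key",
)
print(type(secret).__name__)
```

Because registration happens as a side effect of the decorator, an integration only needs to be imported for its schemas to become resolvable by type string; the core registry never has to import integration modules.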

Seldon Core authentication through ZenML secrets

The Seldon Core Model Deployer gets a new secret configuration attribute that points to a ZenML secret. The Model Deployer then converts that ZenML secret into a Kubernetes secret. The Seldon Core integration also provides three secret schemas, one for each Artifact Store flavor: AWS, GCP, and Azure.
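The ZenML-secret-to-Kubernetes-secret conversion can be illustrated with a small sketch. The manifest shape (base64-encoded values under `data`) is standard Kubernetes; the function name and the flat key/value input format are assumptions, not the actual integration code.

```python
# Illustrative sketch: build a Kubernetes Secret manifest from a flat
# key/value secret, base64-encoding the values as the Secret API requires.
import base64
from typing import Any, Dict


def zenml_secret_to_k8s_manifest(
    secret_name: str, contents: Dict[str, str]
) -> Dict[str, Any]:
    """Build a Kubernetes Secret manifest from a flat key/value secret."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in contents.items()
        },
    }


manifest = zenml_secret_to_k8s_manifest(
    "seldon-init-container-secret",  # hypothetical secret name
    {"aws_access_key_id": "AKIA-EXAMPLE", "aws_secret_access_key": "not-a-real-key"},
)
print(manifest["metadata"]["name"])
```

A manifest like this can then be applied to the cluster so that the Seldon Core model servers' init containers can authenticate to the Artifact Store.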

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table.
  • I have added tests to cover my changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@github-actions github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels Apr 21, 2022
@stefannica stefannica force-pushed the feature/deployers/eng-779-allow-secretmgr-in-containers branch from 5152c0a to 137c644 Compare April 21, 2022 13:26
@stefannica stefannica force-pushed the feature/deployers/eng-779-allow-secretmgr-in-containers branch from 81dc84b to 894fceb Compare April 21, 2022 18:33

@safoinme safoinme left a comment

LGTM!!

from zenml.secret import register_secret_schema_class
from zenml.secret.base_secret import BaseSecretSchema

SELDON_S3_SECRET_SCHEMA_TYPE = "seldon_s3"
Contributor

Just a question about names: shouldn't it be Rclone instead of seldon, i.e. RCLONE_S3_SECRET_SCHEMA_TYPE?

Contributor Author

I would like to keep rclone as an implementation detail and not expose ZenML users to it if possible. To be honest, I am not too happy with naming this seldon either, given that most of these schemas are very similar to other ways of configuring AWS/GCP/Azure credentials for other libraries and tools. ZenML should have a unified way of handling authentication for "the big 3". I'm hoping that's something we can look into later.

from zenml.enums import SecretSchemaType
from zenml.secret import register_secret_schema_class
from zenml.secret.base_secret import BaseSecretSchema

Contributor

It's not possible to use this schema unless the AWS integration is installed (boto3). Shouldn't schemas be something more general that isn't tied to this integration?

Contributor Author

Same as my other answer: yes, it should be more general. ZenML should provide a unified authentication/credentials management API for the 3 hyperscalers at least (AWS, Google and Azure), and the integrations should implement adapters that convert from the ZenML API to their own specific methods of handling authentication. However, that will be a bigger effort not strictly required for what I'm solving with this PR. Here I'm fixing what I think was a minor design issue that we had: the core ZenML code (src/zenml/secret/secret_schema_class_registry.py in this case) must not depend on integration code (see the from zenml.integrations.aws.secret_schemas import AWSSecretSchema line I removed). It must always be the other way around.

If we really need the AWS schema to be part of the ZenML core, then we need to move it outside of the AWS integration space.


@schustmi schustmi left a comment

@stefannica I haven't finished my review on this yet (will continue right after I post this comment), however I thought I'd quickly ask for one clarification in the meantime: Spinning up the local k3d cluster and mounting the volume had to be done in two steps when we first implemented it:

  1. Mount the volume into the K3D cluster (inside local_deployment_utils.create_k3d_cluster)
  2. Mount the volume inside the kubeflow pod (which is what your PR handles)

The volume mounted to the K3D cluster right now is the entire global config directory (which I guess is where all the artifact store/metadata store/deployer/secret manager files are located?). As soon as someone creates a custom StackComponent, it seems they could point local_path to anything, but it won't work.
Is this something to maybe point out in the local_path docstring (or even validate that it's inside the global config directory and raise an error otherwise?) or do you have any ideas/plans to support arbitrary directories?


if isinstance(metadata_store, SQLiteMetadataStore):
    for stack_comp in stack.components.values():
        local_path = stack_comp.local_path
Contributor

I saw this loop in at least two places, do you think it makes sense to sort of aggregate them in a

@property
def local_paths(self) -> Set[str]

or similar on the Stack?

Contributor Author

This looked like a good idea initially, but as I started to implement it I realized that I also need the components associated with the paths, for logging related tasks.

push_docker_image,
)

# if the orchestrator is not running in a local k3d cluster,
Contributor

As the kubernetes context is completely arbitrary, it might be

  • pointing to a local k3d cluster
  • or even more likely: unspecified but still pointing to a non-local cluster (even I sometimes run pipelines and just set the kubectl context right before)

I don't really have any better solution to differentiate between local/remote and I think the current implementation is fine, just wanted to mention it

Contributor Author

As discussed offline, this is going to change in my next PR that implements a new convention for what constitutes "local" Kubeflow orchestrators.


@schustmi schustmi left a comment

Awesome changes, can't wait to integrate that into the experiment tracking components!

@stefannica
Contributor Author

> @stefannica I haven't finished my review on this yet (will continue right after I post this comment), however I thought I'd quickly ask for one clarification in the meantime: Spinning up the local k3d cluster and mounting the volume had to be done in two steps when we first implemented it:
>
>   1. Mount the volume into the K3D cluster (inside local_deployment_utils.create_k3d_cluster)
>   2. Mount the volume inside the kubeflow pod (which is what your PR handles)
>
> The volume mounted to the K3D cluster right now is the entire global config directory (which I guess is where all the artifact store/metadata store/deployer/secret manager files are located?). As soon as someone creates a custom StackComponent it seems that they could point local_path to anything but it won't work. Is this something to maybe point out in the local_path docstring (or even validate that it's inside the global config directory and raise an error otherwise?) or do you have any ideas/plans to support arbitrary directories?

We agreed to enforce the convention that local paths are always relative to the ZenML global configuration directory. This is documented as a docstring now in the local_path method and also validated when the Kubeflow container mounts are computed.
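A minimal sketch of that validation convention, assuming a known global config directory; the `GLOBAL_CONFIG_DIR` constant and function name are illustrative, not ZenML's actual code:

```python
# Reject any component-local path that falls outside the ZenML global
# configuration directory before computing container mounts.
import os

GLOBAL_CONFIG_DIR = "/home/user/.config/zenml"  # assumed location


def validate_local_path(local_path: str) -> str:
    """Raise if the path is not inside the global config directory."""
    config_dir = os.path.realpath(GLOBAL_CONFIG_DIR)
    resolved = os.path.realpath(local_path)
    # commonpath collapses to a shorter prefix if `resolved` escapes
    # the config directory (including via ".." segments).
    if os.path.commonpath([config_dir, resolved]) != config_dir:
        raise ValueError(
            f"Local path '{local_path}' is outside the ZenML config "
            f"directory '{config_dir}' and cannot be mounted into the "
            "local Kubeflow containers."
        )
    return resolved
```

Resolving with `realpath` before comparing prefixes keeps `..` tricks and symlinks from escaping the mounted directory.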

@stefannica stefannica merged commit c74fa07 into develop Apr 25, 2022
@stefannica stefannica deleted the feature/deployers/eng-779-allow-secretmgr-in-containers branch April 25, 2022 18:35