description
Orchestrate using cloud resources.

Orchestrate on the cloud

Until now, we've only run pipelines locally. The next step is to move pipeline execution off our local machines and onto the cloud, where your MLOps pipelines can take advantage of the scalability and robustness that cloud platforms offer.

In order to do this, we need to get familiar with two more stack components:

  • The orchestrator manages the workflow and execution of your pipelines.
  • The container registry is a storage and content delivery system that holds your Docker container images.

Together with remote storage, these complete a basic cloud stack in which the pipeline runs entirely on the cloud.

{% hint style="info" %} Would you like to skip ahead and deploy a full ZenML cloud stack already?

Check out the in-browser stack deployment wizard, the stack registration wizard, or the ZenML Terraform modules for a shortcut on how to deploy & register a cloud stack. {% endhint %}

Starting with a basic cloud stack

The easiest cloud orchestrator to start with is the SkyPilot orchestrator running on a public cloud. Its advantage is simplicity: it provisions a VM on your cloud provider and executes the pipeline there.

Alongside SkyPilot, we need a mechanism to package your code and ship it to the cloud. ZenML uses Docker for this. Every time you run a pipeline with a remote orchestrator, ZenML builds an image for the entire pipeline (and optionally for each step of the pipeline, depending on your configuration). This image contains the code, requirements, and everything else needed to run the steps of the pipeline in any environment. ZenML then pushes this image to the container registry configured in your stack, and the orchestrator pulls the image when it's ready to execute a step.

To summarize, here is the broad sequence of events that happen when you run a pipeline with such a cloud stack:

Sequence of events that happen when running a pipeline on a full cloud stack.

  1. The user runs a pipeline on the client machine. This executes the run.py script, where ZenML reads the @pipeline function and determines which steps need to be executed.
  2. The client asks the server for the stack info, and the server returns the configuration of the cloud stack.
  3. Based on the stack info and pipeline specification, the client builds an image and pushes it to the container registry. The image contains the environment needed to execute the pipeline and the code of the steps.
  4. The client creates a run in the orchestrator. For example, the SkyPilot orchestrator creates a virtual machine in the cloud with commands to pull and run a Docker image from the specified container registry.
  5. The orchestrator pulls the appropriate image from the container registry as it executes the pipeline (each step has an image).
  6. As the pipeline runs, it stores artifacts physically in the artifact store. This artifact store therefore needs to be some form of cloud storage.
  7. As the pipeline runs, it reports status back to the ZenML server and optionally queries the server for metadata.
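To make the ordering of these steps concrete, here is a toy, in-memory walk-through of the sequence. It is only a sketch: `ToyCloudStack` and `run_on_toy_stack` are hypothetical names invented for illustration, nothing here calls ZenML, and the real client, registry, orchestrator, and server are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCloudStack:
    """In-memory stand-ins for the remote stack components (hypothetical)."""

    registry: dict = field(default_factory=dict)        # image name -> step names it contains
    artifact_store: dict = field(default_factory=dict)  # step name -> stored output
    server_log: list = field(default_factory=list)      # status messages sent to the server


def run_on_toy_stack(pipeline_name, steps, stack):
    """Walk through the numbered sequence above for a list of (name, fn) steps."""
    events = []

    # 1-2. The client reads the pipeline definition and fetches the stack info.
    events.append("client: read pipeline definition")
    events.append("client: fetched stack configuration from server")

    # 3. The client builds an image and pushes it to the container registry.
    image = f"{pipeline_name}:latest"
    stack.registry[image] = [name for name, _ in steps]
    events.append(f"client: pushed image {image}")

    # 4. The client creates a run in the orchestrator.
    events.append("orchestrator: run created")

    previous_output = None
    for name, fn in steps:
        # 5. The orchestrator pulls the image before executing the step.
        if name not in stack.registry[image]:
            raise RuntimeError(f"step {name!r} is missing from image {image}")
        events.append(f"orchestrator: pulled {image} for step {name}")

        # 6. The step output is written to the artifact store.
        previous_output = fn(previous_output)
        stack.artifact_store[name] = previous_output

        # 7. Status is reported back to the server.
        stack.server_log.append(f"{name}: completed")

    return events


stack = ToyCloudStack()
events = run_on_toy_stack(
    "training_pipeline",
    [("load", lambda _: [1, 2, 3]), ("train", lambda data: sum(data))],
    stack,
)
print("\n".join(events))
print(stack.artifact_store)
```

The point of the sketch is the ordering: the image has to be in the registry before the orchestrator can start a step, and artifacts and status updates are written while the run is in progress rather than at the end.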

Provisioning and registering an orchestrator alongside a container registry

While there are detailed docs on how to set up a SkyPilot orchestrator and a container registry on each public cloud, we have put the most relevant details here for convenience:

{% tabs %} {% tab title="AWS" %} In order to launch a pipeline on AWS with the SkyPilot orchestrator, the first thing that you need to do is to install the AWS and SkyPilot integrations:

zenml integration install aws skypilot_aws -y

Before we start registering any components, there is another step that we have to execute. As we explained in the previous section, components such as orchestrators and container registries often require you to set up the right permissions. In ZenML, this process is simplified with the use of Service Connectors. For this example, we need to use the IAM role authentication method of our AWS service connector:

AWS_PROFILE=<AWS_PROFILE> zenml service-connector register cloud_connector --type aws --auto-configure

Once the service connector is set up, we can register a SkyPilot orchestrator:

zenml orchestrator register cloud_orchestrator -f vm_aws
zenml orchestrator connect cloud_orchestrator --connector cloud_connector

The next step is to register an AWS container registry. Similar to the orchestrator, we will use our connector as we are setting up the container registry:

zenml container-registry register cloud_container_registry -f aws --uri=<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com
zenml container-registry connect cloud_container_registry --connector cloud_connector

With the components registered, everything is set up for the next steps.

For more information, you can always check the dedicated SkyPilot orchestrator guide. {% endtab %}

{% tab title="GCP" %} In order to launch a pipeline on GCP with the SkyPilot orchestrator, the first thing that you need to do is to install the GCP and SkyPilot integrations:

zenml integration install gcp skypilot_gcp -y

Before we start registering any components, there is another step that we have to execute. As we explained in the previous section, components such as orchestrators and container registries often require you to set up the right permissions. In ZenML, this process is simplified with the use of Service Connectors. For this example, we need to use the Service Account authentication feature of our GCP service connector:

zenml service-connector register cloud_connector --type gcp --auth-method service-account --service_account_json=@<PATH_TO_SERVICE_ACCOUNT_JSON> --project_id=<PROJECT_ID> --generate_temporary_tokens=False

Once the service connector is set up, we can register a SkyPilot orchestrator:

zenml orchestrator register cloud_orchestrator -f vm_gcp
zenml orchestrator connect cloud_orchestrator --connector cloud_connector

The next step is to register a GCP container registry. Similar to the orchestrator, we will use our connector as we are setting up the container registry:

zenml container-registry register cloud_container_registry -f gcp --uri=gcr.io/<PROJECT_ID>
zenml container-registry connect cloud_container_registry --connector cloud_connector

With the components registered, everything is set up for the next steps.

For more information, you can always check the dedicated SkyPilot orchestrator guide. {% endtab %}

{% tab title="Azure" %} As of v0.60.0, ZenML uses pydantic v2, which is incompatible with the azurecli package, so the skypilot[azure] flavor cannot be installed alongside ZenML. Azure users can use the Kubernetes Orchestrator as an alternative. You can easily deploy a Kubernetes cluster in your subscription using the Azure Kubernetes Service.

In order to launch a pipeline on Azure with the Kubernetes orchestrator, the first thing that you need to do is to install the Azure and Kubernetes integrations:

zenml integration install azure kubernetes -y

You should also ensure you have kubectl installed.

Before we start registering any components, there is another step that we have to execute. As we explained in the previous section, components such as orchestrators and container registries often require you to set up the right permissions. In ZenML, this process is simplified with the use of Service Connectors. For this example, we will need to use the Service Principal authentication feature of our Azure service connector:

zenml service-connector register cloud_connector --type azure --auth-method service-principal --tenant_id=<TENANT_ID> --client_id=<CLIENT_ID> --client_secret=<CLIENT_SECRET>

Once the service connector is set up, we can register a Kubernetes orchestrator:

# Ensure your service connector has access to the AKS cluster:
zenml service-connector list-resources --resource-type kubernetes-cluster -e
zenml orchestrator register cloud_orchestrator --flavor kubernetes
zenml orchestrator connect cloud_orchestrator --connector cloud_connector

The next step is to register an Azure container registry. Similar to the orchestrator, we will use our connector as we are setting up the container registry:

zenml container-registry register cloud_container_registry -f azure --uri=<REGISTRY_NAME>.azurecr.io
zenml container-registry connect cloud_container_registry --connector cloud_connector

With the components registered, everything is set up for the next steps.

For more information, you can always check the dedicated Kubernetes orchestrator guide. {% endtab %} {% endtabs %}
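The three tabs above each pass a differently shaped value to `--uri`. As a rough sketch, the helper below composes that value from its parts, following the placeholder formats shown in the tabs. The function name `registry_uri` and its keyword arguments are assumptions made for illustration and are not part of ZenML.

```python
import re


def registry_uri(provider: str, **values: str) -> str:
    """Compose the --uri value for a container registry (hypothetical helper)."""
    if provider == "aws":
        account_id, region = values["account_id"], values["region"]
        # AWS account IDs are 12-digit numbers.
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
        return f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    if provider == "gcp":
        return f"gcr.io/{values['project_id']}"
    if provider == "azure":
        return f"{values['registry_name']}.azurecr.io"
    raise ValueError(f"unknown provider: {provider!r}")


# Example values below are made up; substitute your own account, project, or registry.
print(registry_uri("aws", account_id="123456789012", region="eu-central-1"))
print(registry_uri("gcp", project_id="my-project"))
print(registry_uri("azure", registry_name="myregistry"))
```

Building the URI in one place makes it harder to mistype the value when registering the container registry, and the AWS branch catches a malformed account ID before the value reaches the CLI.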

{% hint style="info" %} Having trouble with setting up infrastructure? Try reading the stack deployment section of the docs to gain more insight. If that still doesn't work, join the ZenML community and ask! {% endhint %}

Running a pipeline on a cloud stack

Now that we have our orchestrator and container registry registered, we can register a new stack, just like we did in the previous chapter:

{% tabs %} {% tab title="CLI" %}

zenml stack register minimal_cloud_stack -o cloud_orchestrator -a cloud_artifact_store -c cloud_container_registry

{% endtab %} {% endtabs %}

Now, using the code from the previous chapter, we can run a training pipeline. First, set the minimal cloud stack active:

zenml stack set minimal_cloud_stack

and then, run the training pipeline:

python run.py --training-pipeline

You will notice that this time your pipeline behaves differently. After ZenML has built the Docker image with all your code, it pushes that image and starts a VM on the cloud. That VM is where your pipeline executes, and the logs are streamed back to you. So with a few commands, we were able to ship our entire code to the cloud!
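If you prefer to script these steps, the register, set, and run commands from this section can be collected in a small driver. This is a minimal sketch: `cloud_run_commands` and `execute` are hypothetical helpers, the component names are the ones used in this guide, and the script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def cloud_run_commands(stack_name, orchestrator, artifact_store, container_registry):
    """Return the three commands from this section as argument lists."""
    return [
        ["zenml", "stack", "register", stack_name,
         "-o", orchestrator, "-a", artifact_store, "-c", container_registry],
        ["zenml", "stack", "set", stack_name],
        ["python", "run.py", "--training-pipeline"],
    ]


def execute(commands, dry_run=True):
    """Print each command; run it only when dry_run is False, stopping on the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = cloud_run_commands(
    "minimal_cloud_stack",
    "cloud_orchestrator",
    "cloud_artifact_store",
    "cloud_container_registry",
)
execute(commands)  # dry run: prints the commands without executing them
```

Calling `execute(commands, dry_run=False)` would run the commands in order; `check=True` makes the script stop if, for example, the stack registration fails, so the pipeline is not launched against the wrong stack.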

Curious to see what other stacks you can create? The Component Guide has an exhaustive list of various artifact stores, container registries, and orchestrators that are integrated with ZenML. Try playing around with more stack components to see how easy it is to switch between MLOps stacks with ZenML.
