# ZenML: Open-source MLOps Framework for reproducible ML pipelines

![Test](_assets/Logo/zenml.svg)

<div class="alert alert-block alert-danger">
    <b>Note:</b> This lesson is still in progress and some commands may not work as described. An update is expected by 20th April 2022.
</div>

In [None]:
from absl import logging as absl_logging
import warnings
warnings.filterwarnings('ignore')
%load_ext autoreload
%autoreload 2
absl_logging.set_verbosity(-10000)

Let's begin by initializing ZenML in our directory. For simplicity, we start with a local stack and then transition to other stacks. The following block performs the initialization.

# Initialize ZenML

In [None]:
!rm -rf .zen
!zenml init

# Install integrations

ZenML manages integrations natively to avoid dependency conflicts, so use the following command to install the integrations required for this lesson.

![All](_assets/integrations_all.png "All")

In [None]:
!zenml integration install kubeflow seldon s3 -f

# The Concept of MLOps Stacks

A ZenML stack is the combination of a Metadata Store, an Artifact Store, and an Orchestrator that is used for all pipeline runs. When you get started with ZenML, you start off with a default local stack.

In [None]:
!zenml stack list
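
To make the idea of a stack concrete, here is a minimal sketch that models a stack as a named bundle of the three components described above. `StackSketch` and the component names are hypothetical and chosen for illustration only; this is not ZenML's own class or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackSketch:
    """Illustrative stand-in for a stack; NOT ZenML's own class."""

    name: str
    metadata_store: str
    artifact_store: str
    orchestrator: str


def describe(stack: StackSketch) -> str:
    """Return a one-line summary of the components bundled in a stack."""
    return (
        f"{stack.name}: metadata={stack.metadata_store}, "
        f"artifacts={stack.artifact_store}, orchestrator={stack.orchestrator}"
    )


# Hypothetical component names, used only to illustrate the bundling.
local_stack = StackSketch(
    name="local_stack",
    metadata_store="local_metadata_store",
    artifact_store="local_artifact_store",
    orchestrator="local_orchestrator",
)

print(describe(local_stack))
```

The point of the bundle is that a pipeline refers to the stack as a whole, so swapping one component means registering a different stack rather than editing pipeline code.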

## The Local Stack

You can imagine the local stack to look like this. Within the diagram we show how a generic pipeline interacts with the local stack.

![LocalStack](_assets/localstack.png "LocalStack")

## The Kubeflow Pipelines stack

We will now use the Kubeflow integration to extend the concept of stacks.

Now we want to transition to a Kubeflow stack that will look a little bit like this. Note that for Kubeflow Pipelines we also need a container registry where the Docker images for each step are registered.

![KubeflowStack](_assets/aws_stack_redesigned.png "KubeflowStack")

But we have good news! You barely have to do anything to transition.

# Transitioning to Production with Kubeflow on AWS

There are two steps to follow in order to continue.

- Set up the necessary cloud resources on the provider of your choice
- Configure ZenML with a new stack to be able to communicate with these resources

## Set up using the cloud guide

In order to continue, it is best to follow the updated cloud guide for ZenML found [here](https://docs.zenml.io/features/guide-aws-gcp-azure). Please return after finishing the `pre-requisites` section.

We recommend using AWS as your cloud provider to follow along with the lesson. However, if you select GCP or Azure, it should not be hard to modify the commands below accordingly.

You will also need Seldon Core to be installed in the same Kubernetes cluster as Kubeflow. Some brief instructions on how to install Seldon Core in AWS EKS can be found [here](https://github.com/zenml-io/zenml/tree/main/examples/seldon_deployment#installing-seldon-core-eg-in-an-eks-cluster). It is also advisable to read the official [Seldon Core documentation](https://docs.seldon.io/projects/seldon-core/en/latest/workflow/install.html).

## Create your AWS Kubeflow Stack

Now we can configure a new stack that points to your newly created resources on the cloud.

If you remember from the main README, Kubernetes and Docker are prerequisites for this part of the guide. Please make sure you have them installed. You also need to install the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) to move forward.

In [None]:
# Replace the following with your own configuration. The values below are examples.

AWS_EKS_CLUSTER="zenhacks-cluster"
AWS_REGION="us-east-1"
ECR_REGISTRY_NAME="715803424590.dkr.ecr.us-east-1.amazonaws.com"
S3_BUCKET_NAME="s3://zenbytes-bucket"
KUBEFLOW_NAMESPACE="kubeflow"
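
Before running any AWS commands, it can help to sanity-check the values above. The following is a minimal sketch, not part of ZenML or the AWS CLI: `check_aws_config` is a hypothetical helper that checks that the registry host has the usual private ECR shape (`<account-id>.dkr.ecr.<region>.amazonaws.com`), that its region matches `AWS_REGION`, and that the bucket is given as an `s3://` path. The pattern assumes standard commercial AWS regions.

```python
import re

# Assumed shape of a private ECR registry host in standard AWS regions:
# <12-digit account id>.dkr.ecr.<region>.amazonaws.com
ECR_HOST_PATTERN = re.compile(r"^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com$")


def check_aws_config(region: str, ecr_registry: str, s3_bucket: str) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    match = ECR_HOST_PATTERN.match(ecr_registry)
    if match is None:
        problems.append(f"ECR registry host looks malformed: {ecr_registry!r}")
    elif match.group(2) != region:
        problems.append(
            f"ECR registry is in {match.group(2)!r} but AWS_REGION is {region!r}"
        )

    if not s3_bucket.startswith("s3://") or len(s3_bucket) <= len("s3://"):
        problems.append(f"S3 bucket should be an s3:// path, got {s3_bucket!r}")

    return problems


# The example values from the cell above.
print(
    check_aws_config(
        region="us-east-1",
        ecr_registry="715803424590.dkr.ecr.us-east-1.amazonaws.com",
        s3_bucket="s3://zenbytes-bucket",
    )
)
```

The region check matters because the Docker login step requests an ECR password for `AWS_REGION`, so a registry in a different region would not accept it.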

First, set up local access to the AWS EKS cluster and the AWS ECR registry.

In [None]:
# Point Docker to the ECR registry
!aws ecr get-login-password --region {AWS_REGION} | docker login --username AWS --password-stdin {ECR_REGISTRY_NAME}

# Create a Kubernetes configuration context that points to the EKS cluster
!aws eks --region {AWS_REGION} update-kubeconfig --name {AWS_EKS_CLUSTER}

Next, we need to set up a Kubernetes Secret to give Seldon Core access to the AWS S3 artifact store in the configured namespace.

NOTE: this is based on the assumption that Seldon Core is running in an EKS cluster that already has IAM access enabled and doesn't need any explicit AWS credentials. If that is not the case, you will need to set up credentials differently. Please look up the variables relevant to your use-case in the official [Seldon Core documentation](https://docs.seldon.io/projects/seldon-core/en/latest/servers/overview.html#handling-credentials).

In [None]:
!kubectl -n {KUBEFLOW_NAMESPACE} create secret generic seldon-init-container-secret --from-literal=RCLONE_CONFIG_S3_PROVIDER='aws' --from-literal=RCLONE_CONFIG_S3_TYPE='s3' --from-literal=RCLONE_CONFIG_S3_ENV_AUTH=true

Extract the base URL that will be used by Seldon Core to expose all model servers:

In [None]:
INGRESS_HOST = ! echo $(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
INGRESS_HOST[0]
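
The captured value is used later as the Seldon Core base URL, so an empty result would silently produce a broken URL such as `http://`. The sketch below guards against that. `seldon_base_url` is a hypothetical helper, and it assumes the ingress gateway's load balancer reports a hostname (as in the `jsonpath` query above) rather than only an IP address.

```python
def seldon_base_url(captured_lines) -> str:
    """Build the http:// base URL from the lines captured by the `!` shell call.

    `captured_lines` is any list-like of strings, such as the value that
    IPython assigns to INGRESS_HOST in the cell above.
    """
    host = captured_lines[0].strip() if captured_lines else ""
    if not host:
        raise ValueError(
            "No ingress hostname found; check that the istio-ingressgateway "
            "service has a load balancer hostname assigned."
        )
    return f"http://{host}"


# Illustrative hostname only, not a real load balancer.
print(seldon_base_url(["example-lb.us-east-1.elb.amazonaws.com"]))
```

In the notebook this would be called as `seldon_base_url(INGRESS_HOST)` before registering the model deployer.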

Finally, register the ZenML Stack:

In [None]:
# Register container registry
!zenml container-registry register ecr_registry --type=default --uri={ECR_REGISTRY_NAME}

# Register orchestrator (Kubeflow on AWS)
!zenml orchestrator register eks_orchestrator --type=kubeflow 

# Register metadata store and artifact store
!zenml metadata-store register kubeflow_metadata_store --type=kubeflow
!zenml artifact-store register s3_store --type=s3 --path={S3_BUCKET_NAME}

# Register the Seldon Core model deployer (Seldon on AWS)
!zenml model-deployer register eks_seldon --type=seldon --kubernetes_namespace={KUBEFLOW_NAMESPACE} --base_url=http://{INGRESS_HOST[0]}

# Register a secret manager
!zenml secrets-manager register local_secret_manager --type=local

# Register the aws_kubeflow_stack
!zenml stack register aws_kubeflow_stack -m kubeflow_metadata_store -a s3_store -o eks_orchestrator -c ecr_registry -d eks_seldon -x local_secret_manager
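
The short flags in the final command are easy to mix up. As a reading aid, the sketch below spells out which component each flag carries, inferred from the component names in the command above, and rebuilds the same command string from a dictionary. `STACK_FLAGS` and `stack_register_command` are hypothetical helpers, not part of ZenML.

```python
# Flag-to-component mapping, inferred from the names used in the
# `zenml stack register` call above (an assumption, not taken from ZenML docs).
STACK_FLAGS = {
    "-m": "metadata store",
    "-a": "artifact store",
    "-o": "orchestrator",
    "-c": "container registry",
    "-d": "model deployer",
    "-x": "secrets manager",
}


def stack_register_command(stack_name: str, components: dict) -> str:
    """Assemble the CLI string; `components` maps a flag to a registered name."""
    unknown = set(components) - set(STACK_FLAGS)
    if unknown:
        raise ValueError(f"Unknown flags: {sorted(unknown)}")

    parts = ["zenml", "stack", "register", stack_name]
    for flag in STACK_FLAGS:  # iterate in a fixed order for a stable command
        if flag in components:
            parts += [flag, components[flag]]
    return " ".join(parts)


print(
    stack_register_command(
        "aws_kubeflow_stack",
        {
            "-m": "kubeflow_metadata_store",
            "-a": "s3_store",
            "-o": "eks_orchestrator",
            "-c": "ecr_registry",
            "-d": "eks_seldon",
            "-x": "local_secret_manager",
        },
    )
)
```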

## Transition to Production (Run on the Cloud)

Once the stack is configured, all that is left to do is to set it active and to run a pipeline. Note that the code itself DOES NOT need to change, only the active stack.

ZenML will detect that the stack has changed and, instead of running your pipeline locally, will build a Docker image with your requirements, push it to the container registry, and deploy the pipeline with that image on Kubeflow Pipelines. Doing this by hand is usually painful; ZenML simplifies the process and keeps it fully customizable.
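
The behavior described above can be pictured as a dispatch on the active stack's orchestrator. The following is a conceptual sketch only: `planned_actions` is a hypothetical function, the type names `"local"` and `"kubeflow"` are assumptions, and none of this reflects ZenML's internal implementation.

```python
def planned_actions(orchestrator_type: str) -> list:
    """Return the high-level actions implied by the active orchestrator type."""
    if orchestrator_type == "local":
        return ["run the pipeline steps on this machine"]
    if orchestrator_type == "kubeflow":
        return [
            "build a Docker image with the pipeline code and requirements",
            "push the image to the container registry",
            "deploy the pipeline on Kubeflow Pipelines using that image",
        ]
    raise ValueError(f"Unsupported orchestrator type in this sketch: {orchestrator_type!r}")


# The pipeline code stays the same; only the argument (the active stack) changes.
for action in planned_actions("kubeflow"):
    print("-", action)
```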

For now, try it out! It might take a few minutes to build and push the image, but after that you will see your pipeline in the cloud!

<div class="alert alert-block alert-info">
    <b>Note:</b> Running pipelines defined within a Jupyter notebook cell is
    currently not supported. To get around this, you can run the train pipeline within this repo.
</div>

In [None]:
!zenml stack set aws_kubeflow_stack
!zenml integration uninstall -f mlflow

# Let's train within kubeflow pipelines - this will deploy the pipeline
!python run.py --deploy --predict # --interval-second=300

In order to see the pipeline run, you should port-forward Kubeflow Pipelines to [http://localhost:8080/](http://localhost:8080/). You might want to run this in a separate shell:

```
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
```

In [None]:
# Do this only if the port forward from `zenml stack up` did not work. 
!kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
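
To check whether a port-forward is already active before starting another one, a plain TCP connection test is enough. This is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper, and the default port 8080 is the one used in the command above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if is_port_open():
    print("Something is listening on localhost:8080")
else:
    print("Nothing is listening on localhost:8080; start the port-forward first")
```

A successful connection only shows that some process is listening on the port, not that it is the Kubeflow Pipelines UI, so opening the URL in a browser remains the final check.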

And that's it: you have successfully transitioned from local to production by simply switching your ZenML stack. This is just scratching the surface!

Next up, more about stacks, running pipelines on a schedule, and much more coming soon!