# Lesson 2.2: Deployment and Inference with MLflow

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zenml-io/zenbytes/blob/main/2-2_Local_Deployment.ipynb)

***Key Concepts:*** *Model Deployer, MLflow*

In the last lesson, we learned how to use MLflow and Weights & Biases to track our experiments and compare models. In the end, we found which hyperparameters produced the best-performing model on our validation dataset. How do we make this model available to our customers/users and enable them to query it?

Setting up a dynamically scalable, highly available, and reliable model service is a complex problem, and many companies hire large MLOps teams to build and maintain such services. With ZenML, we can build sophisticated ML services in a matter of minutes. In this lesson, we will start with a very basic model deployment, where we will use the [MLflow Models](https://mlflow.org/docs/latest/models.html) component to deploy our model as a local application that we can interact with via a REST API.

The beauty of ZenML is that our code can stay the same, no matter what tools or infrastructure we use. In a later chapter, we will see how this enables us to deploy the code we write here as a dynamically scalable serverless microservice in the cloud. But more on that later.

First, let's set up ZenML and import some of the core steps we created in previous lessons:

In [None]:
%pip install zenml matplotlib
!zenml integration install sklearn mlflow -y
%pip install pyparsing==2.4.2  # required for Colab

import IPython

# automatically restart kernel
IPython.Application.instance().kernel.do_shutdown(restart=True)

In [None]:
# COLAB ONLY setup
try:
    import google.colab

    IN_COLAB = True

    # clone zenbytes repo to get source code of previous lessons
    !git clone https://github.com/zenml-io/zenbytes.git  # noqa
    !mv zenbytes/steps .
    !mv zenbytes/pipelines .

except ModuleNotFoundError:
    IN_COLAB = False

In [None]:
from steps.evaluator import evaluator
from steps.importer import importer
from steps.mlflow_trainer import svc_trainer_mlflow

ZenML provides a standard step for deployment to MLflow, so we don't need to write any deployment code ourselves. To deploy our model after training it, all we need to do is add the `mlflow_model_deployer_step` to our pipeline. In addition to the trained model, this step expects a boolean argument that determines whether the model should be deployed. This is very useful in practice, as it allows you to define requirements for deploying your models, e.g., that the new model performs better than the currently deployed one or that no data drift is happening. For now, let us define a `deployment_trigger` that only deploys a model if the test accuracy is over 90%:

In [None]:
from zenml.integrations.mlflow.steps import mlflow_model_deployer_step
from zenml.pipelines import pipeline
from zenml.steps import step


@step
def deployment_trigger(test_acc: float) -> bool:
    """Only deploy if the test accuracy > 90%."""
    return test_acc > 0.9


@pipeline(enable_cache=False)
def train_evaluate_deploy_pipeline(
    importer,
    trainer,
    evaluator,
    deployment_trigger,
    model_deployer,
):
    """Train and deploy a model with MLflow."""
    X_train, X_test, y_train, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    test_acc = evaluator(X_test=X_test, y_test=y_test, model=model)
    deployment_decision = deployment_trigger(test_acc)  # new
    model_deployer(deployment_decision, model)  # new


mlflow_pipeline = train_evaluate_deploy_pipeline(
    importer=importer(),
    trainer=svc_trainer_mlflow(),
    evaluator=evaluator(),
    deployment_trigger=deployment_trigger(),  # new
    model_deployer=mlflow_model_deployer_step(),  # new
)
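The trigger above only checks a fixed threshold. As a minimal sketch of the other requirement mentioned earlier (performing better than the currently deployed model), the decision logic could look like the plain Python function below. The name `should_deploy` and its parameters `current_acc` and `min_acc` are hypothetical and not part of ZenML; how you would look up the accuracy of the currently deployed model is not covered here.

```python
from typing import Optional


def should_deploy(
    new_acc: float,
    current_acc: Optional[float] = None,  # hypothetical: accuracy of the deployed model
    min_acc: float = 0.9,  # same 90% threshold as in deployment_trigger
) -> bool:
    """Deploy only if the new model clears the bar and beats the current one."""
    if new_acc <= min_acc:
        return False
    # With no model deployed yet, the minimum accuracy is the only requirement.
    if current_acc is None:
        return True
    return new_acc > current_acc


print(should_deploy(0.95))        # True: above threshold, nothing deployed yet
print(should_deploy(0.95, 0.97))  # False: worse than the deployed model
print(should_deploy(0.85, 0.80))  # False: below the 90% threshold
```

Logic like this could be placed inside the body of a step such as `deployment_trigger`, since the deployer step only needs the resulting boolean.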

Since we are using a new MLOps stack component, we need to register it with ZenML before we can run our pipeline.
As with the experiment tracker in the last notebook, we first register a new model deployer and then add it to our ZenML stack.

In [None]:
# Change back to our default MLOps stack (in case W&B stack is still active)
!zenml stack set default

# Define MLflow experiment tracker from last lesson
!zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Register the MLflow model deployer
!zenml model-deployer register mlflow --flavor=mlflow

# Add the MLflow components into our default stack
!zenml stack update default -d mlflow -e mlflow_tracker

Executing `mlflow_pipeline.run()` will now automatically deploy our model using MLflow. Let's try it out:

In [None]:
mlflow_pipeline.run()

Let's run the following command to get a list of all models currently deployed with our ZenML stack:

In [None]:
!zenml served-models list

If you see a checkmark under status, the model was correctly deployed. Congrats!

To interact with our deployed model in Python, we can use the `find_model_server()` method of ZenML's model deployer stack component:

In [None]:
from zenml.repository import Repository

repo = Repository()
model_deployer = repo.active_stack.model_deployer
services = model_deployer.find_model_server(
    pipeline_name="train_evaluate_deploy_pipeline",
    pipeline_step_name="mlflow_model_deployer_step",
    running=True,
)
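# Fail with a clear message if nothing was deployed (e.g., the trigger returned False)
if not services:
    raise RuntimeError("No running model server found for this pipeline.")
service = services[0]
service.check_status()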

Let's play with our model service a bit and send it a request. 

First, let's query the artifact store to get a sample from the test set of our last run.

In [None]:
p = repo.get_pipeline("train_evaluate_deploy_pipeline")
last_run = p.runs[-1]
X_test = last_run.steps[0].outputs["X_test"].read()
y_test = last_run.steps[0].outputs["y_test"].read()

Let's use matplotlib to plot the sample and see what our model would predict:

In [None]:
import matplotlib.pyplot as plt

plt.axis("off")
plt.imshow(X_test[0].reshape(8, 8), cmap=plt.cm.gray_r, interpolation="nearest")
pred0 = service.predict(X_test[0:1])
print(f"Model predicted {pred0}, true label was {y_test[0]}")
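Since the model is deployed as a local application with a REST API, the same prediction can in principle be requested over plain HTTP. The following is a rough sketch under assumptions, not the exact call that `service.predict` makes: the URL is a hypothetical placeholder (the actual host and port depend on your local deployment), and the `"instances"` JSON key is assumed to be accepted by the MLflow scoring server's `/invocations` endpoint in your MLflow version.

```python
import json
import urllib.request

# Hypothetical address; the real host and port depend on your local deployment.
PREDICTION_URL = "http://localhost:8000/invocations"


def build_prediction_request(samples, url=PREDICTION_URL):
    """Build (but do not send) a POST request carrying `samples` as JSON."""
    # Assumption: the server accepts a JSON body with an "instances" key.
    body = json.dumps({"instances": samples}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One flattened 8x8 image (64 pixel values), all zeros as a placeholder.
req = build_prediction_request([[0.0] * 64])
print(req.get_method(), req.full_url)

# To actually send it (requires the model server to be running):
# with urllib.request.urlopen(req) as response:
#     print(json.loads(response.read()))
```

With the data loaded above, `X_test[0:1].tolist()` would take the place of the all-zeros sample.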

And that's it: we have deployed our first model and learned how to interact with it. In practice, you would, of course, not query the model service manually but automatically send samples to it as new data comes in. That is what we will do in the [next lesson](2-3_Inference_Pipelines.ipynb), where we will build a basic inference pipeline. See you there!