# Lesson 2-3: Inference Pipelines

***Key Concepts:*** *Inference Pipelines*

In the last lesson, we learned how to add model deployment as a step in our ML pipeline, which allows us to automatically deploy models into production after training them. We also saw how to manually interact with the served model.

In practice, querying the model is just one of many steps you have to perform at inference time. Whenever you receive a request, you might need to preprocess the incoming data, and you might also want to run postprocessing code after the model, such as converting outputs to a different format or sending alerts.
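
To make this concrete, here is a minimal plain-Python sketch of a request handler that chains preprocessing, a model call, and postprocessing. Everything in it is an assumption for illustration: `preprocess`, `stub_model`, `postprocess`, and `handle_request` are hypothetical names, the 0-255 pixel scaling is made up, and `stub_model` merely stands in for the served model.

```python
from typing import Callable, Dict, List, Sequence


def preprocess(raw_pixels: Sequence[float], max_value: float = 255.0) -> List[float]:
    """Scale raw pixel intensities into the [0, 1] range (assumed preprocessing)."""
    return [p / max_value for p in raw_pixels]


def stub_model(features: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the served model: returns two made-up class scores."""
    mean = sum(features) / len(features)
    return [1.0 - mean, mean]


def postprocess(scores: Sequence[float]) -> Dict[str, float]:
    """Convert raw scores into a response-friendly dictionary."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"label": best, "score": scores[best]}


def handle_request(
    raw_pixels: Sequence[float],
    model: Callable[[Sequence[float]], List[float]] = stub_model,
) -> Dict[str, float]:
    """Run the three inference-time stages in order."""
    features = preprocess(raw_pixels)
    scores = model(features)
    return postprocess(scores)


result = handle_request([0, 255, 255, 255])
print(result)
```

Each stage is an ordinary function with a clear input and output, which is the same shape that pipeline steps take.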

That is why it makes sense to use ML pipelines not only for model training, but for inference as well.

![Training and Inference Pipelines GIF](_assets/2-3/training_inference_pipelines.gif)

In this notebook we will build a very basic inference pipeline to interact with our served model. 
The pipeline will consist of the following three steps:
1. Load a data sample
2. Load the model (prediction service)
3. Run inference with the model on the data sample

Let's define such a pipeline in code:

In [None]:
from zenml.pipelines import pipeline


@pipeline
def inference_pipeline(
    inference_data_loader,
    prediction_service_loader,
    predictor,
):
    """Basic inference pipeline."""
    inference_data = inference_data_loader()
    model_deployment_service = prediction_service_loader()
    predictor(model_deployment_service, inference_data)

In practice, the inference data loader might receive a single sample from an API request, or it might load a batch of data from a data lake or similar. For simplicity, we will mock this component for now and just load an 8x8 random noise image.

In [None]:
import numpy as np
from zenml.steps import step


@step
def inference_data_loader() -> np.ndarray:
    """Load some inference data."""
    return np.random.rand(1, 64)  # flattened 8x8 random noise image
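
For comparison, a loader that receives a single sample from an API request could look roughly like the plain-Python sketch below. The `{"pixels": [...]}` payload layout and the `load_sample_from_request` helper are hypothetical; a real service would define its own request schema.

```python
import json
from typing import List

EXPECTED_FEATURES = 64  # flattened 8x8 image, as in the mock loader above


def load_sample_from_request(body: str) -> List[List[float]]:
    """Parse a JSON request body into a batch holding one flattened image.

    The {"pixels": [...]} layout is an assumed, illustrative payload format.
    """
    payload = json.loads(body)
    pixels = [float(p) for p in payload["pixels"]]
    if len(pixels) != EXPECTED_FEATURES:
        raise ValueError(
            f"expected {EXPECTED_FEATURES} pixel values, got {len(pixels)}"
        )
    return [pixels]  # one row of 64 values, the same shape as the mock data


request_body = json.dumps({"pixels": [0.5] * 64})
batch = load_sample_from_request(request_body)
print(len(batch), len(batch[0]))
```

Validating the input length at this point means a malformed request fails early, before it reaches the model.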

Next, let's define the `prediction_service_loader` step. We can use the exact same code here that we used for manually querying the model service in the last lesson, just wrapped in a ZenML step:

In [None]:
from zenml.steps import step, Output
from zenml.services import BaseService
from zenml.repository import Repository


@step(enable_cache=False)
def prediction_service_loader() -> BaseService:
    """Load the model service of our train_evaluate_deploy_pipeline."""
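    # Look up the model deployer of the currently active stack.
    repo = Repository(skip_repository_check=True)
    model_deployer = repo.active_stack.model_deployer
    # Find the running model server created by the deployment step.
    services = model_deployer.find_model_server(
        pipeline_name="train_evaluate_deploy_pipeline",
        pipeline_step_name="mlflow_model_deployer_step",
        running=True,
    )
    if not services:
        raise RuntimeError(
            "No running model server found for "
            "'train_evaluate_deploy_pipeline'. Run the deployment pipeline first."
        )
    return services[0]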

Finally, let's write the `predictor` step that runs inference with our served model on the inference data sample. This step simply starts the service, calls its `predict()` endpoint to get logits, and then performs an `argmax` operation to retrieve the class with the highest predicted probability.

In [None]:
@step
def predictor(
    service: BaseService,
    data: np.ndarray,
) -> Output(predictions=list):
    """Run an inference request against a prediction service."""
    service.start(timeout=10)  # should be a no-op if already started
    prediction = service.predict(data)
    prediction = prediction.argmax(axis=-1)
    print(f"Prediction is: {[prediction.tolist()]}")
    return [prediction.tolist()]
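
To see why taking the `argmax` of the logits yields the most probable class, consider the small NumPy sketch below. The logit values are made up for illustration, and the softmax is computed by hand only to show that it preserves the ordering of the logits.

```python
import numpy as np

# Made-up logits for a batch of two samples and three classes.
logits = np.array([
    [0.1, 2.5, -1.0],
    [3.0, 0.2, 0.4],
])

# Index of the largest logit along the last (class) axis, one per sample.
predicted_classes = logits.argmax(axis=-1)

# Softmax turns logits into probabilities without changing their order,
# so the most probable class is the one with the largest logit.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probabilities = exp / exp.sum(axis=-1, keepdims=True)

print(predicted_classes.tolist())  # [1, 0]
print((probabilities.argmax(axis=-1) == predicted_classes).all())  # True
```

Because of this, the step can skip the softmax entirely when only the predicted class is needed.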

Let's put it all together to initialize and run our inference pipeline:

In [None]:
# Initialize an inference pipeline run
my_inference_pipeline = inference_pipeline(
    inference_data_loader=inference_data_loader(),
    prediction_service_loader=prediction_service_loader(),
    predictor=predictor(),
)

my_inference_pipeline.run()

That completes our second ZenBytes chapter on deployment and inference. Our training and deployment pipelines are still fairly basic, but we will add more features over the coming lessons.

In the next chapter on data management, we will add additional steps for data validation and drift detection to our pipelines, which are important steps to ensure that our model receives the kind of data we expect and does not exhibit training-serving skew.