# Lesson 3.1: Drift Detection with Evidently

***Key Concepts:*** *Data Drift, Alerts, Evidently, WhyLabs, Discord*

[Evidently](https://evidentlyai.com/) is an open-source tool that lets you compute drift on your data with little effort. [Here](https://blog.zenml.io/zenml-loves-evidently/) is a short blog post of ours that explains the Evidently integration in more detail.

At its core, Evidently’s drift detection takes in a reference dataset and compares it with a separate comparison dataset. Both are passed in as pandas DataFrames, though CSV inputs are also possible. ZenML implements this functionality as several standardized steps, and also provides an easy way to use the visualization tools that Evidently ships as ‘Dashboards’.
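To make the reference/comparison idea concrete, here is a minimal, dependency-free sketch of a per-column drift check. It is loosely modelled on the general idea behind a data drift report (a statistical test per column, then a dataset-level verdict based on the share of drifted columns), but it is not Evidently's implementation: the choice of the two-sample Kolmogorov-Smirnov test, the 0.05 significance level, the 0.5 drift share, and the function names `ks_statistic`, `ks_critical_value`, and `detect_drift` are assumptions for illustration. Columns are plain lists keyed by column name, standing in for DataFrame columns.

```python
import bisect
import math
from typing import Dict, List, Sequence


def ks_statistic(reference: Sequence[float], comparison: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    ref, comp = sorted(reference), sorted(comparison)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those points is enough.
    for x in set(ref) | set(comp):
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_comp = bisect.bisect_right(comp, x) / len(comp)
        gap = max(gap, abs(f_ref - f_comp))
    return gap


def ks_critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Large-sample rejection threshold for the two-sample KS test at level alpha."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


def detect_drift(
    reference: Dict[str, List[float]],
    comparison: Dict[str, List[float]],
    alpha: float = 0.05,
    drift_share: float = 0.5,
) -> Dict[str, object]:
    """Flag each column with a KS test, then call the whole dataset drifted
    if at least `drift_share` of the columns drifted (assumed rule)."""
    drifted = {}
    for column, ref_values in reference.items():
        comp_values = comparison[column]
        stat = ks_statistic(ref_values, comp_values)
        drifted[column] = stat > ks_critical_value(len(ref_values), len(comp_values), alpha)
    share = sum(drifted.values()) / len(drifted)
    return {"drifted_columns": drifted, "dataset_drift": share >= drift_share}


# Usage: column "0" is shifted by +10, column "1" is unchanged.
reference = {"0": [0.1 * i for i in range(50)], "1": [float(i % 5) for i in range(50)]}
comparison = {"0": [x + 10 for x in reference["0"]], "1": list(reference["1"])}
report = detect_drift(reference, comparison)
print(report)  # {'drifted_columns': {'0': True, '1': False}, 'dataset_drift': True}
```

The threshold uses the large-sample approximation c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2), so treat it as a rough guide for small samples.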


If you’re working on a machine learning problem with an ongoing training loop that takes in new data, you’ll want to guard against drift. A model reflects the distribution of the data it was trained on, so if incoming data starts to follow a different distribution, its predictions can silently degrade. You have little control over incoming data, and things often change out in the real world, so you should have a plan for noticing when the data has shifted. Evidently offers a [growing set of features](https://github.com/evidentlyai/evidently) that help you monitor not only data drift but also other key aspects such as target drift.

If you haven't already, install the Evidently integration by running the following cell, then restart your notebook kernel:

In [None]:
!zenml integration install evidently -f

## Define a new pipeline with a drift detector

![Pipeline2](_assets/chapter_1/second_pipeline.png "Pipeline")

In [None]:
import numpy as np
import pandas as pd
from zenml.steps import step, Output
from zenml.pipelines import pipeline
from src.steps.importer import importer
from src.steps.evaluator import evaluator
from src.steps.mlflow_trainer import svc_trainer_mlflow

In [None]:
@pipeline(enable_cache=False)
def digits_pipeline_with_drift(
    importer,
    trainer,
    evaluator,
    get_reference_data,
    drift_detector,
):
    """Links all the steps together in a pipeline"""
    X_train, X_test, y_train, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    evaluator(X_test=X_test, y_test=y_test, model=model)
    reference, comparison = get_reference_data(X_train, X_test)
    drift_detector(reference, comparison)

## Import the standard Evidently step

In [None]:
from zenml.integrations.evidently.steps import (
    EvidentlyProfileConfig,
    EvidentlyProfileStep,
)

In [None]:
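@step
def get_reference_data(
    X_train: np.ndarray,
    X_test: np.ndarray,
) -> Output(reference=pd.DataFrame, comparison=pd.DataFrame):
    """Wraps the train and test features as DataFrames for drift detection.

    The training data serves as the reference distribution; the test data
    plays the role of new, incoming data that is compared against it.
    """
    # Optionally simulate drift by adding noise (`_add_awgn` is not defined here):
    # X_train = _add_awgn(X_train)
    columns = [str(i) for i in range(X_train.shape[1])]
    reference = pd.DataFrame(X_train, columns=columns)
    comparison = pd.DataFrame(X_test, columns=columns)
    return reference, comparison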
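The commented-out line in `get_reference_data` calls `_add_awgn`, which this lesson does not define. Judging by its name, it presumably adds white Gaussian noise to the features so that the drift detector has something to find. A minimal sketch of such a helper, assuming that meaning; the name `add_awgn` and the `noise_std` and `seed` parameters are hypothetical, and it works on plain lists of rows rather than the NumPy array used in the notebook:

```python
import random
from typing import List, Optional, Sequence


def add_awgn(
    rows: Sequence[Sequence[float]], noise_std: float = 1.0, seed: Optional[int] = None
) -> List[List[float]]:
    """Return a new copy of `rows` with zero-mean Gaussian noise added to every value.

    Hypothetical stand-in for the `_add_awgn` helper referenced in the lesson.
    """
    rng = random.Random(seed)  # seeded generator so the simulated drift is reproducible
    return [[value + rng.gauss(0.0, noise_std) for value in row] for row in rows]


features = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
noisy = add_awgn(features, noise_std=0.5, seed=42)
print(noisy)
```

Larger values of `noise_std` push the perturbed data further from the original distribution, which makes it easy to check that the drift report actually flips to "drift detected".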

In [None]:
evidently_profile_config = EvidentlyProfileConfig(
    column_mapping=None, profile_sections=["datadrift"]
)

second_pipeline = digits_pipeline_with_drift(
    importer=importer(),
    trainer=svc_trainer_mlflow(),
    evaluator=evaluator(),
    get_reference_data=get_reference_data(),
    # EvidentlyProfileStep compares the reference dataset against the comparison dataset
    drift_detector=EvidentlyProfileStep(config=evidently_profile_config),
)
second_pipeline.run()

In [None]:
from zenml.integrations.evidently.visualizers import EvidentlyVisualizer
from zenml.repository import Repository

# Fetch the most recent run of the pipeline we just executed
repo = Repository()
drift_pipeline = repo.get_pipeline("digits_pipeline_with_drift")
last_run = drift_pipeline.runs[-1]

# Render the Evidently dashboard produced by the drift detection step
drift_detection_step = last_run.get_step(name="drift_detector")
EvidentlyVisualizer().visualize(drift_detection_step)


## Add alerts with Discord

MLOps encourages giving your team visibility into pipeline runs. A good way to do that is to add a ChatOps step to your pipeline that posts relevant results to a chat channel every time the pipeline runs. Here, we use a Discord webhook inside such a step.
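A Discord webhook is an HTTPS endpoint that accepts a JSON body, in which `content` holds the message text and `username` overrides the sender's display name. As a hedged, dependency-free sketch of that mechanism (the function names `build_drift_message` and `post_to_webhook` and the example run values are hypothetical, and `urllib.request` stands in for the `requests` library the lesson uses), building and sending such a message might look like this:

```python
import json
import urllib.request
from typing import Dict


def build_drift_message(
    pipeline_name: str, run_id: str, step_name: str, drift: bool
) -> Dict[str, str]:
    """Build a Discord webhook body: `content` is the message, `username` the sender name."""
    header = (
        f"Message from pipeline: **{pipeline_name}**, "
        f"run: **{run_id}**, step: **{step_name}**"
    )
    verdict = "Drift Detected!" if drift else "No Drift Detected!"
    return {"content": f"{header}\n\n{verdict}", "username": "Drift Bot"}


def post_to_webhook(url: str, payload: Dict[str, str], timeout: float = 10.0) -> int:
    """POST `payload` as JSON and return the HTTP status; urllib raises HTTPError on 4xx/5xx."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Custom User-Agent (assumption: some endpoints reject urllib's default one)
        headers={"Content-Type": "application/json", "User-Agent": "drift-bot-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_drift_message("digits_pipeline_with_drift_alert", "run-1", "alerter", drift=True)
# Replace the placeholder with your own webhook URL before sending:
# post_to_webhook("https://discord.com/api/webhooks/<id>/<token>", payload)
```

Keeping message construction separate from the network call makes the message format easy to test without posting anything to a real channel.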

## Create an alerter step in your pipeline

![Pipeline3](_assets/chapter_1/third_pipeline.png "Pipeline")

In [None]:
@pipeline
def digits_pipeline_with_drift_alert(
    importer,
    trainer,
    evaluator,
    get_reference_data,
    drift_detector,
    alerter,
):
    """Links all the steps together in a pipeline"""
    X_train, X_test, y_train, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    evaluator(X_test=X_test, y_test=y_test, model=model)

    reference, comparison = get_reference_data(X_train, X_test)
    drift_report, _ = drift_detector(reference, comparison)

    alerter(drift_report)

In [None]:
import requests
from zenml.steps import step
from zenml.environment import Environment

# This is a private ZenML Discord channel. We will get notified if you use
# this, but you won't be able to see it. Feel free to create a new Discord
# [webhook](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks)
# and replace this one!
DISCORD_URL = (
    "https://discord.com/api/webhooks/935835443826659339/Q32jTwmqc"
    "GJAUr-r_J3ouO-zkNQPchJHqTuwJ7dK4wiFzawT2Gu97f6ACt58UKFCxEO9"
)


@step(enable_cache=False)
def discord_alert(drift_report: dict) -> None:
    """Send a message to the Discord channel to report drift.

    Args:
        drift_report: Evidently profile (as a dict) produced by the drift
            detector step.
    """
    # Dataset-level drift flag from the Evidently data drift profile
    drift = drift_report["data_drift"]["data"]["metrics"]["dataset_drift"]

    # Identify the pipeline, run, and step that sent the message
    env = Environment().step_environment
    content = (
        f"Message from pipeline: **{env.pipeline_name}**, "
        f"run: **{env.pipeline_run_id}**, step: **{env.step_name}**"
    )
    content += "\n\n"
    content += "Drift Detected!" if drift else "No Drift Detected!"

    data = {
        "content": content,
        "username": "Drift Bot",
    }
    # Post to the webhook; the timeout keeps the step from hanging indefinitely
    result = requests.post(DISCORD_URL, json=data, timeout=10)

    try:
        result.raise_for_status()
    except requests.exceptions.HTTPError as err:
        print(err)
    else:
        print(f"Posted to Discord successfully, code {result.status_code}.")
    print("Drift detected" if drift else "No drift detected")
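`discord_alert` indexes four levels deep into the profile. If the profile was produced with different `profile_sections`, or its layout differs between Evidently versions, that lookup raises a `KeyError` and the step fails. A defensive sketch that keeps "no drift" distinct from "flag not found"; the helper names `read_dataset_drift` and `status_line` are hypothetical and simply reuse the key path from the step above:

```python
from typing import Any, Mapping, Optional


def read_dataset_drift(profile: Mapping[str, Any]) -> Optional[bool]:
    """Return the dataset-level drift flag, or None if the profile lacks it."""
    try:
        flag = profile["data_drift"]["data"]["metrics"]["dataset_drift"]
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: a level is not a mapping (e.g. None)
        return None
    return bool(flag)


def status_line(profile: Mapping[str, Any]) -> str:
    """Turn the flag into alert text, reporting 'unknown' instead of guessing 'no drift'."""
    drift = read_dataset_drift(profile)
    if drift is None:
        return "Drift status unknown: dataset_drift flag not found in profile!"
    return "Drift Detected!" if drift else "No Drift Detected!"


print(status_line({"data_drift": {"data": {"metrics": {"dataset_drift": True}}}}))
print(status_line({}))
```

Returning `None` rather than `False` matters for an alerter: a broken profile should not be reported to the team as a clean bill of health.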

In [None]:
evidently_profile_config = EvidentlyProfileConfig(
    column_mapping=None, profile_sections=["datadrift"]
)

third_pipeline = digits_pipeline_with_drift_alert(
    importer=importer(),
    trainer=svc_trainer_mlflow(),
    evaluator=evaluator(),
    get_reference_data=get_reference_data(),
    drift_detector=EvidentlyProfileStep(config=evidently_profile_config),
    alerter=discord_alert(),
)
third_pipeline.run()

Let us recap what we have achieved in this chapter: we have learned how to add experiment tracking, model deployment, data drift detection, and automated chat alerts to our ML pipelines, and we have built our first MLOps stack that combines all these tools with the help of ZenML.

![MLflow Evidently Discord Pipeline](_assets/evidently+discord+mlflow.png)

The question that still remains is how to get this pipeline into a real production setting. That is what we will learn in the next chapter on advanced deployment, where we add more components to our MLOps stack that will ultimately allow us to run our entire pipeline as a serverless microservice in the cloud.