<a href="https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ZenML Quickstart Guide

Our goal here is to give you your first practical experience with ZenML and a brief overview of some of its basic functionality. We'll build a pipeline that trains, evaluates, and deploys a classifier on scikit-learn's handwritten digits dataset, a small [MNIST](http://yann.lecun.com/exdb/mnist/)-like dataset of 8x8 images, and checks the data for drift along the way.

If you want to run this notebook in an interactive environment, feel free to run it in [Google Colab](https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb) or view it directly on [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/quickstart).


## Purpose

This quickstart guide is designed to provide a practical introduction to some of the main concepts and paradigms used by the ZenML framework. For more detail on these concepts and how to implement them, see our [full documentation](https://docs.zenml.io/).

## Using Google Colab

This example trains a small scikit-learn model, which runs on the CPU, so a GPU is not required. If you are following this quickstart in Google Colab and want to use a GPU anyway, follow these steps:

- Before running anything, you need to tell Colab that you want to use a GPU. You can do this by clicking on the ‘Runtime’ tab and selecting ‘Change runtime type’. A pop-up window will open up with a drop-down menu.
- Select ‘GPU’ from the menu and click ‘Save’.
- It may ask if you want to restart the runtime. If so, go ahead and do that.

<!-- The code for the MNIST training borrows heavily from [this](https://www.tensorflow.org/datasets/keras_example) -->

## Relation to quickstart.py
This notebook is a variant of [quickstart.py](https://github.com/zenml-io/zenml/blob/main/examples/quickstart/quickstart.py), which is featured in the [ZenML Docs](https://docs.zenml.io). The core difference is that this notebook makes the importer step modular and shows how to fetch pipelines, runs, and artifacts in the post-execution workflow.

## Install libraries

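In [None]:
# Install ZenML and matplotlib (used at the end to plot a digit)
!pip install zenml matplotlib

In [None]:
# Install the ZenML integrations used in this notebook
# (dash is needed for the pipeline run lineage visualizer)
!zenml integration install sklearn mlflow evidently facets dash -f

In [None]:
# Create and activate a dedicated ZenML profile for this quickstart
!zenml profile create quickstart
!zenml profile set quickstart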
In [None]:
from utils import display_restart_button

display_restart_button()

Once the installation is complete, go ahead and create your first ZenML repository for your project by running `zenml init` in the directory you want to use for it:

In [None]:
# Initialize a ZenML repository
!zenml init

The setup is now complete. For the next steps, make sure that you are executing the code within your ZenML repository.

# Create an MLOps Stack


In [None]:
# Register the MLflow experiment tracker
!zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Add the MLflow experiment tracker to our default stack
!zenml stack update default -e mlflow_tracker

# Register the MLflow model deployer
!zenml model-deployer register mlflow_deployer --flavor=mlflow

# Add the MLflow model deployer to our default stack
!zenml stack update default -d mlflow_deployer

# Make the default stack the active stack
!zenml stack set default

# Show the components of the active stack
!zenml stack describe

# Define Pipeline


## Import relevant packages

We will use ZenML pipelines and steps to train, evaluate, and deploy our model.

In [None]:
import numpy as np
import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC
from zenml.integrations.sklearn.helpers.digits import get_digits
from zenml.steps import step, Output
from zenml.pipelines import pipeline
from zenml.integrations.mlflow.mlflow_step_decorator import enable_mlflow

from zenml.integrations.mlflow.steps import mlflow_model_deployer_step
from zenml.integrations.evidently.steps import (
    EvidentlyProfileConfig,
    EvidentlyProfileStep,
)
import mlflow

## Define ZenML Steps

In the code that follows, you can see that we are defining the various steps of our pipeline. Each step is decorated with `@step`, the main abstraction that is currently available for creating pipeline steps.

The first step is an `importer` step that loads scikit-learn's digits dataset and returns four numpy arrays as its output: the training and test features and the training and test labels.

In [None]:
@step
def importer() -> Output(
    X_train=np.ndarray,
    X_test=np.ndarray,
    y_train=np.ndarray,
    y_test=np.ndarray,
):
    """Load the digits dataset as numpy arrays."""
    X_train, X_test, y_train, y_test = get_digits()
    return X_train, X_test, y_train, y_test
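If you want to see what the importer's four arrays look like outside of ZenML, the sketch below builds a comparable split with plain scikit-learn. It is an assumption about what `get_digits` does, not its actual source: the helper name `load_digits_split` is hypothetical, and the 50/50 unshuffled split is borrowed from scikit-learn's own digits classification example.

```python
from typing import Tuple

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split


def load_digits_split(
    test_size: float = 0.5,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical plain-scikit-learn counterpart of the importer step.

    The exact split used by ZenML's get_digits may differ.
    """
    digits = load_digits()
    # Each sample is an 8x8 grayscale image; flatten it into 64 features
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))
    # Keep the original sample order so the split is deterministic
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=test_size, shuffle=False
    )
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = load_digits_split()
print(X_train.shape, X_test.shape)
```

The flattening to 64 features is also why the prediction cell at the end of this notebook reshapes a sample back to `(8, 8)` before plotting it.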

We then add a trainer step, `svc_trainer_mlflow`, that takes the imported training data and fits a scikit-learn SVC classifier on it, using `mlflow.sklearn.autolog()` to log the model's hyperparameters and metrics to MLflow. Note that the model is not explicitly saved within the step. Under the hood, ZenML uses Materializers to automatically persist the artifacts that result from each step into the Artifact Store.

In [None]:
@enable_mlflow  # setup MLflow
@step(enable_cache=False)
def svc_trainer_mlflow(
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train a sklearn SVC classifier and log to MLflow."""
    mlflow.sklearn.autolog()  # log all model hparams and metrics to MLflow
    model = SVC(gamma=0.01)
    model.fit(X_train, y_train)
    return model
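To make the Materializer idea more concrete, here is a toy sketch of what a materializer does: it knows how to write a step's output to a location in the artifact store and how to read it back for downstream steps. `ToyPickleMaterializer` and its `save`/`load` methods are hypothetical and deliberately simplified; ZenML's real materializers subclass `BaseMaterializer` and have their own interface.

```python
import os
import pickle
import tempfile
from typing import Any


class ToyPickleMaterializer:
    """Hypothetical materializer that pickles a single artifact into a directory.

    This is NOT ZenML's BaseMaterializer API, only an illustration of the idea.
    """

    FILENAME = "artifact.pkl"

    def __init__(self, uri: str) -> None:
        # Each artifact gets its own location inside the artifact store
        self.uri = uri

    def save(self, data: Any) -> None:
        """Persist a step output under this artifact's URI."""
        os.makedirs(self.uri, exist_ok=True)
        with open(os.path.join(self.uri, self.FILENAME), "wb") as f:
            pickle.dump(data, f)

    def load(self) -> Any:
        """Read the artifact back, e.g. as the input of a downstream step."""
        with open(os.path.join(self.uri, self.FILENAME), "rb") as f:
            return pickle.load(f)


# Usage: persist a stand-in "model" (a dict of hyperparameters) and load it again
with tempfile.TemporaryDirectory() as artifact_store:
    materializer = ToyPickleMaterializer(
        os.path.join(artifact_store, "trainer", "output")
    )
    materializer.save({"kernel": "rbf", "gamma": 0.01})
    print(materializer.load())
```

Pickle is used here only for brevity; it is not safe for loading untrusted data, which is one reason dedicated materializers exist for specific types.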

Next, we add an `evaluator` step that takes the test set and the trained model as input and computes the model's accuracy on the test set.

In [None]:
@step
def evaluator(
    X_test: np.ndarray,
    y_test: np.ndarray,
    model: ClassifierMixin,
) -> float:
    """Calculate the test set accuracy of an sklearn model."""
    test_acc = model.score(X_test, y_test)
    print(f"Test accuracy: {test_acc}")
    return test_acc
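For scikit-learn classifiers, `model.score(X_test, y_test)` returns the mean accuracy: the fraction of test samples whose predicted label equals the true label. The helper below (a hypothetical name, shown only to spell out the arithmetic) computes the same quantity from predictions, so `accuracy(y_test, model.predict(X_test))` should match `model.score(X_test, y_test)`.

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Hypothetical helper: fraction of predictions that equal the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Element-wise comparison gives a boolean array; its mean is the hit rate
    return float(np.mean(y_true == y_pred))


print(accuracy([3, 1, 4, 1], [3, 1, 4, 7]))  # 3 of 4 correct -> 0.75
```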

Next, we add a `deployment_trigger` step that takes the test accuracy computed by the evaluator and decides whether the model should be deployed: it returns `True` only if the accuracy is above 90%.

In [None]:
@step
def deployment_trigger(test_acc: float) -> bool:
    """Only deploy if the test accuracy is above 90%."""
    return test_acc > 0.9

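We also add data drift detection. The `get_reference_data` step below converts the test and training features into pandas DataFrames, and ZenML's built-in `EvidentlyProfileStep`, configured here to compute a data drift profile, compares the two.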

In [None]:
@step
def get_reference_data(
    X_train: np.ndarray,
    X_test: np.ndarray,
) -> Output(reference=pd.DataFrame, comparison=pd.DataFrame):
    """Convert the test and training features into DataFrames for drift detection."""
    # Evidently works on DataFrames, so give each pixel feature a string column name
    columns = [str(x) for x in range(X_train.shape[1])]
    reference = pd.DataFrame(X_test, columns=columns)
    comparison = pd.DataFrame(X_train, columns=columns)
    return reference, comparison

# Configure the built-in Evidently step to compute a data drift profile
evidently_profile_config = EvidentlyProfileConfig(
    column_mapping=None, profile_sections=["datadrift"]
)

## Define ZenML Pipeline

A pipeline is defined with the `@pipeline` decorator. This defines the various steps of the pipeline and specifies the dependencies between the steps, thereby determining the order in which they will be run.

In [None]:
@pipeline(enable_cache=False)
def quickstart_pipeline(
    importer,
    trainer,
    evaluator,
    get_reference_data,
    drift_detector,
    deployment_trigger,
    model_deployer,
):
    """Train, evaluate, check for drift, and deploy a model with MLflow."""
    # Load the data, then train and evaluate the model
    X_train, X_test, y_train, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    test_acc = evaluator(X_test=X_test, y_test=y_test, model=model)

    # Compare the test and training features for data drift
    reference, comparison = get_reference_data(X_train, X_test)
    drift_report, _ = drift_detector(reference, comparison)

    # Deploy the model only if it is accurate enough
    deployment_decision = deployment_trigger(test_acc)
    model_deployer(deployment_decision, model)
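The order in which these steps run follows from the data dependencies in `quickstart_pipeline`: a step can only run once the steps producing its inputs have finished. As an illustration only (not how ZenML's orchestrators are implemented), the sketch below writes those dependencies down by hand and derives a valid execution order with Python's standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it consumes in quickstart_pipeline
dependencies = {
    "importer": set(),
    "trainer": {"importer"},
    "evaluator": {"importer", "trainer"},
    "get_reference_data": {"importer"},
    "drift_detector": {"get_reference_data"},
    "deployment_trigger": {"evaluator"},
    "model_deployer": {"deployment_trigger", "trainer"},
}

# static_order() yields every step after all of its dependencies
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Several orders are valid here; for example, the drift detection steps do not depend on training, so they could run before, after, or alongside it.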

## Run the pipeline

Running the pipeline is as simple as creating an instance of the defined pipeline, with an instance of each step, and calling its `run()` method. You can also give a run an explicit name to make it easier to access later on. Be aware that each run name can only be used once, so to rerun the pipeline, choose a new name or leave the name out.

In [None]:
p = quickstart_pipeline(
    importer=importer(),
    trainer=svc_trainer_mlflow(),
    evaluator=evaluator(),
    
    get_reference_data=get_reference_data(),
    drift_detector=EvidentlyProfileStep(config=evidently_profile_config),
    
    deployment_trigger=deployment_trigger(),
    model_deployer=mlflow_model_deployer_step(),
)

p.run()
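Since a run name can only be used once, one way to rerun a named pipeline without clashes is to append a timestamp to the name. The `unique_run_name` helper below is a hypothetical sketch; passing its result as `p.run(run_name=...)` assumes that your ZenML version's `run()` accepts a `run_name` argument, so check the API reference for the version you installed.

```python
from datetime import datetime


def unique_run_name(prefix: str) -> str:
    """Hypothetical helper: append a timestamp (with microseconds) to the name."""
    return f"{prefix}-{datetime.now():%Y_%m_%d-%H_%M_%S_%f}"


run_name = unique_run_name("quickstart_pipeline")
print(run_name)
# Assumed usage, if run() accepts a run_name argument:
# p.run(run_name=run_name)
```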

# Post-execution: After running the pipeline

## Fetch the latest pipeline run

First, we load the repository, which is where all your pipelines live, and list the pipelines it contains with `get_pipelines()`. We could simply take the pipeline from above by index using `pipelines[0]`; alternatively, we can get a pipeline from the repository by name. Unless specified otherwise, a pipeline's name defaults to the name of its function. All runs of a pipeline are stored in chronological order, so `runs[-1]` is the most recent one.

In [None]:
from zenml.repository import Repository

repo = Repository()
pipelines = repo.get_pipelines()
my_pipeline = repo.get_pipeline(pipeline_name="quickstart_pipeline")
latest_run = my_pipeline.runs[-1]


## Artifact Lineage and Caching

In [None]:
from zenml.integrations.dash.visualizers.pipeline_run_lineage_visualizer import (
    PipelineRunLineageVisualizer,
)

PipelineRunLineageVisualizer().visualize(latest_run)


## Statistics Visualization

In [None]:
from zenml.integrations.facets.visualizers.facet_statistics_visualizer import (
    FacetStatisticsVisualizer,
)

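# Visualize the statistics of the DataFrames produced by get_reference_data
reference_data_step = latest_run.get_step(name="get_reference_data")
FacetStatisticsVisualizer().visualize(reference_data_step, magic=True)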


## View pipelines as ML experiments

In [None]:
# Start the MLflow UI as a background process (note the trailing `&`),
# pointing it at the same tracking URI that ZenML's MLflow integration uses
from IPython.display import IFrame, display
from zenml.integrations.mlflow.mlflow_utils import get_tracking_uri

get_ipython().system_raw(
    f'mlflow ui --backend-store-uri="{get_tracking_uri()}" --port=4997 &'
)

# Embed the MLflow UI in the notebook output
display(IFrame(src="http://localhost:4997", width="100%", height=600))


## Visualize Drift

In [None]:
from zenml.integrations.evidently.visualizers import EvidentlyVisualizer

drift_detection_step = latest_run.get_step(name="drift_detector")
EvidentlyVisualizer().visualize(drift_detection_step)
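Evidently's data drift report compares the distribution of each column in the reference and comparison data using statistical tests. To give a feel for what such a per-column test measures, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) with NumPy and flags columns above a threshold. This is a swapped-in illustration, not Evidently's implementation; `ks_statistic`, `drifted_columns`, and the 0.1 threshold are hypothetical, and a real drift check would typically also look at a p-value from a proper statistical test.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max distance between ECDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Evaluating both ECDFs at every observed value is enough to find the maximum
    values = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, values, side="right") / a.size
    cdf_b = np.searchsorted(b, values, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def drifted_columns(reference: np.ndarray, comparison: np.ndarray, threshold: float = 0.1):
    """Indices of columns whose KS statistic exceeds the (hypothetical) threshold."""
    return [
        col
        for col in range(reference.shape[1])
        if ks_statistic(reference[:, col], comparison[:, col]) > threshold
    ]


reference = np.array([[0, 0], [1, 0], [2, 0]])
comparison = np.array([[0, 5], [1, 6], [2, 7]])
print(drifted_columns(reference, comparison))  # column 1 shifted -> [1]
```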


## Predict on Deployed Model

In [None]:
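import matplotlib.pyplot as plt

# Find the running MLflow prediction service started by the deployer step
model_deployer = repo.active_stack.model_deployer
services = model_deployer.find_model_server(
    pipeline_name="quickstart_pipeline",
    pipeline_step_name="model_deployer",
    running=True,
)
if not services:
    raise RuntimeError("No running prediction service found for quickstart_pipeline")
service = services[0]
service.check_status()

# Load the test split produced by the importer step of the latest run
importer_step = latest_run.get_step(name="importer")
X_test = importer_step.outputs["X_test"].read()
y_test = importer_step.outputs["y_test"].read()

# Each sample is a flattened 8x8 image: plot the first one and predict its label
plt.axis("off")
plt.imshow(X_test[0].reshape(8, 8), cmap=plt.cm.gray_r, interpolation="nearest")
pred0 = service.predict(X_test[0:1])
print(f"Model predicted {pred0}, true label was {y_test[0]}")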



# Congratulations!

… and that's it for the quickstart. If you got here without a hiccup, you have successfully installed ZenML, set up a ZenML repository, configured a training pipeline, executed it, and evaluated the results. And this is just the tip of the iceberg of ZenML's capabilities.

However, if you had a hiccup or have suggestions or questions regarding our framework, you can always check our [docs](https://docs.zenml.io/) or our [GitHub](https://github.com/zenml-io/zenml), or, even better, join us on our [Slack channel](https://zenml.io/slack-invite).

Cheers!

For more detailed information on all the components and steps that went into this short example, please continue reading [our documentation](https://docs.zenml.io/).