<a href="https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/kubeflow/run.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ZenML Quickstart Guide

Our goal here is to give you a first practical experience with ZenML and a brief overview of some of its basic functionality. We'll create a training pipeline for the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and then later swap in the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset developed by Zalando.

If you want to run this notebook in an interactive environment, feel free to run it in a [Google Colab](https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb) or view it on [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/quickstart) directly.


## Purpose

This quickstart guide is designed to provide a practical introduction to some of the main concepts and paradigms used by the ZenML framework. If you want more detail, our [full documentation](https://docs.zenml.io/) provides more on the concepts and how to implement them.

## Using Google Colab

You will want to use a GPU for this example. If you are following this quickstart in Google's Colab, follow these steps:

- Before running anything, you need to tell Colab that you want to use a GPU. You can do this by clicking on the ‘Runtime’ tab and selecting ‘Change runtime type’. A pop-up window will open up with a drop-down menu.
- Select ‘GPU’ from the menu and click ‘Save’.
- It may ask if you want to restart the runtime. If so, go ahead and do that.
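
Once the runtime has restarted, it can be worth confirming that a GPU is actually visible. The following is a minimal sketch, assuming TensorFlow 2.x is available in the runtime; `gpu_available` is a hypothetical helper name and not part of ZenML.

```python
def gpu_available() -> bool:
    """Return True if TensorFlow can see at least one GPU."""
    try:
        import tensorflow as tf  # imported lazily so the check degrades gracefully
    except ImportError:
        # TensorFlow is not installed, so no GPU can be used through it.
        return False
    return len(tf.config.list_physical_devices("GPU")) > 0


print("GPU available:", gpu_available())
```

If this prints `False` on Colab, the runtime type was probably not switched to GPU.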

<!-- The code for the MNIST training borrows heavily from [this](https://www.tensorflow.org/datasets/keras_example) -->

## Relation to quickstart.py
This notebook is a variant of [quickstart.py](https://github.com/zenml-io/zenml/blob/main/examples/quickstart/quickstart.py), which is featured in the [ZenML Docs](https://docs.zenml.io). The core difference is that it makes the importer step modular and shows how to fetch pipelines, runs, and artifacts in the post-execution workflow.

## Install libraries

In [1]:
# Install ZenML, which also provides the `zenml` CLI tool
!pip install zenml

Collecting zenml
  Downloading zenml-0.5.5-py3-none-any.whl (264 kB)

In [2]:
import os

os.environ['ZENML_DEBUG'] = 'true'
os.environ['ZENML_LOGGING_VERBOSITY'] = 'INFO'
os.environ['ZENML_ANALYTICS_OPT_IN'] = 'false'

In [7]:
!zenml integration get-requirements tensorflow

REQUIREMENTS FOR TENSORFLOW:

['tensorflow']

To install the dependencies of a specific integration, type:
zenml integration install EXAMPLE_NAME


Once the installation is complete, you can create your first ZenML repository for your project. Because ZenML repositories are built on top of Git repositories, initialize both in an empty directory of your choice:

In [8]:
# Initialize a git repository
!git init

# Initialize ZenML's .zen directory
!zenml init

Initialized empty Git repository in /content/.git/
Initializing at /content
ZenML repo initialized at /content


The setup is now complete. For the next steps, make sure that you execute the code from within your ZenML repository.
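
A quick way to check that the working directory really is an initialized repository is to look for the two marker directories. This is only a sketch: `inside_zenml_repo` is a hypothetical helper, and it assumes that `zenml init` creates a `.zen` directory next to `.git`, as the paths in this notebook's output suggest.

```python
from pathlib import Path


def inside_zenml_repo(path: Path) -> bool:
    """Check that `path` holds both a Git repository and a `.zen` directory."""
    return (path / ".git").exists() and (path / ".zen").exists()


print(inside_zenml_repo(Path.cwd()))
```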

## Import relevant packages

We will use pipelines and steps to train our model.

In [9]:
import numpy as np
import tensorflow as tf

from zenml.pipelines import pipeline
from zenml.steps import step
from zenml.steps.base_step_config import BaseStepConfig
from zenml.steps.step_output import Output

## Define ZenML Steps

In the code that follows, you can see that we are defining the various steps of our pipeline. Each step is decorated with `@step`, the main abstraction that is currently available for creating pipeline steps.
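
To give a feel for what a decorator of this kind does, here is a toy sketch in plain Python. It is not ZenML's implementation: `toy_step` and `STEP_REGISTRY` are hypothetical names, and the sketch only shows the general pattern of a decorator that records a function and its type hints for later use.

```python
from typing import Callable, Dict, get_type_hints

# Hypothetical registry used only for this illustration.
STEP_REGISTRY: Dict[str, Callable] = {}


def toy_step(func: Callable) -> Callable:
    """Register `func` under its name and record its type hints."""
    func.type_hints = get_type_hints(func)
    STEP_REGISTRY[func.__name__] = func
    return func


@toy_step
def double(x: int) -> int:
    """A trivial function standing in for a pipeline step."""
    return 2 * x


print(sorted(STEP_REGISTRY))  # names of all registered toy steps
print(double.type_hints)      # input and output types recorded by the decorator
print(double(21))             # the decorated function still runs as usual
```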

The first step is an `importer` step that downloads the MNIST dataset and returns its training and test splits as NumPy arrays.

In [10]:
@step
def importer() -> Output(
    X_train=np.ndarray, y_train=np.ndarray, X_test=np.ndarray, y_test=np.ndarray
):
    """Download the MNIST data and return it as NumPy arrays."""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    return X_train, y_train, X_test, y_test
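
`tf.keras.datasets.mnist.load_data()` returns the full dataset (60,000 training and 10,000 test images). For a quicker demo run, the arrays could be cut down with slicing before they are returned. A minimal sketch: `take_first` is a hypothetical helper, and the lists below stand in for NumPy arrays, which slice the same way along their first axis.

```python
from typing import Sequence, Tuple


def take_first(n: int, *arrays: Sequence) -> Tuple[Sequence, ...]:
    """Return the first `n` items of every array, keeping them aligned."""
    return tuple(a[:n] for a in arrays)


# Stand-in data; in the notebook these would be X_train and y_train.
features = [[0, 1], [2, 3], [4, 5], [6, 7]]
labels = [0, 1, 0, 1]

small_features, small_labels = take_first(2, features, labels)
print(small_features, small_labels)
```

Slicing features and labels with the same `n` keeps each image paired with its label.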

We then add a `Trainer` step that takes the training data and trains a Keras classifier on it. Note that the `tf.keras.Model` return annotation tells ZenML the output type, so the trained model can be written to our artifact store.

In [11]:
class TrainerConfig(BaseStepConfig):
    """Trainer params"""

    epochs: int = 1
        
@step
def trainer(
    X_train: np.ndarray,
    y_train: np.ndarray,
    config: TrainerConfig,
) -> tf.keras.Model:
    """A simple Keras Model to train on the data."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Dense(10))

    model.compile(
        optimizer=tf.keras.optimizers.Adam(0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    model.fit(X_train, y_train, epochs=config.epochs)

    # Returning the model lets ZenML store it as an artifact
    return model
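
The final `Dense(10)` layer has no activation and the loss is configured with `from_logits=True`, so the trained model outputs raw logits rather than probabilities. The sketch below shows how logits can be turned into class probabilities with a softmax. It uses plain Python and made-up numbers for illustration, and is not part of the notebook's pipeline.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three classes, for illustration only.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(probs.index(max(probs)))  # index of the most likely class
```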

Finally, we add an `Evaluator` step that takes as input the test set and the trained model and evaluates some final metrics.

In [12]:
@step
def evaluator(
    X_test: np.ndarray,
    y_test: np.ndarray,
    model: tf.keras.Model,
) -> float:
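    """Evaluate the model on the test set and return the resulting metrics."""
    # With metrics=["accuracy"] compiled in, `evaluate` returns [loss, accuracy].
    test_metrics = model.evaluate(X_test, y_test, verbose=2)
    return test_metrics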

## Define ZenML Pipeline

A pipeline is defined with the `@pipeline` decorator. This defines the various steps of the pipeline and specifies the dependencies between the steps, thereby determining the order in which they will be run.

In [13]:
@pipeline
def mnist_pipeline(
    importer,
    trainer,
    evaluator,
):
    """Links all the steps together in a pipeline"""
    X_train, y_train, X_test, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    evaluator(X_test=X_test, y_test=y_test, model=model)
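
The way data dependencies fix the execution order can be illustrated with a topological sort. This sketch uses the standard library's `graphlib` (Python 3.9+) on a hand-written dependency map that mirrors the calls inside `mnist_pipeline`. It illustrates the idea only and makes no claim about how ZenML's orchestrator resolves the order internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps a step to the steps whose outputs it consumes.
dependencies = {
    "importer": set(),
    "trainer": {"importer"},
    "evaluator": {"importer", "trainer"},
}

# A step can only run after everything it depends on has run.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because `evaluator` needs the model from `trainer`, and `trainer` needs the data from `importer`, only one order is valid here.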

## Run the pipeline

Running the pipeline is as simple as calling the `run()` method on an instance of the defined pipeline.

In [14]:
# Initialise the pipeline
first_pipeline = mnist_pipeline(
    importer=importer(),
    trainer=trainer(config=TrainerConfig(epochs=1)),
    evaluator=evaluator(),
)
first_pipeline.run()

Creating pipeline: mnist_pipeline
Cache enabled for pipeline `mnist_pipeline`
Using orchestrator `local_orchestrator` for pipeline `mnist_pipeline`. Running pipeline..
Step `importer` has started.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Step `importer` has finished in 1.905s.
Step `trainer` has started.
INFO:tensorflow:Assets written to: /content/.zen/local_store/trainer/output/2/assets
Step `trainer` has finished in 5.951s.
Step `evaluator` has started.
313/313 - 1s - loss: 6.3139 - accuracy: 0.8769 - 508ms/epoch - 2ms/step
Step `evaluator` has finished in 0.960s.


## From MNIST to Fashion MNIST

We got pretty good results on the MNIST model that we trained, but maybe we want to see how a similar training pipeline would work on a different dataset.

You can see how easy it is to switch out one data import step and processing for another in our pipeline.
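
The underlying pattern is ordinary dependency injection: the pipeline only requires that the importer returns data in the expected shape, not where the data comes from. A toy sketch in plain Python, with hypothetical function names and made-up values, none of which are ZenML code:

```python
from typing import Callable, List, Tuple

Dataset = Tuple[List[float], List[int]]


def load_digits() -> Dataset:
    """Stand-in for the MNIST importer (made-up values)."""
    return [0.1, 0.9, 0.8], [0, 1, 1]


def load_clothes() -> Dataset:
    """Stand-in for the Fashion MNIST importer (made-up values)."""
    return [0.4, 0.6], [1, 0]


def toy_pipeline(importer: Callable[[], Dataset]) -> int:
    """Run the same downstream logic on whatever the importer returns."""
    features, labels = importer()
    assert len(features) == len(labels)  # every sample needs a label
    return len(features)


print(toy_pipeline(load_digits))   # number of samples from the first importer
print(toy_pipeline(load_clothes))  # same pipeline, different importer
```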

In [15]:
# Define a modified import step that downloads the Fashion MNIST dataset
@step
def importer_fashion_mnist() -> Output(
    X_train=np.ndarray, y_train=np.ndarray, X_test=np.ndarray, y_test=np.ndarray
):
    """Download the Fashion MNIST data and store it as an artifact."""
    (X_train, y_train), (
        X_test,
        y_test,
    ) = tf.keras.datasets.fashion_mnist.load_data()
    return X_train, y_train, X_test, y_test

In [16]:
# Initialise a new pipeline
second_pipeline = mnist_pipeline(
    importer=importer_fashion_mnist(),
    trainer=trainer(config=TrainerConfig(epochs=1)),
    evaluator=evaluator(),
)

# Run the new pipeline
second_pipeline.run()

Creating pipeline: mnist_pipeline
Cache enabled for pipeline `mnist_pipeline`
Using orchestrator `local_orchestrator` for pipeline `mnist_pipeline`. Running pipeline..
Step `importer_fashion_mnist` has started.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
Step `importer_fashion_mnist` has finished in 2.318s.
Step `trainer` has started.
INFO:tensorflow:Assets written to: /content/.zen/local_store/trainer/output/5/assets
Step `trainer` has finished in 4.942s.
Step `evaluator` has started.
313/313 - 0s - loss: 11.7584 - accuracy: 0.7952 - 453ms/epoch - 1ms/step
Step `evaluator` has finished in 1.128s.


# Post-execution workflow

In [17]:
from zenml.core.repo import Repository

## Get repo

In [18]:
repo = Repository()

## Pipelines 

In [19]:
pipelines = repo.get_pipelines()

## Retrieve the pipeline

In [20]:
# Use a new name so the `mnist_pipeline` function defined earlier is not overwritten
mnist_pipeline_view = pipelines[0]

## Get the first run

In [21]:
runs = mnist_pipeline_view.runs  # chronologically ordered
mnist_run = runs[0]

## Get the second run

In [22]:
fashion_mnist_run = runs[1]

## Get the steps (both runs report the first step as `importer`)

In [23]:
mnist_run.steps

[StepView(id=1, name='importer', parameters={}),
 StepView(id=2, name='trainer', parameters={'epochs': 1}),
 StepView(id=3, name='evaluator', parameters={})]

In [24]:
fashion_mnist_run.steps

[StepView(id=4, name='importer', parameters={}),
 StepView(id=5, name='trainer', parameters={'epochs': 1}),
 StepView(id=6, name='evaluator', parameters={})]

## Check the results of the evaluator and compare

In [25]:
mnist_eval_step = mnist_run.get_step(name='evaluator')
fashion_mnist_eval_step = fashion_mnist_run.get_step(name='evaluator')

In [26]:
# A step with a single output exposes it as `output`; a step with several exposes a dict called `outputs`.
mnist_eval_step.output.read()

[6.313859462738037, 0.8769000172615051]

In [27]:
fashion_mnist_eval_step.output.read()

[11.758384704589844, 0.795199990272522]
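
Each result is a two-element list because Keras `model.evaluate` returns the loss first, followed by each metric the model was compiled with (here, accuracy). A short sketch of unpacking and comparing the two runs, with the values copied from the outputs above:

```python
# Values copied from the two evaluator outputs above.
mnist_metrics = [6.313859462738037, 0.8769000172615051]
fashion_mnist_metrics = [11.758384704589844, 0.795199990272522]

# The loss comes first, then the accuracy.
mnist_loss, mnist_acc = mnist_metrics
fashion_loss, fashion_acc = fashion_mnist_metrics

print(f"MNIST accuracy:         {mnist_acc:.4f}")
print(f"Fashion MNIST accuracy: {fashion_acc:.4f}")
print(f"Accuracy gap:           {mnist_acc - fashion_acc:.4f}")
```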

# Congratulations!

… and that's it for the quickstart. If you got here without a hiccup, you have successfully installed ZenML, set up a ZenML repo, configured a training pipeline, executed it, and evaluated the results. And this is just the tip of the iceberg of ZenML's capabilities.

However, if you hit a problem or have suggestions or questions about our framework, you can always check our [docs](https://docs.zenml.io/) or our [GitHub](https://github.com/zenml-io/zenml), or, even better, join us on our [Slack channel](https://zenml.io/slack-invite).

Cheers!

For more detailed information on all the components and steps that went into this short example, please continue reading [our more detailed documentation pages](https://docs.zenml.io/).