<a href="https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ZenML Quickstart Guide

Our goal here is to give you your first hands-on experience with ZenML and a brief overview of some of its basic functionality.

The quickest way to get started is to create a simple pipeline. We'll be using the [MNIST](http://yann.lecun.com/exdb/mnist/) digits dataset (originally developed by Yann LeCun and others) and, later, the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset developed by Zalando.

If you want to run this notebook in an interactive environment, feel free to open it in Google Colab.

## Purpose

This quickstart guide is designed to provide a practical introduction to some of the main concepts and paradigms used by the ZenML framework. If you want more detail, our [full documentation](https://docs.zenml.io/) covers these concepts and how to implement them in more depth.

## Using Google Colab

You will want to use a GPU for this example. If you are following this quickstart in Google Colab, follow these steps:

- Before running anything, you need to tell Colab that you want to use a GPU. You can do this by clicking on the ‘Runtime’ tab and selecting ‘Change runtime type’. A pop-up window will open up with a drop-down menu.
- Select ‘GPU’ from the menu and click ‘Save’.
- It may ask if you want to restart the runtime. If so, go ahead and do that.

<!-- The code for the MNIST training borrows heavily from [this](https://www.tensorflow.org/datasets/keras_example) -->

## Install libraries

In [None]:
# Install the ZenML CLI tool and TensorFlow
!pip install zenml==0.5.0 tensorflow

Once the installation is complete, you can create your first ZenML repository for your project. Since ZenML repositories are built on top of Git repositories, create one in an empty directory of your choice by running:

In [None]:
# Initialize a git repository
!git init

# Initialize ZenML's .zen file
!zenml init

The setup is now complete. For the next steps, just make sure that you are executing the code from within your ZenML repository.

## Import relevant packages

We will use ZenML pipelines and steps to train our model.

In [1]:
import numpy as np
import tensorflow as tf

from zenml.pipelines import pipeline
from zenml.steps import step
from zenml.steps.base_step_config import BaseStepConfig
from zenml.steps.step_output import Output


## Define ZenML Steps

In the code that follows, we define the individual steps of our pipeline. Each step is decorated with `@step`, the main abstraction currently available for creating pipeline steps.
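To make the decorator a little less magical: ZenML works out a step's inputs from the parameter names of the decorated function, which is why the pipeline later on can connect artifacts by keyword (for example `normalizer(X_train=X_train, X_test=X_test)`). The toy decorator below is a hypothetical illustration of that idea using the standard library's `inspect.signature`; it is not ZenML's implementation and does none of the artifact handling a real step does. The `toy_` names are made up so they do not shadow the notebook's real steps.

```python
import inspect
from typing import Callable


def toy_step(func: Callable) -> Callable:
    """Hypothetical stand-in for a step decorator (NOT ZenML's implementation).

    Records the function's parameter names as the step's input names and its
    return annotation as the declared output type, then returns the function.
    """
    signature = inspect.signature(func)
    func.input_names = list(signature.parameters)
    func.output_type = signature.return_annotation
    return func


@toy_step
def toy_normalizer(X_train: list, X_test: list) -> tuple:
    # Scale pixel values from [0, 255] to [0, 1]
    return [x / 255.0 for x in X_train], [x / 255.0 for x in X_test]


print(toy_normalizer.input_names)  # ['X_train', 'X_test']
print(toy_normalizer.output_type)  # <class 'tuple'>
print(toy_normalizer([0, 255], [51]))  # ([0.0, 1.0], [0.2])
```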

The first step is an importer step that downloads the MNIST dataset and returns its training and test splits as four separate outputs: `X_train`, `y_train`, `X_test` and `y_test`.

In [2]:
@step
def importer_mnist() -> Output(
    X_train=np.ndarray, y_train=np.ndarray, X_test=np.ndarray, y_test=np.ndarray
):
    """Download the MNIST data and store it as artifacts"""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    return X_train, y_train, X_test, y_test



Next, we normalize all images by dividing each pixel value by 255, which scales it from the range [0, 255] to [0, 1].

In [3]:
@step
def normalizer(
    X_train: np.ndarray, X_test: np.ndarray
) -> Output(X_train_normed=np.ndarray, X_test_normed=np.ndarray):
    """Normalize the values for all the images so they are between 0 and 1"""
    X_train_normed = X_train / 255.0
    X_test_normed = X_test / 255.0
    return X_train_normed, X_test_normed



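We then add a `trainer` step that takes the normalized training data and trains a Keras classifier on it. Its parameters are defined in `TrainerConfig`, a subclass of `BaseStepConfig`, so values such as the number of epochs can be set when the pipeline is created. ZenML uses the step's return annotation (`tf.keras.Model`) to pick the materializer that writes the trained model out to our artifact store.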

In [4]:
class TrainerConfig(BaseStepConfig):
    """Trainer params"""

    epochs: int = 1


@step
def trainer(
    config: TrainerConfig,
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> tf.keras.Model:
    """Train a neural net from scratch to recognise MNIST digits and return
    the trained model"""
    model = tf.keras.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(10, activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    model.fit(
        X_train,
        y_train,
        epochs=config.epochs,
    )

    # Return the trained model; ZenML stores it as this step's output artifact
    return model

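The last `Dense(10)` layer has no activation, so the model outputs one raw score (logit) per class, and `from_logits=True` tells `SparseCategoricalCrossentropy` to apply the softmax itself. "Sparse" means the labels are integer class indices, as in `y_train`, rather than one-hot vectors. The stdlib sketch below computes the per-example loss, -log(softmax(z)[y]), in the numerically stable log-sum-exp form. It illustrates the formula only; it is not Keras's code, and Keras additionally averages the loss over each batch.

```python
import math
from typing import Sequence


def sparse_ce_from_logits(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for one example, given raw logits and an integer class label.

    Equivalent to -log(softmax(logits)[label]), computed via log-sum-exp with the
    maximum subtracted first so that large logits do not overflow math.exp.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Ten classes with equal logits: softmax gives 0.1 to every class, so loss = -log(0.1)
print(sparse_ce_from_logits([0.0] * 10, label=3))  # ~2.302585
# A confident, correct prediction gives a loss close to zero
print(sparse_ce_from_logits([10.0] + [0.0] * 9, label=0))
```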


Finally, we add an `evaluator` step that takes the test set and the trained model as inputs and returns the test loss and accuracy as a NumPy array.

In [5]:
@step
def evaluator(
    X_test: np.ndarray,
    y_test: np.ndarray,
    model: tf.keras.Model,
) -> np.ndarray:
    """Calculate the loss and accuracy of the model on the test set"""

    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
    return np.array([test_loss, test_acc])

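`model.evaluate` returns the loss followed by the metrics listed in `compile` (here only accuracy), which is why the evaluator can unpack its result into `test_loss, test_acc`. For a model that outputs logits, accuracy means: take the class with the highest logit as the prediction and count how often it matches the integer label. A stdlib sketch of that computation, as an illustration rather than Keras's implementation:

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy_from_logits(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose highest-scoring class equals the integer label."""
    correct = sum(argmax(logits) == label for logits, label in zip(batch_logits, labels))
    return correct / len(labels)


example_logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 0.2, 1.5],   # predicts class 2
    [0.9, 1.1, 0.0],   # predicts class 1
    [1.0, 0.0, 3.0],   # predicts class 2
]
example_labels = [0, 2, 0, 2]
print(accuracy_from_logits(example_logits, example_labels))  # 0.75
```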


## Define ZenML Pipeline

A pipeline is defined with the `@pipeline` decorator. The pipeline function declares which steps it uses and connects their outputs to each other's inputs; these dependencies determine the order in which the steps run.
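In the `mnist_pipeline` defined in the next cell, the order is implied by the data: a step can only run once every artifact it consumes has been produced. The sketch below writes those dependencies down by hand (each step mapped to the steps whose outputs it reads) and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive a valid execution order. It only illustrates the idea; it is not how ZenML or its orchestrator schedules steps internally.

```python
from graphlib import TopologicalSorter

# Each step -> the steps whose outputs it consumes, read off the mnist_pipeline body:
#   normalizer uses X_train and X_test from the importer
#   trainer uses the normalized X_train and the importer's y_train
#   evaluator uses the normalized X_test, the importer's y_test and the trained model
step_dependencies = {
    "importer": set(),
    "normalizer": {"importer"},
    "trainer": {"normalizer", "importer"},
    "evaluator": {"normalizer", "importer", "trainer"},
}

execution_order = list(TopologicalSorter(step_dependencies).static_order())
print(execution_order)  # ['importer', 'normalizer', 'trainer', 'evaluator']
```

Here the graph admits only one valid order; in pipelines with independent branches, several orders are valid and an orchestrator is free to pick any of them.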

In [6]:
# Define the pipeline

@pipeline
def mnist_pipeline(
    importer,
    normalizer,
    trainer,
    evaluator,
):
    # Link all the steps artifacts together
    X_train, y_train, X_test, y_test = importer()
    X_train_normed, X_test_normed = normalizer(X_train=X_train, X_test=X_test)
    model = trainer(X_train=X_train_normed, y_train=y_train)
    evaluator(X_test=X_test_normed, y_test=y_test, model=model)

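## Initialise the Pipeline

Here we create an instance of our `mnist_pipeline`, passing in a concrete step instance for each step it expects and configuring the trainer to train for one epoch.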

In [7]:
# Initialise the pipeline
first_pipeline = mnist_pipeline(
    importer=importer_mnist(),
    normalizer=normalizer(),
    trainer=trainer(config=TrainerConfig(epochs=1)),
    evaluator=evaluator(),
)



## Run the Pipeline

Running the pipeline is as simple as calling the `run()` method on the defined pipeline.

In [8]:
# Run the pipeline
first_pipeline.run()


## From MNIST to Fashion MNIST

We got pretty good results with the MNIST model we trained, but maybe we want to see how a similar training pipeline would work on a different dataset.

You can see how easy it is to switch out one data import step for another: only the importer changes, while the normalizer, trainer and evaluator steps are reused as they are.
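This works because `mnist_pipeline` receives its steps as arguments instead of hard-coding them, so the pipeline function only describes how outputs flow into inputs. The plain-Python sketch below shows the same pattern without ZenML: one wiring function, two interchangeable importers. All `toy_` functions and their data are hypothetical stand-ins, not real datasets.

```python
from typing import Callable, List, Tuple

Data = Tuple[List[int], List[int]]  # (pixels, labels): toy stand-in for the real arrays


def toy_pipeline(
    importer: Callable[[], Data], scaler: Callable[[List[int]], List[float]]
) -> Tuple[List[float], List[int]]:
    """Wire the steps together; the caller decides which concrete steps are used."""
    pixels, labels = importer()
    return scaler(pixels), labels


def toy_digits_importer() -> Data:
    return [0, 255, 51], [7, 2, 1]


def toy_fashion_importer() -> Data:
    return [102, 0], [9, 0]


def scale(pixels: List[int]) -> List[float]:
    return [p / 255.0 for p in pixels]


print(toy_pipeline(toy_digits_importer, scale))   # ([0.0, 1.0, 0.2], [7, 2, 1])
print(toy_pipeline(toy_fashion_importer, scale))  # ([0.4, 0.0], [9, 0])
```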

In [9]:
# Define a new import step that downloads the Fashion MNIST dataset
@step
def importer_fashion_mnist() -> Output(
    X_train=np.ndarray, y_train=np.ndarray, X_test=np.ndarray, y_test=np.ndarray
):
    """Download the Fashion MNIST data and store it as artifacts"""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    return X_train, y_train, X_test, y_test



In [10]:
# Initialise a new pipeline
second_pipeline = mnist_pipeline(
    importer=importer_fashion_mnist(),
    normalizer=normalizer(),
    trainer=trainer(config=TrainerConfig(epochs=1)),
    evaluator=evaluator(),
)



In [11]:
# Run the new pipeline
second_pipeline.run()


… and that's it for the quickstart. If you made it here without a hiccup, you have successfully installed ZenML, set up a ZenML repository, configured a training pipeline, executed it and evaluated the results. And this is just the tip of the iceberg of ZenML's capabilities.

However, if you ran into a hiccup or have suggestions or questions about our framework, you can always check our [docs](https://docs.zenml.io/) or our [GitHub](https://github.com/zenml-io/zenml), or, even better, join us on our [Slack channel](https://zenml.io/slack-invite).

Cheers!

For more detailed information on all the components and steps that went into this short example, please continue reading [our more detailed documentation pages](https://docs.zenml.io/).

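# Post-execution workflow

Once a pipeline has run, you can inspect what it produced from Python: the `Repository` class gives access to your pipelines, their runs, the steps of each run and the artifacts those steps output.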

In [12]:
from zenml.core.repo import Repository

## Get repo

In [13]:
repo = Repository()



## Pipelines 

In [14]:
pipelines = repo.get_pipelines()


In [15]:
# Use a new name so the `mnist_pipeline` function defined above is not shadowed
pipeline_view = pipelines[0]

In [16]:
runs = pipeline_view.get_runs()
run = runs[0]



In [17]:
steps = run.steps
steps



[StepView(id=1, name='importer_mnist', parameters={}),
 StepView(id=2, name='normalizer', parameters={}),
 StepView(id=3, name='trainer', parameters={'epochs': '1'}),
 StepView(id=4, name='evaluator', parameters={})]
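Indexing with `steps[3]` relies on the evaluator's position in the list. If you prefer to look a step up by name, a small helper like the hypothetical `find_step` below works on any objects that have a `name` attribute. The `StepView` repr above suggests such an attribute exists, but treat that as an assumption. The example uses stand-in objects so it runs without ZenML.

```python
from collections import namedtuple
from typing import Any, Iterable


def find_step(steps: Iterable[Any], name: str) -> Any:
    """Return the first step whose `name` attribute equals `name` (hypothetical helper)."""
    for candidate in steps:
        if candidate.name == name:
            return candidate
    raise KeyError(f"No step named {name!r}")


# Stand-ins that only mimic the `id` and `name` fields shown in the StepView repr above
FakeStep = namedtuple("FakeStep", ["id", "name"])
fake_steps = [FakeStep(1, "importer_mnist"), FakeStep(4, "evaluator")]
print(find_step(fake_steps, "evaluator"))  # FakeStep(id=4, name='evaluator')
```

In the notebook, `find_step(steps, "evaluator")` would then stand in for `steps[3]`.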

In [21]:
eval_step = steps[3]
eval_step

StepView(id=4, name='evaluator', parameters={})

In [22]:
evaluator_output = eval_step.outputs[0]
evaluator_output



ArtifactView(id=8, type='BaseArtifact', uri='/home/hamza/workspace/maiot/github_temp/zenml/.zen/local_store/evaluator/output/4', materializer='zenml.materializers.numpy_materializer.NumpyMaterializer@zenml_0.5.0rc2')

In [24]:
evaluator_output.read(np.ndarray)



array([0.305024  , 0.91329998])