<a href="https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/Quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ZenML Quickstart Guide

Our goal here is to give you some first practical experience with ZenML and a brief overview of some of its basic functionality.

The quickest way to get started is to create a simple pipeline. We'll use the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset of handwritten digits (originally developed by Yann LeCun and others), and later the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset developed by Zalando.

If you want to run this notebook in an interactive environment, you can run it in Google Colab.

## Purpose

This quickstart guide is designed to provide a practical introduction to some of the main concepts and paradigms used by the ZenML framework. If you want more detail, our [full documentation](https://docs.zenml.io/) provides more on the concepts and how to implement them.

## Using Google Colab

You will want to use a GPU for this example. If you are following this quickstart in Google's Colab, follow these steps:

- Before running anything, you need to tell Colab that you want to use a GPU. You can do this by clicking on the ‘Runtime’ tab and selecting ‘Change runtime type’. A pop-up window will open up with a drop-down menu.
- Select ‘GPU’ from the menu and click ‘Save’.
- It may ask if you want to restart the runtime. If so, go ahead and do that.
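
To confirm that the runtime change took effect, a check along these lines can help. This is a minimal sketch and not part of the original notebook: `count_visible_gpus` is a hypothetical helper name, and it relies on TensorFlow's `tf.config.list_physical_devices`. It reports 0 if TensorFlow cannot be imported yet.

```python
def count_visible_gpus() -> int:
    """Return the number of GPUs TensorFlow can see (0 if TensorFlow is missing)."""
    try:
        # Imported lazily so the check also works before TensorFlow is installed.
        import tensorflow as tf
    except ImportError:
        return 0
    return len(tf.config.list_physical_devices("GPU"))


print(f"Visible GPUs: {count_visible_gpus()}")
```

A result of 0 on a Colab GPU runtime usually means the runtime type was not saved or the runtime was not restarted.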

<!-- The code for the MNIST training borrows heavily from [this](https://www.tensorflow.org/datasets/keras_example) -->

## Install libraries

In [None]:
# Install the ZenML CLI tool and Tensorflow
!pip install zenml=="0.5.0rc2" tensorflow

Collecting zenml==0.5.0rc2
  Downloading zenml-0.5.0rc2-py3-none-any.whl (121 kB)
Collecting pydantic<2.0.0,>=1.8.2
  Downloading pydantic-1.8.2-cp37-cp37m-manylinux2014_x86_64.whl (10.1 MB)
Collecting click<9.0.0,>=8.0.1
  Downloading click-8.0.1-py3-none-any.whl (97 kB)
Collecting analytics-python<2.0.0,>=1.4.0
  Downloading analytics_python-1.4.0-py2.py3-none-any.whl (15 kB)
Collecting ml-pipelines-sdk<2.0.0,>=1.2.0
  Downloading ml_pipelines_sdk-1.2.0-py3-none-any.whl (1.2 MB)
Collecting apache-beam<3.0.0,>=2.30.0
  Downloading apache_beam-2.32.0-cp37-cp37m-manylinux2010_x86_64.whl (9.8 MB)
Collecting gitpython<4.0.0,>=3.1.18
  Downloading GitPython-3.1.24-py3-none-a

Once the installation has completed, you can create your first ZenML repository for your project. Because ZenML repositories are built on top of Git repositories, first initialize Git in an empty directory of your choice and then initialize ZenML:

In [None]:
# Initialize a git repository
!git init

# Initialize ZenML's .zen directory
!zenml init

Initialized empty Git repository in /content/.git/
2021-10-05 09:50:54,066 — zenml.steps.base_step — DEBUG — Registering class BaseStep, bases: (), dct: {'__module__': 'zenml.steps.base_step', '__qualname__': 'BaseStep', '__doc__': 'The base implementation of a ZenML Step which will be inherited by all\n    the other step implementations', '__init__': <function BaseStep.__init__ at 0x7fac84d86f80>, 'component': <property object at 0x7fac84c77890>, '__call__': <function BaseStep.__call__ at 0x7fac84d91440>, '__getattr__': <function BaseStep.__getattr__ at 0x7fac84c7d050>, 'process': <function BaseStep.process at 0x7fac84c7d0e0>}
2021-10-05 09:50:54,066 — zenml.steps.base_step — DEBUG — BaseStep args: ['self']
2021-10-05 09:50:54,152 — zenml.orchestrators — DEBUG — Airflow not installed.
2021-10-05 09:50:54,676 — zenml.logger — DEBUG — Logging set to level: DEBUG
2021-10-05 09:50:54,676 — zenml.logger — DEBUG — Logging set to level: DEBUG
2021-10-05 09:50:54,676 — zenml.logger — DEBUG — 

The setup is now complete. For the next steps, make sure that you execute the code from within your ZenML repository.
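
One way to check that you are inside the repository is to look for the directories the two `init` commands create. The sketch below is a heuristic, not a ZenML API: `looks_like_zenml_repo` is a hypothetical helper, and the `.zen` directory name is an assumption taken from paths such as `/content/.zen/zenservice.json` in the log output of this guide.

```python
from pathlib import Path


def looks_like_zenml_repo(path: Path) -> bool:
    """Heuristic: both `git init` and `zenml init` left their directories here."""
    return (path / ".git").is_dir() and (path / ".zen").is_dir()


print(looks_like_zenml_repo(Path.cwd()))
```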

## Import relevant packages

We will use pipelines and steps to train our model.

In [None]:
from typing import List

import numpy as np
import tensorflow as tf

from zenml.annotations import Input, Output, Step
from zenml.artifacts import DataArtifact, ModelArtifact
from zenml.pipelines import pipeline
from zenml.steps import step

2021-10-05 09:51:03,284 — zenml.steps.base_step — DEBUG — Registering class BaseStep, bases: (), dct: {'__module__': 'zenml.steps.base_step', '__qualname__': 'BaseStep', '__doc__': 'The base implementation of a ZenML Step which will be inherited by all\n    the other step implementations', '__init__': <function BaseStep.__init__ at 0x7fcea59a7dd0>, 'component': <property object at 0x7fcea5848ef0>, '__call__': <function BaseStep.__call__ at 0x7fcea59b1290>, '__getattr__': <function BaseStep.__getattr__ at 0x7fcea5858e60>, 'process': <function BaseStep.process at 0x7fcea5858ef0>}
2021-10-05 09:51:03,286 — zenml.steps.base_step — DEBUG — BaseStep args: ['self']
2021-10-05 09:51:03,388 — zenml.orchestrators — DEBUG — Airflow not installed.
2021-10-05 09:51:03,714 — zenml.logger — DEBUG — Logging set to level: DEBUG


## Define ZenML Steps

In the code that follows, you can see that we are defining the various steps of our pipeline. Each step is decorated with `@step`, the main abstraction that is currently available for creating pipeline steps.

The first step is an `import` step that downloads the MNIST dataset and samples the first hundred rows for demo purposes.

In [None]:
@step(name="import_basic_mnist")
def ImportDataStep() -> List[float]:
    """Download the MNIST data and store it as an artifact"""
    (X_train, y_train), (
        X_test,
        y_test,
    ) = tf.keras.datasets.mnist.load_data()
    return [
        X_train.tolist()[0:100],
        y_train.tolist()[0:100],
        X_test.tolist()[0:100],
        y_test.tolist()[0:100],
    ]

2021-10-05 09:51:07,302 — zenml.steps.base_step — DEBUG — Registering class import_basic_mnist, bases: (<class 'zenml.steps.base_step.BaseStep'>,), dct: {'process': <staticmethod object at 0x7fcea536b310>}
2021-10-05 09:51:07,306 — zenml.steps.base_step — DEBUG — import_basic_mnist args: []
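
The debug output above (`Registering class import_basic_mnist, bases: (BaseStep,), dct: {'process': <staticmethod ...>}`) suggests that the decorator turns the function into a class that derives from `BaseStep` and stores the function as a static `process` method. The following toy sketch illustrates that general pattern. It is not ZenML's actual implementation: the `BaseStep` and `step` defined here are simplified stand-ins, and `DoubleValuesStep` is a made-up example.

```python
from typing import Callable, List


class BaseStep:
    """Stand-in base class; the real ZenML BaseStep does much more."""

    def __call__(self, *args, **kwargs):
        # Delegate to the wrapped function stored as a static method.
        return self.process(*args, **kwargs)


def step(name: str) -> Callable:
    """Toy decorator: build a BaseStep subclass called `name` around a function."""

    def decorator(func: Callable) -> type:
        return type(name, (BaseStep,), {"process": staticmethod(func)})

    return decorator


@step(name="double_values")
def DoubleValuesStep(values: List[float]) -> List[float]:
    return [2 * v for v in values]


double = DoubleValuesStep()    # instantiate the generated class
print(type(double).__name__)   # double_values
print(double([1.0, 2.5]))      # [2.0, 5.0]
```

This also explains why the decorated names are later called like classes, as in `ImportDataStep()`.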


Secondly, we normalize all images.

In [None]:
@step(name="normalize")
def NormalizeDataStep(data: Input[DataArtifact]) -> List[float]:
    """Normalize the values for all the images so they are between 0 and 1"""
    import_data = data.materializers.json.read_file()
    X_train_normed = np.array(import_data[0]) / 255.0
    X_test_normed = np.array(import_data[2]) / 255.0
    return [
        X_train_normed.tolist(),
        import_data[1],
        X_test_normed.tolist(),
        import_data[3],
    ]

2021-10-05 09:51:08,895 — zenml.steps.base_step — DEBUG — Registering class normalize, bases: (<class 'zenml.steps.base_step.BaseStep'>,), dct: {'process': <staticmethod object at 0x7fcea4ea7250>}
2021-10-05 09:51:08,898 — zenml.steps.base_step — DEBUG — normalize args: ['data']
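
To see what the normalization does in isolation, here is a small sketch that runs the same arithmetic outside of ZenML. The data is invented for illustration: it only mimics the four-element layout `[X_train, y_train, X_test, y_test]` that the import step returns, using a single 2x2 "image" per split.

```python
import numpy as np

# Tiny stand-in for the list returned by the import step:
# [X_train, y_train, X_test, y_test]. The pixel values are made up.
import_data = [
    [[[0, 255], [51, 102]]],   # one 2x2 "training image"
    [7],                       # its label
    [[[255, 0], [204, 153]]],  # one 2x2 "test image"
    [3],                       # its label
]

# Pixel intensities range from 0 to 255, so dividing by 255 maps them to [0, 1].
X_train_normed = np.array(import_data[0]) / 255.0
X_test_normed = np.array(import_data[2]) / 255.0

print(X_train_normed.shape)                        # (1, 2, 2)
print(X_train_normed.min(), X_train_normed.max())  # 0.0 1.0
```

The labels at positions 1 and 3 are passed through unchanged, which is why the step only touches positions 0 and 2.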


We then add a `Trainer` step that takes the normalized data and trains a Keras classifier on it. Note that the `Output[ModelArtifact]` annotation is what lets the step write the model out to our artifact store.

In [None]:
@step(name="trainer")
def MNISTTrainModelStep(
    data: Input[DataArtifact],
    model_artifact: Output[ModelArtifact],
    epochs: int,
):
    """Train a neural net from scratch to recognise MNIST digits and write
    the trained model to the model artifact"""
    import_data = data.materializers.json.read_file()

    model = tf.keras.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(10, activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    model.fit(
        import_data[0],
        import_data[1],
        epochs=epochs,
    )

    # write model
    model_artifact.materializers.keras.write_model(model)


2021-10-05 09:51:12,545 — zenml.steps.base_step — DEBUG — Registering class trainer, bases: (<class 'zenml.steps.base_step.BaseStep'>,), dct: {'process': <staticmethod object at 0x7fcead86bf50>}
2021-10-05 09:51:12,547 — zenml.steps.base_step — DEBUG — trainer args: ['data', 'model_artifact', 'epochs']


Finally, we add an `Evaluator` step that takes as input the test set and the trained model and evaluates some final metrics.

In [None]:
@step(name="evaluate")
def EvaluateModelStep(
    data: Input[DataArtifact], model_artifact: Input[ModelArtifact]
) -> List[float]:
    """Evaluate the trained model on the test set and return its loss and accuracy"""
    model = model_artifact.materializers.keras.read_model()
    import_data = data.materializers.json.read_file()

    test_loss, test_acc = model.evaluate(
        import_data[2], import_data[3], verbose=2
    )
    return [test_loss, test_acc]

2021-10-05 09:51:15,271 — zenml.steps.base_step — DEBUG — Registering class evaluate, bases: (<class 'zenml.steps.base_step.BaseStep'>,), dct: {'process': <staticmethod object at 0x7fcea534fa50>}
2021-10-05 09:51:15,272 — zenml.steps.base_step — DEBUG — evaluate args: ['data', 'model_artifact']
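
The two numbers returned by the evaluator are the test loss and the test accuracy. As a rough illustration of what they measure, the sketch below computes sparse categorical cross-entropy from raw logits and the argmax accuracy in plain Python. Keras does this internally; `evaluate_logits` is a hypothetical helper and the logits are invented, so treat it as a sketch of the definitions rather than Keras's implementation.

```python
import math
from typing import List, Tuple


def evaluate_logits(logits: List[List[float]], labels: List[int]) -> Tuple[float, float]:
    """Return (mean cross-entropy loss, accuracy) for raw logits and integer labels."""
    total_loss = 0.0
    correct = 0
    for row, label in zip(logits, labels):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total_loss += log_norm - row[label]           # equals -log softmax(row)[label]
        correct += int(row.index(max(row)) == label)  # prediction is the argmax
    return total_loss / len(labels), correct / len(labels)


loss, acc = evaluate_logits(
    logits=[[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]],
    labels=[0, 2, 0],
)
print(f"loss={loss:.4f} accuracy={acc:.4f}")
```

In this made-up batch, two of the three predictions match their labels, so the accuracy is 2/3.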


## Define ZenML Pipeline

A pipeline is defined with the `@pipeline` decorator. The decorated function declares the steps of the pipeline and specifies the dependencies between them, thereby determining the order in which they will be run.

In [None]:
# Define the pipeline

@pipeline("mnist")
def MNISTTrainingPipeline(
    import_data: Step[ImportDataStep],
    normalize_data: Step[NormalizeDataStep],
    trainer: Step[MNISTTrainModelStep],
    evaluator: Step[EvaluateModelStep],
):
    # Link all the steps' artifacts together
    normalize_data(data=import_data.outputs.return_output)
    trainer(data=normalize_data.outputs.return_output)
    evaluator(
        data=normalize_data.outputs.return_output,
        model_artifact=trainer.outputs.model_artifact,
    )
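
To make the idea of "dependencies determine the run order" concrete, here is a small sketch using Python's standard-library `graphlib` (Python 3.9+). It is only an illustration of the principle and makes no claim about how ZenML's orchestrator works internally. The dictionary mirrors the links made inside `MNISTTrainingPipeline`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "import_data": set(),
    "normalize_data": {"import_data"},
    "trainer": {"normalize_data"},
    "evaluator": {"normalize_data", "trainer"},
}

# A topological order runs every step only after all of its inputs exist.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
# ['import_data', 'normalize_data', 'trainer', 'evaluator']
```

Because the evaluator needs both the normalized data and the trained model, it can only run last.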

## Initialise a Pipeline Run

Here we initialise a run of our `MNISTTrainingPipeline`, passing in an instance of each step. The import step determines which dataset is downloaded; in our case this is the MNIST digits dataset. The trainer step is configured to train for 10 epochs.

In [None]:
# Initialise the pipeline
mnist_trainer = MNISTTrainingPipeline(
    import_data=ImportDataStep(),
    normalize_data=NormalizeDataStep(),
    trainer=MNISTTrainModelStep(epochs=10),
    evaluator=EvaluateModelStep(),
)

2021-10-05 09:51:24,816 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/zenservice.json
2021-10-05 09:51:24,822 — zenml.core.utils — DEBUG — Parsing file: /root/.config/zenml/.zenglobal.json
2021-10-05 09:51:24,825 — zenml.core.local_service — DEBUG — Fetching stack with key local_stack


## Run the Pipeline

Running the pipeline is as simple as calling the `run()` method on the defined pipeline.

In [None]:
# Run the pipeline
mnist_trainer.run()

2021-10-05 09:51:28,029 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/zenservice.json
2021-10-05 09:51:28,036 — zenml.core.local_service — DEBUG — Fetching orchestrator with key local_orchestrator
2021-10-05 09:51:28,037 — zenml.utils.source_utils — DEBUG — Unpinned step found with no git sha. Attempting to load class from current repository state.
2021-10-05 09:51:28,042 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/orchestrators/ede3cd3a-ccd4-4da7-9806-9d374f275bdb.json
2021-10-05 09:51:28,047 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/zenservice.json
2021-10-05 09:51:28,053 — zenml.core.local_service — DEBUG — Fetching artifact_store with key local_artifact_store
2021-10-05 09:51:28,054 — zenml.utils.source_utils — DEBUG — Unpinned step found with no git sha. Attempting to load class from current repository state.
2021-10-05 09:51:28,057 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/artifact_stores/e7f1dfad-0fcb-418c-80f3-dc663bb4cc29.j

2021-10-05 09:51:36,032 — tensorflow — INFO — Assets written to: /content/.zen/local_store/trainer/model_artifact/3/assets


4/4 - 0s - loss: 1.9763 - accuracy: 0.3200


## From MNIST to Fashion MNIST

The pipeline now runs end to end on MNIST. The test accuracy of about 0.32 is modest, which is unsurprising given that we only train on 100 samples. Next, let's see how a similar training pipeline would work on a different dataset.

Switching datasets only requires replacing the data import step; the rest of the pipeline stays the same.

In [None]:
# Define a new, modified import data step to download the Fashion MNIST dataset
@step(name="import_fashion_mnist")
def ImportDataStep() -> List[float]:
    """Download the Fashion MNIST data and store it as an artifact"""
    (X_train, y_train), (
        X_test,
        y_test,
    ) = tf.keras.datasets.fashion_mnist.load_data()  # CHANGING to fashion
    return [
        X_train.tolist()[0:100],
        y_train.tolist()[0:100],
        X_test.tolist()[0:100],
        y_test.tolist()[0:100],
    ]

2021-10-05 09:51:41,588 — zenml.steps.base_step — DEBUG — Registering class import_fashion_mnist, bases: (<class 'zenml.steps.base_step.BaseStep'>,), dct: {'process': <staticmethod object at 0x7fce9cbd74d0>}
2021-10-05 09:51:41,590 — zenml.steps.base_step — DEBUG — import_fashion_mnist args: []


In [None]:
# Initialise a new pipeline
fashion_mnist_trainer = MNISTTrainingPipeline(
    import_data=ImportDataStep(),
    normalize_data=NormalizeDataStep(),
    trainer=MNISTTrainModelStep(epochs=10),
    evaluator=EvaluateModelStep(),
)

2021-10-05 09:51:46,493 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/zenservice.json
2021-10-05 09:51:46,503 — zenml.core.utils — DEBUG — Parsing file: /root/.config/zenml/.zenglobal.json
2021-10-05 09:51:46,508 — zenml.core.local_service — DEBUG — Fetching stack with key local_stack


In [None]:
# Run the new pipeline

fashion_mnist_trainer.run()

2021-10-05 09:51:51,030 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/zenservice.json
2021-10-05 09:51:51,036 — zenml.core.local_service — DEBUG — Fetching orchestrator with key local_orchestrator
2021-10-05 09:51:51,039 — zenml.utils.source_utils — DEBUG — Unpinned step found with no git sha. Attempting to load class from current repository state.
2021-10-05 09:51:51,042 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/orchestrators/ede3cd3a-ccd4-4da7-9806-9d374f275bdb.json
2021-10-05 09:51:51,047 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/zenservice.json
2021-10-05 09:51:51,051 — zenml.core.local_service — DEBUG — Fetching artifact_store with key local_artifact_store
2021-10-05 09:51:51,055 — zenml.utils.source_utils — DEBUG — Unpinned step found with no git sha. Attempting to load class from current repository state.
2021-10-05 09:51:51,058 — zenml.core.utils — DEBUG — Parsing file: /content/.zen/artifact_stores/e7f1dfad-0fcb-418c-80f3-dc663bb4cc29.j

2021-10-05 09:51:58,158 — tensorflow — INFO — Assets written to: /content/.zen/local_store/trainer/model_artifact/7/assets


4/4 - 0s - loss: 1.6853 - accuracy: 0.3300
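
The swap works because the rest of the pipeline depends only on what the import step returns, not on where the data comes from. The plain-Python sketch below shows the same design idea with ordinary functions. Everything in it is hypothetical: `run_pipeline`, `import_dataset_a` and `import_dataset_b` are made-up names, and the values are placeholders rather than real pixel data.

```python
from typing import Callable, List


def run_pipeline(import_step: Callable[[], List[float]]) -> float:
    """Toy pipeline: import values, scale them to [0, 1], return their mean."""
    raw = import_step()
    normalized = [v / 255.0 for v in raw]
    return sum(normalized) / len(normalized)


def import_dataset_a() -> List[float]:
    return [0.0, 255.0]          # placeholder values, not real MNIST pixels


def import_dataset_b() -> List[float]:
    return [51.0, 102.0, 153.0]  # placeholder values, not real Fashion MNIST pixels


# Only the import step changes; the rest of the pipeline is reused as-is.
print(run_pipeline(import_dataset_a))
print(run_pipeline(import_dataset_b))
```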


… and that's it for the quickstart. If you got here without a hiccup, you have successfully installed ZenML, set up a ZenML repo, configured a training pipeline, executed it and evaluated the results. And this is just the tip of the iceberg of what ZenML can do.

However, if you did hit a hiccup, or you have suggestions or questions regarding our framework, you can always check our docs or our GitHub or, even better, join us on our Slack channel.

Cheers!

For more detailed information on all the components and steps that went into this short example, please continue reading [our more detailed documentation pages](https://docs.zenml.io/).