# ZenML Quickstart Guide

<a href="https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

ZenML is an extensible, open-source MLOps framework for creating portable, 
production-ready machine learning pipelines. By decoupling infrastructure from 
code, ZenML enables developers across your organization to collaborate more 
effectively as they move from development to production.

![ZenML Overview](_assets/zenml_overview.png)

Let's see it in action and use ZenML to deploy an LLM into production. 
As an example, we will first train and deploy a simple LLM locally. 
Then we will switch the entire workflow to a production environment in the cloud 
that will automatically train the model on GPU-enabled hardware and deploy it to
a scalable Kubernetes cluster.

## 1. Install Requirements

Let's install ZenML to get started. 

You might notice that we also install some *integrations* and *hub plugins* here. 
These terms will be explained in more detail later, when they are used.

In [None]:
%pip install "zenml[server]" gradio  # install ZenML and Gradio
!zenml integration install pytorch mlflow -y  # install ZenML integrations
!zenml hub install mingpt_example mlflow_steps -y  # install ZenML Hub plugins
!zenml init  # Initialize a ZenML repository
%pip install pyngrok pyparsing==2.4.2  # required for Colab

import IPython

# automatically restart kernel
IPython.Application.instance().kernel.do_shutdown(restart=True)

Please wait for the installation to complete before running subsequent cells. At the end of the installation, the notebook kernel will automatically restart.

## 2. Train and Deploy an LLM Locally

TODO: motivate task, finetuning vs. prompt engineering?

We will use Andrej Karpathy's [minGPT](https://github.com/karpathy/minGPT/tree/master/mingpt),
which we previously installed from the [ZenML Hub TODO](TODO). The ZenML Hub is
a place so dark... TODO

Since training a multi-billion parameter model locally is impractical,
we will use the smallest GPT model, `gpt-nano`, for local training to validate
that our setup works. But don't worry, we will switch to a larger model later
when we can train on a GPU in the cloud.

To start, let's define a Python function to load the `gpt-nano` model and
decorate it with a `@step` decorator. This is enough to turn the function into a
ZenML step which can now be executed on any infrastructure, as we will see later.

In [None]:
from zenml.hub.mingpt_example.mingpt.model import GPT
from zenml import step

@step
def mingpt_model_loader_step(model_type="gpt-nano") -> GPT:
    model_config = GPT.get_default_config()
    model_config.model_type = model_type
    model_config.vocab_size = 50257  # openai's model vocabulary
    model_config.block_size = 1024  # openai's model block_size
    model = GPT(model_config)
    return model

Note that we do not need to save the model within the step explicitly; 
ZenML takes care of this for us automatically. 
Under the hood, ZenML persists all step inputs and outputs in an 
[Artifact Store](https://docs.zenml.io/component-gallery/artifact-stores). 
This also means that all of our data and models are automatically versioned and 
tracked.
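
To build some intuition for what this means, here is a minimal sketch of a decorator that persists a function's return value to a local folder and versions it by content hash. It is purely illustrative and not how ZenML is implemented: `toy_step`, `ARTIFACT_DIR`, and `load_config` are hypothetical names, and pickling into a temporary directory merely stands in for a real Artifact Store.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical, simplified stand-in for an artifact store: a local directory.
ARTIFACT_DIR = Path(tempfile.mkdtemp(prefix="toy_artifact_store_"))


def toy_step(func):
    """Run `func` and persist its return value as a versioned artifact."""

    def wrapper(*args, **kwargs):
        output = func(*args, **kwargs)
        payload = pickle.dumps(output)
        # The content hash serves as the version id of this output.
        version = hashlib.sha256(payload).hexdigest()[:12]
        path = ARTIFACT_DIR / func.__name__ / f"{version}.pkl"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)
        return output, path

    return wrapper


@toy_step
def load_config(model_type="gpt-nano"):
    return {"model_type": model_type, "vocab_size": 50257, "block_size": 1024}


# The step's output is saved without any explicit save call in `load_config`.
config, artifact_path = load_config()
restored = pickle.loads(artifact_path.read_bytes())
```

Unlike this toy version, a ZenML step returns only its output; the storage location is tracked for you behind the scenes.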

The `mingpt_example` plugin also contains a PyTorch dataset that can load the 
text of any website as lists of word embeddings to be used for language model 
training. Let us now define a ZenML step that uses this dataset to load a
website of our choice for LLM training.

For this example, we will use the BBC website but feel free to change it to a
website of your choice!

In [None]:
from zenml.hub.mingpt_example.dataset import UrlTokenDataset


@step
def url_dataset_loader_step(urls=["https://www.bbc.com/"]) -> UrlTokenDataset:
    return UrlTokenDataset(urls=urls)
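
If you are curious what a language-modeling dataset of this kind does conceptually, the following sketch may help. It is not the `UrlTokenDataset` implementation: `ToyTokenDataset` is a hypothetical class that works on a plain string instead of a website, uses a naive word-level vocabulary, and returns Python lists instead of tensors. It only illustrates how text can be turned into input/target pairs for next-token prediction.

```python
class ToyTokenDataset:
    """Word-level next-token dataset over a plain string (illustrative only)."""

    def __init__(self, text, block_size):
        words = text.split()
        # Map each distinct word to an integer id, in sorted order.
        self.vocab = {word: idx for idx, word in enumerate(sorted(set(words)))}
        self.tokens = [self.vocab[word] for word in words]
        self.block_size = block_size

    def __len__(self):
        # Each sample needs block_size inputs plus one extra token for the target.
        return max(0, len(self.tokens) - self.block_size)

    def __getitem__(self, idx):
        chunk = self.tokens[idx : idx + self.block_size + 1]
        # Inputs are the first block_size tokens; targets are shifted by one.
        return chunk[:-1], chunk[1:]


dataset = ToyTokenDataset("the cat sat on the mat", block_size=3)
x, y = dataset[0]  # x holds three token ids, y the same ids shifted by one
```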

Next, let's define a step that uses the `mingpt` trainer class to train the 
`gpt-nano` model on our dataset and logs the model and metrics to MLflow:

In [None]:
import mlflow
from torch.nn import Module
from torch.utils.data import Dataset

from zenml.hub.mingpt_example.mingpt.trainer import Trainer


@step(experiment_tracker="mlflow_tracker")
def mingpt_trainer_step(
    dataset: Dataset, 
    model: Module,
    max_iters=2000,
    learning_rate=5e-4,
) -> Module:
    
    train_config = Trainer.get_default_config()
    train_config.learning_rate = learning_rate
    train_config.max_iters = max_iters
    train_config.device = "mps"  # TODO
    trainer = Trainer(train_config, model, dataset)

    def batch_end_callback(trainer):
        if trainer.iter_num % 100 == 0:
            mlflow.log_metric(
                "train_loss", trainer.loss.item(), step=trainer.iter_num
            )
            print(
                f"iter_dt {trainer.iter_dt * 1000:.2f}ms; "
                f"iter {trainer.iter_num}: train loss {trainer.loss.item():.5f}"
            )
    trainer.set_callback("on_batch_end", batch_end_callback)

    trainer.run()

    mlflow.pytorch.log_model(model, "model")

    return model
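
The `batch_end_callback` above relies on the trainer invoking a registered function after every batch. As a rough sketch of that callback pattern, assuming nothing about the real `Trainer` internals, the hypothetical `ToyTrainer` below fakes the loss computation and collects the logged values in a plain list instead of sending them to MLflow.

```python
from collections import defaultdict


class ToyTrainer:
    """Bare-bones training loop with named callback hooks (illustrative only)."""

    def __init__(self, max_iters):
        self.max_iters = max_iters
        self.iter_num = 0
        self.loss = None
        self.callbacks = defaultdict(list)

    def set_callback(self, event, callback):
        # Replace any callbacks previously registered for this event.
        self.callbacks[event] = [callback]

    def run(self):
        while self.iter_num < self.max_iters:
            # Stand-in for a real forward/backward pass: a decaying fake loss.
            self.loss = 1.0 / (self.iter_num + 1)
            for callback in self.callbacks["on_batch_end"]:
                callback(self)
            self.iter_num += 1


logged = []


def batch_end_callback(trainer):
    # Record the loss only every 100 iterations, as in the step above.
    if trainer.iter_num % 100 == 0:
        logged.append((trainer.iter_num, trainer.loss))


trainer = ToyTrainer(max_iters=250)
trainer.set_callback("on_batch_end", batch_end_callback)
trainer.run()
```

Logging only every 100th iteration keeps the metric history small while still showing the overall loss curve.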

Next, let's add a step that deploys our model as a prediction service using MLflow.
ZenML's MLflow integration already contains a step for this, which we can simply import and use:

In [None]:
from zenml.integrations.mlflow.steps.mlflow_deployer import mlflow_model_deployer_step

We can now combine all these steps into a ZenML Pipeline by defining a simple
Python function decorated with ZenML's `@pipeline` decorator.

In [None]:
from zenml import pipeline

@pipeline(enable_cache=False)
def training_pipeline():
    """Train, evaluate, and deploy a model."""
    dataset = url_dataset_loader_step()
    model = mingpt_model_loader_step()
    model = mingpt_trainer_step(dataset, model)
    mlflow_model_deployer_step(model=model)

To run this pipeline, we will first need to set up MLflow.

The `zenml integration install mlflow` command that we ran at the beginning of
the notebook already installed MLflow for us together with ZenML's [MLflow integration TODO](TODO),
which contains an MLflow experiment tracker, model deployer, as well as a model
registry.

Similarly to how we wrapped our code in a ZenML pipeline before, we will now
set up an MLflow experiment tracker and model deployer as part of a [ZenML Stack TODO](),
which will allow us to decouple our infrastructure and tools (MLflow in this case)
from the code that we are running.
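
The idea of decoupling code from tooling can be illustrated with a small sketch. This is a conceptual toy, not ZenML's stack implementation: `ToyStack`, `STACKS`, and `train` are hypothetical names, and the "experiment trackers" are simple functions that append to lists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyStack:
    """Named bundle of interchangeable components (illustrative only)."""

    name: str
    experiment_tracker: Callable[[str, float], None]


local_log: List[str] = []
remote_log: List[str] = []

STACKS: Dict[str, ToyStack] = {
    "local": ToyStack("local", lambda key, value: local_log.append(f"{key}={value}")),
    "cloud": ToyStack("cloud", lambda key, value: remote_log.append(f"{key}={value}")),
}


def train(stack: ToyStack) -> None:
    # The training code only talks to the stack's interface, never to a specific tool.
    stack.experiment_tracker("train_loss", 0.5)


# The same training code runs unchanged on either stack.
train(STACKS["local"])
train(STACKS["cloud"])
```

Because `train` never names a concrete tool, swapping the tracker means selecting a different stack rather than editing the training code.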

Usually the easiest way to register new stacks is by using the ZenML dashboard,
but we can also do it programmatically via the CLI:

In [None]:
# Register the MLflow experiment tracker
!zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Register the MLflow model registry
!zenml model-registry register mlflow_registry --flavor=mlflow

# Register the MLflow model deployer
!zenml model-deployer register mlflow_deployer --flavor=mlflow

# Register a new stack with the new stack components
!zenml stack register quickstart_stack -a default \
                                       -o default \
                                       -d mlflow_deployer \
                                       -e mlflow_tracker \
                                       -r mlflow_registry

Now that we have set up our stack, we can run arbitrary pipelines on it. Let's
try it out and run the LLM training pipeline we defined earlier:

In [None]:
!zenml stack set quickstart_stack

In [None]:
training_pipeline()

## 3. Inspect Run Metadata and Lineage

ZenML automatically tracks the metadata of all runs, and it saves and versions
all datasets and models on disk. Let's open the ZenML dashboard and check it
out!

In [None]:
!zenml up

In [None]:
import zenml

zenml.show()

This will spin up a local ZenML server and connect you to it. 
You can log in with username `default` and an empty password.

![ZenML Server Up](_assets/zenml-up.gif)

Go to the "Runs" tab and click on your run. You should now be able to see a
detailed lineage graph of your run. Try clicking on some of the steps or
artifacts to explore all the metadata ZenML tracks for you!

## 4. Interact with the Deployed Model

As we can see, the last output of our run was a model prediction service that
is now running in the background and waiting for requests.

You can run `zenml model-deployer models list` to get an overview of all 
currently deployed models:

In [None]:
!zenml model-deployer models list

To find the exact URL where the deployed model is reachable, we can use ZenML's 
post-execution utility code:

In [None]:
pipeline_run = training_pipeline.get_runs()[0]
deployer_step = pipeline_run.get_step("mlflow_model_deployer_step")
deployed_model_url = deployer_step.metadata["deployed_model_url"].value
print(deployed_model_url)

In practice, you can now simply send REST requests to this model from your 
website or app.
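
As a rough sketch of what such a request could look like, the snippet below builds a JSON `POST` request with Python's standard library. The payload shape `{"inputs": ...}`, the example URL, and the helper name `build_prediction_request` are assumptions for illustration; the exact format your prediction service expects depends on the MLflow version and the model's signature, so treat this as a starting point rather than a reference.

```python
import json
import urllib.request


def build_prediction_request(url, inputs):
    """Build a JSON POST request for a model prediction endpoint."""
    # Assumption: the server accepts a JSON body of the form {"inputs": ...}.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical URL; in the notebook you would pass `deployed_model_url` instead.
request = build_prediction_request("http://127.0.0.1:8000/invocations", [[1, 2, 3]])

# Sending the request requires the prediction service to be running:
# with urllib.request.urlopen(request) as response:
#     prediction = json.loads(response.read())
```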

To demonstrate this, we have built a simple frontend for our model using
[gradio](https://gradio.app/). The code details are not too important at this
point, but feel free to check out `utils/frontend.py` if you're interested.

In [None]:
from utils.frontend import QuestionAnsweringFrontend

QuestionAnsweringFrontend(deployed_model_url=deployed_model_url).launch()

## 5. Train GPT-XL on a Remote Stack

After playing with the frontend for a bit, you might notice that the overall
question answering performance of our app is still quite poor. This is not
surprising since we have used a fairly small model that wasn't even pretrained.

Let's change this and use a pretrained version of the largest minGPT model, 
`gpt2-xl`, instead:

In [None]:
@step
def pretrained_gpt_xl_loader_step() -> GPT:
    return GPT.from_pretrained("gpt2-xl")

To make our model service available to users around the world we will 
also need to deploy it in a highly scalable cloud environment instead of 
our local machine.

Fortunately, ZenML's [Seldon TODO]() integration also contains a model deployment step that can do just that. 

Let's import that step and define the corresponding pipeline:

In [None]:
from zenml.integrations.seldon.steps.seldon_deployer import seldon_model_deployer_step
from zenml.hub.mingpt_example.steps import url_dataset_loader_step, pretrained_gpt_xl_loader_step, mingpt_trainer_step

@pipeline(enable_cache=False)
def gpt_xl_training_pipeline():
    """Train, evaluate, and deploy a model."""
    dataset = url_dataset_loader_step()
    model = pretrained_gpt_xl_loader_step()
    model = mingpt_trainer_step(dataset, model)
    seldon_model_deployer_step(model=model, deployment_decision=True)

To run this pipeline, we will need a fairly sophisticated infrastructure setup
that contains a Kubernetes cluster to host Seldon as well as a GPU-enabled
environment to train GPT-XL.

Furthermore, we might also want to run our other pipeline steps in a more
scalable cloud environment, for which we would need an orchestration tool like 
[Kubeflow TODO]() as well as a cloud storage bucket where we can save our 
datasets, models and other artifacts.

To summarize, we need to provision an environment that:
- runs fully in the cloud
- has access to a GPU for our model trainer step
- contains a Kubernetes cluster with Seldon and Kubeflow installed in it
- contains a cloud storage bucket

TODO: stack visualization

Setting up an environment like this is usually not trivial. Fortunately, ZenML
provides a lot of utility tools to set up all the infrastructure you need:

In [None]:
# TODO: show how to deploy this stack with ZenML (@Jayesh ?)

Alternatively, if you do not have a cloud project of your own where you can set 
this up, you can use the [ZenML Sandbox TODO](TODO) to
temporarily provision this infrastructure stack, free of charge.

In [None]:
# TODO: connect to sandbox (@Safoine ?)

Since ZenML pipelines can run on any stack, it is quite easy to run our new
training pipeline in this cloud environment now:

In [None]:
!zenml stack set ...

In [None]:
gpt_xl_training_pipeline()

Since the pipeline is now running in the cloud and using a much larger model,
it will take a few minutes until the run is complete.

To keep track of your run, open the ZenML dashboard again and navigate to the
detail page of your run where you can see a status indicator that shows you at 
a glance whether the run is finished or not.

TODO: screenshot

In [None]:
zenml.show()

Once the run is complete, we will again have our model deployed as a prediction
service to which we can send REST API requests. However, this time the service
is deployed in a highly scalable cloud environment, is reachable from anywhere
around the world, and the quality of its answers should also be much better:

In [None]:
deployed_model_url = None  # TODO
QuestionAnsweringFrontend(deployed_model_url=deployed_model_url).launch()

## Congratulations!

You just built your first ML Pipeline! You not only trained a model, you also deployed it, served it, and learned how to monitor and visualize everything that's going on. Did you notice how easy it was to bring all of the different components together using ZenML's abstractions? And that is just the tip of the iceberg of what ZenML can do; check out the [**Integrations**](https://zenml.io/integrations) page for a list of all the cool MLOps tools that ZenML supports!

## Where to go next

* If you have questions or feedback... 
  * Join our [**Slack Community**](https://zenml.io/slack-invite) and become part of the ZenML family!
* If this quickstart was a bit too quick for you... 
  * Check out [**ZenBytes**](https://github.com/zenml-io/zenbytes), our lesson series on practical MLOps, where we cover each MLOps concept in much more detail.
* If you want to learn more about using or extending ZenML...
  * Check out our [**Docs**](https://docs.zenml.io/) or read through our code on [**GitHub**](https://github.com/zenml-io/zenml).
* If you want to quickly learn how to use a specific tool with ZenML...
  * Check out our collection of [**Examples**](https://github.com/zenml-io/zenml/tree/doc/hamza-misc-updates/examples).
* If you want to see some advanced ZenML use cases... 
  * Check out [**ZenML Projects**](https://github.com/zenml-io/zenml-projects), our collection of production-grade ML use-cases.