5 changes: 3 additions & 2 deletions docs/book/developer-guide/fetching-historic-runs.md
@@ -3,7 +3,8 @@ description: Interact with Past Runs inside a Step
---

# Fetching historic runs
-### The need to fetch historic runs
+
+## The need to fetch historic runs

Sometimes, it is necessary to fetch information from previous runs in order to make a decision within a currently
executing step. Examples of this:
@@ -12,7 +13,7 @@ executing step. Examples of this:
* Fetching a model out of a list of trained models.
* Fetching the latest model produced by a different pipeline to run an inference on.

-### Utilizing `StepContext`
+## Utilizing `StepContext`

ZenML allows users to fetch historical parameters and artifacts using the `StepContext`
[fixture](./step-fixtures.md).
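The decision logic itself is plain Python once the historic values are in hand. A framework-agnostic sketch of the "deploy only if the new model beats all previous runs" case from above — the run records and field names (`past_runs`, `accuracy`, `ended_at`) are invented for illustration; in ZenML they would come from the `StepContext` and the metadata store:

```python
from datetime import datetime

# Hypothetical stand-in for historic run metadata fetched inside a step.
past_runs = [
    {"name": "run_1", "accuracy": 0.82, "ended_at": datetime(2022, 4, 19, 9, 0)},
    {"name": "run_2", "accuracy": 0.91, "ended_at": datetime(2022, 4, 20, 9, 0)},
    {"name": "run_3", "accuracy": 0.88, "ended_at": datetime(2022, 4, 21, 9, 0)},
]


def should_deploy(candidate_accuracy: float, runs: list) -> bool:
    """Deploy only if the new model beats every previously recorded accuracy."""
    best_so_far = max((r["accuracy"] for r in runs), default=float("-inf"))
    return candidate_accuracy > best_so_far


print(should_deploy(0.93, past_runs))  # True: beats the best past accuracy of 0.91
print(should_deploy(0.90, past_runs))  # False
```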
12 changes: 7 additions & 5 deletions docs/book/developer-guide/materializer.md
@@ -1,5 +1,5 @@
---
-description: Control how Data is persisted between Steps
+description: Control how data is persisted between steps.
---

A ZenML pipeline is built in a data-centric way. The outputs and inputs of steps
@@ -8,12 +8,14 @@ step should be considered as its very own process that reads and writes its
inputs and outputs from and to the artifact store. This is where
**materializers** come into play.

-### What is a materializer?
+# Materializers: Serializing and deserializing your artifacts

A materializer dictates how a given artifact can be written to and retrieved
from the artifact store. It contains all serialization and deserialization
logic.

## What is a materializer?

```python
from typing import Type, Any
from zenml.materializers.base_materializer import BaseMaterializerMeta
```

@@ -77,7 +79,7 @@ Each materializer has `ASSOCIATED_TYPES` and `ASSOCIATED_ARTIFACT_TYPES`.
etc. This is simply a tag to query certain artifact types in the
post-execution workflow.

-### Extending the `BaseMaterializer`
+## Writing a custom materializer

Let's say you have a custom class called `MyObject` that flows between two steps
in a pipeline:
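The contract such a class needs is easiest to see without ZenML at all: one place that owns all serialization logic, one that owns deserialization. A minimal stand-in in plain Python — `MyObjectJSONCodec` and its methods are invented for illustration and are not the actual `BaseMaterializer` API:

```python
import json


class MyObject:
    """A hypothetical custom class passed between steps."""

    def __init__(self, name: str, value: int):
        self.name = name
        self.value = value


class MyObjectJSONCodec:
    """Stand-in for a materializer: all (de)serialization logic lives here."""

    @staticmethod
    def save(obj: MyObject) -> str:
        # In a real materializer this string would be written to the artifact store.
        return json.dumps({"name": obj.name, "value": obj.value})

    @staticmethod
    def load(data: str) -> MyObject:
        payload = json.loads(data)
        return MyObject(payload["name"], payload["value"])


serialized = MyObjectJSONCodec.save(MyObject("example", 42))
restored = MyObjectJSONCodec.load(serialized)
print(restored.name, restored.value)  # example 42
```

A real materializer does the same round trip, but reads from and writes to the artifact store instead of a string.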
@@ -270,7 +272,7 @@ first_pipeline(

</details>

-# Skip materialization
+## Skip materialization

While in most cases, [materializers](../developer-guide/materializer.md)
should be used to control how artifacts are consumed and output from steps in a
@@ -304,7 +306,7 @@ non-materialized step.
Be careful: Using artifacts directly like this might have unintended
consequences for downstream tasks that rely on materialized artifacts.

-## A simple example
+### A simple example

A simple example can suffice to showcase how to use non-materialized artifacts:
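The difference can be sketched in plain Python before looking at the ZenML version: a materialized step receives the deserialized object, while a non-materialized step receives only a reference to where the artifact lives and loads it itself. The file layout and function names below are invented for illustration, not ZenML's API:

```python
import os
import pickle
import tempfile

# Stand-in for an artifact store entry: an object pickled to a known path.
artifact_dir = tempfile.mkdtemp()
artifact_uri = os.path.join(artifact_dir, "output.pkl")
with open(artifact_uri, "wb") as f:
    pickle.dump([1, 2, 3], f)


def materialized_step(data: list) -> int:
    """Receives the already-deserialized object, as a materializer would provide."""
    return sum(data)


def non_materialized_step(uri: str) -> int:
    """Receives only the storage location and handles loading itself."""
    with open(uri, "rb") as f:
        data = pickle.load(f)
    return sum(data)


# Both reach the same result; only the responsibility for loading differs.
print(materialized_step([1, 2, 3]))         # 6
print(non_materialized_step(artifact_uri))  # 6
```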

12 changes: 6 additions & 6 deletions docs/book/developer-guide/post-execution-workflow.md
@@ -7,7 +7,7 @@ description: Inspect a Finished Pipeline Run.
After executing a pipeline, the user needs to be able to fetch it from history and perform certain tasks. This page
captures these workflows at a high level.

-### Accessing past pipeline runs
+## Accessing past pipeline runs

In the context of a post-execution workflow, there is an implied hierarchy of some basic ZenML components:

@@ -17,7 +17,7 @@

```
repository -> pipelines -> runs -> steps -> outputs
# where -> implies a 1-many relationship.
```
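The shape of this hierarchy can be modelled in a few lines of plain Python, which is roughly what the traversal in the sections below does. The class names here are invented stand-ins, not ZenML's actual post-execution objects:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepView:
    name: str
    outputs: Dict[str, object] = field(default_factory=dict)


@dataclass
class RunView:
    name: str
    steps: List[StepView] = field(default_factory=list)


@dataclass
class PipelineView:
    name: str
    runs: List[RunView] = field(default_factory=list)


# One repository holds many pipelines, each with many runs, steps, and outputs.
repo = [
    PipelineView(
        name="first_pipeline",
        runs=[RunView("run_1", steps=[StepView("trainer", {"model": "weights"})])],
    )
]

# Walking the hierarchy mirrors the API below: repo -> pipeline -> run -> step -> output.
output = repo[0].runs[-1].steps[0].outputs["model"]
print(output)  # weights
```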

-#### Repository
+### Repository

The highest level `Repository` object is where to start from.

@@ -27,7 +27,7 @@

```python
from zenml.repository import Repository

repo = Repository()
```

-#### Pipelines
+### Pipelines

The repository contains a collection of all created pipelines with at least one run, sorted by the time of their first
run, from oldest to newest.
@@ -59,7 +59,7 @@ Be careful when accessing pipelines by index. Even if you just ran a pipeline it
fact that the pipelines are sorted by time of `first` run. As such it is recommended to access the pipeline by its name.
{% endhint %}

-#### Runs
+### Runs

Each pipeline can be executed many times. You can easily get a list of all runs like this

@@ -73,7 +73,7 @@

```python
run = runs[-1]
run = pipeline_x.get_run(run_name=...)
```

-#### Steps
+### Steps

Within a given pipeline run you can now zoom in further on the individual steps.

@@ -102,7 +102,7 @@

```python
def this_is_the_step_name():
...
```

-#### Outputs
+### Outputs

Most of your steps will probably create outputs. You'll be able to inspect these outputs like this:

98 changes: 9 additions & 89 deletions docs/book/developer-guide/runtime-configuration.md
@@ -9,7 +9,14 @@ Business logic is what defines
a step and the pipeline. Step and pipeline configurations are used to
dynamically set parameters at runtime.

-## Step configuration
+You can configure your pipelines at runtime in the following ways:
+
+* Configure from within the code: Do this when you are quickly iterating on your code
+  and don't want to change your actual step code. This is useful in the development phase.
+* Configure from the CLI and a YAML config: Do this when you want to launch pipeline runs
+  without modifying the code at all. This is most useful in production scenarios.
+
+## Configuring from within code

You can easily add a configuration to a step by creating your configuration as a
subclass of `BaseStepConfig`.
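The core idea is that the config class declares typed fields with defaults, which callers override when instantiating it. Sketched with a plain dataclass standing in for the pydantic-based `BaseStepConfig` (the field name mirrors the examples in this guide; the dataclass itself is illustrative only):

```python
from dataclasses import dataclass


@dataclass
class SecondStepConfig:
    """Trainer params: a typed field with a default value."""
    multiplier: int = 4


default_config = SecondStepConfig()          # uses the declared default
custom_config = SecondStepConfig(multiplier=3)  # overridden at instantiation

print(default_config.multiplier)  # 4
print(custom_config.multiplier)   # 3
```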
@@ -81,7 +88,7 @@

```python
first_pipeline(step_1=my_first_step(),
).with_config("path_to_config.yaml").run()
```

-## Run from CLI
+## Configuring from the CLI and a YAML config file

If you want to configure and run your pipeline from outside
your code, you can use the
@@ -307,90 +314,3 @@ way you ensure each run is directly
associated with a specific code version.
{% endhint %}

## Pipeline Run Name

When running a pipeline by calling `my_pipeline.run()`, ZenML uses the current
date and time as the name for the
pipeline run. In order to change the name for a run, simply pass it as a
parameter to the `run()` function:

```python
first_pipeline_instance.run(run_name="custom_pipeline_run_name")
```

{% hint style="warning" %}
Pipeline run names must be unique, so make sure to compute it dynamically if you
plan to run your pipeline multiple
times.
{% endhint %}

Once the pipeline run is finished we can easily access this specific run during
our post-execution workflow:

```python
from zenml.repository import Repository

repo = Repository()
pipeline = repo.get_pipeline(pipeline_name="first_pipeline")
run = pipeline.get_run("custom_pipeline_run_name")
```

### Summary in Code

<details>
<summary>Code Example for this Section</summary>

```python
from zenml.steps import step, Output, BaseStepConfig
from zenml.pipelines import pipeline


@step
def my_first_step() -> Output(output_int=int, output_float=float):
"""Step that returns a pre-defined integer and float"""
return 7, 0.1


class SecondStepConfig(BaseStepConfig):
"""Trainer params"""
multiplier: int = 4


@step
def my_second_step(config: SecondStepConfig, input_int: int,
input_float: float
) -> Output(output_int=int, output_float=float):
"""Step that multiply the inputs"""
return config.multiplier * input_int, config.multiplier * input_float


@pipeline
def first_pipeline(
step_1,
step_2
):
output_1, output_2 = step_1()
step_2(output_1, output_2)


# Set configuration when executing
first_pipeline(step_1=my_first_step(),
step_2=my_second_step(SecondStepConfig(multiplier=3))
).run(run_name="custom_pipeline_run_name")

# Set configuration based on yml
first_pipeline(step_1=my_first_step(),
step_2=my_second_step()
).with_config("config.yml").run()
```

With `config.yml` looking like this:

```yaml
steps:
step_2:
parameters:
multiplier: 3
```

</details>
91 changes: 90 additions & 1 deletion docs/book/developer-guide/steps-and-pipelines.md
@@ -79,7 +79,7 @@ Pipeline run `first_pipeline-20_Apr_22-16_07_14_577771` has finished in 0.128s.

You'll learn how to inspect the finished run within the chapter on our [Post Execution Workflow](./post-execution-workflow.md).

-### Summary in Code
+#### Summary in Code

<details>
<summary>Code Example for this Section</summary>
@@ -112,3 +112,92 @@

```python
def first_pipeline(
first_pipeline(step_1=my_first_step(), step_2=my_second_step()).run()
```
</details>


### Give each pipeline run a name

When running a pipeline by calling `my_pipeline.run()`, ZenML uses the current
date and time as the name for the
pipeline run. In order to change the name for a run, simply pass it as a
parameter to the `run()` function:

```python
first_pipeline_instance.run(run_name="custom_pipeline_run_name")
```

{% hint style="warning" %}
Pipeline run names must be unique, so make sure to compute it dynamically if you
plan to run your pipeline multiple
times.
{% endhint %}
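One common way to satisfy the uniqueness requirement is to derive the run name from a timestamp, mirroring ZenML's own date-and-time default. The helper below is a sketch, not a ZenML utility:

```python
from datetime import datetime


def make_run_name(prefix: str = "my_pipeline") -> str:
    """Build a run name that is unique per invocation by embedding a timestamp."""
    return f"{prefix}-{datetime.now().strftime('%d_%b_%y-%H_%M_%S_%f')}"


run_name = make_run_name()
print(run_name)  # e.g. my_pipeline-20_Apr_22-16_07_14_577771
# first_pipeline_instance.run(run_name=run_name)
```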

Once the pipeline run is finished we can easily access this specific run during
our post-execution workflow:

```python
from zenml.repository import Repository

repo = Repository()
pipeline = repo.get_pipeline(pipeline_name="first_pipeline")
run = pipeline.get_run("custom_pipeline_run_name")
```

#### Summary in Code

<details>
<summary>Code Example for this Section</summary>

```python
from zenml.steps import step, Output, BaseStepConfig
from zenml.pipelines import pipeline


@step
def my_first_step() -> Output(output_int=int, output_float=float):
"""Step that returns a pre-defined integer and float"""
return 7, 0.1


class SecondStepConfig(BaseStepConfig):
"""Trainer params"""
multiplier: int = 4


@step
def my_second_step(config: SecondStepConfig, input_int: int,
input_float: float
) -> Output(output_int=int, output_float=float):
"""Step that multiply the inputs"""
return config.multiplier * input_int, config.multiplier * input_float


@pipeline
def first_pipeline(
step_1,
step_2
):
output_1, output_2 = step_1()
step_2(output_1, output_2)


# Set configuration when executing
first_pipeline(step_1=my_first_step(),
step_2=my_second_step(SecondStepConfig(multiplier=3))
).run(run_name="custom_pipeline_run_name")

# Set configuration based on yml
first_pipeline(step_1=my_first_step(),
step_2=my_second_step()
).with_config("config.yml").run()
```

With `config.yml` looking like this:

```yaml
steps:
step_2:
parameters:
multiplier: 3
```

</details>
12 changes: 8 additions & 4 deletions docs/book/introduction/core-concepts.md
@@ -11,10 +11,11 @@ when using ZenML, starting with the most basic to things you'll only encounter
when deploying your work to the cloud. At the very highest level, the workflow
is as follows:

-- You write your code to define what you want to happen in your machine learning
+- You write your code as a pipeline to define what you want to happen in your machine learning
workflow
- You configure a ZenML Stack which is the infrastructure and setup that will
-run your machine learning code
+run your machine learning code.
+- A stack consists of stack components that interact with your pipeline and its steps in various ways.
- You can easily switch between different Stacks (i.e. infrastructure
configurations) depending on your needs at any given moment.
- You can use whatever you want as part of your Stacks as we're built as a
@@ -42,7 +43,9 @@ things. (Your code lives inside a Repository, which is the main abstraction
within which your project-specific pipelines should live.)

When it comes time to run your pipeline, ZenML offers an abstraction to handle
-all the decisions around how your pipeline gets run.
+all the decisions around how your pipeline gets run. The different stack
+components interact in different ways depending on how you've written your
+pipeline.

## Stacks, Components and Stores

@@ -90,7 +93,8 @@ experiments through the metadata store.
At a certain point, however, you'll want to do something that requires a bit
more compute power - perhaps requiring GPUs for model training - or some custom
functionality at which point you'll want to add some extra components to your
-stack.
+stack. These components will supercharge your steps and pipelines with extra
+functionality which you can then use in production!

## Cloud Training, Deployment, Monitoring...

4 changes: 2 additions & 2 deletions docs/book/toc.md
@@ -3,15 +3,15 @@
* [Introduction](index.md)
* [Quickstart](https://github.com/zenml-io/zenml/tree/main/examples/quickstart)
* [Core Concepts](introduction/core-concepts.md)
-* [Examples](introduction/zenml-example-cli.md)
+* [Examples & Use-cases](introduction/zenml-example-cli.md)

## Developer Guide

* [Installation](developer-guide/installation.md)
* [Steps & Pipelines](developer-guide/steps-and-pipelines.md)
* [Runtime Configuration](developer-guide/runtime-configuration.md)
* [Post-Execution Workflow](developer-guide/post-execution-workflow.md)
-* [Materializer](developer-guide/materializer.md)
+* [Materializers](developer-guide/materializer.md)
* [Caching](developer-guide/caching.md)
* [Step Fixture](developer-guide/step-fixtures.md)
* [Fetching Historic Runs](developer-guide/fetching-historic-runs.md)