# How Data Flows In ZenML

Pipelines in ZenML are data-centric: data forms the link between the steps in a pipeline. In other words, the flow of pipeline execution is data-based rather than task- or function-based.

Since data holds such an integral position in the workflow, it is important to be able to track and maintain it seamlessly across all steps. In this chapter, we will see how ZenML automatically tracks your artifacts and all relevant metadata, and makes them available for use and analysis through a host of first-class integrations with tools like Evidently, MLflow and Wandb, among others!

## ZenML behind the scenes

Let's look at a simple pipeline from before. ZenML works behind the scenes to store the outputs of each step and make them accessible to all other steps. 

In [None]:
from steps import importer, trainer, evaluator
from zenml.pipelines import pipeline
from zenml.steps import Output, step

In [None]:
# definition of our pipeline
@pipeline
def digits_pipeline(
    importer,
    trainer,
    evaluator,
):
    """Links all the steps together in a pipeline"""
    X_train, X_test, y_train, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    evaluator(X_test=X_test, y_test=y_test, model=model)
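
To make the idea of data-based linking concrete, here is a minimal sketch in plain Python. It is not ZenML's implementation: `artifact_store`, `run_step`, `load_numbers` and `total` are hypothetical names invented for this illustration. It only shows the general pattern of persisting each step's output and resolving downstream inputs from what was stored.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory stand-in for an artifact store (not a ZenML API).
artifact_store: Dict[str, Any] = {}


def run_step(name: str, fn: Callable[..., Any], **inputs: str) -> str:
    """Run `fn` on stored artifacts and store its output under `name`."""
    # Each input is given as the key of an artifact produced upstream.
    resolved = {arg: artifact_store[key] for arg, key in inputs.items()}
    artifact_store[name] = fn(**resolved)
    # Downstream steps receive the key, not the value itself.
    return name


def load_numbers() -> list:
    return [1, 2, 3, 4]


def total(numbers: list) -> int:
    return sum(numbers)


numbers_key = run_step("numbers", load_numbers)
total_key = run_step("total", total, numbers=numbers_key)
print(artifact_store[total_key])
```

Because every output passes through the store, each intermediate result remains available for inspection after the run, which is the property the tracking described above relies on.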

### Diving deeper
We can see that these steps are linked together by their inputs and outputs. If we dive into the code of one of the steps, we notice that its artifacts are strongly typed. In the example below, the output is explicitly annotated as an object of type `ClassifierMixin`.

In [None]:
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC

@step
def svc_trainer(
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train another simple sklearn classifier for the digits dataset."""
    model = SVC(gamma=0.001)
    model.fit(X_train, y_train)
    return model
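
Type annotations like the ones above can be read at runtime, which is what makes them useful to a framework. The sketch below uses the standard library's `typing.get_type_hints` to separate a function's input types from its return type. `Model` and `train` are hypothetical stand-ins, and this is only one plausible way a framework could inspect a step, not a description of ZenML internals.

```python
from typing import get_type_hints


class Model:
    """Hypothetical stand-in for a trained model type."""


def train(features: list, labels: list) -> Model:
    """Hypothetical step function with annotated inputs and output."""
    return Model()


# get_type_hints returns a dict of parameter names to types,
# with the return annotation stored under the key "return".
hints = get_type_hints(train)
output_type = hints.pop("return")
input_types = hints

print(input_types)
print(output_type.__name__)
```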

### Materializers

Knowing the type of the artifacts a step produces allows ZenML to pair each type with its corresponding "materializer". Materializers in ZenML define the logic for storing an artifact as a specific file type and reading it back. Built-in materializers support some types out of the box, including those from libraries like numpy, pandas, pytorch, sklearn and more. If your step outputs a type that ZenML does not yet support, you can easily implement a materializer of your own!
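
The pairing of a type with a materializer can be pictured as a registry lookup. The following is a minimal sketch under stated assumptions: `registry`, `register`, `resolve` and the three classes are hypothetical names for this illustration, and the subclass-aware lookup is one plausible strategy rather than ZenML's actual resolution logic.

```python
from typing import Dict


class Classifier:
    """Hypothetical base type, playing the role of ClassifierMixin."""


class SupportVectorClassifier(Classifier):
    """Hypothetical concrete model type."""


class ClassifierMaterializer:
    """Hypothetical materializer that declares which types it handles."""

    ASSOCIATED_TYPES = (Classifier,)


# Toy registry: maps a data type to the materializer class that handles it.
registry: Dict[type, type] = {}


def register(materializer: type) -> None:
    """Record the materializer for every type it declares."""
    for data_type in materializer.ASSOCIATED_TYPES:
        registry[data_type] = materializer


def resolve(data_type: type) -> type:
    """Find a materializer for `data_type` or one of its parent classes."""
    # Walk the method resolution order so subclasses match a parent's entry.
    for klass in data_type.__mro__:
        if klass in registry:
            return registry[klass]
    raise KeyError(f"No materializer registered for {data_type.__name__}")


register(ClassifierMaterializer)
chosen = resolve(SupportVectorClassifier)
print(chosen.__name__)
```

Looking up parent classes matters here because a concrete model is usually a subclass of the annotated type, not the annotated type itself.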

Let's build a custom materializer for the `ClassifierMixin` type used by our trainer step. ZenML already ships an implementation, so this is redundant, but it is a good exercise that shows how easy it is to replace a few values and have your own materializer.

In [None]:
import os
import pickle
from typing import Any, Type

from sklearn.base import ClassifierMixin

from zenml.materializers.base_materializer import BaseMaterializer
from zenml.io import fileio
from zenml.steps import step

DEFAULT_FILENAME = 'model'

class SklearnMaterializer(BaseMaterializer):
    """Materializer to read and write sklearn classifiers."""

    ASSOCIATED_TYPES = (
        ClassifierMixin,
    )

    def handle_input(
        self, data_type: Type[Any]
    ) -> ClassifierMixin:
        """Reads a ClassifierMixin model from a pickle file."""
        super().handle_input(data_type)
        filepath = os.path.join(self.artifact.uri, DEFAULT_FILENAME)
        with fileio.open(filepath, "rb") as fid:
            clf = pickle.load(fid)
        return clf

    def handle_return(
        self,
        clf: ClassifierMixin
    ) -> None:
        """Creates a pickle for a ClassifierMixin model

        Args:
            clf: A ClassifierMixin model.
        """
        super().handle_return(clf)
        filepath = os.path.join(self.artifact.uri, DEFAULT_FILENAME)
        with fileio.open(filepath, "wb") as fid:
            pickle.dump(clf, fid)


#### A few important points to notice
- The `ASSOCIATED_TYPES` field contains the types which you want this materializer to be used for. 
- The `handle_input` function holds the logic for reading the specific type.
- The `handle_return` function holds the logic for storing the type to a file format of your choice.

In this example, we used `pickle` to save and load our `ClassifierMixin` Python object for simplicity. You can choose other, more specialised implementations depending on the type used and its corresponding best practices.
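
As an illustration of a non-pickle format, the sketch below stores a plain dictionary as JSON and reads it back. It deliberately uses only the standard library rather than ZenML classes: `write_metrics`, `read_metrics` and `FILENAME` are hypothetical names, the metric values are made up, and the two functions merely mirror the roles that `handle_return` and `handle_input` play in the materializer.

```python
import json
import os
import tempfile
from typing import Any, Dict

FILENAME = "metrics.json"  # hypothetical file name for this sketch


def write_metrics(metrics: Dict[str, Any], uri: str) -> None:
    """Store the dictionary as JSON inside the directory `uri`."""
    with open(os.path.join(uri, FILENAME), "w") as fid:
        json.dump(metrics, fid)


def read_metrics(uri: str) -> Dict[str, Any]:
    """Load the dictionary back from the directory `uri`."""
    with open(os.path.join(uri, FILENAME), "r") as fid:
        return json.load(fid)


# Round trip through a temporary directory standing in for an artifact URI.
with tempfile.TemporaryDirectory() as uri:
    write_metrics({"accuracy": 0.95, "n_samples": 360}, uri)  # made-up values
    restored = read_metrics(uri)

print(restored)
```

JSON suits simple, human-readable data such as metrics, while `pickle` can handle arbitrary Python objects at the cost of portability and safety when loading untrusted files.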

You can reuse the materializer code above and replace the values in the associated types to quickly build a materializer for your custom needs. For more examples and different implementations, check out our docs on [custom materializers](https://docs.zenml.io/guides/functional-api/materialize-artifacts#create-custom-materializer) and the code for built-in materializers on our GitHub!

In [None]:
# Initialize the pipeline
first_pipeline = digits_pipeline(
    importer=importer(),
    trainer=trainer.svc_trainer_mlflow(),
    evaluator=evaluator(),
)
first_pipeline.run()