yamlett - Yet Another Machine Learning Experiment Tracking Tool

  1. What is yamlett?
  2. Installation
  3. Getting started
  4. Examples
    1. MLflow vs yamlett
    2. Storing large artifacts

What is yamlett?

yamlett provides a simple but flexible way to track your ML experiments.

It has a simple interface with only two primitives: Run and Experiment.

  • A Run is used to store information about one iteration of your Experiment. You can use it to record any (BSON-serializable) information you want such as model parameters, metrics, or pickled artifacts.
  • An Experiment is a collection of Run objects. It has a name and wraps a pymongo.collection.Collection object, meaning that you can query it using find or aggregate. Think of it as a way to collect all the modeling iterations for a specific project.

The main difference from other tracking tools (e.g. MLflow) is that yamlett lets you save complex structured information using dictionaries or lists and filter on it later using MongoDB queries.

yamlett is particularly useful if your experiments are configuration-driven. Once your configuration is loaded as a Python object, storing it is as easy as run.store("config", config).

Installation

yamlett can be installed with pip:

pip install yamlett

It also requires a MongoDB instance that you can connect to. If you don't have one and just want to try out yamlett, we provide a docker compose file that starts a MongoDB instance available at localhost:27017 (along with instances of Presto and Metabase).

Getting started

In yamlett, MongoClient connection parameters can be passed as keyword arguments to both Run and Experiment to specify which MongoDB instance you want to connect to. If you don't pass anything, the defaults (localhost:27017) are used. If you have a custom MongoDB instance, you can specify its host and port when creating a Run using run = Run(host="", port=27017). Once you have a run instantiated, you can store a key/value pair with run.store(key, value) and look at the stored data with run.data.

Examples

MLflow vs yamlett

In this section, we compare two approaches to tracking the same model run: MLflow-like vs yamlett.

  1. Set up the experiment

    First, let's load a dataset for a simple classification problem that ships with scikit-learn.

    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True)

    Then, we create a logistic regression model and train that model on the iris dataset, increasing the number of iterations and changing the regularization strength.

    from sklearn.linear_model import LogisticRegression
    model = LogisticRegression(max_iter=200, C=0.1)
    model.fit(X, y)

  2. MLflow-like tracking

    With yamlett, you are free to organize your tracking information, so you could decide to store it using a "flat" approach similar to MLflow, where each key has an associated value and there can be no nesting.

    from yamlett import Run
    from sklearn.metrics import f1_score

    run = Run(id="mlflow-like-run")
    # store some information about your trained model: its class and its parameters
    run.store("params_model_class", model.__class__.__name__)
    for param_name, param_value in model.get_params().items():
        run.store(f"params_model_{param_name}", param_value)
    # store information about your data: X has shape (n_observations, n_features)
    run.store("data_n_observations", X.shape[0])
    run.store("data_n_features", X.shape[1])
    # store the F1 score on the train data
    run.store("metrics_train_f1_score", f1_score(y, model.predict(X), average="weighted"))
    # you can even store a pickled version of your model on disk
    run.store("model", model, pickled=True)

    After running this code, we can retrieve the stored information by calling run.data:

    {'_id': 'mlflow-like-run',
     'data_n_features': 4,
     'data_n_observations': 150,
     'metrics_train_f1_score': 0.9599839935974389,
     'model': LogisticRegression(C=0.1, max_iter=200),
     'params_model_C': 0.1,
     'params_model_class': 'LogisticRegression',
     'params_model_class_weight': None,
     'params_model_dual': False,
     'params_model_fit_intercept': True,
     'params_model_intercept_scaling': 1,
     'params_model_l1_ratio': None,
     'params_model_max_iter': 200,
     'params_model_multi_class': 'auto',
     'params_model_n_jobs': None,
     'params_model_penalty': 'l2',
     'params_model_random_state': None,
     'params_model_solver': 'lbfgs',
     'params_model_tol': 0.0001,
     'params_model_verbose': 0,
     'params_model_warm_start': False}

    This approach is straightforward: one scalar for each key in the document. However, one downside is that you need to maintain your own namespace convention. For example, here we used underscores to separate the different levels of information (params, data, metrics, etc.), but this can quickly get confusing if the separator is poorly chosen: is it params/model/fit_intercept or params/model_fit/intercept? It is also more work than needed when information already comes nicely organized (e.g. model.get_params()).
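The ambiguity described above can be reproduced without any tracking tool. The following is a minimal pure-Python sketch; the flatten helper and the two layouts are hypothetical and exist only for illustration, they are not part of yamlett or MLflow.

```python
def flatten(doc, sep="_", prefix=""):
    """Flatten a nested dict into a single-level dict with joined keys."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key prefix.
            flat.update(flatten(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Two different nested layouts (hypothetical, for illustration only)...
layout_a = {"params": {"model": {"fit_intercept": True}}}
layout_b = {"params": {"model_fit": {"intercept": True}}}

# ...collapse to the same flat key, so the original structure cannot be recovered.
print(flatten(layout_a))
print(flatten(layout_b))
print(flatten(layout_a) == flatten(layout_b))
```

Because the separator also appears inside the key names, the flat key alone does not tell you which of the two layouts produced it.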

  3. yamlett tracking

    The method we propose in this package leverages Python dictionaries / NoSQL DB documents to automatically store your information in a structured way. Let's see what it looks like using the same run as above:

    from yamlett import Run
    from sklearn.metrics import f1_score

    run = Run(id="yamlett-run")
    # store your model information
    model_info = {
        "class": model.__class__.__name__,
        "params": model.get_params(),
    }
    run.store("model", model_info)
    # store information about your data: X has shape (n_observations, n_features)
    run.store("data", {"n_observations": X.shape[0], "n_features": X.shape[1]})
    # store the F1 score on your train data
    run.store("metrics.f1_score", f1_score(y, model.predict(X), average="weighted"))
    # you can even store a pickled version of your model on disk
    run.store("model.artifact", model, pickled=True)

    Once again, let's call run.data and see what information we stored:

    {'_id': 'yamlett-run',
     'data': <Box: {'n_features': 4, 'n_observations': 150}>,
     'metrics': <Box: {'f1_score': 0.9599839935974389}>,
     'model': {'artifact': LogisticRegression(C=0.1, max_iter=200),
               'class': 'LogisticRegression',
               'params': {'C': 0.1,
                          'class_weight': None,
                          'dual': False,
                          'fit_intercept': True,
                          'intercept_scaling': 1,
                          'l1_ratio': None,
                          'max_iter': 200,
                          'multi_class': 'auto',
                          'n_jobs': None,
                          'penalty': 'l2',
                          'random_state': None,
                          'solver': 'lbfgs',
                          'tol': 0.0001,
                          'verbose': 0,
                          'warm_start': False}}}

    The run information is now stored in a document that can be easily parsed based on its structure. The top-level keys of the document are data, metrics, and model, which makes information easier to find than with long keys in a flat dictionary. For instance, you may want to look at all the metrics for a given run using run.data["metrics"]:

    <Box: {'f1_score': 0.9599839935974389}>
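Keys such as metrics.f1_score and model.artifact follow MongoDB's dot notation for nested fields. As a rough illustration of how a dotted key maps onto a nested document, here is a pure-Python sketch; set_by_path and get_by_path are hypothetical helpers written for this example and are not yamlett's implementation.

```python
def set_by_path(doc, dotted_key, value):
    """Store value in a nested dict at the location named by a dotted key."""
    *parents, leaf = dotted_key.split(".")
    node = doc
    for part in parents:
        # Create intermediate dictionaries on the way down if they are missing.
        node = node.setdefault(part, {})
    node[leaf] = value
    return doc


def get_by_path(doc, dotted_key):
    """Read the value stored at a dotted key."""
    node = doc
    for part in dotted_key.split("."):
        node = node[part]
    return node


# Placeholder values stand in for the real model parameters and artifact.
doc = {"_id": "yamlett-run"}
set_by_path(doc, "model", {"class": "LogisticRegression", "params": {"C": 0.1}})
set_by_path(doc, "metrics.f1_score", 0.96)
set_by_path(doc, "model.artifact", "<pickled model>")

print(doc["metrics"])
print(get_by_path(doc, "model.params.C"))
```

Storing under model.artifact adds a field to the existing model sub-document rather than replacing it, which is why the class and params entries survive.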

    Note that yamlett does not impose the document hierarchy so you are free to organize your run data as you see fit. Additionally, because yamlett is a light abstraction layer on top of MongoDB, you can query runs in an Experiment using find or aggregate. For example, we can retrieve all runs in the default experiment for which:

    1. the model was fit with a bias term
    2. on a dataset with at least 3000 data points
    3. that yielded an F1 score of at least 0.9

    from yamlett import Experiment
    e = Experiment()
    e.find(
        {
            "model.params.fit_intercept": True,
            "data.n_observations": {"$gte": 3000},
            "metrics.f1_score": {"$gte": 0.9},
        }
    )

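To make the meaning of such a filter concrete, the sketch below evaluates the same three conditions against plain Python dictionaries. It is a deliberately simplified stand-in for MongoDB's query semantics: only equality and two comparison operators are handled, the matches and lookup helpers are hypothetical, and the sample runs are made up for illustration.

```python
import operator

_MISSING = object()

# Only the comparison operators needed here; MongoDB supports many more.
OPERATORS = {"$gte": operator.ge, "$lte": operator.le}


def lookup(doc, dotted_key):
    """Follow a dotted key through nested dicts, or return _MISSING."""
    node = doc
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for dotted_key, condition in query.items():
        value = lookup(doc, dotted_key)
        if value is _MISSING:
            return False
        if isinstance(condition, dict):
            # Simplification: every dict condition is treated as operator clauses.
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def make_run(run_id, fit_intercept, n_observations, f1):
    """Build a made-up run document with the nested layout used above."""
    return {
        "_id": run_id,
        "model": {"params": {"fit_intercept": fit_intercept}},
        "data": {"n_observations": n_observations},
        "metrics": {"f1_score": f1},
    }


runs = [
    make_run("a", True, 5000, 0.93),
    make_run("b", True, 150, 0.96),    # too few observations
    make_run("c", False, 10000, 0.95), # no bias term
]
query = {
    "model.params.fit_intercept": True,
    "data.n_observations": {"$gte": 3000},
    "metrics.f1_score": {"$gte": 0.9},
}
selected = [run["_id"] for run in runs if matches(run, query)]
print(selected)
```

With a real Experiment, MongoDB does this filtering server-side; the sketch only shows which documents the filter is meant to select.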
Storing large artifacts

MongoDB has a maximum document size of 16MB, so models or outputs larger than that cannot be stored directly alongside the run information. yamlett still lets you keep them with run.store(key, value, pickled=True). When pickled is set to True, the value is not stored in MongoDB itself; instead, it is pickled and stored "on disk". By default, your run object stores pickled objects in a .yamlett folder in the current working directory. You can change this by specifying a path when you instantiate your Run: this path can be a local path or a cloud-based path (e.g. s3://bucket/experiment/). Note that you will need to follow the cloudpathlib installation instructions for your cloud provider before you can use yamlett to store large objects in the cloud.
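The general idea of keeping a large object out of the document and pickling it to disk can be sketched with the standard library alone. The helper names and the file naming scheme below are assumptions made for this example; how yamlett actually lays out files inside its folder is not specified here.

```python
import pickle
import tempfile
from pathlib import Path

# MongoDB's BSON document size limit (16 MiB).
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def pickle_to_dir(value, key, directory):
    """Pickle value into directory and return the path of the written file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{key}.pkl"  # hypothetical naming scheme
    with path.open("wb") as f:
        pickle.dump(value, f)
    return path


def load_pickle(path):
    """Load a previously pickled object from path."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# A 17 MiB payload stands in for a large model or output.
big_value = bytes(17 * 1024 * 1024)
too_big = len(pickle.dumps(big_value)) > MAX_BSON_DOCUMENT_BYTES
print(too_big)

with tempfile.TemporaryDirectory() as tmp:
    path = pickle_to_dir(big_value, "model", Path(tmp) / ".yamlett")
    round_trip_ok = load_pickle(path) == big_value
    print(path.name, round_trip_ok)
```

A document would then only need to hold a small reference to the file, which stays well under the size limit regardless of how large the artifact is.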


