ZnTrack [zɪŋk træk] is a lightweight and easy-to-use package for tracking
parameters in your Python projects using DVC. With ZnTrack, you can define
parameters in Python classes and monitor how they change over time. This
information can then be used to compare the results of different runs, identify
computational bottlenecks, and avoid re-running code components whose
parameters have not changed.
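To make the last point concrete, here is a toy model of parameter-based caching in plain Python. It is a hypothetical sketch, not ZnTrack or DVC code: ToyStage and repro are illustrative names, and the real tools record the parameter values of each stage in dvc.lock and also track dependencies and outputs. The toy stage re-runs its function only when the parameters differ from those recorded at its last run.

```python
import copy


class ToyStage:
    """Hypothetical stage that re-runs only when its parameters change.

    A toy model of the idea, not ZnTrack or DVC code.
    """

    def __init__(self, func):
        self.func = func
        self.locked_params = None  # parameters recorded after the last run
        self.result = None

    def repro(self, params: dict) -> str:
        if params == self.locked_params:
            return "skipped"  # unchanged parameters: keep the previous result
        self.result = self.func(**params)
        self.locked_params = copy.deepcopy(params)  # record what this run used
        return "ran"


stage = ToyStage(lambda max_number: max_number // 2)
print(stage.repro({"max_number": 512}))  # ran
print(stage.repro({"max_number": 512}))  # skipped
print(stage.repro({"max_number": 256}))  # ran
```

Comparing recorded parameter values (rather than timestamps) is what lets an unchanged stage be skipped even when the script itself is executed again.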
- Parameter, output and metric tracking: ZnTrack makes it easy to store and track the values of parameters in your Python code. It also lets you store any outputs produced and provides a simple interface for defining metrics.
- Lightweight and database-free: Unlike other parameter tracking solutions, ZnTrack is lightweight and does not require any databases.
To get started with ZnTrack, you can install it via pip: pip install zntrack
Next, you can start using ZnTrack to track parameters, outputs and metrics in
your Python code. Here's an example of how to use ZnTrack to track the value of
a parameter in a Python class. Start in an empty directory and run git init
and dvc init for preparation. Then put the following into a Python file called
hello_world.py and run it with python hello_world.py.
import zntrack
from random import randrange


class HelloWorld(zntrack.Node):
    """Define a ZnTrack Node"""

    # parameter to be tracked
    max_number: int = zntrack.params()

    # output to be stored
    random_number: int = zntrack.outs()

    def run(self):
        """Command to be run by DVC"""
        self.random_number = randrange(self.max_number)


if __name__ == "__main__":
    # Write the computational graph
    with zntrack.Project() as project:
        hello_world = HelloWorld(max_number=512)
    project.run()
This will create a DVC stage HelloWorld. The workflow is defined in dvc.yaml
and the parameters are stored in params.yaml.
Calling project.run() also executes the workflow with dvc repro
automatically. Once the graph has been executed, the results, i.e. the random
number, can be accessed directly through the Node object.
hello_world.load()
print(hello_world.random_number)
You can easily load this Node directly from a repository.
import zntrack

node = zntrack.from_rev(
    "HelloWorld",
    remote="https://github.com/PythonFZ/ZnTrackExamples.git",
    rev="890c714",
)

Try accessing the max_number parameter and the random_number output. All Nodes
from this and many other repositories can be loaded like this.
An overview of all the ZnTrack features as well as more detailed examples can be found in the ZnTrack Documentation.
ZnTrack also provides tools to convert a Python function into a DVC Node. This approach is much more lightweight than the class-based approach, but it offers only a reduced set of functionality. It is therefore recommended for smaller nodes that do not need the additional toolset the class-based approach provides.
import pathlib

import zntrack
from zntrack import nodify, NodeConfig


@nodify(outs=pathlib.Path("text.txt"), params={"text": "Lorem Ipsum"})
def write_text(cfg: NodeConfig):
    # cfg.outs is the configured output file, cfg.params the configured parameters
    cfg.outs.write_text(cfg.params.text)


# build the DVC graph
with zntrack.Project() as project:
    write_text()

project.run()
The cfg dataclass passed to the function provides access to all configured
files and parameters via dot4dict. The function body will be executed by the
dvc repro command or when run via write_text(run=True). All parameters are
loaded from or stored in params.yaml.
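The behaviour described above (registering the stage on a plain call, executing the body only on request, and exposing the configuration through attribute access) can be imitated in a few lines of plain Python. This is a hypothetical sketch, not the implementation of nodify or dot4dict: toy_nodify, DotDict and GRAPH are illustrative names, and a list stands in for the DVC graph.

```python
import pathlib
import tempfile

GRAPH = []  # hypothetical registry standing in for the DVC graph


class DotDict(dict):
    """Dict with read-only attribute access (hypothetical, not the dot4dict package)."""

    def __getattr__(self, key):
        try:
            value = self[key]
        except KeyError as err:
            raise AttributeError(key) from err
        # Wrap nested dicts so chained access such as cfg.params.text works.
        return DotDict(value) if isinstance(value, dict) else value


def toy_nodify(outs, params):
    """Hypothetical decorator mimicking the idea behind nodify; not ZnTrack's code."""

    def decorator(func):
        def wrapper(run=False):
            cfg = DotDict(outs=outs, params=params)
            if not run:
                GRAPH.append(func.__name__)  # only register the stage
                return cfg
            func(cfg)  # execute the function body
            return cfg

        return wrapper

    return decorator


out_file = pathlib.Path(tempfile.mkdtemp()) / "text.txt"


@toy_nodify(outs=out_file, params={"text": "Lorem Ipsum"})
def write_text(cfg):
    cfg.outs.write_text(cfg.params.text)


write_text()  # registers the stage, the body does not run
write_text(run=True)  # runs the body and writes the file
print(out_file.read_text())  # Lorem Ipsum
```

Deferring execution this way is what lets the same decorated function both describe a stage when the graph is built and do the actual work when the stage is executed.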
On a fundamental level, the ZnTrack package provides an easy-to-use interface
for DVC directly from Python. It handles the overhead of reading config files,
defining outputs both in dvc.yaml and in the script, and much more.
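The idea of declaring parameters and outputs once in the class and deriving the configuration from those declarations can be sketched with the standard dataclasses module. This is a toy model, not ZnTrack's implementation: params(), outs() and params_section() are hypothetical helpers, and grouping the parameters under the node name is an assumption that mirrors, but is not guaranteed to match, the layout ZnTrack writes to params.yaml.

```python
import dataclasses


def params():
    """Hypothetical marker for a tracked parameter (not zntrack.params)."""
    return dataclasses.field(metadata={"role": "params"})


def outs():
    """Hypothetical marker for an output; empty until the node has run (not zntrack.outs)."""
    return dataclasses.field(default=None, metadata={"role": "outs"})


@dataclasses.dataclass
class HelloWorld:
    max_number: int = params()
    random_number: int = outs()


def params_section(node) -> dict:
    """Collect a node's parameters under its class name (assumed params.yaml-like layout)."""
    values = {
        field.name: getattr(node, field.name)
        for field in dataclasses.fields(node)
        if field.metadata.get("role") == "params"
    }
    return {type(node).__name__: values}


print(params_section(HelloWorld(max_number=512)))  # {'HelloWorld': {'max_number': 512}}
```

Because the role of each attribute lives in its declaration, the same class definition can drive both the written configuration and the code that reads it back, which is the kind of bookkeeping described above.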
For more information on DVC visit their homepage.
If you use ZnTrack in your research and find it helpful, please cite us.
@misc{zillsZnTrackDataCode2024,
title = {{{ZnTrack}} -- {{Data}} as {{Code}}},
author = {Zills, Fabian and Sch{\"a}fer, Moritz and Tovey, Samuel and K{\"a}stner, Johannes and Holm, Christian},
year = {2024},
eprint={2401.10603},
archivePrefix={arXiv},
}
This project is distributed under the Apache License Version 2.0.
The following (incomplete) list contains other projects that either work together with ZnTrack or achieve similar results with slightly different goals or in other programming languages.
- DVC - Main dependency of ZnTrack for Data Version Control.
- dvthis - Introduce DVC to R.
- DAGsHub Client - Logging parameters from within Python.
- MLFlow - A Machine Learning Lifecycle Platform.
- Metaflow - A framework for real-life data science.
- Hydra - A framework for elegantly configuring complex applications.
- Snakemake - Workflow management system to create reproducible and scalable data analyses.