# The Python Ecosystem

``signac`` is designed to be extremely lightweight, making it easy to work with other tools.
Here, we demonstrate how it can be integrated with several other tools, and we use these examples to compare ``signac``'s functionality with theirs.

# Sacred

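The [Sacred provenance management tool](https://sacred.readthedocs.io/en/latest/) is a popular Python package for logging experiments and reproducing them later.
Some of its functionality overlaps with that of ``signac``, but the two can be used in a complementary manner: below, ``signac`` and signac-flow manage the parameter space and the workflow, while Sacred runs and records each experiment.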

In [None]:
!rm -rf project.py experiment.py workspace signac.rc

In [None]:
import signac
project = signac.init_project("Sacred")
for i in range(5):
    project.open_job({"i": i}).init()
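Each ``open_job(...).init()`` call creates a job whose workspace directory is named after an id derived from its state point, so the same parameters always map to the same directory. A minimal sketch of that idea follows. It assumes an MD5 digest of key-sorted JSON and is meant to illustrate the principle rather than reproduce signac's ids exactly; ``statepoint_id`` is a hypothetical helper.

```python
import hashlib
import json


def statepoint_id(statepoint):
    """Return a hex digest identifying a state point (hypothetical helper).

    Assumption: hashing key-sorted JSON illustrates the idea behind job ids;
    it is not guaranteed to match signac's ids byte for byte.
    """
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode()).hexdigest()


# Key order does not matter because keys are sorted before hashing.
print(statepoint_id({"i": 0, "name": "foo"}) == statepoint_id({"name": "foo", "i": 0}))  # True
print(len(statepoint_id({"i": 0})))  # 32
```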

In [None]:
%%writefile experiment.py
from sacred import Experiment

ex = Experiment()

@ex.command
def hello(i):
    print('hello #', i)

@ex.command
def goodbye(i):
    print('goodbye #', i)
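Neither command says where ``i`` comes from. Sacred fills the parameters of its commands from the experiment's configuration by name, and the project below supplies ``i`` from each job's state point. The following stdlib-only sketch is a simplified model of that name-based injection; ``call_with_config`` is hypothetical and ignores Sacred's defaults, prefixes, and config scopes.

```python
import inspect


def call_with_config(func, config):
    """Call func, filling each parameter from the config entry of the same name.

    A simplified, hypothetical model of how Sacred injects configuration
    values into commands; entries without a matching parameter are ignored.
    """
    params = inspect.signature(func).parameters
    kwargs = {name: config[name] for name in params if name in config}
    return func(**kwargs)


def hello(i):
    return 'hello # {}'.format(i)


print(call_with_config(hello, {"i": 3, "unused": "x"}))  # hello # 3
```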

In [None]:
%%writefile project.py
from flow import FlowProject
from sacred.observers import FileStorageObserver
import inspect

from experiment import ex

class SacredProject(FlowProject):
    pass

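# Note: This assumes that each signac operation has the same name as the Sacred command it runs.
def sacred_op(job):
    # Name of the calling operation, e.g. 'hello' when called from hello(job).
    sacred_cmd = inspect.stack()[1].function
    # Expose the job's state point (here: i) as Sacred configuration.
    ex.add_config(**job.sp())
    # Store Sacred's run records inside the job's workspace.
    ex.observers[:] = [FileStorageObserver.create(job.fn('my_runs'))]
    ex.run(sacred_cmd)
    # Record completion so that the post-conditions below are satisfied.
    job.doc[sacred_cmd] = True

# Labels are registered once at module level, not inside the helper.
@SacredProject.label
def hello_lab(job):
    return job.doc.get('hello') is not None

@SacredProject.operation
@SacredProject.post.true('hello')
def hello(job):
    sacred_op(job)

@SacredProject.operation
@SacredProject.pre.after(hello)
@SacredProject.post.true('goodbye')
def goodbye(job):
    sacred_op(job)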

if __name__ == '__main__':
    SacredProject().main()
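The ``sacred_op`` helper relies on ``inspect.stack()`` to recover the name of the operation that called it, which is why operation and command names must match. A self-contained sketch of the same lookup follows, with plain functions standing in for the Sacred commands; ``COMMANDS`` and ``dispatch`` are hypothetical names.

```python
import inspect

# Hypothetical stand-ins for the Sacred commands defined in experiment.py.
COMMANDS = {
    "hello": lambda i: "hello # {}".format(i),
    "goodbye": lambda i: "goodbye # {}".format(i),
}


def dispatch(i):
    """Run the command whose name matches the calling function's name."""
    caller = inspect.stack()[1].function  # name of the function that called dispatch
    return COMMANDS[caller](i)


def hello(i):
    return dispatch(i)


def goodbye(i):
    return dispatch(i)


print(hello(0))    # hello # 0
print(goodbye(1))  # goodbye # 1
```

Passing the command name explicitly (for example, a hypothetical ``sacred_op(job, 'hello')`` signature) would avoid inspecting the stack, at the cost of repeating each name once.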

In [None]:
!python3 project.py run -n 1

In [None]:
!python3 project.py run

In [None]:
!python3 project.py status --stack --pretty --full
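The status output follows from the conditions attached to each operation. The sketch below is a simplified model, not signac-flow's implementation: it assumes an operation is eligible for a job when all of its preconditions hold and at least one of its postconditions does not, and it uses a plain dictionary in place of the job document. ``eligible`` and ``flag`` are hypothetical helpers.

```python
def eligible(pre, post, doc):
    """Simplified model of whether an operation may run for one job.

    Assumption: all preconditions must hold and at least one postcondition
    must not; an operation without postconditions counts as never finished.
    """
    pre_ok = all(cond(doc) for cond in pre)
    post_done = bool(post) and all(cond(doc) for cond in post)
    return pre_ok and not post_done


def flag(key):
    """Plain-Python stand-in for post.true(key): the document entry is truthy."""
    return lambda doc: bool(doc.get(key, False))


# (preconditions, postconditions) for each operation.
hello = ([], [flag("hello")])
# pre.after(hello) is modelled as "hello's postcondition holds".
goodbye = ([flag("hello")], [flag("goodbye")])

doc = {}  # stands in for a fresh job document
print(eligible(*hello, doc), eligible(*goodbye, doc))  # True False
doc["hello"] = True
print(eligible(*hello, doc), eligible(*goodbye, doc))  # False True
doc["goodbye"] = True
print(eligible(*hello, doc), eligible(*goodbye, doc))  # False False
```

Under this model ``goodbye`` only becomes eligible once ``hello`` has set its flag, which is the ordering that ``pre.after(hello)`` is meant to enforce.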

# Pandas

The data in a signac data space can easily be converted into a format suitable for pandas.
The precise method depends on which data you need.
This example starts with a simple case where the project index alone is sufficient and then moves on to more involved cases that index deeper into the data.

In [None]:
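from IPython.display import HTML, display
!rm -rf project.py experiment.py workspace signac.rc
# Restart the kernel (classic Jupyter Notebook only); display() is needed because HTML() is not the cell's last expression.
display(HTML("<script>Jupyter.notebook.kernel.restart()</script>"))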

In [None]:
import pandas as pd
import numpy as np
import signac
project = signac.init_project("Pandas")

names = ["foo", "bar", "baz"]
alphas = range(10)
betas = np.random.rand(5)
for name in names:
    for alpha in alphas:
        for beta in betas:
            project.open_job({"alpha": alpha, "beta": beta, "name": name}).init()
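The three nested loops enumerate the Cartesian product of the parameter lists, giving 3 × 10 × 5 = 150 jobs. An equivalent, flatter formulation with ``itertools.product`` is sketched below. It only builds the state point dictionaries (passing each one to ``project.open_job(sp).init()`` would initialize the jobs), and it substitutes Python's ``random`` module for NumPy so that the snippet runs on its own.

```python
import itertools
import random

names = ["foo", "bar", "baz"]
alphas = range(10)
betas = [random.random() for _ in range(5)]  # stdlib stand-in for np.random.rand(5)

# itertools.product yields the same combinations, in the same order, as the nested loops.
statepoints = [
    {"alpha": alpha, "beta": beta, "name": name}
    for name, alpha, beta in itertools.product(names, alphas, betas)
]
print(len(statepoints))  # 150
```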

In [None]:
df = pd.DataFrame(project.index())
df.set_index("_id", inplace=True)
df

This data frame lists every job, but each job's parameters are stored as a dictionary in a single ``statepoint`` column, whereas typically you are interested in the parameters themselves.
We can extract the state points alone to build a more useful data frame with one column per parameter.
The index can also be filtered first with signac's query API, which uses MongoDB-style filter expressions.

In [None]:
# Keep jobs with beta < 0.2, alpha in {1, 3, 5}, and a name starting with "ba".
query = {"statepoint.beta": {"$lt": 0.2}, "statepoint.alpha": {"$in": [1, 3, 5]},
         "statepoint.name": {"$regex": "^ba"}}
statepoints = {doc['_id']: doc['statepoint'] for doc in signac.Collection(project.index()).find(query)}
df_data = pd.DataFrame(statepoints).T
df_data
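The query passed to ``find`` uses MongoDB-style operators: ``$lt`` for less-than, ``$in`` for membership, and ``$regex`` for a regular-expression match, with dotted keys reaching into the nested ``statepoint`` dictionary. To make those semantics concrete, here is a deliberately small, stdlib-only evaluator run on made-up documents. It is a hypothetical illustration that covers only these three operators plus plain equality; it is not signac's implementation.

```python
import re

_MISSING = object()


def get_dotted(doc, key):
    """Follow a dotted key such as 'statepoint.beta' into nested dicts."""
    for part in key.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return _MISSING
        doc = doc[part]
    return doc


def matches(value, condition):
    """Evaluate a small subset of MongoDB-style operators for one value."""
    if not isinstance(condition, dict):
        return value == condition  # a plain value means equality
    for op, arg in condition.items():
        if op == "$lt":
            ok = value < arg
        elif op == "$in":
            ok = value in arg
        elif op == "$regex":
            ok = re.search(arg, value) is not None
        else:
            raise ValueError("unsupported operator: " + op)
        if not ok:
            return False
    return True


def find(docs, query):
    """Return the documents that satisfy every condition; missing keys never match."""
    result = []
    for doc in docs:
        values = {key: get_dotted(doc, key) for key in query}
        if all(v is not _MISSING and matches(v, query[k]) for k, v in values.items()):
            result.append(doc)
    return result


docs = [
    {"_id": "a", "statepoint": {"alpha": 1, "beta": 0.1, "name": "bar"}},
    {"_id": "b", "statepoint": {"alpha": 2, "beta": 0.1, "name": "baz"}},
    {"_id": "c", "statepoint": {"alpha": 3, "beta": 0.5, "name": "foo"}},
]
query = {"statepoint.beta": {"$lt": 0.2}, "statepoint.alpha": {"$in": [1, 3, 5]},
         "statepoint.name": {"$regex": "^ba"}}
print([doc["_id"] for doc in find(docs, query)])  # ['a']
```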

It is also easy to add more information to this data frame, such as entries from the job document or data stored in files in the data space.

In [None]:
for job in project:
    job.doc.product = job.sp.alpha*job.sp.beta
    with job:
        with open('product_squared.txt', 'w') as f:
            f.write(str(job.doc.product**2))
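Inside ``with job:``, a relative path such as ``'product_squared.txt'`` resolves inside the job's workspace directory, because the job acts as a context manager that switches the working directory and switches back afterwards. A stdlib-only sketch of that pattern follows; ``working_directory`` is a hypothetical helper, and a temporary directory stands in for a job's workspace.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)  # restored even if the body raises


start = os.getcwd()
with tempfile.TemporaryDirectory() as workspace:
    with working_directory(workspace):
        # Relative paths now resolve inside the stand-in job directory.
        with open("product_squared.txt", "w") as f:
            f.write(str(0.25))
    files = os.listdir(workspace)
print(files)                 # ['product_squared.txt']
print(os.getcwd() == start)  # True
```

Alternatively, ``open(job.fn('product_squared.txt'), 'w')`` builds the full path without changing directories.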

In [None]:
statepoints = {doc['_id']: {**doc['statepoint'], 'product': doc['product']} for doc in project.index()}
df_data = pd.DataFrame(statepoints).T
df_data
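The dictionary comprehension above reshapes each index document into one flat row keyed by job id, which ``pd.DataFrame(...).T`` then turns into a table. The same reshaping written as a small function is sketched below, tried on made-up documents; ``flatten`` is a hypothetical helper that also tolerates jobs lacking a requested document key.

```python
def flatten(index_docs, extra_keys=()):
    """Build one flat row per job: state point entries plus selected document keys.

    Assumes each document has '_id' and 'statepoint' entries, with job document
    data such as 'product' stored at the top level (as in the cell above).
    """
    rows = {}
    for doc in index_docs:
        row = dict(doc["statepoint"])  # copy so the source document is untouched
        for key in extra_keys:
            row[key] = doc.get(key)  # None if this job lacks the key
        rows[doc["_id"]] = row
    return rows


docs = [
    {"_id": "a", "statepoint": {"alpha": 2, "beta": 0.5}, "product": 1.0},
    {"_id": "b", "statepoint": {"alpha": 0, "beta": 0.5}, "product": 0.0},
]
print(flatten(docs, extra_keys=["product"])["a"])
# {'alpha': 2, 'beta': 0.5, 'product': 1.0}
```

Passing the result to ``pd.DataFrame.from_dict(rows, orient='index')`` should give the same layout as the ``.T`` idiom without the explicit transpose.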

In [None]:
# Index the matching files alongside the jobs, then read each file's contents as a number.
index = signac.Collection(project.index({".*product_squared.*": "TextFile"}))
ps = {}
for doc in index.find({"filename": {"$regex": r"product_squared\.txt"}}):
    with signac.fetch(doc) as f:
        ps[doc['signac_id']] = {"product_squared": float(f.read())}
df_data.join(pd.DataFrame(ps).T)
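The file index pairs each matching file with the ``signac_id`` of the job that owns it. Since every job's files live in a directory named after its id, a rough stdlib-only equivalent can scan the workspace directly. The sketch below assumes one level of job directories directly under the workspace; ``collect_files`` is a hypothetical helper, and the temporary directory and job names stand in for a real workspace.

```python
import os
import re
import tempfile


def collect_files(workspace, pattern):
    """Map each job directory name to {filename: contents} for files matching pattern.

    Assumes job directories sit directly under `workspace`; only the top level
    of each job directory is scanned.
    """
    regex = re.compile(pattern)
    found = {}
    for job_id in sorted(os.listdir(workspace)):
        job_dir = os.path.join(workspace, job_id)
        if not os.path.isdir(job_dir):
            continue  # skip stray files next to the job directories
        for name in sorted(os.listdir(job_dir)):
            if regex.fullmatch(name):
                with open(os.path.join(job_dir, name)) as f:
                    found.setdefault(job_id, {})[name] = f.read()
    return found


# Build a throwaway workspace with two hypothetical job directories.
with tempfile.TemporaryDirectory() as ws:
    for job_id, value in [("job_a", 4.0), ("job_b", 9.0)]:
        os.mkdir(os.path.join(ws, job_id))
        with open(os.path.join(ws, job_id, "product_squared.txt"), "w") as f:
            f.write(str(value))
    result = collect_files(ws, r"product_squared\.txt")
print(result)
# {'job_a': {'product_squared.txt': '4.0'}, 'job_b': {'product_squared.txt': '9.0'}}
```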

# Datreant

The ``datreant.core`` package is one of the closer analogues to the ``signac`` data management package.
However, it is even less restrictive than ``signac`` in that it does not require any index; it simply offers a way to manage data on the filesystem.
We have benchmarked the two packages to see how they fare relative to one another; however, they can also be used together, for example when it is useful to maintain Treants within a ``signac`` data space.

In [None]:
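from IPython.display import HTML, display
!rm -rf project.py experiment.py workspace signac.rc
# Restart the kernel (classic Jupyter Notebook only); display() is needed because HTML() is not the cell's last expression.
display(HTML("<script>Jupyter.notebook.kernel.restart()</script>"))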

In [None]:
import signac
project = signac.init_project("Datreant")
for i in range(5):
    project.open_job({"i": i}).init()

In [None]:
import datreant.core as dtr
import random
import string
for job in project:
    with job:
        s = dtr.Treant('tree1')
        s.tags.add(''.join(random.choice(string.ascii_uppercase) for _ in range(5)))
        # Show the job directory, the new Treant's directory, and its state file.
        !ls && ls tree1 && cat tree1/Treant* && echo
        s = dtr.Treant('tree2')
        s.tags.add(''.join(random.choice(string.ascii_uppercase) for _ in range(5)))

In [None]:
trees = ['tree1', 'tree2']
for job in project:
    with job:
        for tree in trees:
            s = dtr.Treant(tree)
            print("For job {}, the treant {} contains tags {}".format(
                job.get_id(), tree, ", ".join(s.tags)))
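The loop above reopens each Treant by path and reads its tags straight from disk; nothing outside the job directories records that the Treants exist. The stdlib-only sketch below models that index-free idea with a hypothetical ``tags.json`` file per directory. It does not reproduce datreant's on-disk format or API, and ``add_tag`` and ``gather_tags`` are hypothetical helpers.

```python
import json
import os
import tempfile

TAG_FILE = "tags.json"  # hypothetical file name; datreant uses its own state format


def add_tag(directory, tag):
    """Store a tag in a JSON file kept inside the directory itself."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, TAG_FILE)
    tags = []
    if os.path.exists(path):
        with open(path) as f:
            tags = json.load(f)
    if tag not in tags:  # tags behave like a set
        tags.append(tag)
    with open(path, "w") as f:
        json.dump(tags, f)


def gather_tags(root):
    """Discover tagged directories by walking the tree; no central index needed."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if TAG_FILE in filenames:
            with open(os.path.join(dirpath, TAG_FILE)) as f:
                found[os.path.relpath(dirpath, root)] = json.load(f)
    return found


with tempfile.TemporaryDirectory() as root:
    add_tag(os.path.join(root, "job1", "tree1"), "ABCDE")
    add_tag(os.path.join(root, "job1", "tree2"), "FGHIJ")
    tags = gather_tags(root)
print(sorted(tags.items()))
```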