BioProv is a Python library for W3C-PROV representation of bioinformatics workflows. It enables you to quickly write workflows and to describe relationships between samples, files, users and programs.
Please see the tutorials for a more detailed introduction and visit ReadTheDocs for the complete documentation.
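For readers new to W3C-PROV, the sketch below shows the kind of relationships mentioned above (files, programs and users) written out by hand in PROV-N, the text notation of the standard. It does not use BioProv and is not the output BioProv produces; the `ex` prefix, its URI and every identifier are assumptions made for illustration.

```python
# Minimal sketch: hand-written PROV-N for a single BLAST run.
# Not BioProv output; the "ex" prefix and all identifiers are hypothetical.

def provn_document(prefix, uri, statements):
    """Wrap PROV-N statements in a document that declares one namespace prefix."""
    lines = ["document", f"  prefix {prefix} <{uri}>"]
    lines += [f"  {statement}" for statement in statements]
    lines.append("endDocument")
    return "\n".join(lines)


statements = [
    "agent(ex:user)",                               # who ran the program
    "entity(ex:genome)",                            # input file
    "entity(ex:blast_out)",                         # output file
    "activity(ex:blastn)",                          # the program run
    "used(ex:blastn, ex:genome, -)",                # the run read the input
    "wasGeneratedBy(ex:blast_out, ex:blastn, -)",   # the run wrote the output
    "wasAssociatedWith(ex:blastn, ex:user, -)",     # the user is responsible for the run
    "wasDerivedFrom(ex:blast_out, ex:genome)",      # output derives from input
]

provn_text = provn_document("ex", "http://example.org/", statements)
print(provn_text)
```

In PROV terms, files map naturally to entities, program runs to activities and users to agents; the `-` marker stands for an omitted optional time.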
>>> import bioprov as bp
# Create samples and file objects
>>> sample = bp.Sample("mysample")
>>> genome = bp.File("mysample.fasta", "genome")
>>> sample.add_files(genome)
# Create programs
>>> output = sample.files["blast_out"] = bp.File("mysample.blast.tsv", "blast_out")
>>> blastn = bp.Program("blastn",
                        params={"-query": sample.files["genome"],
                                "-db": "mydb.fasta", "-out": output}
                        )
>>> sample.add_programs(blastn)
# Run programs
>>> sample.run_programs()
# Save your project
>>> proj = bp.Project((sample,), tag="example_project")
>>> proj.to_json()
# Create PROV documents
>>> prov = bp.BioProvDocument(proj)
# Save in PROVN or graphical format
>>> prov.write_provn() # human-readable text format
>>> prov.dot.write_pdf() # graphical format
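To make the `params` mapping above more concrete, here is a minimal sketch of how a flag-to-value dictionary can be flattened into a command line using only the standard library. It is an illustration, not a description of how BioProv builds or runs commands internally; `build_command` is a hypothetical helper, and plain strings stand in for the `bp.File` objects.

```python
import shlex


def build_command(program, params):
    """Join a program name and a flag -> value mapping into one shell-safe string."""
    tokens = [program]
    for flag, value in params.items():
        # Each flag is followed by its value; values are converted to text.
        tokens.extend([flag, str(value)])
    # shlex.join quotes any token that contains shell-special characters.
    return shlex.join(tokens)


# Hypothetical stand-ins for the File objects used in the snippet above.
command = build_command(
    "blastn",
    {"-query": "mysample.fasta", "-db": "mydb.fasta", "-out": "mysample.blast.tsv"},
)
print(command)
# blastn -query mysample.fasta -db mydb.fasta -out mysample.blast.tsv
```

Dictionaries preserve insertion order in Python 3.7 and later, so the flags appear in the order they were declared.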
BioProv also has a command-line application to run preset workflows.
$ bioprov -h
usage: bioprov [-h] [--show_config | --show_db | --clear_db | -v | -l]
               {genome_annotation,blastn,kaiju} ...

BioProv command-line application. Choose a command to begin.

optional arguments:
  -h, --help       show this help message and exit
  --show_config    Show location of config file.
  --show_db        Show location of database file.
  --clear_db       Clears all records in database.
  -v, --version    Show BioProv version
  -l, --list       List Projects in the BioProv database.

workflows:
  {genome_annotation,blastn,kaiju}
BioProv is built with the Biopython and Pandas libraries.
You can import data into BioProv using Pandas objects.
# Read csv straight into BioProv
>>> samples = bp.read_csv("my_dataframe.tsv", sep="\t", sequencefile_cols="assembly")
# Alternatively, use a pandas DataFrame
>>> import pandas as pd
>>> df = pd.read_csv("my_dataframe.tsv", sep="\t")
# [...] manipulate your df
>>> df["assembly"] = "assembly_directory/" + df["assembly"]
# Now load from your df
>>> project = bp.from_df(df, sequencefile_cols="assembly", source_file="my_dataframe.tsv")
# `project` is a dict-like Project object, indexed by sample name
>>> sample1 = project['sample1']
# You can also export your samples and their associated files and attributes to CSV
>>> project.to_csv()
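For readers who want a table to experiment with, the sketch below builds a small tab-separated table and applies the same directory-prefix step as the DataFrame example, using only the standard library. Only the `assembly` column name comes from the snippet above; the `sample-id` column, the file names and both helper functions are assumptions, and the column layout BioProv actually expects should be checked in its documentation.

```python
import csv
import io

# Hypothetical table contents; only the "assembly" column name comes from the README.
rows = [
    {"sample-id": "sample1", "assembly": "sample1.fasta"},
    {"sample-id": "sample2", "assembly": "sample2.fasta"},
]


def prefix_column(rows, column, directory):
    """Return new rows with `directory/` prepended to the values of one column."""
    return [{**row, column: f"{directory}/{row[column]}"} for row in rows]


def to_tsv(rows):
    """Serialise a list of dict rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


prefixed = prefix_column(rows, "assembly", "assembly_directory")
tsv_text = to_tsv(prefixed)
print(tsv_text)
```

Writing `tsv_text` to a file gives a table of the same general shape as `my_dataframe.tsv` in the example.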
# Install from pip
$ pip install bioprov
# Or install from source
$ git clone https://github.com/vinisalazar/bioprov # download
$ cd bioprov; pip install . # install
$ pytest # test
Important! Prodigal must be installed to run the test suite; without it, the tests will fail.
Contributions are welcome!
BioProv is in active development and no warranties are provided (please see the License).