
hecatomb is a virome analysis pipeline for analysis of Illumina sequence data


shandley/hecatomb


A hecatomb is a great sacrifice or an extensive loss. Hecatomb, the software, empowers an analyst to make data-driven decisions to 'sacrifice' false-positive viral reads from metagenomes and so enrich for true-positive viral reads. This process frequently results in a great loss of suspected viral sequences and contigs.

For detailed pipeline overview, installation, usage and customisation instructions, please refer to the documentation hosted at Read the Docs.

Citation

Hecatomb is currently on bioRxiv!

Quick start guide

Running on HPC

Hecatomb is powered by Snakemake and benefits greatly from Snakemake profiles on HPC clusters. More information and examples for setting up Snakemake profiles for Hecatomb are available in the documentation.

Install

# create conda env and install
conda create -n hecatomb -c conda-forge -c bioconda hecatomb

# activate conda env
conda activate hecatomb

# check the installation
hecatomb -h

# download the databases - you only have to do this once
  # locally: using 8 threads (default is 32 threads)
hecatomb install --threads 8

  # HPC: using a snakemake profile named 'slurm'
hecatomb install --profile slurm

Run the test dataset

# locally: uses 32 threads and 64 GB RAM by default
hecatomb run --test

# HPC: using a profile named 'slurm'
hecatomb run --test --profile slurm

Current limitations

Hecatomb is currently designed to only work with paired-end reads. We have considered making a branch for single-end reads, but that is not currently available.

When you specify a directory of reads with --reads, Hecatomb expects paired sequencing reads named in the format sampleName_R1/R2.fastq(.gz). For example:

sample1_R1.fastq.gz
sample1_R2.fastq.gz
sample2_R1.fastq.gz
sample2_R2.fastq.gz
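Before pointing --reads at a directory, it can help to confirm that every sample actually has both mates, since Hecatomb only works with paired-end reads. Below is a minimal Bash sketch. It is not part of Hecatomb; the function name check_read_pairs is hypothetical, and it assumes files follow exactly the sampleName_R1/R2.fastq(.gz) naming shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of Hecatomb: report any *_R1/*_R2 read file
# in a directory whose mate is missing. Assumes sampleName_R1/R2.fastq(.gz) naming.
check_read_pairs() {
    local dir="$1" status=0 f mate
    for f in "$dir"/*_R[12].fastq "$dir"/*_R[12].fastq.gz; do
        [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
        case "$f" in
            # sample1_R1.fastq.gz -> sample1_R2.fastq.gz (keeps .fastq or .fastq.gz)
            *_R1.fastq*) mate="${f%_R1.fastq*}_R2${f##*_R1}" ;;
            # sample1_R2.fastq.gz -> sample1_R1.fastq.gz
            *_R2.fastq*) mate="${f%_R2.fastq*}_R1${f##*_R2}" ;;
        esac
        if [ ! -e "$mate" ]; then
            echo "missing mate for $f (expected $mate)" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_read_pairs /path/to/reads && hecatomb run --reads /path/to/reads` would only start the run if every file has its mate. Files that don't match the _R1/_R2 pattern at all are simply ignored by this sketch.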

When you specify a TSV file with --reads, Hecatomb expects a 3-column, tab-separated file: the first column gives the sample name, and the second and third columns give the relative or full paths to the forward and reverse read files, respectively. For example:

sample1    /path/to/reads/sample1.1.fastq.gz    /path/to/reads/sample1.2.fastq.gz
sample2    /path/to/reads/sample2.1.fastq.gz    /path/to/reads/sample2.2.fastq.gz
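The TSV option is handy when your files don't follow the _R1/_R2 convention, like the .1/.2 names in the example above. Here is a minimal Bash sketch for generating such a file. It is not part of Hecatomb; the function name make_reads_tsv and its default suffixes (.1.fastq.gz and .2.fastq.gz, taken from the example) are assumptions you should adjust to your own naming.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of Hecatomb: print a 3-column TSV
# (sample name, forward reads, reverse reads) for a directory of paired files.
# Usage: make_reads_tsv DIR [FORWARD_SUFFIX] [REVERSE_SUFFIX]
make_reads_tsv() {
    local dir fwd_suffix="${2:-.1.fastq.gz}" rev_suffix="${3:-.2.fastq.gz}"
    dir="$(cd "$1" && pwd)" || return 1   # absolute path, so the TSV works from anywhere
    local fwd name rev
    for fwd in "$dir"/*"$fwd_suffix"; do
        [ -e "$fwd" ] || continue             # no forward files matched the glob
        name="$(basename "$fwd" "$fwd_suffix")"   # e.g. sample1
        rev="$dir/$name$rev_suffix"
        if [ -e "$rev" ]; then
            printf '%s\t%s\t%s\n' "$name" "$fwd" "$rev"   # tab-separated row
        else
            echo "skipping $name: no reverse read file" >&2
        fi
    done
}
```

For example, `make_reads_tsv /path/to/reads > samples.tsv` followed by `hecatomb run --reads samples.tsv`. The sketch writes absolute paths because the TSV format accepts either relative or full paths, and absolute ones don't depend on where the run is launched from.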

Dependencies

The only dependency you need to get up and running with Hecatomb is conda. Hecatomb relies on conda (and mamba) to ensure portability and ease of installation of its dependencies. All of Hecatomb's dependencies are installed during installation or runtime, so you don't have to worry about a thing!

Links

Hecatomb @ bio.tools

Hecatomb @ WorkflowHub
