DiVA (DNA Variant Analysis) is a pipeline for the analysis of Next-Generation Sequencing exome data.
All solida-core workflows follow the GATK Best Practices for Germline Variant Discovery. They also incorporate further improvements and refinements made after testing on real data from various research sequencing projects at the CRS4 Next Generation Sequencing Core Facility.
Pipelines are based on Snakemake, a workflow management system that provides all the features needed to create reproducible and scalable data analyses.
Software dependencies are specified in the environment.yaml file and managed directly by Snakemake through Conda. This keeps the workflow reproducible across a wide range of computing environments, such as workstations, clusters and cloud environments.
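As a rough illustration of how Conda-managed dependencies come into play, the sketch below only assembles a Snakemake command line; it does not run anything. It assumes a standard Snakemake installation, where `--use-conda` asks Snakemake to build and activate the Conda environments declared by the workflow's rules. The function name and the `Snakefile`/`config.yaml` paths are hypothetical. The actual invocation for DiVA is described in the documentation under docs/.

```python
import shlex


def snakemake_command(snakefile, configfile, cores, dry_run=False):
    """Build (but do not execute) a hypothetical Snakemake invocation."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,    # workflow definition (hypothetical path)
        "--configfile", configfile,  # project configuration (hypothetical path)
        "--use-conda",               # let Snakemake create/activate rule Conda envs
        "--cores", str(cores),       # maximum number of cores to use
    ]
    if dry_run:
        # Only print the jobs that would run, without executing them
        cmd.append("--dry-run")
    return cmd


print(shlex.join(snakemake_command("Snakefile", "config.yaml", 8, dry_run=True)))
# prints: snakemake --snakefile Snakefile --configfile config.yaml --use-conda --cores 8 --dry-run
```

A dry run is a cheap way to check that the configuration resolves before any Conda environment is built. Note that newer Snakemake releases also expose this option as `--software-deployment-method conda`.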
The pipeline workflow is composed of three major analysis sections:
- Mapping: paired-end reads in FASTQ format are aligned against a reference genome to produce a deduplicated and recalibrated BAM file. This section is executed by the DiMA pipeline.
- Variant Calling: a joint call is performed on all of the project's BAM files.
- Annotation: discovered variants are annotated, and the results are converted into a set of output file formats that enable downstream analysis for all kinds of users.
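Under the GATK Best Practices, a joint call is typically produced by calling each sample separately in GVCF mode and then genotyping the whole cohort together. The sketch below only builds the GATK4 command lines for that flow. It assumes GATK4 is on PATH as `gatk` and uses CombineGVCFs (GenomicsDBImport is a common alternative). The function name and output file names are hypothetical and are not taken from DiVA's own rules.

```python
from pathlib import Path


def joint_calling_commands(reference, bams, out_dir):
    """Build hypothetical GATK4 commands: per-sample GVCFs, then joint genotyping."""
    out = Path(out_dir)
    commands = []
    gvcfs = []
    for bam in bams:
        gvcf = out / f"{Path(bam).stem}.g.vcf.gz"
        gvcfs.append(gvcf)
        # Step 1: call each sample on its own, emitting a GVCF
        commands.append(["gatk", "HaplotypeCaller", "-R", str(reference),
                         "-I", str(bam), "-O", str(gvcf), "-ERC", "GVCF"])
    # Step 2: merge all per-sample GVCFs into a single cohort GVCF
    cohort = out / "cohort.g.vcf.gz"
    combine = ["gatk", "CombineGVCFs", "-R", str(reference)]
    for gvcf in gvcfs:
        combine += ["-V", str(gvcf)]
    combine += ["-O", str(cohort)]
    commands.append(combine)
    # Step 3: genotype every sample jointly from the cohort GVCF
    commands.append(["gatk", "GenotypeGVCFs", "-R", str(reference),
                     "-V", str(cohort), "-O", str(out / "cohort.vcf.gz")])
    return commands


for cmd in joint_calling_commands("ref.fa", ["sample1.bam", "sample2.bam"], "calls"):
    print(" ".join(cmd))
```

Genotyping all samples together lets evidence from the whole cohort inform each site, which is the main reason the Best Practices favour a joint call over independent per-sample calls.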
In parallel, statistics collected during these steps are used to generate Quality Control reports.
A complete view of the analysis workflow is provided by the pipeline's graph.
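Snakemake derives the pipeline's graph from the dependencies between rules. The sketch below models only the coarse section-level view described above, using the standard-library `graphlib` module (Python 3.9+). The section names are hypothetical labels, and the real graph is per rule and much finer-grained.

```python
from graphlib import TopologicalSorter

# Hypothetical section-level view: each key depends on the sections in its set
SECTIONS = {
    "mapping": set(),                    # executed by the DiMA pipeline
    "variant_calling": {"mapping"},      # joint call over all BAM files
    "annotation": {"variant_calling"},   # annotate and convert variants
    "qc_report": {"mapping", "variant_calling", "annotation"},  # uses stats from every step
}

order = list(TopologicalSorter(SECTIONS).static_order())
print(order)
# prints: ['mapping', 'variant_calling', 'annotation', 'qc_report']
```

For the real rule-level graph, Snakemake can emit it in Graphviz format with `snakemake --dag`, which can then be rendered with `dot`, for example `snakemake --dag | dot -Tsvg > dag.svg`.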
DiVA pipeline documentation can be found in the docs/ directory:
- Pipeline Structure
- Pipeline Workflow
- Required Files
- Running the pipeline