Quality summary pipeline for the consensus caller. The program takes four VCF files as input and produces a set of QC plots as output. The four inputs are VCF files generated with Atlas, GATK Unified Genotyper, and Freebayes, plus a consensus set generated by CGES. The tool is written for easy integration with Galaxy. The plots it currently produces summarize:
- Mendel inconsistencies
- Sample missingness
- Sample F-statistics (summarizing heterozygosity)
- Minor allele frequency spectra
- Transition-transversion mutation ratio
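The pipeline obtains these statistics through the external tools it wraps, so the following is only a minimal sketch of what two of the plotted quantities measure, not the tool's own code. It assumes biallelic SNVs given as `(ref, alt)` base pairs and genotypes coded as alternate-allele counts (0, 1, 2) with `None` for a missing call; `ts_tv_ratio` and `inbreeding_f` are hypothetical helpers. The F estimator is the classical method-of-moments form, comparing observed homozygous calls with the count expected under Hardy-Weinberg equilibrium; PLINK's `--het` report is built on the same idea, though its handling of expected counts may differ in detail.

```python
# Sketch only: Ts/Tv ratio and a method-of-moments inbreeding coefficient (F).
# Assumes biallelic SNVs as (ref, alt) base pairs and genotypes coded as
# alternate-allele counts (0, 1, 2), with None marking a missing call.

BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_ratio(snvs):
    """Return transitions / transversions over single-base (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, MNPs, non-ACGT alleles and ref == alt records.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        pair = {ref, alt}
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("nan")


def inbreeding_f(genotypes, alt_freqs):
    """F = (O_hom - E_hom) / (N - E_hom) for one sample.

    genotypes: alt-allele counts at each site (None = missing, skipped).
    alt_freqs: alternate-allele frequency at each site in the cohort.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_sites = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue
        n_sites += 1
        if g != 1:
            observed_hom += 1
        # Hardy-Weinberg probability of a homozygous call: 1 - 2pq.
        expected_hom += 1 - 2 * p * (1 - p)
    denom = n_sites - expected_hom
    return (observed_hom - expected_hom) / denom if denom else float("nan")
```

Positive F indicates fewer heterozygous calls than expected (as with undercalled heterozygotes or sample contamination being absent), and negative F indicates an excess of heterozygous calls, which is why the per-sample F distribution is a useful caller comparison.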
A few plots reported in the manuscript are not yet generated by this tool:
- Site missingness spectra
- Site/Sample concordance plots for consensus
- Rediscovery plots (not yet included because the reference sets they require still need to be packaged)
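A site missingness spectrum, listed above as not yet produced, can be derived from per-site missingness rates (vcftools, for example, reports these with `--missing-site`). The sketch below assumes a genotype matrix with one row per site and one column per sample, using `None` for a missing call; `site_missingness` and `missingness_spectrum` are hypothetical helpers and equal-width bins over [0, 1] are an arbitrary choice.

```python
from collections import Counter


def site_missingness(genotype_matrix):
    """Fraction of missing calls at each site (row = site, column = sample)."""
    return [sum(g is None for g in row) / len(row) for row in genotype_matrix]


def missingness_spectrum(rates, n_bins=10):
    """Count sites in equal-width missingness bins over [0, 1].

    A rate of exactly 1.0 is placed in the last bin rather than overflowing.
    """
    counts = Counter(min(int(r * n_bins), n_bins - 1) for r in rates)
    return [counts.get(i, 0) for i in range(n_bins)]
```

Computing the spectrum separately for each of the four call sets gives directly comparable histograms.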
Options:

    -h, --help            show this help message and exit
    --cges-vcf=CGES       File path for CGES VCF for which to generate QC metrics.
    --atlas-vcf=ATLAS     File path for ATLAS VCF for which to generate QC metrics.
    --gatk-vcf=GATK       File path for GATK VCF for which to generate QC metrics.
    --freebayes-vcf=FREEBAYES
                          File path for Freebayes VCF for which to generate QC metrics.
    --ped-file=PEDFILE    Pedigree file for samples (Optional).
    --tstv-out=TSTVOUT    Output file location for TsTv plots PDF.
    --het-out=HETOUT      Output file location for heterozygosity plots PDF.
    --maf-out=MAFOUT      Output file location for minor allele frequency plots PDF.
    --miss-out=MISSOUT    Output file location for missingness plots PDF.
    --rediscover-out=REDISCOVEROUT
                          Output file location for rediscovery rate plots PDF.
    --mendel-out=MENDELOUT
                          Output file location for Mendel inconsistency plots PDF.
    --temp-dir=TEMPDIR    Directory for writing intermediate analysis files.
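The `--flag=METAVAR` layout of the help text matches what Python's standard `optparse` module prints, which suggests (but does not confirm) how the script declares its options. As a hypothetical reconstruction, the option table can be declared as data; the `dest` names are assumptions chosen so that optparse's default metavar (`dest.upper()`) reproduces the names shown in the help.

```python
from optparse import OptionParser

# Hypothetical (flag, dest, help) triples; dest.upper() gives the METAVAR
# names shown in the help text (CGES, PEDFILE, TSTVOUT, ...).
OPTIONS = [
    ("--cges-vcf", "cges", "File path for CGES VCF for which to generate QC metrics."),
    ("--atlas-vcf", "atlas", "File path for ATLAS VCF for which to generate QC metrics."),
    ("--gatk-vcf", "gatk", "File path for GATK VCF for which to generate QC metrics."),
    ("--freebayes-vcf", "freebayes", "File path for Freebayes VCF for which to generate QC metrics."),
    ("--ped-file", "pedfile", "Pedigree file for samples (Optional)."),
    ("--tstv-out", "tstvout", "Output file location for TsTv plots PDF."),
    ("--het-out", "hetout", "Output file location for heterozygosity plots PDF."),
    ("--maf-out", "mafout", "Output file location for minor allele frequency plots PDF."),
    ("--miss-out", "missout", "Output file location for missingness plots PDF."),
    ("--rediscover-out", "rediscoverout", "Output file location for rediscovery rate plots PDF."),
    ("--mendel-out", "mendelout", "Output file location for Mendel inconsistency plots PDF."),
    ("--temp-dir", "tempdir", "Directory for writing intermediate analysis files."),
]


def build_parser():
    """Build an OptionParser; optparse adds -h/--help automatically."""
    parser = OptionParser(prog="qc.pipeline.py")
    for flag, dest, help_text in OPTIONS:
        # Default action "store": each option takes one string value.
        parser.add_option(flag, dest=dest, help=help_text)
    return parser
```

Options not given on the command line come back as `None`, which is a natural way to treat the optional pedigree file.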
Note: This tool wraps several external programs: PLINK, vcftools, and R scripts run with Rscript. To use it, all of these programs must be available on your search path (PATH).
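Because a missing dependency would otherwise surface only partway through a run, it can help to check for the wrapped programs up front. This is a minimal sketch using `shutil.which`; the executable names are assumptions (PLINK in particular may be installed under a different name, such as a versioned binary), and `missing_tools`/`check_dependencies` are hypothetical helpers, not part of the tool.

```python
import shutil

# Assumed executable names; adjust to match the local installation.
REQUIRED_TOOLS = ("plink", "vcftools", "Rscript")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the search path."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_dependencies(tools=REQUIRED_TOOLS):
    """Exit early with a readable message if any tool is missing."""
    missing = missing_tools(tools)
    if missing:
        raise SystemExit("Not found on PATH: " + ", ".join(missing))
```

Calling `check_dependencies()` before launching the pipeline turns a failure deep inside a PLINK or vcftools step into a single clear message.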
The pipeline is a simple Python executable in which the input VCF files and the output plot PDFs are specified explicitly. It also requires a temporary directory, where the intermediate files produced by PLINK and vcftools are written.
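Whether the script creates the temporary directory itself is not stated here, so a cautious caller can validate the inputs and create the directory before a run. This is a sketch under that assumption; `prepare_run` is a hypothetical helper.

```python
import os


def prepare_run(input_vcfs, temp_dir):
    """Check that every input VCF exists and make sure temp_dir is usable.

    Returns the absolute path of the temporary directory.
    """
    missing = [path for path in input_vcfs if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError("Missing input VCF(s): " + ", ".join(missing))
    # exist_ok=True leaves an existing directory (and its contents) alone.
    os.makedirs(temp_dir, exist_ok=True)
    return os.path.abspath(temp_dir)
```

Intermediate PLINK and vcftools files can be large, so pointing the temporary directory at scratch storage rather than the output location is usually worthwhile.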
A full summary can be generated with the command:

    python python/qc.pipeline.py \
      --atlas-vcf "test/atlas.test.vcf" \
      --gatk-vcf "test/gatk.test.vcf" \
      --freebayes-vcf "test/freebayes.test.vcf" \
      --cges-vcf "test/cges.test.vcf" \
      --ped-file "test/test.pedigree.txt" \
      --tstv-out "test/tstv.dat" \
      --het-out "test/het.dat" \
      --maf-out "test/maf.dat" \
      --miss-out "test/miss.dat" \
      --mendel-out "test/mendel.dat" \
      --temp-dir "test/"
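When the pipeline is driven from another script, the same command can be assembled as an argument list rather than a shell string, which avoids quoting problems with paths. This is a sketch; `build_command` and `run_pipeline` are hypothetical helpers, and the default script path and `python` interpreter simply mirror the command above.

```python
import subprocess


def build_command(options, script="python/qc.pipeline.py", interpreter="python"):
    """Turn {"atlas-vcf": "test/atlas.test.vcf", ...} into an argv list.

    Options whose value is None are left out, so optional inputs such as
    the pedigree file can simply be omitted.
    """
    argv = [interpreter, script]
    for name, value in options.items():
        if value is not None:
            argv += ["--" + name, str(value)]
    return argv


def run_pipeline(options, **kwargs):
    """Run the pipeline; check=True raises CalledProcessError on failure."""
    return subprocess.run(build_command(options, **kwargs), check=True)
```

For example, `build_command({"cges-vcf": "test/cges.test.vcf", "temp-dir": "test/"})` yields the argument list for a run with just those two options.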