Quality summary pipeline for the consensus caller. The program takes four VCF files as input and produces a variety of plots as output. The four input files are the VCF files generated with Atlas, GATK Unified Genotyper, and Freebayes, plus a consensus set generated by CGES. The tool is written for easy integration with Galaxy.
The following plots are produced:
- Mendel inconsistencies
- Sample Missingness
- Sample F-statistics (summarizing heterozygosity)
- Minor allele frequency spectra
- Transition-transversion mutation ratio
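For readers unfamiliar with the last metric, the transition-transversion ratio compares the number of transitions (A to G, G to A, C to T, T to C) with the number of all other single-base substitutions. The sketch below only illustrates that definition on a list of (REF, ALT) pairs. It is not the pipeline's implementation, which relies on the wrapped external tools, and the function name `tstv_ratio` and its input format are hypothetical.

```python
# Illustrative only: NOT the pipeline's implementation.
# tstv_ratio and its (ref, alt) input format are hypothetical.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def tstv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, ambiguous bases, and non-variant records.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        return float("nan")  # the ratio is undefined without transversions
    return ts / tv


# Three transitions, two transversions, and one indel that is ignored.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C"), ("AT", "A")]
print(tstv_ratio(example))
```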
A few plots reported in the manuscript are missing from this pipeline:
- Site missingness spectra
- Site/Sample concordance plots for consensus
- Rediscovery plots (the issue here is packaging the reference sets)
Command-line options:

    -h, --help            show this help message and exit
    --cges-vcf=CGES       File path for CGES VCF for which to generate QC
                          metrics.
    --atlas-vcf=ATLAS     File path for ATLAS VCF for which to generate QC
                          metrics.
    --gatk-vcf=GATK       File path for GATK VCF for which to generate QC
                          metrics.
    --freebayes-vcf=FREEBAYES
                          File path for Freebayes VCF for which to generate QC
                          metrics.
    --ped-file=PEDFILE    Pedigree file for samples (Optional).
    --tstv-out=TSTVOUT    Output file location for TsTv plots PDF.
    --het-out=HETOUT      Output file location for heterozygosity plots PDF.
    --maf-out=MAFOUT      Output file location for minor allele frequency plots
                          PDF.
    --miss-out=MISSOUT    Output file location for missingness plots PDF.
    --rediscover-out=REDISCOVEROUT
                          Output file location for rediscovery rate plots PDF.
    --mendel-out=MENDELOUT
                          Output file location for Mendel inconsistency plots
                          PDF.
    --temp-dir=TEMPDIR    Directory for writing intermediate analysis files.

Note: This tool wraps several other programs: PLINK, vcftools, and R code files run with Rscript. To run it, all of these programs must be on your search path.
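A quick way to confirm that the wrapped programs are reachable is to look each one up on the search path before running the pipeline. The following is a minimal sketch using the standard library; the executable names `plink`, `vcftools`, and `Rscript` are assumptions about a typical installation, and `missing_tools` is a hypothetical helper that is not part of this tool.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("plink", "vcftools", "Rscript")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the search path."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found.")
```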
This is a simple Python executable in which the input VCF files and the output PDFs of plots are specified explicitly. A temporary directory is also required; the intermediate files produced by PLINK and vcftools are written there.
A full summary can be generated with the command:

    python python/qc.pipeline.py \
        --atlas-vcf "test/atlas.test.vcf" \
        --gatk-vcf "test/gatk.test.vcf" \
        --freebayes-vcf "test/freebayes.test.vcf" \
        --cges-vcf "test/cges.test.vcf" \
        --ped-file "test/test.pedigree.txt" \
        --tstv-out "test/tstv.dat" \
        --het-out "test/het.dat" \
        --maf-out "test/maf.dat" \
        --miss-out "test/miss.dat" \
        --mendel-out "test/mendel.dat" \
        --temp-dir "test/"
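
When the pipeline is driven from another script, the same invocation can be assembled as an argument list. This is a rough sketch under stated assumptions: `build_qc_command` is a hypothetical helper, not part of this tool; the option names and the script path are taken from the command above. The sketch only builds and prints the command, and running it is left to the caller.

```python
import shlex


def build_qc_command(vcfs, outputs, temp_dir, ped_file=None,
                     script="python/qc.pipeline.py"):
    """Assemble the argument list for the QC pipeline (hypothetical helper)."""
    cmd = ["python", script]
    for caller in ("atlas", "gatk", "freebayes", "cges"):
        cmd += ["--%s-vcf" % caller, vcfs[caller]]
    if ped_file is not None:  # the pedigree file is optional
        cmd += ["--ped-file", ped_file]
    for metric, path in outputs.items():
        cmd += ["--%s-out" % metric, path]
    cmd += ["--temp-dir", temp_dir]
    return cmd


vcfs = {c: "test/%s.test.vcf" % c for c in ("atlas", "gatk", "freebayes", "cges")}
outputs = {m: "test/%s.dat" % m for m in ("tstv", "het", "maf", "miss", "mendel")}
cmd = build_qc_command(vcfs, outputs, "test/", ped_file="test/test.pedigree.txt")

# Print a shell-safe rendering of the command.
print(" ".join(shlex.quote(part) for part in cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.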