This is not an official Verily product.
This repository contains code to perform cohort-level quality control checks on human genomic variants. Cloud technology is used to perform queries in parallel. For prior work, see Cloud-based interactive analytics for terabytes of genomic variants data.
View output from these queries run on public data
Before running the queries yourself, you can see the results on a few public datasets:
- QC overview reports on DeepVariant Platinum Genomes data
- QC overview reports on Simons Genome Diversity Project data
- QC overview reports on 1000 Genomes data
- example ad hoc explorations of QC results
Run these queries on your own data
Load data to BigQuery
Using the MOVE_TO_CALLS merge strategy will produce a core set of columns common to all tables created from VCFs, with calls for the exact same site and alleles (including reference_bases and all alternate_bases) grouped together in a single row.
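The grouping described above can be pictured with a minimal Python sketch. This is an illustration only, not the loader's actual implementation: the record layout and the field names locus, sample, and genotype are hypothetical, and the real merge key may include fields not shown here.

```python
from collections import defaultdict


def merge_calls(records):
    """Group per-sample records that share the same variant key into one row.

    The key used here (locus, reference_bases, alternate_bases) is an
    assumption made for illustration; "locus" is a hypothetical field.
    """
    merged = defaultdict(list)
    for rec in records:
        key = (rec["locus"], rec["reference_bases"], tuple(rec["alternate_bases"]))
        # Sample-specific fields move under the shared row as a "call".
        merged[key].append({"sample": rec["sample"], "genotype": rec["genotype"]})
    return [
        {
            "locus": locus,
            "reference_bases": ref,
            "alternate_bases": list(alts),
            "calls": calls,
        }
        for (locus, ref, alts), calls in merged.items()
    ]


# Hypothetical single-sample records: S1 and S2 share a variant, S3 differs.
records = [
    {"locus": "chr1:1000", "reference_bases": "A", "alternate_bases": ["G"],
     "sample": "S1", "genotype": [0, 1]},
    {"locus": "chr1:1000", "reference_bases": "A", "alternate_bases": ["G"],
     "sample": "S2", "genotype": [1, 1]},
    {"locus": "chr1:1000", "reference_bases": "A", "alternate_bases": ["T"],
     "sample": "S3", "genotype": [0, 1]},
]
rows = merge_calls(records)
```

In this sketch the first two records collapse into one row with two calls, while the third record stays in its own row because its alternate_bases differ.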
We recommend loading single-sample VCFs into a "genome call table" and also the multisample VCF into a "multisample-variants table".
If you do not have a multisample VCF, you could:
- use https://github.com/gatk-workflows/gatk4-germline-snps-indels#joint-discovery-gatk- to create one
- use https://github.com/verilylifesciences/joint-genotype to create one
- or skip the queries that require knowing how many samples match the reference, such as Hardy-Weinberg Equilibrium
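To show why a check such as Hardy-Weinberg Equilibrium needs the number of samples matching the reference, here is a minimal Python sketch of the textbook Pearson chi-squared test for a biallelic site. It is an assumption-laden illustration, not the query used in this repository; the function name hwe_chi_squared and its parameters are hypothetical.

```python
def hwe_chi_squared(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-squared statistic for Hardy-Weinberg Equilibrium.

    Takes observed genotype counts at one biallelic site. The count of
    homozygous-reference samples is required, which is why data without
    reference-matching calls cannot support this check.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotyped sample is required")

    # Reference allele frequency estimated from the observed genotypes.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)

    # Skip categories with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


balanced = hwe_chi_squared(25, 50, 25)     # counts match expectation exactly
no_hets = hwe_chi_squared(50, 0, 50)       # strong heterozygote deficit
```

A statistic near zero indicates genotype counts close to Hardy-Weinberg proportions, while a large value, as in the second call, flags a site worth inspecting.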
If your sample information does not already include ancestry, you can predict the ancestry for each genome using Genomic ancestry inference with deep learning.
Run the QC overview reports
Run the RMarkdown parameterized reports to get an overview of your data.
Drill down on results
Drill down further on results by creating additional plots or running additional queries. For example, these queries can be run from Jupyter notebooks, and further queries can then be used to explain the results for a particular dataset.
The methods make use of:
Each technology has introductory material that may help you when working with the code in this repository.