Genotyping structural variants in pangenome graphs using the vg toolkit
This repository contains the commands and scripts used for the paper "Genotyping structural variants in pangenome graphs using the vg toolkit" (2019, in press). They depend primarily on toil-vg, which can run most other dependencies via Docker. A wiki on genotyping SVs with toil-vg is also available. GitHub issues are the best place to raise questions or concerns.
Links to necessary data are also listed for each analysis.
Note that the code to reproduce the figures and tables in the manuscript is available in the manuscript's repo.
The different methods were compared on simulated sequences and SVs.
Different sequencing depths were tested.
We also tested the effect of errors in the breakpoint locations of the SVs.
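As an illustration of what a breakpoint error means here, the sketch below shifts each SV by a random offset while keeping its length. It is a minimal sketch and not the simulation code from this repository: the function name `shift_breakpoints`, the record fields, and the `max_shift` parameter are all assumptions made for the example.

```python
import random

def shift_breakpoints(svs, max_shift, seed=0):
    """Return copies of SV records shifted by a random offset (hypothetical helper).

    Each SV is moved by an offset drawn uniformly from [-max_shift, max_shift];
    its length is preserved and its start is kept at position 1 or later.
    """
    rng = random.Random(seed)  # fixed seed so the perturbation is reproducible
    shifted = []
    for sv in svs:
        offset = rng.randint(-max_shift, max_shift)
        length = sv["end"] - sv["start"]
        start = max(1, sv["start"] + offset)
        shifted.append({**sv, "start": start, "end": start + length})
    return shifted

# Hypothetical SV records, for illustration only.
svs = [
    {"chrom": "chr1", "start": 1000, "end": 1500, "type": "DEL"},
    {"chrom": "chr1", "start": 5000, "end": 5200, "type": "INV"},
]
noisy = shift_breakpoints(svs, max_shift=10)
```

Genotyping the same reads against the original and the shifted SVs would then show how sensitive a method is to imprecise breakpoints.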
Scripts are available and described in the
Whole genome experiments in human
These were run on AWS via Toil. In theory, they could use any other framework that Toil supports, though the scripts will have to be modified accordingly.
In the human directory, there is one folder for each dataset, with the commands to download and prepare the data and to genotype SVs with vg and the other methods.
- Human Genome Structural Variation Consortium (HGSVC)
- Genome in a Bottle (GiaB)
- Pseudo-diploid CHM genome (CHMPD)
- SV catalog from Audano et al. Cell 2019 (SVPOP)
There is also a toil-scripts folder with helper scripts that were used to run the analysis on AWS.
The VCFs produced by vg and the other methods across these datasets are available at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=vgsv2019/vcfs/.
De-novo assembly graph experiments in yeast
The yeast experiments are written as snakemake pipelines. Each pipeline consists of a set of rules that process a set of input files into a set of output files.
In the yeast directory, there are several folders for the different phases of the experiment, as well as detailed descriptions of how to re-run it.
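The rule-based structure described above can be pictured with a small Python model. This is only a sketch of the idea, not Snakemake itself and not the pipelines in this repository: the rule names and file names are hypothetical, and the helper `build_order` is an assumption made for the example. Each rule declares its inputs and outputs, and the order of execution follows from which rule produces which file.

```python
def build_order(rules, target):
    """Return rule names in an order that produces `target`, dependencies first.

    Toy model only: assumes each file has at most one producing rule
    and that the rules contain no cycles.
    """
    # Map every output file to the rule that produces it.
    producer = {out: name for name, rule in rules.items() for out in rule["outputs"]}
    order = []

    def visit(path):
        name = producer.get(path)
        if name is None or name in order:
            return  # a source file, or a rule that is already scheduled
        for inp in rules[name]["inputs"]:
            visit(inp)
        order.append(name)

    visit(target)
    return order

# Hypothetical rules and file names, for illustration only.
rules = {
    "call": {"inputs": ["sample.gam", "graph.xg"], "outputs": ["sample.vcf"]},
    "map": {"inputs": ["reads.fq", "graph.xg"], "outputs": ["sample.gam"]},
    "index": {"inputs": ["graph.vg"], "outputs": ["graph.xg"]},
}

print(build_order(rules, "sample.vcf"))  # prints ['index', 'map', 'call']
```

Requesting a final output file is enough to determine which rules have to run and in which order, which is the general idea behind organizing a workflow as rules.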