Toolkit for extracting SVs from long sequences and benchmarking variant callers
walaj COMPL > TSI
Latest commit 14f975f Jan 10, 2017


svlib - tools for benchmarking structural variations


License: GPLv3



I have successfully built svlib on Unix with GCC 4.8, 4.9 and 5.1.

git clone --recursive
cd svlib



Convert a BND-formatted SV VCF into BEDPE. The output columns are:

chr1 pos1 pos1 chr2 pos2 pos2 id 0 str1 str2 info genotypes


## convert a snowman vcf and sort with bedtools
svlib vcftobedpe | sortBed -i stdin
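
As a rough illustration of working with the column layout listed above, the following sketch labels each BEDPE record as intra- or interchromosomal with awk. It assumes tab-separated fields in exactly that order and no header line; the two sample records and the `classify_bedpe` helper are made up for this example and are not part of svlib.

```shell
# Hypothetical two-record BEDPE in the column order listed above
# (chr1 pos1 pos1 chr2 pos2 pos2 id 0 str1 str2 info genotypes).
# The records are invented for illustration only.
bedpe=$(printf '1\t1000\t1000\t1\t5000\t5000\tsv1\t0\t+\t-\tinfoA\tgtA\n2\t200\t200\t7\t900\t900\tsv2\t0\t-\t-\tinfoB\tgtB\n')

# Label each record: "inter" if the two breakends sit on different
# chromosomes, otherwise "intra" with the distance between pos1 and pos2.
# Appending "" forces a string comparison of the chromosome names.
classify_bedpe() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    ($1 "") != ($4 "") { print $7, "inter", "NA"; next }
                       { d = $5 - $2; if (d < 0) d = -d; print $7, "intra", d }'
}

printf '%s\n' "$bedpe" | classify_bedpe
```

A filter like this could sit after `sortBed` in the pipeline above, provided the real output matches the assumed layout.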


Simulate rearrangements and indels

svlib sim -G $REFHG19 -s 42 -R 1000 -X 10000 -A mysim


Test the ability of an aligner to realign an SV contig

svlib realigntest -G $REFHG19 -b some.bam > sim.fasta
## align sim.fasta with an aligner (e.g. BWA) to produce something like sim.bam
svlib realigntest -G $REFHG19 -E sim.bam > results


Call SVs and indels from a qname sorted BAM of long sequence alignments

TUM=tumor.shortreads.bam ## e.g. standard Illumina reads for scoring/genotyping
NORM=normal.shortreads.bam ## should be coordinate sorted
svlib seqtovcf qsorted.contigs.bam -p $CORES -t $TUM -n $NORM -a output_id -G $REFHG19
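
Since the contig BAM is expected to be qname sorted and the short-read BAMs coordinate sorted, it can help to check the sort order recorded in each header before a run. The sketch below reads a SAM header from stdin and prints the `SO:` tag of the `@HD` line. `sort_order` is a hypothetical helper, not an svlib command, and the tag only reflects what the producing tool wrote, which may differ from the actual record order.

```shell
# Print the sort order recorded in a SAM/BAM header read from stdin,
# i.e. the SO: tag on the @HD line, or "unknown" if no such tag exists.
sort_order() {
  awk -F'\t' '
    $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) so = substr($i, 4) }
    END { print (so == "" ? "unknown" : so) }'
}

# Usage with a real file (requires samtools):
#   samtools view -H qsorted.contigs.bam | sort_order
# Demonstration on a hand-written header:
printf '@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:1\tLN:1000\n' | sort_order
```

For the contig BAM one would hope to see `queryname`, and `coordinate` for the tumor and normal BAMs.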

Using seqtovcf with noisy BAMs

seqtovcf can optionally use exclusion lists to skip variants that fall in noisy regions (e.g. centromeres), or to avoid re-processing the same variant when it is supported by several long sequences.

## first pass to find contigs which support a unique variant (without reads for speed)
## -W flag will write a *.extracted.contigs.bam, so you can see which ones were chosen
## in the case that multiple contigs support the same variant, svlib only uses the highest quality contig
svlib seqtovcf qsorted.contigs.bam -a first_pass -G $REFHG19 -W
gunzip -c first_pass.bps.txt.gz | cut -f20 | sort | uniq >
## second pass (with reads for genotyping)
svlib seqtovcf qsorted.contigs.bam -L -B excluded_regions.bed -a second_pass -t $TUM -n $NORM


svlib is developed and maintained by Jeremiah Wala, Rameen Beroukhim's lab, Dana-Farber Cancer Institute, Boston, MA.

This project was developed in collaboration with the Cancer Genome Analysis team at the Broad Institute. Particular thanks to:

  • Cheng-Zhong Zhang (Matthew Meyerson Lab)
  • Marcin Imielinski