Heuristics to merge structural variant (SV) calls in VCF format.

# mergeSVcallers

Creating an integrated SV callset is difficult. The code in this project is designed to help merge SVs in a consistent and straightforward way. The inputs to mergeSVcallers are bgzip-compressed, Tabix-indexed VCF files, and the output is a single merged VCF file. mergeSVcallers can be re-run iteratively on its own output; because that output is unsorted, it must be sorted, compressed, and indexed before being used as input again.
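
Before a merged (unsorted) VCF can be fed back into mergeSVcallers, its records have to be grouped by contig and sorted by position so that Tabix can index them. The Python sketch below shows only that sorting step; the function name `sort_vcf_lines` is hypothetical, and ordering contigs by their `##contig` header lines (with unlisted contigs placed afterwards, lexicographically) is an assumption, not something the tool prescribes.

```python
def sort_vcf_lines(lines: list[str]) -> list[str]:
    """Return header lines unchanged, followed by records sorted by (CHROM, POS).

    Hypothetical helper. Contig order follows the ##contig header lines when
    present; contigs not listed there go afterwards in lexicographic order.
    """
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    # Rank contigs in the order they appear in ##contig=<ID=...> header lines.
    contig_rank: dict[str, int] = {}
    for h in header:
        if h.startswith("##contig=<ID="):
            cid = h[len("##contig=<ID="):].split(",")[0].rstrip(">\n")
            contig_rank.setdefault(cid, len(contig_rank))

    def key(record: str) -> tuple[int, str, int]:
        chrom, pos = record.split("\t", 2)[:2]
        # Unlisted contigs share the last rank and are ordered by name.
        return (contig_rank.get(chrom, len(contig_rank)), chrom, int(pos))

    return header + sorted(records, key=key)
```

The sorted text would then be compressed with `bgzip` and indexed with `tabix -p vcf` (both from htslib) before being passed back via `-f`.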

Please feel free to join the SV merge quest!

## Downloading and installing

    git clone --recursive https://github.com/zeeev/mergeSVcallers.git
    cd mergeSVcallers/
    make

## Usage

     Usage:
           mergeSVcallers -a ref.fasta -f a.vcf.gz,b.vcf.gz -t WHAM,LUMPY -s 500

     Required:
              -a - <STRING> - The samtools faidx-indexed FASTA file
              -f - <STRING> - A comma-separated list of Tabix-indexed VCF files
              -t - <STRING> - A comma-separated list of tags/identifiers, one per file

     Optional:
              -s - <INT>   - Merge SVs whose breakpoints are both within N bp [100]
              -r - <FLOAT> - Minimum reciprocal overlap, required in addition to -s [0]

     Info:
              - This tool provides a simple set of operations to merge SVs.
              - Output is unsorted.
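
The `-s` and `-r` options describe a pairwise merge test: both breakpoints must lie within N bp, and optionally the two calls must also reach a minimum reciprocal overlap. The Python sketch below illustrates that criterion under stated assumptions; it is not the tool's implementation. The `SV` type, the function names, and the greedy clustering strategy are all hypothetical, and "reciprocal overlap" is taken to mean the overlap covers at least that fraction of each call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical simplified SV call: chromosome plus start/end breakpoints."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap divided by the longer call's length, i.e. the fraction covered of BOTH calls."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def should_merge(a: SV, b: SV, slop: int = 100, min_ro: float = 0.0) -> bool:
    """Assumed rule: both breakpoints within `slop` bp AND reciprocal overlap >= min_ro."""
    if a.chrom != b.chrom:
        return False
    close = abs(a.start - b.start) <= slop and abs(a.end - b.end) <= slop
    return close and reciprocal_overlap(a, b) >= min_ro


def greedy_merge(calls: list[SV], slop: int = 100, min_ro: float = 0.0) -> list[list[SV]]:
    """Hypothetical clustering: add each call to the first cluster whose first member it matches."""
    clusters: list[list[SV]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        for cluster in clusters:
            if should_merge(cluster[0], call, slop, min_ro):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return clusters


wham = SV("chr1", 10_000, 12_000)
lumpy = SV("chr1", 10_050, 11_980)
print(should_merge(wham, lumpy, slop=100))               # True: breakpoints differ by 50 and 20 bp
print(should_merge(wham, lumpy, slop=100, min_ro=0.99))  # False: overlap is 1930 / 2000 = 0.965
```

With the default `min_ro` of 0 the overlap test always passes, so only the breakpoint distance matters, which matches the `[0]` default shown for `-r`.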

## Tested Tools

  • WHAM-GRAPHENING
  • LUMPY
  • GENOME STRIP CNVs
  • GENOME STRIP DELETION
  • DELLY
  • VARIATION HUNTER

## TODO

  • create a test suite
  • merge by reciprocal overlap
  • add a splitter function
  • add translocation functionality

## Quick Venn

There are two utility scripts for quickly generating Venn diagrams from the merged VCF file produced by mergeSVcallers. The first script, vennGen.pl, generates the input data for the plotting script:

    perl vennGen.pl --file ../merged.test.vcf --patterns "WHAM-,LUMPY-,GENOME-STRIP" --names WHAM,LUMPY,GS > plottest-data.txt

The output, plottest-data.txt, is a Boolean data frame recording, for each merged SV, which of the named callers contributed to it. This file is then passed to the simple R plotting script:

    R --vanilla < plotVenn.R --args plottest-data.txt testing DEL 50

The first argument is the data file, the second is a file prefix for the plot, the third is the SV type to plot (e.g. DEL), and the fourth is the minimum SVLEN. The script uses the R package gplots to generate a PDF in the same directory.
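
The two Venn steps (building a per-SV presence table, then counting each caller combination for one SV type above a length cutoff) can be sketched in Python as follows. This is not vennGen.pl or plotVenn.R. It assumes a caller contributed to a merged record whenever its pattern (e.g. `WHAM-`) appears anywhere in the record line, and the `SOURCES` INFO key in the example records is hypothetical; the function names are likewise illustrative.

```python
from collections import Counter
from typing import Iterable


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dict; flags map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def presence_rows(vcf_lines: Iterable[str], patterns: list[str], names: list[str]) -> list[dict]:
    """One row per merged record: SVTYPE, SVLEN, then True/False per caller.

    ASSUMPTION: a caller contributed if its pattern occurs anywhere in the line.
    """
    rows = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        info = parse_info(line.rstrip("\n").split("\t")[7])
        row = {"SVTYPE": info.get("SVTYPE"), "SVLEN": info.get("SVLEN")}
        for name, pattern in zip(names, patterns):
            row[name] = pattern in line
        rows.append(row)
    return rows


def venn_counts(rows: list[dict], names: list[str], sv_type: str, min_svlen: int) -> Counter:
    """Count records per caller combination for one SV type with |SVLEN| >= min_svlen."""
    counts: Counter = Counter()
    for row in rows:
        if row["SVTYPE"] != sv_type or not row["SVLEN"]:
            continue
        # Deletions usually carry a negative SVLEN in VCF, so compare the absolute value.
        if abs(int(row["SVLEN"])) < min_svlen:
            continue
        combo = tuple(name for name in names if row[name])
        counts[combo] += 1
    return counts


records = [
    "##fileformat=VCFv4.1\n",
    "chr1\t10000\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-2000;SOURCES=WHAM-1,LUMPY-7\n",
    "chr1\t20000\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-30;SOURCES=WHAM-2\n",
    "chr2\t500\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-400;SOURCES=LUMPY-9\n",
    "chr2\t900\t.\tN\t<DUP>\t.\t.\tSVTYPE=DUP;SVLEN=800;SOURCES=WHAM-3\n",
]
rows = presence_rows(records, ["WHAM-", "LUMPY-"], ["WHAM", "LUMPY"])
print(dict(venn_counts(rows, ["WHAM", "LUMPY"], "DEL", 50)))
# {('WHAM', 'LUMPY'): 1, ('LUMPY',): 1}
```

Each key in the resulting counter corresponds to one region of the Venn diagram, which mirrors the type and minimum SVLEN arguments passed to the plotting script.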

Here is an example plot (this is just an example of poorly matched samples):
