Benchmarking of variant calling on NA24385

This repository contains code, instructions and links to data for reproducing the variant calling analysis presented in the manuscript "Assessing graph-based read mappers against a novel baseline approach highlights strengths and weaknesses of the current generation of methods".

Reproduce the experiments

1) Dependencies

Running the benchmarks requires many different tools, e.g. Graphtyper, hap.py and vg. The easiest way to get all of them is to use the Docker repository available here. Alternatively, see this Dockerfile for the list of required software and install it manually.
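If you install the tools manually, you can check which ones are still missing from your PATH before starting a run. The sketch below makes an assumption: the executable names (vg, graphtyper, hap.py, bwa-mem2) are guesses at a typical installation, and the Dockerfile remains the authoritative list.

```shell
# Sketch: report which benchmark tools cannot be found on PATH.
# The executable names are assumptions; adjust them to match your installation.
required_tools="vg graphtyper hap.py bwa-mem2"

# Print each argument that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Word splitting of $required_tools is intentional here.
missing=$(missing_tools $required_tools)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```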

2) Download the data

Only two data files are needed to reproduce the analysis: a FASTA file with reads and a VCF file with the truth variant calls. The truth VCF file is available from NCBI here. The FASTA file used to run the linear reference genome benchmarks is available here. Download both files.
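The truth VCF's file name suggests it is restricted to chromosome 21. A quick way to confirm which contigs it actually contains, and whether their names match your reference (e.g. "21" versus "chr21"), is to list the distinct values of the first VCF column. The helper name vcf_contigs is hypothetical. It relies on bgzip output being readable by plain gzip.

```shell
# Sketch: list the distinct contig names used by the records of a
# (b)gzip-compressed VCF file, e.g. the downloaded truth set.
# vcf_contigs is a hypothetical helper, not part of this repository.
vcf_contigs() {
  # Skip header lines (starting with '#') and keep the CHROM column.
  gzip -dc "$1" | grep -v '^#' | cut -f1 | sort -u
}

# Example:
# vcf_contigs HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_triophased_chr21.vcf.gz
```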

You also need a BED file of "confident regions". For simplicity, this file is included in this repository (confident.bed).

You will also need the vg graphs and the offset-based graphs. All of these are available here: https://zenodo.org/record/2586090/files/human_pruned_1pc.tar.gz?download=1. Extract the archive after downloading it.
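The download and extraction can be scripted as shown below. This is a minimal sketch, assuming wget and tar are available. The helper name fetch_graphs and its destination argument are hypothetical. The function skips the download if a non-empty archive is already present, so re-running it only re-extracts.

```shell
# Sketch: download the pre-built graphs from Zenodo and unpack them.
# fetch_graphs is a hypothetical helper; the URL is the one given above.
graphs_url="https://zenodo.org/record/2586090/files/human_pruned_1pc.tar.gz?download=1"

fetch_graphs() {
  dest=${1:-.}                                # target directory (default: current directory)
  archive="$dest/human_pruned_1pc.tar.gz"
  mkdir -p "$dest" || return 1
  # Skip the download if a non-empty archive is already present.
  if [ ! -s "$archive" ]; then
    wget -O "$archive" "$graphs_url" || return 1
  fi
  tar -xzf "$archive" -C "$dest"
}

# Example: fetch_graphs /path/to/graphs
```

The name of the directory inside the archive is not stated in this README. List it with tar -tzf human_pruned_1pc.tar.gz and point graph_dir at it.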

Run the analysis

Set a variable pointing to your graphtyper-pipelines installation:

graphtyper_pipelines_dir=/path/to/dir

Assuming that the downsampled.fasta file and HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_triophased_chr21.vcf.gz are in your current working directory, that the graph directory is stored in a variable graph_dir=/path/to/downloaded/graphdir/, and that the variable truth_file holds the path to the truth VCF file, you can run variant calling with the different tools as follows:
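Because the scripts depend on several files and directories being in place, a quick pre-flight check can save a failed run. This is a sketch under stated assumptions. The default paths are placeholders, and setting truth_file to the VCF file name is an assumption. check_inputs is a hypothetical helper. It also assumes you run it from the repository directory, where confident.bed lives.

```shell
# Sketch: verify that the inputs the run scripts expect are present.
# The default values are placeholders (assumptions); existing values are kept.
graphtyper_pipelines_dir=${graphtyper_pipelines_dir:-/path/to/dir}
graph_dir=${graph_dir:-/path/to/downloaded/graphdir/}
truth_file=${truth_file:-HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_triophased_chr21.vcf.gz}

# Print a message for each missing input; return non-zero if anything is missing.
check_inputs() {
  status=0
  for f in downsampled.fasta "$truth_file" confident.bed; do
    [ -f "$f" ] || { echo "Missing file: $f" >&2; status=1; }
  done
  for d in "$graphtyper_pipelines_dir" "$graph_dir"; do
    [ -d "$d" ] || { echo "Missing directory: $d" >&2; status=1; }
  done
  return $status
}

# Example: check_inputs && ./run_vg.sh "$graph_dir"
```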

vg

./run_vg.sh $graph_dir
./run_graphtyper.sh $graphtyper_pipelines_dir $truth_file
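The two commands above are meant to run back to back. To avoid genotyping after a failed mapping step, you can wrap them in a small function. This is a minimal sketch. run_pipeline is a hypothetical helper, and only the scripts it calls come from this repository.

```shell
# Sketch: run one mapper script, then genotyping and evaluation,
# skipping run_graphtyper.sh if the mapper script fails.
# run_pipeline is a hypothetical helper, not part of this repository.
run_pipeline() {
  mapper_script=$1
  shift
  "$mapper_script" "$@" || {
    echo "$mapper_script failed; skipping run_graphtyper.sh" >&2
    return 1
  }
  ./run_graphtyper.sh "$graphtyper_pipelines_dir" "$truth_file"
}

# Example (vg): run_pipeline ./run_vg.sh "$graph_dir"
```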

Two step mapper

For this run, you will additionally need a BWA-MEM2 index.

./run_two_step.sh $graph_dir /path/to/bwa-mem2/index.fa
./run_graphtyper.sh $graphtyper_pipelines_dir $truth_file
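If you do not yet have a BWA-MEM2 index, you can build one with bwa-mem2 index. The sketch below only rebuilds the index when files are missing. It assumes bwa-mem2 2.x, which writes the index files with the suffixes .0123, .amb, .ann, .bwt.2bit.64 and .pac next to the reference. Older releases used different suffixes. The helper names are hypothetical, and the choice of reference FASTA is left to you.

```shell
# Sketch: build a BWA-MEM2 index only if it does not exist yet.
# Assumes bwa-mem2 2.x index file suffixes; the helper names are hypothetical.

# Return 0 if every expected index file for the given reference exists.
has_bwa_mem2_index() {
  for ext in 0123 amb ann bwt.2bit.64 pac; do
    [ -f "$1.$ext" ] || return 1
  done
  return 0
}

# Build the index next to the reference FASTA when it is missing.
ensure_bwa_mem2_index() {
  has_bwa_mem2_index "$1" || bwa-mem2 index "$1"
}

# Example:
# ensure_bwa_mem2_index /path/to/bwa-mem2/index.fa
# ./run_two_step.sh "$graph_dir" /path/to/bwa-mem2/index.fa
```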

Linear mapper

For this run, you will additionally need a BWA-MEM2 index.

./run_linear.sh $graph_dir /path/to/bwa-mem2/index.fa
./run_graphtyper.sh $graphtyper_pipelines_dir $truth_file

For each run, hap.py prints its summary table (including recall, precision and F1 score for SNPs and indels) to the terminal.
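To keep the hap.py table from each run instead of only seeing it in the terminal, you can mirror the output to a log file with tee. This is a sketch. The helper run_and_log, the bwa_mem2_index variable and the <name>.log file names are assumptions. Without set -o pipefail, the function's exit status is that of tee, not of the scripts.

```shell
# Sketch: run a benchmark and copy all terminal output (including the
# hap.py table) to <name>.log. Names here are hypothetical.
bwa_mem2_index=${bwa_mem2_index:-/path/to/bwa-mem2/index.fa}

run_and_log() {
  name=$1
  shift
  # Run the mapper, then genotyping/evaluation only if mapping succeeded;
  # stdout and stderr go both to the terminal and to the log file.
  { "$@" && ./run_graphtyper.sh "$graphtyper_pipelines_dir" "$truth_file"; } 2>&1 | tee "$name.log"
}

# run_and_log vg ./run_vg.sh "$graph_dir"
# run_and_log two_step ./run_two_step.sh "$graph_dir" "$bwa_mem2_index"
# run_and_log linear ./run_linear.sh "$graph_dir" "$bwa_mem2_index"
```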
