This is a CUDA-enabled fork of SNPTools impute.cpp. Run time should scale linearly with sample size up to a small multiple of the number of CUDA cores (shaders) on the GPU being used.
GLPhase also has an option for incorporating pre-existing haplotypes into the phasing and imputation process. Release 1.4.13 was used with this option to impute genotypes for the first release of the Haplotype Reference Consortium.
    # Clone this repository recursively
    git clone --recursive https://github.com/winni2k/GLPhase.git
    cd GLPhase

    # to compile all code (with all optimizations turned on)
    make

    # run the glphase executable to get a description of the
    # glphase command line arguments
    bin/glphase

    # run regression tests (turns off optimizations)
    make test

    # run regression tests + longer integration tests
    make disttest

    # compile without CUDA support
    # first clean the work dir
    make clean
    make NCUDA=1

    # compile without CUDA or OMP support (on MacOSX for example)
    make NCUDA=1 NOMP=1
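The choice between the build variants above can be scripted. The sketch below is only an illustration: the helper name `glphase_make_flags` and its two optional arguments are hypothetical, and it assumes that having `nvcc` on the `PATH` is a reasonable sign that the CUDA toolkit is installed. On macOS it follows the build notes and disables both CUDA and OMP.

```shell
# Hypothetical helper (not part of GLPhase): print the make variables
# suggested by the build notes for the current machine.
#   $1  OS name, defaults to the output of `uname -s`
#   $2  "yes" or "no" to override detection of the CUDA toolkit
glphase_make_flags() {
    os="${1:-$(uname -s)}"
    if [ -n "$2" ]; then
        have_cuda="$2"
    elif command -v nvcc >/dev/null 2>&1; then
        # Assumption: nvcc on the PATH means the CUDA toolkit is usable
        have_cuda="yes"
    else
        have_cuda="no"
    fi

    flags=""
    if [ "$have_cuda" != "yes" ]; then
        flags="NCUDA=1"
    fi
    # The build notes give MacOSX as the example for building
    # without CUDA or OMP support
    if [ "$os" = "Darwin" ]; then
        flags="NCUDA=1 NOMP=1"
    fi
    echo "$flags"
}
```

After `make clean`, the result could be passed to make as `make $(glphase_make_flags)`.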
Converting a VCF to SNPTools .bin format
A Perl script at scripts/vcf2STBin.pl can be used to convert a VCF
with PL format fields to a SNPTools conformant .bin file. For
example, it can convert a gzipped input VCF at input.vcf.gz into a
SNPTools .bin file.
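For readers unfamiliar with the PL field: PL values are Phred-scaled genotype likelihoods, so a PL value of p corresponds to a relative likelihood of 10^(-p/10). The sketch below illustrates that relationship only. It is not taken from vcf2STBin.pl, the function name `pl_to_likelihoods` is hypothetical, and normalizing the likelihoods to sum to one is a choice made here for readability.

```shell
# Hypothetical helper (not part of GLPhase): convert one comma-separated
# PL string into normalized genotype likelihoods.
pl_to_likelihoods() {
    echo "$1" | awk -F, '{
        sum = 0
        for (i = 1; i <= NF; i++) {
            # Phred scale: likelihood = 10^(-PL/10)
            lik[i] = 10 ^ (-$i / 10)
            sum += lik[i]
        }
        for (i = 1; i <= NF; i++) {
            printf "%s%.4f", (i > 1 ? " " : ""), lik[i] / sum
        }
        printf "\n"
    }'
}

# A confident homozygous-reference call
pl_to_likelihoods "0,30,60"   # prints: 0.9990 0.0010 0.0000
```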
Running GLPhase (v1.4.13)
As a drop-in replacement for SNPTools/impute.cpp
GLPhase can be run as a CUDA-enabled drop-in replacement for
SNPTools/impute.cpp, assuming a SNPTools style .bin file with
genotype likelihoods exists.
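A minimal sketch of such a run follows. It assumes the executable was built at bin/glphase as in the build notes and that the input file is passed as a positional argument, as in the other examples in this README. The wrapper name `run_glphase_dropin` and the `GLPHASE` environment variable are hypothetical conveniences, not part of GLPhase.

```shell
# Hypothetical wrapper (not part of GLPhase): check that the .bin file
# is readable, then hand it to glphase with default settings.
run_glphase_dropin() {
    bin_file="$1"
    # GLPHASE is an assumed override for the executable location
    glphase_bin="${GLPHASE:-bin/glphase}"

    if [ ! -r "$bin_file" ]; then
        echo "cannot read input file: $bin_file" >&2
        return 1
    fi
    "$glphase_bin" "$bin_file"
}
```

Calling `run_glphase_dropin input.bin` from the repository root would then run `bin/glphase input.bin`.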
Using pre-existing haplotypes
GLPhase can use pre-existing haplotypes to restrict the set of possible haplotypes from which the MH sampler may choose surrogate parent haplotypes. This approach is described in:
The Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics (accepted) -- bioRxiv
The following command phases and imputes haplotypes on a SNPTools
.bin file using a genetic map and pre-existing haplotypes. The output
file is a gzipped VCF file at output_base_name.vcf.gz.
    glphase -B0 -i5 -m95 -q0 -Q1 -t2 -C100 -K200 \
        input.bin \
        -g genetic_map.txt \
        -h pre_existing_haplotypes.haps.gz \
        -s pre_existing_haplotypes.sample \
        -o output_base_name
Using a reference panel
GLPhase can use a reference panel of haplotypes to inform genotype
imputation of samples for which genotype likelihoods are available.
In contrast to pre-existing haplotypes, the haplotypes
in the reference panel do not need to be from the same samples that
are being imputed. In this mode, when surrogate parent haplotypes
are being chosen for a sample, the haplotypes may come from the
current estimate of sample haplotypes or the reference panel.
The -k option can be specified to restrict the choice of surrogate
parent haplotypes to the reference panel in the first iteration of
haplotype estimation.
    glphase \
        input.bin \
        -g samples/hapGen/ex.map \
        -H samples/hapGen/ex.haps.gz \
        -L samples/hapGen/ex.leg \
        -k \
        -o output_base_name

The gzipped VCF output can be converted to BCF with bcftools:

    zcat output_base_name.vcf.gz | bcftools view -Ob -o \
        output_base_name.bcf -