A heuristic algorithm for the Pure Parsimony Xor Haplotyping problem (PPXH) based on Graph Realization.
by Yuri Pirola
This program implements a heuristic for the Pure Parsimony Xor Haplotyping problem based on a fast algorithm for the Graph Realization problem.
The heuristic is described in the following papers:
- Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, and Romeo Rizzi. Pure Parsimony Xor Haplotyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Vol. 7(4), pp. 598-610 (Oct-Dec, 2010). link arXiv version
- Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, and Romeo Rizzi. Pure Parsimony Xor Haplotyping. Proc. of 5th Int. Symp. Bioinformatics Research and Applications (ISBRA). Springer. Vol. 5542, pp. 186-197 (May, 2009). link
PPXH-GR is currently distributed only on source form.
PPXH-GR source code is available at the
yp/PPXH-GR Git repository
hosted by GitHub.
The repository can be explored using the GitHub web interface at
It is also possible to clone the entire repository using the following command:
$ git clone git://github.com/yp/PPXH-GR.git
Or, if you have a GitHub account, you can fork the project from the repository web page.
- GREAL, a software for the Graph
GREAL is only distributed as executable and needs that the following
libstdc++-libc6.2-2.so.3is placed in the working directory
- Basic Linux/Unix utilities, such as
The program can be compiled by issuing the command at the command prompt:
$ make all
The resulting executable,
ppxh-gr, can then be copied (if necessary)
in some other folder.
The program takes as input a genotype matrix and returns a haplotype matrix that explains the given genotype matrix. The command line is as follow:
$ ./ppxh-gr -f FILENAME [-r N]
FILENAME indicates the name of the file containing the genotype
N indicates the number of solutions that PPXH-GR computes
Please note that the following files must be placed in the working directory:
exec-gr.sh(supplied with PPXH-GR)
The input genotype matrix is described as a sequence of n rows
(representing the genotypes of the n individuals) each one containing
exactly m characters (representing the alleles at the m SNPs/loci
over which the xor-genotypes are defined).
Each character is either
1 (indicating an homozygous locus) or
(indicating an heterozygous locus).
Please refer to the file
genot-matrix-test.txt for an example of a
simple 7x4 genotype matrix.
Computing the haplotype resolutions
The Python script
hapmat2resolution.py provides a simple way to
post-process the output computed by
PPXH-GR in order to obtain all the
possible pairs of computed haplotypes that solve each xor-genotype.
hapmat2resolution.py has two mandatory arguments:
-i), which specifies the file containing the input xor-genotype matrix;
-o), which specifies the file containing the output computed by
The optional flag
-v) increases the verbosity of the log.
hapmat2resolution.py extracts a most parsimonious haplotype matrix
(ties are broken randomly) and uses it to compute the pairs of
haplotypes which solve each xor-genotype.
The output of
hapmat2resolution.py is a JSON
object with the following properties:
xor-genotypes, an array of the input xor-genotypes (not necessarily in the same order of file
haplotypes, an array of the output haplotypes of the most parsimonious haplotype matrix;
resolutions, an array of objects each one with the following properties:
xor-genotype-id, the identifier of the current xor-genotype (w.r.t. property
xor-genotypesof the first-level object);
xor-genotype-str, the string representation of the current xor-genotype;
haplotype-pairs, an array of objects with the following properties:
haplotype1-id, the identifier of the first haplotype in the pair that solves the current xor-genotype (w.r.t. property
haplotypesof the first-level object);
haplotype1-str, the string representation of the first haplotype;
haplotype2-id, the identifier of the second haplotype in the pair that solves the current xor-genotype (w.r.t. property
haplotypesof the first-level object);
haplotype2-str, the string representation of the second haplotype.
PPXH-GR is released under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
PPXH-GR is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Please refer to file
COPYING or to the
GNU website for a copy of the GNU
General Public License.
Please contact Yuri Pirola for additional information.
Web page: http://bimib.disco.unimib.it/index.php/Pirola_Yuri