A heuristic algorithm for the Pure Parsimony Xor Haplotyping problem (PPXH) based on Graph Realization
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
src
.gitignore
AUTHORS
COPYING
Makefile
README.md
exec-gr.sh
genot-matrix-test.txt
hapmat2resolution.py

README.md

PPXH-GR

A heuristic algorithm for the Pure Parsimony Xor Haplotyping problem (PPXH) based on Graph Realization.

by Yuri Pirola


Introduction

This program implements a heuristic for the Pure Parsimony Xor Haplotyping problem based on a fast algorithm for the Graph Realization problem.

The heuristic is described in the following papers:

  • Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, and Romeo Rizzi. Pure Parsimony Xor Haplotyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Vol. 7(4), pp. 598-610 (Oct-Dec, 2010). link arXiv version
  • Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, and Romeo Rizzi. Pure Parsimony Xor Haplotyping. Proc. of 5th Int. Symp. Bioinformatics Research and Applications (ISBRA). Springer. Vol. 5542, pp. 186-197 (May, 2009). link

Download

PPXH-GR is currently distributed only on source form. PPXH-GR source code is available at the yp/PPXH-GR Git repository hosted by GitHub. The repository can be explored using the GitHub web interface at https://github.com/yp/PPXH-GR.

The latest stable version of PPXH-GR can be downloaded in either .tar.gz or in .zip format. Previous stable releases can be downloaded from https://github.com/yp/PPXH-GR/archives/master.

It is also possible to clone the entire repository using the following command:

$ git clone git://github.com/yp/PPXH-GR.git

Or, if you have a GitHub account, you can fork the project from the repository web page.

Dependencies

Compilation dependencies:

Run-time dependencies:

  • GREAL, a software for the Graph Realization problem. GREAL is only distributed as executable and needs that the following (old) library libstdc++-libc6.2-2.so.3 is placed in the working directory
  • Basic Linux/Unix utilities, such as bash, grep, and sed

Compilation

The program can be compiled by issuing the command at the command prompt:

$ make all

The resulting executable, ppxh-gr, can then be copied (if necessary) in some other folder.

Usage

The program takes as input a genotype matrix and returns a haplotype matrix that explains the given genotype matrix. The command line is as follow:

$ ./ppxh-gr -f FILENAME [-r N]

where FILENAME indicates the name of the file containing the genotype matrix, and N indicates the number of solutions that PPXH-GR computes (default: 10).

Please note that the following files must be placed in the working directory:

  • exec-gr.sh (supplied with PPXH-GR)
  • grealLinux
  • libstdc++-libc6.2-2.so.3

The input genotype matrix is described as a sequence of n rows (representing the genotypes of the n individuals) each one containing exactly m characters (representing the alleles at the m SNPs/loci over which the xor-genotypes are defined). Each character is either 0/1 (indicating an homozygous locus) or 2 (indicating an heterozygous locus). Please refer to the file genot-matrix-test.txt for an example of a simple 7x4 genotype matrix.

Computing the haplotype resolutions

The Python script hapmat2resolution.py provides a simple way to post-process the output computed by PPXH-GR in order to obtain all the possible pairs of computed haplotypes that solve each xor-genotype.

hapmat2resolution.py has two mandatory arguments:

  • --input (-i), which specifies the file containing the input xor-genotype matrix;
  • --output (-o), which specifies the file containing the output computed by PPXH-GR.

The optional flag --verbose (-v) increases the verbosity of the log.

hapmat2resolution.py extracts a most parsimonious haplotype matrix (ties are broken randomly) and uses it to compute the pairs of haplotypes which solve each xor-genotype.

The output of hapmat2resolution.py is a JSON object with the following properties:

  • xor-genotypes, an array of the input xor-genotypes (not necessarily in the same order of file --input);
  • haplotypes, an array of the output haplotypes of the most parsimonious haplotype matrix;
  • resolutions, an array of objects each one with the following properties:
    • xor-genotype-id, the identifier of the current xor-genotype (w.r.t. property xor-genotypes of the first-level object);
    • xor-genotype-str, the string representation of the current xor-genotype;
    • haplotype-pairs, an array of objects with the following properties:
      • haplotype1-id, the identifier of the first haplotype in the pair that solves the current xor-genotype (w.r.t. property haplotypes of the first-level object);
      • haplotype1-str, the string representation of the first haplotype;
      • haplotype2-id, the identifier of the second haplotype in the pair that solves the current xor-genotype (w.r.t. property haplotypes of the first-level object);
      • haplotype2-str, the string representation of the second haplotype.

License

PPXH-GR is released under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PPXH-GR is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Please refer to file COPYING or to the GNU website for a copy of the GNU General Public License.

Contacts

Please contact Yuri Pirola for additional information.
E-mail: yuri.pirola@gmail.com
Web page: http://bimib.disco.unimib.it/index.php/Pirola_Yuri