This is an attempt to implement an exact algorithm that solves the protein folding problem in the 2D HP model.
The Hydrophobic-Polar (HP) model is a simplified model proposed by Ken A. Dill in 1985, in which a protein is represented as a sequence of only two types of amino acids: hydrophobic (H) and polar, i.e. hydrophilic (P).
The 2D HP folding problem consists of finding an "optimal" conformation of such a sequence of 'H's and 'P's on a 2D square grid, where consecutive residues occupy neighboring cells and no two residues share a cell.
The optimality of a conformation is defined by its score: the number of hydrophobic (H-H) contacts, i.e. pairs of H's that are neighbors on the grid but not consecutive in the chain. A conformation is optimal if no other conformation of the same protein has a higher score.
The following images show some examples of optimal conformations with their scores.
The aim of this project is to solve this problem using an exhaustive search algorithm.
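To make the scoring rule concrete, here is a minimal C++ sketch (not taken from this project) that validates a conformation and counts its H-H contacts. It assumes a conformation is given as one (x, y) lattice point per residue, in chain order; the names `Point`, `manhattan` and `hpScore` are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: one lattice point (x, y) per residue, in chain order.
using Point = std::pair<int, int>;

// Manhattan distance between two lattice points (1 means grid neighbors).
static int manhattan(const Point& a, const Point& b) {
    return std::abs(a.first - b.first) + std::abs(a.second - b.second);
}

// Returns the number of H-H contacts of a conformation, or -1 if the
// coordinates do not form a valid self-avoiding walk for `seq`.
int hpScore(const std::string& seq, const std::vector<Point>& coords) {
    const std::size_t n = seq.size();
    if (coords.size() != n) return -1;
    for (std::size_t i = 0; i < n; ++i) {
        // Consecutive residues must occupy neighboring cells.
        if (i + 1 < n && manhattan(coords[i], coords[i + 1]) != 1) return -1;
        // No two residues may share a cell.
        for (std::size_t j = i + 1; j < n; ++j)
            if (coords[i] == coords[j]) return -1;
    }
    int score = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 2; j < n; ++j)  // j >= i + 2 skips chain neighbors
            if (seq[i] == 'H' && seq[j] == 'H' && manhattan(coords[i], coords[j]) == 1)
                ++score;
    return score;
}
```

For example, folding "HPPH" into a unit square places the two end H's side by side, giving a score of 1, while the straight line scores 0.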
More on the implemented algorithm
This work is based on the MINWALK algorithm described in the paper by Giaquinta and Pozzi (2013).
The algorithm is essentially a generic backtracking search with 6 pruning criteria (the ones described in the paper).
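For intuition, the following is a plain exhaustive backtracking search in C++, written for illustration rather than taken from this project. It enumerates self-avoiding walks on the square grid, pins the first residue at the origin, fixes the first step and the first turn to skip rotated and mirrored copies of the same fold, and keeps the best score. It deliberately omits the six MINWALK pruning criteria, so it is only practical for short chains; the class name `HpFolder` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

namespace {
// The four lattice moves: +x, -x, +y, -y.
const int kDx[4] = {1, -1, 0, 0};
const int kDy[4] = {0, 0, 1, -1};
}  // namespace

// Illustrative exhaustive search for the 2D HP model (square lattice).
// NOT the MINWALK algorithm: none of its pruning criteria are applied.
class HpFolder {
public:
    explicit HpFolder(std::string seq) : seq_(std::move(seq)) {}

    // Returns the maximum number of H-H contacts over all conformations.
    int solve() {
        best_ = 0;
        if (seq_.size() < 2) return 0;
        occupied_.clear();
        occupied_[{0, 0}] = 0;   // residue 0 is pinned at the origin
        placed_ = 1;
        place({1, 0}, 0, true);  // rotation symmetry: the first step is always +x
        return best_;
    }

private:
    using Point = std::pair<int, int>;

    // New H-H contacts created by putting residue i at p: occupied grid
    // neighbors holding an 'H' that is not residue i's chain predecessor.
    int contactsAt(std::size_t i, Point p) const {
        if (seq_[i] != 'H') return 0;
        int c = 0;
        for (int d = 0; d < 4; ++d) {
            auto it = occupied_.find({p.first + kDx[d], p.second + kDy[d]});
            if (it != occupied_.end() && it->second + 1 < i && seq_[it->second] == 'H')
                ++c;
        }
        return c;
    }

    // Places the next residue at p, then recursively tries every extension.
    // `onAxis` is true while every residue so far lies on the x-axis.
    void place(Point p, int score, bool onAxis) {
        const std::size_t i = placed_;
        score += contactsAt(i, p);
        occupied_[p] = i;
        ++placed_;
        if (placed_ == seq_.size()) {
            best_ = std::max(best_, score);  // complete conformation
        } else {
            for (int d = 0; d < 4; ++d) {
                Point q{p.first + kDx[d], p.second + kDy[d]};
                if (occupied_.count(q)) continue;      // keep the walk self-avoiding
                if (onAxis && q.second < 0) continue;  // mirror symmetry: first turn goes up
                place(q, score, onAxis && q.second == 0);
            }
        }
        --placed_;           // backtrack
        occupied_.erase(p);
    }

    std::string seq_;
    std::map<Point, std::size_t> occupied_;  // grid cell -> residue index
    std::size_t placed_ = 0;
    int best_ = 0;
};
```

For example, `HpFolder("HPPH").solve()` returns 1 and `HpFolder("HHHHHHHHHH").solve()` returns 4. Because the number of self-avoiding walks grows exponentially with chain length, the pruning criteria are what make longer chains tractable.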
I used Xcode during development primarily for its integrated debugger. However, the whole project can be compiled using the makefile.
The Xcode project contains 4 targets:
- HPFoldingDebug: same as the makefile target hpfold, but with Xcode's debugger.
- testGrid: I use it to manually test . Nothing interesting there.
- hpTests: same as hptest (see below), but with the debugger.
- generateTests: the program used to generate the test files (see the hptest section below).
The makefile targets are described in more detail below.
hptest
The main for this target is in tests/test.cpp.
When launched, it will load the chains contained in the file specified in testInputsFilePath, calculate their optimal scores and compare them with the scores contained in the file specified by
These files contain one chain per line and one maximum score per line, respectively.
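The comparison performed by hptest can be sketched roughly as follows. This is a hypothetical helper, not the code in tests/test.cpp: `countMismatches` and the `solver` callback are illustrative names, and the sketch assumes both files are well-formed, with line i of the scores file holding the expected maximum score of line i of the chains file.

```cpp
#include <functional>
#include <istream>
#include <sstream>  // std::istringstream is handy for feeding in-memory test data
#include <string>

// Hypothetical helper: `solver` stands in for the project's folding routine.
// Reads chain i from line i of `chains` and its expected maximum score from
// line i of `scores`; returns how many chains the solver gets wrong.
int countMismatches(std::istream& chains, std::istream& scores,
                    const std::function<int(const std::string&)>& solver) {
    int mismatches = 0;
    std::string chain;
    int expected = 0;
    // Stops at the end of the shorter of the two inputs.
    while (std::getline(chains, chain) && (scores >> expected)) {
        if (solver(chain) != expected) ++mismatches;
    }
    return mismatches;
}
```

Taking `std::istream&` rather than file paths lets the same loop run on `std::ifstream`s in the real program and on `std::istringstream`s in a quick check.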
There are 3 pairs of test files already generated:
hpfold
The main for this target is in src/main.cpp.
hpfoldgui
The main is the same as for hpfold, but the GRAPH parameter is passed to gcc when compiling (see the makefile).
This one is basically the same as the previous one but with a graphical interface.
The UI is drawn using the GLUT framework.
Note that this version does not perform as well as the CLI-based version.
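A minimal sketch of how a single main can serve both executables, assuming GRAPH is a preprocessor macro defined on the gcc command line (e.g. `-DGRAPH`). The function `frontendName` is hypothetical and only illustrates the compile-time switch; the project's actual drawing and console code is not shown in this README.

```cpp
#include <string>

// Hypothetical illustration of a compile-time switch on the GRAPH macro.
// Built with -DGRAPH: the graphical (GLUT) branch is compiled in.
// Built without it: only the console branch exists in the binary.
std::string frontendName() {
#ifdef GRAPH
    return "glut";     // hpfoldgui: GLUT drawing code would live in this branch
#else
    return "console";  // hpfold: plain text output
#endif
}
```

Because the choice is made by the preprocessor, the console build carries no GLUT code at all, rather than checking a runtime flag.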
Depending on which version you want to compile, run one of the following commands in the root of the project.
make              # console version
make hpfold       # console version
make hpfoldgui    # graphical version
make tests        # tests
The resulting executable(s) should appear in the
Fold a chain
./hpfoldgui from the dist directory (depending on the version you want) without any param, you will be prompted to enter the chain you want to fold using only
$ > ./dist/hpfold
Chain: HPPHHPHHHPHHPH
HPPHHPHHHPHHPH 6
The max score for this chain is 6.
Fold several chains
You can pass the path to a file containing a list of chains (one per line) as the first argument to hpfold.
$ > ./dist/hpfold chains.txt
HHHHHHHHHH 4
PHHHHHHHHH 4
HPHHHHHHHH 4
PPHHHHHHHH 3
HHPHHHHHHH 4
If you specify a second file on the command line, it will be used to store the results.
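The batch mode could look roughly like the following sketch, which writes each chain followed by its score, matching the output format shown for chains.txt. `foldAll` and `solver` are hypothetical names; in a real main, `in` would be a `std::ifstream` opened on the first argument, and `out` either `std::cout` or a `std::ofstream` opened on the second argument when one is given.

```cpp
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream for in-memory use
#include <string>

// Hypothetical batch loop: for every non-empty line of `in`, writes
// "<chain> <max score>" to `out`. `solver` stands in for the folding routine.
void foldAll(std::istream& in, std::ostream& out,
             const std::function<int(const std::string&)>& solver) {
    std::string chain;
    while (std::getline(in, chain)) {
        if (chain.empty()) continue;  // tolerate blank lines in the input file
        out << chain << ' ' << solver(chain) << '\n';
    }
}
```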
Run the tests
$ > ./dist/hptest
As explained earlier, you have to specify the input files in test.cpp.
The script benchmark/run.sh runs hpfold for each chain in the specified input file (benchmark.txt by default) and measures the execution time with the
This is not the most precise benchmarking method, since it measures the total execution time of the command rather than only the time spent folding, but it still gives a good idea of the implementation's performance.
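If a more precise measurement is needed, one option is to time only the folding call inside the program, for instance with `std::chrono::steady_clock`, which excludes process start-up and file I/O. The sketch below is hypothetical (`timeFold`, `TimedResult` and `solver` are illustrative names) and is not what benchmark/run.sh does.

```cpp
#include <chrono>
#include <functional>
#include <string>

// Score returned by the solver plus the wall-clock time of that call alone.
struct TimedResult {
    int score;
    double millis;
};

// Times only the solver call; `solver` stands in for the folding routine.
TimedResult timeFold(const std::string& chain,
                     const std::function<int(const std::string&)>& solver) {
    const auto start = std::chrono::steady_clock::now();  // monotonic clock
    const int score = solver(chain);
    const auto stop = std::chrono::steady_clock::now();
    return {score, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

`steady_clock` is used rather than `system_clock` because it is monotonic, so adjustments to the system time cannot distort the measurement.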
Results & Benchmarks
Here are the execution times for 10 random chains of length 30:
Here are the results for the standard HP benchmark:
Dill K.A. Theory for the folding and stability of globular proteins. Biochemistry. 1985;24(6):1501-1509. http://pubs.acs.org/doi/abs/10.1021/bi00327a032
Giaquinta E., Pozzi L. An effective exact algorithm and a new upper bound for the number of contacts in the hydrophobic-polar two-dimensional lattice model. J Comput Biol. 2013 Aug;20(8):593-609. https://www.ncbi.nlm.nih.gov/pubmed/23899013