Hybrid polyphase #432
…abled and lots of debug output.
… ILP is now used as default here.
…Minor code optimizations.
…t is retrieved (but not yet used) in polyphase module.
…Return 2 reads when diploid but polyploidy allowed.
…gs. Not for productive use yet.
…gspace earlier. Made some further numeric checks obsolete.
…en processed. In between now use PhaseBreakpoints to track potential errors, affected haplotypes and confidences.
… pos inside them by some heuristic, run polyphase recursively on just affected reads and positions.
…the borders of the prephased blocks. Supports multiple phasing blocks in prephasing now.
…asing. Printing hints for prephasing.
…lock. Adjusted test cases.
…phases being read properly for polyploid input.
Bumping this pull request, because I will soon need the progress here for another branch. I guess there is no one besides @marcelm who will raise any objections. If you find the time, it would be nice to have a look at the changes outside the polyphase module, which could potentially affect other parts of the tool. Anything that breaks inside polyphase will come back to me at some point anyway.
No objections! (feel free to merge)
In this branch I experimented a bit with how the polyphase module can profit from sparse phasing information that is provided as a partially phased VCF. Such data is produced by the polyphasegenetic command, where only certain variant types are phased using pedigree genotype information. The research on this is not done, but along the way there were also some improvements outside this experiment that should be merged back into main at some point.

The third stage of the polyphase algorithm was reworked. It now uses a "cleaner" formal model to reorder the results from the previous two stages, and the partial pre-phasing information is now considered in this stage. In general, I expect minor accuracy improvements for polyphase, even without pre-phasing information.
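To give a rough idea of how sparse pre-phasing can steer a reordering step, here is a minimal sketch. It is not the formal model in reorder.py: it swaps in a brute-force search over haplotype permutations and picks the order of a block's haplotypes that agrees best with the phased genotypes of a partially phased VCF. The helpers `parse_gt` and `best_permutation` are hypothetical, and the sketch assumes GT strings without missing alleles and a ploidy small enough to enumerate all permutations.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def parse_gt(gt: str) -> Tuple[List[int], bool]:
    """Split a VCF GT string such as '0|1|0|1' or '0/1/0/1' into alleles.

    Returns the alleles and whether the genotype is phased ('|' separators).
    Assumes no missing alleles ('.'); any '/' marks the genotype as unphased.
    """
    if "/" in gt:
        return [int(a) for a in gt.replace("|", "/").split("/")], False
    return [int(a) for a in gt.split("|")], True


def best_permutation(
    block: Sequence[Sequence[int]],
    prephased: Dict[int, List[int]],
) -> Tuple[int, ...]:
    """Return the haplotype order of `block` that agrees best with prephasing.

    block[h][i] is the allele of haplotype h at variant i; prephased maps a
    variant index to its phased alleles (one per haplotype). Reordered
    haplotype h is block[order[h]]. Ties keep the first permutation found.
    """
    ploidy = len(block)
    best, best_score = tuple(range(ploidy)), -1
    for perm in permutations(range(ploidy)):
        # Count matching alleles at all pre-phased variants.
        score = sum(
            block[perm[h]][i] == alleles[h]
            for i, alleles in prephased.items()
            for h in range(ploidy)
        )
        if score > best_score:
            best, best_score = perm, score
    return best


# Hypothetical example: 4 haplotypes x 4 variants from the earlier stages.
block = [
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
# Only variants 1 and 2 are phased in the (hypothetical) input VCF.
gts = ["0/1/0/1", "1|0|0|1", "0|1|0|1", "1/0/0/0"]
prephased = {
    i: alleles
    for i, (alleles, phased) in enumerate(map(parse_gt, gts))
    if phased
}
order = best_permutation(block, prephased)  # (2, 0, 3, 1)
reordered = [block[p] for p in order]
```

Brute force is only viable for small ploidies (24 permutations for a tetraploid); it merely illustrates the objective of matching pre-phased alleles.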
Phasing blocks should now be cut in a more meaningful way: block boundaries now also depend on how confidently ambiguous sites are phased.
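The PR does not spell out the exact cutting rule, so the following is only a hypothetical sketch of the idea. It assumes every variant carries a confidence score for its phasing relative to the preceding variant and starts a new block wherever that score falls below a threshold. Each block is labelled with the position of its first variant, in the style of phase-set (PS) identifiers. The function name, the score, and the default threshold of 0.5 are all assumptions.

```python
from typing import List, Sequence


def cut_blocks(
    positions: Sequence[int],
    confidences: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Return a phase-set id (first position of its block) for each variant.

    confidences[i] is a hypothetical score for how reliably variant i is
    phased relative to the preceding variant; a value below `threshold`
    starts a new block. The score of the first variant is ignored.
    """
    if len(positions) != len(confidences):
        raise ValueError("positions and confidences must have equal length")
    phase_sets: List[int] = []
    block_start = positions[0] if positions else 0
    for i, (pos, conf) in enumerate(zip(positions, confidences)):
        if i > 0 and conf < threshold:
            block_start = pos  # low confidence: begin a new block here
        phase_sets.append(block_start)
    return phase_sets


phase_sets = cut_blocks([100, 250, 400, 560, 700], [1.0, 0.9, 0.3, 0.8, 0.95])
# -> [100, 100, 400, 400, 400]
```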
The function in vcf.py that converts phasing blocks into a set of "superreads" was extended to polyploid phasings. Previously, the read set only contained reads corresponding to one phase, because in the diploid case the second phase is always complementary. This shortcut does not work for polyploid phasings, so for consistency I also changed the diploid case to output superreads for both phases. I adjusted the test cases, but I am not sure whether this could cause any problems.
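The change in output shape can be sketched with plain Python lists instead of WhatsHap's actual read classes. Each phasing block yields one superread per haplotype, so a diploid block now produces two complementary superreads rather than one. The function name `phasing_to_superreads` and the (position, allele) layout are hypothetical simplifications.

```python
from typing import Dict, List, Sequence, Tuple

Superread = List[Tuple[int, int]]  # list of (variant position, allele)


def phasing_to_superreads(
    positions: Sequence[int],
    genotypes: Sequence[Sequence[int]],
    block_ids: Sequence[int],
) -> Dict[int, List[Superread]]:
    """Build one superread per haplotype for every phasing block.

    genotypes[i] lists the phased alleles of variant i (one per haplotype),
    block_ids[i] is the block that variant belongs to.
    """
    result: Dict[int, List[Superread]] = {}
    for pos, alleles, block in zip(positions, genotypes, block_ids):
        reads = result.setdefault(block, [[] for _ in alleles])
        if len(reads) != len(alleles):
            raise ValueError("ploidy must be constant within a block")
        for h, allele in enumerate(alleles):
            reads[h].append((pos, allele))
    return result


# Diploid example: both phases are emitted, not just the first one.
superreads = phasing_to_superreads(
    [100, 200, 300], [[0, 1], [1, 0], [0, 1]], [100, 100, 300]
)
```

The same code handles any ploidy, which is why emitting every phase (rather than relying on complementarity) keeps the diploid and polyploid cases uniform.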
While working on this, I realized that HP phasing information was processed incorrectly for polyploids. It should now work as intended.
Added test cases for reorder.py (the major change) and a few for other modules.
The docs for polyphase were updated and docs for polyphasegenetic were added.