Hybrid polyphase #432

Merged
merged 42 commits into from Mar 18, 2023

@schrins schrins commented Jan 26, 2023

In this branch I experimented with how the polyphase module can benefit from sparse phasing information provided as a partially phased VCF. Such data is produced by the polyphasegenetic command, which phases only certain variant types using pedigree genotype information. The research on this is not finished, but along the way I also made some improvements outside this experiment that should be merged back into main at some point.

  1. The third stage of the polyphase algorithm was reworked. It now uses a "cleaner" formal model to reorder the results from the previous two stages. The partial pre-phasing information is now taken into account in this stage. In general, I expect minor accuracy improvements for polyphase, even without pre-phasing information.

  2. The way phasing blocks are cut should be a bit more meaningful now. Block boundaries now also depend on how confident the phasing of ambiguous sites is.

  3. The function in vcf.py that converts phasing blocks into a set of "superreads" was extended to polyploid phasings. Before, the readset only contained reads corresponding to one phase, because the second is always complementary. That shortcut is not possible for polyploid phasings, so for consistency I also changed the diploid case to output superreads for both phases. I adjusted the test cases, but I am not sure whether this could cause any problems.

  4. While working on the previous item, I realized that HP phasing information was processed incorrectly for polyploids. It should now work as intended.

  5. Added test cases for reorder.py (the major change) and a few for other modules.

  6. The docs for polyphase were updated and docs for polyphasegenetic were added.
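
To make item 3 a bit more concrete, here is a minimal sketch of the idea of emitting one superread per haplotype, for diploid and polyploid blocks alike. This is not the code in vcf.py: the function name `block_to_superreads` and the dict-based block representation are hypothetical and only serve as an illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a phased block: maps a variant position to a tuple
# of alleles, one entry per haplotype (length 2 for diploid, 4 for tetraploid).
PhasedBlock = Dict[int, Tuple[int, ...]]


def block_to_superreads(block: PhasedBlock) -> List[Dict[int, int]]:
    """Return one superread (position -> allele) per haplotype of the block."""
    if not block:
        return []
    # The ploidy is taken from the first position and must be the same everywhere.
    ploidy = len(next(iter(block.values())))
    superreads: List[Dict[int, int]] = [{} for _ in range(ploidy)]
    for pos in sorted(block):
        alleles = block[pos]
        if len(alleles) != ploidy:
            raise ValueError(f"inconsistent ploidy at position {pos}")
        # Haplotype h contributes its allele to superread h.
        for hap, allele in enumerate(alleles):
            superreads[hap][pos] = allele
    return superreads


# Diploid block: the second superread is the complement of the first,
# but it is now emitted explicitly instead of being implied.
diploid = {100: (0, 1), 250: (1, 0), 400: (0, 1)}

# Tetraploid block: no single complementary read exists, so all four are needed.
tetraploid = {100: (0, 1, 1, 0), 250: (1, 1, 0, 0)}

diploid_reads = block_to_superreads(diploid)
tetraploid_reads = block_to_superreads(tetraploid)
```

In the diploid example, the result contains two superreads where the old behaviour described above would have kept only one.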

schrins and others added 30 commits January 26, 2023 20:44
…t is retrieved (but not yet used) in polyphase module.
…Return 2 reads when diploid but polyploidy allowed.
…gspace earlier. Made some further numeric checks obsolete.
…en processed. In between now use PhaseBreakpoints to track potential errors, affected haplotypes and confidences.
… pos inside them by some heuristic, run polyphase recursively on just affected reads and positions.
…the borders of the prephased blocks. Supports multiple phasing blocks in prephasing now.

schrins commented Feb 28, 2023

Bumping this pull request, because I will soon need the progress here for another branch.

I guess there is no one besides @marcelm who will raise any objections. If you find the time, it would be nice if you could have a look at the changes outside the polyphase module that could potentially affect other parts of the tool. Anything that breaks inside polyphase will come back to me at some point anyway.

@marcelm marcelm left a comment

No objections! (feel free to merge)

@marcelm marcelm linked an issue Mar 17, 2023 that may be closed by this pull request
@schrins schrins merged commit 053e394 into main Mar 18, 2023
@schrins schrins deleted the hybrid-polyphase branch March 18, 2023 10:03

Successfully merging this pull request may close these issues.

Segmentation fault on whatshap polyphase
2 participants