Fix HMM for > 2 alleles #437
Comments
I see that I raised this issue originally in #403. I'm happy to revisit it for this PR.
A summary is helpful here, I think. Internally we use recombination probabilities, calculated from a recombination rate combined with the distance between sites (e.g. here). This calculated recombination probability is passed to the internal algorithm (as Line 736 in 1290344
We also specify mismatch probabilities ( Line 745 in 1290344
For A=2 alleles, the emission probs are therefore The problem with this particular parameterisation is that with a high mismatch probability ( I suggest that for L&S matching with >2 alleles we don't want to have this particular parameterization. I suggest 2 possibilities:
Note that @astheeggeggs has thought about this, for example at https://github.com/astheeggeggs/lshmm/blob/9b15d8417e2856e13b28d59b4c8d078068fd8c0b/lshmm/api.py#L142, so we should ask him to look at any reparameterisation we do here to check that it makes sense.
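As a rough illustration of the summary above (a recombination rate combined with the distance between sites giving a per-site recombination probability), here is a minimal sketch. The function name `recombination_probabilities` and the formula `1 - exp(-rate * distance)` are assumptions made for illustration; the formula actually used in the codebase is not shown in this thread and may differ.

```python
import math


def recombination_probabilities(positions, rate):
    """Hypothetical helper, not part of the project's API.

    Returns one probability per site: the assumed chance of at least one
    recombination between a site and the site to its left. The first site
    has no left neighbour, so its probability is set to 0.
    """
    if not positions:
        return []
    probs = [0.0]
    for left, right in zip(positions, positions[1:]):
        distance = right - left
        # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
        # This mapping from rate * distance to a probability is an assumption.
        probs.append(-math.expm1(-rate * distance))
    return probs


probs = recombination_probabilities([0, 100, 1100], rate=1e-3)
```

Under this assumed mapping, sites that are further apart get a larger recombination probability, and the value always stays below 1.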
The emission probabilities in the HMM are not computed correctly when we have > 2 alleles. We currently raise a ValueError if the user tries to force inference at such sites.
See #415 and the linked issues for background.
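To make the concern concrete, here is a minimal sketch of one possible parameterisation for A alleles, in which a match is emitted with probability `1 - mu` and each of the `A - 1` mismatching alleles with probability `mu / (A - 1)`. This is an assumption for illustration only: it is not necessarily what this project or lshmm does, and the helper names are hypothetical. The sketch also assumes that the two-allele form is `1 - mu` for a match and `mu` for a mismatch.

```python
import math


def emission_probabilities(mu, num_alleles):
    """Hypothetical helper: (match, per-allele mismatch) emission probabilities.

    Assumes the total mismatch probability mu is shared equally among the
    num_alleles - 1 alleles that differ from the copied haplotype.
    """
    if num_alleles < 2:
        raise ValueError("need at least 2 alleles")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability")
    match = 1.0 - mu
    mismatch = mu / (num_alleles - 1)
    return match, mismatch


def total_probability(match, mismatch, num_alleles):
    """Sum of emission probabilities over all alleles that could be observed."""
    return match + (num_alleles - 1) * mismatch


mu = 0.1
# Scaled form: the probabilities sum to 1 for any number of alleles.
scaled_total = total_probability(*emission_probabilities(mu, 4), 4)
# Reusing the assumed two-allele form unchanged with 4 alleles over-counts.
unscaled_total = total_probability(1.0 - mu, mu, 4)
```

With `A = 2` the scaled form reduces to the assumed two-allele case, which is one reason it is a natural candidate to compare against whatever reparameterisation is chosen.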