-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Microsatellite mutations models #2071
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2071 +/- ##
==========================================
+ Coverage 91.29% 91.35% +0.05%
==========================================
Files 20 20
Lines 10803 10893 +90
Branches 2246 2273 +27
==========================================
+ Hits 9863 9951 +88
- Misses 514 515 +1
- Partials 426 427 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! My only comments are around the interface really, really nice that we can construct this based on the general matrix code.
@petrelharp - any thoughts on the above? What do you think of my ideas for the interface? |
okay, I think this is all cleaned up and ready to go. I've added a bunch more tests to verify the transition matrices, cleaned up docs, refactored stuff according to the nice design suggestions (thanks!), and rebased and squashed. |
LGTM, thanks @andrewkern! Can you update the CHANGELOG please? We're good to merge then. I'll open an issue to track statistical validation of these sims. |
happy to, but i've never messed with the CHANGELOG before. Should I just add this under new features on a new 1.2.1 release? |
Sure. Just stick in 2022-xx-xx as the date. |
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
It's nice to have defaults, but we need to document what they are and have a good justification for setting them to that value.
Also hard-coding the defuaults into the signatures means that we have no flexibility later in terms of changing the defaults depending on what the other inputs are. The general pattern is to use param=None
in the signature to avoid painting ourselves into corners (unless it's totally clear that there'll never be another other params). Also it avoids having to put the hard coded params into several functions.
However, we can make an issue to track this if you'd rather merge this now?
On second thoughts, let's just merge and update. Opened #2074 to track. |
tests/test_mutations.py
Outdated
# this test might seem a little loosey goosey | ||
# but the scaling is create numerical noise i think | ||
assert np.abs(exp - obs) < 1e-2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh-oh: if you actually need 1e-2 for this to pass, then something is wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will recheck this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was not an issue it turned out
OK, removing the AUTOMERGE then since we should capture @petrelharp's comments in this PR. FYI @andrewkern, I pushed to this branch to update the CHANGELOG date (the date of release was set to today, which might be confusing on the "current" docs build). This means you'll need to fetch from your origin and reset --hard. Or, just force push over the change I made and we can do it again, no biggie. |
Okay I think I've not got all these tests in, EXCEPT for the last one which I'm confused on-- I don't think that's true for all models where s = 0? might need coffee |
tests/test_mutations.py
Outdated
if np.abs(exp - obs) > 1e-6: | ||
raise ValueError(f"failed test on row: {i} exp:{exp} obs:{obs}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops - left over from debugging?
if np.abs(exp - obs) > 1e-6: | |
raise ValueError(f"failed test on row: {i} exp:{exp} obs:{obs}") | |
assert np.abs(exp - obs) < 1e-6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!! - just a couple of minor things (aren't checking for validity of a few parameters).
okay this should be all set i think @petrelharp @jeromekelleher? note I haven't edited the CHANGELOG further. not sure if i need to? |
well darn now i need to hit these new tests with tests... lol. brb |
okay this is squashed, rebased, and ready to merge I think |
msprime/mutations.py
Outdated
if u < 0 or u > 1: | ||
raise ValueError("u must be between 0 and 1 inclusive") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be non-inclusive. Otherwise, some wierd things can happen, e.g. the chain is not ergodic:
>>> mm = msprime.MicrosatMutationModel(u=0, lo=3, hi=6, v=-10)
>>> mm.transition_matrix
array([[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 0., 0., 1.]])
>>> mm.root_distribution
array([1., 0., 0., 0.])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you get a valid Markov chain for the endpoints, but where one or both of the endpoints are absorbing, and so you'd get a situation where the root distribution draws with probability 1 from an absorbing state, and so you'd never get any non-silent mutations. That seems like not what anyone would actually want? And, the derived models have 0 < u < 1? There's no harm in leaving it 0 <= u <= 1, I guess, but I'd vote to restrict it.
Terribly sorry, but I have one more change (I think it should be 0 < u < 1 not <=), and there's two pending suggestions? |
pending suggestions should be all set in my commits-- i didn't use the GH interface to incorporate them, but let me know if i missed something |
Oh! And another important thing - where did the defaults come from? Edit: oh right, there's a different issue filed for that. |
okay got the suggestions. will post a separate commit for limiting u |
okay got a commit in that will hopefully close #2074? |
msprime/mutations.py
Outdated
@@ -422,8 +422,8 @@ class TPM(MicrosatMutationModel): | |||
For this model both `p` and `m` need to be set to < 1.0. | |||
|
|||
This is a :class:`.MicrosatMutationModel` with alleles ``[lo, .. , hi]``. | |||
For a precise definition of the mutation rates, see :class:`.MicrosatMutationModel` | |||
with ``s=0``, ``u=0.5``, and ``v=0``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why'd you take this out? This is necessary information for understanding what the mutation rates are?
So - I think that someone needs to be able to work out, exactly, from the docs, what the mutation rate from j to k is, at a given mutation_rate. To do that, we need to say this stuff about calculating qij and dividing by the max, excetera. I figure we can for this model just refer to the general class where it's explained, but for people to be able to do that they need to know how this one is derived from the general class. I guess that people can figure this out from Sainudiin, and find the expression for qij people need to look in Sainudiin anyhow, but it's not completely obvious, and why not just say what these conditions are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I thought it was more precise to say Default parameterization as estimated in Sainudiin et al. (2004).
but I'll keep in this bit above as well as my addition
implemented generalized model of microsatellite evolution better doi for TPM more microsatellite model stuff and some documentation rescaled things at Peter's suggestion Update docs/mutations.md simple commits from review Co-authored-by: Jerome Kelleher <jk@well.ox.ac.uk> refactored design to include MicrosatMutationModel class Update msprime/mutations.py Co-authored-by: Peter Ralph <petrel.harp@gmail.com> Peter's docstring suggestions included added tests of transition matrix; further clean up to docs one last test added another example to the docs edited CHANGELOG Apply suggestions from code review suggestions from @petrelharp Co-authored-by: Peter Ralph <petrel.harp@gmail.com> further tests and cleanup from code review linting wasn't installed... weird insertion throwing a bug fixed doc bug introduced during code review found bug in rescaling test of betas in trans mat parameterize testing added test of internal allele beta ratios add vscode to gitignore comments from peter tests on tests Apply suggestions from code review Co-authored-by: Peter Ralph <petrel.harp@gmail.com> changed bounds on u to be open interval better doc string on defaults for TPM and EL2 models further doc string edits further doc string edits; part 2
@jeromekelleher, @petrelharp and I finished off these doc strings this am. this should be ready to merge |
Here is a PR adding a three specific microsatellite mutation models along with a more general backend to add more.
The generalized parameterization I've chosen here follow from this nice paper from Rasmus Nielsen's group. It offers plenty of flexibility and room for growth. I've currently implemented just 3 of the models in that paper, but it will be trivial to add more.
I've implemented these as a subclass of
MatrixMutationModel
, and so I'm translating the CTMC representation from the original paper into the scaled, jump process transition matrix that is used inMatrixMutationModel
. I hope I got the details correct here but may be off by a scalar?Hoping to get some feedback as to what others would like to see here and what clean up to do.
i should add-- this would close #2013