-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mutation model docs #1180
Mutation model docs #1180
Conversation
f3a0e42
to
78d7f98
Compare
Whoops - initially I had some commits off a different PR in here. Took those out. |
Codecov Report
@@ Coverage Diff @@
## master #1180 +/- ##
=======================================
Coverage 90.05% 90.05%
=======================================
Files 27 27
Lines 10946 10946
Branches 2356 2356
=======================================
Hits 9857 9857
Misses 547 547
Partials 542 542
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
docs/api.rst
Outdated
.. autoclass:: msprime.HKY | ||
|
||
TODO: move to msprime/mutations.py | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the following is correct now, I believe. Note that the discussion below is somewhat complicated by the scaling factor M
, required to make P
a valid transition matrix. It looks like seq-gen avoids this issue by not documenting what the overall mutation rate actually is. Do you know, @GertjanBisschop - what's the overall mutation rate for seq-gen under HKY? Are we testing for equality of total number of mutations or just relative frequencies of different transition types? And, how does this description seem to you - clear? correct?
We could make the discussion below so that the mutation rates are just mu * q[i,j]
, if we added a "rate scaling factor" to the mtuation model object, and put M in there, transparent to the user. But I think this is probably not worth it - what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work. Will review this and answer all your questions tonight. Whichever way we decide on the scaling factor M
, we will have one up on SeqGen given that this is all now very well documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall rate of mutation for seq-gen is given by:
prob_mut = 1.0 - np.diag(transition_matrix)
seq_gen_rate = np.sum(prob_mut * root_distribution) * msprime_rate
.
We have tested for equality of the total number of mutations, not just relative frequencies.
All of this is definitely clear. But I would argue against adding the rescaling factor to the mutation model object. If I understand things correctly, you would never allow the user, defining their own MatrixMutationModel
to specify any M
value, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I would argue against adding the rescaling factor to the mutation model object. I
Oh, good. =)
If I understand things correctly, you would never allow the user, defining their own MatrixMutationModel to specify any M value, right?
The way I'd imagine it would be that the MatrixMutationModel would have a rate_scaling
argument ("M"), that gets multiplied by the mutation rate when it is used with mutate( ). So, yes, the user would be able to set this, but only if they are doing their own model. Then, if we defined HKY as a MatrixMutationModel(... rate_scaling=M), then we'd be able to delete all references to M from the documentation here.
Ok, I think this is good for this PR? There's bigger picture rearranging that needs to happen, pending the API changes, but these can go in there somehow. |
FYI, to review the docs, check out this branch and then
and then look at the html files in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks @petrelharp!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still really love the Parameterization of Matrix Mutation Models. This is super informative!
docs/api.rst
Outdated
.. autoclass:: msprime.HKY | ||
|
||
TODO: move to msprime/mutations.py | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall rate of mutation for seq-gen is given by:
prob_mut = 1.0 - np.diag(transition_matrix)
seq_gen_rate = np.sum(prob_mut * root_distribution) * msprime_rate
.
We have tested for equality of the total number of mutations, not just relative frequencies.
All of this is definitely clear. But I would argue against adding the rescaling factor to the mutation model object. If I understand things correctly, you would never allow the user, defining their own MatrixMutationModel
to specify any M
value, right?
Should I wait on more comments here, @GertjanBisschop? Otherwise, I'll wait on #1176 and then tidy this up. |
No more comments. Sorry @petrelharp, should have made that clearer. |
32e754c
to
dfda55e
Compare
Ok, I've merged together the docs from #1176. I think this is reasonable, but suggestions of other ways to organize it are welcome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a massive step forward, thanks @petrelharp! LGTM.
I think this is ready to merge, @benjeffery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great @petrelharp. (I'm learning a lot about mutation models!)
Just a few comments.
msprime/mutations.py
Outdated
transitions and transversions, and sets an equilibrium frequency for each nucleotide. | ||
In addition a custom ancestral frequency (``root_distribution``) can be specified. | ||
With ``kappa=1.0`` and the default values of the other arguments this model is equal | ||
to :class:`JC69MutationModel`. This model is similar to :class:`F84MutationModel` | ||
but with a differing parametrisation for ``kappa``. | ||
|
||
See the documentation for more detail on the precise parameterization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's quite a few places where it says to "see the documentation" Would be nice to say the section name (or URL?) as I'm not sure I'd know where to look otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - I wasn't sure how to handle this, since on RTD we'd want to say "See below"; while in help(msprime.HKYMutationModel)
we want to say "see https://msprime.readthedocs.io/en/latest/api.html#msprime.HKYMutationModel"... which on RTD would link to the very same place that the person is reading.
Alternatively, we could just put all of the details, that I've now got after the short description on RTD, into the docstring itself? How about that - or is that too much information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second option, I don't have any issue with large docstrings, if they are all useful info.
Ah, one more thing: the |
The reasoning for keeping It's too much of a tedious faff to do the memory allocations via direct mallocs, so I think we're stuck with it. |
I read that, but still wasn't sure what happens if it's too small. But, come on peter, it's easy to check... I see: a |
Looks like this is ready to go, other than the discussion about how to refer to the documentation? I don't really have much of an opinion on that, so happy to merge whenever. |
92e8ab9
to
3e9aaff
Compare
I moved the docs, and fixed up a bit in the F84 documentation that I think was only correct for the case of equal base frequencies. I think this is good to go! |
Great! Merging. |
Here I've added some documentation for
so is a step towards Documentation for mutation models #1024.