Mutation model docs #1180

petrelharp · 2020-09-12T01:44:03Z

Here I've added some documentation for

some of the mutation matrix models
the infinite alleles model
the slim mutation model
so is a step towards Documentation for mutation models #1024.

petrelharp · 2020-09-12T04:39:52Z

Whoops - initially I had some commits off a different PR in here. Took those out.

codecov · 2020-09-12T04:44:46Z

Codecov Report

Merging #1180 into master will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1180   +/-   ##
=======================================
  Coverage   90.05%   90.05%           
=======================================
  Files          27       27           
  Lines       10946    10946           
  Branches     2356     2356           
=======================================
  Hits         9857     9857           
  Misses        547      547           
  Partials      542      542

Flag	Coverage Δ
#C	`90.05% <100.00%> (ø)`
#python	`96.38% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
msprime/mutations.py	`98.48% <ø> (ø)`
msprime/_msprimemodule.c	`87.24% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f486f1...3e9aaff. Read the comment docs.

msprime/mutations.py

petrelharp · 2020-09-12T21:43:30Z

docs/api.rst

+.. autoclass:: msprime.HKY
+
+TODO: move to msprime/mutations.py
+


Ok, the following is correct now, I believe. Note that the discussion below is somewhat complicated by the scaling factor M, required to make P a valid transition matrix. It looks like seq-gen avoids this issue by not documenting what the overall mutation rate actually is. Do you know, @GertjanBisschop - what's the overall mutation rate for seq-gen under HKY? Are we testing for equality of total number of mutations or just relative frequencies of different transition types? And, how does this description seem to you - clear? correct?

We could make the discussion below so that the mutation rates are just mu * q[i,j], if we added a "rate scaling factor" to the mtuation model object, and put M in there, transparent to the user. But I think this is probably not worth it - what do you think?

This is great work. Will review this and answer all your questions tonight. Whichever way we decide on the scaling factor M, we will have one up on SeqGen given that this is all now very well documented.

The overall rate of mutation for seq-gen is given by:
prob_mut = 1.0 - np.diag(transition_matrix)
seq_gen_rate = np.sum(prob_mut * root_distribution) * msprime_rate.
We have tested for equality of the total number of mutations, not just relative frequencies.

All of this is definitely clear. But I would argue against adding the rescaling factor to the mutation model object. If I understand things correctly, you would never allow the user, defining their own MatrixMutationModel to specify any M value, right?

But I would argue against adding the rescaling factor to the mutation model object. I

Oh, good. =)

If I understand things correctly, you would never allow the user, defining their own MatrixMutationModel to specify any M value, right?

The way I'd imagine it would be that the MatrixMutationModel would have a rate_scaling argument ("M"), that gets multiplied by the mutation rate when it is used with mutate( ). So, yes, the user would be able to set this, but only if they are doing their own model. Then, if we defined HKY as a MatrixMutationModel(... rate_scaling=M), then we'd be able to delete all references to M from the documentation here.

petrelharp · 2020-09-13T02:35:04Z

Ok, I think this is good for this PR? There's bigger picture rearranging that needs to happen, pending the API changes, but these can go in there somehow.

petrelharp · 2020-09-13T02:36:22Z

FYI, to review the docs, check out this branch and then make them locally:

git fetch origin pull/1180/head:mutation_model_docs
git checkout mutation_model_docs
cd docs
make

and then look at the html files in docs/_build/html

jeromekelleher

Looks great, thanks @petrelharp!

docs/api.rst

GertjanBisschop

Still really love the Parameterization of Matrix Mutation Models. This is super informative!

GertjanBisschop · 2020-09-14T21:58:10Z

docs/api.rst

+.. autoclass:: msprime.HKY
+
+TODO: move to msprime/mutations.py
+


The overall rate of mutation for seq-gen is given by:
prob_mut = 1.0 - np.diag(transition_matrix)
seq_gen_rate = np.sum(prob_mut * root_distribution) * msprime_rate.
We have tested for equality of the total number of mutations, not just relative frequencies.

All of this is definitely clear. But I would argue against adding the rescaling factor to the mutation model object. If I understand things correctly, you would never allow the user, defining their own MatrixMutationModel to specify any M value, right?

docs/api.rst

petrelharp · 2020-09-18T03:55:23Z

Should I wait on more comments here, @GertjanBisschop? Otherwise, I'll wait on #1176 and then tidy this up.

GertjanBisschop · 2020-09-18T08:14:22Z

No more comments. Sorry @petrelharp, should have made that clearer.

petrelharp · 2020-09-23T04:12:41Z

Ok, I've merged together the docs from #1176. I think this is reasonable, but suggestions of other ways to organize it are welcome.

jeromekelleher

This is a massive step forward, thanks @petrelharp! LGTM.

I think this is ready to merge, @benjeffery?

benjeffery

This is looking great @petrelharp. (I'm learning a lot about mutation models!)
Just a few comments.

docs/api.rst

msprime/mutations.py

docs/api.rst

msprime/mutations.py

benjeffery · 2020-09-23T10:34:29Z

msprime/mutations.py

    transitions and transversions, and sets an equilibrium frequency for each nucleotide.
    In addition a custom ancestral frequency (``root_distribution``) can be specified.
    With ``kappa=1.0`` and the default values of the other arguments this model is equal
    to :class:`JC69MutationModel`. This model is similar to :class:`F84MutationModel`
    but with a differing parametrisation for ``kappa``.

+    See the documentation for more detail on the precise parameterization.


There's quite a few places where it says to "see the documentation" Would be nice to say the section name (or URL?) as I'm not sure I'd know where to look otherwise.

Right - I wasn't sure how to handle this, since on RTD we'd want to say "See below"; while in help(msprime.HKYMutationModel) we want to say "see https://msprime.readthedocs.io/en/latest/api.html#msprime.HKYMutationModel"... which on RTD would link to the very same place that the person is reading.

Alternatively, we could just put all of the details, that I've now got after the short description on RTD, into the docstring itself? How about that - or is that too much information?

I think the second option, I don't have any issue with large docstrings, if they are all useful info.

petrelharp · 2020-09-23T15:44:20Z

Ah, one more thing: the block_size argument to the SLiM model is still undocumented. And, sadly, I still don't know quite what it does. It's the block size for the tsk_blkalloc... but, what happens if it is wrong? If it is too small, will the user get an error? When would the user want to adjust this number?

jeromekelleher · 2020-09-23T16:34:08Z

The reasoning for keeping block_size is given here. Basically, we have to have it because in principle the amount of memory each slim mutation requires is unbounded, and we can't alloc more than the block_size in one go using tsk_blkalloc. So, the user shouldn't every need to worry about this unless their model is pretty daft and has 8K worth of mutations stacked on top of each other.

It's too much of a tedious faff to do the memory allocations via direct mallocs, so I think we're stuck with it.

petrelharp · 2020-09-23T17:09:10Z

I read that, but still wasn't sure what happens if it's too small. But, come on peter, it's easy to check... I see: a _msprime.LibraryError: Out of memory.. I'll say "You should not need to change this unless you get a "out of memory" error due to a very large number of stacked mutations."

jeromekelleher · 2020-09-24T07:29:16Z

Looks like this is ready to go, other than the discussion about how to refer to the documentation? I don't really have much of an opinion on that, so happy to merge whenever.

petrelharp · 2020-09-24T14:33:29Z

I moved the docs, and fixed up a bit in the F84 documentation that I think was only correct for the case of equal base frequencies. I think this is good to go!

jeromekelleher · 2020-09-24T15:56:36Z

Great! Merging.

petrelharp force-pushed the mutation_model_docs branch from f3a0e42 to 78d7f98 Compare September 12, 2020 04:39

petrelharp commented Sep 12, 2020

View reviewed changes

msprime/mutations.py Outdated Show resolved Hide resolved

petrelharp commented Sep 12, 2020

View reviewed changes

petrelharp requested a review from GertjanBisschop September 13, 2020 02:35

jeromekelleher reviewed Sep 14, 2020

View reviewed changes

docs/api.rst Outdated Show resolved Hide resolved

docs/api.rst Show resolved Hide resolved

docs/api.rst Outdated Show resolved Hide resolved

petrelharp mentioned this pull request Sep 14, 2020

don't restart alleles at every site for infinite alleles #1182

Merged

GertjanBisschop reviewed Sep 14, 2020

View reviewed changes

petrelharp mentioned this pull request Sep 21, 2020

Refactor mutation model API to add string references #1176

Merged

petrelharp force-pushed the mutation_model_docs branch 2 times, most recently from 32e754c to dfda55e Compare September 23, 2020 04:11

jeromekelleher approved these changes Sep 23, 2020

View reviewed changes

benjeffery reviewed Sep 23, 2020

View reviewed changes

more docs for mutation models

3e9aaff

petrelharp force-pushed the mutation_model_docs branch from 92e8ab9 to 3e9aaff Compare September 24, 2020 14:33

jeromekelleher added the AUTOMERGE-REQUESTED label Sep 24, 2020

mergify bot merged commit 7fbb8fa into tskit-dev:master Sep 24, 2020

mergify bot removed the AUTOMERGE-REQUESTED label Sep 24, 2020

petrelharp deleted the mutation_model_docs branch December 15, 2021 08:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mutation model docs #1180

Mutation model docs #1180

petrelharp commented Sep 12, 2020

petrelharp commented Sep 12, 2020

codecov bot commented Sep 12, 2020 •

edited

Loading

petrelharp Sep 12, 2020

GertjanBisschop Sep 14, 2020

GertjanBisschop Sep 14, 2020 •

edited

Loading

petrelharp Sep 14, 2020

petrelharp commented Sep 13, 2020

petrelharp commented Sep 13, 2020

jeromekelleher left a comment

GertjanBisschop left a comment

GertjanBisschop Sep 14, 2020 •

edited

Loading

petrelharp commented Sep 18, 2020

GertjanBisschop commented Sep 18, 2020

petrelharp commented Sep 23, 2020

jeromekelleher left a comment

benjeffery left a comment

benjeffery Sep 23, 2020

petrelharp Sep 23, 2020

benjeffery Sep 24, 2020

petrelharp commented Sep 23, 2020

jeromekelleher commented Sep 23, 2020

petrelharp commented Sep 23, 2020

jeromekelleher commented Sep 24, 2020

petrelharp commented Sep 24, 2020

jeromekelleher commented Sep 24, 2020

		.. autoclass:: msprime.HKY

		TODO: move to msprime/mutations.py

Mutation model docs #1180

Mutation model docs #1180

Conversation

petrelharp commented Sep 12, 2020

petrelharp commented Sep 12, 2020

codecov bot commented Sep 12, 2020 • edited Loading

Codecov Report

petrelharp Sep 12, 2020

Choose a reason for hiding this comment

GertjanBisschop Sep 14, 2020

Choose a reason for hiding this comment

GertjanBisschop Sep 14, 2020 • edited Loading

Choose a reason for hiding this comment

petrelharp Sep 14, 2020

Choose a reason for hiding this comment

petrelharp commented Sep 13, 2020

petrelharp commented Sep 13, 2020

jeromekelleher left a comment

Choose a reason for hiding this comment

GertjanBisschop left a comment

Choose a reason for hiding this comment

GertjanBisschop Sep 14, 2020 • edited Loading

Choose a reason for hiding this comment

petrelharp commented Sep 18, 2020

GertjanBisschop commented Sep 18, 2020

petrelharp commented Sep 23, 2020

jeromekelleher left a comment

Choose a reason for hiding this comment

benjeffery left a comment

Choose a reason for hiding this comment

benjeffery Sep 23, 2020

Choose a reason for hiding this comment

petrelharp Sep 23, 2020

Choose a reason for hiding this comment

benjeffery Sep 24, 2020

Choose a reason for hiding this comment

petrelharp commented Sep 23, 2020

jeromekelleher commented Sep 23, 2020

petrelharp commented Sep 23, 2020

jeromekelleher commented Sep 24, 2020

petrelharp commented Sep 24, 2020

jeromekelleher commented Sep 24, 2020

codecov bot commented Sep 12, 2020 •

edited

Loading

GertjanBisschop Sep 14, 2020 •

edited

Loading

GertjanBisschop Sep 14, 2020 •

edited

Loading