-
Notifications
You must be signed in to change notification settings - Fork 79
Fix parsimony on nonbinary trees #1030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8989085 to
070a4a5
Compare
|
📖 Docs for this PR can be previewed here |
Codecov Report
@@ Coverage Diff @@
## main #1030 +/- ##
=======================================
Coverage 93.71% 93.71%
=======================================
Files 26 26
Lines 20920 20923 +3
Branches 875 875
=======================================
+ Hits 19606 19609 +3
Misses 1277 1277
Partials 37 37
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
3868846 to
5e65e57
Compare
|
This is ready to go I think. @hyanwong, can you take a look at this please? In particular, can you think of other test examples we should put in here, to make sure the result is parsimonious? |
5e65e57 to
b78576b
Compare
|
Will do. Sorry for the slow reply - internet issues. |
This all looks great. Thanks for sorting this out @jeromekelleher . If you are looking for more trees to test, then perhaps
It's really helpful to have the "balanced tree" generation code now! Thanks for doing that. It also occurred to me that it's now trivial to produce a "random" binary tree for testing, by coupling |
|
Oh, and another thing: should this deal with (a) internal sample nodes, especially when there is an internal sample node which is also a polytomy, and (b) tip nodes that aren't samples ("dangling nodes). I suspect (b) is already tested, but maybe not when some of the branches from a polytomy are dangling, and some aren't. (a) is more important to test than (b), I would think. |
I did think of this, but then wondered if it would really be of much use since we can already call msprime.simulate() to do much the same thing. I was also slightly worried that people might use this as some sort of evolutionary model, when it's not. I could easily be convinced it's a good idea, though, if you think it's a useful thing to add. |
|
Thanks for the review, I'll add the tests you mention. |
|
Oh, and lastly (promise!), are we dealing OK with polytomies that have missing data? Both where some or all of the children of a polytomy are marked as missing (e.g. a star topology with all samples bar one missing), and where there is an internal sample but that sample is marked as missing (I guess this should behave just as if the internal sample node wasn't a sample at all. |
That's a very good point. The (relatively weak) arguments that I can see for it are:
|
|
OK, let's break the |
b78576b to
0521c0b
Compare
|
I've added the extra tests @hyanwong - great ideas. The internal samples and missing data stuff was already very thoroughly tested, and I've added in a few extra cases where we're using the balanced trees of varying arity. I think we can be confident this is correct now! |
|
Great. This seems mergeable to me then. Great work @jeromekelleher - hope the Hartigan algorithm didn't mess with your head too much. |
benjeffery
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving the tests here. One suggestion about fixtures - but it is no biggie as happy either way.
| uint64_t t = 1; | ||
| int8_t r = 0; | ||
|
|
||
| assert(v != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this isn't tsk_bug_assert as the consequences are pretty disastrous!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty perf sensitive code, so I want to compile out the assert.
| tree = ts.first() | ||
| @pytest.mark.parametrize("n", range(2, 10)) | ||
| def test_all_missing(self, n): | ||
| ts = msprime.simulate(n, random_seed=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The places you have ts = msprime.simulate(n, random_seed=2) could be a session-scoped parameterised fixture. https://docs.pytest.org/en/stable/fixture.html#fixture-parametrize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might pick that up later - it's hard to know where to draw the line when updating the test code!
|
Great stuff. Does this automatically make it into the likelihood compression part of the matching algorithm now? |
No, that's got to be done separately. |
We were incorrectly using Fitch parsimony on general trees. Closes tskit-dev#987
0521c0b to
0c63887
Compare

Closes #987
PR Checklist: