Implement keep_unary_in_individuals in Python #1190
8df06fa to
b4ce0e0
Compare
Also fixes #1120 . Note that I've used the string value |
b4ce0e0 to
63459de
Compare
Codecov Report
@@            Coverage Diff             @@
##             main    #1190      +/-   ##
==========================================
+ Coverage   93.72%   94.42%   +0.69%
==========================================
  Files          26       22       -4
  Lines       21511    15457    -6054
  Branches      904      905       +1
==========================================
- Hits        20161    14595    -5566
+ Misses       1312      824     -488
  Partials       38       38
benjeffery
left a comment
I'm not a fan of the string/bool flags I'm afraid.
Are there also some tests we could add that don't hit #1154?
Other suggestions welcome! I'm happy to sit on the fence on this one.
Isn't that the only thing we are testing here? |
Maybe we are trying to be too clever here with the API and it would be simpler to just add @petrelharp, any objections? |
63459de to
d0dbc6e
Compare
Now updated & swapped back to individual parameters |
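For readers following the API discussion, the "individual parameters" shape can be sketched roughly as below. This is only an illustration: `resolve_keep_unary_mode` is a hypothetical helper, not a tskit function, and treating the two flags as mutually exclusive is an assumption made for the sketch rather than something stated in this thread.

```python
def resolve_keep_unary_mode(keep_unary=False, keep_unary_in_individuals=False):
    """Hypothetical helper: collapse two boolean parameters into one mode.

    Returns "none", "all", or "in_individuals". Rejecting the case where
    both flags are set is an assumption made for this sketch.
    """
    if keep_unary and keep_unary_in_individuals:
        raise ValueError(
            "keep_unary and keep_unary_in_individuals cannot both be set"
        )
    if keep_unary:
        return "all"
    if keep_unary_in_individuals:
        return "in_individuals"
    return "none"
```

With separate booleans, each call site reads as a plain flag and an invalid combination fails early, whereas a mixed string/bool argument needs type checks on every use.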
jeromekelleher
left a comment
I'm a bit lost here as to what's going in what PR now. This one seems to include a fix for the unary_in_individuals bug, but doesn't include any tests.
I think it would be simpler if we brought this and #1191 together into one and tried to get them merged ASAP.
We should include tests that use the pedigree information in the forward simulator that was introduced in recent commits.
Yes, I agree. Since most of the discussion is in this PR & issue, I'll close the other one and add the tests here.
I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly? |
We're interested in unary nodes that refer to individuals, for which the application is forward simulations that store individuals. We have a simple forward simulator that stores individuals now, so we should use it to test this feature. |
Yes, that's what the tests in #1191 do, ISTR. But they don't use the pedigree information, only the individuals. Perhaps it'll be clearer when I merge those tests into this PR? |
OK @jeromekelleher - this should be done now. I added an extra test that scans through all the edges and checks (after simplification either with keep_unary or keep_unary_with_individuals) that all the individuals along the genetic lineages are kept, along with their unary nodes. I needed to flip the parameter in |
P.s. ping me if you need me to squash all these commits together after review? |
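The idea behind the edge-scanning test described above can be sketched without tskit. The snippet below is a minimal illustration on plain tuples and dicts, not the test added in this PR; `individuals_on_lineages` and the toy data are hypothetical names chosen for the example.

```python
NULL = -1  # stands in for a null ID (node with no individual)


def individuals_on_lineages(edges, node_individual, samples):
    """Return the set of individual IDs met when walking up from the samples.

    edges: iterable of (parent, child) node-ID pairs
    node_individual: dict mapping node ID -> individual ID (NULL if none)
    samples: iterable of sample node IDs
    """
    parents_of = {}
    for parent, child in edges:
        parents_of.setdefault(child, set()).add(parent)

    seen_nodes = set()
    stack = list(samples)
    while stack:
        node = stack.pop()
        if node in seen_nodes:
            continue
        seen_nodes.add(node)
        stack.extend(parents_of.get(node, ()))

    return {
        node_individual[n]
        for n in seen_nodes
        if node_individual.get(n, NULL) != NULL
    }


# Toy lineage 2 -> 1 -> 0, where node 1 is unary and belongs to individual 1.
node_individual = {0: 0, 1: 1, 2: 2}
edges_keeping_unary = [(1, 0), (2, 1)]
edges_dropping_unary = [(2, 0)]  # what removing the unary node would leave

kept = individuals_on_lineages(edges_keeping_unary, node_individual, [0])
lost = individuals_on_lineages(edges_dropping_unary, node_individual, [0])
```

Comparing `kept` with `lost` shows which individuals disappear from the lineages when unary nodes are removed; a test of the kind described would require that none do.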
d560923 to
9bdc236
Compare
  10,
  seed=1,
- deep_history=True,
+ deep_history=False,
I haven't thought through the implications of changing this for test_shuffled_individual_parent_mapping, have you?
Erm, not sure why this should make a difference? We still shuffle the individuals and check that they are (a) shuffled and (b) the original individual ids correspond.
But perhaps I'm not understanding something?
Why change the default here? Seems gratuitous to me.
See below. If we want to check that all nodes have an individual, and all the individuals point to their correct parents, then we can't include the deep history nodes, which have no individuals (and also the nodes at the top of the WF generated simulation will not have the correct parents).
I.e. if we have a deep history, we (almost by definition) cannot fully map the trees onto the individuals pedigree. It seemed worth testing that we were capturing all of the genetic pedigree in the individuals.
(yan has thought this over)
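To make the deep_history argument in this thread concrete, here is a minimal sketch of the kind of check being described: every edge between nodes should be mirrored by a parent link between their individuals. It uses plain lists rather than tskit tables, and `pedigree_mismatches` plus the toy data are hypothetical, for illustration only.

```python
NULL = -1  # stands in for a null ID


def pedigree_mismatches(edges, node_individual, individual_parents):
    """List the (parent_node, child_node) edges not mirrored in the pedigree.

    edges: iterable of (parent, child) node-ID pairs
    node_individual: list giving each node's individual ID (NULL if none)
    individual_parents: list giving each individual's parent individual IDs
    """
    bad = []
    for parent, child in edges:
        p_ind = node_individual[parent]
        c_ind = node_individual[child]
        if (
            p_ind == NULL
            or c_ind == NULL
            or p_ind not in individual_parents[c_ind]
        ):
            bad.append((parent, child))
    return bad


# Nodes 0 and 1 are children of node 2; node 3 is a grafted "deep history"
# node with no individual.
node_individual = [0, 1, 2, NULL]
individual_parents = [[2, NULL], [2, NULL], [NULL, NULL]]

with_deep_history = [(2, 0), (2, 1), (3, 2)]
without_deep_history = [(2, 0), (2, 1)]

mismatches_deep = pedigree_mismatches(
    with_deep_history, node_individual, individual_parents
)
mismatches_shallow = pedigree_mismatches(
    without_deep_history, node_individual, individual_parents
)
```

In this toy case the only mismatch comes from the edge into the deep-history node, which is the situation the comment above says cannot be mapped onto the individuals' pedigree.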
jeromekelleher
left a comment
Looks good, but some gratuitous changes included AFAICT
python/tests/tsutil.py
Outdated
- def insert_individuals(ts, samples=None, ploidy=1):
+ def insert_individuals(ts, nodes=None, ploidy=1, allow_mixed_ploidy=False):
Do we use this allow_mixed_ploidy argument? If not, why add it?
Re deep_history - if we graft deep history onto the simulation we no longer have individuals for all the nodes (the new "deep history" nodes don't have individuals)
Re mixed ploidy, sure, can remove. It just seemed like a useful thing to flag up to our future selves. Perhaps all that's needed is to make a comment/warning in the docstring that this utility function could result in mixed ploidies?
sounds good
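As a rough illustration of what "mixed ploidies" means for the docstring warning discussed above, the sketch below counts nodes per individual and reports whether the counts differ. It works on a plain list of individual IDs; `ploidy_counts` and `has_mixed_ploidy` are hypothetical helpers, and the example grouping (five nodes paired off two at a time) is an assumed scenario, not taken from the PR.

```python
from collections import Counter

NULL = -1  # stands in for a null ID (node with no individual)


def ploidy_counts(node_individual):
    """Count how many nodes are assigned to each individual."""
    return Counter(ind for ind in node_individual if ind != NULL)


def has_mixed_ploidy(node_individual):
    """True if individuals do not all own the same number of nodes."""
    return len(set(ploidy_counts(node_individual).values())) > 1


# Assumed scenario: five nodes grouped in pairs leaves one individual
# with a single node.
uneven = [0, 0, 1, 1, 2]
even = [0, 0, 1, 1]
```

A check like this could back a docstring note or a test-side assertion without adding a new argument to the utility function.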
python/tests/test_wright_fisher.py
Outdated
# check samples match
assert sts.num_samples == len(sub_samples)
for n, sn in zip(sub_samples, sts.samples()):
    print("::::", n, sn)
do you want this print in there?
whoops sorry
NP. I do this all the time (just ask Jerome)
I've made the WF test stricter (now it's an if-and-only-if test). |
With the new test and pending the suggestions above I'm good with this. Do you want to take care of those and remove my print statement and squash, @hyanwong? |
2eaddea to
92bc883
Compare
OK, thanks @petrelharp and @jeromekelleher - I removed the print statement and the |
Ah, wait - another thing to do. You've implemented the |
Yeah, you're right. Is this what you mean? tskit/python/tests/test_topology.py Line 2403 in a23b5b4
I had a look, and I think there might be a bug in that test. In particular, the ts-with-unary-nodes that is created is (I think) called |
1296d99 adds a test for the python simplifier and also corrects what I think is a wrong variable name in the previous code (we weren't actually testing the ts with unary nodes before: oops). If this is correct, I can squash the commit into the previous, if required |
Looks good to me. I made some minor adjustments to the WF test I wrote, so it's also testing correctness of |
and fix the associated simplify bug
bc38574 to
e49fe6d
Compare
LGTM. Squashed now. Looking over it I realise that I didn't do the thing of slicing through the TS at a certain time and checking that all lineages were covered; it's done by picking nodes at random instead. But I guess the random thing is a harder test anyway, so all's fine. Also there's the slicing test in the SLiM PR that's just been merged, so the idea is saved somewhere, at least. |
jeromekelleher
left a comment
Looks like we're all happy, so good to go! Thanks @hyanwong and @petrelharp.
Yay. Thanks Jerome. |
Adds python support for keep_unary="in_individuals", but does not fix #1154. As requested by @benjeffery
Updated to include the fix + tests