Add KEEP_UNARY_IN_INDIVIDUALS option to simplify() #1119
Codecov Report
@@ Coverage Diff @@
## main #1119 +/- ##
=======================================
Coverage 93.70% 93.71%
=======================================
Files 26 26
Lines 21065 21079 +14
Branches 899 899
=======================================
+ Hits 19739 19754 +15
+ Misses 1289 1288 -1
Partials 37 37
Force-pushed from 8a04707 to 393c2a0.
If we want to get this into a C API release so we can release SLiM 3.5.1, what do we need to do, @benjeffery?
I think we can push out a C API release soon after this is merged; we can hold #866 until after, as it is still in progress anyway.
jeromekelleher left a comment:
LGTM! One nit.
Force-pushed from 393c2a0 to 1edcbd7.
Thanks. I didn't know if we wanted to reorder the bits passed as flags to
I don't think there's much point in changing the bit values: someone might be depending on them, and there isn't much to be gained from having them contiguous (and when we tag 1.0 we'll have to get used to these compromises anyway).
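To illustrate the point about bit values: option flags like these are combined with bitwise OR and tested with bitwise AND. Gaps between the bit positions therefore don't affect how they work. Renumbering them, on the other hand, would silently change the meaning of any raw integer a caller has stored. Below is a minimal Python sketch. The `SimplifyOptions` class and its bit positions are hypothetical and are not tskit's actual constants.

```python
from enum import IntFlag


class SimplifyOptions(IntFlag):
    # Hypothetical bit positions, deliberately non-contiguous;
    # these are NOT tskit's real flag values.
    FILTER_SITES = 1 << 0
    KEEP_UNARY = 1 << 4
    KEEP_UNARY_IN_INDIVIDUALS = 1 << 7


def enabled(options: int) -> list[str]:
    """Names of the options whose bit is set, each tested with bitwise AND."""
    return [flag.name for flag in SimplifyOptions if options & flag]


# Flags combine with bitwise OR regardless of gaps between bit positions.
combined = SimplifyOptions.FILTER_SITES | SimplifyOptions.KEEP_UNARY_IN_INDIVIDUALS
print(enabled(combined))  # ['FILTER_SITES', 'KEEP_UNARY_IN_INDIVIDUALS']

# A caller that hard-coded the raw value 16 still gets KEEP_UNARY;
# renumbering the flags would silently change what that integer means.
print(enabled(16))  # ['KEEP_UNARY']
```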
LGTM, and it's very straightforward, but how do we feel about it not really being tested for correctness? I'd say that this implementation is "obviously correct", but later changes to simplify's algorithm might break that.
Correctness-wise, I don't think it's much different from KEEP_UNARY, is it? I added a few basic tests to the suite, but I imagine this might get tested more thoroughly (for correctness?) if it is implemented in the Python API too?
Force-pushed from 1edcbd7 to f4a70f5.
Right, I'm wondering whether we actually need to do this.
I think not, because the new functionality is simply a slightly restricted version of keep_unary. I'm not sure how it could go seriously wrong.
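The "restricted version" argument can be made concrete with a toy per-node rule for a single tree. Plain simplify keeps samples and nodes where two or more sample lineages meet. KEEP_UNARY also keeps nodes with exactly one sample lineage. KEEP_UNARY_IN_INDIVIDUALS keeps such a unary node only if it belongs to an individual. The function below is a hypothetical sketch of that reading, not tskit's implementation, which works per genomic interval. The exhaustive loop checks that every node the new option keeps is also kept by KEEP_UNARY.

```python
from itertools import product


def keeps_node(is_sample: bool, sample_lineages: int, in_individual: bool,
               keep_unary: bool = False,
               keep_unary_in_individuals: bool = False) -> bool:
    """Toy rule: would a simplify-like step keep this node in a single tree?

    sample_lineages is the number of the node's children that lead to at
    least one sample (hypothetical single-tree model).
    """
    if is_sample or sample_lineages >= 2:
        return True  # samples and coalescent nodes are always kept
    if sample_lineages == 1:
        # A unary node on a sample lineage.
        return keep_unary or (keep_unary_in_individuals and in_individual)
    return False  # not ancestral to any sample


# Every node kept under KEEP_UNARY_IN_INDIVIDUALS is also kept under KEEP_UNARY.
for is_sample, lineages, in_ind in product([False, True], [0, 1, 2], [False, True]):
    if keeps_node(is_sample, lineages, in_ind, keep_unary_in_individuals=True):
        assert keeps_node(is_sample, lineages, in_ind, keep_unary=True)
print("KEEP_UNARY_IN_INDIVIDUALS keeps a subset of KEEP_UNARY")
```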
Right now, I agree: I don't see how it could go wrong. But suppose that next month someone else makes some other change to simplify that breaks this. They don't catch it because they add tests for the functionality they've implemented and assume that everything else (e.g., this) is well tested elsewhere.
But I have tested an (admittedly single-tree) case where one unary node has an individual and another one doesn't, and I've checked the number of nodes that are expected to remain with the KEEP_UNARY_IN_INDIVIDUALS flag. What other tests might you be thinking of?
I think the
Yes, they do fail if it's not implemented, or not implemented correctly.
As a follow-up, though, we should open an issue to track adding this functionality in Python, so we don't forget about it.
Now at #1120.
Well, as you say, there are tests for (a) equality of tables where the behavior is the same as KEEP_UNARY, (b) equality of tables where it's the same as plain simplify, and (c) equality of the number of rows where KEEP_UNARY_IN_INDIVIDUALS has new behavior. At minimum I'd compare against the expected tables, rather than just counting rows. It'd also be nice to add a few more odd situations: an individual on a unary node above the root; an individual on a unary node above a single sample; unary nodes without individuals in both situations; a case where a node was not unary but becomes unary because of simplification; and a case where an individual has two nodes, one of which will be retained because of this flag but the other not. Like so:
Yes, fair point. Or maybe simply test that the retained nodes are the expected ones (e.g. by returning the new node IDs).
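The test suggested here compares exact node sets instead of row counts. A toy model can sketch it. It takes a single tree given as a child-to-parent dict and returns the IDs of the input nodes that a simplify-like procedure would retain. The example tree covers several of the cases listed above:

- unary nodes above a single sample, with and without an individual;
- a node that only becomes unary once a non-sample branch is removed;
- an individual with two nodes, only one of which is retained.

`retained_nodes` and the tree are hypothetical. The model assumes the tree root is the MRCA of the samples. It therefore does not cover the "unary node above the root" case, whose expected result would have to come from the real implementation.

```python
def retained_nodes(parent, samples, in_individual,
                   keep_unary=False, keep_unary_in_individuals=False):
    """Toy single-tree model: IDs of input nodes a simplify-like step keeps.

    parent: dict mapping child ID -> parent ID (the root has no entry).
    samples: set of sample node IDs.
    in_individual: set of node IDs that belong to some individual.
    Assumes the tree root is the MRCA of the samples.
    """
    # Collect the samples and all of their ancestors.
    ancestral = set()
    for sample in samples:
        node = sample
        while node is not None and node not in ancestral:
            ancestral.add(node)
            node = parent.get(node)

    # Count, for each such node, the children that lead to a sample.
    sample_lineages = {node: 0 for node in ancestral}
    for child in ancestral:
        if child in parent:
            sample_lineages[parent[child]] += 1

    kept = set()
    for node in ancestral:
        if node in samples or sample_lineages[node] >= 2:
            kept.add(node)  # samples and coalescent nodes
        elif keep_unary or (keep_unary_in_individuals and node in in_individual):
            kept.add(node)  # unary node on a sample lineage
    return kept


#            6
#          /   \
#         4     5
#         |    / \
#         2   3   7
#         |   |
#         0   1
PARENT = {0: 2, 2: 4, 4: 6, 1: 3, 3: 5, 7: 5, 5: 6}
SAMPLES = {0, 1}
# Individual A has nodes 2 and 7; individual B has node 5.
IN_INDIVIDUAL = {2, 7, 5}

print(sorted(retained_nodes(PARENT, SAMPLES, IN_INDIVIDUAL)))
print(sorted(retained_nodes(PARENT, SAMPLES, IN_INDIVIDUAL, keep_unary=True)))
print(sorted(retained_nodes(PARENT, SAMPLES, IN_INDIVIDUAL,
                            keep_unary_in_individuals=True)))
```

Here the three calls give [0, 1, 6], [0, 1, 2, 3, 4, 5, 6] and [0, 1, 2, 5, 6]. Node 5 is kept only because it is in an individual after losing its non-sample child 7, and individual A keeps node 2 but loses node 7. A real test would presumably map the retained input IDs through the node mapping that simplify reports before comparing.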
Shouldn't we then test the normal KEEP_UNARY in this way too? Or if we do (I haven't checked), why would it be any different with KEEP_UNARY_IN_INDIVIDUALS?
Yep, I can see that, I guess.
I think this is the expected result:
That wasn't quite right, but I've got it sorted out now.
Is this ready to go, @petrelharp? It would be good to get this in and ship the next C API release so we can get this into SLiM and so we don't need to hold up #1125. (Although tracking individual parents properly would be very nice to be able to do in SLiM too; would it be worth pulling all of this into a C API release and updating SLiM then?)
I think we concluded in MesserLab/SLiM#139 (comment) that on the principle of least harm we would default to setting KEEP_UNARY_IN_INDIVIDUALS to
I think this is ready to merge; I was just waiting for someone (@hyanwong?) to look at my additional test case. I think it's good, though, so merge if you like. We should discuss the C/SLiM release on Slack.
Oh, sorry. I didn't realise you were waiting on me. Merge away.
I am going to squash the commits and merge.
Force-pushed from 461fa9a to 929f4d2.
As I understood the situation, SLiM 3.5.1 is waiting on this because it seemed better not to have to roll a 3.5.2 release with this feature, or to have the feature delayed until SLiM 3.6 (whatever that turns out to be), which will probably be months down the road. Anyway, I'm not sure where the right place is now to discuss this with you and @petrelharp (probably not on a merged PR!), but let's figure this out. :->
Actually, I think it's not. It's only waiting on my reworking of the pyslim docs to include "retaining" individuals. That doesn't need this PR in tskit.
Indeed. I think the link above, MesserLab/SLiM#139, is fine.
Is that a definitive no to a release? Happy to tag a release if SLiM does need this. Looking at the milestone, the next release after that will likely be in a couple of weeks.
@hyanwong I don't understand your read on the situation. Could you explain why what I wrote above is incorrect?
Ah, I see the discussion has moved to Slack... |

Description
Adds KEEP_UNARY_IN_INDIVIDUALS option to simplify()
Fixes #1113
PR Checklist: