Skip to content

Conversation

@hyanwong
Copy link
Member

@hyanwong hyanwong commented Jan 18, 2021

Description

Adds KEEP_UNARY_IN_INDIVIDUALS option to simplify()

Fixes #1113

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@codecov
Copy link

codecov bot commented Jan 18, 2021

Codecov Report

Merging #1119 (929f4d2) into main (b6cee42) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1119   +/-   ##
=======================================
  Coverage   93.70%   93.71%           
=======================================
  Files          26       26           
  Lines       21065    21079   +14     
  Branches      899      899           
=======================================
+ Hits        19739    19754   +15     
+ Misses       1289     1288    -1     
  Partials       37       37           
Flag Coverage Δ
c-tests 92.51% <100.00%> (+0.01%) ⬆️
lwt-tests 92.80% <ø> (ø)
python-c-tests 94.86% <ø> (ø)
python-tests 98.64% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
c/tskit/core.h 100.00% <ø> (ø)
c/tskit/core.c 96.99% <100.00%> (+0.02%) ⬆️
c/tskit/tables.c 90.77% <100.00%> (+0.01%) ⬆️
c/tskit/trees.c 94.83% <0.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6cee42...929f4d2. Read the comment docs.

@hyanwong hyanwong force-pushed the keep_unary_in_individuals branch 4 times, most recently from 8a04707 to 393c2a0 Compare January 18, 2021 15:40
@hyanwong
Copy link
Member Author

If we want to get this into a C API release version so we can release SLiM 3.5.1, what do we need to do @benjeffery ?

@benjeffery
Copy link
Member

If we want to get this into a C API release version so we can release SLiM 3.5.1, what do we need to do @benjeffery ?

I think we can push out a C API release soon after this is merged - we can hold #866 till after as it is still in progress anyway,

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One nit.

@hyanwong hyanwong force-pushed the keep_unary_in_individuals branch from 393c2a0 to 1edcbd7 Compare January 18, 2021 17:12
@hyanwong
Copy link
Member Author

LGTM! One nit.

Thanks. I didn't know if we wanted to reorder the bits passed as flags to simplify() so that the two TSK_KEEP_UNARY bits were together? I don't know if the actual bit used is purely an internal thing or not, but I though better not to rearrange for the moment?

@jeromekelleher
Copy link
Member

I don't know if the actual bit used is purely an internal thing or not, but I though better not to rearrange for the moment?

I don't think there's much point in changing the bit values - someone might be depending on them, and there isn't really much to be gained from having them contiguous (and when we tag 1.0 we'll have to get used to these compromises anyway).

@petrelharp
Copy link
Contributor

LGTM, and it's very straightforward, although how do we feel about it not really being tested for correctness? I'd say that this implementation is "obviously correct", but changes down the line to simplify's algorithm might break that.

@hyanwong
Copy link
Member Author

LGTM, and it's very straightforward, although how do we feel about it not really being tested for correctness? I'd say that this implementation is "obviously correct", but changes down the line to simplify's algorithm might break that.

Correctness-wise don't think it's much different from KEEP_UNARY, is it? There are a few basic tests I added in the suite, but I imagine this might get tested more thoroughly (? for correctness) if it is implemented in the Python API too?

@hyanwong hyanwong force-pushed the keep_unary_in_individuals branch from 1edcbd7 to f4a70f5 Compare January 18, 2021 18:18
@petrelharp
Copy link
Contributor

I imagine this might get tested more thoroughly (? for correctness) if it is implemented in the Python API too?

Right, I'm wondering if we need to actually do this.

@hyanwong
Copy link
Member Author

I imagine this might get tested more thoroughly (? for correctness) if it is implemented in the Python API too?

Right, I'm wondering if we need to actually do this.

I think not, because the new functionality is simply a slightly restricted version of keep_unary. I'm not sure how it could go seriously wrong?

@petrelharp
Copy link
Contributor

Right now, I agree - I don't see how it could go wrong. But, suppose that next month, someone else makes some other change to simplify that breaks this. They don't catch it becaue they add tests for the functionality that they've implemented, and assume that everything else (e.g., this) is well-tested elsewhere.

@hyanwong
Copy link
Member Author

hyanwong commented Jan 18, 2021

But I have tested a (admittedly single tree) case where one unary node has an individual and another one doesn't, and I've checked the number of nodes that are expected to remain after the KEEP_UNARY_IN_INDIVIDUALS flag. What other tests might you be thinking of?

@jeromekelleher
Copy link
Member

I think the test_simplest_unary_with_individuals does test this for correctness @petrelharp, doesn't it? I don't think we need to go down the implement-in-Python path, it's quite a simple addition, and the tests do fail if it's not implemented correctly (right @hyanwong?)

@hyanwong
Copy link
Member Author

hyanwong commented Jan 19, 2021

the tests do fail if it's not implemented correctly (right @hyanwong?)

Yes, they do fail if it's not implemented, or not implemented correctly.

@jeromekelleher
Copy link
Member

As a follow up, we should open an issue to track adding the functionality in Python, though, so we don't forget about it.

@hyanwong
Copy link
Member Author

As a follow up, we should open an issue to track adding the functionality in Python, though, so we don't forget about it.

Now at #1120

@petrelharp
Copy link
Contributor

petrelharp commented Jan 19, 2021

But I have tested a (admittedly single tree) case where one unary node has an individual and another one doesn't, and I've checked the number of nodes that are expected to remain after the KEEP_UNARY_IN_INDIVIDUALS flag. What other tests might you be thinking of?

Well, asyou say, there's tests for (a) equality of tables where the behavior is the same as KEEP_UNARY (b) equality of tables where it's the same as just plain simplify, and (c) for equality of number of rows where KEEP_UNARY_IN_INDIVIDUALS has new behavior. At minimum I'd compare to the tables that are expected, rather than just counting the number of rows. It'd also be nice to put in a few more odd situations, like an individual on a unary node above the root; an individual on a unary node above a single sample, unary nodes without individuals in both situations, a case where a node was not unary but becomes unary because of simplification, and a case where an individual has two nodes, and one will be retained because of this but the other not. Like so:

ex

    const char *nodes = "1  0   0   -1\n"
                        "1  0   0   0\n"
                        "0  1   0   -1\n"
                        "0  1   0   1\n"
                        "0  2   0   -1\n"
                        "0  3   0   -1\n"
                        "0  3   0   2\n"
                        "0  1   0   -1\n"
                        "0  1   0   3\n"
                        "0  0   0   0\n"
                        "0  0   0   0\n"
                        "0  1   0   3\n";
    const char *edges = "0  2   2   0\n"
                        "0  2   3   1\n"
                        "0  2   4   2,3\n"
                        "0  1   5   4\n"
                        "0  1   6   4\n"
                        "2  3   7   0\n"
                        "2  3   8   1,9\n"
                        "2  3   11   10\n";
    const char *individuals = "0    2.5\n"
                              "0    1.5,3.1\n"
                              "0    1.1\n"
                              "0    1.9\n";

@hyanwong
Copy link
Member Author

But I have tested a (admittedly single tree) case where one unary node has an individual and another one doesn't, and I've checked the number of nodes that are expected to remain after the KEEP_UNARY_IN_INDIVIDUALS flag. What other tests might you be thinking of?
At minimum I'd compare to the tables that are expected, rather than just counting the number of rows.

Yes, fair point. Or maybe, simply test that the nodes are the expected ones (e.g. return the new node ids)

It'd also be nice to put in a few more odd situations, like an individual on a unary node above the root; an individual on a unary node above a single sample, unary nodes without individuals in both situations,

Shouldn't we then test the normal KEEP_UNARY in this way too. Or if we do (I haven't checked), why would it be any different with KEEP_UNARY_IN_INDIVIDUALS

a case where a node was not unary but becomes unary because of simplification, and a case where an individual has two nodes, and one will be retained because of this but the other not.

Yep, I can see that I guess.

@petrelharp
Copy link
Contributor

petrelharp commented Jan 19, 2021

I think this is the expected result:

    const char *nodes = "1  0   0   -1\n"
                        "1  0   0   0\n"
                        "0  1   0   1\n"
                        "0  1   0   2\n"
                        "0  2   0   -1\n"
                        "0  3   0   3\n";
    const char *edges =
                        "0  2   2   1\n"
                        "2  3   3   1\n"
                        "0  2   4   0,2\n"
                        "1  2   5   4\n";
    const char *individuals = "0    2.5\n"
                              "0    1.5,3.1\n"
                              "0    1.9\n"
                              "0    1.1\n";

@petrelharp
Copy link
Contributor

That wasn't quite right, but I've got it sorted out now.

@jeromekelleher
Copy link
Member

Is this ready to go @petrelharp? It would be good to get this in and ship the next C API release so we can get this into SLiM and so we don't need to hold up #1125.

(Although, tracking individual parents properly would be very nice to be able to do in SLiM too - would it be worth pulling all this into a C API release and updating SLiM then?)

@hyanwong
Copy link
Member Author

I think we concluded in MesserLab/SLiM#139 (comment) that on the principle of least harm we would default to setting KEEP_UNARY_IN_INDIVIDUALS to false by default in SLiM simplification. So SLiM 3.5.1 is not actually waiting up on this. Perhaps that argues for waiting until the individual parents stuff is merged too?

@petrelharp
Copy link
Contributor

I think this is ready to merge - I was just waiting for someone (@hyanwong?) to look at my additional test case. I think it's good, though - merge if you like. We should discuss the C/SLiM release on slack.

@hyanwong
Copy link
Member Author

I was just waiting for someone (@hyanwong?) to look at my additional test case.

Oh sorry. I didn't realise you were waiting on me. Merge away.

@benjeffery
Copy link
Member

I am going to squash the commits and merge.

@benjeffery benjeffery force-pushed the keep_unary_in_individuals branch from 461fa9a to 929f4d2 Compare January 22, 2021 16:33
@mergify mergify bot merged commit f2d906d into tskit-dev:main Jan 22, 2021
@bhaller
Copy link

bhaller commented Jan 22, 2021

I think we concluded in MesserLab/SLiM#139 (comment) that on the principle of least harm we would default to setting KEEP_UNARY_IN_INDIVIDUALS to false by default in SLiM simplification. So SLiM 3.5.1 is not actually waiting up on this. Perhaps that argues for waiting until the individual parents stuff is merged too?

As I understood the situation, SLiM 3.5.1 is waiting on this because it seemed better not to have to roll a 3.5.2 release with this feature, or to have the feature delayed until SLiM 3.6 (whatever that turns out to be) happens, which will probably be months down the road. Anyway, I'm not sure where the right place is now to discuss this with you and @petrelharp – probably not on a merged PR! – but let's figure this out. :->

@hyanwong
Copy link
Member Author

As I understood the situation, SLiM 3.5.1 is waiting on this

Actually, I think it's not. It's only waiting on my reworking of the pyslim docs to include "retaining" individuals. That doesn't need this PR in tskit.

Anyway, I'm not sure where the right place is now to discuss this with you and @petrelharp – probably not on a merged PR!

Indeed. I think the link above, MesserLab/SLiM#139 is fine.

@benjeffery
Copy link
Member

benjeffery commented Jan 22, 2021

Is that a definitive no to a release? Happy to tag a release if SLiM does need this. Next release after that will likely be a couple of weeks looking at the milestone.

@hyanwong
Copy link
Member Author

hyanwong commented Jan 22, 2021

No hurry for a release from me any more (now that Peter and Ben H argued me out of setting KEEP_UNARY_IN_INDIVIDUALS as the default). So punt it off for a couple of weeks and go and get yourself a G&T instead. Sorry - my bad - we probably would like a release to use in SLiM, so we don't have to make another point release later.

@bhaller
Copy link

bhaller commented Jan 22, 2021

@hyanwong I don't understand your read on the situation. Could you explain why what I wrote above is incorrect?

@bhaller
Copy link

bhaller commented Jan 22, 2021

Ah, I see the discussion has moved to Slack...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Keeping specific unary nodes though simplify

5 participants