Skip to content

Conversation

@hyanwong
Copy link
Member

@hyanwong hyanwong commented Feb 5, 2021

Adds python support for keep_unary="in_individuals", but does not fix #1154 . As requested by @benjeffery

Updated to include the fix + tests

@hyanwong hyanwong force-pushed the keep_unary_in_individuals_python branch from 8df06fa to b4ce0e0 Compare February 5, 2021 16:41
@hyanwong
Copy link
Member Author

hyanwong commented Feb 5, 2021

Also fixes #1120 . Note that I've used the string value "in_individuals" to turn on the KEEP_UNARY_IN_INDIVIDUALS behaviour, rather than the originally suggested "individuals", as I think simplify(keep_unary="individuals") sounds like we keep unary individuals (whatever that means) rather than keeping unary nodes if they are in an individual. Happy to change to whatever the consensus is, though.

@hyanwong hyanwong force-pushed the keep_unary_in_individuals_python branch from b4ce0e0 to 63459de Compare February 7, 2021 12:05
@hyanwong hyanwong mentioned this pull request Feb 7, 2021
3 tasks
@codecov
Copy link

codecov bot commented Feb 7, 2021

Codecov Report

Merging #1190 (e49fe6d) into main (a23b5b4) will increase coverage by 0.69%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1190      +/-   ##
==========================================
+ Coverage   93.72%   94.42%   +0.69%     
==========================================
  Files          26       22       -4     
  Lines       21511    15457    -6054     
  Branches      904      905       +1     
==========================================
- Hits        20161    14595    -5566     
+ Misses       1312      824     -488     
  Partials       38       38              
Flag Coverage Δ
c-tests 92.50% <100.00%> (ø)
lwt-tests ?
python-c-tests ?
python-tests 98.63% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/trees.py 97.42% <ø> (ø)
c/tskit/tables.c 90.83% <100.00%> (ø)
python/tskit/tables.py 99.60% <100.00%> (+<0.01%) ⬆️
c/tskit/core.h 100.00% <0.00%> (ø)
python/tskit/__init__.py
python/lwt_interface/tskit_lwt_interface.h
python/lwt_interface/example_c_module.c

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a23b5b4...e49fe6d. Read the comment docs.

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the string/bool flags I'm afraid.
Are there also some tests we could add that don't hit #1154?

@hyanwong
Copy link
Member Author

hyanwong commented Feb 8, 2021

I'm not a fan of the string/bool flags I'm afraid.

Other suggestions welcome! I'm happy to sit on the fence on this one.

Are there also some tests we could add that don't hit #1154?

Isn't that the only thing we are testing here?

@jeromekelleher
Copy link
Member

Maybe we are trying to be too clever here with the API and it would be simpler to just add keep_unary_in_individuals to mirror the low-level API. We can always add a higher-level approach later if we need to.

@petrelharp, any objections?

@hyanwong hyanwong force-pushed the keep_unary_in_individuals_python branch from 63459de to d0dbc6e Compare February 18, 2021 12:49
@hyanwong
Copy link
Member Author

Now updated & swapped back to individual parameters keep_unary and keep_unary_in_individuals. Is this OK @benjeffery / @petrelharp ? Tests in #1191 which is based off this PR. They are only separate because @benjeffery asked for that at the start. I can put them all in one PR if preferred.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit lost here as to what's going in what PR now. This one seems to include a fix for the unary_in_individuals bug, but doesn't include any tests.

I think it would be simpler if we brought this and #1191 together into one and tried to get them merged ASAP.

We should include tests that use the pedigree information in the forward simulator that was introduced in recent commits.

@hyanwong
Copy link
Member Author

I think it would be simpler if we brought this and #1191 together into one and tried to get them merged ASAP.

Yes, I agree. Since most of the discussion is in this PR & issue, I'll close the other one and add the tests here.

We should include tests that use the pedigree information in the forward simulator that was introduced in recent commits.

I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly?

@jeromekelleher
Copy link
Member

I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly?

We're interested in unary nodes that refer to individuals, for which the application is forward simulations that store individuals. We have a simple forward simulator that stores invididauls now, so we should use it to test this feature.

@hyanwong
Copy link
Member Author

I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly?

We're interested in unary nodes that refer to individuals, for which the application is forward simulations that store individuals. We have a simple forward simulator that stores invididauls now, so we should use it to test this feature.

Yes, that's what the tests in #1191 do, ISTR. But they don't use the pedigree information, only the individuals. Perhaps it'll be clearer when I merge those tests into this PR?

@hyanwong
Copy link
Member Author

OK @jeromekelleher - this should be done now. I added an extra test that scans though all the edges and checks (after simplification either with keep_unary or keep_unary_with_individuals) that all the individuals along the genetic lineages are kept, along with their unary nodes.

I needed to flip the parameter in wf_sim_with_individual_metadata to deep_history=False so that all nodes had an individual. I hope that's OK @benjeffery ? From my reading of the test code that uses that fixture, the deep history is irrelevant, and isn't required for use in the tests.

@hyanwong
Copy link
Member Author

P.s. ping me if you need me to squash all these commits together after review?

@hyanwong hyanwong force-pushed the keep_unary_in_individuals_python branch 2 times, most recently from d560923 to 9bdc236 Compare February 19, 2021 13:35
10,
seed=1,
deep_history=True,
deep_history=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought throught he implications of changing this for test_shuffled_individual_parent_mapping, have you?

Copy link
Member Author

@hyanwong hyanwong Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm, not sure why should this make a difference? We still shuffle the individuals and check that they are (a) shuffled and (b) the original individual ids correspond.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But perhaps I'm not understanding something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the default here? Seems gratuitous to me.

Copy link
Member Author

@hyanwong hyanwong Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below. If we want to check that all nodes have an individual, and all the individuals point to their correct parents, then we can't include the deep history nodes, which have no individuals (and also the nodes at the top of the WF generated simulation will not have the correct parents).

I.e. if we have a deep history, we (almost by definition) cannot fully map the trees onto the individuals pedigree. It seemed worth testing that we were capturing all of the genetic pedigree in the individuals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(yan has thought this over)

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but some gratuitous changes included AFAICT

10,
seed=1,
deep_history=True,
deep_history=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the default here? Seems gratuitous to me.



def insert_individuals(ts, samples=None, ploidy=1):
def insert_individuals(ts, nodes=None, ploidy=1, allow_mixed_ploidy=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this allow_mixed_ploidy argument? If not, why add it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re deep_history - if we graft deep history onto the simulation we no longer have individuals for all the nodes (the new "deep history" nodes don't have individuals)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re mixed ploidy, sure, can remove. It just seemed like a useful thing to flag up to our future selves. Perhaps all that's needed is to make a comment/warning in the docstring that this utility function could result in mixed ploidies?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

# check samples match
assert sts.num_samples == len(sub_samples)
for n, sn in zip(sub_samples, sts.samples()):
print("::::", n, sn)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want this print in there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops sorry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP. I do this all the time (just ask Jerome)

@petrelharp
Copy link
Contributor

I've made the WF test stricter (now it's an if-and-only-if test).

@petrelharp
Copy link
Contributor

With the new test and pending the suggestions above I'm good with this. Do you want to take care of those and remove my print statement and squash, @hyanwong?

@hyanwong hyanwong force-pushed the keep_unary_in_individuals_python branch from 2eaddea to 92bc883 Compare February 19, 2021 17:05
@hyanwong
Copy link
Member Author

OK, thanks @petrelharp and @jeromekelleher - I removed the print statement and the allow_mixed_ploidy param in the test suite (turns out we didn't have any tests with mixed ploidy, so I put the assert in, but we can remove later if we find we need it. I guess this is ready to go?

@petrelharp
Copy link
Contributor

Ah, wait - another thing to do. You've implemented the keep_unary_in_individuals in the python Simplifier, but are not verifying it does the same thing as ts.simplify(). (See the tests that do this for keep_unary, for instance.) Unless I'm missing something.

@hyanwong
Copy link
Member Author

Ah, wait - another thing to do. You've implemented the keep_unary_in_individuals in the python Simplifier, but are not verifying it does the same thing as ts.simplify(). (See the tests that do this for keep_unary, for instance.) Unless I'm missing something.

Yeah, you're right. Is what you are meaning here?

s = tests.Simplifier(ts, ts.samples(), keep_unary=keep_unary)

I had a look, and I think there might be a bug in that test. In particular, the ts-with-unary-nodes that is created is (I think) called new_ts, but the tests.Simplifier code is run on the old ts, that doesn't have any unary nodes in it. Or am I missing something?

@hyanwong
Copy link
Member Author

1296d99 adds a test for the python simplifier and also corrects what I think is a wrong variable name in the previous code (we weren't actually testing the ts with unary nodes before: oops).

If this is correct, I can squash the commit into the previous, if required

@petrelharp
Copy link
Contributor

Looks good to me. I made some minor adjustments to the WF test I wrote, so it's also testing correctness of keep_unary and the default as well (and removed the num_samples=10 case because the tests were starting to run slow). Have a look at what I did, then I think we can squash and merge?

and fix the associated simplify bug
@hyanwong hyanwong force-pushed the keep_unary_in_individuals_python branch from bc38574 to e49fe6d Compare February 20, 2021 10:06
@hyanwong
Copy link
Member Author

Looks good to me. I made some minor adjustments to the WF test I wrote, so it's also testing correctness of keep_unary and the default as well (and removed the num_samples=10 case because the tests were starting to run slow). Have a look at what I did, then I think we can squash and merge?

LGTM. Squashed now.

Looking over it I realise that I didn't do the thing of slicing through the TS at a certain time and checking that all lineages were covered; it's done by picking nodes at random instead. But I guess the random thing is a harder test anyway, so all's fine. Also there's the slicing test in the SLiM PR that's just been merged, so the idea is saved somewhere, at least.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're all happy, so good to go! Thanks @hyanwong and @petrelharp.

@mergify mergify bot merged commit 21da9b0 into tskit-dev:main Feb 22, 2021
@hyanwong
Copy link
Member Author

Looks like we're all happy, so good to go! Thanks @hyanwong and @petrelharp.

Yay. Thanks Jerome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug in simplify with KEEP_UNARY_IN_INDIVIDUALS

4 participants