Implement keep_unary_in_individuals in Python #1190

hyanwong · 2021-02-05T16:40:05Z

~~Adds python support for keep_unary="in_individuals", but does not fix #1154 . As requested by @benjeffery~~

Updated to include the fix + tests

hyanwong · 2021-02-05T16:49:40Z

Also fixes #1120 . Note that I've used the string value "in_individuals" to turn on the KEEP_UNARY_IN_INDIVIDUALS behaviour, rather than the originally suggested "individuals", as I think simplify(keep_unary="individuals") sounds like we keep unary individuals (whatever that means) rather than keeping unary nodes if they are in an individual. Happy to change to whatever the consensus is, though.

python/tskit/tables.py

codecov · 2021-02-07T12:34:33Z

Codecov Report

Merging #1190 (e49fe6d) into main (a23b5b4) will increase coverage by 0.69%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1190      +/-   ##
==========================================
+ Coverage   93.72%   94.42%   +0.69%     
==========================================
  Files          26       22       -4     
  Lines       21511    15457    -6054     
  Branches      904      905       +1     
==========================================
- Hits        20161    14595    -5566     
+ Misses       1312      824     -488     
  Partials       38       38

Flag	Coverage Δ
c-tests	`92.50% <100.00%> (ø)`
lwt-tests	`?`
python-c-tests	`?`
python-tests	`98.63% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
python/tskit/trees.py	`97.42% <ø> (ø)`
c/tskit/tables.c	`90.83% <100.00%> (ø)`
python/tskit/tables.py	`99.60% <100.00%> (+<0.01%)`	⬆️
c/tskit/core.h	`100.00% <0.00%> (ø)`
python/tskit/__init__.py
python/lwt_interface/tskit_lwt_interface.h
python/lwt_interface/example_c_module.c

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a23b5b4...e49fe6d. Read the comment docs.

benjeffery

I'm not a fan of the string/bool flags I'm afraid.
Are there also some tests we could add that don't hit #1154?

python/tskit/tables.py

hyanwong · 2021-02-08T11:12:23Z

I'm not a fan of the string/bool flags I'm afraid.

Other suggestions welcome! I'm happy to sit on the fence on this one.

Are there also some tests we could add that don't hit #1154?

Isn't that the only thing we are testing here?

jeromekelleher · 2021-02-08T12:27:16Z

Maybe we are trying to be too clever here with the API and it would be simpler to just add keep_unary_in_individuals to mirror the low-level API. We can always add a higher-level approach later if we need to.

@petrelharp, any objections?

hyanwong · 2021-02-18T12:52:16Z

Now updated & swapped back to individual parameters keep_unary and keep_unary_in_individuals. Is this OK @benjeffery / @petrelharp ? Tests in #1191 which is based off this PR. They are only separate because @benjeffery asked for that at the start. I can put them all in one PR if preferred.

jeromekelleher

I'm a bit lost here as to what's going in what PR now. This one seems to include a fix for the unary_in_individuals bug, but doesn't include any tests.

I think it would be simpler if we brought this and #1191 together into one and tried to get them merged ASAP.

We should include tests that use the pedigree information in the forward simulator that was introduced in recent commits.

python/tskit/tables.py

python/tskit/trees.py

c/tskit/tables.c

hyanwong · 2021-02-19T09:32:33Z

I think it would be simpler if we brought this and #1191 together into one and tried to get them merged ASAP.

Yes, I agree. Since most of the discussion is in this PR & issue, I'll close the other one and add the tests here.

We should include tests that use the pedigree information in the forward simulator that was introduced in recent commits.

I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly?

jeromekelleher · 2021-02-19T09:43:35Z

I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly?

We're interested in unary nodes that refer to individuals, for which the application is forward simulations that store individuals. We have a simple forward simulator that stores invididauls now, so we should use it to test this feature.

hyanwong · 2021-02-19T09:46:06Z

I'm not sure what you mean. Why is the pedigree relevant to the unary nodes? Or do you mean that we keep all unary nodes and check that the pedigree info goes back from the samples to the roots correctly?

We're interested in unary nodes that refer to individuals, for which the application is forward simulations that store individuals. We have a simple forward simulator that stores invididauls now, so we should use it to test this feature.

Yes, that's what the tests in #1191 do, ISTR. But they don't use the pedigree information, only the individuals. Perhaps it'll be clearer when I merge those tests into this PR?

hyanwong · 2021-02-19T13:20:24Z

OK @jeromekelleher - this should be done now. I added an extra test that scans though all the edges and checks (after simplification either with keep_unary or keep_unary_with_individuals) that all the individuals along the genetic lineages are kept, along with their unary nodes.

I needed to flip the parameter in wf_sim_with_individual_metadata to deep_history=False so that all nodes had an individual. I hope that's OK @benjeffery ? From my reading of the test code that uses that fixture, the deep history is irrelevant, and isn't required for use in the tests.

hyanwong · 2021-02-19T13:21:12Z

P.s. ping me if you need me to squash all these commits together after review?

petrelharp · 2021-02-19T15:11:51Z

python/tests/test_tables.py

            10,
            seed=1,
-            deep_history=True,
+            deep_history=False,


I haven't thought throught he implications of changing this for test_shuffled_individual_parent_mapping, have you?

Erm, not sure why should this make a difference? We still shuffle the individuals and check that they are (a) shuffled and (b) the original individual ids correspond.

But perhaps I'm not understanding something?

Why change the default here? Seems gratuitous to me.

See below. If we want to check that all nodes have an individual, and all the individuals point to their correct parents, then we can't include the deep history nodes, which have no individuals (and also the nodes at the top of the WF generated simulation will not have the correct parents).

I.e. if we have a deep history, we (almost by definition) cannot fully map the trees onto the individuals pedigree. It seemed worth testing that we were capturing all of the genetic pedigree in the individuals.

(yan has thought this over)

jeromekelleher

Looks good, but some gratuitous changes included AFAICT

jeromekelleher · 2021-02-19T15:27:29Z

python/tests/test_tables.py

            10,
            seed=1,
-            deep_history=True,
+            deep_history=False,


Why change the default here? Seems gratuitous to me.

jeromekelleher · 2021-02-19T15:32:36Z

python/tests/tsutil.py



-def insert_individuals(ts, samples=None, ploidy=1):
+def insert_individuals(ts, nodes=None, ploidy=1, allow_mixed_ploidy=False):


Do we use this allow_mixed_ploidy argument? If not, why add it?

Re deep_history - if we graft deep history onto the simulation we no longer have individuals for all the nodes (the new "deep history" nodes don't have individuals)

Re mixed ploidy, sure, can remove. It just seemed like a useful thing to flag up to our future selves. Perhaps all that's needed is to make a comment/warning in the docstring that this utility function could result in mixed ploidies?

sounds good

hyanwong · 2021-02-19T16:37:21Z

python/tests/test_wright_fisher.py

+        # check samples match
+        assert sts.num_samples == len(sub_samples)
+        for n, sn in zip(sub_samples, sts.samples()):
+            print("::::", n, sn)


do you want this print in there?

whoops sorry

NP. I do this all the time (just ask Jerome)

petrelharp · 2021-02-19T16:37:44Z

I've made the WF test stricter (now it's an if-and-only-if test).

petrelharp · 2021-02-19T16:46:44Z

With the new test and pending the suggestions above I'm good with this. Do you want to take care of those and remove my print statement and squash, @hyanwong?

hyanwong · 2021-02-19T17:07:15Z

OK, thanks @petrelharp and @jeromekelleher - I removed the print statement and the allow_mixed_ploidy param in the test suite (turns out we didn't have any tests with mixed ploidy, so I put the assert in, but we can remove later if we find we need it. I guess this is ready to go?

petrelharp · 2021-02-19T18:42:11Z

Ah, wait - another thing to do. You've implemented the keep_unary_in_individuals in the python Simplifier, but are not verifying it does the same thing as ts.simplify(). (See the tests that do this for keep_unary, for instance.) Unless I'm missing something.

hyanwong · 2021-02-19T19:53:28Z

Ah, wait - another thing to do. You've implemented the keep_unary_in_individuals in the python Simplifier, but are not verifying it does the same thing as ts.simplify(). (See the tests that do this for keep_unary, for instance.) Unless I'm missing something.

Yeah, you're right. Is what you are meaning here?

tskit/python/tests/test_topology.py

Line 2403 in a23b5b4

s = tests.Simplifier(ts, ts.samples(), keep_unary=keep_unary)

I had a look, and I think there might be a bug in that test. In particular, the ts-with-unary-nodes that is created is (I think) called new_ts, but the tests.Simplifier code is run on the old ts, that doesn't have any unary nodes in it. Or am I missing something?

hyanwong · 2021-02-19T21:47:01Z

1296d99 adds a test for the python simplifier and also corrects what I think is a wrong variable name in the previous code (we weren't actually testing the ts with unary nodes before: oops).

If this is correct, I can squash the commit into the previous, if required

petrelharp · 2021-02-20T00:30:21Z

Looks good to me. I made some minor adjustments to the WF test I wrote, so it's also testing correctness of keep_unary and the default as well (and removed the num_samples=10 case because the tests were starting to run slow). Have a look at what I did, then I think we can squash and merge?

and fix the associated simplify bug

hyanwong · 2021-02-20T10:12:18Z

Looks good to me. I made some minor adjustments to the WF test I wrote, so it's also testing correctness of keep_unary and the default as well (and removed the num_samples=10 case because the tests were starting to run slow). Have a look at what I did, then I think we can squash and merge?

LGTM. Squashed now.

Looking over it I realise that I didn't do the thing of slicing through the TS at a certain time and checking that all lineages were covered; it's done by picking nodes at random instead. But I guess the random thing is a harder test anyway, so all's fine. Also there's the slicing test in the SLiM PR that's just been merged, so the idea is saved somewhere, at least.

jeromekelleher

Looks like we're all happy, so good to go! Thanks @hyanwong and @petrelharp.

hyanwong · 2021-02-22T09:34:05Z

Looks like we're all happy, so good to go! Thanks @hyanwong and @petrelharp.

Yay. Thanks Jerome.

hyanwong force-pushed the keep_unary_in_individuals_python branch from 8df06fa to b4ce0e0 Compare February 5, 2021 16:41

petrelharp reviewed Feb 5, 2021

View reviewed changes

python/tskit/tables.py Outdated Show resolved Hide resolved

hyanwong force-pushed the keep_unary_in_individuals_python branch from b4ce0e0 to 63459de Compare February 7, 2021 12:05

hyanwong mentioned this pull request Feb 7, 2021

Keep unary in individuals test #1191

Closed

3 tasks

benjeffery reviewed Feb 8, 2021

View reviewed changes

python/tskit/tables.py Outdated Show resolved Hide resolved

hyanwong force-pushed the keep_unary_in_individuals_python branch from 63459de to d0dbc6e Compare February 18, 2021 12:49

jeromekelleher reviewed Feb 19, 2021

View reviewed changes

python/tskit/tables.py Outdated Show resolved Hide resolved

python/tskit/trees.py Outdated Show resolved Hide resolved

c/tskit/tables.c Show resolved Hide resolved

hyanwong mentioned this pull request Feb 19, 2021

add keep_unary_in_individuals to python and test #1156

Closed

hyanwong force-pushed the keep_unary_in_individuals_python branch 2 times, most recently from d560923 to 9bdc236 Compare February 19, 2021 13:35

petrelharp reviewed Feb 19, 2021

View reviewed changes

jeromekelleher reviewed Feb 19, 2021

View reviewed changes

hyanwong commented Feb 19, 2021

View reviewed changes

hyanwong force-pushed the keep_unary_in_individuals_python branch from 2eaddea to 92bc883 Compare February 19, 2021 17:05

Implement keep_unary_in_individuals in Python

e49fe6d

and fix the associated simplify bug

hyanwong force-pushed the keep_unary_in_individuals_python branch from bc38574 to e49fe6d Compare February 20, 2021 10:06

hyanwong mentioned this pull request Feb 20, 2021

Support retain as well as remember tskit-dev/pyslim#125

Merged

jeromekelleher approved these changes Feb 22, 2021

View reviewed changes

jeromekelleher added the AUTOMERGE-REQUESTED label Feb 22, 2021

mergify bot merged commit 21da9b0 into tskit-dev:main Feb 22, 2021

mergify bot removed the AUTOMERGE-REQUESTED label Feb 22, 2021

jeromekelleher mentioned this pull request Jul 4, 2022

Keep unary for coalescent nodes in simplify #2381

Closed



		def insert_individuals(ts, samples=None, ploidy=1):
		def insert_individuals(ts, nodes=None, ploidy=1, allow_mixed_ploidy=False):

Implement keep_unary_in_individuals in Python #1190

Implement keep_unary_in_individuals in Python #1190

Uh oh!

Conversation

hyanwong commented Feb 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hyanwong commented Feb 5, 2021

Uh oh!

Uh oh!

codecov bot commented Feb 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

benjeffery left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hyanwong commented Feb 8, 2021

Uh oh!

jeromekelleher commented Feb 8, 2021

Uh oh!

hyanwong commented Feb 18, 2021

Uh oh!

jeromekelleher left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hyanwong commented Feb 19, 2021

Uh oh!

jeromekelleher commented Feb 19, 2021

Uh oh!

hyanwong commented Feb 19, 2021

Uh oh!

hyanwong commented Feb 19, 2021

Uh oh!

hyanwong commented Feb 19, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hyanwong Feb 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hyanwong Feb 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jeromekelleher left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

petrelharp commented Feb 19, 2021

Uh oh!

petrelharp commented Feb 19, 2021

Uh oh!

hyanwong commented Feb 19, 2021

hyanwong commented Feb 5, 2021 •

edited

Loading

codecov bot commented Feb 7, 2021 •

edited

Loading

hyanwong Feb 19, 2021 •

edited

Loading

hyanwong Feb 19, 2021 •

edited

Loading