extend union to do "disjoint union"
#3283
Codecov Report❌ Patch coverage is
Additional details and impacted files

Coverage Diff (main vs. #3283)

Metric     main     #3283    +/-
Coverage   89.79%   89.80%   +0.01%
Files      29       29
Lines      31010    31026    +16
Branches   5674     5679     +5
Hits       27845    27863    +18
Misses     1778     1777     -1
Partials   1387     1386     -1
|
Whoops - now this should be ready to write tests for. |
|
I made a PR to your PR, @petrelharp - the only thing I can see that might be a problem is that we might want to include empty sites when unioning one TS with another. If so, we could either:
I guess (2) gives the most flexibility for the user? |
|
Let's keep the convo here, and lmk if you don't have permissions to push here. That all looks good; I think we just need some additional checks for correctness of the union (a lot of the checks are just that the number of things is right). I guess I don't think we should have a separate It's tempting to have a |
|
There; I switched the behavior to add all sites for reals if |
|
Great! I'll add more checks on actual correctness. Thanks @petrelharp ! |
I changed to assert that entire tables were equal rather than checking lengths, and added some tests of unioning empty tables to things.
|
1bdd65d to 5eaab68
|
Also possibly todo: should we warn in the python function if the mutation/edge/population/site/node/individual table metadata schemas differ between the two tree sequences? |
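A minimal sketch of what such a warning could look like, assuming each tree sequence's schemas are available as a plain dict keyed by table name. The function name `warn_on_schema_mismatch` and the dict layout are hypothetical, chosen for illustration only; this is not tskit's API.

```python
import warnings

TABLE_NAMES = ("node", "edge", "site", "mutation", "individual", "population")


def warn_on_schema_mismatch(self_schemas, other_schemas):
    """Warn once per table whose metadata schema differs.

    Both arguments are hypothetical dicts mapping a table name to its schema
    (any comparable object). Returns the list of mismatched table names.
    """
    mismatched = [
        name
        for name in TABLE_NAMES
        if self_schemas.get(name) != other_schemas.get(name)
    ]
    for name in mismatched:
        warnings.warn(
            f"{name} table metadata schemas differ between the two tree sequences",
            UserWarning,
            stacklevel=2,
        )
    return mismatched


# Illustrative schemas: only the node table differs.
schemas_a = {"node": {"codec": "json"}, "edge": None}
schemas_b = {"node": {"codec": "struct"}, "edge": None}
```

Warning rather than raising would leave the union usable when the schemas differ on purpose, at the cost of a message that is easy to miss.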
|
Aha - I figured out what was going wrong in the tests. I was comparing the wrong "individuals". When we call |
No, that's not something we can (reasonably) cover. It probably just errors if it's out of memory. |
Sorry, I'm not following? This doesn't ever claim to add nodes or individuals? |
|
Tangentially, I think the docstring for I don't think mentioning I'm not sure about "the equivalent input tree sequence". Why "equivalent"? Shouldn't this just be "in the 'other' tree sequence"? |
|
I think the tests are pretty good. Brainstorming corner cases:
So I guess I think this is maybe ready to go, with possibly some of the caveats above mentioned in the docstring. Are there other corner cases? |
|
Is this at a point where I should review? |
I agree! I'm happy to do that.
Yes, that's right. I mentioned union so that people could look up the docs for how that works, in case the documentation for
|
When you specify "new nodes" in the union (i.e. the node_mapping is -1), then those nodes in However, if It probably just needs documenting? |
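For readers unfamiliar with the node_mapping convention mentioned above, here is a toy sketch using plain lists rather than tskit tables. Entry i of node_mapping gives the ID in self of node i in other, or -1 (the value of tskit.NULL) if that node is new. The helper `add_new_nodes` is hypothetical and only models the node bookkeeping, not edges, individuals, or populations.

```python
NULL = -1  # same value as tskit.NULL


def add_new_nodes(self_nodes, other_nodes, node_mapping):
    """Append nodes of `other` that are mapped to NULL.

    Returns (merged_nodes, full_mapping), where full_mapping[i] is the ID
    in the merged list of node i in `other`.
    """
    if len(node_mapping) != len(other_nodes):
        raise ValueError("node_mapping must have one entry per node in other")
    merged = list(self_nodes)
    full_mapping = []
    for other_id, self_id in enumerate(node_mapping):
        if self_id == NULL:
            # A "new" node: it gets the next free ID in the merged list.
            merged.append(other_nodes[other_id])
            full_mapping.append(len(merged) - 1)
        else:
            # A shared node: it keeps its existing ID in self.
            full_mapping.append(self_id)
    return merged, full_mapping


self_nodes = ["a", "b", "c"]
other_nodes = ["a", "b", "x"]
merged, full_mapping = add_new_nodes(self_nodes, other_nodes, [0, 1, NULL])
print(merged)        # ['a', 'b', 'c', 'x']
print(full_mapping)  # [0, 1, 3]
```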
If you have time @benjeffery , I think that could be helpful, yes. Meanwhile I'll update the docstrings. |
83d746d to 6faf4b4
|
Shall I squash all these commits down? |
|
We should squash at some point, let's wait for @benjeffery 's review. |
benjeffery
left a comment
I've made a quick pass - haven't gone into algorithmic detail yet.
python/tskit/tables.py (outdated)
    self,
    other,
    node_mapping,
    all_edges=False,
Placing these at the start of kwargs has broken compatibility with old code that might have been using them positionally as we had no *.
Oh, good point. Easy enough to put a * in now, right?
I'm moving them to the end, but if it's not too rude then we should add a *.
Yeah, add a star, but just before your new kwargs, to not break existing positional usage.
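The suggestion in this thread can be sketched as follows. The function `union_sketch` is a hypothetical stand-in that only echoes its options back; its parameter list is assumed for illustration and is not copied from the merged code. The point is the placement of the bare `*`: existing parameters stay positional, and only the new ones become keyword-only.

```python
def union_sketch(
    other,
    node_mapping,
    check_shared_equality=True,
    add_populations=True,
    record_provenance=True,
    *,  # everything after this must be passed by keyword
    all_edges=False,
    all_mutations=False,
):
    """Hypothetical stand-in: returns the options it was called with."""
    return {
        "check_shared_equality": check_shared_equality,
        "add_populations": add_populations,
        "record_provenance": record_provenance,
        "all_edges": all_edges,
        "all_mutations": all_mutations,
    }


# An existing positional call keeps its meaning.
old_style = union_sketch("other", [0, 1], False)

# The new options must be named.
new_style = union_sketch("other", [0, 1], all_edges=True)
```

Passing a sixth positional argument raises a TypeError, so nobody can come to rely on the position of the new options.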
|
Okay - perhaps someone (@hyanwong?) should have a careful read of the docstrings and then this is good? The failing coverage is because the diff is not many lines and there's a "we can't cover this" line in that diff, so the percent covered on the diff is small. |
benjeffery
left a comment
A few doc nits.
|
Thanks, @benjeffery! Shall we merge this, @hyanwong? (Assuming the tests pass.) |
Add some python tests And fix concatenate() all_mutations implies really all_sites
1c80bba to ddc1dba
|
Yes, please do. Sorry about the slow reply: both deep in teaching and also down with influenza, but what we have now looks good to me. I can always tweak the docstrings later if we figure out better ways of saying things. So... merge away, I reckon. |

Here's the start at what we discussed in #3183. Might you have a go at putting in the python tests, @hyanwong?
Something I haven't done that we said we might in the other PR is require that if
all_edges or all_mutations are True then check_shared_overlap is False. I don't think we actually need to require this (since in some cases all three might be true and it's fine!) but we should probably say in the docstring that the user probably wants this to be the case. (I just don't remember why that is right now.)
Still TODO:
other has no edges that we can use this to add mutations to a subset of nodes in an existing ts
ts down to two disjoint spans and then union these together we get back ts
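The second TODO item (cut down to two disjoint spans, union, recover the original) can be illustrated with a toy model. This is a sketch with edges as plain (left, right, parent, child) tuples, not tskit code; the helpers `clip` and `squash` are hypothetical, and how the real test is written against tskit tables is not shown here.

```python
def clip(edges, start, end):
    """Keep only the part of each edge that lies inside [start, end)."""
    clipped = []
    for left, right, parent, child in edges:
        new_left, new_right = max(left, start), min(right, end)
        if new_left < new_right:
            clipped.append((new_left, new_right, parent, child))
    return clipped


def squash(edges):
    """Merge edges with the same parent and child whose intervals abut."""
    merged = []
    by_pair = sorted(edges, key=lambda e: (e[2], e[3], e[0]))
    for left, right, parent, child in by_pair:
        if merged and merged[-1][2:] == (parent, child) and merged[-1][1] == left:
            merged[-1] = (merged[-1][0], right, parent, child)
        else:
            merged.append((left, right, parent, child))
    return sorted(merged)


# A small, already-squashed edge list on a genome of length 10.
edges = [(0, 10, 4, 0), (0, 10, 4, 1), (0, 6, 5, 2), (6, 10, 4, 2), (0, 6, 5, 4)]

left_half = clip(edges, 0, 5)
right_half = clip(edges, 5, 10)

# "Disjoint union": keep every edge from both halves, then squash.
rejoined = squash(left_half + right_half)
assert rejoined == sorted(edges)
```

The round trip only works here because every edge from both halves is kept; dropping the edges of either half would lose that span.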