-
Notifications
You must be signed in to change notification settings - Fork 78
Add a sort to ts.subset() #2479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
95eaaf3 to
c6b97e0
Compare
Codecov Report
@@ Coverage Diff @@
## main #2479 +/- ##
=======================================
Coverage 93.43% 93.43%
=======================================
Files 28 28
Lines 27400 27401 +1
Branches 1255 1255
=======================================
+ Hits 25600 25601 +1
Misses 1766 1766
Partials 34 34
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
c6b97e0 to
0006e4a
Compare
petrelharp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; one minor change to testing.
| # We don't support simplify with migrations, so should fail. | ||
| with pytest.raises(_tskit.LibraryError): | ||
| ts.simplify() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps these should go here? And could call verify_subset there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I could do, but I thought test_tables.py was specifically for testing TableCollection functions. The new functionality only applies to the ts.subset function (should it? Shouldn't the TableCollection function mirror the TreeSequence one exactly?). That was my rationale for dumping the tests in test_highlevel.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good point. Um, I guess I think we should call sort in TableCollection.subset, instead of just here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would probably be my preference.
There may be other functions which have a table and a tree sequence version, but where extra processing is required to make a valid tree sequence. It could be worth establishing a precedence for functions like this, if there is a use-case for not doing the extra processing in the tables version. We could have a parameter make_valid_ts that any such table function has, which is always set to True if called via the ts version.
It may not be that this isn't required for this particular function, so we could punt any decision down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, and I don't think we have a use case for doing subset followed by other processing steps that don't require sorted tables before sorting (if we did, then it could be useful for efficiency to keep it out). So, let's not add the extra parameter.
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Were you ready to merge this @hyanwong and @petrelharp? Seems like the sort was intended to back into the tables version? |
|
Sorry, I misread the discussion as being about an additional change. |
|
It was meant to go into the tables version. I'll follow up with a minor PR to fix on Monday. |
Fixes #2473. Docs adjusted to imply that migrations may be reordered (as a side effect of
sort()). Simple test case used to check topological identity after reordering nodes (which would be too time consuming for large tree (sequences))PR Checklist: