Conversation

@hyanwong
Member

@hyanwong hyanwong commented Aug 26, 2022

Fixes #2473. Docs adjusted to note that migrations may be reordered (as a side effect of sort()). A simple test case checks topological identity after reordering nodes; the same check would be too time-consuming for large tree sequences.
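
To illustrate why reordering nodes in a subset can leave edges unsorted, here is a minimal toy sketch in plain Python. It does not use the tskit API: the `subset_edges`, `sort_edges` and `edges_are_sorted` helpers and the tuple layout are hypothetical, and the sort key (parent time, then parent, then child, then left coordinate) is an assumption about the required edge ordering.

```python
# Toy model, not the tskit API: node times are a list indexed by node ID,
# and edges are (left, right, parent, child) tuples.

def subset_edges(edges, nodes):
    """Keep edges whose parent and child are both in `nodes`, relabelling IDs
    so that nodes[j] becomes node j. The order of the edges is not changed."""
    new_id = {old: new for new, old in enumerate(nodes)}
    return [
        (left, right, new_id[p], new_id[c])
        for (left, right, p, c) in edges
        if p in new_id and c in new_id
    ]

def sort_edges(edges, node_time):
    """Assumed ordering: parent time, then parent ID, then child ID, then left."""
    return sorted(edges, key=lambda e: (node_time[e[2]], e[2], e[3], e[0]))

def edges_are_sorted(edges, node_time):
    return edges == sort_edges(edges, node_time)

# Two samples (0 and 1) at time 0 under a single root (2) at time 1.
old_time = [0.0, 0.0, 1.0]
old_edges = [(0.0, 1.0, 2, 0), (0.0, 1.0, 2, 1)]

# Swap the two samples: old node 1 becomes node 0 and old node 0 becomes node 1.
nodes = [1, 0, 2]
new_time = [old_time[u] for u in nodes]
new_edges = subset_edges(old_edges, nodes)

# The relabelled edges are no longer in child order until they are sorted.
needs_sort = not edges_are_sorted(new_edges, new_time)
fixed_edges = sort_edges(new_edges, new_time)
```

In this toy case the topology is unchanged by the relabelling; only the order of the edge rows differs, which is the kind of problem a sort after subsetting is meant to repair.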

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@hyanwong hyanwong force-pushed the fix-subset-sort branch 2 times, most recently from 95eaaf3 to c6b97e0 Compare August 26, 2022 17:44
@codecov

codecov bot commented Aug 26, 2022

Codecov Report

Merging #2479 (0006e4a) into main (9f14b36) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2479   +/-   ##
=======================================
  Coverage   93.43%   93.43%           
=======================================
  Files          28       28           
  Lines       27400    27401    +1     
  Branches     1255     1255           
=======================================
+ Hits        25600    25601    +1     
  Misses       1766     1766           
  Partials       34       34           
Flag Coverage Δ
c-tests 92.24% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 71.17% <100.00%> (+<0.01%) ⬆️
python-tests 98.95% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
python/tskit/trees.py 98.72% <100.00%> (+<0.01%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f14b36...0006e4a. Read the comment docs.

Contributor

@petrelharp petrelharp left a comment

LGTM; one minor change to testing.

# We don't support simplify with migrations, so should fail.
with pytest.raises(_tskit.LibraryError):
    ts.simplify()

Contributor

Perhaps these should go here? And they could call verify_subset there?

Member Author

Hmm, I could do, but I thought test_tables.py was specifically for testing TableCollection functions. The new functionality only applies to the ts.subset function (should it? Shouldn't the TableCollection function mirror the TreeSequence one exactly?). That was my rationale for dumping the tests in test_highlevel.py.

Contributor

Hm, good point. Um, I guess I think we should call sort in TableCollection.subset, instead of just here?

Member Author

Yep, that would probably be my preference.

There may be other functions which have a table and a tree sequence version, but where extra processing is required to make a valid tree sequence. It could be worth establishing a precedent for functions like this, if there is a use-case for not doing the extra processing in the tables version. We could have a parameter make_valid_ts that any such table function has, which is always set to True if called via the ts version.

It may be that this isn't required for this particular function, so we could punt any decision down the line.
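
For readers following the thread, the proposed pattern could look roughly like the sketch below. It is a toy in plain Python, not tskit code: `ToyTables` and `ts_subset` are hypothetical stand-ins, `make_valid_ts` is the parameter name floated in this comment rather than an existing tskit argument, and the sort key is an assumed edge ordering.

```python
class ToyTables:
    """Hypothetical stand-in for a table collection; it holds only node times
    and (left, right, parent, child) edge tuples."""

    def __init__(self, edges, node_time):
        self.edges = list(edges)
        self.node_time = list(node_time)

    def sort(self):
        # Assumed ordering: parent time, then parent ID, then child ID, then left.
        self.edges.sort(key=lambda e: (self.node_time[e[2]], e[2], e[3], e[0]))

    def subset(self, nodes, make_valid_ts=True):
        """Relabel so that nodes[j] becomes node j, optionally sorting after."""
        new_id = {old: new for new, old in enumerate(nodes)}
        self.edges = [
            (left, right, new_id[p], new_id[c])
            for (left, right, p, c) in self.edges
            if p in new_id and c in new_id
        ]
        self.node_time = [self.node_time[u] for u in nodes]
        if make_valid_ts:
            self.sort()


def ts_subset(tables, nodes):
    """Hypothetical tree-sequence-level wrapper: works on a copy and always
    requests the extra processing."""
    out = ToyTables(tables.edges, tables.node_time)
    out.subset(nodes, make_valid_ts=True)
    return out


edges = [(0.0, 1.0, 2, 0), (0.0, 1.0, 2, 1)]
times = [0.0, 0.0, 1.0]

raw = ToyTables(edges, times)
raw.subset([1, 0, 2], make_valid_ts=False)  # caller takes responsibility for sorting

valid = ts_subset(ToyTables(edges, times), [1, 0, 2])  # sorted on return
```

The flag would only earn its place if some caller wants to chain further table edits before paying for a sort; otherwise the tables-level function can simply always sort.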

Contributor

I thought about this, and I don't think we have a use case for doing subset followed by other processing steps that don't require sorted tables before sorting (if we did, then it could be useful for efficiency to keep it out). So, let's not add the extra parameter.

Member

@jeromekelleher jeromekelleher left a comment

LGTM

@mergify mergify bot merged commit 4288e31 into tskit-dev:main Aug 30, 2022
@jeromekelleher
Member

Were you ready to merge this, @hyanwong and @petrelharp? It seems like the sort was intended to go into the tables version?

@benjeffery
Member

Sorry, I misread the discussion as being about an additional change.

@hyanwong
Member Author

hyanwong commented Sep 1, 2022

It was meant to go into the tables version. I'll follow up with a minor PR to fix on Monday.

Development

Successfully merging this pull request may close these issues.

ts.subset() does not reorder edges as required for a valid tree sequence

4 participants