Allow newicks to be output without branch lengths #931

hyanwong · 2020-10-24T14:45:46Z

As discussed on slack. This only changes the Python newick generator. I also took the opportunity to make all the parameters keyword-only (which is a breaking change), but it's really not obvious to me what tree.newick(8) means - I think we should force tree.newick(precision=8).
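To illustrate the keyword-only change, here is a minimal sketch. The function name `newick_sketch` and its default values are hypothetical stand-ins chosen for the example, not the actual tskit signature.

```python
def newick_sketch(*, precision=14, include_branch_lengths=True):
    """Hypothetical stand-in: the bare * makes every parameter keyword-only."""
    return {
        "precision": precision,
        "include_branch_lengths": include_branch_lengths,
    }


# A positional call is rejected, so the ambiguous newick_sketch(8) cannot be written.
try:
    newick_sketch(8)
except TypeError as err:
    print("positional call rejected:", err)

# The keyword form states which parameter is being set.
print(newick_sketch(precision=8))
```

Callers that previously passed the precision positionally would need to switch to the keyword form, which is why this counts as a breaking change.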

Incidentally, I was wondering if __build_newick should be _build_newick (single underscore) for consistency. But perhaps the double underscore was deliberate?

AdminBot-tskit · 2020-10-24T14:48:12Z

📖 Docs for this PR can be previewed here

jeromekelleher

A few changes needed. The double underscore was deliberate, I think, to mark a very private (indeed embarrassing) recursive function.
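For readers unfamiliar with the difference between the two underscore styles, the sketch below shows Python's name mangling for double-underscore methods inside a class. The class `TreeSketch` and its methods are hypothetical and only illustrate the language rule; they are not the tskit code.

```python
class TreeSketch:
    def __build(self, depth):
        # Double leading underscore: stored on the class as _TreeSketch__build.
        return "leaf" if depth == 0 else f"({self.__build(depth - 1)})"

    def _build(self, depth):
        # Single leading underscore: private by convention only, no mangling.
        return self.__build(depth)


t = TreeSketch()
print(t._build(2))
print(hasattr(t, "__build"))
print(hasattr(t, "_TreeSketch__build"))
```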

python/tests/test_highlevel.py

python/tskit/trees.py

hyanwong · 2020-10-25T18:39:19Z

I've addressed all the changes now.

jeromekelleher

Probably better not to do the node_labels workaround, actually, so we don't have to test that the labels are the same in various corner cases.

python/tskit/trees.py

jeromekelleher · 2020-10-25T19:16:47Z

python/tskit/trees.py

            root = self.root
+        if not include_branch_lengths and node_labels is None:
+            # Force the python generator for simplicity, by specifying the default labels
+            node_labels = {i: str(i + 1) for i in self.leaves()}


Hmm, leaves() or samples()? I think it should be samples(). Leaves is tricky, as we can have non-sample leaves.

What does the C engine do? We should add a test for this, since we are doing this workaround.

Maybe it would be better to not do this, and to change the condition on the line below instead to

if node_labels is None and include_branch_lengths:

Then it's completely explicit and we don't need any comments explaining what we're doing.

We claim it does leaves only (regardless of whether they are samples), and that seems sensible to me. Some newick readers only bother with names for tips anyway. In particular, the docs currently say:

By default, leaf nodes are labelled with their numerical ID + 1, and internal nodes are not labelled.

Then it's completely explicit and we don't need any comments explaining what we're doing.

We would still need to set the node labels to the leaves only when node_labels == None for the python Newick generating code. But happy to move the logic around.

Ah right, what you have sounds good then. We should add a test to verify that we get the same labels with and without include_branch_lengths in a corner case when we have non-sample leaves, though.
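The behaviour discussed above (default labels of ID + 1 on leaves only, with branch lengths optional) can be sketched with a small recursive generator. This is an illustrative toy over plain dictionaries, not the tskit implementation; the function name, the `children` and `branch_length` inputs, and the default precision are all assumptions made for the example.

```python
def newick_sketch(children, branch_length, root, *, precision=2,
                  include_branch_lengths=True, node_labels=None):
    """Toy newick writer.

    children: dict mapping each node to a list of its child nodes.
    branch_length: dict mapping each non-root node to its branch length.
    """
    if node_labels is None:
        # Default: label leaves (nodes without children) with ID + 1,
        # regardless of whether they are samples; internal nodes get no label.
        node_labels = {u: str(u + 1) for u in children if not children[u]}

    def build(u):
        label = node_labels.get(u, "")
        kids = children[u]
        s = label
        if kids:
            s = "(" + ",".join(build(c) for c in kids) + ")" + label
        if include_branch_lengths and u != root:
            s += f":{branch_length[u]:.{precision}f}"
        return s

    return build(root) + ";"


children = {0: [], 1: [], 2: [], 3: [0, 1], 4: [3, 2]}
branch_length = {0: 1.0, 1: 1.0, 2: 2.0, 3: 1.0}
print(newick_sketch(children, branch_length, 4))
print(newick_sketch(children, branch_length, 4, include_branch_lengths=False))
```

In this toy the labels are computed the same way in both modes, which is the property the requested corner-case test is meant to check.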

hyanwong · 2020-10-27T22:05:57Z

This is ready for final review, @jeromekelleher. Note that to use the leaves != samples examples (specifically get_internal_samples_examples()) in test_highlevel.py, I've moved a load of tree-sequence-creating routines (e.g. get_example_tree_sequences) from test_highlevel.py to tsutil.py. I think that's probably a better place for them, as then they can all be used in different unit test files.

jeromekelleher

Looks good, but let's not do the test refactoring.

jeromekelleher · 2020-10-28T10:36:46Z

python/tests/test_phylo_formats.py


-    def verify_newick_topology(self, tree, root=None, node_labels=None):
+    @staticmethod
+    def length_parser(x):


I don't think we need a static method for this, do we? A static method implies this function is useful outside the immediate test context, whereas this is really just for newick parsing.

Seems like a good use for a lambda function:

# By default the newick lib outputs length 0.0 if no branch lengths are present
newick_tree = newick.loads(ns, length_parser=lambda x: None if x is None else float(x))[0]

Happy to use lambda
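As a quick check of what the suggested lambda does on its own, here is a standalone sketch. The name `parse_length` is hypothetical, and it assumes (as the comment above implies) that the parser receives None when a node has no branch length and a string otherwise.

```python
def parse_length(x):
    # Same logic as the suggested lambda: keep a missing length as None
    # instead of coercing it to 0.0; otherwise convert to float.
    return None if x is None else float(x)


print(parse_length(None))
print(parse_length("0.25"))
```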

jeromekelleher · 2020-10-28T10:44:58Z

python/tests/test_highlevel.py

-    yield tables.tree_sequence()
-
-
-def get_internal_samples_examples():


I don't think we want to promote this code up to utils.py. The pattern of yielding a bunch of different examples isn't a great one as it leads to long-running tests where it's hard to distinguish which test has failed. I would prefer to leave this module as-is, and continue to factor tests out of it and into more appropriate places, rather than have this deprecated testing pattern diffuse out into other modules.

Creating a ts with a non-sample leaf is pretty easy:

tables = msprime.simulate(n, random_seed=1).dump_tables()
root = len(tables.nodes) - 1
leaf = tables.nodes.add_row(flags=0, time=0)
tables.edges.add_row(0, 1, root, leaf)
tables.sort()
ts = tables.tree_sequence()

Ah, OK. I do think we want a nice range of trees & tree sequences that we call upon from different test functions, rather than writing them all out again (e.g. the example above doesn't test internal samples, which would have been another good check). But this particular refactoring seems not to be the right way to go about it.

jeromekelleher · 2020-11-02T10:51:23Z

Can you bring this up to date please @hyanwong? A few small changes and we're good to merge I think.

hyanwong · 2020-11-02T13:29:59Z

Can you bring this up to date please @hyanwong? A few small changes and we're good to merge I think.

In the latest push, I've split the get_internal_samples_examples() code into 3 separate functions (all_nodes_samples_example, only_internal_samples_example, and mixed_node_samples_example) and put them at the start of the class, but this does lead to quite a lot of code duplication. Suggestions for improving this are welcome - it feels a bit wrong to me.

benjeffery · 2020-11-02T15:17:20Z

The pytest way to do this is a parameterised fixture:

import msprime
import pytest
import tskit

n = 5  # number of sample leaves to simulate

@pytest.fixture(
    scope="module",
    params=[(None, None), (n, None), (n // 2, n + n // 2)],
    ids=["All", "Internal", "Mixed"],
)
def ts_with_varying_sample_nodes(request):
    ts = msprime.simulate(n, random_seed=10, mutation_rate=5)
    assert ts.num_mutations > 0
    tables = ts.dump_tables()
    nodes = tables.nodes
    flags = nodes.flags
    # Clear all sample flags, then mark the slice of nodes chosen by this parameter.
    flags[:] = 0
    flags[slice(*request.param)] = tskit.NODE_IS_SAMPLE
    nodes.flags = flags
    return tables.tree_sequence()

Then your tests use the fixture:

def test_samples_differ_from_leaves(self, ts_with_varying_sample_nodes):
    for t in ts_with_varying_sample_nodes.trees():
        self.verify_newick_topology(t)
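To see which nodes each of the three fixture parameters would mark as samples, here is a pure-Python sketch that needs neither msprime nor tskit. It assumes n = 5 leaves and a single binary tree with 2n - 1 = 9 nodes; the helper `sample_flags` and the local constant are hypothetical stand-ins for the flag array and tskit.NODE_IS_SAMPLE.

```python
NODE_IS_SAMPLE = 1  # stand-in for tskit.NODE_IS_SAMPLE


def sample_flags(num_nodes, bounds):
    """Return a flag list with the nodes in slice(*bounds) marked as samples."""
    flags = [0] * num_nodes
    for u in range(num_nodes)[slice(*bounds)]:
        flags[u] = NODE_IS_SAMPLE
    return flags


n = 5
num_nodes = 2 * n - 1  # assumes one binary tree over n leaves
cases = [
    ("All", (None, None)),
    ("Internal", (n, None)),
    ("Mixed", (n // 2, n + n // 2)),
]
for name, bounds in cases:
    print(name, sample_flags(num_nodes, bounds))
```

In the "Internal" and "Mixed" cases some leaves are not samples, which is the situation the newick labelling test needs to cover.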

hyanwong · 2020-11-02T16:56:30Z

Great, thanks @benjeffery - should we do this for the bevy of tests in test_highlevel.py - or at least make the various test sequences available for all files in tests? Presumably that way we can stick them in tsutil.py and grab them from there in the various test_XXX files?

benjeffery · 2020-11-02T16:58:08Z

Suite-wide fixtures are in conftest.py and where possible should be scope="session" to avoid running msprime a million times.

hyanwong force-pushed the no-newick-branch-lengths branch from 47be2a9 to dbf4ee5 Compare October 24, 2020 14:52

jeromekelleher requested changes Oct 24, 2020

hyanwong force-pushed the no-newick-branch-lengths branch from dbf4ee5 to d30e188 Compare October 24, 2020 22:27

jeromekelleher requested changes Oct 25, 2020

hyanwong force-pushed the no-newick-branch-lengths branch 3 times, most recently from 62d9018 to 3903936 Compare October 27, 2020 21:26

hyanwong mentioned this pull request Oct 28, 2020

Test various functions with "empty" trees / tree sequences #943

Closed

jeromekelleher reviewed Oct 28, 2020

hyanwong force-pushed the no-newick-branch-lengths branch from 3903936 to 09c4dfc Compare November 2, 2020 13:27

jeromekelleher approved these changes Nov 2, 2020

jeromekelleher added the AUTOMERGE-REQUESTED label Nov 2, 2020

Allow newicks to be output without branch lengths

253f26e

AdminBot-tskit force-pushed the no-newick-branch-lengths branch from 09c4dfc to 253f26e Compare November 2, 2020 17:28

mergify bot merged commit d373b79 into tskit-dev:main Nov 2, 2020

mergify bot removed the AUTOMERGE-REQUESTED label Nov 2, 2020

hyanwong deleted the no-newick-branch-lengths branch November 4, 2020 14:46

This was referenced Nov 24, 2020

Add Individual flag tskit-dev/tsinfer#358

Merged

Output Newick without branch lengths #914

Closed
