Skip to content

Conversation

@hyanwong
Copy link
Member

Description

Add Tree.create_star, Tree.create_balanced, and Tree.create_empty methods, as discussed on Slack.

The Tree.create_empty method is perhaps unlikely to be widely used, but I think is useful for completeness, and will also come in handy for testing (for example, I only just noticed that we currently bomb out with a non-obvious message when trying to plot an empty TS).

Happy to make these class methods if that is preferred.

@hyanwong hyanwong marked this pull request as draft October 25, 2020 15:09
@hyanwong
Copy link
Member Author

Leaving as a draft until #930 is merged, so I can uncomment a test for span in the unit tests.

@AdminBot-tskit
Copy link
Collaborator

📖 Docs for this PR can be previewed here

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @hyanwong. These will be really handy, I think. The main feedback is we shouldn't have special cases, and try to make simple rules about the number of nodes and edges returned hold for all n >= 0.

I agree it's nice to have all these have a common prefix. I'm not so sure about "create_" though; perhaps "make_"?

I'm not sure how useful the empty one is to be honest, it's already a one-liner, and I don't think the equivalances you're stating will hold once the special cases at 1 and 0 have been removed.

"""

def test_star_equivalent(self):
for extra_params in [{}]: # , {'span':2.5}]: # Add after #930 is merged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only one extra param, so loop seems unnecessary and makes the code less readable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just uncommented the bit with "span" in, so it will test with with the default, and with a span now.

flags=np.full(n, NODE_IS_SAMPLE, dtype=np.uint32),
time=np.zeros(n),
)
if n > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should make this exception. We should always return a tree with num_leaves + 1 nodes and num_leaves edges. (which holds true for all num_leaves >= 0, right?)

Copy link
Member Author

@hyanwong hyanwong Oct 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to match unrank, @jeromekelleher . For num_leaves=1 that produces a TS with a single node and no edges. I think this is the right behaviour. We don't really want a unary node as the root, do we - it would be removed by simplify() if we were to run that on the TS anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Maybe we should just raise a ValueError if n < 2 for now, and avoid this discussion. It's not particularly interesting what the semantics are in these degenerate cases and we can always fix them later if it does become important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, could do, I suppose.

@staticmethod
def create_star(n, *, span=1, branch_length=1, record_provenance=True):
"""
Create a single :class:<Tree> and associated tree sequence where the ``n``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete "and associated tree sequence" - see the note below.

)
if n & (n - 1) != 0:
raise ValueError("n must be an integer power of 2")
if n > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't like the special case. We shouldn't need it here, right?

raise ValueError("n must be an integer power of 2")
if n > 1:
merge_nodes = list(range(n))
while len(merge_nodes):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(merge_nodes) > 0

@hyanwong
Copy link
Member Author

I agree it's nice to have all these have a common prefix. I'm not so sure about "create_" though; perhaps "make_"?

Or generate? Happy with any of those.

I'm not sure how useful the empty one is to be honest, it's already a one-liner, and I don't think the equivalances you're stating will hold once the special cases at 1 and 0 have been removed.

I think it's useful so that we can test e.g. tskit.Tree.generate_empty().draw_text() and so on, which we have currently failed to do. It's also useful to be able to compare things to an empty TS, unless there is already some function to check if a TS is empty (didn't check, sorry!)

@jeromekelleher
Copy link
Member

It's also useful to be able to compare things to an empty TS, unless there is already some function to check if a TS is empty (didn't check, sorry!)

Generating an empty ts is pretty trivial: tskit.TableCollection(1).tree_sequence() - but I don't feel particularly strongly about having an x_empty function or not. I'm not sure "empty" is the right word anyway (it's implying there's something in the tree), it's more like a "null" tree. I'm not sure it's worth the discussion of the semantics to include it.

@hyanwong
Copy link
Member Author

hyanwong commented Oct 26, 2020

Generating an empty ts is pretty trivial: tskit.TableCollection(1).tree_sequence() - but I don't feel particularly strongly about having an x_empty function or not.

That's not an obvious calling convention to people unfamiliar with tables. E.g. I might hope to get an empty ("null") TS by simply calling tskit.TreeSequence() with no params (and I did try that, when messing around). I only added it so that I could be explicit. But maybe it should be possible to create null objects by doing tskit.TreeSequence() and tskit.Tree()?

I'm not sure "empty" is the right word anyway (it's implying there's something in the tree), it's more like a "null" tree. I'm not sure it's worth the discussion of the semantics to include it.

It never struck me as lacking before. But I do think we have somewhat overlooked at the behaviour of various tskit functions for "null" tree sequences & trees, so this was a first step in doing something about that.

@jeromekelleher
Copy link
Member

But maybe it should be possible to create null objects by doing tskit.TreeSequence() and tskit.Tree()

The semantics around object initialisation are extremely tricky because of the interaction with the C API and Python C API. I'd rather avoid this can of worms for now.

@hyanwong
Copy link
Member Author

So it's not obvious to me how to create semi-balanced trees that match unrank(N - 1) without doing some sort of recursion. Given that there's disagreement over these "semi-balanced" trees anyway, I've simply removed the generate_balanced code. I'll put it in a different PR so we can discuss it there. I think this PR, that now just has generate_star, is therefore ready to merge @jeromekelleher

@hyanwong hyanwong force-pushed the example-trees branch 2 times, most recently from 901139a to b1914e0 Compare October 27, 2020 18:05
@hyanwong hyanwong marked this pull request as ready for review October 27, 2020 18:09
@hyanwong hyanwong mentioned this pull request Oct 27, 2020
@hyanwong hyanwong changed the title Add a Tree.create_star method Add a Tree.generate_star method Oct 27, 2020
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @hyanwong. Just need to fix up the changelog and we're good to merge.

@mergify mergify bot merged commit 9248fb3 into tskit-dev:main Oct 28, 2020
@hyanwong hyanwong deleted the example-trees branch November 4, 2020 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants