Preliminary PR for random tree generation #1037
|
I've made some updates here @hyanwong, mainly just renamed the method to It's mostly working, except for a test failure where the internal node labellings of the unranked tree disagree with the generated one. I don't follow why this is, so I'd like to get to the bottom of it - this isn't super urgent anyway, it's just a nice-to-have at some point. The disagreement is here: the first tree is the one we generate, and the second one is the result of unranking the rank of the first one: Both of these are assigning internal node labels via postorder, but the unrank one has flipped the order of children among parents for some reason. The first tree is clearly "right", so I'd like to understand what the unranked tree is doing here. @daniel-goldstein, any chance you could take a look here please? |
|
So I believe this has to do with a conflict in the canonical orientation between the ranking code and the code to generate random binary trees. If you don't use the minlex ordering when printing the unranked tree you should get the tree printed in a different layout where the assignment of the internal nodes looks "right". It is unfortunate that there isn't a single canonical orientation, but at least assigning only occurs when producing a |
I liked your previous idea of specifying |
Just doesn't seem worth the effort - there are a few hours of work involved, and |
|
Thanks @daniel-goldstein, I'll take a look at this later. |
7b48f3a to
3cd7b09
Compare
|
Aha, thanks @daniel-goldstein! I think I've fixed the issue now by canonicalising the order of the children within a node by sorting them by I've marked this as ready for review, but we should keep it back until after 0.3.3 is released; we can update the changelog and merge then. |
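The mismatch discussed above can be reproduced on a toy example. This is a minimal standalone sketch, not the PR's code: trees are plain nested tuples rather than tskit objects, the helper names (`min_leaf`, `canonicalise`, `postorder_labels`) are hypothetical, and the sort key (smallest leaf label in each subtree) is an assumption chosen for illustration, since the comment above does not show which key was actually used.

```python
def min_leaf(node):
    """Smallest leaf label below node (a leaf is an int, an internal node a tuple)."""
    if isinstance(node, tuple):
        return min(min_leaf(child) for child in node)
    return node


def canonicalise(node):
    """Recursively sort each node's children by their smallest leaf label.

    ASSUMPTION: this sort key is illustrative only.
    """
    if not isinstance(node, tuple):
        return node
    return tuple(sorted((canonicalise(child) for child in node), key=min_leaf))


def postorder_labels(node, num_leaves):
    """Map each internal node (identified by its leaf set) to a postorder label.

    Leaves keep labels 0..num_leaves-1; internal nodes are numbered from
    num_leaves upwards in the order a postorder traversal finishes them.
    """
    labels = {}

    def visit(n):
        if not isinstance(n, tuple):
            return frozenset([n])
        leaves = frozenset()
        for child in n:
            leaves |= visit(child)
        labels[leaves] = num_leaves + len(labels)
        return leaves

    visit(node)
    return labels


generated = ((0, 1), (2, 3))
flipped = ((2, 3), (0, 1))  # same topology, children of the root swapped

# Same topology, but postorder assigns different internal node labels.
print(postorder_labels(generated, 4) == postorder_labels(flipped, 4))  # False

# After canonicalising the child order, the labellings agree.
print(
    postorder_labels(canonicalise(generated), 4)
    == postorder_labels(canonicalise(flipped), 4)
)  # True
```

Any key that orders siblings consistently would do here; the smallest leaf label works because siblings always have disjoint leaf sets, so ties cannot occur.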
Codecov Report
@@ Coverage Diff @@
## main #1037 +/- ##
=======================================
Coverage 93.68% 93.69%
=======================================
Files 26 26
Lines 21005 21030 +25
Branches 880 886 +6
=======================================
+ Hits 19679 19704 +25
Misses 1289 1289
Partials 37 37
|
d5d6d63 to
890bd77
Compare
|
I think this is ready to go - @hyanwong, can you take a final look please? |
hyanwong
left a comment
LGTM. One typo, and a possible addition to tests (but this may not be necessary, and could be done later anyway)
ranks[random_tree.rank()] += 1
# There are N possible binary trees here, we should have seen them
# all with high probability after 20 N attempts.
assert len(ranks) == N
This is a reasonable test, but I'm not sure it's very sensitive to the uniformity of the distribution of topologies. It may be that we don't care about testing that too much, but the lack of such testing has caught out other phylogenetic libraries (e.g. R & DendroPy), so it would, I suppose, be good practice to have something (I think I had something like that in the previous tests, but it can turn out to be rather slow).
It's very difficult to do this sort of test rigorously, robustly and quickly enough to be a unit test. We'd need some sort of validation suite for this. Please open an issue to track if you think it's needed.
Yes, agreed. Probably only worth it if we require a validation suite for other reasons, I suppose. I can't think of any off hand, though.
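To make the distinction in this thread concrete, here is a rough standalone sketch of a uniformity check next to the coverage check. It does not use tskit and does not reproduce the PR's generator: `random_binary_topology` is a hypothetical stand-in that attaches each new leaf to a uniformly chosen edge (a construction that samples rooted binary leaf-labelled topologies uniformly), and all other names are illustrative. The final statistic is Pearson's chi-squared against equal expected counts.

```python
import random
from collections import Counter


def min_leaf(node):
    """Smallest leaf label below node (leaf = int, internal node = tuple)."""
    if isinstance(node, tuple):
        return min(min_leaf(child) for child in node)
    return node


def canonical(node):
    """Order children by smallest leaf so equal topologies compare equal."""
    if not isinstance(node, tuple):
        return node
    return tuple(sorted((canonical(child) for child in node), key=min_leaf))


def attach(node, leaf, target):
    """Pair `leaf` with the target-th node in preorder.

    Returns (new_node, remaining); remaining is -1 once the leaf is attached.
    """
    if target == 0:
        return (node, leaf), -1
    target -= 1
    if isinstance(node, tuple):
        left, target = attach(node[0], leaf, target)
        if target == -1:
            return (left, node[1]), -1
        right, target = attach(node[1], leaf, target)
        return (left, right), target
    return node, target


def random_binary_topology(n, rng):
    """Hypothetical stand-in generator: add leaves 1..n-1 one at a time."""
    tree = 0
    for leaf in range(1, n):
        # A tree with `leaf` leaves has 2 * leaf - 1 nodes to attach above.
        tree, _ = attach(tree, leaf, rng.randrange(2 * leaf - 1))
    return canonical(tree)


rng = random.Random(42)
n_leaves = 4
num_topologies = 15  # (2n - 3)!! = 5 * 3 * 1 for n = 4
replicates = 200 * num_topologies
counts = Counter(random_binary_topology(n_leaves, rng) for _ in range(replicates))

# Coverage check, in the spirit of the test under review.
print(len(counts) == num_topologies)

# Uniformity check: Pearson chi-squared with num_topologies - 1 degrees of freedom.
expected = replicates / num_topologies
chi2 = sum((count - expected) ** 2 / expected for count in counts.values())
chi2 += (num_topologies - len(counts)) * expected  # topologies never seen
print(f"chi-squared = {chi2:.2f} on {num_topologies - 1} degrees of freedom")
```

For 14 degrees of freedom the 0.1% critical value is roughly 36.1, so a statistic far above that would suggest non-uniform sampling. As noted in the replies, a check like this is statistical: with an unfixed seed it fails occasionally by chance, and it slows down quickly as the number of leaves grows, which is why it fits a validation suite better than a unit test.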
|
NB: Name-wise, I sort-of prefer the conciseness of |
I started with this, and it turned out to be quite tedious in practice having to type |
Sorry, I meant, with |
|
No, I disagree with that. I wouldn't expect a general |
Ah, I see, you mean if arity was none you would expect to pick from all trees, of any arity, with equal probability. Yes, I can see that as a more sensible default if arity is not specified (I assumed you would require an integer arity all the time, even for a default value). So don't mind me. |
890bd77 to
65f9c88
Compare
Description
A draft PR to create random trees. No changelog yet, sorry!
Fixes #1033
PR Checklist: