-
Notifications
You must be signed in to change notification settings - Fork 78
Adding B2 index #2327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding B2 index #2327
Conversation
|
All tests passed using pytest on local, I wonder why the checks here raise this errors: AttributeError: module 'math' has no attribute 'prod'Edit: I changed to numpy for the prod. |
Codecov Report
@@ Coverage Diff @@
## main #2327 +/- ##
===========================================
- Coverage 93.22% 81.29% -11.94%
===========================================
Files 28 28
Lines 26718 26573 -145
Branches 1204 1207 +3
===========================================
- Hits 24908 21602 -3306
- Misses 1776 4901 +3125
- Partials 34 70 +36
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I don't know what the best thing to do about the base is - I feel like we should put in a sensible default if the base doesn't matter all that much (which it probably doesn't?).
Is there some discussion somewhere about what the base means somewhere and what the consequences of choosing a given base are?
python/tests/test_balance_metrics.py
Outdated
| # we can remove this. | ||
|
|
||
|
|
||
| def nodes_to_root(tree, u): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
path = []
u = tree.parent(u)
while u != tskit.NULL:
paty.append(u)
u = tree.parent(u)
return path(This might be a useful library function in fact. Something like,
def path(self, u, v=None):
"""
Returns the path between two nodes u and v, such that ``len(tree.path(u, v)) == tree.path_length(u, v)``.
If the second node ``v`` is not specified, return the path from u to root.
"""There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just update the version of nodes_to_root here to use the implementation I suggested and put #2350 on the back-burner for now. Searching tree.roots on every loop iteration is unnecessarily inefficient when we can just check what the parent is in O(1) time.
python/tests/test_balance_metrics.py
Outdated
| ) | ||
|
|
||
|
|
||
| def b2_index_definition(tree, base): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can put in base=10 by default here.
python/tskit/trees.py
Outdated
| total += 1 / max_path_length[u] | ||
| return total | ||
|
|
||
| def b2_index(self, base): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's default base=10, and put in a paragraph justifying this decision in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I did that.
|
@jeremyguez we're nearly ready to ship the 0.5.0 release - do you want to make a final push to get this in on time? I've done some final tidy-ups for the other metrics in #2346, and it would be nice to have all four in for the release. I think the only decision we need to make here is whether we want a default base, and 10 seems OK to me? |
|
@jeromekelleher yes, I am going to add the path function needed for B2 first (PR almost done) and then push B2 once the path function is there. I'm finishing both today (or by the end of the weekend). Does this timeline fit the schedule? |
Perfect! It'll be nice to get this all wrapped up for the release. Ping me when you'd like some input. |
f399767 to
34d20d0
Compare
|
cc @jeromekelleher |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just needs a couple of small changes and we're good to merge.
python/tests/test_balance_metrics.py
Outdated
| # we can remove this. | ||
|
|
||
|
|
||
| def nodes_to_root(tree, u): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just update the version of nodes_to_root here to use the implementation I suggested and put #2350 on the back-burner for now. Searching tree.roots on every loop iteration is unnecessarily inefficient when we can just check what the parent is in O(1) time.
|
Looks great, thanks @jeremyguez! Looks like you merged rather than rebased through - can you rebase and squash please? |
I tried to rebase, but too many conflicts to resolve manually, I don't know why... |
|
Closing in favour of #2353 |
Here is a PR for adding B2 index.
I did not set 10 as a default for the base parameter (although Shao and Sokal (1990) used 10), in case people would assume another base without checking the default (e.g., Bienvenu et al. (2020) advises to use a base 2 for binary trees). I did all the tests with base=10, assuming tests that pass with one base would pass with any base.
What do you think @jeromekelleher?