@jeremyguez
Contributor

Here is a PR adding the B2 index.
I did not set 10 as the default for the base parameter (although Shao and Sokal (1990) used 10), in case people assume another base without checking the default (e.g., Bienvenu et al. (2020) advise using base 2 for binary trees). I ran all the tests with base=10, on the assumption that tests passing with one base would pass with any base.
What do you think @jeromekelleher?
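
For readers following the base discussion, here is a minimal sketch of how the index can be computed. It assumes B2 is the Shannon entropy of the probabilities of reaching each leaf when walking down from the root and picking a child uniformly at each step (the formulation discussed by Bienvenu et al. (2020)). The function name `b2_index_sketch` and the `children` dictionary are hypothetical stand-ins for illustration, not the code in this PR or the tskit API.

```python
import math


def b2_index_sketch(children, root, base):
    """Entropy (in the given base) of the leaf-reaching probabilities.

    `children` maps each internal node to a list of its children;
    nodes that do not appear as keys are treated as leaves.
    """
    total = 0.0
    stack = [(root, 1.0)]  # (node, probability of reaching it from the root)
    while stack:
        node, prob = stack.pop()
        kids = children.get(node, [])
        if not kids:
            # Leaf: add its entropy term -p * log_base(p).
            total -= prob * math.log(prob, base)
        else:
            # Internal node: each child is chosen with equal probability.
            for child in kids:
                stack.append((child, prob / len(kids)))
    return total


# Hypothetical balanced binary tree with four leaves (nodes 3, 4, 5, 6).
balanced = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
print(b2_index_sketch(balanced, 0, base=2))
print(b2_index_sketch(balanced, 0, base=10))
```

In this toy tree every leaf is reached with probability 1/4, so the two calls differ only in the logarithm used for the same four terms.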

@jeremyguez
Contributor Author

jeremyguez commented Jun 8, 2022

All tests passed locally with pytest, so I wonder why the checks here raise this error:

AttributeError: module 'math' has no attribute 'prod'

Edit: I switched to numpy for the product.
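
A note on the error: `math.prod` was added in Python 3.8, so this `AttributeError` is what one would expect if the checks ran on an older interpreter (an assumption, since the thread does not say which version was used). Besides switching to numpy, a standard-library fallback is possible. The helper name `product` below is hypothetical and only illustrates the idea.

```python
import math
from functools import reduce
from operator import mul


def product(values):
    """Multiply all values together, with or without math.prod."""
    prod = getattr(math, "prod", None)
    if prod is not None:
        # Python 3.8+: use the built-in implementation.
        return prod(values)
    # Older interpreters: fold multiplication over the values, starting at 1.
    return reduce(mul, values, 1)


print(product([2, 3, 4]))
```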

@codecov

codecov bot commented Jun 8, 2022

Codecov Report

Merging #2327 (d978b6a) into main (bf1603d) will decrease coverage by 11.93%.
The diff coverage is 7.69%.

@@             Coverage Diff             @@
##             main    #2327       +/-   ##
===========================================
- Coverage   93.22%   81.29%   -11.94%     
===========================================
  Files          28       28               
  Lines       26718    26573      -145     
  Branches     1204     1207        +3     
===========================================
- Hits        24908    21602     -3306     
- Misses       1776     4901     +3125     
- Partials       34       70       +36     
Flag Coverage Δ
c-tests 92.24% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 71.43% <7.69%> (-0.06%) ⬇️
python-tests ?

Impacted Files Coverage Δ
python/tskit/trees.py 46.43% <7.69%> (-51.68%) ⬇️
python/tskit/cli.py 0.00% <0.00%> (-95.94%) ⬇️
python/tskit/vcf.py 7.75% <0.00%> (-90.52%) ⬇️
python/tskit/text_formats.py 10.25% <0.00%> (-89.75%) ⬇️
python/tskit/drawing.py 10.87% <0.00%> (-88.30%) ⬇️
python/tskit/combinatorics.py 13.70% <0.00%> (-85.66%) ⬇️
python/tskit/stats.py 35.13% <0.00%> (-64.87%) ⬇️
python/tskit/genotypes.py 37.50% <0.00%> (-60.97%) ⬇️
python/tskit/util.py 59.56% <0.00%> (-40.44%) ⬇️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf1603d...d978b6a. Read the comment docs.

Member

@jeromekelleher jeromekelleher left a comment

Looks good to me. I don't know what the best thing to do about the base is - I feel like we should put in a sensible default if the base doesn't matter all that much (which it probably doesn't?).

Is there a discussion somewhere of what the base means and what the consequences of choosing a given base are?
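
One way to see the consequence of the base, assuming B2 is the entropy $-\sum_i p_i \log_b p_i$ over the leaf-reaching probabilities $p_i$ (the formulation discussed by Bienvenu et al. (2020)), is the change-of-base identity. The symbols $B2_b$ and $p_i$ are notation introduced here for illustration:

```latex
\log_b x = \frac{\log_2 x}{\log_2 b}
\quad\Longrightarrow\quad
B2_b = -\sum_i p_i \log_b p_i
     = \frac{1}{\log_2 b}\left(-\sum_i p_i \log_2 p_i\right)
     = \frac{B2_2}{\log_2 b}
```

Under that assumption, switching the base multiplies every tree's value by the same positive constant (for example, $B2_{10} = B2_2 \cdot \log_{10} 2$), so comparisons between trees are unaffected and only the scale changes.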

# we can remove this.


def nodes_to_root(tree, u):
Member

How about this:

path = []
u = tree.parent(u)
while u != tskit.NULL:
    path.append(u)
    u = tree.parent(u)
return path

(This might be a useful library function in fact. Something like,

def path(self, u, v=None):
    """
    Returns the path between two nodes u and v, such that ``len(tree.path(u, v)) == tree.path_length(u, v)``.
    If the second node ``v`` is not specified, return the path from u to root.
    """

)
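
The suggested loop can be tried without a tree sequence at hand. The sketch below uses a plain parent list in place of `tree.parent(u)` and a local `NULL = -1` in place of `tskit.NULL`; both are stand-ins chosen for illustration, and the example tree is hypothetical.

```python
NULL = -1  # stand-in for tskit.NULL


def nodes_to_root(parent, u):
    """Return the ancestors of u, from its parent up to the root."""
    path = []
    u = parent[u]
    # Each step is a single lookup, so the walk costs O(path length)
    # and never needs to search a list of roots.
    while u != NULL:
        path.append(u)
        u = parent[u]
    return path


# Hypothetical tree: leaves 0-3, internal nodes 4 and 5, root 6.
parent = [4, 4, 5, 5, 6, 6, NULL]
print(nodes_to_root(parent, 0))
print(nodes_to_root(parent, 6))
```

Because the loop stops as soon as a node has no parent, a root yields an empty path.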


def b2_index_definition(tree, base):
Member

I think we can put in base=10 by default here.

total += 1 / max_path_length[u]
return total

def b2_index(self, base):
Member

Let's default base=10, and put in a paragraph justifying this decision in the docstring.

Contributor Author

Good idea, I did that.

@jeromekelleher
Member

@jeremyguez we're nearly ready to ship the 0.5.0 release - do you want to make a final push to get this in on time? I've done some final tidy-ups for the other metrics in #2346, and it would be nice to have all four in for the release.

I think the only decision we need to make here is whether we want a default base, and 10 seems OK to me?

@jeremyguez
Contributor Author

jeremyguez commented Jun 17, 2022

@jeromekelleher yes, I am going to add the path function needed for B2 first (PR almost done) and then push B2 once the path function is there. I'm finishing both today (or by the end of the weekend). Does this timeline fit the schedule?

@jeromekelleher
Member

I'm finishing both today (or by the end of the weekend). Does this timeline fit the schedule?

Perfect! It'll be nice to get this all wrapped up for the release. Ping me when you'd like some input.

@jeremyguez jeremyguez force-pushed the b2_index branch 2 times, most recently from f399767 to 34d20d0 Compare June 19, 2022 22:18
@jeremyguez
Contributor Author

cc @jeromekelleher
I have updated the PR here, and I will modify it to use the library path function once #2350 is pushed.

Member

@jeromekelleher jeromekelleher left a comment

LGTM, just needs a couple of small changes and we're good to merge.

# we can remove this.


def nodes_to_root(tree, u):
Member

Let's just update the version of nodes_to_root here to use the implementation I suggested and put #2350 on the back-burner for now. Searching tree.roots on every loop iteration is unnecessarily inefficient when we can just check what the parent is in O(1) time.

@jeromekelleher
Member

Looks great, thanks @jeremyguez! Looks like you merged rather than rebased though - can you rebase and squash please?

@jeremyguez
Contributor Author

Looks great, thanks @jeremyguez! Looks like you merged rather than rebased though - can you rebase and squash please?

I tried to rebase, but there were too many conflicts to resolve manually, and I don't know why.
So I made a new branch from the latest version of this branch. The PR is #2353.
Sorry for the inconvenience!

@jeromekelleher
Member

Closing in favour of #2353
