Skip to content

Conversation

@daniel-goldstein
Copy link
Member

@daniel-goldstein daniel-goldstein commented May 12, 2020

KC distance previously assumed that samples == leaves. However, the metric should measure the similarities of trees based on the relationships between the samples. This changes the metric to exclusively consider samples, not leaves. This means that internal samples no longer throw an exception and subtrees without any samples in them do not contribute to the KC distance.

It also makes more sense to focus on solely samples because the set of leaves is not consistent across trees in a tree sequence. Internal nodes that are roots in some trees and not in others affect how many leaves there are per tree. The sample set, however, is consistent across the sequence.

Resolves #575 #598

@daniel-goldstein daniel-goldstein force-pushed the kc-internal-samples branch 2 times, most recently from 98a4a88 to 0ebbe68 Compare May 12, 2020 20:29
@codecov
Copy link

codecov bot commented May 12, 2020

Codecov Report

Merging #610 into master will decrease coverage by 0.01%.
The diff coverage is 96.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #610      +/-   ##
==========================================
- Coverage   87.31%   87.30%   -0.02%     
==========================================
  Files          22       22              
  Lines       16743    16733      -10     
  Branches     3274     3274              
==========================================
- Hits        14619    14608      -11     
  Misses       1040     1040              
- Partials     1084     1085       +1     
Flag Coverage Δ
#c_tests 88.51% <96.55%> (-0.02%) ⬇️
#python_c_tests 90.46% <ø> (ø)
#python_tests 99.02% <ø> (ø)
Impacted Files Coverage Δ
c/tskit/core.c 90.60% <ø> (-0.07%) ⬇️
python/tskit/trees.py 98.18% <ø> (ø)
c/tskit/trees.c 90.63% <96.55%> (-0.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68400d8...e8e3bc5. Read the comment docs.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @daniel-goldstein! Minor comments above.

@daniel-goldstein daniel-goldstein force-pushed the kc-internal-samples branch 2 times, most recently from 66c88b8 to 661aa03 Compare May 13, 2020 16:42
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @daniel-goldstein!

@benjeffery benjeffery force-pushed the kc-internal-samples branch from 661aa03 to e8e3bc5 Compare May 14, 2020 08:47
@mergify mergify bot merged commit bbb020f into tskit-dev:master May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable internal samples for KC distance

2 participants