Update KC metric to work on samples not leaves #610
Codecov Report
@@            Coverage Diff             @@
##           master     #610      +/-   ##
==========================================
- Coverage   87.31%   87.30%   -0.02%
==========================================
  Files          22       22
  Lines       16743    16733      -10
  Branches     3274     3274
==========================================
- Hits        14619    14608      -11
  Misses       1040     1040
- Partials     1084     1085       +1
jeromekelleher
left a comment
LGTM, thanks @daniel-goldstein! Minor comments above.
jeromekelleher
left a comment
Looks great, thanks @daniel-goldstein!
KC distance previously assumed that the samples of a tree are exactly its leaves. However, the metric should measure the similarity of trees based on the relationships between their samples. This changes the metric to consider samples exclusively, not leaves. As a result, internal samples no longer throw an exception, and subtrees without any samples in them do not contribute to the KC distance.
Focusing solely on samples also makes more sense because the set of leaves is not consistent across the trees in a tree sequence. Internal nodes that are roots in some trees but not in others change how many leaves each tree has. The sample set, however, is the same across the whole sequence.
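To illustrate the idea, here is a minimal pure-Python sketch of a sample-based KC distance. It is not the code in this PR: the tree representation (a `parent` dict plus a shared `time` dict) and the helper names `path_to_root`, `kc_vector` and `kc_distance` are assumptions made for the example, and it assumes each tree has a single root above all samples. The vector has one entry per pair of samples (depth of their MRCA below the root) and one entry per sample (its parent branch), so nodes that are not samples never appear in it.

```python
import itertools
import math


def path_to_root(parent, u):
    """Return the nodes from u up to the root, inclusive."""
    path = [u]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def kc_vector(parent, time, samples, lambda_=0.0):
    """Build a KC vector for one tree from the samples only.

    parent maps node -> parent (None for the root), time maps node -> age.
    Hypothetical helper for illustration; assumes a single root.
    """
    paths = {u: path_to_root(parent, u) for u in samples}
    vec = []
    for a, b in itertools.combinations(sorted(samples), 2):
        ancestors_of_a = set(paths[a])
        # First node on b's upward path that is also above (or equal to) a.
        mrca = next(v for v in paths[b] if v in ancestors_of_a)
        root = paths[a][-1]
        m = len(path_to_root(parent, mrca)) - 1  # edges from root to MRCA
        big_m = time[root] - time[mrca]          # time from root to MRCA
        vec.append((1 - lambda_) * m + lambda_ * big_m)
    for u in sorted(samples):
        p = parent[u]
        branch = time[p] - time[u] if p is not None else 0.0
        vec.append((1 - lambda_) * 1 + lambda_ * branch)
    return vec


def kc_distance(parent1, parent2, time, samples, lambda_=0.0):
    """Euclidean distance between the two trees' sample-based KC vectors."""
    v1 = kc_vector(parent1, time, samples, lambda_)
    v2 = kc_vector(parent2, time, samples, lambda_)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


times = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0, 5: 0.0}
samples = [0, 1, 2]
tree_a = {0: 3, 1: 3, 2: 4, 3: 4, 4: None}  # ((0,1),2)
tree_b = {1: 3, 2: 3, 0: 4, 3: 4, 4: None}  # (0,(1,2))

# Same as tree_a, plus an extra leaf (node 5) that is not a sample.
tree_a_extra = dict(tree_a)
tree_a_extra[5] = 4

d = kc_distance(tree_a, tree_b, times, samples)
d_extra = kc_distance(tree_a_extra, tree_b, times, samples)
print(d, d_extra)  # both equal sqrt(2)

# An internal node (3) can be a sample without raising an error.
v_internal = kc_vector(tree_a, times, [0, 1, 3])
```

In this sketch the unsampled leaf in `tree_a_extra` leaves the distance unchanged, and making internal node 3 a sample simply adds it to the pairs being compared.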
Resolves #575 #598