More flexible definition of a tree root #462
Codecov Report
@@            Coverage Diff             @@
##           master     #462     +/-   ##
==========================================
- Coverage   87.32%   87.28%   -0.05%
==========================================
  Files          21       21
  Lines       15423    15452      +29
  Branches     2998     3004       +6
==========================================
+ Hits        13468    13487      +19
- Misses        969      974       +5
- Partials      986      991       +5
It makes sense to be able to get roots without also getting the missing data. Here's my suggestion for the API. I think the way to deal with this is to add an argument, Why default to
Good call on the general missing data API @petrelharp, that's an excellent idea. I'll open an issue to track the idea once this PR has been sorted out.
I could use some input here please @petrelharp. As the new root tracking code requires sample counts, there doesn't seem to be much point in maintaining the old TSK_SAMPLE_COUNTS option for trees. So, I added a new flag TSK_NO_SAMPLE_COUNTS, which means that root tracking (and sample counts) is on by default; if TSK_NO_SAMPLE_COUNTS is set, then we don't count samples and also don't track roots. So, There's a problem with the Python API, though. Currently it's set to that when I doubt there are many (if any) people using
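To make the proposed flag semantics concrete, here is a minimal Python sketch assuming a hypothetical `TreeOptions` flag set; the bit value is made up for illustration and is not tskit's actual `TSK_NO_SAMPLE_COUNTS` constant. Because root tracking needs sample counts, a single opt-out flag switches both off together.

```python
from enum import IntFlag


class TreeOptions(IntFlag):
    # Hypothetical bit value for illustration; not tskit's real constant.
    NO_SAMPLE_COUNTS = 1 << 0


def tree_features(options=TreeOptions(0)):
    """Report which per-tree bookkeeping is enabled for a set of options."""
    # Sample counting is on by default. Root tracking depends on it,
    # so setting NO_SAMPLE_COUNTS disables both at once.
    counting = not (options & TreeOptions.NO_SAMPLE_COUNTS)
    return {"sample_counts": counting, "root_tracking": counting}


print(tree_features())                               # both True
print(tree_features(TreeOptions.NO_SAMPLE_COUNTS))   # both False
```

The design choice here is that there is no way to request roots without counts, which mirrors the dependency described in the comment above.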
I'm leaning towards (1), I think. The low-level tree options really only matter when you're doing stuff in C, where we want this kind of control. It makes no difference when working with the trees in Python.
I'm on board with (1) as well. It's already a bit confusing to figure out which options to turn on or off to get the info you want with trees, and this would make it simpler, which is good. (I assume we'd deprecate the
OK, good. Yes, we'd deprecate it (probably throw a warning, I guess). |
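On the Python side, where the low-level options make no difference, deprecating the old sample-counts option could look like the minimal sketch below. `resolve_tree_options` and its `sample_counts` keyword are hypothetical names used for illustration, not tskit's API. Passing the old keyword still works but emits a `DeprecationWarning`.

```python
import warnings


def resolve_tree_options(sample_counts=None):
    """Hypothetical shim: counts and roots are always tracked from Python."""
    if sample_counts is not None:
        # The old keyword is accepted for compatibility but has no effect.
        warnings.warn(
            "sample_counts is deprecated and ignored; "
            "sample counts are now always computed",
            DeprecationWarning,
            stacklevel=2,
        )
    return {"sample_counts": True, "root_tracking": True}
```

Calling `resolve_tree_options(sample_counts=False)` still returns both features enabled, so existing code keeps working while users are nudged to drop the argument.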
After deprecating SAMPLE_COUNTS.
OK, merging.
@jeromekelleher is there a way in python to set a default
I don't think we want to set this as a property of the ts; it seems like a tricky bit of state to be adding that could easily catch people out. I didn't add a
OK, thanks. Now that I think about it, you are 100% right about adding state to the TS. But an addition to the
The roots of a tree are currently defined as the set of unique path ends starting from samples. This is unhelpful when there is a lot of missing data and we want the "real" roots, which actually subtend some topology.
This PR makes the concept of a root more flexible by changing the definition to "the set of unique path ends starting from samples that subtend at least k samples". Setting k = 1 recovers the current definition.
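A minimal Python sketch of the proposed definition, assuming a tree stored as a plain parent array where -1 means "no parent". The functions `sample_counts` and `roots` and the parameter `k` are hypothetical illustrations, not tskit's API. Sample 3 has no parent, standing in for a sample with missing data.

```python
def sample_counts(parent, samples):
    """Count the samples below each node (a sample counts towards itself)."""
    counts = [0] * len(parent)
    for s in samples:
        u = s
        # Walk up to the path end, crediting every node on the way.
        while u != -1:
            counts[u] += 1
            u = parent[u]
    return counts


def roots(parent, samples, k=1):
    """Unique path ends reached from samples that subtend at least k samples."""
    counts = sample_counts(parent, samples)
    ends = set()
    for s in samples:
        u = s
        while parent[u] != -1:
            u = parent[u]
        ends.add(u)
    return sorted(u for u in ends if counts[u] >= k)


# Toy tree: node 4 = parent of (0, 1), node 5 = parent of (4, 2);
# sample 3 is isolated.
parent = [4, 4, 5, -1, 5, -1]
samples = [0, 1, 2, 3]
print(roots(parent, samples))        # [3, 5]: the current definition (k = 1)
print(roots(parent, samples, k=2))   # [5]: the isolated sample is dropped
```

This recomputes everything from scratch for clarity. The discussion notes that the real root-tracking code relies on sample counts, which is why the sketch computes them first and filters path ends by count.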
Still a WIP, I'll document more when the plumbing is in place.