Pass a numpy array of booleans to python simplify? #2968
It's a bunch of work to do, and is it really that different? It seems to me that you'd have more to explain, with a boolean node selector as well as the list-of-nodes option.
It's only a 5-line addition at the top of `simplify()`:

```python
try:
    if len(samples) == ts.num_nodes and samples.dtype == bool:
        samples = np.where(samples)[0]
except (AttributeError, TypeError):
    # AttributeError: samples has no .dtype (e.g. a plain list);
    # TypeError: samples is None (the default), so len() fails.
    pass
```

You're right that it's not that different, and it's not really a priority, but I'm finding that every extra barrier to new tskit users puts some of them off (and takes an extra few minutes to explain). There's a reason why numpy allows both boolean and numerical indexing. I'm not sure I would actually explain in a practical that you can use both: it's obvious from the context, right?
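The proposed special case can be sketched as a standalone function, outside tskit. `samples_to_ids` is a hypothetical helper, not part of the tskit API, and it takes the node count as a plain integer in place of `ts.num_nodes`. It also has to let `None` through unchanged, since `simplify()` accepts `samples=None`.

```python
import numpy as np


def samples_to_ids(samples, num_nodes):
    """Hypothetical helper (not part of tskit): turn a boolean node mask into node IDs.

    A NumPy boolean array with exactly one entry per node is converted to the
    indices of its True entries. Anything else, including None and plain
    Python lists, is returned unchanged and so keeps its current meaning.
    """
    if (
        isinstance(samples, np.ndarray)
        and samples.dtype == np.bool_
        and samples.shape == (num_nodes,)
    ):
        # Same conversion as the proposal: positions of the True entries.
        return np.where(samples)[0].astype(np.int32)
    return samples


mask = np.array([True, False, True, False, False])
print(samples_to_ids(mask, num_nodes=5))    # [0 2]
print(samples_to_ids([3, 4], num_nodes=5))  # [3, 4]
print(samples_to_ids(None, num_nodes=5))    # None
```

Testing `isinstance(samples, np.ndarray)` up front, instead of catching `AttributeError`, means `None` and lists never reach `len()` or `.dtype`. The trade-off is that a list of Python booleans is still read as node IDs.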
As usual with these things, implementation is by far the easiest part, something like 5% of the actual effort. Testing, documenting and making sure there are no regressions against existing code make up the majority of the work.
I agree. Higher on my list would be adding an
I guess part of the problem is nothing to do with
but that's hardly more intuitive, IMO. Or alternatively
Which is equally cryptic, but at least doesn't need you to use the

Anyway, if no-one thinks it's a good idea, I'll close this. However, it's worth pointing out that I'm finding it hard to introduce
Another option is
FWIW, I can't think of a single function in R that overloads arguments like that, i.e., takes either a vector of indices or a vector of booleans. Indexing works like that, but only indexing. Overloading can lead to nasty corner cases.
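Here is an illustration of the kind of corner case this overloading could create. It uses a hypothetical check that mirrors the 5-line proposal (`looks_like_mask` is not a tskit function). The same boolean values get different treatment depending on the container they come in and on their length.

```python
import numpy as np


def looks_like_mask(samples, num_nodes):
    """Hypothetical mirror of the proposed check: length num_nodes and dtype bool."""
    try:
        return len(samples) == num_nodes and samples.dtype == bool
    except AttributeError:
        # Plain Python lists have no .dtype attribute, so the check is skipped.
        return False


num_nodes = 4
as_array = np.array([True, False, True, True])
as_list = [True, False, True, True]
short_array = np.array([True, False, True])

print(looks_like_mask(as_array, num_nodes))     # True: would become IDs [0, 2, 3]
print(looks_like_mask(as_list, num_nodes))      # False: would be read as IDs 1, 0, 1, 1
print(looks_like_mask(short_array, num_nodes))  # False: wrong length, also read as IDs
```

Only the first input gets the mask interpretation. The other two still fall through to the "list of node IDs" path, where the booleans act as 0s and 1s.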
For teaching purposes, it's nice to be able to do
rather than
What do people think of the idea of special-casing the `samples` argument to `simplify()` to check if it is of length `num_nodes` and of `dtype=bool`, and then simply doing `np.where(samples)[0]`? At the moment this fails with a `Duplicate sample value` error, as it treats the 0s and 1s as sample IDs.
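You can see both the failure mode and the current workaround with NumPy alone. Reading a boolean mask as integers yields only the values 0 and 1, repeated, which fits the duplicate-sample error described above. `np.where(mask)[0]` (or equivalently `np.flatnonzero(mask)`) recovers the IDs of the nodes that are actually selected. The node times below are made-up example data, not taken from a real tree sequence.

```python
import numpy as np

# Made-up node times for a toy tree sequence with 6 nodes.
node_time = np.array([0.0, 0.0, 0.0, 1.5, 2.0, 3.1])

# Boolean mask selecting the nodes at time 0, one entry per node.
mask = node_time == 0

# Reading the mask as node IDs gives only 0s and 1s, with repeats.
as_ids = mask.astype(np.int32)
print(as_ids)                                # [1 1 1 0 0 0]
print(len(np.unique(as_ids)) < len(as_ids))  # True: duplicate "IDs"

# The conversion the issue proposes doing automatically.
sample_ids = np.where(mask)[0]
print(sample_ids)                            # [0 1 2]
assert np.array_equal(sample_ids, np.flatnonzero(mask))
```

With a real tree sequence, the workaround today is to pass `np.where(mask)[0]` rather than the mask itself as the `samples` argument to `simplify()`.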