-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add partition(similar to numpy.partition) #1498
Add partition(similar to numpy.partition) #1498
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this function should be in ndarray
or in another crate. I don't have the answer so I did a normal review :)
@nilgoyette here's that question again about what belongs in the library - it's pesky! It keeps coming! I think I'm ok with this? I suppose its home could be in a sort of On the other hand, devil's advocate: this function doesn't have an issue from a user, but rather is driven by parity with NumPy. If we decide to keep accepting NumPy functions, our method-discoverability problem is gonna get worse, right? @NewBornRustacean you're welcome to join the discussion on this. In the meantime: I believe that |
@akern40 @nilgoyette what's up! thanks to kind review and suggestions! about I largely agree with you akern40 ! If there were a dedicated crate like Before submitting this PR, I reviewed the maintainer table discussions and past issues, and it seems like this "what belongs in the library" question is closely tied to the future direction of Thinking about why users rely on NumPy, I’d say (at least in my case) the two main reasons are:
For I don’t think |
Wait, I didn't even realize there was a
|
Yeah, I'm facing the issue. When I call the function with Axis(1), for example, each of the lanes could be non contigous(I think this is because of the stride, but I'm not sure). Or we can implement our own select function so don't rely on select_nth_unstable() that require contiguous memory. |
@nilgoyette yes you're right!! I am being silly - you can definitely still be non-contiguous depending on the axis. Sorry everyone. |
Any contiguous array (2D or more) will be non-contiguous on at least one axis. And I can't say which axis because it can use 'c' or 'f' order (please read about those terms if you never heard them before). Imagine a matrix (2D array)
It is contiguous if you read it in its natural order (1 2 3 4 5 6), but it's not if you read it in the other axis (1 4 2 5 3 6). Knowing this, it's a good idea to provide code that works in both cases (non-contiguous version of Btw, I honestly don't have the answer :) Allocating is slow, copying is not slow but we try to avoid it if we can. However, working on a big matrix on a non-contiguous array (can't fit in the cache) is slow too. You might discover that creating a temp array is faster only on huge matrices, but not in general. |
@nilgoyette Sorry for the late update! I was thinking about ways to avoid copying in the non-contiguous case. I was considering whether it might be possible by flattening the input array and then obtaining slices corresponding to each lane, but it didn’t work out as expected. (I referred to NumPy’s internal implementation, and it seems that NumPy also performs a copy in the non-contiguous case.) For now, I’ve only applied the changes you mentioned in the last review! stackoverflow: how can I find the actual implementation of np.partition? |
I decided to merge this PR, leaving the question "should we merge this in ndarray?" open. We will probably have to split ndarray into several smaller crates someday. In the meantime, I think we should avoid driving contributor away. |
1. Overview
This is about the
partition
function to handle both contiguous and non-contiguous memory more efficiently. The function now uses a simpler approach for handling non-contiguous memory, reducing overhead in the fallback path.2. Motivation
The
partition
function is an important operation that partially sorts an array around the k-th element along a specified axis. This is useful for:3. Note
quickselect
algorithm rather than to use existing one.4. We can go further
numpy.partition
, it is possible to pass sequence of ints as akth
parameter. It would be nice to have one in our case, maybe seperated function likepartition_multiple()
numpy.argpartition
. I think I could implement that also as a follow-up.Related issue