Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partition(similar to numpy.partition) #1498

Conversation

NewBornRustacean
Copy link
Contributor

@NewBornRustacean NewBornRustacean commented Mar 27, 2025

1. Overview

This is about the partition function to handle both contiguous and non-contiguous memory more efficiently. The function now uses a simpler approach for handling non-contiguous memory, reducing overhead in the fallback path.

2. Motivation

The partition function is an important operation that partially sorts an array around the k-th element along a specified axis. This is useful for:

  • Efficiently finding the k-th smallest/largest element without fully sorting
  • Dividing data into smaller-than-k and larger-than-k groups (e.g., for quick select algorithms)
  • Implementing median-finding algorithms

3. Note

  • The implementation relies on select_nth_unstable
    • It might be possible to write our own functions to implement quickselect algorithm rather than to use existing one.
    • let me know if that's the case😃
  • The current implementation of partition() creates a temporary array for each lane when memory is non-contiguous, which can be inefficient.

4. We can go further

  • in numpy.partition, it is possible to pass sequence of ints as a kth parameter. It would be nice to have one in our case, maybe seperated function like partition_multiple()
  • there is also interesting and useful function numpy.argpartition. I think I could implement that also as a follow-up.

Related issue

@NewBornRustacean NewBornRustacean marked this pull request as ready for review March 28, 2025 02:31
Copy link
Collaborator

@nilgoyette nilgoyette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this function should be in ndarray or in another crate. I don't have the answer so I did a normal review :)

@akern40
Copy link
Collaborator

akern40 commented Mar 30, 2025

@nilgoyette here's that question again about what belongs in the library - it's pesky! It keeps coming! I think I'm ok with this? I suppose its home could be in a sort of ndarray-sort crate? Certainly its implementation is complex enough that we shouldn't expect users to write it themselves. And we could always include it in some sort of future, large "crate split".

On the other hand, devil's advocate: this function doesn't have an issue from a user, but rather is driven by parity with NumPy. If we decide to keep accepting NumPy functions, our method-discoverability problem is gonna get worse, right?

@NewBornRustacean you're welcome to join the discussion on this. In the meantime: I believe that to_owned will always return a fully-contiguous array (in fact, I believe that Array must always be contiguous?). If that's true, then the non-contiguous part of your code will never be called. You may be able to get rid of it!

@NewBornRustacean
Copy link
Contributor Author

@nilgoyette here's that question again about what belongs in the library - it's pesky! It keeps coming! I think I'm ok with this? I suppose its home could be in a sort of ndarray-sort crate? Certainly its implementation is complex enough that we shouldn't expect users to write it themselves. And we could always include it in some sort of future, large "crate split".

On the other hand, devil's advocate: this function doesn't have an issue from a user, but rather is driven by parity with NumPy. If we decide to keep accepting NumPy functions, our method-discoverability problem is gonna get worse, right?

@NewBornRustacean you're welcome to join the discussion on this. In the meantime: I believe that to_owned will always return a fully-contiguous array (in fact, I believe that Array must always be contiguous?). If that's true, then the non-contiguous part of your code will never be called. You may be able to get rid of it!

@akern40 @nilgoyette what's up! thanks to kind review and suggestions!

about contiguous part : working on it! removing part to handle non-contiguous. thanks!

I largely agree with you akern40 ! If there were a dedicated crate like ndarray-sort, that would be ideal. Expanding functionality gradually and then splitting it into a separate crate at some point sounds like a good approach.

Before submitting this PR, I reviewed the maintainer table discussions and past issues, and it seems like this "what belongs in the library" question is closely tied to the future direction of ndarray development.

Thinking about why users rely on NumPy, I’d say (at least in my case) the two main reasons are:

  1. Rapid prototyping
  2. Data processing for data analysis

For ndarray users, a major motivation seems to be removing Python from the ML ecosystem. But in that case, most people probably wouldn't start developing directly with ndarray from scratch. Instead, they’re likely trying to migrate something that already exists in a Python-based environment (at least, that was my experience—I'm curious if others feel the same!). If that’s the case, wouldn’t these users be particularly interested in reason 1: "rapid prototyping"?

I don’t think ndarray needs to be fully functionally equivalent to NumPy! However, in a perspective of "rapid prototyping", I do believe that supporting at least some of NumPy’s functionality is essential for attracting those users who are trying to remove Python from their ML workflows.

@nilgoyette
Copy link
Collaborator

Wait, I didn't even realize there was a to_owned just before!

  • Why do we need it? The comment above doesn't explain. In fact, that comment is not related to this line.
  • Even if the array is owned, it will be contiguous only for one axis, no? If you iterate on the "wrong" axis, won't you still need to non-contiguous code?

@NewBornRustacean
Copy link
Contributor Author

Wait, I didn't even realize there was a to_owned just before!

  • Why do we need it? The comment above doesn't explain. In fact, that comment is not related to this line.
  • Even if the array is owned, it will be contiguous only for one axis, no? If you iterate on the "wrong" axis, won't you still need to non-contiguous code?

Yeah, I'm facing the issue. When I call the function with Axis(1), for example, each of the lanes could be non contigous(I think this is because of the stride, but I'm not sure).

Or we can implement our own select function so don't rely on select_nth_unstable() that require contiguous memory.

@akern40
Copy link
Collaborator

akern40 commented Mar 30, 2025

@nilgoyette yes you're right!! I am being silly - you can definitely still be non-contiguous depending on the axis. Sorry everyone.

@nilgoyette
Copy link
Collaborator

When I call the function with Axis(1), for example, each of the lanes could be non contigous(I think this is because of the stride, but I'm not sure).

Any contiguous array (2D or more) will be non-contiguous on at least one axis. And I can't say which axis because it can use 'c' or 'f' order (please read about those terms if you never heard them before). Imagine a matrix (2D array)

 -->
1 2 3
4 5 6

It is contiguous if you read it in its natural order (1 2 3 4 5 6), but it's not if you read it in the other axis (1 4 2 5 3 6).

Knowing this, it's a good idea to provide code that works in both cases (non-contiguous version of select_nth_unstable), or to check and optimize, like you did. Of course, you're free to choose what you want to code (we're doing open source after all). Both versions work. If you're really motivated, you can do both and create a small benchmark to know which one is the best/fastest.

Btw, I honestly don't have the answer :) Allocating is slow, copying is not slow but we try to avoid it if we can. However, working on a big matrix on a non-contiguous array (can't fit in the cache) is slow too. You might discover that creating a temp array is faster only on huge matrices, but not in general.

@NewBornRustacean
Copy link
Contributor Author

@nilgoyette Sorry for the late update! I was thinking about ways to avoid copying in the non-contiguous case. I was considering whether it might be possible by flattening the input array and then obtaining slices corresponding to each lane, but it didn’t work out as expected.

(I referred to NumPy’s internal implementation, and it seems that NumPy also performs a copy in the non-contiguous case.)

For now, I’ve only applied the changes you mentioned in the last review!

stackoverflow: how can I find the actual implementation of np.partition?

@nilgoyette nilgoyette merged commit 2324d2a into rust-ndarray:master Apr 6, 2025
15 checks passed
@nilgoyette
Copy link
Collaborator

I decided to merge this PR, leaving the question "should we merge this in ndarray?" open. We will probably have to split ndarray into several smaller crates someday. In the meantime, I think we should avoid driving contributor away.

@NewBornRustacean NewBornRustacean deleted the feature-add-numpy-like-partition branch April 7, 2025 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants