This issue was moved to a discussion.

[Question] Parallelizing over a list of PyArrays with rayon #362

Closed
jatentaki opened this issue Dec 22, 2022 · 0 comments

Comments

@jatentaki

I have a simple extension which operates on a single PyReadonlyArray1<'_, f64> and now I want to let it take a list of such arrays and apply the same function to all of them, in parallel. The serial version of a simplified example (I'm not really implementing cumsum) is below:

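// Imports the snippets in this post rely on (rayon is only needed for the parallel variant).
use numpy::ndarray::{Array1, ArrayView1};
use numpy::{IntoPyArray, PyArray1, PyReadonlyArray1};
use pyo3::prelude::*;
use rayon::prelude::*;

#[pymodule]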
fn repro(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    fn cumsum(array: ArrayView1<'_, f64>) -> Array1<f64> {
        let mut total = 0.;
        Array1::from_iter(
            array
                .as_slice()
                .expect("input not contiguous")
                .iter()
                .map(|v| {
                    total += v;
                    total
                }) 
        )
    }

    #[pyfn(m)]
    #[pyo3(name = "cumsum_single")]
    fn cumsum_single_py<'py>(
        py: Python<'py>,
        array: PyReadonlyArray1<'_, f64>,
    ) -> &'py PyArray1<f64> {
        cumsum(array.as_array()).into_pyarray(py)
    }

    #[pyfn(m)]
    #[pyo3(name = "cumsum_many_sequential")]
    fn cumsum_many_sequential_py<'py>(
        py: Python<'py>,
        arrays: Vec<PyReadonlyArray1<'_, f64>>,
    ) -> Vec<&'py PyArray1<f64>> {
        arrays.into_iter().map(|arr| cumsum_single_py(py, arr)).collect()
    }

    Ok(())
}

The problem appears when I try to use rayon and replace into_iter with into_par_iter. The compiler complains as follows:

error[E0599]: the method `into_par_iter` exists for struct `Vec<PyReadonlyArray<'_, f64, Dim<[usize; 1]>>>`, but its trait bounds were not satisfied
  --> src/lib.rs:46:16
   |
46 |         arrays.into_par_iter().map(|arr| cumsum_single_py(py, arr)).collect()
   |                ^^^^^^^^^^^^^ method cannot be called on `Vec<PyReadonlyArray<'_, f64, Dim<[usize; 1]>>>` due to unsatisfied trait bounds
   |
   = note: the following trait bounds were not satisfied:
           `[PyReadonlyArray<'_, f64, Dim<[usize; 1]>>]: Sized`
           which is required by `[PyReadonlyArray<'_, f64, Dim<[usize; 1]>>]: rayon::iter::IntoParallelIterator`
           `[PyReadonlyArray<'_, f64, Dim<[usize; 1]>>]: rayon::iter::ParallelIterator`
           which is required by `[PyReadonlyArray<'_, f64, Dim<[usize; 1]>>]: rayon::iter::IntoParallelIterator`

The constraints look a bit odd on the rayon side (to the best of my understanding, [T]: !Sized regardless of T), but perhaps there's a workaround? What I arrived at is below:
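
One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis: rayon implements `IntoParallelIterator` for `Vec<T>` only when `T: Send`, and `PyReadonlyArray` is bound to the GIL and so is not `Send`. Once the `Vec` impl is ruled out, the compiler also tries the dereferenced slice type `[T]`, which would explain the odd `Sized` note. The dependency-free sketch below mirrors that bound; `GilBound` and `require_send` are hypothetical stand-ins, not part of pyo3, numpy, or rayon.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a GIL-bound handle such as `PyReadonlyArray`.
/// The raw-pointer marker makes the type `!Send`.
#[allow(dead_code)]
struct GilBound<'a> {
    data: &'a [f64],
    _not_send: PhantomData<*const ()>,
}

/// Compiles only for element types that may cross threads; this mirrors the
/// `T: Send` bound on rayon's `IntoParallelIterator` impl for `Vec<T>`.
fn require_send<T: Send>(items: Vec<T>) -> usize {
    items.len()
}

fn main() {
    let a = [1.0, 2.0];
    let b = [3.0];

    // Plain borrowed views of `f64` data are `Send`, so this is accepted.
    let views: Vec<&[f64]> = vec![&a[..], &b[..]];
    println!("{}", require_send(views));

    // The GIL-bound stand-in can be stored in a Vec, but not sent.
    let _handles = vec![GilBound { data: &a[..], _not_send: PhantomData }];
    // require_send(_handles); // rejected: `*const ()` cannot be sent between threads safely
}
```

This is consistent with the observation that array views can be handed to rayon while the `PyReadonlyArray` wrappers cannot.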

    #[pyfn(m)]
    #[pyo3(name = "cumsum_many_rayon")]
    fn cumsum_many_rayon_py<'py>(
        py: Python<'py>,
        arrays: Vec<PyReadonlyArray1<'_, f64>>,
    ) -> Vec<&'py PyArray1<f64>> {
        let arrays: Vec<_> = arrays
            .iter()
            .map(|pa| pa.as_array())
            .collect();
        // first collect: for some reason cannot send PyReadonlyArray<_, _>,
        // with ArrayBase<ViewRepr<_>, _> it works. But they hold references
        // in a way that forces me to materialize a vector, instead of using
        // par_bridge() directly

        let results: Vec<_> = arrays
            .into_par_iter()
            .map(cumsum)
            .collect();
        // second collect: need to turn the parallel iterator back to sequential
        // for into_pyarray
        
        results
            .into_iter()
            .map(|result| result.into_pyarray(py))
            .collect()
        // third collect: to create the actual returned Python list
    }

This solution uses three separate collect calls, and it is unclear to me how much overhead they add: are those structs all just holding references, or am I copying big arrays of input data? Is there a better way to achieve my goal?
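
On the overhead question, a sketch under stated assumptions: an `ArrayView1` is a pointer plus shape and stride, so collecting views builds a vector of small structs and does not copy the input data, while the second collect only gathers the freshly allocated outputs, which have to exist anyway. The dependency-free example below mimics the same three stages, with `&[f64]` slices standing in for array views and `std::thread::scope` standing in for rayon's thread pool. The helper `cumsum_many` and the one-thread-per-array strategy are illustrative choices, not the pyo3 or rayon API.

```rust
use std::thread;

/// Running sum over one borrowed slice; only the output is allocated.
fn cumsum(view: &[f64]) -> Vec<f64> {
    let mut total = 0.0;
    view.iter()
        .map(|v| {
            total += v;
            total
        })
        .collect()
}

/// Hypothetical helper: apply `cumsum` to every input on its own thread.
fn cumsum_many(arrays: &[Vec<f64>]) -> Vec<Vec<f64>> {
    // Stage 1: collect cheap borrowed views (pointer + length, no data copy).
    let views: Vec<&[f64]> = arrays.iter().map(|a| a.as_slice()).collect();

    // Stage 2: compute in parallel; scoped threads may borrow the views.
    thread::scope(|s| {
        let handles: Vec<_> = views
            .iter()
            .map(|&view| s.spawn(move || cumsum(view)))
            .collect();
        // Stage 3: join in input order to get the results back sequentially.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let inputs = vec![vec![1.0, 2.0, 3.0], vec![0.5, 0.5]];
    println!("{:?}", cumsum_many(&inputs));
}
```

In the extension itself, the final conversion back to Python objects still has to happen sequentially on the thread holding the GIL, which matches the third collect in the workaround. Wrapping the parallel stage in `Python::allow_threads` may also be worth trying, though whether it helps here is untested.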

@PyO3 PyO3 locked and limited conversation to collaborators Dec 22, 2022
@adamreichold adamreichold converted this issue into discussion #363 Dec 22, 2022
