Bulk statistics very slow for non-contiguous array data #419

Open
w-k-jones opened this issue Mar 18, 2024 · 0 comments
Labels
enhancement Addition of new features, or improved functionality of existing features

Comments

@w-k-jones
Member

I've recently noticed that bulk statistics can run very slowly when applied to non-contiguous data. This can happen when slicing dask arrays or broadcasting along the trailing dimension. Calling ravel on these arrays is ~20x slower than on contiguous arrays, and because we do this for each feature, it adds up to a big slowdown. I might look into smarter ways of doing this in future.
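As a rough illustration of the kind of arrays involved, here is a minimal NumPy-only sketch (not the project's code, and plain NumPy arrays rather than dask). The array names and shapes are made up for the example. It shows that broadcasting along a trailing dimension or slicing with a step produces a non-contiguous view, and that ravel then has to copy the data instead of returning a view.

```python
import numpy as np

# Hypothetical example data: a small contiguous 2D field.
base = np.arange(12.0).reshape(3, 4)

# Broadcasting along a new trailing dimension gives a view with a zero stride.
broadcast = np.broadcast_to(base[..., np.newaxis], (3, 4, 5))

# Slicing with a step gives a strided view of the original buffer.
strided = base[:, ::2]

for name, arr in [("base", base), ("broadcast", broadcast), ("strided", strided)]:
    print(name, "C-contiguous:", arr.flags["C_CONTIGUOUS"])

# ravel returns a view for contiguous input, but must allocate and copy
# when the input is not contiguous.
flat_base = base.ravel()
flat_broadcast = broadcast.ravel()
print("ravel(base) shares memory:", np.shares_memory(flat_base, base))
print("ravel(broadcast) shares memory:", np.shares_memory(flat_broadcast, broadcast))
```

Timing the two ravel calls (for example with timeit) on realistically sized arrays would be the way to check the slowdown on a given machine; the sketch above only shows where the extra copy comes from.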

Using np.split might be a fast approach, as shown in https://stackoverflow.com/a/43094244
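One possible shape for an np.split-based approach is sketched below. It is an assumption about how this could work, not a tested implementation, and it may differ in detail from the linked answer. The helper name `group_values_by_label` and the label and value arrays are hypothetical. The idea is to ravel the label and data fields once, sort by label, and split the sorted values at the label boundaries, so that ravel is not called once per feature.

```python
import numpy as np


def group_values_by_label(labels, values):
    """Hypothetical helper: group values by integer label with one sort."""
    # Flatten each field once, instead of once per feature.
    flat_labels = np.ravel(labels)
    flat_values = np.ravel(values)

    # A stable sort keeps the original order of values within each label.
    order = np.argsort(flat_labels, kind="stable")
    sorted_labels = flat_labels[order]
    sorted_values = flat_values[order]

    # Index of the first occurrence of each label in the sorted array.
    unique_labels, first_index = np.unique(sorted_labels, return_index=True)

    # Split at every label boundary except the leading 0.
    groups = np.split(sorted_values, first_index[1:])
    return dict(zip(unique_labels.tolist(), groups))


# Made-up example: label 0 is treated as background here.
labels = np.array([[0, 1, 1], [2, 2, 0]])
values = np.array([[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]])

groups = group_values_by_label(labels, values)
means = {label: float(group.mean()) for label, group in groups.items() if label != 0}
print(means)
```

This trades the per-feature ravel for a single sort over the whole field, which costs O(n log n) once. Whether that is faster in practice would need benchmarking on real data.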

w-k-jones added the enhancement label Mar 18, 2024