Profiling _bincount and attempted acceleration with numba #63

Open
TomNicholas opened this issue Jun 14, 2021 · 3 comments


TomNicholas commented Jun 14, 2021

I tried profiling the core _bincount function and then accelerating it with numba; a naive application of numba only made it about 30% faster overall. Notebook here (and gist here, in case the notebook cache hasn't updated yet).

However, I've never used numba before and haven't done much profiling in Python, so maybe there is something else I should have tried.

I also profiled to see whether the block_size argument has much effect.

Also, I noticed that #49 introduced a regression where the block_size argument is no longer passed down to _bincount, so it doesn't actually do anything. Fixed in #62.

@rabernat @gjoseph92 @jrbourbeau

@rabernat

Thanks for doing this Tom!

My impression is that most of the heavy lifting is done by np.digitize, which is already an accelerated function, so there is not much room for speedup with numba.

@dougiesquire

This is great @TomNicholas, thanks. I'm curious about your block_size results. Larger block sizes speed things up, which I guess makes sense for small arrays. But I'm curious if you see the same thing for larger arrays. I ask because there's a note in the np.histogram source code (https://github.com/numpy/numpy/blob/v1.20.0/numpy/lib/histograms.py#L824):

    # We iterate over blocks here for two reasons: the first is that for
    # large arrays, it is actually faster (for example for a 10^8 array it
    # is 2x as fast) and it results in a memory footprint 3x lower in the
    # limit of large arrays.
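
For reference, the blocked pattern that note describes looks roughly like this (a minimal sketch, not the actual np.histogram internals; blocked_histogram and its default block size are illustrative names):

    import numpy as np

    def blocked_histogram(a, edges, block_size=65536):
        # Accumulate per-block partial histograms so the temporaries
        # created by digitize/searchsorted stay small, at the cost of
        # a Python-level loop over blocks.
        a = np.ravel(a)
        counts = np.zeros(edges.shape[0] - 1, dtype=np.intp)
        for start in range(0, a.size, block_size):
            counts += np.histogram(a[start:start + block_size], edges)[0]
        return counts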

@gjoseph92

As Ryan said, digitize is just a thin wrapper around searchsorted, which is already implemented in C.
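
A quick illustration of that equivalence (the values here are arbitrary):

    import numpy as np

    # For monotonically increasing bins, digitize is the same as a
    # right-sided binary search over the edges.
    x = np.array([0.2, 1.5, 2.7])
    edges = np.array([0.0, 1.0, 2.0, 3.0])
    assert np.array_equal(np.digitize(x, edges),
                          np.searchsorted(edges, x, side="right"))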

However, I think there is a significant optimization we could lift from np.histogram: when bins have uniform widths (common), you can skip digitize entirely and calculate the bin that a value belongs to in constant time with some basic rescaling arithmetic (as opposed to binary-searching the bins list): https://github.com/numpy/numpy/blob/main/numpy/lib/histograms.py#L814-L816. This might require relaxing our "When using dask arrays, bins must be provided as numpy array(s) of edges" error message to also allow n_bins + range.
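
A rough sketch of that rescaling trick (uniform_bin_index is a hypothetical helper, not something in xhistogram or numpy, and it assumes every value lies within [first_edge, last_edge]):

    import numpy as np

    def uniform_bin_index(values, first_edge, last_edge, n_bins):
        # Constant-time bin assignment for evenly spaced edges:
        # rescale each value to a fractional bin number and truncate,
        # instead of binary-searching the edges.
        values = np.asarray(values, dtype=np.float64)
        idx = ((values - first_edge)
               * (n_bins / (last_edge - first_edge))).astype(np.intp)
        # As in np.histogram, the top edge is closed, so values landing
        # exactly on last_edge belong in the final bin.
        idx[idx == n_bins] = n_bins - 1
        return idx

    uniform_bin_index(np.array([0.05, 0.37, 0.99, 1.0]), 0.0, 1.0, 10)
    # -> array([0, 3, 9, 9])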

    there is not much room for speedup with numba

This is probably true. Instead, I think the main benefit we could get from Numba would be a more readable _bincount implementation. All the reshaping logic is a bit hard to follow; with Numba, I wonder if we could just call np.histogram in a for-loop and get similar performance? Also, Numba has its own histogram implementation, though it doesn't support weights. We could use that in some cases, use it as a reference for our own, or contribute Numba support for weights (maybe not that hard?).
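
A minimal sketch of that for-loop idea (bincount_rows is a hypothetical name, and this leans on Numba's np.histogram support, which per its docs covers only the first three numpy arguments and no weights):

    import numpy as np
    from numba import njit

    @njit
    def bincount_rows(data, edges):
        # One np.histogram call per row; Numba compiles the loop,
        # so the per-row Python overhead disappears.
        counts = np.zeros((data.shape[0], edges.shape[0] - 1),
                          dtype=np.int64)
        for i in range(data.shape[0]):
            hist, _ = np.histogram(data[i], edges)
            counts[i] = hist
        return counts

    data = np.random.rand(4, 1000)
    edges = np.linspace(0.0, 1.0, 11)
    counts = bincount_rows(data, edges)  # shape (4, 10)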
