Profiling _bincount and attempted acceleration with numba #63

Open
TomNicholas opened this issue Jun 14, 2021 · 3 comments


TomNicholas commented Jun 14, 2021

I tried profiling the core _bincount function and then accelerating it with numba; a naive application of numba only made it about 30% faster overall. Notebook here (and gist here, in case the notebook cache hasn't updated yet).

However, I've never used numba before and haven't done much profiling in Python, so maybe there is something else I should have tried.

I also profiled to see whether the block_size argument has much effect.

Also, I noticed that #49 introduced a regression where the block_size argument is no longer passed down to _bincount, so it doesn't actually do anything. Fixed in #62.

@rabernat @gjoseph92 @jrbourbeau

@rabernat

Thanks for doing this Tom!

My impression is that most of the heavy lifting is done by np.digitize, which is already an accelerated function, so there is not much room for speedup with numba.

@dougiesquire

This is great @TomNicholas, thanks. I'm curious about your block_size results. Larger block sizes speed things up, which I guess makes sense for small arrays. But I'm curious if you see the same thing for larger arrays. I ask because there's a note in the np.histogram source code (https://github.com/numpy/numpy/blob/v1.20.0/numpy/lib/histograms.py#L824):

    # We iterate over blocks here for two reasons: the first is that for
    # large arrays, it is actually faster (for example for a 10^8 array it
    # is 2x as fast) and it results in a memory footprint 3x lower in the
    # limit of large arrays.
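
For reference, the blocked pattern that note describes looks roughly like this (a minimal sketch, not the actual np.histogram internals; blocked_histogram and its default block size are illustrative names):

    import numpy as np

    def blocked_histogram(a, edges, block_size=65536):
        # Accumulate per-block partial histograms so the temporaries
        # created by digitize/searchsorted stay small, at the cost of
        # a Python-level loop over blocks.
        a = np.ravel(a)
        counts = np.zeros(edges.shape[0] - 1, dtype=np.intp)
        for start in range(0, a.size, block_size):
            counts += np.histogram(a[start:start + block_size], edges)[0]
        return counts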

@gjoseph92

As Ryan said, digitize is just a thin wrapper around searchsorted, which is already implemented in C.
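
A quick illustration of that equivalence (the values here are arbitrary):

    import numpy as np

    # For monotonically increasing bins, digitize is the same as a
    # right-sided binary search over the edges.
    x = np.array([0.2, 1.5, 2.7])
    edges = np.array([0.0, 1.0, 2.0, 3.0])
    assert np.array_equal(np.digitize(x, edges),
                          np.searchsorted(edges, x, side="right"))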

However, I think there is a significant optimization we could lift from np.histogram: when bins have uniform widths (common), you can skip digitize entirely and calculate the bin that a value belongs to in constant time with some basic rescaling arithmetic (as opposed to binary-searching the bins list): https://github.com/numpy/numpy/blob/main/numpy/lib/histograms.py#L814-L816. This might require relaxing our "When using dask arrays, bins must be provided as numpy array(s) of edges" error message to also allow n_bins + range.
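
A rough sketch of that rescaling trick (uniform_bin_index is a hypothetical helper, not something in xhistogram or numpy, and it assumes every value lies within [first_edge, last_edge]):

    import numpy as np

    def uniform_bin_index(values, first_edge, last_edge, n_bins):
        # Constant-time bin assignment for evenly spaced edges:
        # rescale each value to a fractional bin number and truncate,
        # instead of binary-searching the edges.
        values = np.asarray(values, dtype=np.float64)
        idx = ((values - first_edge)
               * (n_bins / (last_edge - first_edge))).astype(np.intp)
        # As in np.histogram, the top edge is closed, so values landing
        # exactly on last_edge belong in the final bin.
        idx[idx == n_bins] = n_bins - 1
        return idx

    uniform_bin_index(np.array([0.05, 0.37, 0.99, 1.0]), 0.0, 1.0, 10)
    # -> array([0, 3, 9, 9])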

    there is not much room for speedup with numba

This is probably true. Instead, I think the main benefit we could get from Numba would be a more readable _bincount implementation. All the reshaping logic is a bit hard to follow; with Numba, I wonder if we could just call np.histogram in a for-loop and get similar performance? Also, Numba has its own histogram implementation, though it doesn't support weights. We could use that in some cases, use it as a reference for our own, or contribute Numba support for weights (maybe not that hard?).
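
A minimal sketch of that for-loop idea (bincount_rows is a hypothetical name, and this leans on Numba's np.histogram support, which per its docs covers only the first three numpy arguments and no weights):

    import numpy as np
    from numba import njit

    @njit
    def bincount_rows(data, edges):
        # One np.histogram call per row; Numba compiles the loop,
        # so the per-row Python overhead disappears.
        counts = np.zeros((data.shape[0], edges.shape[0] - 1),
                          dtype=np.int64)
        for i in range(data.shape[0]):
            hist, _ = np.histogram(data[i], edges)
            counts[i] = hist
        return counts

    data = np.random.rand(4, 1000)
    edges = np.linspace(0.0, 1.0, 11)
    counts = bincount_rows(data, edges)  # shape (4, 10)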
