
Numba energy permutation test #28

Closed · wants to merge 10 commits

Conversation

@multimeric (Contributor) commented Mar 1, 2021

Note: this builds on #27, and changes from that branch will appear here until that is merged.

This re-implements most of the energy distance functions and permutation tests using numba, which provides significant performance improvements. I have some benchmarks below, which compare numba to pure Python (note: this isn't comparing numba to the original code that used numpy tricks; it's comparing my changes with and without the JIT). The results suggest that numba improves performance for any number of permutations above 250. I expect the same to hold when a program runs several different permutation tests, since compilation only happens once.

[Benchmark plot: runtime of the numba and pure-Python implementations against number of permutations]

And with slightly higher limits:

[Same benchmark plot with larger permutation counts]
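
For context, here is a minimal sketch of the kind of loop-based, JIT-compiled permutation test being benchmarked. This is illustrative only (the real implementation is in the diff); the function names and the energy distance body are stand-ins:

```python
import numpy as np
from numba import njit


@njit(cache=True)
def _energy_distance(x, y):
    # 2*E|X - Y| - E|X - X'| - E|Y - Y'| for 1-d samples, written as the
    # nested loops that numba compiles well (no numpy broadcasting tricks).
    n = x.shape[0]
    m = y.shape[0]
    exy = 0.0
    exx = 0.0
    eyy = 0.0
    for i in range(n):
        for j in range(m):
            exy += abs(x[i] - y[j])
        for j in range(n):
            exx += abs(x[i] - x[j])
    for i in range(m):
        for j in range(m):
            eyy += abs(y[i] - y[j])
    return 2.0 * exy / (n * m) - exx / (n * n) - eyy / (m * m)


@njit(cache=True)
def energy_permutation_test(x, y, num_permutations):
    # One-sided permutation p-value for the energy distance.
    observed = _energy_distance(x, y)
    pooled = np.concatenate((x, y))
    n = x.shape[0]
    extreme = 0
    for _ in range(num_permutations):
        np.random.shuffle(pooled)  # in-place shuffle is supported by numba
        if _energy_distance(pooled[:n], pooled[n:]) >= observed:
            extreme += 1
    return (extreme + 1) / (num_permutations + 1)
```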

However, the costs are:

  • Some ugly numba workarounds, like re-implementing the permutation function using nested loops
  • We lose the ability to pass in arbitrary average functions; the `average` parameter is now a string, either `"mean"` or `"median"` (see the sketch after this list)
  • We lose the use of the `RandomState` object, and have to rely on `np.random.seed()` alone
  • Startup costs associated with the JIT compiler: for fewer than 250 permutations, compilation makes the whole task slower
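
The string-based dispatch mentioned above could look roughly like this. It's a sketch only (`_average` is an illustrative name, not the PR's), showing why the callable parameter was dropped: numba cannot JIT through an arbitrary Python function object passed as an argument:

```python
import numpy as np
from numba import njit


@njit
def _average(values, average):
    # numba supports string equality in nopython mode, so the reducer is
    # chosen by comparing against literals rather than by calling a
    # user-supplied function object.
    if average == "mean":
        return np.mean(values)
    elif average == "median":
        return np.median(values)
    raise ValueError("average must be 'mean' or 'median'")
```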

@multimeric (Contributor, Author) commented:
I would appreciate some help with these doctests: I can't work out how to run them individually, nor why they are changing from -8.0 to -7.999999999999999.
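
For reference, a sketch of two common workarounds (the module path below is hypothetical):

```python
# Running a single module's doctests with pytest (path is hypothetical):
#
#     pytest --doctest-modules dcor/_energy.py
#
# The -8.0 vs -7.999999999999999 change is ordinary floating-point round-off:
# the numba loops accumulate terms in a different order than the numpy
# reductions they replace. Rounding inside the doctest makes it order-robust:

def example_statistic():
    """
    >>> round(example_statistic(), 6)
    -8.0
    """
    return -7.999999999999999
```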

@vnmabus (Owner) commented Mar 1, 2021

I think we should test the performance against the Numpy version. Numpy is VERY fast, and the Numba rewrite loses some functionality and is harder to maintain, so I would want to see a significant improvement before accepting the rewrite (also, we may even want to keep BOTH versions, to preserve the old functionality).

@multimeric (Contributor, Author) commented:
Good point:

[Benchmark plot: numba vs. numpy runtime against number of permutations]

What's weird is that my downstream application definitely sped up using numba, but what I'm doing there is slightly different from (and slower than) a homogeneity test. Anyway, I'll just port my numba code over to that.

@multimeric closed this Mar 3, 2021

@vnmabus (Owner) commented Mar 3, 2021

Check that you are compiling with numba in nopython mode. Also, what happens with numpy at 0? That behaviour looks like numba JIT compilation rather than numpy.
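
For context, forcing nopython mode is a one-line change; a minimal sketch:

```python
from numba import jit, njit


@jit(nopython=True)  # fails loudly at compile time instead of silently
def f(x):            # falling back to slow object mode
    return x * x


@njit                # shorthand for @jit(nopython=True)
def g(x):
    return x * x
```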

@multimeric (Contributor, Author) commented:
I was using njit in all cases. The point here is that numba is fast; it's just that numpy is equally fast, without the compilation step. The first point must be an error somehow. Each timepoint was booted from scratch, so it can't have been a compilation step, or it would show up in all timepoints.

@vnmabus (Owner) commented Mar 3, 2021

Have you excluded the compilation step from the numba timings?

@multimeric (Contributor, Author) commented:
No, I haven't, but that's kind of the point, isn't it? We want to find the problem size at which the numba speedup (if it exists) wins against numpy despite the flagfall cost of compilation. And it seems like that never happens.

@vnmabus (Owner) commented Mar 3, 2021

No, because the compilation only happens once, while the function may be called multiple times.
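
In other words, a fair benchmark would time a warmed-up function. A minimal sketch of separating compile time from steady-state time (names illustrative):

```python
import time

import numpy as np
from numba import njit


@njit
def stat(x):
    # Trivial stand-in for the real statistic.
    total = 0.0
    for v in x:
        total += v * v
    return total


x = np.random.rand(10_000)

start = time.perf_counter()
stat(x)  # first call triggers JIT compilation
first_call = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):  # later calls reuse the compiled code
    stat(x)
per_call = (time.perf_counter() - start) / 100

print(f"first call (compile + run): {first_call:.4f}s, per call after: {per_call:.6f}s")
```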
