TRACKER: Numba engine performance with rolling operations #52

Open
mroeschke opened this issue Nov 12, 2021 · 5 comments

Comments

@mroeschke
Collaborator

mroeschke commented Nov 12, 2021

As of this pandas version:

In [1]: pd.__version__
Out[1]: '1.4.0.dev0+1085.g01b86edbbb'

From this ASV benchmark:

import numpy as np
import pandas as pd


class NumbaVSCython:
    # ASV runs every combination of these parameters
    params = (
        ["sum", "max", "min", "median", "mean"],
        # each entry is an (engine, engine_kwargs) pair
        [
            ("cython", None),
            ("numba", {"parallel": True}),
            ("numba", {"parallel": False}),
        ],
        [1, 100],
    )

    param_names = ["method", "engine_kwargs", "cols"]

    def setup(self, method, engine_kwargs, cols):
        self.engine, self.engine_kwargs = engine_kwargs
        self.roll = pd.DataFrame(np.random.randn(10_000, cols)).rolling(100)
        # Call the method once so numba JIT compilation happens here, not in the timed run
        getattr(self.roll, method)(engine=self.engine, engine_kwargs=self.engine_kwargs)

    def time_method(self, method, engine_kwargs, cols):
        getattr(self.roll, method)(engine=self.engine, engine_kwargs=self.engine_kwargs)

mean and sum have sliding-window algorithms implemented. min, max, and median call the corresponding NaN-aware NumPy function (np.nanmin, np.nanmax, np.nanmedian) on each window.

[ 50.00%] ··· ======== ================================ ========== ==========
              --                                                 cols
              ----------------------------------------- ---------------------
               method           engine_kwargs               1         100
              ======== ================================ ========== ==========
                sum            ('cython', None)          297±0μs    27.5±0ms
                sum     ('numba', {'parallel': True})    436±0μs    16.9±0ms
                sum     ('numba', {'parallel': False})   318±0μs    19.3±0ms
                max            ('cython', None)          439±0μs    46.0±0ms
                max     ('numba', {'parallel': True})    818±0μs    86.1±0ms
                max     ('numba', {'parallel': False})   3.15±0ms   328±0ms
                min            ('cython', None)          454±0μs    48.8±0ms
                min     ('numba', {'parallel': True})    801±0μs    93.5±0ms
                min     ('numba', {'parallel': False})   3.23±0ms   332±0ms
               median          ('cython', None)          6.36±0ms   643±0ms
               median   ('numba', {'parallel': True})    5.47±0ms   567±0ms
               median   ('numba', {'parallel': False})   15.5±0ms   1.55±0s
                mean           ('cython', None)          299±0μs    28.3±0ms
                mean    ('numba', {'parallel': True})    545±0μs    19.1±0ms
                mean    ('numba', {'parallel': False})   474±0μs    31.9±0ms
              ======== ================================ ========== ==========
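The note above about sliding algorithms versus np.nan* functions can be made concrete. A sliding algorithm keeps a running total and updates it as the window moves, so each step costs O(1), whereas calling an np.nan* reduction on every window re-scans all window values at each step. The sketch below contrasts the two on a 1-D float array. It is a hypothetical illustration, not pandas' kernel code: the function names are invented, min_periods is assumed equal to the window size (the rolling default), and numba is optional (without it the functions run as plain Python).

```python
import numpy as np

try:
    from numba import njit  # JIT-compile the kernels when numba is installed
except ImportError:  # otherwise run them as plain Python so the sketch still works
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit
def sliding_sum(values, window):
    """Rolling sum in O(n): add the value entering the window, drop the one leaving."""
    n = values.shape[0]
    out = np.full(n, np.nan)
    total = 0.0
    nobs = 0  # non-NaN values currently inside the window
    for i in range(n):
        x = values[i]
        if not np.isnan(x):
            total += x
            nobs += 1
        if i >= window:
            y = values[i - window]
            if not np.isnan(y):
                total -= y
                nobs -= 1
        # emit a result only for full windows with no NaNs (min_periods == window)
        if i >= window - 1 and nobs == window:
            out[i] = total
    return out


@njit
def windowed_max(values, window):
    """Rolling max in O(n * window): re-scan every window with np.nanmax."""
    n = values.shape[0]
    out = np.full(n, np.nan)
    for i in range(window - 1, n):
        win = values[i - window + 1 : i + 1]
        if not np.isnan(win).any():  # same min_periods == window rule as above
            out[i] = np.nanmax(win)
    return out
```

A real sum kernel would typically also use compensated (Kahan) summation, because repeatedly adding and subtracting floats lets rounding error build up.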
@mroeschke
Collaborator Author

As of this pandas version:

In [1]: pd.__version__
Out[1]: '1.5.0.dev0+110.g439906e07d'
[ 75.00%] ··· rolling.NumbaVSCython.time_gb_method                                                                          12/30 failed
[ 75.00%] ··· ======== ================================ ============= =============
              --                                                    cols
              ----------------------------------------- ---------------------------
               method           engine_kwargs                 1            100
              ======== ================================ ============= =============
                sum            ('cython', None)            351±5μs     2.33±0.02ms
                sum     ('numba', {'parallel': True})    1.34±0.01ms    8.52±0.1ms
                sum     ('numba', {'parallel': False})   1.07±0.01ms   12.1±0.05ms
                max            ('cython', None)             failed        failed
                max     ('numba', {'parallel': True})       failed        failed
                max     ('numba', {'parallel': False})      failed        failed
                min            ('cython', None)             failed        failed
                min     ('numba', {'parallel': True})       failed        failed
                min     ('numba', {'parallel': False})      failed        failed
                var            ('cython', None)            157±5μs     2.27±0.05ms
                var     ('numba', {'parallel': True})    1.38±0.01ms    10.6±0.1ms
                var     ('numba', {'parallel': False})   1.09±0.01ms   16.4±0.06ms
                mean           ('cython', None)            148±2μs     2.00±0.03ms
                mean    ('numba', {'parallel': True})    1.40±0.01ms    10.7±0.2ms
                mean    ('numba', {'parallel': False})   1.10±0.03ms   16.9±0.07ms
              ======== ================================ ============= =============

[100.00%] ··· rolling.NumbaVSCython.time_roll_method                                                                        12/30 failed
[100.00%] ··· ======== ================================ ========== ============
              --                                                  cols
              ----------------------------------------- -----------------------
               method           engine_kwargs               1          100
              ======== ================================ ========== ============
                sum            ('cython', None)          461±2μs    27.8±0.2ms
                sum     ('numba', {'parallel': True})    365±10μs   12.3±0.2ms
                sum     ('numba', {'parallel': False})   358±2μs    14.8±0.4ms
                max            ('cython', None)           failed      failed
                max     ('numba', {'parallel': True})     failed      failed
                max     ('numba', {'parallel': False})    failed      failed
                min            ('cython', None)           failed      failed
                min     ('numba', {'parallel': True})     failed      failed
                min     ('numba', {'parallel': False})    failed      failed
                var            ('cython', None)          598±9μs     34.7±2ms
                var     ('numba', {'parallel': True})    428±10μs   13.1±0.4ms
                var     ('numba', {'parallel': False})   447±3μs    19.7±0.3ms
                mean           ('cython', None)          502±4μs     30.8±2ms
                mean    ('numba', {'parallel': True})    484±20μs   14.8±0.5ms
                mean    ('numba', {'parallel': False})   496±20μs    25.6±1ms
              ======== ================================ ========== ============

@mroeschke
Collaborator Author

Runs with different thread counts (NUMBA_NUM_THREADS) and more columns:

% NUMBA_NUM_THREADS=4 asv run -b rolling.NumbaVSCython

[ 75.00%] ··· rolling.NumbaVSCython.time_gb_method                                                                                                             ok
[ 75.00%] ··· ======== ================================ ============= ============= =============
              --                                                           cols
              ----------------------------------------- -----------------------------------------
               method           engine_kwargs                 1            100           1000
              ======== ================================ ============= ============= =============
                sum            ('cython', None)            379±50μs     2.35±0.2ms    19.8±0.3ms
                sum     ('numba', {'parallel': True})    1.37±0.01ms   8.30±0.03ms    76.6±0.5ms
                sum     ('numba', {'parallel': False})   1.09±0.01ms   12.0±0.01ms    119±0.9ms
                max            ('cython', None)            307±2μs     2.40±0.01ms   20.4±0.08ms
                max     ('numba', {'parallel': True})    3.11±0.02ms     70.0±3ms      695±20ms
                max     ('numba', {'parallel': False})   2.24±0.01ms    131±0.3ms      1.36±0s
                min            ('cython', None)            309±2μs     2.44±0.01ms   21.0±0.06ms
                min     ('numba', {'parallel': True})    3.11±0.01ms     69.3±3ms      705±2ms
                min     ('numba', {'parallel': False})   2.24±0.01ms    132±0.4ms     1.37±0.01s
                var            ('cython', None)           155±0.5μs    2.24±0.08ms    21.5±0.3ms
                var     ('numba', {'parallel': True})    1.41±0.01ms   10.3±0.06ms     101±1ms
                var     ('numba', {'parallel': False})   1.13±0.01ms    16.3±0.1ms     169±1ms
                mean           ('cython', None)           146±0.7μs    1.95±0.02ms    18.7±0.5ms
                mean    ('numba', {'parallel': True})     1.45±0.1ms    10.8±0.6ms     107±3ms
                mean    ('numba', {'parallel': False})   1.13±0.01ms   16.6±0.09ms     188±1ms
              ======== ================================ ============= ============= =============

[100.00%] ··· rolling.NumbaVSCython.time_roll_method                                                                                                           ok
[100.00%] ··· ======== ================================ ============= ============ ==========
              --                                                         cols
              ----------------------------------------- -------------------------------------
               method           engine_kwargs                 1           100         1000
              ======== ================================ ============= ============ ==========
                sum            ('cython', None)            452±5μs     26.3±0.2ms   309±20ms
                sum     ('numba', {'parallel': True})      372±20μs     12.6±1ms    132±3ms
                sum     ('numba', {'parallel': False})    346±0.7μs    14.4±0.3ms   190±2ms
                max            ('cython', None)            641±3μs      46.0±1ms    504±2ms
                max     ('numba', {'parallel': True})     3.18±0.2ms   32.4±0.5ms   328±3ms
                max     ('numba', {'parallel': False})    2.87±0.3ms   59.0±0.4ms   614±2ms
                min            ('cython', None)            653±10μs    43.9±0.1ms   503±3ms
                min     ('numba', {'parallel': True})    1.13±0.06ms   30.4±0.3ms   327±2ms
                min     ('numba', {'parallel': False})     1.28±0ms    58.1±0.8ms   614±2ms
                var            ('cython', None)            586±1μs     32.7±0.7ms   359±1ms
                var     ('numba', {'parallel': True})      440±10μs    13.0±0.4ms   140±5ms
                var     ('numba', {'parallel': False})     450±6μs     20.0±0.3ms   240±1ms
                mean           ('cython', None)            468±3μs     26.9±0.2ms   319±3ms
                mean    ('numba', {'parallel': True})      457±10μs    14.2±0.5ms   152±2ms
                mean    ('numba', {'parallel': False})     472±9μs     24.3±0.3ms   286±2ms
              ======== ================================ ============= ============ ==========

% NUMBA_NUM_THREADS=2 asv run -b rolling.NumbaVSCython
[ 75.00%] ··· rolling.NumbaVSCython.time_gb_method                                                                                                             ok
[ 75.00%] ··· ======== ================================ ============= ============= ============
              --                                                          cols
              ----------------------------------------- ----------------------------------------
               method           engine_kwargs                 1            100          1000
              ======== ================================ ============= ============= ============
                sum            ('cython', None)           438±100μs      3.90±2ms    38.1±20ms
                sum     ('numba', {'parallel': True})      1.87±1ms      12.5±6ms    80.5±10ms
                sum     ('numba', {'parallel': False})   1.12±0.08ms    12.3±0.7ms   120±0.6ms
                max            ('cython', None)            308±1μs     2.31±0.02ms   20.5±0.1ms
                max     ('numba', {'parallel': True})    2.30±0.01ms    72.7±0.5ms    758±6ms
                max     ('numba', {'parallel': False})   2.24±0.02ms    131±0.4ms     1.36±0s
                min            ('cython', None)            306±3μs     2.40±0.03ms   20.9±0.1ms
                min     ('numba', {'parallel': True})    2.31±0.01ms    71.2±0.7ms    752±7ms
                min     ('numba', {'parallel': False})     2.23±0ms     132±0.5ms     1.38±0s
                var            ('cython', None)            156±1μs     2.16±0.08ms   21.5±0.2ms
                var     ('numba', {'parallel': True})    1.17±0.01ms    10.8±0.6ms    104±1ms
                var     ('numba', {'parallel': False})   1.13±0.01ms   16.4±0.08ms   169±0.9ms
                mean           ('cython', None)            146±3μs     2.01±0.03ms   18.3±0.3ms
                mean    ('numba', {'parallel': True})    1.17±0.01ms    11.2±0.3ms    113±2ms
                mean    ('numba', {'parallel': False})   1.13±0.01ms    17.0±0.3ms    187±1ms
              ======== ================================ ============= ============= ============

[100.00%] ··· rolling.NumbaVSCython.time_roll_method                                                                                                           ok
[100.00%] ··· ======== ================================ ============= ============ ===========
              --                                                         cols
              ----------------------------------------- --------------------------------------
               method           engine_kwargs                 1           100          1000
              ======== ================================ ============= ============ ===========
                sum            ('cython', None)            455±6μs     26.4±0.4ms    306±2ms
                sum     ('numba', {'parallel': True})      286±1μs     10.4±0.4ms   131±0.3ms
                sum     ('numba', {'parallel': False})     351±3μs     14.5±0.3ms    190±1ms
                max            ('cython', None)            633±5μs     44.6±0.7ms    505±1ms
                max     ('numba', {'parallel': True})    2.04±0.02ms   33.5±0.1ms    358±3ms
                max     ('numba', {'parallel': False})   2.45±0.03ms   59.4±0.4ms    609±1ms
                min            ('cython', None)            647±5μs     45.7±0.2ms    505±3ms
                min     ('numba', {'parallel': True})      798±5μs     32.2±0.4ms    348±4ms
                min     ('numba', {'parallel': False})   1.27±0.01ms   58.2±0.4ms   608±0.8ms
                var            ('cython', None)            585±6μs     32.6±0.8ms    370±6ms
                var     ('numba', {'parallel': True})     336±0.8μs    13.1±0.4ms    157±2ms
                var     ('numba', {'parallel': False})     455±3μs     20.1±0.3ms   239±0.7ms
                mean           ('cython', None)            468±2μs     27.0±0.6ms    316±2ms
                mean    ('numba', {'parallel': True})      396±8μs     15.5±0.5ms   182±0.9ms
                mean    ('numba', {'parallel': False})     452±4μs     24.6±0.5ms    273±1ms
              ======== ================================ ============= ============ ===========

@mroeschke
Collaborator Author

No code modifications; this run parametrizes the benchmark over the number of threads (results show no change from the runs above):

[ 75.00%] ··· ======== ================================ ====== ============= =============
              --                                                         threads
              ------------------------------------------------ ---------------------------
               method           engine_kwargs            cols        2             4
              ======== ================================ ====== ============= =============
                sum            ('cython', None)           1       363±10μs      376±20μs
                sum            ('cython', None)          100     2.38±0.1ms    2.45±0.7ms
                sum            ('cython', None)          1000    19.8±0.8ms     21.1±4ms
                sum     ('numba', {'parallel': True})     1     1.15±0.08ms   1.36±0.03ms
                sum     ('numba', {'parallel': True})    100     8.54±0.3ms    8.44±0.3ms
                sum     ('numba', {'parallel': True})    1000     82.3±8ms      82.1±3ms
                sum     ('numba', {'parallel': False})    1     1.08±0.01ms   1.09±0.01ms
                sum     ('numba', {'parallel': False})   100    12.0±0.03ms   12.0±0.09ms
                sum     ('numba', {'parallel': False})   1000     121±2ms       121±2ms
                max            ('cython', None)           1       315±1μs      317±0.2μs
                max            ('cython', None)          100    2.39±0.01ms   2.39±0.01ms
                max            ('cython', None)          1000    20.5±0.2ms    20.5±0.2ms
                max     ('numba', {'parallel': True})     1     2.33±0.01ms   3.10±0.01ms
                max     ('numba', {'parallel': True})    100      73.7±1ms      70.1±1ms
                max     ('numba', {'parallel': True})    1000     767±3ms      712±0.7ms
                max     ('numba', {'parallel': False})    1     2.25±0.02ms   2.26±0.04ms
                max     ('numba', {'parallel': False})   100      132±2ms       131±2ms
                max     ('numba', {'parallel': False})   1000     1.37±0s       1.37±0s
                min            ('cython', None)           1       312±1μs      315±0.3μs
                min            ('cython', None)          100    2.43±0.01ms   2.43±0.01ms
                min            ('cython', None)          1000    20.9±0.1ms    20.9±0.2ms
                min     ('numba', {'parallel': True})     1       2.31±0ms    3.13±0.01ms
                min     ('numba', {'parallel': True})    100     71.7±0.3ms     69.5±2ms
                min     ('numba', {'parallel': True})    1000     766±2ms       714±4ms
                min     ('numba', {'parallel': False})    1     2.30±0.04ms   2.26±0.02ms
                min     ('numba', {'parallel': False})   100      133±1ms       132±1ms
                min     ('numba', {'parallel': False})   1000    1.40±0.03s     1.38±0s
                var            ('cython', None)           1       163±1μs       161±3μs
                var            ('cython', None)          100    2.17±0.03ms   2.31±0.04ms
                var            ('cython', None)          1000    21.5±0.1ms   21.5±0.07ms
                var     ('numba', {'parallel': True})     1     1.17±0.01ms   1.42±0.02ms
                var     ('numba', {'parallel': True})    100    10.9±0.02ms    10.5±0.1ms
                var     ('numba', {'parallel': True})    1000     104±1ms       101±2ms
                var     ('numba', {'parallel': False})    1       1.13±0ms    1.15±0.01ms
                var     ('numba', {'parallel': False})   100     16.3±0.1ms   16.4±0.09ms
                var     ('numba', {'parallel': False})   1000     170±2ms       170±1ms
                mean           ('cython', None)           1       154±1μs      154±0.9μs
                mean           ('cython', None)          100    1.99±0.04ms   1.95±0.05ms
                mean           ('cython', None)          1000    19.1±0.9ms    18.8±0.3ms
                mean    ('numba', {'parallel': True})     1     1.18±0.01ms   1.44±0.01ms
                mean    ('numba', {'parallel': True})    100    10.9±0.03ms    10.5±0.1ms
                mean    ('numba', {'parallel': True})    1000     118±1ms       104±3ms
                mean    ('numba', {'parallel': False})    1     1.15±0.01ms   1.14±0.01ms
                mean    ('numba', {'parallel': False})   100    16.6±0.03ms    17.0±0.2ms
                mean    ('numba', {'parallel': False})   1000     186±1ms      187±0.9ms
              ======== ================================ ====== ============= =============

[100.00%] ··· rolling.NumbaVSCython.time_roll_method                                                                                                           ok
[100.00%] ··· ======== ================================ ============= ============= ============= ============= ========== ==========
              --                                                                        cols / threads
              ----------------------------------------- -----------------------------------------------------------------------------
               method           engine_kwargs               1 / 2         1 / 4        100 / 2       100 / 4     1000 / 2   1000 / 4
              ======== ================================ ============= ============= ============= ============= ========== ==========
                sum            ('cython', None)            459±7μs       461±5μs      26.3±0.2ms   26.0±0.09ms   308±3ms    308±4ms
                sum     ('numba', {'parallel': True})      293±2μs       371±20μs     10.4±0.4ms    12.1±0.5ms   131±1ms    131±3ms
                sum     ('numba', {'parallel': False})     370±8μs       367±5μs      14.4±0.3ms    14.9±0.3ms   192±1ms    191±3ms
                max            ('cython', None)            646±10μs      642±2μs       45.5±1ms     44.1±0.2ms   504±2ms    505±3ms
                max     ('numba', {'parallel': True})    2.07±0.01ms    3.33±0.2ms    33.9±0.4ms    32.0±0.5ms   362±2ms    329±2ms
                max     ('numba', {'parallel': False})   2.49±0.01ms   2.48±0.01ms    58.1±0.3ms    59.1±0.6ms   611±3ms    614±10ms
                min            ('cython', None)            654±5μs       652±5μs     44.2±0.08ms    44.7±0.8ms   506±5ms    510±5ms
                min     ('numba', {'parallel': True})      812±10μs    1.12±0.05ms    33.0±0.5ms    30.8±0.9ms   364±1ms    331±4ms
                min     ('numba', {'parallel': False})   1.31±0.02ms   1.29±0.01ms    57.7±0.4ms    57.7±0.4ms   616±2ms    614±4ms
                var            ('cython', None)            599±2μs       597±8μs      31.8±0.5ms    31.7±0.1ms   368±3ms    366±1ms
                var     ('numba', {'parallel': True})     343±0.6μs      442±20μs     12.8±0.4ms    13.2±0.4ms   158±2ms    141±3ms
                var     ('numba', {'parallel': False})     459±4μs       461±4μs      20.0±0.3ms    19.8±0.3ms   239±2ms    240±3ms
                mean           ('cython', None)            485±6μs       480±10μs     27.1±0.3ms    27.0±0.1ms   323±3ms    322±2ms
                mean    ('numba', {'parallel': True})      393±3μs       476±8μs      15.3±0.4ms    14.6±0.4ms   181±3ms    156±3ms
                mean    ('numba', {'parallel': False})     459±1μs       466±1μs      24.8±0.2ms    24.5±0.6ms   288±5ms    282±10ms
              ======== ================================ ============= ============= ============= ============= ========== ==========

@mroeschke
Collaborator Author

xref numba/numba#4031, but note that our functions do not specify parallel=False in the inner function kernels.
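For context on the inner-kernel remark, the structure in question is roughly an outer function compiled with parallel=True that hands columns to an inner jitted kernel compiled without any parallel flag, so numba's default (parallel=False) applies to it. The sketch below is a hypothetical illustration of that shape, not pandas' actual executor: the function names are invented, the inner kernel assumes no missing values to keep the focus on the parallel structure, and without numba it falls back to plain Python (prange becomes range).

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # plain-Python fallback: prange becomes range
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(nogil=True)  # inner kernel: parallel is not passed, so it defaults to False
def rolling_sum_1d(values, window):
    out = np.full(values.shape[0], np.nan)
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i]
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out[i] = total
    return out


@njit(nogil=True, parallel=True)  # outer driver: only this loop is parallelized
def rolling_sum_2d(values, window):
    n_rows, n_cols = values.shape
    out = np.empty((n_rows, n_cols))
    # columns are independent, so prange can hand them to different threads
    for j in prange(n_cols):
        out[:, j] = rolling_sum_1d(values[:, j], window)
    return out
```

With one column there is nothing to split across threads, so in this structure parallel=True can only add scheduling overhead; that is consistent with, though not proof of, the single-column timings above.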

@mroeschke
Collaborator Author

Some other parallel diagnostics from this local timeit setup:

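import numba  # must be installed for engine="numba"
import numpy as np
import pandas as pd

cols = 1000
df = pd.DataFrame(np.random.randn(10_000, cols))
roll = df.rolling(100)
# Warm-up call: compiles the numba kernels so %timeit measures cached execution only
roll.mean(engine="numba", engine_kwargs={"nopython": True, "nogil": True, "parallel": True})
# %timeit is an IPython magic; run this in IPython or Jupyter
%timeit roll.mean(engine="numba", engine_kwargs={"nopython": True, "nogil": True, "parallel": True})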

Threading backend

Backend      timeit
omp          209 ms ± 7.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
tbb          221 ms ± 5.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
workqueue    220 ms ± 6.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Setting the number of threads with the omp backend

Threads      timeit
1            347 ms ± 26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2            201 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3            206 ms ± 4.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba function setup

Setup                   timeit
2D w/ np.nanmean        933 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2D w/ custom nanmean    634 ms ± 34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
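The two setups compared here are not shown in the thread. One plausible reading, sketched below purely as an assumption, is a rolling mean over a 2D array in which each column of each window is reduced either with np.nanmean or with a hand-written NaN-skipping loop. The function names and the use_custom switch are hypothetical, and unlike pandas' default min_periods, this sketch averages whatever non-NaN values a window contains.

```python
import numpy as np

try:
    from numba import njit  # JIT-compile when numba is installed
except ImportError:  # plain-Python fallback so the sketch still runs
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit
def nanmean_custom(arr):
    """Hand-written NaN-skipping mean of a 1-D array."""
    total = 0.0
    count = 0
    for k in range(arr.shape[0]):
        x = arr[k]
        if not np.isnan(x):
            total += x
            count += 1
    if count == 0:
        return np.nan
    return total / count


@njit
def rolling_nanmean_2d(values, window, use_custom):
    """Per-column mean over each full window, via the custom loop or np.nanmean."""
    n_rows, n_cols = values.shape
    out = np.full((n_rows, n_cols), np.nan)
    for i in range(window - 1, n_rows):
        block = values[i - window + 1 : i + 1]  # 2-D window: window x n_cols
        for j in range(n_cols):
            col = block[:, j]
            if use_custom:
                out[i, j] = nanmean_custom(col)
            else:
                out[i, j] = np.nanmean(col)
    return out
```

Timing both branches (after a warm-up call) on data shaped like the benchmark would be one way to check whether a gap like the one above reproduces; the sketch itself makes no claim about why it occurs.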
