perf: optimize parallel batch size (thrpt +95% on large input) #8

Merged: 1 commit, Nov 22, 2022

Conversation

uhmarcel
Owner

No description provided.

@github-actions

Test Results

14 tests  ±0   14 ✔️ ±0   0s ⏱️ ±0s
  4 suites ±0     0 💤 ±0 
  1 files   ±0     0 ±0 

Results for commit cf14f61. ± Comparison against base commit 0050529.

@github-actions

github-actions bot commented Nov 22, 2022

Profiling Report

encode/3                time:   [48.231 ns 48.881 ns 49.571 ns]
                        thrpt:  [57.715 MiB/s 58.531 MiB/s 59.319 MiB/s]
                 change:
+                        time:   [-7.5637% -5.6506% -3.6297%] (p = 0.00 < 0.05)
+                        thrpt:  [+3.7664% +5.9890% +8.1826%]
+                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
encode/50               time:   [111.10 ns 112.83 ns 114.63 ns]
                        thrpt:  [415.96 MiB/s 422.61 MiB/s 429.20 MiB/s]
                 change:
                        time:   [-5.3501% -3.0388% -0.7295%] (p = 0.01 < 0.05)
                        thrpt:  [+0.7349% +3.1340% +5.6525%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
encode/100              time:   [170.59 ns 173.22 ns 176.06 ns]
                        thrpt:  [541.67 MiB/s 550.55 MiB/s 559.04 MiB/s]
                 change:
+                        time:   [-11.509% -9.7094% -7.9602%] (p = 0.00 < 0.05)
+                        thrpt:  [+8.6486% +10.753% +13.006%]
+                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
encode/500              time:   [488.52 ns 496.10 ns 504.41 ns]
                        thrpt:  [945.34 MiB/s 961.18 MiB/s 976.08 MiB/s]
                 change:
                        time:   [-3.6829% -1.3284% +1.1340%] (p = 0.30 > 0.05)
                        thrpt:  [-1.1213% +1.3463% +3.8238%]
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
encode/3072             time:   [2.6036 µs 2.6431 µs 2.6831 µs]
                        thrpt:  [1.0663 GiB/s 1.0825 GiB/s 1.0989 GiB/s]
                 change:
                        time:   [-2.6390% -0.2730% +2.0255%] (p = 0.82 > 0.05)
                        thrpt:  [-1.9853% +0.2737% +2.7105%]
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
encode/51200            time:   [41.359 µs 41.878 µs 42.460 µs]
                        thrpt:  [1.1230 GiB/s 1.1386 GiB/s 1.1529 GiB/s]
                 change:
                        time:   [-5.9275% -3.0816% -0.5279%] (p = 0.02 < 0.05)
                        thrpt:  [+0.5307% +3.1795% +6.3010%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
encode/102400           time:   [82.484 µs 83.752 µs 85.135 µs]
                        thrpt:  [1.1202 GiB/s 1.1387 GiB/s 1.1562 GiB/s]
                 change:
                        time:   [-2.7356% -0.3244% +2.3488%] (p = 0.81 > 0.05)
                        thrpt:  [-2.2949% +0.3255% +2.8126%]
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
encode/512000           time:   [263.52 µs 267.46 µs 271.84 µs]
                        thrpt:  [1.7541 GiB/s 1.7828 GiB/s 1.8095 GiB/s]
                 change:
                        time:   [-10.280% -0.5727% +15.343%] (p = 0.94 > 0.05)
                        thrpt:  [-13.302% +0.5760% +11.457%]
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
encode/1048576          time:   [533.55 µs 541.93 µs 550.85 µs]
                        thrpt:  [1.7728 GiB/s 1.8020 GiB/s 1.8303 GiB/s]
                 change:
+                        time:   [-6.6874% -4.3745% -2.2134%] (p = 0.00 < 0.05)
+                        thrpt:  [+2.2635% +4.5746% +7.1667%]
+                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
encode/5242880          time:   [2.5958 ms 2.6350 ms 2.6769 ms]
                        thrpt:  [1.8240 GiB/s 1.8530 GiB/s 1.8811 GiB/s]
                 change:
                        time:   [-5.9152% -3.3551% -0.8037%] (p = 0.02 < 0.05)
                        thrpt:  [+0.8102% +3.4716% +6.2871%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
encode/10485760         time:   [5.3257 ms 5.4043 ms 5.4872 ms]
                        thrpt:  [1.7797 GiB/s 1.8070 GiB/s 1.8337 GiB/s]
                 change:
+                        time:   [-7.8278% -5.9008% -3.8301%] (p = 0.00 < 0.05)
+                        thrpt:  [+3.9826% +6.2708% +8.4925%]
+                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
encode/20971520         time:   [13.052 ms 13.248 ms 13.449 ms]
                        thrpt:  [1.4523 GiB/s 1.4743 GiB/s 1.4964 GiB/s]
                 change:
-                        time:   [+2.1988% +4.4483% +6.8025%] (p = 0.00 < 0.05)
-                        thrpt:  [-6.3692% -4.2588% -2.1515%]
-                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode/3                time:   [67.834 ns 69.019 ns 70.241 ns]
                        thrpt:  [40.732 MiB/s 41.452 MiB/s 42.177 MiB/s]
                 change:
                        time:   [-2.1545% +0.0549% +2.1478%] (p = 0.97 > 0.05)
                        thrpt:  [-2.1027% -0.0548% +2.2019%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
decode/50               time:   [98.910 ns 100.40 ns 101.90 ns]
                        thrpt:  [467.93 MiB/s 474.93 MiB/s 482.09 MiB/s]
                 change:
                        time:   [+0.8294% +3.0822% +5.2812%] (p = 0.00 < 0.05)
                        thrpt:  [-5.0162% -2.9900% -0.8226%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
decode/100              time:   [129.29 ns 131.54 ns 133.94 ns]
                        thrpt:  [712.00 MiB/s 725.01 MiB/s 737.64 MiB/s]
                 change:
+                        time:   [-7.5967% -5.1909% -2.8523%] (p = 0.00 < 0.05)
+                        thrpt:  [+2.9361% +5.4752% +8.2212%]
+                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
decode/500              time:   [418.42 ns 424.14 ns 430.16 ns]
                        thrpt:  [1.0825 GiB/s 1.0979 GiB/s 1.1129 GiB/s]
                 change:
                        time:   [-3.5392% -0.7827% +2.4215%] (p = 0.63 > 0.05)
                        thrpt:  [-2.3642% +0.7889% +3.6690%]
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
decode/3072             time:   [2.2996 µs 2.3282 µs 2.3566 µs]
                        thrpt:  [1.2141 GiB/s 1.2289 GiB/s 1.2442 GiB/s]
                 change:
                        time:   [-3.6788% -0.9639% +1.6016%] (p = 0.49 > 0.05)
                        thrpt:  [-1.5764% +0.9733% +3.8193%]
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
decode/51200            time:   [35.130 µs 35.741 µs 36.364 µs]
                        thrpt:  [1.3113 GiB/s 1.3341 GiB/s 1.3573 GiB/s]
                 change:
+                        time:   [-9.9149% -7.8303% -6.0052%] (p = 0.00 < 0.05)
+                        thrpt:  [+6.3889% +8.4955% +11.006%]
+                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
decode/102400           time:   [70.584 µs 71.711 µs 73.039 µs]
                        thrpt:  [1.3057 GiB/s 1.3299 GiB/s 1.3511 GiB/s]
                 change:
                        time:   [-4.1450% -2.3132% -0.3086%] (p = 0.02 < 0.05)
                        thrpt:  [+0.3095% +2.3680% +4.3243%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
decode/512000           time:   [232.66 µs 237.76 µs 243.85 µs]
                        thrpt:  [1.9554 GiB/s 2.0056 GiB/s 2.0495 GiB/s]
                 change:
                        time:   [-2.2679% +0.1261% +2.6688%] (p = 0.92 > 0.05)
                        thrpt:  [-2.5994% -0.1259% +2.3206%]
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  6 (6.00%) high severe
decode/1048576          time:   [454.89 µs 464.77 µs 476.11 µs]
                        thrpt:  [2.0511 GiB/s 2.1012 GiB/s 2.1468 GiB/s]
                 change:
-                        time:   [+1.0842% +4.4910% +7.7547%] (p = 0.01 < 0.05)
-                        thrpt:  [-7.1967% -4.2980% -1.0725%]
-                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
decode/5242880          time:   [2.1569 ms 2.1973 ms 2.2447 ms]
                        thrpt:  [2.1753 GiB/s 2.2222 GiB/s 2.2638 GiB/s]
                 change:
                        time:   [+0.4560% +2.6000% +5.0066%] (p = 0.04 < 0.05)
                        thrpt:  [-4.7679% -2.5341% -0.4540%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe
decode/10485760         time:   [4.5099 ms 4.5898 ms 4.6746 ms]
                        thrpt:  [2.0891 GiB/s 2.1277 GiB/s 2.1654 GiB/s]
                 change:
                        time:   [-1.0756% +1.4971% +4.0667%] (p = 0.26 > 0.05)
                        thrpt:  [-3.9078% -1.4750% +1.0873%]
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
decode/20971520         time:   [9.5676 ms 9.7179 ms 9.8714 ms]
                        thrpt:  [1.9786 GiB/s 2.0098 GiB/s 2.0414 GiB/s]
                 change:
                        time:   [-0.9778% +1.3046% +3.4231%] (p = 0.26 > 0.05)
                        thrpt:  [-3.3098% -1.2878% +0.9875%]
                        No change in performance detected.
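
As a reading aid for the report above: the `thrpt` figures appear to follow from dividing the input size by the measured time, assuming the number after the slash in each benchmark ID (for example `encode/3`) is the input size in bytes. Below is a minimal Rust sketch of that arithmetic. The function names are hypothetical and are not part of this project or of Criterion.

```rust
/// Bytes per MiB and MiB per GiB (binary units, as used in the report).
const BYTES_PER_MIB: f64 = 1024.0 * 1024.0;
const MIB_PER_GIB: f64 = 1024.0;

/// Hypothetical helper: throughput in MiB/s for `bytes` processed in `time_ns` nanoseconds.
/// Assumes the benchmark ID suffix is the input size in bytes.
fn throughput_mib_per_s(bytes: u64, time_ns: f64) -> f64 {
    let seconds = time_ns * 1e-9;
    (bytes as f64 / seconds) / BYTES_PER_MIB
}

/// Hypothetical helper: the same throughput expressed in GiB/s.
fn throughput_gib_per_s(bytes: u64, time_ns: f64) -> f64 {
    throughput_mib_per_s(bytes, time_ns) / MIB_PER_GIB
}
```

Under that assumption, `throughput_mib_per_s(3, 48.881)` comes out close to the 58.531 MiB/s reported for `encode/3`, and `throughput_gib_per_s(1_048_576, 541_930.0)` close to the 1.8020 GiB/s reported for `encode/1048576`. Small differences are expected because the report rounds the times it prints.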

@uhmarcel uhmarcel merged commit 3ead153 into main Nov 22, 2022