perf: optimize parallel batch size (thrpt +95% on large input) #8
Merged
Profiling Report
encode/3 time: [48.231 ns 48.881 ns 49.571 ns]
thrpt: [57.715 MiB/s 58.531 MiB/s 59.319 MiB/s]
change:
+ time: [-7.5637% -5.6506% -3.6297%] (p = 0.00 < 0.05)
+ thrpt: [+3.7664% +5.9890% +8.1826%]
+ Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
encode/50 time: [111.10 ns 112.83 ns 114.63 ns]
thrpt: [415.96 MiB/s 422.61 MiB/s 429.20 MiB/s]
change:
time: [-5.3501% -3.0388% -0.7295%] (p = 0.01 < 0.05)
thrpt: [+0.7349% +3.1340% +5.6525%]
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
encode/100 time: [170.59 ns 173.22 ns 176.06 ns]
thrpt: [541.67 MiB/s 550.55 MiB/s 559.04 MiB/s]
change:
+ time: [-11.509% -9.7094% -7.9602%] (p = 0.00 < 0.05)
+ thrpt: [+8.6486% +10.753% +13.006%]
+ Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
encode/500 time: [488.52 ns 496.10 ns 504.41 ns]
thrpt: [945.34 MiB/s 961.18 MiB/s 976.08 MiB/s]
change:
time: [-3.6829% -1.3284% +1.1340%] (p = 0.30 > 0.05)
thrpt: [-1.1213% +1.3463% +3.8238%]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
encode/3072 time: [2.6036 µs 2.6431 µs 2.6831 µs]
thrpt: [1.0663 GiB/s 1.0825 GiB/s 1.0989 GiB/s]
change:
time: [-2.6390% -0.2730% +2.0255%] (p = 0.82 > 0.05)
thrpt: [-1.9853% +0.2737% +2.7105%]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
encode/51200 time: [41.359 µs 41.878 µs 42.460 µs]
thrpt: [1.1230 GiB/s 1.1386 GiB/s 1.1529 GiB/s]
change:
time: [-5.9275% -3.0816% -0.5279%] (p = 0.02 < 0.05)
thrpt: [+0.5307% +3.1795% +6.3010%]
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
encode/102400 time: [82.484 µs 83.752 µs 85.135 µs]
thrpt: [1.1202 GiB/s 1.1387 GiB/s 1.1562 GiB/s]
change:
time: [-2.7356% -0.3244% +2.3488%] (p = 0.81 > 0.05)
thrpt: [-2.2949% +0.3255% +2.8126%]
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
encode/512000 time: [263.52 µs 267.46 µs 271.84 µs]
thrpt: [1.7541 GiB/s 1.7828 GiB/s 1.8095 GiB/s]
change:
time: [-10.280% -0.5727% +15.343%] (p = 0.94 > 0.05)
thrpt: [-13.302% +0.5760% +11.457%]
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
encode/1048576 time: [533.55 µs 541.93 µs 550.85 µs]
thrpt: [1.7728 GiB/s 1.8020 GiB/s 1.8303 GiB/s]
change:
+ time: [-6.6874% -4.3745% -2.2134%] (p = 0.00 < 0.05)
+ thrpt: [+2.2635% +4.5746% +7.1667%]
+ Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
encode/5242880 time: [2.5958 ms 2.6350 ms 2.6769 ms]
thrpt: [1.8240 GiB/s 1.8530 GiB/s 1.8811 GiB/s]
change:
time: [-5.9152% -3.3551% -0.8037%] (p = 0.02 < 0.05)
thrpt: [+0.8102% +3.4716% +6.2871%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
encode/10485760 time: [5.3257 ms 5.4043 ms 5.4872 ms]
thrpt: [1.7797 GiB/s 1.8070 GiB/s 1.8337 GiB/s]
change:
+ time: [-7.8278% -5.9008% -3.8301%] (p = 0.00 < 0.05)
+ thrpt: [+3.9826% +6.2708% +8.4925%]
+ Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
encode/20971520 time: [13.052 ms 13.248 ms 13.449 ms]
thrpt: [1.4523 GiB/s 1.4743 GiB/s 1.4964 GiB/s]
change:
- time: [+2.1988% +4.4483% +6.8025%] (p = 0.00 < 0.05)
- thrpt: [-6.3692% -4.2588% -2.1515%]
- Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
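
The `thrpt` figures in this report follow directly from the `time` figures: throughput is the input size in bytes (the number after `encode/`) divided by the measured time, printed in binary units. The `thrpt` change follows from the `time` change as `1 / (1 + change) - 1`. The Rust sketch below reproduces both for `encode/3`. The function names are illustrative and are not taken from the project's code, and the last printed digit of the throughput can differ from the report because the time used here is already rounded.

```rust
/// Bytes per MiB (binary unit, as printed in the report).
const MIB: f64 = 1024.0 * 1024.0;

/// Throughput in MiB/s for `bytes` processed in `nanos` nanoseconds.
fn throughput_mib_s(bytes: u64, nanos: f64) -> f64 {
    let seconds = nanos * 1e-9;
    bytes as f64 / seconds / MIB
}

/// Relative throughput change implied by a relative time change.
/// Both values are fractions, e.g. -0.056506 for -5.6506%.
fn thrpt_change_from_time_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // encode/3: 3 bytes at a point estimate of 48.881 ns.
    println!("thrpt  ~ {:.2} MiB/s", throughput_mib_s(3, 48.881));

    // encode/3: time change point estimate of -5.6506%.
    let change = thrpt_change_from_time_change(-0.056506);
    println!("change ~ {:+.4}%", 100.0 * change);
}
```

Because the two metrics are reciprocal, the lower bound of the `time` change maps to the upper bound of the `thrpt` change and vice versa.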
decode/3 time: [67.834 ns 69.019 ns 70.241 ns]
thrpt: [40.732 MiB/s 41.452 MiB/s 42.177 MiB/s]
change:
time: [-2.1545% +0.0549% +2.1478%] (p = 0.97 > 0.05)
thrpt: [-2.1027% -0.0548% +2.2019%]
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
decode/50 time: [98.910 ns 100.40 ns 101.90 ns]
thrpt: [467.93 MiB/s 474.93 MiB/s 482.09 MiB/s]
change:
time: [+0.8294% +3.0822% +5.2812%] (p = 0.00 < 0.05)
thrpt: [-5.0162% -2.9900% -0.8226%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
decode/100 time: [129.29 ns 131.54 ns 133.94 ns]
thrpt: [712.00 MiB/s 725.01 MiB/s 737.64 MiB/s]
change:
+ time: [-7.5967% -5.1909% -2.8523%] (p = 0.00 < 0.05)
+ thrpt: [+2.9361% +5.4752% +8.2212%]
+ Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
decode/500 time: [418.42 ns 424.14 ns 430.16 ns]
thrpt: [1.0825 GiB/s 1.0979 GiB/s 1.1129 GiB/s]
change:
time: [-3.5392% -0.7827% +2.4215%] (p = 0.63 > 0.05)
thrpt: [-2.3642% +0.7889% +3.6690%]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
decode/3072 time: [2.2996 µs 2.3282 µs 2.3566 µs]
thrpt: [1.2141 GiB/s 1.2289 GiB/s 1.2442 GiB/s]
change:
time: [-3.6788% -0.9639% +1.6016%] (p = 0.49 > 0.05)
thrpt: [-1.5764% +0.9733% +3.8193%]
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
decode/51200 time: [35.130 µs 35.741 µs 36.364 µs]
thrpt: [1.3113 GiB/s 1.3341 GiB/s 1.3573 GiB/s]
change:
+ time: [-9.9149% -7.8303% -6.0052%] (p = 0.00 < 0.05)
+ thrpt: [+6.3889% +8.4955% +11.006%]
+ Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
decode/102400 time: [70.584 µs 71.711 µs 73.039 µs]
thrpt: [1.3057 GiB/s 1.3299 GiB/s 1.3511 GiB/s]
change:
time: [-4.1450% -2.3132% -0.3086%] (p = 0.02 < 0.05)
thrpt: [+0.3095% +2.3680% +4.3243%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
decode/512000 time: [232.66 µs 237.76 µs 243.85 µs]
thrpt: [1.9554 GiB/s 2.0056 GiB/s 2.0495 GiB/s]
change:
time: [-2.2679% +0.1261% +2.6688%] (p = 0.92 > 0.05)
thrpt: [-2.5994% -0.1259% +2.3206%]
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
7 (7.00%) high mild
6 (6.00%) high severe
decode/1048576 time: [454.89 µs 464.77 µs 476.11 µs]
thrpt: [2.0511 GiB/s 2.1012 GiB/s 2.1468 GiB/s]
change:
- time: [+1.0842% +4.4910% +7.7547%] (p = 0.01 < 0.05)
- thrpt: [-7.1967% -4.2980% -1.0725%]
- Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
decode/5242880 time: [2.1569 ms 2.1973 ms 2.2447 ms]
thrpt: [2.1753 GiB/s 2.2222 GiB/s 2.2638 GiB/s]
change:
time: [+0.4560% +2.6000% +5.0066%] (p = 0.04 < 0.05)
thrpt: [-4.7679% -2.5341% -0.4540%]
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
decode/10485760 time: [4.5099 ms 4.5898 ms 4.6746 ms]
thrpt: [2.0891 GiB/s 2.1277 GiB/s 2.1654 GiB/s]
change:
time: [-1.0756% +1.4971% +4.0667%] (p = 0.26 > 0.05)
thrpt: [-3.9078% -1.4750% +1.0873%]
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
decode/20971520 time: [9.5676 ms 9.7179 ms 9.8714 ms]
thrpt: [1.9786 GiB/s 2.0098 GiB/s 2.0414 GiB/s]
change:
time: [-0.9778% +1.3046% +3.4231%] (p = 0.26 > 0.05)
thrpt: [-3.3098% -1.2878% +0.9875%]
No change in performance detected.
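
The verdict lines in this report ("Performance has improved.", "Change within noise threshold.", and so on) are consistent with a simple rule over the `time` change interval and the p-value. The Rust sketch below is one such rule. It assumes the 0.05 significance level printed in the report and a noise threshold of ±1%, which the report does not state and is an assumption here. The type and function names are illustrative and do not come from the project or from the benchmark harness.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Classify a relative time change.
/// `lower` and `upper` are the interval bounds as fractions (e.g. -0.075637),
/// `p` is the reported p-value, `significance` the cutoff (0.05 in the report),
/// and `noise` the assumed noise threshold (0.01 for +/-1%).
fn classify(lower: f64, upper: f64, p: f64, significance: f64, noise: f64) -> Verdict {
    if p >= significance {
        // Not statistically significant: report no change at all.
        return Verdict::NoChange;
    }
    if lower < -noise && upper < -noise {
        // Whole interval is faster by more than the noise threshold.
        Verdict::Improved
    } else if lower > noise && upper > noise {
        // Whole interval is slower by more than the noise threshold.
        Verdict::Regressed
    } else {
        // Significant, but the interval overlaps the noise band.
        Verdict::WithinNoise
    }
}

fn main() {
    // encode/3: [-7.5637%, -3.6297%], p = 0.00
    println!("{:?}", classify(-0.075637, -0.036297, 0.00, 0.05, 0.01));
    // encode/50: [-5.3501%, -0.7295%], p = 0.01
    println!("{:?}", classify(-0.053501, -0.007295, 0.01, 0.05, 0.01));
    // encode/20971520: [+2.1988%, +6.8025%], p = 0.00
    println!("{:?}", classify(0.021988, 0.068025, 0.00, 0.05, 0.01));
    // encode/500: [-3.6829%, +1.1340%], p = 0.30
    println!("{:?}", classify(-0.036829, 0.011340, 0.30, 0.05, 0.01));
}
```

Under these assumptions, a result such as `decode/5242880` (p = 0.04, lower bound +0.4560%) lands in the "within noise" bucket even though it is statistically significant, because its interval dips inside the ±1% band.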
No description provided.