0ax1 commented Feb 3, 2026

No description provided.

0ax1 commented Feb 3, 2026

runend_cuda/runend/100M_i32_runlen_10
                        time:   [1.4130 ms 1.4140 ms 1.4153 ms]
                        thrpt:  [263.22 GiB/s 263.45 GiB/s 263.64 GiB/s]
                 change:
                        time:   [-0.2379% -0.1412% -0.0295%] (p = 0.02 < 0.05)
                        thrpt:  [+0.0295% +0.1414% +0.2385%]
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
Benchmarking runend_cuda/runend/100M_i32_runlen_100: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.0s or enable flat sampling.
runend_cuda/runend/100M_i32_runlen_100
                        time:   [1.0998 ms 1.1017 ms 1.1043 ms]
                        thrpt:  [337.35 GiB/s 338.15 GiB/s 338.74 GiB/s]
                 change:
                        time:   [-0.6820% -0.5048% -0.3132%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3141% +0.5074% +0.6867%]
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Benchmarking runend_cuda/runend/100M_i32_runlen_1000: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.2s or enable flat sampling.
runend_cuda/runend/100M_i32_runlen_1000
                        time:   [883.47 µs 887.69 µs 891.74 µs]
                        thrpt:  [417.75 GiB/s 419.66 GiB/s 421.66 GiB/s]
                 change:
                        time:   [-0.2266% +0.0782% +0.4080%] (p = 0.66 > 0.05)
                        thrpt:  [-0.4063% -0.0781% +0.2271%]
                        No change in performance detected.
Benchmarking runend_cuda/runend/100M_i32_runlen_10000: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.3s or enable flat sampling.
runend_cuda/runend/100M_i32_runlen_10000
                        time:   [854.77 µs 855.67 µs 857.14 µs]
                        thrpt:  [434.62 GiB/s 435.37 GiB/s 435.83 GiB/s]
                 change:
                        time:   [+0.8327% +1.1222% +1.4345%] (p = 0.00 < 0.05)
                        thrpt:  [-1.4143% -1.1097% -0.8258%]
                        Change within noise threshold.
Benchmarking runend_cuda/runend/100M_i32_runlen_100000: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.7s or enable flat sampling.
runend_cuda/runend/100M_i32_runlen_100000
                        time:   [831.53 µs 832.14 µs 833.25 µs]
                        thrpt:  [447.08 GiB/s 447.67 GiB/s 448.01 GiB/s]
                 change:
                        time:   [-0.3005% -0.1934% -0.0815%] (p = 0.01 < 0.05)
                        thrpt:  [+0.0816% +0.1937% +0.3014%]
                        Change within noise threshold.
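As a sanity check on the numbers above, the reported throughput appears consistent with each iteration processing 100M `i32` values, i.e. 400,000,000 bytes. That input size is an assumption read off the benchmark names, not something the log states, and `gib_per_sec` is a hypothetical helper written for this note, not part of the benchmark code. A minimal sketch that recomputes throughput from the median times:

```rust
/// Throughput in GiB/s for `bytes` processed in `seconds`.
/// Hypothetical helper for illustration only.
fn gib_per_sec(bytes: u64, seconds: f64) -> f64 {
    let gib = bytes as f64 / (1u64 << 30) as f64; // bytes -> GiB
    gib / seconds
}

fn main() {
    // Assumption: 100M i32 values at 4 bytes each per iteration.
    let bytes: u64 = 100_000_000 * 4;

    // Median times copied from the log above, converted to seconds.
    let medians = [
        ("100M_i32_runlen_10", 1.4140e-3),
        ("100M_i32_runlen_100", 1.1017e-3),
        ("100M_i32_runlen_1000", 887.69e-6),
        ("100M_i32_runlen_10000", 855.67e-6),
        ("100M_i32_runlen_100000", 832.14e-6),
    ];

    for (name, secs) in medians {
        println!("{}: {:.2} GiB/s", name, gib_per_sec(bytes, secs));
    }
}
```

Under that assumption the recomputed values should land within rounding distance of the `thrpt` medians Criterion printed, which suggests throughput is measured against the decoded output size rather than the compressed input.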

0ax1 requested a review from joseph-isaacs February 3, 2026 14:09
joseph-isaacs added the changelog/feature (A new feature) label Feb 3, 2026
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
0ax1 enabled auto-merge (squash) February 3, 2026 15:09
0ax1 merged commit 332fd06 into develop Feb 3, 2026
75 of 113 checks passed
0ax1 deleted the ad/cuda-runend branch February 3, 2026 15:12
danking pushed a commit that referenced this pull request Feb 6, 2026
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>