chore: benchmark gpu ci #6107

Merged
joseph-isaacs merged 7 commits into develop from ji/cuda-ci-benchmark
Jan 22, 2026

joseph-isaacs commented Jan 22, 2026

Add codspeed runs for GPU kernels

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs requested a review from 0ax1 January 22, 2026 15:48
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs changed the title from "add gpu ci more families" to "chore: benchmark gpu ci" Jan 22, 2026
joseph-isaacs marked this pull request as ready for review January 22, 2026 15:49

codspeed-hq Bot commented Jan 22, 2026

Merging this PR will degrade performance by 15.76%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 7 improved benchmarks
❌ 2 regressed benchmarks
✅ 1245 untouched benchmarks
🆕 20 new benchmarks
⏩ 1254 skipped benchmarks (see footnote 1)

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | WallTime | u64_FoR[1K] | N/A | 7 µs | N/A |
| 🆕 | WallTime | u8_FoR[10M] | N/A | 5.9 µs | N/A |
| 🆕 | WallTime | u32_FoR[1M] | N/A | 11.8 µs | N/A |
| 🆕 | WallTime | u32_FoR[10K] | N/A | 6.3 µs | N/A |
| 🆕 | WallTime | u64_FoR[100K] | N/A | 12.3 µs | N/A |
| 🆕 | WallTime | u32_FoR[1K] | N/A | 6.3 µs | N/A |
| 🆕 | WallTime | u16_FoR[1K] | N/A | 5.9 µs | N/A |
| 🆕 | WallTime | u8_FoR[1K] | N/A | 8.9 µs | N/A |
| 🆕 | WallTime | u16_FoR[10M] | N/A | 6.8 µs | N/A |
| 🆕 | WallTime | u32_FoR[10M] | N/A | 174.3 µs | N/A |
| 🆕 | WallTime | u64_FoR[10M] | N/A | 341.7 µs | N/A |
| 🆕 | WallTime | u8_FoR[100K] | N/A | 5.9 µs | N/A |
| 🆕 | WallTime | u64_FoR[1M] | N/A | 34.8 µs | N/A |
| 🆕 | WallTime | u32_FoR[100K] | N/A | 7.6 µs | N/A |
| 🆕 | WallTime | u16_FoR[1M] | N/A | 6.2 µs | N/A |
| 🆕 | WallTime | u16_FoR[10K] | N/A | 7.4 µs | N/A |
| 🆕 | WallTime | u8_FoR[1M] | N/A | 5.9 µs | N/A |
| 🆕 | WallTime | u16_FoR[100K] | N/A | 10.1 µs | N/A |
| 🆕 | WallTime | u8_FoR[10K] | N/A | 5.9 µs | N/A |
| 🆕 | WallTime | u64_FoR[10K] | N/A | 13.8 µs | N/A |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing ji/cuda-ci-benchmark (b4d59b0) with develop (9d18652)

Footnotes

  1. 1254 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs added the changelog/chore (A trivial change) label Jan 22, 2026
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs merged commit 3100bfa into develop Jan 22, 2026
43 of 45 checks passed
joseph-isaacs deleted the ji/cuda-ci-benchmark branch January 22, 2026 17:20
danking pushed a commit that referenced this pull request Feb 6, 2026
Add codspeed runs for GPU kernels

---------

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>