perf[gpu]: reduce register pressure in dyn dispatch #7489
Merged
CodSpeed HQ / CodSpeed Performance Analysis
succeeded
Apr 16, 2026 in 0s
Performance Gate Passed
⚡ 9 improved benchmarks
✅ 1154 untouched benchmarks
⏩ 1457 skipped benchmarks [1]
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | take_map[(0.1, 0.5)] | 1,154.5 µs | 980.3 µs | +17.77% |
| ⚡ | Simulation | take_map[(0.1, 1.0)] | 2 ms | 1.6 ms | +20.02% |
| ⚡ | Simulation | patched_take_10k_contiguous_patches | 258.1 µs | 227.7 µs | +13.32% |
| ⚡ | Simulation | patched_take_10k_dispersed | 316 µs | 285.8 µs | +10.58% |
| ⚡ | Simulation | patched_take_10k_contiguous_not_patches | 258.4 µs | 228.1 µs | +13.28% |
| ⚡ | Simulation | patched_take_10k_first_chunk_only | 302 µs | 271.8 µs | +11.14% |
| ⚡ | Simulation | take_10k_first_chunk_only | 270.6 µs | 225.7 µs | +19.89% |
| ⚡ | Simulation | patched_take_10k_random | 270.3 µs | 240 µs | +12.64% |
| ⚡ | Simulation | take_10k_dispersed | 284.4 µs | 239.5 µs | +18.76% |
Comparing ad/cap-values-per-tile (ac24aaf) with develop (1169d84)
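
The Efficiency figures in the table look consistent with a relative speedup of HEAD over BASE, i.e. `BASE / HEAD - 1`. That formula is an assumption inferred from the numbers, not something the report states, and the helper name `efficiency_pct` is hypothetical. A minimal sketch that recomputes a few rows from the displayed (rounded) timings:

```python
def efficiency_pct(base_us: float, head_us: float) -> float:
    """Relative speedup of HEAD over BASE in percent (assumed formula)."""
    return (base_us / head_us - 1.0) * 100.0


# (benchmark, BASE in µs, HEAD in µs, reported efficiency in %),
# copied from the Performance Changes table.
rows = [
    ("take_map[(0.1, 0.5)]", 1154.5, 980.3, 17.77),
    ("take_10k_first_chunk_only", 270.6, 225.7, 19.89),
    ("take_10k_dispersed", 284.4, 239.5, 18.76),
]

for name, base, head, reported in rows:
    computed = efficiency_pct(base, head)
    print(f"{name}: computed {computed:+.2f}%, reported {reported:+.2f}%")
```

Small differences in the last digit are expected, since the report presumably computes the percentage from unrounded measurements while the table shows rounded timings.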
Footnotes
1. 1457 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.