
fix: build CUDA kernels as multi-arch fatbin with PTX fallback #8047

Merged: 0ax1 merged 2 commits into develop from fix-cuda-ptx-gpu-invalidation on May 21, 2026

Conversation

@0ax1 (Contributor) commented May 21, 2026

No description provided.

@codspeed-hq (bot) commented May 21, 2026

Merging this PR will improve performance by 24.49%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
✅ 1233 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 225.1 µs | 187.9 µs | +19.79% |
| Simulation | `fast_eq_out_of_range[4, 65536]` | 246.3 µs | 188.5 µs | +30.65% |
| Simulation | `fast_lt_out_of_range[16, 65536]` | 306.6 µs | 248.8 µs | +23.24% |
| Simulation | `fast_eq_out_of_range[16, 65536]` | 291.3 µs | 233.9 µs | +24.53% |



Comparing fix-cuda-ptx-gpu-invalidation (b890192) with develop (a8fb30e)


@0ax1 force-pushed the fix-cuda-ptx-gpu-invalidation branch from 1d396b0 to a9de7f2 on May 21, 2026 13:41
@0ax1 requested a review from a team on May 21, 2026 13:41
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 force-pushed the fix-cuda-ptx-gpu-invalidation branch from a9de7f2 to 4ccd5e2 on May 21, 2026 14:55
@0ax1 changed the title from "fix: rebuild CUDA PTX when GPU device configuration changes" to "fix: build CUDA kernels as multi arch fatbin with PTX fallbacks" on May 21, 2026
@0ax1 disabled auto-merge on May 21, 2026 14:56
@0ax1 enabled auto-merge (squash) on May 21, 2026 14:56
@0ax1 changed the title from "fix: build CUDA kernels as multi arch fatbin with PTX fallbacks" to "fix: build CUDA kernels as multi-arch fatbin with PTX fallbacks" on May 21, 2026
@0ax1 disabled auto-merge on May 21, 2026 15:01
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 force-pushed the fix-cuda-ptx-gpu-invalidation branch from 4ccd5e2 to b890192 on May 21, 2026 15:06
@0ax1 changed the title from "fix: build CUDA kernels as multi-arch fatbin with PTX fallbacks" to "fix: build CUDA kernels as multi-arch fatbin with PTX fallback" on May 21, 2026
@0ax1 enabled auto-merge (squash) on May 21, 2026 15:08
@0ax1 disabled auto-merge on May 21, 2026 15:08
@0ax1 merged commit f852d72 into develop on May 21, 2026
62 checks passed
@0ax1 deleted the fix-cuda-ptx-gpu-invalidation branch on May 21, 2026 15:18

Labels: changelog/fix (A bug fix)

2 participants