fix[gpu]: retain device buffers for dyn dispatch kernel#7980

Merged
0ax1 merged 1 commit into develop from ad/retain-buffers-for-launch
May 18, 2026

@0ax1 0ax1 commented May 18, 2026

@0ax1 0ax1 added the changelog/fix A bug fix label May 18, 2026

@joseph-isaacs joseph-isaacs left a comment

Can we use a lifetime to catch this going forwards?

codspeed-hq Bot commented May 18, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1219 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | `chunked_varbinview_opt_into_canonical[(1000, 10)]` | 239.9 µs | 202.5 µs | +18.49% |
| Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 188 µs | 225.2 µs | -16.51% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/retain-buffers-for-launch (b0e0f2a) with develop (c8d915a)

0ax1 commented May 18, 2026

> Can we use a lifetime to catch this going forwards?

Dynamic dispatch is really a special case: it passes raw u64 pointer values in the launch args, not CUDA views as all other kernels do. When CUDA views are passed, cudarc takes care under the hood of retaining read records, which is what we now do manually here for dyn dispatch.

In general, I'm not sure this is solvable with lifetimes, given that we describe buffer ownership across host and device, where buffers can be freed asynchronously, C-style.

Maybe there's a way to rework our CUDA setup more fundamentally with https://github.com/NVlabs/cuda-oxide, or to get inspiration from it.
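
To illustrate the retention pattern described above, here is a minimal host-only sketch. It is not the code from this PR and does not use cudarc: `DeviceBuffer`, `PendingLaunch`, `push_buffer`, and `complete` are hypothetical names, and the "free" is simulated with a counter. The idea, under those assumptions, is that a launch which only sees raw `u64` addresses also holds a reference-counted handle per buffer, so the allocation cannot be released until the launch is known to be finished.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a device allocation. A real implementation
/// would release device memory in `Drop`; here we only count drops.
struct DeviceBuffer {
    addr: u64,
    freed: Arc<AtomicUsize>,
}

impl Drop for DeviceBuffer {
    fn drop(&mut self) {
        self.freed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical launch record: the kernel only receives raw addresses,
/// so the record keeps one handle per buffer to hold the memory alive.
struct PendingLaunch {
    raw_args: Vec<u64>,
    retained: Vec<Arc<DeviceBuffer>>,
}

impl PendingLaunch {
    fn new() -> Self {
        PendingLaunch { raw_args: Vec::new(), retained: Vec::new() }
    }

    /// Record the raw address for the kernel and retain the buffer.
    fn push_buffer(&mut self, buf: &Arc<DeviceBuffer>) {
        self.raw_args.push(buf.addr);
        self.retained.push(Arc::clone(buf));
    }

    /// Stand-in for "the stream has finished this launch". Consuming
    /// `self` drops the retained handles, so buffers may be freed now.
    fn complete(self) -> Vec<u64> {
        self.raw_args
    }
}

/// Returns (frees observed while pending, frees after completion, raw args).
fn demo() -> (usize, usize, Vec<u64>) {
    let freed = Arc::new(AtomicUsize::new(0));
    let buf = Arc::new(DeviceBuffer { addr: 0x1000, freed: Arc::clone(&freed) });

    let mut launch = PendingLaunch::new();
    launch.push_buffer(&buf);

    // The caller's handle goes away while the launch is still pending.
    drop(buf);
    let while_pending = freed.load(Ordering::SeqCst);

    let args = launch.complete();
    let after_complete = freed.load(Ordering::SeqCst);
    (while_pending, after_complete, args)
}
```

Note that nothing in the type system ties the raw `u64` values to the retained handles; the record has to be kept alive by convention until completion, which is the gap the lifetime question is pointing at.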

@0ax1 0ax1 force-pushed the ad/retain-buffers-for-launch branch from 0cbd2bd to 799fdf1 Compare May 18, 2026 12:43
@0ax1 0ax1 requested a review from joseph-isaacs May 18, 2026 12:44
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 0ax1 force-pushed the ad/retain-buffers-for-launch branch from 799fdf1 to b0e0f2a Compare May 18, 2026 12:47
@0ax1 0ax1 enabled auto-merge (squash) May 18, 2026 13:35
@0ax1 0ax1 merged commit bf1527e into develop May 18, 2026
61 of 62 checks passed
@0ax1 0ax1 deleted the ad/retain-buffers-for-launch branch May 18, 2026 14:04
Labels

changelog/fix A bug fix

Development

Successfully merging this pull request may close these issues.

gpu-scan-cli scan --gpu-file panics with "as_slice must be called on host buffer"

2 participants