fix[gpu]: retain device buffers for dyn dispatch kernel#7980
Conversation
joseph-isaacs
left a comment
There was a problem hiding this comment.
Can we use a lifetime to catch this going forwards?
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] |
239.9 µs | 202.5 µs | +18.49% |
| ❌ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
188 µs | 225.2 µs | -16.51% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ad/retain-buffers-for-launch (b0e0f2a) with develop (c8d915a)
Dynamic dispatch is really a special case, as we use the raw u64 pointer values, and not CUDA views like for other kernels. For all other kernels, CUDA views are passed to the launch args. cudarc under the hood takes care of retaining read records like we do here for dyn dispatch manually. In general, I'm not sure if this is solvable with lifetimes given that we describe buffer ownership across host and device, which can be freed C async style. Maybe there's a way to more fundamentally rework our CUDA setup with: https://github.com/NVlabs/cuda-oxide or get inspiration from that. |
0cbd2bd to
799fdf1
Compare
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
799fdf1 to
b0e0f2a
Compare
fixes: #7838 (comment)