
feat(cuda): hybrid GPU dispatch - fuse dyn + standalone kernels #7127

Merged
0ax1 merged 1 commit into develop from ad/fallback Mar 23, 2026
0ax1 (Contributor) commented Mar 23, 2026

Add a hybrid_dispatch module that combines dynamic-dispatch kernels with separately dispatched (standalone) kernels for individual subtrees. Subtrees with unsupported encodings (e.g. Zstd) are executed separately, and their device buffers are fed back as LOAD ops in the fused plan.
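A minimal sketch of the idea described above, in Rust and simulated on the CPU. Every name here (Encoding, Op, FusedPlan, build_plan, run_standalone, execute) is hypothetical and is not the API of the hybrid_dispatch module; Vec<u32> stands in for device buffers, a frame-of-reference node stands in for a fusable encoding, and Opaque stands in for an unsupported one such as Zstd. Fusable nodes become ops in a single plan, while an unsupported subtree is run by a separate (standalone) step whose output buffer is then referenced by a LOAD op.

```rust
/// Hypothetical encoding tree node; names are illustrative only.
#[derive(Debug)]
enum Encoding {
    /// Leaf: raw primitive values already resident on the "device".
    Primitive(Vec<u32>),
    /// Frame-of-reference: add `reference` to every decoded child value.
    FoR { reference: u32, child: Box<Encoding> },
    /// Stand-in for an encoding the dynamic-dispatch kernel cannot decode (e.g. Zstd).
    Opaque(Vec<u32>),
}

/// Ops in the fused plan that a single dynamic-dispatch kernel would interpret.
#[derive(Debug, PartialEq)]
enum Op {
    /// Read a whole buffer by index (a leaf input or a standalone kernel's output).
    Load(usize),
    /// Add a scalar to every element of the current value.
    AddScalar(u32),
}

/// Hypothetical fused plan: simulated device buffers plus a linear op list.
#[derive(Debug, Default)]
struct FusedPlan {
    buffers: Vec<Vec<u32>>, // stand-ins for device buffers
    ops: Vec<Op>,
}

impl FusedPlan {
    /// Register a buffer and return the index a LOAD op can refer to.
    fn push_buffer(&mut self, buf: Vec<u32>) -> usize {
        self.buffers.push(buf);
        self.buffers.len() - 1
    }
}

/// Stand-in for a standalone kernel launch (e.g. a Zstd decompressor);
/// here it simply copies the values into a fresh "device" buffer.
fn run_standalone(values: &[u32]) -> Vec<u32> {
    values.to_vec()
}

/// Walk the tree: fusable nodes become ops, unsupported subtrees are executed
/// separately and their output buffer is referenced via a LOAD op.
fn build_plan(enc: &Encoding, plan: &mut FusedPlan) {
    match enc {
        Encoding::Primitive(values) => {
            let idx = plan.push_buffer(values.clone());
            plan.ops.push(Op::Load(idx));
        }
        Encoding::FoR { reference, child } => {
            build_plan(child, plan);
            plan.ops.push(Op::AddScalar(*reference));
        }
        Encoding::Opaque(values) => {
            let idx = plan.push_buffer(run_standalone(values));
            plan.ops.push(Op::Load(idx));
        }
    }
}

/// Interpret the op list the way one fused kernel would (sequentially here).
fn execute(plan: &FusedPlan) -> Vec<u32> {
    let mut current = Vec::new();
    for op in &plan.ops {
        match op {
            Op::Load(idx) => current = plan.buffers[*idx].clone(),
            Op::AddScalar(s) => current.iter_mut().for_each(|x| *x = x.wrapping_add(*s)),
        }
    }
    current
}
```

For FoR(reference = 10) over an Opaque child holding [1, 2, 3], the plan is [Load(0), AddScalar(10)] and executes to [11, 12, 13]. The point the sketch tries to capture is that the fused kernel never has to understand the unsupported encoding; it only consumes the buffer the standalone step already produced.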

Note that this implicitly enables filtering via the CUDA CUB filter implementation.
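The PR does not say which CUB primitive backs the filter. Assuming it is a flag-based stream compaction similar to cub::DeviceSelect::Flagged, the sketch below is a CPU reference for that operation in Rust, written in the classic exclusive-scan-then-scatter form that makes every output write independent (which is what lets a GPU parallelize it). The function name select_flagged is hypothetical.

```rust
/// Hypothetical CPU reference for flag-based stream compaction:
/// keep items[i] wherever flags[i] is set, preserving input order,
/// and return the selected items together with their count.
fn select_flagged<T: Copy + Default>(items: &[T], flags: &[bool]) -> (Vec<T>, usize) {
    assert_eq!(items.len(), flags.len(), "expected one flag per item");

    // Exclusive prefix sum of the flags: positions[i] is the number of
    // set flags strictly before i, i.e. the output slot for items[i].
    let mut positions = Vec::with_capacity(flags.len());
    let mut running = 0usize;
    for &f in flags {
        positions.push(running);
        running += f as usize;
    }
    let num_selected = running;

    // Scatter: each selected item writes to its own precomputed slot, so on a
    // GPU every thread could perform its write without coordinating with others.
    let mut out = vec![T::default(); num_selected];
    for (i, &item) in items.iter().enumerate() {
        if flags[i] {
            out[positions[i]] = item;
        }
    }
    (out, num_selected)
}
```

For example, select_flagged(&[10, 20, 30, 40], &[true, false, true, true]) returns (vec![10, 30, 40], 3).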

0ax1 added the changelog/feature (A new feature) label Mar 23, 2026
0ax1 changed the title from "feat(cuda): hybrid dispatch for fused GPU decompression" to "feat(cuda): hybrid GPU dispatch - fuse dyn + standalone kernels" Mar 23, 2026
codspeed-hq bot commented Mar 23, 2026

Merging this PR will degrade performance by 10.2%

❌ 1 regressed benchmark
✅ 1015 untouched benchmarks
⏩ 1522 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | map_each[BufferMut<i32>, 128] | 770.6 ns | 858.1 ns | -10.2% |

Comparing ad/fallback (c5a1ed5) with develop (1c8667c)
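The reported efficiency is consistent with computing BASE / HEAD - 1 (this definition is inferred from the numbers in the table, not taken from CodSpeed documentation). A short Rust check of that figure and of the corresponding change in run time:

```rust
/// Relative efficiency as BASE / HEAD - 1; negative means HEAD is slower.
/// Assumption: inferred from the reported numbers, not from CodSpeed docs.
fn efficiency(base_ns: f64, head_ns: f64) -> f64 {
    base_ns / head_ns - 1.0
}

/// Relative change in run time, (HEAD - BASE) / BASE.
fn time_increase(base_ns: f64, head_ns: f64) -> f64 {
    (head_ns - base_ns) / base_ns
}
```

With the numbers above, efficiency(770.6, 858.1) is about -0.102 (-10.2%), while time_increase(770.6, 858.1) is about 0.114, i.e. HEAD takes roughly 11.4% longer than BASE on this benchmark.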


Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

0ax1 requested review from onursatici and robert3005 March 23, 2026 13:34
Add a hybrid_dispatch module that fuses encoding trees into dynamic-dispatch
kernel launches. Subtrees with unsupported encodings (e.g. Zstd) are
executed separately, and their device buffers are fed back as LOAD ops in
the fused plan.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
0ax1 requested a review from a10y March 23, 2026 13:40
0ax1 merged commit 7f541db into develop Mar 23, 2026
59 of 60 checks passed
0ax1 deleted the ad/fallback branch March 23, 2026 15:01

Labels

changelog/feature A new feature

2 participants