perf: branchless boolean zip kernel #8275
Open
joseph-isaacs wants to merge 2 commits into
Add a dedicated `ZipKernel for Bool` that blends the two value bitmaps with the mask in a single bitwise pass, `(true & mask) | (false & !mask)`, instead of the generic per-run builder, so boolean zips are branch-free and mask-shape independent. Also add a shared `zip_validity` helper to the zip module that builds the result validity as a (lazy) zip over the two boolean validity bitmaps, reusing this kernel. This gives the per-encoding zip kernels one shared validity-selection path; the recursion terminates immediately because validity bitmaps are non-nullable. Add a small `bool_zip` divan benchmark (nonnull ~7 µs, nullable ~14 µs).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
The doc comment on the public `ZipKernel` impl linked to the `pub(crate)` `zip_validity`, which `-D rustdoc::private-intra-doc-links` rejects. Use a plain code span instead.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 31.6 µs | 46.6 µs | -32.16% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 161.9 µs | 198.2 µs | -18.33% |
| ❌ | Simulation | compare[15] | 119.9 µs | 145.8 µs | -17.78% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 177.2 µs | 213.5 µs | -17.02% |
| ❌ | Simulation | compare[14] | 117.5 µs | 141.5 µs | -16.97% |
| ❌ | Simulation | compare[13] | 115.5 µs | 137.7 µs | -16.1% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 275.3 ns | 216.9 ns | +26.89% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 336.9 ns | 278.6 ns | +20.94% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[2048] | 400.6 ns | 342.2 ns | +17.05% |
| ⚡ | Simulation | compare[5] | 76.9 µs | 69.2 µs | +11.16% |
| 🆕 | Simulation | nonnull | N/A | 77.5 µs | N/A |
| 🆕 | Simulation | nullable | N/A | 135.4 µs | N/A |
Comparing claude/bool-branchless-zip (7ad4b18) with develop (e06d80b)
Summary
Adds a dedicated, branchless `ZipKernel` for `Bool`. Booleans are bit-packed, so selecting `if_true` where the mask is set and `if_false` where it is not is a single bitwise blend over the packed words, `(true & mask) | (false & !mask)`, instead of the generic per-run builder (which degrades to per-element work on fragmented masks). The blend is mask-shape-independent.

It also introduces a shared `zip_validity` helper in the zip module: the result validity is itself a zip over the two boolean validity bitmaps, so it is built as a (lazy) zip array reusing this kernel. This gives the per-encoding zip kernels one shared validity-selection path, and the recursion terminates immediately because validity bitmaps are always `Bool(NonNullable)`.

This lands first; the primitive and listview branchless zip kernels build on it (their nullable validity selection becomes a fast bool zip rather than the slow generic builder).
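The blend and the validity selection described above can be sketched on plain packed words. This is a minimal, standalone Rust sketch, not the vortex kernel: it assumes the two inputs and the mask are already aligned `u64` word buffers of equal length (no bit offsets), and `blend_words`, `NullableBits`, and `zip_nullable` are hypothetical names. It also builds validity eagerly, whereas the PR describes `zip_validity` as a lazy zip array.

```rust
/// Blend two bit-packed boolean buffers with a mask, one 64-bit word at a time.
/// Bit i of the result comes from `if_true` where bit i of `mask` is set and
/// from `if_false` otherwise. There is no per-element branch, so the cost
/// depends only on the length, not on how fragmented `mask` is.
fn blend_words(if_true: &[u64], if_false: &[u64], mask: &[u64]) -> Vec<u64> {
    assert_eq!(if_true.len(), mask.len());
    assert_eq!(if_false.len(), mask.len());
    if_true
        .iter()
        .zip(if_false)
        .zip(mask)
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect()
}

/// Hypothetical nullable bool column: packed values plus a packed validity
/// bitmap (1 = valid). The validity bitmap itself has no validity of its own.
struct NullableBits {
    values: Vec<u64>,
    validity: Vec<u64>,
}

/// Zip two nullable columns. Values and validity are selected by the same
/// mask, so the validity is just a second blend over non-nullable bitmaps,
/// which needs no further validity handling (the "recursion" stops here).
fn zip_nullable(if_true: &NullableBits, if_false: &NullableBits, mask: &[u64]) -> NullableBits {
    NullableBits {
        values: blend_words(&if_true.values, &if_false.values, mask),
        validity: blend_words(&if_true.validity, &if_false.validity, mask),
    }
}

fn main() {
    let t = [0b1111_0000u64];
    let f = [0b1010_1010u64];
    let m = [0b1100_1100u64];
    // (t & m) = 0b1100_0000, (f & !m) = 0b0010_0010, OR = 0b1110_0010
    let out = blend_words(&t, &f, &m);
    println!("{:#010b}", out[0]); // prints 0b11100010

    // Nullable case: all-valid false side, all-null true side, mask selects the low byte.
    let a = NullableBits { values: vec![u64::MAX], validity: vec![0] };
    let b = NullableBits { values: vec![0], validity: vec![u64::MAX] };
    let z = zip_nullable(&a, &b, &[0xFF]);
    println!("values={:#x} validity={:#x}", z.values[0], z.validity[0]);
}
```

Because the mask only enters as `m` and `!m` inside one expression per word, a mask that alternates on every bit costs the same as an all-true mask, which is the mask-shape independence the summary refers to.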
Changes
- `vortex-array/src/arrays/bool/compute/zip.rs` (new): the kernel.
- `vortex-array/src/arrays/bool/compute/mod.rs`, `.../vtable/kernel.rs`: register it.
- `vortex-array/src/scalar_fn/fns/zip/mod.rs`: shared `pub(crate) fn zip_validity`.
- `vortex-array/benches/bool_zip.rs` (new): a small divan bench.

Performance (divan, 65,536 bools, median)
Testing
- `Validity::Array` inputs spanning the 64-bit mask chunk boundary + remainder.
- `vortex-array` lib suite passes (2933 tests); `cargo +nightly fmt` and `clippy -D warnings` (default + all-features) clean.

https://claude.ai/code/session_01N5ivPiCJy7dGQjMEP7ips9
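The boundary-and-remainder coverage listed under Testing can be illustrated with a self-contained property check: pack random bools into `u64` words, blend, unpack, and compare against a per-element select at lengths around the 64-bit word boundary. This is a hypothetical sketch on plain `Vec<bool>`/`Vec<u64>` data, not the PR's actual test (which exercises vortex `Validity::Array` inputs); `pack`, `unpack`, `blend`, `zip_reference`, and `check_lengths` are illustrative names.

```rust
/// Pack bools into 64-bit words: bit (i % 64) of word (i / 64) holds values[i].
/// Padding bits in the last word are left as 0.
fn pack(values: &[bool]) -> Vec<u64> {
    let mut words = vec![0u64; values.len().div_ceil(64)];
    for (i, &v) in values.iter().enumerate() {
        if v {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Unpack the first `len` bits; padding bits in the last word are ignored.
fn unpack(words: &[u64], len: usize) -> Vec<bool> {
    (0..len).map(|i| (words[i / 64] >> (i % 64)) & 1 == 1).collect()
}

/// Branchless word-level blend: mask bit set -> if_true bit, else if_false bit.
fn blend(if_true: &[u64], if_false: &[u64], mask: &[u64]) -> Vec<u64> {
    if_true
        .iter()
        .zip(if_false)
        .zip(mask)
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect()
}

/// Reference implementation: one branch per element.
fn zip_reference(if_true: &[bool], if_false: &[bool], mask: &[bool]) -> Vec<bool> {
    mask.iter()
        .zip(if_true)
        .zip(if_false)
        .map(|((&m, &t), &f)| if m { t } else { f })
        .collect()
}

/// Deterministic xorshift64 so the check needs no external crates.
fn random_bools(len: usize, seed: &mut u64) -> Vec<bool> {
    (0..len)
        .map(|_| {
            *seed ^= *seed << 13;
            *seed ^= *seed >> 7;
            *seed ^= *seed << 17;
            *seed & 1 == 1
        })
        .collect()
}

/// Returns true if the packed blend matches the reference for every length.
fn check_lengths(lengths: &[usize]) -> bool {
    let mut seed = 0x9E37_79B9_7F4A_7C15u64;
    lengths.iter().all(|&len| {
        let t = random_bools(len, &mut seed);
        let f = random_bools(len, &mut seed);
        let m = random_bools(len, &mut seed);
        let got = unpack(&blend(&pack(&t), &pack(&f), &pack(&m)), len);
        got == zip_reference(&t, &f, &m)
    })
}

fn main() {
    // Lengths below, at, and just past one and two 64-bit words, plus a ragged tail.
    let ok = check_lengths(&[0, 1, 63, 64, 65, 127, 128, 129, 200]);
    println!("blend matches reference: {ok}");
}
```

Comparing only the first `len` unpacked bits keeps the check independent of whatever ends up in the padding bits of the last word.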
Generated by Claude Code