perf: branchless boolean zip kernel by joseph-isaacs · Pull Request #8275 · vortex-data/vortex

joseph-isaacs · 2026-06-05T16:37:35Z

Summary

Adds a dedicated, branchless ZipKernel for Bool. Booleans are bit-packed, so selecting if_true where the mask is set and if_false where it is not is a single bitwise blend over the packed words, (true & mask) | (false & !mask), where true and false are the if_true and if_false bitmaps. This replaces the generic per-run builder, which degrades to per-element work on fragmented masks. The cost of the blend does not depend on the shape of the mask.
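To make the blend concrete, here is a minimal standalone sketch over plain `u64` words. It is not the kernel from this PR: the function name `blend_words` and the use of `&[u64]` slices are assumptions chosen for illustration, and the real kernel operates on Vortex's own bit buffers.

```rust
/// Blend two bit-packed boolean buffers word by word.
///
/// For every bit position the result takes the bit from `if_true` where the
/// corresponding `mask` bit is 1, and from `if_false` where it is 0.
/// Hypothetical helper for illustration; not part of the vortex API.
fn blend_words(if_true: &[u64], if_false: &[u64], mask: &[u64]) -> Vec<u64> {
    assert_eq!(if_true.len(), mask.len());
    assert_eq!(if_false.len(), mask.len());
    if_true
        .iter()
        .zip(if_false)
        .zip(mask)
        // No branch on the mask contents: every word costs the same
        // three bitwise operations regardless of how fragmented the mask is.
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect()
}
```

Because each word is processed with the same three operations, a mask with alternating bits costs the same as a mask of long runs, which is the mask-shape independence described above.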

It also introduces a shared zip_validity helper in the zip module: the result validity is itself a zip over the two boolean validity bitmaps, so it's built as a (lazy) zip array reusing this kernel. This gives the per-encoding zip kernels one shared validity-selection path, and the recursion terminates immediately because validity bitmaps are always Bool(NonNullable).
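The idea that result validity is itself a zip can be sketched as follows. This is an eager, simplified model, not the PR's `zip_validity`, which builds a lazy zip array: the `Validity` enum here is a hypothetical stand-in with only three cases, and `zip_validity_sketch` and `to_words` are illustrative names. Bits past the logical length in the last word are ignored for simplicity.

```rust
/// Hypothetical, simplified validity model for illustration only.
enum Validity {
    AllValid,
    AllInvalid,
    /// Bit-packed validity bitmap, one bit per element.
    Bits(Vec<u64>),
}

/// Materialize a validity as packed words (1 = valid).
fn to_words(v: &Validity, n_words: usize) -> Vec<u64> {
    match v {
        Validity::AllValid => vec![u64::MAX; n_words],
        Validity::AllInvalid => vec![0; n_words],
        Validity::Bits(words) => {
            assert_eq!(words.len(), n_words);
            words.clone()
        }
    }
}

/// Select validity bits from `if_true` where `mask` is set and from
/// `if_false` elsewhere. The inputs are plain non-nullable bitmaps, so
/// the selection needs no further validity step of its own.
fn zip_validity_sketch(if_true: &Validity, if_false: &Validity, mask: &[u64]) -> Vec<u64> {
    let t = to_words(if_true, mask.len());
    let f = to_words(if_false, mask.len());
    t.iter()
        .zip(&f)
        .zip(mask)
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect()
}
```

The selection is the same blend used for the values, which is why a single kernel can serve both the value path and the validity path.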

This lands first; the primitive and listview branchless zip kernels build on it (their nullable validity selection becomes a fast bool zip rather than the slow generic builder).

Changes

vortex-array/src/arrays/bool/compute/zip.rs (new): the kernel.
vortex-array/src/arrays/bool/compute/mod.rs, .../vtable/kernel.rs: register it.
vortex-array/src/scalar_fn/fns/zip/mod.rs: shared pub(crate) fn zip_validity.
vortex-array/benches/bool_zip.rs (new): a small divan bench.

Performance (divan, 65,536 bools, median)

case	time
nonnull	~7 µs
nullable	~14 µs

Testing

Unit tests for non-nullable and Validity::Array inputs spanning the 64-bit mask chunk boundary + remainder.
Full vortex-array lib suite passes (2933 tests); cargo +nightly fmt and clippy -D warnings (default + all-features) clean.
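A check in the spirit of the boundary tests above can be sketched by comparing a packed blend against a per-element reference at a length that crosses a 64-bit word and leaves a remainder. This is not the PR's test code: the helper names `pack`, `unpack`, `zip_blend` and `zip_reference` are hypothetical, and the least-significant-bit-first packing order is an assumption of this sketch.

```rust
/// Pack booleans into 64-bit words, least significant bit first.
fn pack(bits: &[bool]) -> Vec<u64> {
    let mut words = vec![0u64; (bits.len() + 63) / 64];
    for (i, &b) in bits.iter().enumerate() {
        if b {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Read back the first `len` bits from packed words.
fn unpack(words: &[u64], len: usize) -> Vec<bool> {
    (0..len).map(|i| (words[i / 64] >> (i % 64)) & 1 == 1).collect()
}

/// Zip via the packed bitwise blend.
fn zip_blend(if_true: &[bool], if_false: &[bool], mask: &[bool]) -> Vec<bool> {
    let (t, f, m) = (pack(if_true), pack(if_false), pack(mask));
    let blended: Vec<u64> = t
        .iter()
        .zip(&f)
        .zip(&m)
        .map(|((&t, &f), &m)| (t & m) | (f & !m))
        .collect();
    unpack(&blended, mask.len())
}

/// Per-element reference: pick from `if_true` where the mask is set.
fn zip_reference(if_true: &[bool], if_false: &[bool], mask: &[bool]) -> Vec<bool> {
    mask.iter()
        .zip(if_true.iter().zip(if_false))
        .map(|(&m, (&t, &f))| if m { t } else { f })
        .collect()
}
```

Running both functions on inputs of length 64, 65 and 71 with a fragmented mask exercises the full word, the boundary, and the remainder.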

https://claude.ai/code/session_01N5ivPiCJy7dGQjMEP7ips9

Generated by Claude Code

Add a dedicated `ZipKernel for Bool` that blends the two value bitmaps with the mask in a single bitwise pass -- `(true & mask) | (false & !mask)` -- instead of the generic per-run builder, so boolean zips are branch-free and mask-shape independent. Also add a shared `zip_validity` helper to the zip module that builds the result validity as a (lazy) zip over the two boolean validity bitmaps, reusing this kernel. This gives the per-encoding zip kernels one shared validity-selection path; the recursion terminates immediately because validity bitmaps are non-nullable. Adds a small `bool_zip` divan benchmark (nonnull ~7us, nullable ~14us). Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

The doc comment on the public ZipKernel impl linked to the pub(crate) zip_validity, which -D rustdoc::private-intra-doc-links rejects. Use a plain code span instead. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

codspeed-hq · 2026-06-05T16:51:10Z

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 6 regressed benchmarks
✅ 1503 untouched benchmarks
🆕 2 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`chunked_bool_canonical_into[(1000, 10)]`	31.6 µs	46.6 µs	-32.16%
❌	Simulation	`chunked_varbinview_canonical_into[(1000, 10)]`	161.9 µs	198.2 µs	-18.33%
❌	Simulation	`compare[15]`	119.9 µs	145.8 µs	-17.78%
❌	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	177.2 µs	213.5 µs	-17.02%
❌	Simulation	`compare[14]`	117.5 µs	141.5 µs	-16.97%
❌	Simulation	`compare[13]`	115.5 µs	137.7 µs	-16.1%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[128]`	275.3 ns	216.9 ns	+26.89%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[1024]`	336.9 ns	278.6 ns	+20.94%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[2048]`	400.6 ns	342.2 ns	+17.05%
⚡	Simulation	`compare[5]`	76.9 µs	69.2 µs	+11.16%
🆕	Simulation	`nonnull`	N/A	77.5 µs	N/A
🆕	Simulation	`nullable`	N/A	135.4 µs	N/A


Comparing claude/bool-branchless-zip (7ad4b18) with develop (e06d80b)

joseph-isaacs requested a review from a team June 5, 2026 16:37

joseph-isaacs added the changelog/performance label (A performance improvement) Jun 5, 2026, with Claude

Avoid private intra-doc link to zip_validity in bool zip kernel

7ad4b18


joseph-isaacs mentioned this pull request Jun 5, 2026

perf: branchless primitive zip kernel #8270

Open

joseph-isaacs wants to merge 2 commits into develop from claude/bool-branchless-zip
