Use unaligned bit buffer iterators where possible #6880

Merged
robert3005 merged 4 commits into develop from rk/faster_bit_buffer
Mar 11, 2026

@robert3005
Contributor

Use unaligned bit iterators where possible; they are faster because they avoid
unaligned reads. They are always fine to use for unary operations, and can be
used for n-ary operations if all buffers are aligned.
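
To illustrate the unary versus n-ary distinction, here is a minimal, std-only sketch. It is not the Vortex or Arrow implementation: the `Bits` type and the `not` and `and` functions are hypothetical, it works on bytes rather than machine words, and it assumes "aligned" means that all operands start at the same bit offset.

```rust
/// Hypothetical, simplified bit buffer: `len` logical bits that start
/// `offset` bits into `bytes` (LSB-first within each byte, offset < 8).
#[derive(Debug, Clone)]
struct Bits {
    bytes: Vec<u8>,
    offset: usize,
    len: usize,
}

impl Bits {
    /// Pack `values` into bytes, leaving `offset` unused bits at the front.
    fn from_bools(offset: usize, values: &[bool]) -> Self {
        let mut bytes = vec![0u8; (offset + values.len() + 7) / 8];
        for (i, &v) in values.iter().enumerate() {
            if v {
                let pos = offset + i;
                bytes[pos / 8] |= 1 << (pos % 8);
            }
        }
        Bits { bytes, offset, len: values.len() }
    }

    /// Read logical bit `i`.
    fn get(&self, i: usize) -> bool {
        let pos = self.offset + i;
        (self.bytes[pos / 8] >> (pos % 8)) & 1 == 1
    }

    /// Unpack the logical bits.
    fn to_bools(&self) -> Vec<bool> {
        (0..self.len).map(|i| self.get(i)).collect()
    }
}

/// Unary op: transform whole bytes and keep the same offset. Bits outside
/// the logical range are also flipped, but they are never read.
fn not(a: &Bits) -> Bits {
    Bits {
        bytes: a.bytes.iter().map(|b| !b).collect(),
        offset: a.offset,
        len: a.len,
    }
}

/// Binary op: the whole-byte path is only valid when both inputs share the
/// same bit offset; otherwise this sketch falls back to a bit-by-bit loop.
fn and(a: &Bits, b: &Bits) -> Bits {
    assert_eq!(a.len, b.len);
    if a.offset == b.offset {
        let bytes = a.bytes.iter().zip(&b.bytes).map(|(x, y)| x & y).collect();
        Bits { bytes, offset: a.offset, len: a.len }
    } else {
        let values: Vec<bool> = (0..a.len).map(|i| a.get(i) && b.get(i)).collect();
        Bits::from_bools(0, &values)
    }
}
```

In this sketch the unary path never inspects the offset, because the output reuses the input's layout. The binary path has to check it, since combining bytes position by position only pairs up the right logical bits when both buffers are shifted by the same amount.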

Summary

We had this optimisation when we first added BitBuffer, but it was lost when we
reverted to the Arrow implementation of bit iterators.

Closes: #6836

Testing

Existing tests

Signed-off-by: Robert Kruszewski <github@robertk.io>

@robert3005 robert3005 force-pushed the rk/faster_bit_buffer branch from 91a3c94 to c79af28 Compare March 11, 2026 11:53
@robert3005 robert3005 added the changelog/performance A performance improvement label Mar 11, 2026
Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005 robert3005 force-pushed the rk/faster_bit_buffer branch from c79af28 to 7aa51ca Compare March 11, 2026 11:57
@codspeed-hq

codspeed-hq bot commented Mar 11, 2026

Merging this PR will degrade performance by 20.89%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 14 improved benchmarks
❌ 5 regressed benchmarks
✅ 1010 untouched benchmarks
⏩ 1466 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 3.8 ms | 3.3 ms | +15.82% |
| Simulation | chunked_varbinview_opt_canonical_into[(100, 100)] | 4.2 ms | 3.7 ms | +14.53% |
| Simulation | chunked_varbinview_opt_into_canonical[(100, 100)] | 4.2 ms | 3.7 ms | +14.41% |
| Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] | 3.8 ms | 3.3 ms | +15.77% |
| Simulation | varbinview_zip_block_mask | 27.4 ms | 24 ms | +14.29% |
| Simulation | dict_canonicalize_zipfian[16, 1000] | 56.2 µs | 63.4 µs | -11.39% |
| Simulation | varbinview_zip_fragmented_mask | 31.5 ms | 28.1 ms | +12.18% |
| Simulation | patched_take_200k_first_chunk_only | 4.8 ms | 5.4 ms | -10.68% |
| Simulation | patched_take_200k_dispersed | 4.7 ms | 5.6 ms | -16.56% |
| Simulation | take_200k_first_chunk_only | 3.3 ms | 4.2 ms | -20.89% |
| Simulation | take_200k_dispersed | 3.6 ms | 4.5 ms | -19.6% |
| Simulation | bitwise_and_vortex_buffer[16384] | 13.7 µs | 11 µs | +23.77% |
| Simulation | bitwise_and_vortex_buffer[65536] | 40.9 µs | 29.7 µs | +37.42% |
| Simulation | bitwise_not_vortex_buffer[16384] | 10.4 µs | 9.3 µs | +12.14% |
| Simulation | bitwise_not_vortex_buffer[2048] | 5.5 µs | 4.6 µs | +18.18% |
| Simulation | bitwise_or_vortex_buffer[16384] | 13.7 µs | 11.1 µs | +23.71% |
| Simulation | bitwise_or_vortex_buffer[65536] | 40.9 µs | 29.8 µs | +37.38% |
| Simulation | bitwise_not_vortex_buffer[65536] | 26.7 µs | 21.5 µs | +24.23% |
| Simulation | from_iter_bit_buffer[128] | 5.3 µs | 4.8 µs | +10.18% |

Comparing rk/faster_bit_buffer (1f588c9) with develop (11a2733)

Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Robert Kruszewski <github@robertk.io>
@0ax1 0ax1 self-requested a review March 11, 2026 12:21
@robert3005
Contributor Author

I have to revert the last commit before merging.

This reverts commit 8f132d9.
@robert3005 robert3005 merged commit 2d1d952 into develop Mar 11, 2026
53 of 54 checks passed
@robert3005 robert3005 deleted the rk/faster_bit_buffer branch March 11, 2026 14:00
Labels

changelog/performance A performance improvement

Development

Successfully merging this pull request may close these issues.

Use unaligned bitwise iterators for unary and for binary operations where possible

2 participants