Perf: Optimized bool take #5701

connortsui20 · 2025-12-11T18:21:58Z

Adds an optimization for BoolVector::take that is similar to the one we have for BoolArray's canonicalize function, but additionally checks for zero or one falses (instead of just zero or one trues). The existing optimization is located at https://github.com/vortex-data/vortex/blob/develop/vortex-array/src/arrays/dict/vtable/canonical.rs.

The difference here is that I apply a heuristic check in the default take implementation on BoolVector (instead of only using this optimization for dictionary decompression), because I don't see any reason not to use it in general.
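For readers who want the idea in code, here is a minimal sketch of the fast path described above. It is not the PR's implementation: `take_bools` is a hypothetical name, it works on plain `bool` slices rather than Vortex's `BoolVector` or bit buffers, it ignores validity (nulls), and it always runs the check rather than applying whatever heuristic the PR uses. It also assumes every index is in bounds.

```rust
/// Hypothetical stand-in for a boolean take: gathers `values[i]` for each `i` in `indices`.
/// Assumes all indices are in bounds; the fast paths below do not re-check them.
fn take_bools(values: &[bool], indices: &[usize]) -> Vec<bool> {
    let true_count = values.iter().filter(|&&v| v).count();
    let false_count = values.len() - true_count;

    // Zero trues: every taken element is false, so no lookups are needed.
    if true_count == 0 {
        return vec![false; indices.len()];
    }
    // Zero falses: every taken element is true.
    if false_count == 0 {
        return vec![true; indices.len()];
    }
    // Exactly one true: an output is true only where the index hits that position,
    // so the gather becomes a comparison against a constant.
    if true_count == 1 {
        let true_idx = values.iter().position(|&v| v).expect("one true exists");
        return indices.iter().map(|&i| i == true_idx).collect();
    }
    // Exactly one false: the mirror image of the case above.
    if false_count == 1 {
        let false_idx = values.iter().position(|&v| !v).expect("one false exists");
        return indices.iter().map(|&i| i != false_idx).collect();
    }
    // General case: a plain element-by-element gather.
    indices.iter().map(|&i| values[i]).collect()
}
```

Calling `take_bools(&[false, false, true, false], &[2, 0, 2, 3])` takes the single-true path and returns `[true, false, true, false]`. The appeal of the constant comparison is that it avoids a random-access read per index, which is plausibly why the check pays for itself even outside dictionary decompression.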

I still need to add some benchmarks.

codecov · 2025-12-11T18:30:00Z

Codecov Report

❌ Patch coverage is 68.13187% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.93%. Comparing base (7738f09) to head (622991b).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines	Patch %	Lines
vortex-compute/src/take/vector/bool.rs	68.13%	29 Missing ⚠️

codspeed-hq · 2025-12-11T21:46:03Z

CodSpeed Performance Report

Merging #5701 will not alter performance

Comparing ct/optimized-bool-take (622991b) with develop (7738f09)

Summary

✅ 1202 untouched
🆕 54 new
⏩ 621 skipped¹

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
🆕	`default[100000_all_false]`	N/A	788.8 µs	N/A
🆕	`default[100000_mixed]`	N/A	788.9 µs	N/A
🆕	`default[100000_mixed_nulls]`	N/A	1.4 ms	N/A
🆕	`default[100000_null_with_true]`	N/A	1.4 ms	N/A
🆕	`default[100000_all_true]`	N/A	788.8 µs	N/A
🆕	`default[100000_single_true]`	N/A	788.9 µs	N/A
🆕	`default[100000_null_with_false]`	N/A	1.4 ms	N/A
🆕	`default[10000_mixed_nulls]`	N/A	148.6 µs	N/A
🆕	`default[10000_all_false]`	N/A	84.1 µs	N/A
🆕	`default[10000_all_true]`	N/A	84.1 µs	N/A
🆕	`default[10000_all_null]`	N/A	84.1 µs	N/A
🆕	`default[10000_null_with_false]`	N/A	148.6 µs	N/A
🆕	`default[100000_single_false]`	N/A	788.9 µs	N/A
🆕	`default[10000_mixed]`	N/A	84.1 µs	N/A
🆕	`default[10000_single_true]`	N/A	84.1 µs	N/A
🆕	`default[100000_all_null]`	N/A	788.8 µs	N/A
🆕	`default[10000_single_false]`	N/A	84.1 µs	N/A
🆕	`default[10000_null_with_true]`	N/A	148.6 µs	N/A
🆕	`default[1000_null_with_true]`	N/A	22 µs	N/A
🆕	`default[1000_mixed_nulls]`	N/A	21.7 µs	N/A
...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

¹ 621 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

gatesn

No chance without a benchmark 😅

connortsui20 · 2025-12-12T20:24:38Z

locally:

vortex on  ct/optimized-bool-take [$!?] is 📦 v0.1.0 via 🐍 v3.14.2 via 🦀 v1.89.0
❯ cargo bench -p vortex-compute --bench bool_take
   Compiling vortex-compute v0.1.0 (/Users/connor/spiral/vortex-data/vortex/vortex-compute)
    Finished `bench` profile [optimized + debuginfo] target(s) in 0.43s
     Running benches/bool_take.rs (target/release/deps/bool_take-a0a6fdca86cd4b80)
Timer precision: 41 ns
bool_take                     fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ default                                  │               │               │               │         │
│  ├─ 1000_all_false          280.9 ns      │ 1.247 µs      │ 320 ns        │ 314 ns        │ 1000    │ 16000
│  ├─ 1000_all_null           283.5 ns      │ 965.9 ns      │ 320 ns        │ 317.4 ns      │ 1000    │ 16000
│  ├─ 1000_all_true           283.5 ns      │ 666.3 ns      │ 322.6 ns      │ 323.3 ns      │ 1000    │ 16000
│  ├─ 1000_mixed              299.2 ns      │ 1.072 µs      │ 338.3 ns      │ 341.1 ns      │ 1000    │ 16000
│  ├─ 1000_mixed_nulls        645.2 ns      │ 2.082 µs      │ 728.7 ns      │ 730.1 ns      │ 1000    │ 2000
│  ├─ 1000_null_with_false    619.4 ns      │ 1.015 µs      │ 707.9 ns      │ 721.8 ns      │ 1000    │ 8000
│  ├─ 1000_null_with_true     614.2 ns      │ 2.077 µs      │ 692.4 ns      │ 697.6 ns      │ 1000    │ 8000
│  ├─ 1000_single_false       301.8 ns      │ 2.507 µs      │ 353.9 ns      │ 365.2 ns      │ 1000    │ 16000
│  ├─ 1000_single_true        348.6 ns      │ 1.004 µs      │ 392.9 ns      │ 387 ns        │ 1000    │ 16000
│  ├─ 10000_all_false         2.499 µs      │ 6.52 µs       │ 2.812 µs      │ 2.763 µs      │ 1000    │ 2000
│  ├─ 10000_all_null          2.499 µs      │ 6.812 µs      │ 2.812 µs      │ 2.769 µs      │ 1000    │ 2000
│  ├─ 10000_all_true          2.499 µs      │ 8.353 µs      │ 2.812 µs      │ 2.775 µs      │ 1000    │ 2000
│  ├─ 10000_mixed             2.624 µs      │ 8.395 µs      │ 2.958 µs      │ 2.931 µs      │ 1000    │ 2000
│  ├─ 10000_mixed_nulls       5.249 µs      │ 17.37 µs      │ 5.916 µs      │ 5.917 µs      │ 1000    │ 1000
│  ├─ 10000_null_with_false   5.124 µs      │ 32.95 µs      │ 5.79 µs       │ 5.744 µs      │ 1000    │ 1000
│  ├─ 10000_null_with_true    5.207 µs      │ 13.24 µs      │ 5.457 µs      │ 5.664 µs      │ 1000    │ 1000
│  ├─ 10000_single_false      2.603 µs      │ 8.624 µs      │ 2.957 µs      │ 2.892 µs      │ 1000    │ 2000
│  ├─ 10000_single_true       2.624 µs      │ 8.124 µs      │ 2.957 µs      │ 2.902 µs      │ 1000    │ 2000
│  ├─ 100000_all_false        24.95 µs      │ 50.16 µs      │ 28.12 µs      │ 27.65 µs      │ 1000    │ 1000
│  ├─ 100000_all_null         24.87 µs      │ 42.7 µs       │ 27.87 µs      │ 27.22 µs      │ 1000    │ 1000
│  ├─ 100000_all_true         24.95 µs      │ 55.29 µs      │ 28.08 µs      │ 27.62 µs      │ 1000    │ 1000
│  ├─ 100000_mixed            26.37 µs      │ 40.16 µs      │ 29.66 µs      │ 29.42 µs      │ 1000    │ 1000
│  ├─ 100000_mixed_nulls      52.79 µs      │ 86.99 µs      │ 59.12 µs      │ 59.25 µs      │ 1000    │ 1000
│  ├─ 100000_null_with_false  51.33 µs      │ 96.95 µs      │ 57.54 µs      │ 57.1 µs       │ 1000    │ 1000
│  ├─ 100000_null_with_true   52.08 µs      │ 81.37 µs      │ 58.45 µs      │ 58.23 µs      │ 1000    │ 1000
│  ├─ 100000_single_false     26.16 µs      │ 45.66 µs      │ 29.37 µs      │ 28.98 µs      │ 1000    │ 1000
│  ╰─ 100000_single_true      26.2 µs       │ 44.41 µs      │ 29.54 µs      │ 28.98 µs      │ 1000    │ 1000
╰─ optimized                                │               │               │               │         │
   ├─ 1000_all_false          17.81 ns      │ 89.58 ns      │ 20.25 ns      │ 20.55 ns      │ 1000    │ 256000
   ├─ 1000_all_null           16.67 ns      │ 49.22 ns      │ 18.78 ns      │ 18.86 ns      │ 1000    │ 256000
   ├─ 1000_all_true           19.76 ns      │ 134.1 ns      │ 22.53 ns      │ 22.48 ns      │ 1000    │ 256000
   ├─ 1000_mixed              299.1 ns      │ 955.4 ns      │ 338.2 ns      │ 330.5 ns      │ 1000    │ 16000
   ├─ 1000_mixed_nulls        452.8 ns      │ 1.963 µs      │ 515.3 ns      │ 510.5 ns      │ 1000    │ 8000
   ├─ 1000_null_with_false    333 ns        │ 1.137 µs      │ 377.3 ns      │ 374.1 ns      │ 1000    │ 16000
   ├─ 1000_null_with_true     335.6 ns      │ 1.059 µs      │ 379.9 ns      │ 376.2 ns      │ 1000    │ 16000
   ├─ 1000_single_false       165.7 ns      │ 5.665 µs      │ 207.7 ns      │ 216.6 ns      │ 1000    │ 1000
   ├─ 1000_single_true        142.9 ns      │ 481.5 ns      │ 162.4 ns      │ 159.8 ns      │ 1000    │ 32000
   ├─ 10000_all_false         39.78 ns      │ 202.2 ns      │ 43.69 ns      │ 45.23 ns      │ 1000    │ 128000
   ├─ 10000_all_null          38.8 ns       │ 120.1 ns      │ 42.38 ns      │ 43.29 ns      │ 1000    │ 128000
   ├─ 10000_all_true          39.45 ns      │ 116.9 ns      │ 42.71 ns      │ 44.07 ns      │ 1000    │ 128000
   ├─ 10000_mixed             2.54 µs       │ 8.165 µs      │ 2.916 µs      │ 2.892 µs      │ 1000    │ 1000
   ├─ 10000_mixed_nulls       3.749 µs      │ 27.08 µs      │ 3.958 µs      │ 4.188 µs      │ 1000    │ 1000
   ├─ 10000_null_with_false   2.791 µs      │ 8.979 µs      │ 3.145 µs      │ 3.115 µs      │ 1000    │ 2000
   ├─ 10000_null_with_true    2.665 µs      │ 10.33 µs      │ 3.04 µs       │ 2.99 µs       │ 1000    │ 1000
   ├─ 10000_single_false      1.239 µs      │ 4.062 µs      │ 1.312 µs      │ 1.364 µs      │ 1000    │ 4000
   ├─ 10000_single_true       1.155 µs      │ 3.864 µs      │ 1.301 µs      │ 1.262 µs      │ 1000    │ 4000
   ├─ 100000_all_false        140.3 ns      │ 684.6 ns      │ 162.5 ns      │ 181.7 ns      │ 1000    │ 32000
   ├─ 100000_all_null         141.6 ns      │ 825.2 ns      │ 158.5 ns      │ 168.4 ns      │ 1000    │ 32000
   ├─ 100000_all_true         146.8 ns      │ 494.5 ns      │ 166.4 ns      │ 176.9 ns      │ 1000    │ 32000
   ├─ 100000_mixed            26.37 µs      │ 44.33 µs      │ 29.7 µs       │ 29.61 µs      │ 1000    │ 1000
   ├─ 100000_mixed_nulls      37.37 µs      │ 74.74 µs      │ 42.08 µs      │ 41.62 µs      │ 1000    │ 1000
   ├─ 100000_null_with_false  26.29 µs      │ 46.83 µs      │ 29.64 µs      │ 29.28 µs      │ 1000    │ 1000
   ├─ 100000_null_with_true   26.33 µs      │ 41.87 µs      │ 29.66 µs      │ 29.38 µs      │ 1000    │ 1000
   ├─ 100000_single_false     11.7 µs       │ 36.45 µs      │ 12.08 µs      │ 12.6 µs       │ 1000    │ 1000
   ╰─ 100000_single_true      11.24 µs      │ 20.99 µs      │ 12.62 µs      │ 12.18 µs      │ 1000    │ 1000

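So there is basically no overhead from checking for this optimization (the `mixed` cases are unchanged), and when it applies it is roughly 2x faster at a minimum (the `single_*` and `null_with_*` cases) and well over 10x faster for the `all_*` cases.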
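To make the speedup claim concrete, the ratios can be worked out from the median column of the table above. The snippet below is only an illustration: the numbers are the 100000-element medians copied from that table and converted to nanoseconds, and the names `MEDIANS_NS`, `speedup`, and `speedups` are made up for this example.

```rust
/// Median timings in nanoseconds for four 100000-element cases, copied from the
/// local benchmark table: (case name, default median, optimized median).
const MEDIANS_NS: [(&str, f64, f64); 4] = [
    ("100000_all_false", 28_120.0, 162.5),
    ("100000_single_false", 29_370.0, 12_080.0),
    ("100000_null_with_false", 57_540.0, 29_640.0),
    ("100000_mixed", 29_660.0, 29_700.0),
];

/// Speedup of the optimized path relative to the default path (greater than 1 is faster).
fn speedup(default_ns: f64, optimized_ns: f64) -> f64 {
    default_ns / optimized_ns
}

/// Computes the speedup for every case in `MEDIANS_NS`.
fn speedups() -> Vec<(&'static str, f64)> {
    MEDIANS_NS
        .iter()
        .map(|&(name, default_ns, optimized_ns)| (name, speedup(default_ns, optimized_ns)))
        .collect()
}
```

On these medians the ratios come out to about 173x for `all_false`, 2.4x for `single_false`, 1.9x for `null_with_false`, and 1.0x for `mixed`, which matches the "no overhead, roughly 2x or better" reading of the table.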

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 requested review from gatesn, joseph-isaacs and robert3005 December 11, 2025 18:21

connortsui20 added the performance label (release label indicating an improvement to performance) Dec 11, 2025

connortsui20 marked this pull request as ready for review December 11, 2025 20:25

connortsui20 marked this pull request as draft December 11, 2025 21:43

connortsui20 force-pushed the ct/optimized-bool-take branch from 2d06cdd to 1ebacee Compare December 12, 2025 14:37

gatesn requested changes Dec 12, 2025

connortsui20 force-pushed the ct/optimized-bool-take branch from 1ebacee to edd1e34 Compare December 12, 2025 18:51

connortsui20 marked this pull request as ready for review December 12, 2025 20:25

connortsui20 added 2 commits December 12, 2025 15:25

optimize bool take

639f8ff

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

add benchmarks

622991b

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 force-pushed the ct/optimized-bool-take branch from fc0d937 to 622991b Compare December 12, 2025 20:26

connortsui20 enabled auto-merge (squash) December 12, 2025 20:26

connortsui20 requested a review from gatesn December 12, 2025 20:27

gatesn approved these changes Dec 12, 2025

connortsui20 merged commit 5c5f7d1 into develop Dec 12, 2025
47 checks passed

connortsui20 deleted the ct/optimized-bool-take branch December 12, 2025 20:44
