

@connortsui20
Contributor

connortsui20 commented Dec 11, 2025

Adds an optimization for BoolVector::take that is similar to the one we already have when canonicalizing a dictionary with BoolArray values, but it also checks for zero or one false values (not just zero or one true values). That code is located at https://github.com/vortex-data/vortex/blob/develop/vortex-array/src/arrays/dict/vtable/canonical.rs.

The difference here is that I apply this as a heuristic check in the default take implementation for BoolVector (instead of only using it for dictionary decompression), because I don't see any reason not to use it in general.
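The idea can be sketched on plain Rust types. This is a minimal sketch, assuming a source of plain bool values with no validity mask and usize indices; take_bool is a hypothetical name and does not model the real BoolVector (its validity handling or any bit packing), so it is not the vortex-compute implementation.

```rust
/// Hypothetical sketch of a "take" with a zero/one true/false pre-check.
/// Operates on a plain slice of bools and ignores nulls entirely.
pub fn take_bool(values: &[bool], indices: &[usize]) -> Vec<bool> {
    // The fast paths below never read `values[i]`, so validate indices up front
    // to keep out-of-bounds behavior identical to a plain gather.
    assert!(
        indices.iter().all(|&i| i < values.len()),
        "take index out of bounds"
    );

    // Heuristic pre-check: one linear pass to count true and false values.
    let true_count = values.iter().filter(|&&b| b).count();
    let false_count = values.len() - true_count;

    match (true_count, false_count) {
        // No trues: every taken value is false, so just fill.
        (0, _) => vec![false; indices.len()],
        // No falses: every taken value is true, so just fill.
        (_, 0) => vec![true; indices.len()],
        // Exactly one true: the output is true only where an index hits it.
        (1, _) => {
            let pos = values.iter().position(|&b| b).expect("one true value exists");
            indices.iter().map(|&i| i == pos).collect()
        }
        // Exactly one false: the mirror image of the case above.
        (_, 1) => {
            let pos = values.iter().position(|&b| !b).expect("one false value exists");
            indices.iter().map(|&i| i != pos).collect()
        }
        // General case: a plain gather from the source values.
        _ => indices.iter().map(|&i| values[i]).collect(),
    }
}
```

The pre-check costs one pass over the source values; when it hits, the random-access gather is replaced by either a constant fill or a single comparison per index.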

I still need to add some benchmarks.

connortsui20 added the performance label (Release label indicating an improvement to performance) Dec 11, 2025
@codecov

codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 68.13187% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.93%. Comparing base (7738f09) to head (622991b).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines                 Patch %   Lines
vortex-compute/src/take/vector/bool.rs   68.13%    29 Missing ⚠️


connortsui20 marked this pull request as ready for review December 11, 2025 20:25
connortsui20 marked this pull request as draft December 11, 2025 21:43
@codspeed-hq

codspeed-hq bot commented Dec 11, 2025

CodSpeed Performance Report

Merging #5701 will not alter performance

Comparing ct/optimized-bool-take (622991b) with develop (7738f09)

Summary

✅ 1202 untouched
🆕 54 new
⏩ 621 skipped [1]

Benchmarks breakdown

Benchmark                           BASE   HEAD       Change
🆕 default[100000_all_false]        N/A    788.8 µs   N/A
🆕 default[100000_mixed]            N/A    788.9 µs   N/A
🆕 default[100000_mixed_nulls]      N/A    1.4 ms     N/A
🆕 default[100000_null_with_true]   N/A    1.4 ms     N/A
🆕 default[100000_all_true]         N/A    788.8 µs   N/A
🆕 default[100000_single_true]      N/A    788.9 µs   N/A
🆕 default[100000_null_with_false]  N/A    1.4 ms     N/A
🆕 default[10000_mixed_nulls]       N/A    148.6 µs   N/A
🆕 default[10000_all_false]         N/A    84.1 µs    N/A
🆕 default[10000_all_true]          N/A    84.1 µs    N/A
🆕 default[10000_all_null]          N/A    84.1 µs    N/A
🆕 default[10000_null_with_false]   N/A    148.6 µs   N/A
🆕 default[100000_single_false]     N/A    788.9 µs   N/A
🆕 default[10000_mixed]             N/A    84.1 µs    N/A
🆕 default[10000_single_true]       N/A    84.1 µs    N/A
🆕 default[100000_all_null]         N/A    788.8 µs   N/A
🆕 default[10000_single_false]      N/A    84.1 µs    N/A
🆕 default[10000_null_with_true]    N/A    148.6 µs   N/A
🆕 default[1000_null_with_true]     N/A    22 µs      N/A
🆕 default[1000_mixed_nulls]        N/A    21.7 µs    N/A
...                                 ...    ...        ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Footnotes

  1. 621 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in the CodSpeed app to remove them from the performance reports.

connortsui20 force-pushed the ct/optimized-bool-take branch from 2d06cdd to 1ebacee December 12, 2025 14:37
Contributor

gatesn left a comment


No chance without a benchmark 😅

connortsui20 force-pushed the ct/optimized-bool-take branch from 1ebacee to edd1e34 December 12, 2025 18:51
@connortsui20
Contributor Author

connortsui20 commented Dec 12, 2025

locally:

vortex on ct/optimized-bool-take [$!?] is 📦 v0.1.0 via 🐍 v3.14.2 via 🦀 v1.89.0
❯ cargo bench -p vortex-compute --bench bool_take
   Compiling vortex-compute v0.1.0 (/Users/connor/spiral/vortex-data/vortex/vortex-compute)
    Finished `bench` profile [optimized + debuginfo] target(s) in 0.43s
     Running benches/bool_take.rs (target/release/deps/bool_take-a0a6fdca86cd4b80)
Timer precision: 41 ns
bool_take                     fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ default                                  │               │               │               │         │
│  ├─ 1000_all_false          280.9 ns      │ 1.247 µs      │ 320 ns        │ 314 ns        │ 1000    │ 16000
│  ├─ 1000_all_null           283.5 ns      │ 965.9 ns      │ 320 ns        │ 317.4 ns      │ 1000    │ 16000
│  ├─ 1000_all_true           283.5 ns      │ 666.3 ns      │ 322.6 ns      │ 323.3 ns      │ 1000    │ 16000
│  ├─ 1000_mixed              299.2 ns      │ 1.072 µs      │ 338.3 ns      │ 341.1 ns      │ 1000    │ 16000
│  ├─ 1000_mixed_nulls        645.2 ns      │ 2.082 µs      │ 728.7 ns      │ 730.1 ns      │ 1000    │ 2000
│  ├─ 1000_null_with_false    619.4 ns      │ 1.015 µs      │ 707.9 ns      │ 721.8 ns      │ 1000    │ 8000
│  ├─ 1000_null_with_true     614.2 ns      │ 2.077 µs      │ 692.4 ns      │ 697.6 ns      │ 1000    │ 8000
│  ├─ 1000_single_false       301.8 ns      │ 2.507 µs      │ 353.9 ns      │ 365.2 ns      │ 1000    │ 16000
│  ├─ 1000_single_true        348.6 ns      │ 1.004 µs      │ 392.9 ns      │ 387 ns        │ 1000    │ 16000
│  ├─ 10000_all_false         2.499 µs      │ 6.52 µs       │ 2.812 µs      │ 2.763 µs      │ 1000    │ 2000
│  ├─ 10000_all_null          2.499 µs      │ 6.812 µs      │ 2.812 µs      │ 2.769 µs      │ 1000    │ 2000
│  ├─ 10000_all_true          2.499 µs      │ 8.353 µs      │ 2.812 µs      │ 2.775 µs      │ 1000    │ 2000
│  ├─ 10000_mixed             2.624 µs      │ 8.395 µs      │ 2.958 µs      │ 2.931 µs      │ 1000    │ 2000
│  ├─ 10000_mixed_nulls       5.249 µs      │ 17.37 µs      │ 5.916 µs      │ 5.917 µs      │ 1000    │ 1000
│  ├─ 10000_null_with_false   5.124 µs      │ 32.95 µs      │ 5.79 µs       │ 5.744 µs      │ 1000    │ 1000
│  ├─ 10000_null_with_true    5.207 µs      │ 13.24 µs      │ 5.457 µs      │ 5.664 µs      │ 1000    │ 1000
│  ├─ 10000_single_false      2.603 µs      │ 8.624 µs      │ 2.957 µs      │ 2.892 µs      │ 1000    │ 2000
│  ├─ 10000_single_true       2.624 µs      │ 8.124 µs      │ 2.957 µs      │ 2.902 µs      │ 1000    │ 2000
│  ├─ 100000_all_false        24.95 µs      │ 50.16 µs      │ 28.12 µs      │ 27.65 µs      │ 1000    │ 1000
│  ├─ 100000_all_null         24.87 µs      │ 42.7 µs       │ 27.87 µs      │ 27.22 µs      │ 1000    │ 1000
│  ├─ 100000_all_true         24.95 µs      │ 55.29 µs      │ 28.08 µs      │ 27.62 µs      │ 1000    │ 1000
│  ├─ 100000_mixed            26.37 µs      │ 40.16 µs      │ 29.66 µs      │ 29.42 µs      │ 1000    │ 1000
│  ├─ 100000_mixed_nulls      52.79 µs      │ 86.99 µs      │ 59.12 µs      │ 59.25 µs      │ 1000    │ 1000
│  ├─ 100000_null_with_false  51.33 µs      │ 96.95 µs      │ 57.54 µs      │ 57.1 µs       │ 1000    │ 1000
│  ├─ 100000_null_with_true   52.08 µs      │ 81.37 µs      │ 58.45 µs      │ 58.23 µs      │ 1000    │ 1000
│  ├─ 100000_single_false     26.16 µs      │ 45.66 µs      │ 29.37 µs      │ 28.98 µs      │ 1000    │ 1000
│  ╰─ 100000_single_true      26.2 µs       │ 44.41 µs      │ 29.54 µs      │ 28.98 µs      │ 1000    │ 1000
╰─ optimized                                │               │               │               │         │
   ├─ 1000_all_false          17.81 ns      │ 89.58 ns      │ 20.25 ns      │ 20.55 ns      │ 1000    │ 256000
   ├─ 1000_all_null           16.67 ns      │ 49.22 ns      │ 18.78 ns      │ 18.86 ns      │ 1000    │ 256000
   ├─ 1000_all_true           19.76 ns      │ 134.1 ns      │ 22.53 ns      │ 22.48 ns      │ 1000    │ 256000
   ├─ 1000_mixed              299.1 ns      │ 955.4 ns      │ 338.2 ns      │ 330.5 ns      │ 1000    │ 16000
   ├─ 1000_mixed_nulls        452.8 ns      │ 1.963 µs      │ 515.3 ns      │ 510.5 ns      │ 1000    │ 8000
   ├─ 1000_null_with_false    333 ns        │ 1.137 µs      │ 377.3 ns      │ 374.1 ns      │ 1000    │ 16000
   ├─ 1000_null_with_true     335.6 ns      │ 1.059 µs      │ 379.9 ns      │ 376.2 ns      │ 1000    │ 16000
   ├─ 1000_single_false       165.7 ns      │ 5.665 µs      │ 207.7 ns      │ 216.6 ns      │ 1000    │ 1000
   ├─ 1000_single_true        142.9 ns      │ 481.5 ns      │ 162.4 ns      │ 159.8 ns      │ 1000    │ 32000
   ├─ 10000_all_false         39.78 ns      │ 202.2 ns      │ 43.69 ns      │ 45.23 ns      │ 1000    │ 128000
   ├─ 10000_all_null          38.8 ns       │ 120.1 ns      │ 42.38 ns      │ 43.29 ns      │ 1000    │ 128000
   ├─ 10000_all_true          39.45 ns      │ 116.9 ns      │ 42.71 ns      │ 44.07 ns      │ 1000    │ 128000
   ├─ 10000_mixed             2.54 µs       │ 8.165 µs      │ 2.916 µs      │ 2.892 µs      │ 1000    │ 1000
   ├─ 10000_mixed_nulls       3.749 µs      │ 27.08 µs      │ 3.958 µs      │ 4.188 µs      │ 1000    │ 1000
   ├─ 10000_null_with_false   2.791 µs      │ 8.979 µs      │ 3.145 µs      │ 3.115 µs      │ 1000    │ 2000
   ├─ 10000_null_with_true    2.665 µs      │ 10.33 µs      │ 3.04 µs       │ 2.99 µs       │ 1000    │ 1000
   ├─ 10000_single_false      1.239 µs      │ 4.062 µs      │ 1.312 µs      │ 1.364 µs      │ 1000    │ 4000
   ├─ 10000_single_true       1.155 µs      │ 3.864 µs      │ 1.301 µs      │ 1.262 µs      │ 1000    │ 4000
   ├─ 100000_all_false        140.3 ns      │ 684.6 ns      │ 162.5 ns      │ 181.7 ns      │ 1000    │ 32000
   ├─ 100000_all_null         141.6 ns      │ 825.2 ns      │ 158.5 ns      │ 168.4 ns      │ 1000    │ 32000
   ├─ 100000_all_true         146.8 ns      │ 494.5 ns      │ 166.4 ns      │ 176.9 ns      │ 1000    │ 32000
   ├─ 100000_mixed            26.37 µs      │ 44.33 µs      │ 29.7 µs       │ 29.61 µs      │ 1000    │ 1000
   ├─ 100000_mixed_nulls      37.37 µs      │ 74.74 µs      │ 42.08 µs      │ 41.62 µs      │ 1000    │ 1000
   ├─ 100000_null_with_false  26.29 µs      │ 46.83 µs      │ 29.64 µs      │ 29.28 µs      │ 1000    │ 1000
   ├─ 100000_null_with_true   26.33 µs      │ 41.87 µs      │ 29.66 µs      │ 29.38 µs      │ 1000    │ 1000
   ├─ 100000_single_false     11.7 µs       │ 36.45 µs      │ 12.08 µs      │ 12.6 µs       │ 1000    │ 1000
   ╰─ 100000_single_true      11.24 µs      │ 20.99 µs      │ 12.62 µs      │ 12.18 µs      │ 1000    │ 1000

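So basically there is no overhead from checking for this optimization (the mixed cases are essentially unchanged), and when it applies it is much faster. Comparing medians: about 1.7x to 2.4x for the single_true/single_false cases, and 14x to over 170x for the all_true/all_false/all_null cases. The nullable cases also improve, by about 1.8x to 2x for null_with_true/null_with_false and about 1.4x to 1.5x for mixed_nulls.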
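The speedup figures can be recomputed from the median column of the local run above. This small sketch hard-codes a handful of those medians (the selection of cases is illustrative, and MEDIANS_NS and speedups are hypothetical names, not part of the bool_take benchmark) and divides the default median by the optimized median.

```rust
/// Median timings in nanoseconds, copied from the local bool_take run:
/// (case, default median, optimized median).
pub const MEDIANS_NS: &[(&str, f64, f64)] = &[
    ("1000_mixed", 338.3, 338.2),
    ("100000_mixed", 29_660.0, 29_700.0),
    ("1000_single_false", 353.9, 207.7),
    ("100000_single_true", 29_540.0, 12_620.0),
    ("100000_null_with_true", 58_450.0, 29_660.0),
    ("100000_mixed_nulls", 59_120.0, 42_080.0),
    ("1000_all_true", 322.6, 22.53),
    ("100000_all_false", 28_120.0, 162.5),
];

/// Speedup of the optimized path over the default path for each case.
/// Values above 1.0 mean the optimized path is faster.
pub fn speedups() -> Vec<(&'static str, f64)> {
    MEDIANS_NS
        .iter()
        .map(|&(case, default_ns, optimized_ns)| (case, default_ns / optimized_ns))
        .collect()
}
```

On these values the ratios come out to roughly 1.0x for both mixed cases, 1.7x for 1000_single_false, 2.3x for 100000_single_true, 2.0x for 100000_null_with_true, 1.4x for 100000_mixed_nulls, 14x for 1000_all_true and 173x for 100000_all_false.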

connortsui20 marked this pull request as ready for review December 12, 2025 20:25
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
connortsui20 force-pushed the ct/optimized-bool-take branch from fc0d937 to 622991b December 12, 2025 20:26
connortsui20 enabled auto-merge (squash) December 12, 2025 20:26
connortsui20 requested a review from gatesn December 12, 2025 20:27
connortsui20 merged commit 5c5f7d1 into develop Dec 12, 2025
47 checks passed
connortsui20 deleted the ct/optimized-bool-take branch December 12, 2025 20:44

Labels

performance Release label indicating an improvement to performance


3 participants