perf: aggregate min/max by joseph-isaacs · Pull Request #8061 · vortex-data/vortex

joseph-isaacs · 2026-05-22T12:28:52Z

Adds a divan benchmark exercising the min/max aggregation over primitive
arrays (i32/i64/f64, with and without nulls) so we can measure and inspect
the codegen of the max reduction path.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>


The all-valid primitive min/max path used `itertools::minmax_by` with a `total_compare` closure preceded by a NaN filter. The autovectorizer could not lower this to packed min/max, which left a scalar cmov reduction.

Route the all-true mask case for integer ptypes through a plain reduction. Integers have no NaNs, so the NaN filter is unnecessary and LLVM vectorizes the loop (pmaxub/pmaxsw, and pcmpgtd-based blends for i32/i64). Floats keep the existing NaN-aware path.

Benchmarked over 1M elements: i32 all-valid ~2.93 ms -> ~0.36 ms, i64 ~3.02 ms -> ~0.55 ms.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
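A minimal, std-only sketch of the two shapes the commit message contrasts. The function names `min_max_i32` and `min_max_f64` are hypothetical and not Vortex APIs, and the float path is a hand-written stand-in for the original `itertools::minmax_by` + `total_compare` code. The integer version is a branch-free fold, which is the loop shape LLVM can usually turn into packed min/max. Whether it actually vectorizes depends on the target and optimization level.

```rust
/// Hypothetical integer path: a plain min/max fold with no NaN handling.
/// No early exits or data-dependent branches, so LLVM can typically
/// lower the loop to packed min/max instructions.
fn min_max_i32(values: &[i32]) -> Option<(i32, i32)> {
    let (&first, rest) = values.split_first()?;
    let (mut lo, mut hi) = (first, first);
    for &v in rest {
        lo = lo.min(v);
        hi = hi.max(v);
    }
    Some((lo, hi))
}

/// Hypothetical float path: NaNs are filtered out first, then `total_cmp`
/// gives a total order over the remaining values (e.g. -0.0 < +0.0).
fn min_max_f64(values: &[f64]) -> Option<(f64, f64)> {
    let mut iter = values.iter().copied().filter(|v| !v.is_nan());
    let first = iter.next()?;
    let (mut lo, mut hi) = (first, first);
    for v in iter {
        if v.total_cmp(&lo).is_lt() {
            lo = v;
        }
        if v.total_cmp(&hi).is_gt() {
            hi = v;
        }
    }
    Some((lo, hi))
}
```

The split mirrors the reasoning in the commit: only the float version pays for the NaN filter and the comparator, because only floats can contain NaN.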

codspeed-hq · 2026-05-22T12:41:06Z

Merging this PR will improve performance by 14.84%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 7 improved benchmarks
❌ 1 regressed benchmark
✅ 1243 untouched benchmarks
🆕 6 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | `max_f64` | N/A | 1.1 ms | N/A |
| 🆕 | Simulation | `max_i32` | N/A | 223.3 µs | N/A |
| 🆕 | Simulation | `max_i64` | N/A | 486.1 µs | N/A |
| 🆕 | Simulation | `sum_i32` | N/A | 222.3 µs | N/A |
| 🆕 | Simulation | `sum_i64` | N/A | 600.7 µs | N/A |
| 🆕 | Simulation | `sum_u32` | N/A | 269.6 µs | N/A |
| ❌ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 273.1 µs | 308 µs | -11.32% |
| ⚡ | Simulation | `encode_primitives[u8, (10000, 2)]` | 313.9 µs | 278 µs | +12.9% |
| ⚡ | Simulation | `encode_primitives[u8, (10000, 32)]` | 318.4 µs | 282.3 µs | +12.81% |
| ⚡ | Simulation | `encode_primitives[u8, (10000, 4)]` | 314.3 µs | 278.2 µs | +12.95% |
| ⚡ | Simulation | `encode_primitives[u8, (10000, 512)]` | 335.2 µs | 299 µs | +12.09% |
| ⚡ | Simulation | `encode_primitives[u8, (10000, 8)]` | 315.1 µs | 279 µs | +12.93% |
| ⚡ | Simulation | `for_compress_i32` | 753.4 µs | 443.8 µs | +69.76% |
| ⚡ | Simulation | `take_10k_contiguous` | 309.7 µs | 280.6 µs | +10.38% |
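The Efficiency column in the table above appears to be consistent with a speedup ratio, `BASE / HEAD - 1`, rather than a time reduction, `(BASE - HEAD) / BASE`. This is inferred from the listed numbers, not from CodSpeed documentation. For example, `for_compress_i32` gives 753.4 / 443.8 - 1 ≈ +69.76%, whereas the time-reduction formula would give ≈ 41.1%. The helper `efficiency_pct` below is hypothetical and only illustrates that arithmetic.

```rust
/// Hypothetical helper: percentage change as a speedup ratio,
/// assuming Efficiency = BASE / HEAD - 1 (inferred from the table).
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// For comparison: the fraction of time saved, (BASE - HEAD) / BASE.
fn time_reduction_pct(base: f64, head: f64) -> f64 {
    (base - head) / base * 100.0
}
```

Recomputing from the displayed (rounded) times can differ from the table in the last digit. For example, the regressed benchmark gives about -11.33% rather than -11.32%.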

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_Comparing `claude/great-edison-jrGY0` (8b98b5d) with `develop` (ae19fe7)_


robert3005 · 2026-05-22T15:14:29Z

```diff
+        .with_inputs(|| PrimitiveArray::from_iter(data.iter().copied()).into_array())
+        .bench_refs(|a| {
+            a.statistics()
+                .compute_max::<i32>(&mut LEGACY_SESSION.create_execution_ctx())
```

Can you create a local session here?

The all-valid integer sum did a per-element `checked_add`, whose overflow early-return branch blocked autovectorization and left a scalar loop.

Sum narrower-than-64-bit integers in chunks of 65536 into a widened 64-bit accumulator with no per-element check. A chunk of <64-bit values cannot overflow the 64-bit accumulator (2^16 * (2^32-1) < 2^64), so only one checked add per chunk is needed. This lets the inner loop vectorize to packed widening adds (paddq + unpck). 64-bit inputs keep the per-element checked path, since a chunk of 64-bit values could itself overflow.

This observes overflow at chunk boundaries rather than per element. As a result, a signed sum whose running total transiently leaves i64 range but ends in range now returns the true total instead of null. The final result is unchanged whenever the existing per-batch combine did not already overflow.

Benchmarked over 100k elements: sum_i32 ~19 µs, sum_u32 ~15 µs, sum_i64 ~51 µs.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
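A minimal sketch of the chunked, widened summation described above, shown for `i32` inputs only. The name `sum_i32_chunked` is hypothetical, and `None` stands in for the null result. This is not the Vortex implementation. A `u32` -> `u64` variant would follow the same pattern. The compile-time assertion encodes the overflow bound for the signed case: 2^16 values of magnitude at most 2^31 sum to at most 2^47 in magnitude.

```rust
/// Chunk length from the commit message: 2^16 = 65536 elements.
const CHUNK: usize = 1 << 16;

// Compile-time check that one chunk of i32 values cannot overflow an i64:
// 2^16 * 2^31 = 2^47 <= i64::MAX.
const _: () = assert!((CHUNK as i128) * (1i128 << 31) <= i64::MAX as i128);

/// Hypothetical chunked sum: unchecked widening adds inside each chunk,
/// one checked add per chunk. Returns `None` (standing in for null) on overflow.
fn sum_i32_chunked(values: &[i32]) -> Option<i64> {
    let mut total: i64 = 0;
    for chunk in values.chunks(CHUNK) {
        // Inner loop: no per-element overflow branch, so it can vectorize.
        let chunk_sum: i64 = chunk.iter().map(|&v| i64::from(v)).sum();
        // Overflow is only observed at chunk boundaries.
        total = total.checked_add(chunk_sum)?;
    }
    Some(total)
}
```

Moving the only fallible operation out of the inner loop is what makes the loop vectorizable. The bound above is what makes it safe.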

claude added 2 commits May 22, 2026 11:13

Add aggregate max divan benchmark

1831743

Adds a divan benchmark exercising the min/max aggregation over primitive arrays (i32/i64/f64, with and without nulls) so we can measure and inspect the codegen of the max reduction path. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs changed the title ~~Add aggregate max divan benchmark~~ [claude] Add aggregate max divan benchmark May 22, 2026

claude added 2 commits May 22, 2026 13:18

Simplify aggregate max benchmark to one bench per type

ac47a14

Keep a single all-valid bench for i32, i64, and f64 instead of the per-type all-valid/half-null pairs. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Reduce aggregate max benchmark array length to 100k

08abd6a

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs requested a review from robert3005 May 22, 2026 14:30

joseph-isaacs changed the title ~~[claude] Add aggregate max divan benchmark~~ perf: aggregate min/max May 22, 2026

joseph-isaacs added the changelog/performance A performance improvement label May 22, 2026

robert3005 approved these changes May 22, 2026

robert3005 reviewed May 22, 2026

joseph-isaacs wants to merge 5 commits into `develop` from `claude/great-edison-jrGY0`
