Add Sparse pushdown kernels for is_constant, sum, and compare #8028

Open: joseph-isaacs wants to merge 3 commits into develop from claude/sparse-pushdown-kernels-qxU4x
joseph-isaacs (Contributor) commented May 20, 2026

Sparse arrays previously had no aggregate or compare pushdown, so
is_constant, sum, and <column> op <scalar> on a Sparse column all
fell through to full canonical materialization — O(N) work regardless of
patch density.

claude added 2 commits May 19, 2026 23:35
Sparse arrays previously had no aggregate or compare pushdown, so
`is_constant`, `sum`, and `<column> op <scalar>` on a Sparse column all
fell through to full canonical materialization — O(N) work regardless of
patch density.

Each new kernel pushes the operation into the patches:

- `SparseIsConstantKernel` checks `is_constant(patch_values)` and
  whether the common patch value equals the fill scalar.
- `SparseSumKernel` folds `fill * (N - P) + sum(patch_values)` through
  the existing `Sum` accumulator so overflow saturation is preserved.
- `CompareKernel for Sparse` maps a constant-RHS comparison through
  `patches.map_values` and rebuilds a `Sparse<Bool>` with `scalar_cmp`
  applied to the fill, preserving downstream sparsity (the filter
  parent kernel already handles `Sparse<Bool>` masks).
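
As a rough illustration of the three pushdowns listed above, here is a minimal sketch on a toy model. `ToySparse` and its methods are hypothetical stand-ins, not the vortex-sparse API. Saturating integer arithmetic stands in for the real `Sum` accumulator's overflow handling, and the `P < N` check in `is_constant` is an assumption about how the fill comparison is gated.

```rust
/// Toy stand-in for a Sparse array: `len` logical elements, all equal to
/// `fill` except at the sorted, unique `indices`, which hold `values`.
/// Hypothetical type for illustration only; not the vortex-sparse API.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySparse<T> {
    pub len: usize,
    pub fill: T,
    pub indices: Vec<usize>,
    pub values: Vec<T>,
}

impl<T: Copy> ToySparse<T> {
    /// O(N) baseline: materialize every element.
    pub fn canonical(&self) -> Vec<T> {
        let mut out = vec![self.fill; self.len];
        for (&i, &v) in self.indices.iter().zip(&self.values) {
            out[i] = v;
        }
        out
    }
}

impl ToySparse<i32> {
    /// O(P): constant iff the patches are constant and, when any fill
    /// position remains (P < N), the common patch value equals the fill.
    pub fn is_constant(&self) -> bool {
        let Some(&first) = self.values.first() else {
            return true; // no patches: every element is the fill
        };
        let patches_constant = self.values.iter().all(|&v| v == first);
        let fill_reachable = self.values.len() < self.len;
        patches_constant && (!fill_reachable || first == self.fill)
    }

    /// O(P): fill * (N - P) + sum(patch_values), saturating on overflow.
    pub fn sum(&self) -> i64 {
        let unpatched = (self.len - self.values.len()) as i64;
        let fill_part = (self.fill as i64).saturating_mul(unpatched);
        self.values
            .iter()
            .fold(fill_part, |acc, &v| acc.saturating_add(v as i64))
    }

    /// O(P): `column > rhs` stays sparse. The fill is compared once and
    /// the comparison is mapped over the patch values.
    pub fn gt(&self, rhs: i32) -> ToySparse<bool> {
        ToySparse {
            len: self.len,
            fill: self.fill > rhs,
            indices: self.indices.clone(),
            values: self.values.iter().map(|&v| v > rhs).collect(),
        }
    }
}
```

In this model, each method should agree with the same operation applied to `canonical()`, which is the property the real kernels are tested against.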

All three are O(P) instead of O(N), where N is the array length and P the
number of patches. Benchmarks on a 1M-element Sparse i32 with non-null fill
show:

- `is_constant`:  78-93x speedup (137us -> 1.7us at P=10..1000)
- `sum`:         109-581x speedup (768us -> 1.3us at P=10)
- `compare`:      19-84x speedup (777us -> 9us at P=10 with downstream
                  canonicalization; bigger when consumers stay sparse)

Aggregate kernels are wired through the session-scoped registry via a
new `vortex_sparse::initialize` (called from `vortex-file`'s default
encodings). Compare is wired through `PARENT_KERNELS` so it fires
during `execute_parent` on `ScalarFn(Binary, cmp)` nodes whose child is
Sparse.

Signed-off-by: Claude <noreply@anthropic.com>
CodSpeed tracked 24 entries (canonical+kernel × 4 args × 3 ops). Collapse
to exactly three benchmarks — one per kernel, single config each, sized so
each lands in the ~10-100µs range:

- sparse_is_constant: ~87µs  (150k constant patches, worst case: full scan)
- sparse_sum:         ~33µs  (100k patches)
- sparse_compare:     ~41µs  (10k patches, materialized result)

The canonical baselines are dropped; CodSpeed only needs to track the
kernel path going forward.

Signed-off-by: Claude <noreply@anthropic.com>
joseph-isaacs added the `changelog/performance` ("A performance improvement") label May 20, 2026
codspeed-hq Bot commented May 20, 2026

Merging this PR will improve performance by 19.73%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 1236 untouched benchmarks
🆕 5 new benchmarks

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| | Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 224.6 µs | 187.6 µs | +19.73% |
| 🆕 | Simulation | `sparse_null_count` | N/A | 34.5 µs | N/A |
| 🆕 | Simulation | `sparse_compare` | N/A | 393.2 µs | N/A |
| 🆕 | Simulation | `sparse_min_max` | N/A | 233.5 µs | N/A |
| 🆕 | Simulation | `sparse_is_constant` | N/A | 250.7 µs | N/A |
| 🆕 | Simulation | `sparse_sum` | N/A | 455.6 µs | N/A |


Comparing claude/sparse-pushdown-kernels-qxU4x (56d5689) with develop (f97805d)

Extends the Sparse pushdown set with the remaining high-value kernels, all
O(num_patches) instead of O(N):

Aggregates (session-registered in `initialize`):
- SparseMinMaxKernel:  folds min/max(patch_values) with the fill scalar when
  the fill is reachable (P < N) and non-null.
- SparseNullCountKernel:  null_count(patch_values) + (fill null ? N-P : 0);
  O(1) when the patch null-count stat is cached.
- SparseNanCountKernel:  nan_count(patch_values) + (fill NaN ? N-P : 0);
  declines for non-float dtypes.
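
The aggregate formulas above can be sketched on a toy nullable float model. `ToySparseF64` is a hypothetical type, not the vortex-sparse API. Skipping NaN values in `min_max` is an assumption made for this sketch, not a statement about the real kernel.

```rust
/// Hypothetical toy model of a nullable Sparse f64 array: `len` logical
/// elements equal to `fill` except at `indices`, which hold `values`.
/// `None` represents null. Not the vortex-sparse API.
pub struct ToySparseF64 {
    pub len: usize,
    pub fill: Option<f64>,
    pub indices: Vec<usize>,
    pub values: Vec<Option<f64>>,
}

fn is_nan(v: &Option<f64>) -> bool {
    matches!(v, Some(x) if x.is_nan())
}

impl ToySparseF64 {
    /// Number of positions that still hold the fill: N - P.
    fn unpatched(&self) -> usize {
        self.len - self.values.len()
    }

    /// null_count(patch_values) + (fill null ? N - P : 0)
    pub fn null_count(&self) -> usize {
        let patch_nulls = self.values.iter().filter(|v| v.is_none()).count();
        patch_nulls + if self.fill.is_none() { self.unpatched() } else { 0 }
    }

    /// nan_count(patch_values) + (fill NaN ? N - P : 0)
    pub fn nan_count(&self) -> usize {
        let patch_nans = self.values.iter().filter(|v| is_nan(v)).count();
        patch_nans + if is_nan(&self.fill) { self.unpatched() } else { 0 }
    }

    /// Fold min/max over the patch values, including the fill only when
    /// it is reachable (P < N) and non-null. Nulls and NaNs are skipped
    /// (the NaN handling is an assumption of this sketch).
    pub fn min_max(&self) -> Option<(f64, f64)> {
        let fill = if self.unpatched() > 0 { self.fill } else { None };
        self.values
            .iter()
            .copied()
            .chain(std::iter::once(fill))
            .flatten()
            .filter(|x| !x.is_nan())
            .fold(None, |acc, x| match acc {
                None => Some((x, x)),
                Some((lo, hi)) => Some((lo.min(x), hi.max(x))),
            })
    }
}
```

All three touch only the P patch values plus the single fill scalar, which is where the O(num_patches) bound comes from.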

Filter pushdowns (wired via PARENT_KERNELS):
- BetweenKernel:  range predicate with constant bounds → Sparse<Bool>, same
  shape as the compare kernel.
- FillNullKernel:  replaces null fill/patches with the constant, stays sparse.
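
Both filter pushdowns follow the same shape: apply a function once to the fill and once per patch, keeping the patch positions. A minimal sketch on a hypothetical `ToySparseOpt` model (not the vortex-sparse API) follows. Inclusive bounds and null-in, null-out semantics for `between` are assumptions of this sketch.

```rust
/// Hypothetical toy model of a nullable Sparse array; `None` is null.
/// Not the vortex-sparse API.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySparseOpt<T> {
    pub len: usize,
    pub fill: Option<T>,
    pub indices: Vec<usize>,
    pub values: Vec<Option<T>>,
}

impl<T: Copy> ToySparseOpt<T> {
    /// Apply `f` once to the fill and once per patch: O(P) work, and the
    /// result keeps the same patch positions, so it stays sparse.
    fn map<U>(&self, f: impl Fn(Option<T>) -> Option<U>) -> ToySparseOpt<U> {
        ToySparseOpt {
            len: self.len,
            fill: f(self.fill),
            indices: self.indices.clone(),
            values: self.values.iter().map(|&v| f(v)).collect(),
        }
    }

    /// O(N) baseline: materialize every element.
    pub fn canonical(&self) -> Vec<Option<T>> {
        let mut out = vec![self.fill; self.len];
        for (&i, &v) in self.indices.iter().zip(&self.values) {
            out[i] = v;
        }
        out
    }
}

impl ToySparseOpt<i32> {
    /// `lo <= x <= hi` with constant bounds; null inputs stay null.
    pub fn between(&self, lo: i32, hi: i32) -> ToySparseOpt<bool> {
        self.map(|v| v.map(|x| lo <= x && x <= hi))
    }

    /// Replace a null fill and null patches with the constant `value`.
    pub fn fill_null(&self, value: i32) -> ToySparseOpt<i32> {
        self.map(|v| Some(v.unwrap_or(value)))
    }
}
```

Because the output reuses the input's indices, a downstream consumer that understands sparse masks never has to see the N-element form.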

MinMax and NullCount in particular are the zone-map/pruning kernels that Dict
and RunEnd already had and Sparse lacked.

Deliberately skipped: Mask (a dense mask masks unpatched fill positions, so the
result can't stay sparse), IsSorted (rarely true for sparse, position-dependent
and error-prone), Like/Zip/ListContainsElement (niche string/list cases), and
Mean (already free via Combined<Sum, Count>).

Benches: added sparse_min_max and sparse_null_count alongside the existing
three (skipping between/fill_null/nan_count, which mirror compare/null_count
cost profiles). All five single-config, ~50-80µs.

Tests compare each kernel against the canonical baseline (aggregates via an
unregistered session; parent kernels by canonicalizing the input first).

Signed-off-by: Claude <noreply@anthropic.com>
2 participants