Add Sparse pushdown kernels for is_constant, sum, and compare#8028
Open
joseph-isaacs wants to merge 3 commits into
Sparse arrays previously had no aggregate or compare pushdown, so
`is_constant`, `sum`, and `<column> op <scalar>` on a Sparse column all
fell through to full canonical materialization — O(N) work regardless of
patch density.
Each new kernel pushes the operation into the patches:
- `SparseIsConstantKernel` checks `is_constant(patch_values)` and
whether the common patch value equals the fill scalar.
- `SparseSumKernel` folds `fill * (N - P) + sum(patch_values)` through
the existing `Sum` accumulator so overflow saturation is preserved.
- `CompareKernel for Sparse` maps a constant-RHS comparison through
`patches.map_values` and rebuilds a `Sparse<Bool>` with `scalar_cmp`
applied to the fill, preserving downstream sparsity (the filter
parent kernel already handles `Sparse<Bool>` masks).
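The two aggregate kernels above can be sketched on a toy model. This is a minimal illustration, not the `vortex-sparse` code: `ToySparse`, `is_constant_sparse`, and `sum_sparse` are hypothetical names, nulls are ignored, and overflow is reported as `None` as a stand-in for whatever the real `Sum` accumulator does on overflow.

```rust
/// Hypothetical stand-in for a Sparse array: one fill value plus patches.
/// Assumed invariants: patch indices are unique and each is < len.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySparse<T> {
    pub len: usize,
    pub fill: T,
    pub patches: Vec<(usize, T)>,
}

/// O(P) constant check: all patch values agree, and they equal the fill
/// unless the patches cover every position (fill unreachable).
pub fn is_constant_sparse<T: PartialEq>(a: &ToySparse<T>) -> bool {
    // No patches: every element is the fill value.
    let Some((_, first)) = a.patches.first() else {
        return true;
    };
    if !a.patches.iter().all(|(_, v)| v == first) {
        return false;
    }
    a.patches.len() == a.len || *first == a.fill
}

/// O(P) sum: fill * (N - P) + sum(patch_values), widened to i64.
/// Returns None on overflow (an assumption made for this sketch).
pub fn sum_sparse(a: &ToySparse<i32>) -> Option<i64> {
    let unpatched = i64::try_from(a.len - a.patches.len()).ok()?;
    let fill_part = i64::from(a.fill).checked_mul(unpatched)?;
    a.patches
        .iter()
        .try_fold(fill_part, |acc, (_, v)| acc.checked_add(i64::from(*v)))
}
```

Neither function touches the `N - P` unpatched positions individually, which is where the O(P) bound comes from.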
All three are O(P) instead of O(N). Benchmarks on a 1M-element Sparse i32
with non-null fill show:
- `is_constant`: 78-93x speedup (137us -> 1.7us at P=10..1000)
- `sum`: 109-581x speedup (768us -> 1.3us at P=10)
- `compare`: 19-84x speedup (777us -> 9us at P=10 with downstream
canonicalization; bigger when consumers stay sparse)
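The gap between the sparse and canonicalized compare results can be seen in a toy model of the compare pushdown. This is a sketch under assumptions: `ToySparse`, `CmpOp`, `toy_scalar_cmp`, `compare_sparse`, and `to_dense` are hypothetical names rather than Vortex APIs, and nulls are not modeled.

```rust
/// Hypothetical stand-in for a Sparse array: one fill value plus patches.
/// Assumed invariants: patch indices are unique and each is < len.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySparse<T> {
    pub len: usize,
    pub fill: T,
    pub patches: Vec<(usize, T)>,
}

/// Comparison operators for the sketch (hypothetical enum).
#[derive(Debug, Clone, Copy)]
pub enum CmpOp {
    Eq,
    NotEq,
    Lt,
    Lte,
    Gt,
    Gte,
}

/// Compare two scalars under the given operator.
fn toy_scalar_cmp(lhs: i32, op: CmpOp, rhs: i32) -> bool {
    match op {
        CmpOp::Eq => lhs == rhs,
        CmpOp::NotEq => lhs != rhs,
        CmpOp::Lt => lhs < rhs,
        CmpOp::Lte => lhs <= rhs,
        CmpOp::Gt => lhs > rhs,
        CmpOp::Gte => lhs >= rhs,
    }
}

/// O(P) compare against a constant: apply the comparison once to the fill
/// and once per patch value, keeping the patch indices unchanged.
pub fn compare_sparse(a: &ToySparse<i32>, op: CmpOp, rhs: i32) -> ToySparse<bool> {
    ToySparse {
        len: a.len,
        fill: toy_scalar_cmp(a.fill, op, rhs),
        patches: a
            .patches
            .iter()
            .map(|&(i, v)| (i, toy_scalar_cmp(v, op, rhs)))
            .collect(),
    }
}

/// O(N) materialization, only needed when a consumer cannot stay sparse.
pub fn to_dense<T: Clone>(a: &ToySparse<T>) -> Vec<T> {
    let mut out = vec![a.fill.clone(); a.len];
    for (i, v) in &a.patches {
        out[*i] = v.clone();
    }
    out
}
```

In this model `compare_sparse` alone is O(P); the O(N) cost only appears if a consumer calls something like `to_dense` on the boolean result.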
Aggregate kernels are wired through the session-scoped registry via a
new `vortex_sparse::initialize` (called from `vortex-file`'s default
encodings). Compare is wired through `PARENT_KERNELS` so it fires
during `execute_parent` on `ScalarFn(Binary, cmp)` nodes whose child is
Sparse.
Signed-off-by: Claude <noreply@anthropic.com>
CodSpeed tracked 24 entries (canonical+kernel × 4 args × 3 ops). Collapse to
exactly three benchmarks, one per kernel with a single config each, sized so
each lands in the ~10-100µs range:
- sparse_is_constant: ~87µs (150k constant patches, worst case: full scan)
- sparse_sum: ~33µs (100k patches)
- sparse_compare: ~41µs (10k patches, materialized result)
The canonical baselines are dropped; CodSpeed only needs to track the
kernel path going forward.
Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will improve performance by 19.73%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 224.6 µs | 187.6 µs | +19.73% |
| 🆕 | Simulation | sparse_null_count | N/A | 34.5 µs | N/A |
| 🆕 | Simulation | sparse_compare | N/A | 393.2 µs | N/A |
| 🆕 | Simulation | sparse_min_max | N/A | 233.5 µs | N/A |
| 🆕 | Simulation | sparse_is_constant | N/A | 250.7 µs | N/A |
| 🆕 | Simulation | sparse_sum | N/A | 455.6 µs | N/A |
Comparing claude/sparse-pushdown-kernels-qxU4x (56d5689) with develop (f97805d)
Extends the Sparse pushdown set with the remaining high-value kernels, all
O(num_patches) instead of O(N):
Aggregates (session-registered in `initialize`):
- SparseMinMaxKernel: folds min/max(patch_values) with the fill scalar
when the fill is reachable (P < N) and non-null.
- SparseNullCountKernel: null_count(patch_values) + (fill null ? N-P : 0);
O(1) when the patch null-count stat is cached.
- SparseNanCountKernel: nan_count(patch_values) + (fill NaN ? N-P : 0);
declines for non-float dtypes.
Filter pushdowns (wired via PARENT_KERNELS):
- BetweenKernel: range predicate with constant bounds → Sparse<Bool>,
same shape as the compare kernel.
- FillNullKernel: replaces null fill/patches with the constant, stays
sparse.
MinMax and NullCount in particular are the zone-map/pruning kernels that
Dict and RunEnd already had and Sparse lacked.
Deliberately skipped:
- Mask: a dense mask masks unpatched fill positions, so the result can't
stay sparse.
- IsSorted: rarely true for sparse, position-dependent and error-prone.
- Like/Zip/ListContainsElement: niche string/list cases.
- Mean: already free via Combined<Sum, Count>.
Benches: added sparse_min_max and sparse_null_count alongside the existing
three (skipping between/fill_null/nan_count, which mirror
compare/null_count cost profiles). All five single-config, ~50-80µs.
Tests compare each kernel against the canonical baseline (aggregates via an
unregistered session; parent kernels by canonicalizing the input first).
Signed-off-by: Claude <noreply@anthropic.com>
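The three aggregate formulas in this commit message can be sketched on a toy nullable model. The names here (`ToySparse`, `null_count`, `nan_count`, `min_max`) are hypothetical and only illustrate the arithmetic; they are not the kernel implementations, and cached statistics are not modeled.

```rust
/// Hypothetical nullable Sparse model: `None` stands for a null value.
/// Assumed invariants: patch indices are unique and each is < len.
pub struct ToySparse<T> {
    pub len: usize,
    pub fill: Option<T>,
    pub patches: Vec<(usize, Option<T>)>,
}

/// null_count(patch_values) + (fill null ? N - P : 0)
pub fn null_count<T>(a: &ToySparse<T>) -> usize {
    let patch_nulls = a.patches.iter().filter(|(_, v)| v.is_none()).count();
    let unpatched = a.len - a.patches.len();
    patch_nulls + if a.fill.is_none() { unpatched } else { 0 }
}

/// nan_count(patch_values) + (fill NaN ? N - P : 0), floats only.
pub fn nan_count(a: &ToySparse<f64>) -> usize {
    let patch_nans = a
        .patches
        .iter()
        .filter(|(_, v)| matches!(v, Some(x) if x.is_nan()))
        .count();
    let unpatched = a.len - a.patches.len();
    let fill_is_nan = matches!(a.fill, Some(x) if x.is_nan());
    patch_nans + if fill_is_nan { unpatched } else { 0 }
}

/// Min and max over the non-null patch values, folding in the fill only
/// when it is reachable (P < N) and non-null. Returns None if every
/// value is null or the array is empty.
pub fn min_max(a: &ToySparse<i32>) -> Option<(i32, i32)> {
    let fill_reachable = a.patches.len() < a.len;
    let fill = if fill_reachable { a.fill } else { None };
    a.patches
        .iter()
        .filter_map(|(_, v)| *v)
        .chain(fill)
        .fold(None, |acc, v| match acc {
            None => Some((v, v)),
            Some((lo, hi)) => Some((lo.min(v), hi.max(v))),
        })
}
```

The `fill_reachable` check matters when patches cover every position: in that case the fill value never appears in the logical array and must not influence the result.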