
vortex-row: compute_sizes helper and RowSize ScalarFn #7991

Closed
joseph-isaacs wants to merge 1 commit into claude/row-c05-codec-nested from claude/row-c06-rowsize-scalarfn

@joseph-isaacs joseph-isaacs commented May 18, 2026

Part 6 of 25 in the stacked PR series adding vortex-row.

This PR contains exactly one commit; review just that diff in isolation.

What this commit does

Adds the size-pass machinery used by both `RowSize` and the upcoming `RowEncode` pipeline. `compute_sizes` walks the N input columns once, classifying each via `row_width_for_dtype` and accumulating fixed-width-prefix sums in `fixed_per_row` while pushing per-row sums of variable-length columns into a lazily allocated `var_lengths` vec.

The classification result (`ColKind` + `SizePassResult`) is private to the crate; `RowEncode` consumes it in a later commit to choose between the arithmetic and cursor encode paths.

`RowSize` returns a `Struct { fixed: U32, var: U32 }` so callers can read the per-row width without realizing the constant `fixed` slot as a per-row buffer.

`dispatch_size` is the fallback-only path here (canonicalize, then `codec::field_size`). The `RowSizeKernel` trait exists but is unused; per-encoding fast paths and the inventory registry arrive in PR 3.

`initialize()` does NOT register `RowSize` yet; that lands once `RowEncode` is in place.
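As a rough illustration of the size pass described above, here is a minimal, self-contained sketch. It is not the code in this commit: `Column`, `SizePass`, and `compute_sizes_sketch` are hypothetical names. The sketch assumes each column already knows its encoded widths (the real code classifies via `row_width_for_dtype`), and it collapses the fixed side to a single per-row total rather than tracking any prefix bookkeeping.

```rust
/// Hypothetical stand-in for an input column: either every row encodes to the
/// same width, or each row carries its own encoded length.
enum Column {
    Fixed { width: u32 },
    Var { lengths: Vec<u32> },
}

/// Hypothetical result shape; the crate-private `SizePassResult` may differ.
#[derive(Debug, PartialEq)]
struct SizePass {
    /// Sum of the widths of all fixed-width columns (identical for every row).
    fixed_per_row: u32,
    /// Per-row sum of variable-length widths; `None` if no varlen column was seen.
    var_lengths: Option<Vec<u32>>,
}

/// Walk the columns once, summing fixed widths into one scalar and
/// variable widths into a per-row vector that is allocated on first use.
fn compute_sizes_sketch(columns: &[Column], num_rows: usize) -> SizePass {
    let mut fixed_per_row = 0u32;
    let mut var_lengths: Option<Vec<u32>> = None;

    for col in columns {
        match col {
            Column::Fixed { width } => fixed_per_row += *width,
            Column::Var { lengths } => {
                assert_eq!(lengths.len(), num_rows, "column length mismatch");
                // Lazy allocation: the vector only exists once a varlen column appears.
                let acc = var_lengths.get_or_insert_with(|| vec![0; num_rows]);
                for (total, len) in acc.iter_mut().zip(lengths) {
                    *total += *len;
                }
            }
        }
    }

    SizePass { fixed_per_row, var_lengths }
}
```

With columns `Fixed(5)`, `Var([3, 0, 7])`, `Fixed(9)`, `Var([1, 1, 1])` over three rows, this sketch yields `fixed_per_row == 14` and `var_lengths == Some([4, 1, 8])`. With only fixed-width columns, `var_lengths` stays `None` and no per-row buffer is allocated.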

Stack

| # | PR | Title | Branch |
|---|----|-------|--------|
| 1 | #7986 | vortex-row: crate scaffolding | claude/row-c01-crate-scaffolding |
| 2 | #7987 | vortex-row: add SortField and RowEncodeOptions | claude/row-c02-sortfield-options |
| 3 | #7988 | vortex-row: codec for fixed-width canonical types | claude/row-c03-codec-fixed-width |
| 4 | #7989 | vortex-row: codec for varlen canonical types | claude/row-c04-codec-varlen |
| 5 | #7990 | vortex-row: codec for nested canonical types | claude/row-c05-codec-nested |
| 6 | #7991 | vortex-row: compute_sizes helper and RowSize ScalarFn | claude/row-c06-rowsize-scalarfn |
| 7 | #7992 | vortex-row: RowEncode ScalarFn | claude/row-c07-rowencode-scalarfn |
| 8 | #7993 | vortex-row: convert_columns + tests + bench scaffolding | claude/row-c08-convert-columns-tests-bench |
| 9 | #7994 | Skip ListView validation in row encoder output | claude/row-c09-skip-listview-validation |
| 10 | #7995 | Add validity fast-path helper for the four pattern-matching encoders | claude/row-c10-validity-fast-path |
| 11 | #7996 | Skip zero-init of output buffer | claude/row-c11-skip-zero-init |
| 12 | #7997 | Auto-vectorize pure-fixed offsets construction | claude/row-c12-vectorize-pure-fixed-offsets |
| 13 | #7998 | Auto-vectorize mixed-path offsets construction | claude/row-c13-vectorize-mixed-offsets |
| 14 | #7999 | Rewrite varlen 32-byte block encoder with copy_nonoverlapping | claude/row-c14-varlen-block-copy-nonoverlapping |
| 15 | #8000 | Walk VarBinView rows directly in row encoder hot loop | claude/row-c15-walk-varbinview-directly |
| 16 | #8001 | Add arithmetic-write fast path for fixed-before-varlen columns | claude/row-c16-arith-write-fast-path |
| 17 | #8002 | Specialize Constant for the arithmetic-write fast path | claude/row-c17-specialize-constant-arith |
| 18 | #8003 | RowSizeKernel and RowEncodeKernel dispatch helpers | claude/row-c18-kernel-dispatch-helpers |
| 19 | #8004 | Inventory-based registry for downstream encoding kernels | claude/row-c19-inventory-registry |
| 20 | #8005 | Constant row-encode kernel | claude/row-c20-constant-kernel |
| 21 | #8006 | Dict row-encode kernel | claude/row-c21-dict-kernel |
| 22 | #8007 | Patched row-encode kernel | claude/row-c22-patched-kernel |
| 23 | #8008 | RunEnd row-encode kernel (vortex-runend) | claude/row-c23-runend-kernel |
| 24 | #8009 | BitPacked row-encode kernel (vortex-fastlanes) | claude/row-c24-bitpacked-kernel |
| 25 | #7985 | FoR and Delta row-encode kernels (vortex-fastlanes) | claude/row-pr3-kernels |

Base of this PR: #7990 (claude/row-c05-codec-nested)
Next in stack: #7992 (claude/row-c07-rowencode-scalarfn)

Combined context

For the full design + rationale, see PR #7985 (top of stack).

Add the size-pass machinery used by both RowSize and the upcoming
RowEncode pipeline. `compute_sizes` walks the N input columns once,
classifying each via `row_width_for_dtype` and accumulating
fixed-width-prefix sums in `fixed_per_row` while pushing per-row sums
of variable-length columns into a lazily allocated `var_lengths` vec.

The classification result (`ColKind` + `SizePassResult`) is private to
the crate; RowEncode consumes it in a later commit to choose between
the arithmetic and cursor encode paths.

`RowSize` returns a `Struct { fixed: U32, var: U32 }` so callers can
read the per-row width without realizing the constant `fixed` slot as
a per-row buffer (it's a `ConstantArray`); the `var` slot is a
`ConstantArray(0)` when no varlen column is present.
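The benefit of returning a two-slot struct can be sketched as follows. This is a hypothetical model, not the vortex API: `U32Slot` stands in for an array that is either a constant or a per-row buffer, and `RowSizeSketch` stands in for the `Struct { fixed, var }` result. It also assumes that a row's total width is `fixed + var`.

```rust
/// Hypothetical stand-in for a u32 array: one repeated value, or one value per row.
enum U32Slot {
    /// Constant storage regardless of row count (plays the role of a `ConstantArray`).
    Constant { value: u32, len: usize },
    /// Materialized per-row buffer.
    PerRow(Vec<u32>),
}

impl U32Slot {
    /// Read the value for one row without materializing a constant slot.
    fn get(&self, row: usize) -> u32 {
        match self {
            U32Slot::Constant { value, len } => {
                assert!(row < *len, "row out of bounds");
                *value
            }
            U32Slot::PerRow(values) => values[row],
        }
    }
}

/// Hypothetical model of the `Struct { fixed: U32, var: U32 }` result.
struct RowSizeSketch {
    fixed: U32Slot,
    var: U32Slot,
}

impl RowSizeSketch {
    /// Build the result from a size pass: `fixed` is always constant, and
    /// `var` is a constant zero when no varlen column was present.
    fn new(fixed_per_row: u32, var_lengths: Option<Vec<u32>>, num_rows: usize) -> Self {
        let fixed = U32Slot::Constant { value: fixed_per_row, len: num_rows };
        let var = match var_lengths {
            Some(lengths) => U32Slot::PerRow(lengths),
            None => U32Slot::Constant { value: 0, len: num_rows },
        };
        RowSizeSketch { fixed, var }
    }

    /// Total width of one row, assuming total = fixed + var.
    fn row_width(&self, row: usize) -> u32 {
        self.fixed.get(row) + self.var.get(row)
    }
}
```

In this model, a batch with only fixed-width columns carries two constants regardless of row count, and a per-row buffer appears only when a varlen column contributes lengths.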

`dispatch_size` is the fallback-only path for PR 1 (canonicalize, then
codec::field_size). The `RowSizeKernel` trait exists but is unused; per-
encoding fast paths and the inventory registry arrive in PR 3.

`initialize()` does NOT register RowSize yet - that lands once
RowEncode is in place, so the session-registered pair appears together.

Signed-off-by: Claude <noreply@anthropic.com>

codspeed-hq Bot commented May 18, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
❌ 3 regressed benchmarks
✅ 1216 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 197.6 µs | 161.9 µs | +22.08% |
| Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 358.2 µs | 325 µs | +10.19% |
| Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 187.6 µs | 224.9 µs | -16.56% |
| Simulation | `new_alp_prim_test_between[f32, 16384]` | 103.8 µs | 118.3 µs | -12.3% |
| Simulation | `new_alp_prim_test_between[f32, 32768]` | 153.1 µs | 182.1 µs | -15.89% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/row-c06-rowsize-scalarfn (5374f3b) with claude/row-c05-codec-nested (570d358)

