
Array VTables Migration 1a #7181

Merged
gatesn merged 4 commits into develop from ngates/array-vtables-1a on Mar 26, 2026


gatesn (Contributor) commented Mar 26, 2026

This introduces the end-state struct Array<V> per the vtables design doc: https://docs.vortex.dev/developer-guide/internals/vtables

It keeps the old ArrayAdapter around for now, to be removed at a later date.

Array VTable Migration Plan

Context

The Array vtable system is the last major plugin point that hasn't been migrated to the unified
vtable pattern described in docs/developer-guide/internals/vtables.md. ScalarFn, AggregateFn,
and ExtDType already follow the target pattern. Array is the largest migration (~40 encodings
across vortex-array, encodings/*, vortex-python, vortex-cuda).

Current state: VTable trait + ArrayAdapter<V> (#[repr(transparent)]) + DynArray public
sealed trait + vtable! macro. Each concrete array struct (e.g. PrimitiveArray) owns dtype, len,
stats and derefs to dyn DynArray via unsafe casts through ArrayAdapter.

Target state: ArrayVTable trait + Array<V> generic data struct (owns dtype, len, stats,
derefs to V::Array) + sealed DynArray thin forwarder + ArrayRef newtype. Follows the
FooRef → DynFoo → Foo<V> → FooVTable dispatch chain.


Phase 1: Introduce Array<V>, rename trait, change signatures

Goal: Replace the core VTable machinery in one atomic change. All encodings switch from
ArrayAdapter<V> to Array<V> at once since VTable method signatures change from &Self::Array
to &Array<Self>.

This is a large but mechanical change — each encoding's update follows a predictable recipe.

1.1 Rename VTable → ArrayVTable

  • vortex-array/src/vtable/mod.rs: rename trait
  • Codebase-wide find-and-replace: impl VTable for → impl ArrayVTable for, V: VTable →
    V: ArrayVTable, etc.

1.2 Define Array<V: ArrayVTable>

New file vortex-array/src/vtable/typed.rs:

pub struct Array<V: ArrayVTable> {
    vtable: V,
    dtype: DType,
    len: usize,
    array: V::Array,    // encoding-specific data
    stats: ArrayStats,
}

impl<V: ArrayVTable> Deref for Array<V> {
    type Target = V::Array;
    fn deref(&self) -> &V::Array { &self.array }
}

Inherent methods on Array<V>: dtype(), len(), is_empty(), statistics(), encoding_id(),
into_inner(), constructors.
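A minimal sketch of the Array<V> shape and its inherent methods, for illustration only. DType, ArrayStats, the id() method on the trait, the Primitive/PrimitiveData pair and the "example.primitive" string are simplified stand-ins assumed for this example, not Vortex's real definitions.

```rust
use std::ops::Deref;

// Toy stand-ins for the real vortex types (assumptions for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum DType {
    I32,
    Utf8,
}

/// Placeholder for the real statistics cache (assumption).
#[derive(Debug, Default)]
pub struct ArrayStats;

pub trait ArrayVTable: Sized + 'static {
    /// Encoding-specific data, e.g. buffers and validity.
    type Array;
    /// Hypothetical id accessor used by `encoding_id()` below.
    fn id(&self) -> &'static str;
}

/// Typed array: common fields live here, encoding data lives in `array`.
pub struct Array<V: ArrayVTable> {
    vtable: V,
    dtype: DType,
    len: usize,
    array: V::Array,
    stats: ArrayStats,
}

impl<V: ArrayVTable> Array<V> {
    /// Hypothetical constructor; a real one would validate `len` against the data.
    pub fn new(vtable: V, dtype: DType, len: usize, array: V::Array) -> Self {
        Self { vtable, dtype, len, array, stats: ArrayStats::default() }
    }
    pub fn dtype(&self) -> &DType { &self.dtype }
    pub fn len(&self) -> usize { self.len }
    pub fn is_empty(&self) -> bool { self.len == 0 }
    pub fn statistics(&self) -> &ArrayStats { &self.stats }
    pub fn encoding_id(&self) -> &'static str { self.vtable.id() }
    pub fn into_inner(self) -> V::Array { self.array }
}

impl<V: ArrayVTable> Deref for Array<V> {
    type Target = V::Array;
    fn deref(&self) -> &V::Array { &self.array }
}

// Example encoding: a zero-sized vtable plus its encoding-specific data.
pub struct Primitive;
pub struct PrimitiveData {
    pub values: Vec<i32>,
}

impl ArrayVTable for Primitive {
    type Array = PrimitiveData;
    fn id(&self) -> &'static str { "example.primitive" }
}
```

Because of the Deref impl, `a.values` on an `Array<Primitive>` reaches the encoding data directly, while `a.len()` and `a.dtype()` come from the shared fields.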

1.3 Change ArrayVTable method signatures

All VTable methods change from &Self::Array to &Array<Self>:

// Before
fn len(array: &Self::Array) -> usize;
fn dtype(array: &Self::Array) -> &DType;

// After
fn len(array: &Array<Self>) -> usize;
fn dtype(array: &Array<Self>) -> &DType;

Since Array<V> derefs to V::Array, encoding impls that access array.buffer etc. continue
to work unchanged via deref. The method signature annotation is the only change in most cases.

Remove fn vtable(array: &Self::Array) -> &Self — vtable now stored in Array<V>.
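To illustrate why most encoding bodies survive the signature change unchanged, here is a hedged sketch: the vtable method takes `&Array<Self>`, and field accesses resolve through Deref to the inner data. PType, PrimitiveData and its fields are hypothetical simplifications, and the `vtable()` accessor on Array<V> stands in for the removed `fn vtable`.

```rust
use std::ops::Deref;

// Toy primitive type (assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PType {
    I32,
    I64,
}

impl PType {
    pub fn byte_width(self) -> usize {
        match self {
            PType::I32 => 4,
            PType::I64 => 8,
        }
    }
}

pub trait ArrayVTable: Sized + 'static {
    type Array;
    // Methods receive the full typed handle, not just the inner data.
    fn len(array: &Array<Self>) -> usize;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn new(vtable: V, array: V::Array) -> Self { Self { vtable, array } }
    /// Hypothetical replacement for the old `fn vtable(array: &Self::Array) -> &Self`.
    pub fn vtable(&self) -> &V { &self.vtable }
}

impl<V: ArrayVTable> Deref for Array<V> {
    type Target = V::Array;
    fn deref(&self) -> &V::Array { &self.array }
}

pub struct Primitive;
pub struct PrimitiveData {
    pub bytes: Vec<u8>,
    pub ptype: PType,
}

impl ArrayVTable for Primitive {
    type Array = PrimitiveData;
    fn len(array: &Array<Self>) -> usize {
        // `array.bytes` and `array.ptype` resolve through Deref to PrimitiveData,
        // so this body is the same as with the old `&Self::Array` signature.
        array.bytes.len() / array.ptype.byte_width()
    }
}
```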

1.4 Implement DynArray for Array<V>

Replace impl<V: VTable> DynArray for ArrayAdapter<V> with impl<V: ArrayVTable> DynArray for Array<V>. This is the blanket forwarder — all the logic that currently lives in the
ArrayAdapter impl (bounds checking, stat propagation, slice/filter/take wrapping) moves to
Array<V> inherent methods, and the DynArray impl becomes thin forwarders.

Also move these trait impls from ArrayAdapter<V> to Array<V>:

  • ArrayHash, ArrayEq
  • ArrayVisitor
  • ReduceNode
  • private::Sealed

1.5 Migrate all encodings

Per-encoding recipe (example: Primitive):

  1. Hoist common fields out of PrimitiveArray:

    • Remove dtype: DType → now in Array<V>
    • Remove stats_set: ArrayStats → now in Array<V>
    • PrimitiveArray keeps only: buffer: BufferHandle, validity: Validity
  2. Update VTable impl signatures:

    impl ArrayVTable for Primitive {
        fn len(array: &Array<Self>) -> usize {
            array.buffer_handle().len() / array.ptype().byte_width()  // deref to PrimitiveArray
        }
        fn dtype(array: &Array<Self>) -> &DType {
            &array.dtype  // Array<V> inherent field
        }
        fn stats(array: &Array<Self>) -> StatsSetRef<'_> {
            array.stats.to_ref(...)  // Array<V> field
        }
        // ...
    }
  3. Move constructors to vtable ZST:

    • PrimitiveArray::new(buffer, ptype, validity) → Primitive::new(buffer, ptype, validity) -> ArrayRef
    • Or return Array<Primitive> for typed construction
    • Keep PrimitiveArray as the inner type with encoding-specific methods
  4. Remove vtable!(Primitive) — no longer needed. Array<V> handles IntoArray via its
    own impl IntoArray for Array<V>.

  5. Remove PrimitiveArray's Deref to dyn DynArray — this was generated by vtable! and
    is no longer needed. Array<V> implements DynArray directly.
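A hedged sketch of the recipe applied to a toy Primitive encoding: PrimitiveArray keeps only its buffer and validity, the constructor lives on the Primitive ZST and returns the typed Array<Primitive>, and dtype/len live on Array<V>. The ptype() accessor on Array<Primitive>, the (ptype, nullable) DType and the Vec<u8> buffer are assumptions made for this example.

```rust
use std::ops::Deref;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PType {
    I32,
    F64,
}

impl PType {
    pub fn byte_width(self) -> usize {
        match self {
            PType::I32 => 4,
            PType::F64 => 8,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    NonNullable,
    AllValid,
}

/// Toy dtype: (primitive type, nullable). Assumption for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum DType {
    Primitive(PType, bool),
}

pub trait ArrayVTable: Sized {
    type Array;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    dtype: DType,
    len: usize,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn dtype(&self) -> &DType { &self.dtype }
    pub fn len(&self) -> usize { self.len }
}

impl<V: ArrayVTable> Deref for Array<V> {
    type Target = V::Array;
    fn deref(&self) -> &V::Array { &self.array }
}

/// After the hoist: dtype and stats are gone, only encoding-specific state remains.
pub struct PrimitiveArray {
    pub buffer: Vec<u8>,
    pub validity: Validity,
}

pub struct Primitive;

impl ArrayVTable for Primitive {
    type Array = PrimitiveArray;
}

impl Primitive {
    /// Constructor moved from `PrimitiveArray::new` onto the vtable ZST.
    pub fn new(buffer: Vec<u8>, ptype: PType, validity: Validity) -> Array<Primitive> {
        assert_eq!(buffer.len() % ptype.byte_width(), 0, "buffer is not a whole number of elements");
        let nullable = validity != Validity::NonNullable;
        Array {
            vtable: Primitive,
            dtype: DType::Primitive(ptype, nullable),
            len: buffer.len() / ptype.byte_width(),
            array: PrimitiveArray { buffer, validity },
        }
    }
}

impl Array<Primitive> {
    /// Hypothetical accessor: the ptype is now read from the hoisted dtype.
    pub fn ptype(&self) -> PType {
        match self.dtype {
            DType::Primitive(ptype, _) => ptype,
        }
    }
}
```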

1.6 Update Matcher

impl<V: ArrayVTable> Matcher for V {
    type Match<'a> = &'a V::Array;  // keep returning inner type for backward compat

    fn try_match(array: &dyn DynArray) -> Option<&V::Array> {
        DynArray::as_any(array)
            .downcast_ref::<Array<V>>()
            .map(|typed| &typed.array)
    }
}
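The Matcher above can be exercised with a self-contained version. DynArray (reduced to as_any), the Primitive and Bool encodings, and the explicit lifetime on try_match are simplified assumptions; the key point is that the downcast targets Array<V>, the concrete type behind the erased reference.

```rust
use std::any::Any;

pub trait ArrayVTable: Sized + 'static {
    type Array: 'static;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn new(vtable: V, array: V::Array) -> Self {
        Self { vtable, array }
    }
}

/// Erased trait (reduced for this sketch); `as_any` exposes the concrete `Array<V>`.
pub trait DynArray: Any {
    fn as_any(&self) -> &dyn Any;
}

impl<V: ArrayVTable> DynArray for Array<V> {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

pub trait Matcher {
    type Match<'a>;
    fn try_match<'a>(array: &'a dyn DynArray) -> Option<Self::Match<'a>>;
}

impl<V: ArrayVTable> Matcher for V {
    // Backward compatible: callers still receive the inner encoding data.
    type Match<'a> = &'a V::Array;

    fn try_match<'a>(array: &'a dyn DynArray) -> Option<&'a V::Array> {
        array
            .as_any()
            .downcast_ref::<Array<V>>()
            .map(|typed| &typed.array)
    }
}

// Toy encodings used only for this example.
pub struct Primitive;
pub struct PrimitiveData {
    pub values: Vec<i32>,
}
impl ArrayVTable for Primitive {
    type Array = PrimitiveData;
}

pub struct Bool;
pub struct BoolData {
    pub bits: Vec<bool>,
}
impl ArrayVTable for Bool {
    type Array = BoolData;
}
```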

1.7 Remove old machinery

  • Delete ArrayAdapter<V> struct
  • Delete vtable! macro
  • Delete old Matcher for V impl that used ArrayAdapter
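  • Simplify downcast/downcast_owned/upcast_array helpers: no more #[repr(transparent)]
    tricks needed; use the standard, safe Arc::downcast::<Array<V>>() (std defines it on
    Arc<dyn Any + Send + Sync>, so the erased handle must first be viewed as that type)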
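A sketch of an owned downcast with no transmute, assuming a hypothetical into_any_arc helper on DynArray that re-erases the handle as Arc<dyn Any + Send + Sync>, the receiver type std's Arc::downcast is defined on. The type is checked first so that a mismatch can hand back the original Arc<dyn DynArray> unchanged; the encodings are toys.

```rust
use std::any::Any;
use std::sync::Arc;

pub trait ArrayVTable: Sized + Send + Sync + 'static {
    type Array: Send + Sync + 'static;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    len: usize,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn new(vtable: V, len: usize, array: V::Array) -> Self {
        Self { vtable, len, array }
    }
    pub fn inner(&self) -> &V::Array {
        &self.array
    }
}

pub trait DynArray: Any + Send + Sync {
    fn len(&self) -> usize;
    fn as_any(&self) -> &dyn Any;
    /// Hypothetical helper: re-erase an owned handle for `Arc::downcast`.
    fn into_any_arc(self: Arc<Self>) -> Arc<dyn Any + Send + Sync>;
}

impl<V: ArrayVTable> DynArray for Array<V> {
    fn len(&self) -> usize { self.len }
    fn as_any(&self) -> &dyn Any { self }
    fn into_any_arc(self: Arc<Self>) -> Arc<dyn Any + Send + Sync> { self }
}

/// Owned downcast using only safe std APIs; on a type mismatch the
/// original handle is returned unchanged.
pub fn downcast_owned<V: ArrayVTable>(
    array: Arc<dyn DynArray>,
) -> Result<Arc<Array<V>>, Arc<dyn DynArray>> {
    if !array.as_any().is::<Array<V>>() {
        return Err(array);
    }
    Ok(array
        .into_any_arc()
        .downcast::<Array<V>>()
        .unwrap_or_else(|_| unreachable!("type was checked above")))
}

// Toy encodings used only for this example.
pub struct Primitive;
pub struct PrimitiveData {
    pub values: Vec<i32>,
}
impl ArrayVTable for Primitive {
    type Array = PrimitiveData;
}

pub struct Bool;
pub struct BoolData {
    pub bits: Vec<bool>,
}
impl ArrayVTable for Bool {
    type Array = BoolData;
}
```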

Key files modified

| File | Change |
| --- | --- |
| vortex-array/src/vtable/mod.rs | Rename trait, remove vtable! macro, remove helpers |
| vortex-array/src/vtable/typed.rs | NEW: Array struct, inherent methods, DynArray blanket |
| vortex-array/src/vtable/dyn_.rs | Simplify downcast helpers, update DynVTable blanket |
| vortex-array/src/vtable/operations.rs | Change &V::Array → &Array<V> |
| vortex-array/src/vtable/validity.rs | Change &V::Array → &Array<V> |
| vortex-array/src/array/mod.rs | Remove ArrayAdapter, update Matcher, move DynArray impl |
| vortex-array/src/arrays/*/vtable/*.rs | Update all ~20 in-tree encoding VTable impls |
| vortex-array/src/arrays/*/array/*.rs | Hoist dtype/stats from all ~20 inner array types |
| encodings/*/src/*.rs | Update all ~15 external encoding VTable impls |
| vortex-python/src/arrays/py/vtable.rs | Update Python bindings |
| vortex-cuda/src/layout.rs | Update CUDA encoding |

Verification

  • cargo build across entire workspace
  • cargo test across entire workspace
  • cargo clippy --all-targets --all-features
  • cargo +nightly fmt --all
  • cargo xtask public-api (public API changes expected — new Array<V>, renamed trait)

Phase 2: Clean up erased layer

Goal: Migrate ArrayRef from type alias to newtype struct. Move public API from
impl dyn DynArray to impl ArrayRef. Make DynArray private. Introduce ArrayPlugin.

This can be done incrementally after Phase 1.

2.1 Make ArrayRef a newtype

// Before
pub type ArrayRef = Arc<dyn DynArray>;

// After
#[derive(Clone)]
pub struct ArrayRef(Arc<dyn DynArray>);

Add Deref<Target = dyn DynArray> on ArrayRef so existing call sites that use array_ref.len()
etc. continue to work. This provides a compat bridge while we migrate callers.
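A minimal sketch of the newtype plus Deref bridge, assuming a two-method DynArray and a hypothetical Constant array. Method calls such as `arr.len()` auto-deref through ArrayRef, and `&*arr` yields a `&dyn DynArray` for code that still takes the erased trait.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Reduced erased trait (assumption for this sketch).
pub trait DynArray: Send + Sync {
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

/// Phase 2 handle: a newtype instead of `type ArrayRef = Arc<dyn DynArray>`.
#[derive(Clone)]
pub struct ArrayRef(Arc<dyn DynArray>);

impl ArrayRef {
    pub fn new<A: DynArray + 'static>(array: A) -> Self {
        ArrayRef(Arc::new(array))
    }
}

/// Compat bridge: `array_ref.len()` etc. keep compiling while callers migrate.
impl Deref for ArrayRef {
    type Target = dyn DynArray;
    fn deref(&self) -> &Self::Target {
        &*self.0
    }
}

/// Hypothetical array type used only for this example.
pub struct Constant {
    pub len: usize,
}

impl DynArray for Constant {
    fn len(&self) -> usize {
        self.len
    }
}

/// Existing code written against the erased trait.
pub fn describe(array: &dyn DynArray) -> String {
    format!("array of length {}", array.len())
}
```

Cloning the newtype only bumps the Arc's reference count, so it stays as cheap as cloning the old type alias.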

2.2 Move impl dyn DynArray methods to impl ArrayRef

Methods currently on impl dyn DynArray + '_:

  • is::<V>(), as_::<V>(), as_opt::<V>(), try_into::<V>() → impl ArrayRef
  • as_constant(), nbytes(), is_arrow(), is_canonical() → impl ArrayRef
  • with_child() → impl ArrayRef

Remove impl DynArray for Arc<dyn DynArray> forwarding impl (no longer needed with newtype).
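A sketch of the typed helpers moved onto the newtype, with DynArray already reduced to crate-visible plumbing as step 2.3 proposes. Only is, as_opt and as_ are shown; their bodies, the panic message and the Primitive/Bool encodings are assumptions for illustration.

```rust
use std::any::Any;
use std::sync::Arc;

pub trait ArrayVTable: Sized + Send + Sync + 'static {
    type Array: Send + Sync + 'static;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn new(vtable: V, array: V::Array) -> Self {
        Self { vtable, array }
    }
    pub fn inner(&self) -> &V::Array {
        &self.array
    }
}

/// Crate-internal plumbing once DynArray is made private.
pub(crate) trait DynArray: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

impl<V: ArrayVTable> DynArray for Array<V> {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

#[derive(Clone)]
pub struct ArrayRef(Arc<dyn DynArray>);

impl ArrayRef {
    pub fn new<V: ArrayVTable>(array: Array<V>) -> Self {
        ArrayRef(Arc::new(array))
    }

    /// Former `impl dyn DynArray` helpers, now inherent on the newtype.
    pub fn is<V: ArrayVTable>(&self) -> bool {
        self.0.as_any().is::<Array<V>>()
    }

    pub fn as_opt<V: ArrayVTable>(&self) -> Option<&Array<V>> {
        self.0.as_any().downcast_ref::<Array<V>>()
    }

    pub fn as_<V: ArrayVTable>(&self) -> &Array<V> {
        self.as_opt::<V>().expect("array has a different encoding")
    }
}

// Toy encodings used only for this example.
pub struct Primitive;
pub struct PrimitiveData {
    pub values: Vec<i32>,
}
impl ArrayVTable for Primitive {
    type Array = PrimitiveData;
}

pub struct Bool;
pub struct BoolData {
    pub bits: Vec<bool>,
}
impl ArrayVTable for Bool {
    type Array = BoolData;
}
```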

2.3 Make DynArray private

Change DynArray to pub(crate). External callers use ArrayRef methods only.
The sealed trait becomes truly internal plumbing.

2.4 Introduce ArrayPlugin

Replace DynVTable role in registry with ArrayPlugin trait following the pattern from
ScalarFn's plugin:

pub trait ArrayPlugin: 'static + Send + Sync {
    fn id(&self) -> ArrayId;
    fn build(&self, dtype: &DType, len: usize, metadata: &[u8],
             buffers: &[BufferHandle], children: &dyn ArrayChildren,
             session: &VortexSession) -> VortexResult<ArrayRef>;
}

Update ArraySession registry to use ArrayPlugin instead of DynVTableRef.
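A sketch of a plugin registry keyed by encoding id. To keep it short, build() here takes only a length and metadata bytes rather than the full parameter list above, errors are Strings, and ArrayId, ArraySession's internals, ConstantPlugin and the "example.constant" id are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical encoding id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ArrayId(pub &'static str);

pub trait DynArray: Send + Sync {
    fn len(&self) -> usize;
}
pub type ArrayRef = Arc<dyn DynArray>;

pub trait ArrayPlugin: 'static + Send + Sync {
    fn id(&self) -> ArrayId;
    // Reduced signature: the plan's version also takes dtype, buffers, children and a session.
    fn build(&self, len: usize, metadata: &[u8]) -> Result<ArrayRef, String>;
}

#[derive(Default)]
pub struct ArraySession {
    plugins: HashMap<ArrayId, Arc<dyn ArrayPlugin>>,
}

impl ArraySession {
    pub fn register(&mut self, plugin: Arc<dyn ArrayPlugin>) {
        self.plugins.insert(plugin.id(), plugin);
    }

    /// Look up the plugin for `id` and let it deserialize the array.
    pub fn build(&self, id: &ArrayId, len: usize, metadata: &[u8]) -> Result<ArrayRef, String> {
        let plugin = self
            .plugins
            .get(id)
            .ok_or_else(|| format!("unknown encoding {:?}", id))?;
        plugin.build(len, metadata)
    }
}

// Example plugin for a constant-valued encoding; metadata holds the value byte.
struct ConstantArray {
    len: usize,
    value: u8,
}

impl DynArray for ConstantArray {
    fn len(&self) -> usize {
        self.len
    }
}

struct ConstantPlugin;

impl ArrayPlugin for ConstantPlugin {
    fn id(&self) -> ArrayId {
        ArrayId("example.constant")
    }
    fn build(&self, len: usize, metadata: &[u8]) -> Result<ArrayRef, String> {
        let value = *metadata.first().ok_or("missing metadata byte")?;
        Ok(Arc::new(ConstantArray { len, value }))
    }
}
```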

2.5 Update Matcher return type (optional, separate step)

Change Match<'a> from &'a V::Array to &'a Array<V>. This gives callers access to common
fields (dtype, len, stats) directly on the typed handle, plus deref to V::Array for
encoding-specific fields.

This is a breaking change for any code that explicitly annotates the match type, but most code
uses type inference and gets V::Array methods via deref anyway.
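A sketch of the optional Phase 2 return type, assuming a toy DynArray with only as_any and a single Primitive encoding. With `Match<'a> = &'a Array<V>`, the caller gets len() from the shared fields and encoding fields through Deref, so most call sites that relied on inference keep compiling.

```rust
use std::any::Any;
use std::ops::Deref;

pub trait ArrayVTable: Sized + 'static {
    type Array: 'static;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    len: usize,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn new(vtable: V, len: usize, array: V::Array) -> Self {
        Self { vtable, len, array }
    }
    /// Common field, available directly on the typed handle.
    pub fn len(&self) -> usize {
        self.len
    }
}

impl<V: ArrayVTable> Deref for Array<V> {
    type Target = V::Array;
    fn deref(&self) -> &V::Array {
        &self.array
    }
}

pub trait DynArray: Any {
    fn as_any(&self) -> &dyn Any;
}

impl<V: ArrayVTable> DynArray for Array<V> {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

pub trait Matcher {
    type Match<'a>;
    fn try_match<'a>(array: &'a dyn DynArray) -> Option<Self::Match<'a>>;
}

impl<V: ArrayVTable> Matcher for V {
    // Phase 2 variant: return the whole typed handle instead of `&V::Array`.
    type Match<'a> = &'a Array<V>;

    fn try_match<'a>(array: &'a dyn DynArray) -> Option<&'a Array<V>> {
        array.as_any().downcast_ref::<Array<V>>()
    }
}

// Toy encoding used only for this example.
pub struct Primitive;
pub struct PrimitiveData {
    pub values: Vec<i32>,
}
impl ArrayVTable for Primitive {
    type Array = PrimitiveData;
}
```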

Key files modified

| File | Change |
| --- | --- |
| vortex-array/src/array/mod.rs | ArrayRef newtype, move methods, remove DynArray forwarding |
| vortex-array/src/vtable/dyn_.rs | Remove DynVTable, introduce ArrayPlugin |
| vortex-array/src/session/mod.rs | Update registry to use ArrayPlugin |
| vortex-array/src/matcher.rs | Optionally update return type |
| Many call sites | Arc<dyn DynArray> → ArrayRef (mostly handled by newtype + Deref) |

Verification

Same as Phase 1, plus:

  • Verify Python bindings still work (vortex-python)
  • Verify file read/write round-trips (cargo test -p vortex-file)

Risk Areas

  1. Size of Phase 1 — touching ~40 encodings in one PR is large. Consider splitting into
    sub-PRs: (a) infrastructure (Array, trait rename, DynArray blanket), (b) in-tree encodings,
    (c) external encodings. The repo won't compile between (a) and (b).

  2. Unsafe code removal: ArrayAdapter used #[repr(transparent)] for zero-cost transmutes.
    Array<V> has multiple fields so Arc::downcast replaces transmute. Verify no performance
    regression in downcast-heavy paths (e.g., execution engine).

  3. Constructor migration — moving constructors from PrimitiveArray::new() to
    Primitive::new() touches many call sites across the workspace.

  4. DynArray for Arc<dyn DynArray> — removing this forwarding impl in Phase 2 may break
    generic code that passes ArrayRef where &dyn DynArray is expected. Audit usage patterns.

  5. External encoding crates: encodings/* are in-repo but separate crates. They'll need
    coordinated updates in Phase 1.

  6. Python/CUDA bindings — these have special vtable implementations (non-ZST vtables for
    language bindings). Verify Array<V> works correctly with non-ZST V.
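For risk 6, a small check that nothing in the Array<V> shape assumes a zero-sized vtable: ForeignVTable below carries state (a name, standing in for something like a Python class handle), and id() reads it per instance. All names here are hypothetical.

```rust
use std::ops::Deref;
use std::sync::Arc;

pub trait ArrayVTable: Sized + 'static {
    type Array;
    fn id(&self) -> String;
    fn len(array: &Array<Self>) -> usize;
}

pub struct Array<V: ArrayVTable> {
    vtable: V,
    array: V::Array,
}

impl<V: ArrayVTable> Array<V> {
    pub fn new(vtable: V, array: V::Array) -> Self {
        Self { vtable, array }
    }
    pub fn vtable(&self) -> &V {
        &self.vtable
    }
    pub fn encoding_id(&self) -> String {
        self.vtable.id()
    }
}

impl<V: ArrayVTable> Deref for Array<V> {
    type Target = V::Array;
    fn deref(&self) -> &V::Array {
        &self.array
    }
}

/// Non-ZST vtable: each instance carries its own state.
#[derive(Clone)]
pub struct ForeignVTable {
    pub name: Arc<str>,
}

pub struct ForeignData {
    pub rows: Vec<String>,
}

impl ArrayVTable for ForeignVTable {
    type Array = ForeignData;
    fn id(&self) -> String {
        format!("foreign.{}", self.name)
    }
    fn len(array: &Array<Self>) -> usize {
        array.rows.len()
    }
}
```

Since the vtable is stored by value in Array<V>, a stateful V only costs its own size per array; a shared Arc field, as here, keeps that to one pointer-sized handle.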

gatesn added 2 commits March 26, 2026 11:54
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn added the changelog/chore A trivial change label Mar 26, 2026
@gatesn gatesn requested a review from a10y March 26, 2026 15:56
@gatesn gatesn enabled auto-merge (squash) March 26, 2026 15:57
gatesn added 2 commits March 26, 2026 11:57
Signed-off-by: Nicholas Gates <nick@nickgates.com>
codspeed-hq bot commented Mar 26, 2026

Merging this PR will degrade performance by 10.18%

❌ 1 regressed benchmark
✅ 1105 untouched benchmarks
⏩ 1522 skipped benchmarks[1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | chunked_varbinview_into_canonical[(10, 1000)] | 3.4 ms | 3.8 ms | -10.18% |

Comparing ngates/array-vtables-1a (818b5cd) with develop (1e0e6d0)[2]


Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

  2. No successful run was found on develop (be4761a) during the generation of this report, so 1e0e6d0 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

a10y (Contributor) commented Mar 26, 2026

Why do we want to do this?

gatesn (Contributor, Author) commented Mar 26, 2026

I've put a bit of a description in the PR now.

@gatesn gatesn changed the title Array VTables Migration Array VTables Migration 1 Mar 26, 2026
@gatesn gatesn changed the title Array VTables Migration 1 Array VTables Migration 1a Mar 26, 2026
@gatesn gatesn merged commit 656b3fe into develop Mar 26, 2026
63 of 64 checks passed
@gatesn gatesn deleted the ngates/array-vtables-1a branch March 26, 2026 17:13
gatesn added a commit that referenced this pull request Mar 26, 2026
See #7181

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
gatesn added a commit that referenced this pull request Mar 26, 2026
See #7181

Signed-off-by: Nicholas Gates <nick@nickgates.com>
gatesn added a commit that referenced this pull request Mar 26, 2026
See #7181

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>