
Conversation

Contributor

@joseph-isaacs joseph-isaacs commented Aug 21, 2025

Vortex Vector Pipeline: Stepped Execution Engine for Columnar Compute

This PR introduces a vectorized pipeline execution engine for Vortex that processes data in fixed-size chunks to maximize cache locality and enable efficient query planning. The design
implements a DAG-based execution model where compute operations are broken into simple, composable kernels that can be optimized at planning time and executed efficiently at runtime.

This should be the new way of implementing compute functions going forward.

Solution: Stepped Execution with Cache-Resident Chunks

Core Concept

Instead of processing entire arrays operation by operation, we process small chunks through the entire operation pipeline while data remains cache-resident:

chunk[0..1024]
→ filter → expr_1 → expr_2 → expr_3 → output[0..n]
chunk[1024..2048]
→ filter → expr_1 → expr_2 → expr_3 → output[n..m]
...

Each 1024-element chunk flows through all operations while staying in L1 cache, dramatically improving memory bandwidth utilization.
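
To make this concrete, here is a minimal sketch (plain Rust slices stand in for Vortex vectors; the functions and constants are illustrative only, not this PR's API) of array-at-a-time evaluation versus stepped, chunk-at-a-time evaluation:

const CHUNK_LEN: usize = 1024;

// Array-at-a-time: every operator materialises a full-length intermediate,
// which falls out of cache before the next operator reads it.
fn eval_array_at_a_time(input: &[i64]) -> Vec<bool> {
    let decoded: Vec<i64> = input.iter().map(|x| x + 10).collect(); // expr_1
    decoded.iter().map(|&x| x > 42).collect()                       // expr_2
}

// Stepped: each chunk flows through every operator while it is still
// cache-resident; only chunk-sized intermediates are needed.
fn eval_stepped(input: &[i64]) -> Vec<bool> {
    let mut out = Vec::with_capacity(input.len());
    let mut scratch = [0i64; CHUNK_LEN];
    for chunk in input.chunks(CHUNK_LEN) {
        let scratch = &mut scratch[..chunk.len()];
        for (s, x) in scratch.iter_mut().zip(chunk) {
            *s = x + 10;                                            // expr_1
        }
        out.extend(scratch.iter().map(|&x| x > 42));                // expr_2
    }
    out
}

Both functions compute the same result; the stepped version simply reorders the work so the intermediate never leaves L1.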

Concepts

  1. Kernel Interface

Each kernel implements a single, simple operation:

  pub trait Kernel {
      fn step(
          &mut self,
          ctx: &KernelContext,      // Access to child vectors
          selected: BitView,         // Which elements to process
          out: &mut ViewMut,         // Output buffer
      ) -> VortexResult<()>;
  }

This simplicity makes kernels:

  • Easy to implement correctly
  • Easy to optimize (vectorization, SIMD)
  • Easy to test in isolation
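
As an illustration of how small a kernel can be, a hypothetical unary negate kernel might look like the following (the `child` field and its index type are assumptions; the accessors mirror the compare kernel shown further down):

struct NegateI64Kernel {
    // Index of the input vector in the kernel context (assumed index type).
    child: usize,
}

impl Kernel for NegateI64Kernel {
    fn step(
        &mut self,
        ctx: &KernelContext,
        _selected: BitView,
        out: &mut ViewMut,
    ) -> VortexResult<()> {
        let child_vec = ctx.vector(self.child);
        let input = child_vec.as_slice::<i64>();
        let output = out.as_slice_mut::<i64>();
        for (dst, src) in output.iter_mut().zip(input) {
            *dst = -*src;
        }
        Ok(())
    }
}
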
  2. Physical Type System (VType)

Operations work on canonical physical representations, not logical types:

  • Bool - Byte-sized booleans (for SIMD efficiency)
  • Primitive(PType) - Native numeric types (and decimals)
  • Binary - Variable-length data with 16-byte views

This eliminates encoding complexity from kernel implementations.
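
A sketch of what this physical type enum might look like (variant payloads beyond those listed above are assumptions):

pub enum VType {
    // Byte-sized booleans (one bool per byte) so filters and comparisons can
    // use straightforward SIMD over byte lanes rather than bit manipulation.
    Bool,
    // Native numeric types (and decimals), identified by their PType.
    Primitive(PType),
    // Variable-length binary/string data stored behind 16-byte views.
    Binary,
}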

  3. Compile-Time Dispatch

Operations use trait-based dispatch for zero-overhead abstraction:

impl<T: Element + NativePType> Kernel for ComparePrimitiveKernel<T> {
    fn step(
        &mut self,
        ctx: &KernelContext,
        selected: BitView,
        out: &mut ViewMut,
    ) -> VortexResult<()> {
        let lhs_vec = ctx.vector(self.lhs);
        let lhs = lhs_vec.as_slice::<T>();
        let rhs_vec = ctx.vector(self.rhs);
        let rhs = rhs_vec.as_slice::<T>();
        let bools = out.as_slice_mut::<bool>();

        assert_eq!(
            lhs.len(),
            rhs.len(),
            "LHS and RHS must have the same length"
        );

        lhs.iter()
            .zip(rhs.iter())
            .zip(bools)
            .for_each(|((lhs, rhs), dst)| *dst = lhs > rhs);

        Ok(())
    }
}
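
The flip side of compile-time dispatch is that the element type must be resolved before execution. A hypothetical planning helper (the constructor, field names, and index type here are assumptions, not this PR's API) could pick the monomorphized kernel once, so every subsequent step call runs a fully specialised loop:

type VectorIdx = usize; // assumed index type accepted by KernelContext::vector

fn plan_compare(ptype: PType, lhs: VectorIdx, rhs: VectorIdx) -> Box<dyn Kernel> {
    // Resolve the element type once at planning time; the per-chunk `step`
    // calls then contain no type dispatch at all.
    match ptype {
        PType::I32 => Box::new(ComparePrimitiveKernel::<i32> { lhs, rhs }),
        PType::I64 => Box::new(ComparePrimitiveKernel::<i64> { lhs, rhs }),
        PType::F64 => Box::new(ComparePrimitiveKernel::<f64> { lhs, rhs }),
        // ... remaining primitive types elided
        _ => unimplemented!(),
    }
}
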
  4. Planning-Time Optimization

The DAG structure enables powerful optimizations before execution:

  • Operator Fusion: compare(array, lit(2)) → compare_scalar(array, 2) (see the sketch after this list)
  • Common Sub-expression Elimination: Reuse computed intermediates
  • Dead Code Elimination: Remove unused branches
  • Buffer Management: Optimal allocation and reuse of intermediate vectors
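
A sketch of the operator-fusion rewrite on a toy operator IR (the enum and its variants are illustrative, not the PR's types):

enum Op {
    Constant(i64),
    Compare { lhs: Box<Op>, rhs: Box<Op> },
    CompareScalar { child: Box<Op>, value: i64 },
    // ... other operators
}

// Rewrite compare(child, constant) into a single scalar-compare node so the
// constant never has to be broadcast into a chunk-sized vector at runtime.
// A real pass would recurse over the whole DAG; this rewrites a single node.
fn fuse(op: Op) -> Op {
    match op {
        Op::Compare { lhs, rhs } => match *rhs {
            Op::Constant(value) => Op::CompareScalar { child: lhs, value },
            rhs => Op::Compare { lhs, rhs: Box::new(rhs) },
        },
        other => other,
    }
}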

Implementation Status

Implemented Operators & Kernels

  • Primitive, FoR, BitPacking, Compare, Constant.

Future Work

  1. Expression → Operation Conversion (Next PR)
    - Convert vortex-expr expressions to operators and operator DAGs
  2. Array Encoding Integration
    - Replace to_canonical for specific encodings (BitPacking, FOR, Primitive)
    - Transparent handling of encoded data
  3. Advanced Optimizations
    - In-place operations for unary functions

Performance Impact

For typical analytical workloads (filter + projections):

  • 1-3x gain in filter + decode of FoR bitpacked kernels (see benchmarks)
  • 1-3x gain in filter + compare kernels

Integration Path

  1. Phase 1 (Experimental): Internal use for specific array encodings
  2. Phase 2: Replace to_canonical for performance-critical paths
  3. Phase 3: Full integration with expression evaluation

Handling other arrays:

This can be done with multiple pipelines, where each one can be composed with either materialise nodes or IO nodes.

  • A Dict(codes, values) operator would be defined as decompressing the array in the usual way. However, we would likely want to either decompress all the values at once [or take the values optimally]. This would be modelled as two pipelines: one to materialise the values into a full array [this can be stepped or a compute function], and another to take values using the codes in a stepped pipeline (a rough sketch of this composition follows the diagram below).
           ┌──────────────┐
           │   Output     │
           │  (result)    │
           └──────▲───────┘
                  │
           ┌──────┴───────┐
           │     Add      │
           │              │
           └──────▲───────┘
                  │
          ┌───────┴────────┐
          │                │
     ┌────▼─────┐    ┌─────▼─────┐
     │   Dict   │    │     b     │
     │  Lookup  │    │(Primitive)│
     └────▲─────┘    └───────────┘
          │
     ┌────┴───────────────┐
     │                    │
   ┌─▼───┐                │
   │codes│                │
   └─────┘                │
                          │
           ╔══════════════╧═══════════════╗
           ║  Pipeline 1 (Embedded)       ║
           ║                              ║
           ║      ┌──────────────┐        ║
           ║      │ScalarCompare │        ║
           ║      │   (== 2)     │        ║
           ║      └──────▲───────┘        ║
           ║             │                ║
           ║      ┌──────┴───────┐        ║
           ║      │  Primitive   │        ║
           ║      │   (values)   │        ║
           ║      └──────────────┘        ║
           ╚══════════════════════════════╝
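
A rough sketch of the composition in the diagram, with plain slices standing in for arrays and an identity placeholder for the embedded pipeline (everything here is hypothetical; this PR contains only the stepped engine itself):

fn dict_lookup_then_add(codes: &[u32], values: &[i64], b: &[i64]) -> Vec<i64> {
    // Pipeline 1 (embedded): materialise the dictionary values once, either
    // via a stepped pipeline or an ordinary compute function. A plain copy
    // stands in for the Primitive -> ScalarCompare stage in the diagram.
    let materialised: Vec<i64> = values.to_vec();

    // Pipeline 2 (stepped): per chunk, take values through the codes and add b.
    let mut out = Vec::with_capacity(codes.len());
    for (code_chunk, b_chunk) in codes.chunks(1024).zip(b.chunks(1024)) {
        out.extend(
            code_chunk
                .iter()
                .zip(b_chunk)
                .map(|(&code, &b_val)| materialised[code as usize] + b_val),
        );
    }
    out
}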

Missing elements:

  • optimise the pipeline execution [order, intermediate vector allocation]

gatesn added 30 commits July 26, 2025 21:42
@joseph-isaacs joseph-isaacs marked this pull request as ready for review August 22, 2025 11:37
many_single_char_names = "deny"
mem_forget = "deny"
multiple_crate_versions = "allow"
needless_range_loop = "allow"
Contributor


Why allow?


@joseph-isaacs joseph-isaacs enabled auto-merge (squash) August 22, 2025 14:58
@joseph-isaacs joseph-isaacs requested a review from gatesn August 22, 2025 14:58
Contributor

@gatesn gatesn left a comment


Let's gooooo

@joseph-isaacs joseph-isaacs merged commit 517e293 into develop Aug 22, 2025
36 of 37 checks passed
@joseph-isaacs joseph-isaacs deleted the ji/vector-pipeline branch August 22, 2025 15:21
@robert3005 robert3005 mentioned this pull request Aug 26, 2025

Labels

feature: Release label indicating a new feature or request
performance: Release label indicating an improvement to performance


3 participants