feat[vortex-array]: add stepped pipeline execution #4312
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
many_single_char_names = "deny"
mem_forget = "deny"
multiple_crate_versions = "allow"
needless_range_loop = "allow"
Why allow?
We talked about this: https://spiraldb.slack.com/archives/C07BV3GKAJ2/p1755518451455489
gatesn left a comment
Let's gooooo
Vortex Vector Pipeline: Stepped Execution Engine for Columnar Compute
This PR introduces a vectorized pipeline execution engine for Vortex that processes data in fixed-size chunks to maximize cache locality and enable efficient query planning. The design
implements a DAG-based execution model where compute operations are broken into simple, composable kernels that can be optimized at planning time and executed efficiently at runtime.
This should be the new way of implementing compute functions going forward.
Solution: Stepped Execution with Cache-Resident Chunks
Core Concept
Instead of processing entire arrays operation by operation, we process small chunks through the entire operation pipeline while data remains cache-resident:
chunk[0..1024]
→ filter → expr_1 → expr_2 → expr_3 → output[0..n]
chunk[1024..2048]
→ filter → expr_1 → expr_2 → expr_3 → output[n..m]
...
Each 1024-element chunk flows through all operations while staying in L1 cache, dramatically improving memory bandwidth utilization.
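As a rough illustration of this execution order (the function names and the filter/expression bodies here are invented stand-ins, not the actual Vortex API), the key point is that each chunk is driven through every stage before the next chunk is touched:

```rust
/// Hypothetical sketch: process each fixed-size chunk through the whole
/// pipeline while it is cache-resident, rather than running each operation
/// over the entire array in turn.
const CHUNK_LEN: usize = 1024;

fn run_pipeline(input: &[i64], output: &mut Vec<i64>) {
    for chunk in input.chunks(CHUNK_LEN) {
        // filter: keep even values (stands in for a predicate kernel)
        let filtered: Vec<i64> = chunk.iter().copied().filter(|v| v % 2 == 0).collect();
        // expr_1: x + 1
        let step1: Vec<i64> = filtered.iter().map(|v| v + 1).collect();
        // expr_2: x * 2
        let step2: Vec<i64> = step1.iter().map(|v| v * 2).collect();
        // this chunk's results are emitted before the next chunk is read
        output.extend(step2);
    }
}
```

In a real engine the intermediate buffers would be reused across chunks instead of reallocated, but the loop structure is the same.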
Concepts
Each kernel implements a single, simple operation; this simplicity keeps kernels small and composable.
Operations work on canonical physical representations, not logical types, which eliminates encoding complexity from kernel implementations.
Operations use trait-based dispatch for zero-overhead abstraction.
The DAG structure enables powerful optimizations before execution.
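A minimal sketch of what trait-based kernel dispatch could look like (the `Kernel` trait and kernel names here are assumptions for illustration, not the PR's actual types; a real pipeline would reuse buffers and hold a rewritable DAG rather than a flat list):

```rust
/// Each kernel is one simple operation over a chunk of canonical values.
trait Kernel {
    fn execute(&self, input: &[i64], output: &mut Vec<i64>);
}

struct AddScalar(i64);
impl Kernel for AddScalar {
    fn execute(&self, input: &[i64], output: &mut Vec<i64>) {
        output.clear();
        output.extend(input.iter().map(|v| v + self.0));
    }
}

struct MulScalar(i64);
impl Kernel for MulScalar {
    fn execute(&self, input: &[i64], output: &mut Vec<i64>) {
        output.clear();
        output.extend(input.iter().map(|v| v * self.0));
    }
}

/// Drive one chunk through an ordered list of kernels, ping-ponging
/// between two buffers. Planning can rewrite this kernel list (fuse,
/// reorder) before any chunk is executed.
fn run(kernels: &[Box<dyn Kernel>], chunk: &[i64]) -> Vec<i64> {
    let mut cur = chunk.to_vec();
    let mut next = Vec::new();
    for k in kernels {
        k.execute(&cur, &mut next);
        std::mem::swap(&mut cur, &mut next);
    }
    cur
}
```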
Implementation Status
Implemented Operators & Kernels
Future Work
- Convert vortex-expr to operators and operator DAGs
- Replace to_canonical for specific encodings (BitPacking, FOR, Primitive)
- Transparent handling of encoded data
- In-place operations for unary functions
Performance Impact
For typical analytical workloads (filter + projections):
Integration Path
Handling other arrays:
This can be done with multiple pipelines, where each one can be composed with either materialise nodes or IO nodes.
A Dict(codes, values) operator would be defined as decompressing the array in the usual way. However, we would likely want to either decompress all the values at once [or take the values optimally]. This would be modelled as two pipelines: one to materialise the values into a full array [this can be stepped or a compute function], and another to take values using the codes in a stepped pipeline.
Missing elements: