Types for Batch Execution #3660
What have you changed? (mandatory)
This PR implements the primitive types used for batch execution.
What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
See new unit tests introduced by this PR.
Benchmark result if necessary (optional)
In the results that follow, items in italics are the implementations used in this PR.
Note that this PR does not provide the most efficient implementation for every kind of payload, although it tries its best. For example, SmallVec is used because it improves performance by 75% for small, common data types such as Int and Real, at the cost of a 4% overhead for large data types such as Json and String.
Decode Multiple Datums
Decode 1000 datums
SmallVec vs Vec: Push Item
When implementing BatchRows, SmallVec instead of Vec is used. This shows how it benefits performance (1000 elements):
When the datum is small enough to be stored inline (applicable to Int, Real, DateTime, Duration):
When the datum is too large to be stored inline (applicable to all other data types):
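To illustrate why the inline case can avoid allocation cost, here is a minimal, std-only sketch of the inline-then-spill idea behind a small vector. `SketchVec` and its methods are hypothetical names made up for this illustration; it is not the SmallVec crate's implementation and not the code benchmarked in this PR.

```rust
/// Hypothetical, simplified stand-in for a small vector: up to `N` items
/// live inline (no heap allocation); pushing more spills them to a `Vec`.
#[derive(Debug, Clone)]
enum SketchVec<T: Copy + Default, const N: usize> {
    Inline { len: usize, buf: [T; N] },
    Heap(Vec<T>),
}

impl<T: Copy + Default, const N: usize> SketchVec<T, N> {
    fn new() -> Self {
        SketchVec::Inline { len: 0, buf: [T::default(); N] }
    }

    fn is_inline(&self) -> bool {
        match self {
            SketchVec::Inline { .. } => true,
            SketchVec::Heap(_) => false,
        }
    }

    fn push(&mut self, item: T) {
        if let SketchVec::Inline { len, buf } = self {
            if *len < N {
                // Fast path: write into the inline buffer, no allocation.
                buf[*len] = item;
                *len += 1;
                return;
            }
            // Inline buffer is full: copy everything to the heap once.
            let mut spilled = Vec::with_capacity(N * 2 + 1);
            spilled.extend_from_slice(&buf[..*len]);
            spilled.push(item);
            *self = SketchVec::Heap(spilled);
            return;
        }
        if let SketchVec::Heap(v) = self {
            v.push(item);
        }
    }

    fn as_slice(&self) -> &[T] {
        match self {
            SketchVec::Inline { len, buf } => &buf[..*len],
            SketchVec::Heap(v) => v.as_slice(),
        }
    }
}
```

In this sketch a `SketchVec<i64, 4>` stays inline for its first four pushes and moves to the heap on the fifth. Large payloads pay for the extra branch and the one-time copy, which is the kind of overhead the large-datum case measures.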
SmallVec vs Vec vs Optimized SmallVec: Clone
When the datum is small and SmallVec is in inline mode:
When the datum is large and SmallVec is not in inline mode:
Encoded clone vs Decoded clone
For BatchRows, each column may be either encoded or decoded (in a specific data type). This shows their clone performance (1000 elements):
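As a rough sketch of what "either encoded or decoded" can look like, the enum below models a column that holds either raw bytes per row or already-decoded integers. The names `SketchColumn` and `decode_as_int`, and the toy encoding (empty bytes mean NULL, 8 little-endian bytes mean an integer), are assumptions for illustration only; they are not the types or the datum codec used in this PR.

```rust
use std::convert::TryFrom;

/// Hypothetical column: rows are either still encoded or decoded to a type.
#[derive(Debug, Clone, PartialEq)]
enum SketchColumn {
    /// One undecoded byte buffer per row.
    Encoded(Vec<Vec<u8>>),
    /// Decoded integers; `None` stands for NULL.
    DecodedInt(Vec<Option<i64>>),
}

impl SketchColumn {
    fn len(&self) -> usize {
        match self {
            SketchColumn::Encoded(rows) => rows.len(),
            SketchColumn::DecodedInt(rows) => rows.len(),
        }
    }

    /// Decode using the toy encoding described above.
    /// Returns `None` if any row has an unexpected length.
    fn decode_as_int(&self) -> Option<SketchColumn> {
        match self {
            SketchColumn::Encoded(rows) => {
                let mut out = Vec::with_capacity(rows.len());
                for raw in rows {
                    if raw.is_empty() {
                        out.push(None);
                    } else {
                        let bytes = <[u8; 8]>::try_from(raw.as_slice()).ok()?;
                        out.push(Some(i64::from_le_bytes(bytes)));
                    }
                }
                Some(SketchColumn::DecodedInt(out))
            }
            SketchColumn::DecodedInt(_) => Some(self.clone()),
        }
    }
}
```

Cloning the `Encoded` variant copies one heap buffer per row, while cloning `DecodedInt` copies a single flat buffer of fixed-size values, which is one plausible reason the two clone paths can perform differently.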
For the Selection executor, rows are retained according to the expression evaluation result. This benchmark shows the performance of that retain operation (1000 elements):
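The retain step can be sketched with the standard library alone: evaluate the predicate into one boolean per row, then keep only the rows whose flag is true. `retain_by_mask` is a hypothetical helper written for this illustration, not the function benchmarked in this PR.

```rust
/// Hypothetical helper: keep `rows[i]` only when `keep[i]` is true.
/// Relies on `Vec::retain` visiting each element exactly once, in order.
fn retain_by_mask<T>(rows: &mut Vec<T>, keep: &[bool]) {
    assert_eq!(rows.len(), keep.len(), "one flag is required per row");
    let mut flags = keep.iter();
    rows.retain(|_| *flags.next().unwrap());
}

/// Evaluate a predicate over all rows first, then filter in one pass.
fn select_greater_than(rows: &mut Vec<i64>, threshold: i64) {
    let keep: Vec<bool> = rows.iter().map(|v| *v > threshold).collect();
    retain_by_mask(rows, &keep);
}
```

Separating evaluation from filtering mirrors the description above: the expression produces a result per row, and the retain pass then drops rows in place without allocating a new vector.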