Open
Description
Discussed in #5917
Problem
Currently, only stats for top-level fields are stored in the Vortex file footer:
/// Note: for now this only collects top-level struct fields.
Chunk-level pruning on nested fields already works, but file-level stats for nested fields are missing. This prevents whole-file pruning on nested fields and blocks other optimizations. For example, some of our queries could be answered entirely from nested file-level stats, avoiding extra object storage reads.
Proposed approach
Store stats via a post-order walk of the DType. Each leaf and each nullable struct gets a stats set entry:
{name=utf8?, age=i32} → [name stats_set, age stats_set]
{name=utf8?, age=i32}? → [name stats_set, age stats_set, struct stats_set (null count)]
Extend the stats set flatbuffer with an is_nested: bool = false field for backward compatibility. When false, existing behavior is preserved; when true, entries follow the post-order walk.
Lists: For list(T), store stats only for the element column.
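To make the proposed ordering concrete, here is a minimal sketch of the post-order walk. The `DType` enum, `stats_slots`, and the path labels (`<root>`, `[]`) are hypothetical stand-ins for illustration, not the real Vortex types or API. It also assumes that a list whose element is a struct is walked recursively, which the proposal above does not spell out.

```rust
/// Simplified stand-in for a Vortex DType (hypothetical, not the real type).
/// Leaf nullability is omitted because every leaf gets an entry either way.
#[derive(Debug, Clone)]
pub enum DType {
    Leaf,
    Struct { fields: Vec<(String, DType)>, nullable: bool },
    List { element: Box<DType> },
}

/// Returns one label per stats set entry, in the proposed post-order layout.
pub fn stats_slots(dtype: &DType) -> Vec<String> {
    let mut out = Vec::new();
    walk(dtype, "", &mut out);
    out
}

fn label(path: &str) -> String {
    if path.is_empty() { "<root>".to_string() } else { path.to_string() }
}

fn walk(dtype: &DType, path: &str, out: &mut Vec<String>) {
    match dtype {
        // Every leaf gets a stats set entry.
        DType::Leaf => out.push(label(path)),
        DType::Struct { fields, nullable } => {
            // Post-order: children first, in field order.
            for (name, child) in fields {
                let child_path = if path.is_empty() {
                    name.clone()
                } else {
                    format!("{path}.{name}")
                };
                walk(child, &child_path, out);
            }
            // Only a nullable struct gets its own entry (for the null count).
            if *nullable {
                out.push(label(path));
            }
        }
        // Lists store stats only for the element column, not the list itself.
        // Assumption: nested element types are walked recursively.
        DType::List { element } => walk(element, &format!("{path}[]"), out),
    }
}

/// Convenience constructor for building struct DTypes in examples.
pub fn strukt(fields: Vec<(&str, DType)>, nullable: bool) -> DType {
    DType::Struct {
        fields: fields.into_iter().map(|(n, d)| (n.to_string(), d)).collect(),
        nullable,
    }
}
```

With this sketch, `{name=utf8?, age=i32}` yields the entries `name, age`, and the nullable variant appends one more entry for the struct itself. A reader that sees `is_nested = true` could run the same walk over the file's DType to map each entry index back to a field path.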