Epic: Multi-resolution zone maps #7939

@gatesn

Goal

Allow ZonedLayout to store and evaluate multiple zone maps with independent partitioning. This lets cheap, fine-grained stats such as min/max run at small zones while larger or more expensive stats such as Bloom filters run at coarser zones, without stacking ZonedLayout nodes in the layout tree.

Direction

Make zone maps first-class entries owned by a single ZonedLayout. The layout should have one data child plus one or more auxiliary zone-map children. Each zone map should describe its own partitioning and stored aggregate descriptors, and the reader should evaluate zone maps in configured order so cheap maps can prune before more expensive maps are loaded or evaluated.

This work is orthogonal to #7707. The aggregate-stats work should continue to model stored zone stats as aggregate-function descriptors; this epic changes how zone maps are partitioned, stored, and coordinated by ZonedLayout.

Phase 1: Writer-Owned Fixed Partitioning

  • Change ZonedStrategy to partition stats internally for fixed-size zones.
    • Do not require the parent strategy to provide one input chunk per zone.
    • Keep the data child write path independent from stats partitioning.
  • Preserve the current single-zone-map metadata shape while fixed partitioning is introduced.
  • Keep fixed-size zone pruning behavior unchanged for readers.
  • Add tests where input chunks are smaller than, larger than, and not aligned with the configured zone size.
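
A minimal sketch of writer-owned fixed partitioning, assuming integer values and min/max as the only stats. `ZoneStats` and `FixedZoneAccumulator` are hypothetical names for illustration, not existing ZonedStrategy types. The accumulator emits one stats entry per `zone_len` rows no matter how the input is chunked:

```rust
/// Min/max summary for one zone (hypothetical type, not an existing API).
#[derive(Debug, Clone, PartialEq)]
pub struct ZoneStats {
    pub min: i64,
    pub max: i64,
    pub len: usize,
}

/// Accumulates values from arbitrarily sized input chunks and closes a zone
/// every `zone_len` rows, independent of chunk boundaries.
pub struct FixedZoneAccumulator {
    zone_len: usize,
    current: Option<ZoneStats>,
    finished: Vec<ZoneStats>,
}

impl FixedZoneAccumulator {
    pub fn new(zone_len: usize) -> Self {
        assert!(zone_len > 0, "zone_len must be positive");
        Self { zone_len, current: None, finished: Vec::new() }
    }

    /// Feed one input chunk; it may cover part of a zone or span several.
    pub fn push_chunk(&mut self, chunk: &[i64]) {
        for &v in chunk {
            // Start a new zone lazily on the first value that belongs to it.
            let zone = self.current.get_or_insert(ZoneStats { min: v, max: v, len: 0 });
            zone.min = zone.min.min(v);
            zone.max = zone.max.max(v);
            zone.len += 1;
            if zone.len == self.zone_len {
                // The zone is full: move it to the finished list.
                let done = self.current.take().unwrap();
                self.finished.push(done);
            }
        }
    }

    /// Flush the final partial zone, if any, and return all zone stats.
    pub fn finish(mut self) -> Vec<ZoneStats> {
        if let Some(partial) = self.current.take() {
            self.finished.push(partial);
        }
        self.finished
    }
}
```

Feeding chunks of 3, 6, and 1 rows with `zone_len = 4` produces zones of 4, 4, and 2 rows, which covers the smaller-than, larger-than, and unaligned cases in one test.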

Phase 2: Zone-Map Partitioning Metadata

  • Introduce per-zone-map metadata describing partitioning.
    • Support compact fixed partitioning with zone_len.
    • Support explicit variable partitioning with stored row lengths or row ends.
  • Move zone partitioning out of global ZonedLayout state and into each zone-map descriptor.
  • Decode legacy zoned metadata as a single fixed-size zone map.
  • Update ZoneMap and reader helpers to derive row-to-zone mapping from zone-map metadata.
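
One possible shape for the per-zone-map partitioning metadata, sketched under assumptions: the enum name `ZonePartitioning`, its fields, and the choice of row ends (rather than row lengths) for the variable case are illustrative, not the proposed wire format. The point is that row-to-zone mapping can be derived entirely from the descriptor:

```rust
use std::ops::Range;

/// Hypothetical partitioning descriptor stored with each zone map.
#[derive(Debug, Clone, PartialEq)]
pub enum ZonePartitioning {
    /// Compact form: every zone has `zone_len` rows except possibly the last.
    Fixed { zone_len: u64, row_count: u64 },
    /// Explicit form: exclusive end row of each zone, strictly increasing.
    Variable { row_ends: Vec<u64> },
}

impl ZonePartitioning {
    pub fn num_zones(&self) -> usize {
        match self {
            // Assumes zone_len > 0; a real decoder would validate this.
            Self::Fixed { zone_len, row_count } => row_count.div_ceil(*zone_len) as usize,
            Self::Variable { row_ends } => row_ends.len(),
        }
    }

    /// Index of the zone containing `row`, or `None` if `row` is out of range.
    pub fn zone_of_row(&self, row: u64) -> Option<usize> {
        match self {
            Self::Fixed { zone_len, row_count } => {
                (row < *row_count).then(|| (row / *zone_len) as usize)
            }
            Self::Variable { row_ends } => {
                // First zone whose exclusive end lies beyond `row`.
                let idx = row_ends.partition_point(|&end| end <= row);
                (idx < row_ends.len()).then_some(idx)
            }
        }
    }

    /// Row range covered by `zone`, or `None` if the zone does not exist.
    pub fn zone_rows(&self, zone: usize) -> Option<Range<u64>> {
        match self {
            Self::Fixed { zone_len, row_count } => {
                let start = zone as u64 * *zone_len;
                (start < *row_count).then(|| start..(start + *zone_len).min(*row_count))
            }
            Self::Variable { row_ends } => {
                let end = *row_ends.get(zone)?;
                let start = if zone == 0 { 0 } else { row_ends[zone - 1] };
                Some(start..end)
            }
        }
    }
}
```

Under this sketch, legacy zoned metadata would decode to a single `Fixed` entry, so existing files keep their current row-to-zone mapping.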

Phase 3: Variable Zones From Input Chunks

  • Add writer configuration for fixed versus input-chunk zone partitioning.
    • Some(zone_len) partitions internally into fixed-size zones.
    • None uses input chunk boundaries as zones.
  • Store variable zone boundaries in the zone map when no fixed zone_len is specified.
  • Update pruning mask expansion for variable-size zones.
  • Add tests for variable zones, including non-uniform chunks and final partial zones.
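
The two writer modes and the variable-size mask expansion can be sketched as follows. Both function names are hypothetical, zone boundaries are represented as exclusive row ends, and skipping empty input chunks is an assumption of this sketch rather than a decided behavior:

```rust
/// Compute exclusive zone end rows for a stream of input chunks.
/// `Some(n)` partitions into fixed zones of `n` rows (last zone may be short);
/// `None` uses the input chunk boundaries as zone boundaries.
pub fn zone_row_ends(zone_len: Option<usize>, chunk_lens: &[usize]) -> Vec<usize> {
    let total: usize = chunk_lens.iter().sum();
    match zone_len {
        Some(n) => {
            assert!(n > 0, "zone_len must be positive");
            (0..total.div_ceil(n)).map(|z| ((z + 1) * n).min(total)).collect()
        }
        None => {
            let mut end = 0;
            chunk_lens
                .iter()
                .filter(|&&len| len > 0) // assumption: empty chunks create no zone
                .map(|&len| {
                    end += len;
                    end
                })
                .collect()
        }
    }
}

/// Expand a per-zone keep mask into a per-row keep mask using zone end rows.
pub fn expand_zone_mask(keep_zone: &[bool], row_ends: &[usize]) -> Vec<bool> {
    assert_eq!(keep_zone.len(), row_ends.len(), "one verdict per zone");
    let mut rows = Vec::with_capacity(row_ends.last().copied().unwrap_or(0));
    let mut start = 0;
    for (&keep, &end) in keep_zone.iter().zip(row_ends) {
        // Every row in the zone inherits the zone's verdict.
        rows.extend(std::iter::repeat(keep).take(end - start));
        start = end;
    }
    rows
}
```

Because the fixed case is expressed as row ends too, the same expansion routine serves both modes; a real implementation would likely keep the compact `zone_len` form and avoid materializing the ends.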

Phase 4: Multiple Zone Maps In One ZonedLayout

  • Change zoned metadata to store zone_maps: repeated ZoneMapMetadata.
    • Each entry has partitioning metadata and aggregate descriptors.
  • Change ZonedLayout children from exactly two children to one data child plus one child per zone map.
  • Change writer options to accept multiple zone-map configurations.
    • Each zone map has its own partitioning strategy and aggregate functions.
  • Write one auxiliary stats-table child per configured zone map.
  • Update display, Python, and TUI layout summaries for multiple zone maps.
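
A rough sketch of the metadata shape this phase describes, using plain Rust structs in place of the real serialized form. All type names, the string representation of aggregate descriptors, and the child ordering (data child at index 0, zone maps after it) are assumptions made for illustration:

```rust
/// Hypothetical partitioning descriptor for one zone map.
#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    Fixed { zone_len: u64 },
    Variable { row_ends: Vec<u64> },
}

/// One entry of the repeated `zone_maps` field.
#[derive(Debug, Clone, PartialEq)]
pub struct ZoneMapMetadata {
    pub partitioning: Partitioning,
    /// Stand-in for aggregate-function descriptors; names only in this sketch.
    pub aggregates: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ZonedMetadata {
    pub zone_maps: Vec<ZoneMapMetadata>,
}

impl ZonedMetadata {
    /// Legacy metadata has exactly one fixed-size zone map.
    pub fn from_legacy(zone_len: u64, aggregates: Vec<String>) -> Self {
        Self {
            zone_maps: vec![ZoneMapMetadata {
                partitioning: Partitioning::Fixed { zone_len },
                aggregates,
            }],
        }
    }

    /// One data child plus one auxiliary stats-table child per zone map.
    pub fn num_children(&self) -> usize {
        1 + self.zone_maps.len()
    }

    /// Child index of the i-th zone map, assuming the data child is child 0.
    pub fn zone_map_child(&self, i: usize) -> Option<usize> {
        (i < self.zone_maps.len()).then_some(i + 1)
    }
}
```

With this layout a legacy file still reports exactly two children, so the change from "exactly two children" to "one plus N" is backward compatible at the tree level.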

Phase 5: Ordered Pruning Execution

  • Teach ZonedReader to evaluate multiple zone maps in metadata/configuration order.
  • Cache loaded zone maps and pruning results per zone map.
  • Intersect pruning masks as each zone map runs.
  • Stop loading or evaluating later zone maps once the current mask is all false.
  • Add tests proving a later zone map is not loaded when an earlier one fully prunes the row range.
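
The ordered evaluation loop might look roughly like this. It is a sketch, not the ZonedReader design: each zone map is modeled as a hypothetical closure (`MaskLoader`) that loads the map and returns a row-level keep mask, and per-map caching is left out:

```rust
/// Hypothetical lazy zone map: loading and evaluation happen when called.
pub type MaskLoader<'a> = Box<dyn FnMut() -> Vec<bool> + 'a>;

/// Evaluate zone maps in order, intersecting their keep masks.
/// Returns the final mask and how many zone maps were actually evaluated.
pub fn prune_in_order(row_count: usize, loaders: &mut [MaskLoader<'_>]) -> (Vec<bool>, usize) {
    let mut mask = vec![true; row_count];
    let mut evaluated = 0;
    for load in loaders.iter_mut() {
        // Nothing left to prune: skip loading the remaining zone maps.
        if !mask.iter().any(|&keep| keep) {
            break;
        }
        let zone_mask = load();
        assert_eq!(zone_mask.len(), row_count, "mask must cover the row range");
        for (m, z) in mask.iter_mut().zip(zone_mask) {
            *m &= z;
        }
        evaluated += 1;
    }
    (mask, evaluated)
}
```

A test for the last bullet can wrap each loader in a counter and assert that the second loader is never invoked when the first one returns an all-false mask.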

Phase 6: Bloom-Ready Integration

  • Add a writer configuration example with fine-grained min/max stats and coarse-grained Bloom stats.
  • Verify the multi-zone-map reader can combine different zone sizes for one predicate.
  • Connect this infrastructure to the Bloom-filter proof work from #7707 (Epic: Stats and AggregateFns).
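
As an illustration of the first two bullets, here is a sketch of a two-map writer configuration and of combining per-zone verdicts computed at different zone sizes. The types `Aggregate`, `ZoneMapConfig`, and `ZoneVerdicts` are hypothetical, the zone lengths are arbitrary placeholders, and the verdicts are supplied as inputs rather than computed from real min/max or Bloom stats:

```rust
/// Hypothetical aggregate kinds for a zone map.
#[derive(Debug, Clone, PartialEq)]
pub enum Aggregate {
    Min,
    Max,
    Bloom,
}

/// Hypothetical per-zone-map writer configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct ZoneMapConfig {
    /// `Some(n)` for fixed zones of `n` rows, `None` for input-chunk zones.
    pub zone_len: Option<usize>,
    pub aggregates: Vec<Aggregate>,
}

/// Cheap fine-grained min/max first, coarse Bloom second (sizes are arbitrary).
pub fn fine_minmax_coarse_bloom() -> Vec<ZoneMapConfig> {
    vec![
        ZoneMapConfig { zone_len: Some(8_192), aggregates: vec![Aggregate::Min, Aggregate::Max] },
        ZoneMapConfig { zone_len: Some(131_072), aggregates: vec![Aggregate::Bloom] },
    ]
}

/// Per-zone keep verdicts for one fixed-size zone map.
pub struct ZoneVerdicts {
    pub zone_len: usize,
    pub keep: Vec<bool>,
}

/// A row survives only if every zone map keeps the zone containing it.
pub fn combine_row_mask(row_count: usize, maps: &[ZoneVerdicts]) -> Vec<bool> {
    (0..row_count)
        .map(|row| maps.iter().all(|m| m.keep[row / m.zone_len]))
        .collect()
}
```

For 16 rows with min/max zones of 4 rows and Bloom zones of 8 rows, a row range kept by min/max can still be dropped by the coarser Bloom verdict, and vice versa, which is the behavior the second bullet asks to verify.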

Status

Proposed.
