feat: unified chunk grid with rectilinear chunk/shard support #3802

Open
maxrjones wants to merge 95 commits into zarr-developers:main from maxrjones:poc/unified-chunk-grid

@maxrjones maxrjones commented Mar 21, 2026

Summary

This PR contains an alternative implementation of the rectilinear chunk grid extension, building on the work in #3534 (RLE helpers, validation logic, and test cases were directly adopted). While the core feature of variable-sized chunks is the same, the internal architecture differs in ways that impact extensibility, performance, and release safety.

I appreciate the patience of those who contributed to #3534, and everyone who's been waiting on this feature. I know it's frustrating to see a new PR after #3534 was so close. That PR provided fundamental components, and I hope people will see the value here. I really believe it is worth the churn for the following reasons:

Key differences from #3534

  1. Extensibility. Each dimension is represented by a type implementing the DimensionGrid protocol (FixedDimension, VaryingDimension). Adding a new dimension type (e.g. TiledDimension for periodic patterns like days-per-month) requires implementing that protocol — no changes to indexing, codecs, or the ChunkGrid class. A prototype was built to verify this.
  2. Performance. The indexing pipeline queries each dimension independently with scalar calls rather than constructing N-d coordinate tuples per chunk lookup. This avoids allocation overhead in the inner loop of every indexer. VaryingDimension uses precomputed prefix sums for O(log n) lookups via binary search. See https://github.com/maxrjones/zarr-chunk-grid-tests for a performance comparison.
  3. Feature flag. Rectilinear chunk grids are gated behind zarr.config.set({'array.rectilinear_chunks': True}) (or ZARR_ARRAY__RECTILINEAR_CHUNKS=True) and are disabled by default. This gives downstream libraries time to adapt before the API is finalized, and gives us room to finalize it gracefully.
  4. Rectilinear sharding. Shard boundaries can be rectilinear while inner chunks remain regular, with validation that each shard edge is divisible by the inner chunk size. This is tested end-to-end and documented in the user guide.
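
To make points 1, 2, and 4 concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR: the method names (`index_to_chunk`, `chunk_offset`, `chunk_size`), the `locate` helper, and `validate_shard_edges` are hypothetical names chosen for illustration, and only the class names `DimensionGrid`, `FixedDimension`, and `VaryingDimension` come from the description above. It assumes a per-dimension protocol with scalar queries, prefix sums plus binary search for varying edges, and a divisibility check for rectilinear shards.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from itertools import accumulate
from typing import Protocol, Sequence


class DimensionGrid(Protocol):
    """Hypothetical per-dimension interface; the PR's actual protocol may differ."""

    def index_to_chunk(self, idx: int) -> int: ...
    def chunk_offset(self, chunk: int) -> int: ...
    def chunk_size(self, chunk: int) -> int: ...


@dataclass(frozen=True)
class FixedDimension:
    size: int  # every chunk along this dimension has the same edge length

    def index_to_chunk(self, idx: int) -> int:
        return idx // self.size

    def chunk_offset(self, chunk: int) -> int:
        return chunk * self.size

    def chunk_size(self, chunk: int) -> int:
        return self.size


@dataclass(frozen=True)
class VaryingDimension:
    edges: tuple[int, ...]  # per-chunk edge lengths along this dimension
    _ends: tuple[int, ...] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Prefix sums, computed once: _ends[i] is the exclusive end of chunk i.
        object.__setattr__(self, "_ends", tuple(accumulate(self.edges)))

    def index_to_chunk(self, idx: int) -> int:
        if not 0 <= idx < self._ends[-1]:
            raise IndexError(f"index {idx} out of bounds")
        # Binary search over the prefix sums: O(log n) in the number of chunks.
        return bisect_right(self._ends, idx)

    def chunk_offset(self, chunk: int) -> int:
        return 0 if chunk == 0 else self._ends[chunk - 1]

    def chunk_size(self, chunk: int) -> int:
        return self.edges[chunk]


def locate(
    dims: Sequence[DimensionGrid], index: Sequence[int]
) -> tuple[tuple[int, int], ...]:
    """Map an N-d array index to (chunk, offset-within-chunk) per dimension.

    Each dimension is queried independently with scalar calls.
    """
    out = []
    for dim, idx in zip(dims, index):
        chunk = dim.index_to_chunk(idx)
        out.append((chunk, idx - dim.chunk_offset(chunk)))
    return tuple(out)


def validate_shard_edges(shard_edges: Sequence[int], inner_chunk: int) -> None:
    """Require every (possibly varying) shard edge to be a multiple of the inner chunk size."""
    bad = [edge for edge in shard_edges if edge % inner_chunk != 0]
    if bad:
        raise ValueError(
            f"shard edges {bad} are not divisible by inner chunk size {inner_chunk}"
        )


dims = [FixedDimension(4), VaryingDimension((10, 20, 5))]
print(locate(dims, (9, 30)))
validate_shard_edges((10, 20, 5), 5)  # passes: every edge is a multiple of 5
```

In this sketch, a new dimension type only needs to provide the same three methods for `locate` to work with it unchanged, which is the extensibility property described in point 1.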

Design document: docs/design/chunk-grid.md covers the full design, rationale, and a suggested PR sequence for splitting this into reviewable increments, if needed.

Downstream POCs (all passing):

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

2 participants