The two PRs #3715 and #3719 demonstrate that we can attain substantial performance improvements and completely new functionality by applying a series of changes to stores, codecs, and the codec pipeline logic. The performance comes from the following changes:
- Avoid async overhead when doing IO against low-latency stores
- Avoid overhead related to creating unnecessary python objects per-chunk
- Formally separate IO from compute in chunk encoding / decoding
- Use range writes, when applicable, to write individual subchunks inside a shard
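To illustrate the last point: a range write lets us update one encoded subchunk in place inside an existing shard, instead of reading, re-encoding, and rewriting the whole shard. Here is a minimal sketch of that primitive for a local-filesystem store; the name `set_range` comes from the outline below, but this signature and implementation are purely illustrative, not zarr's actual API.

```python
def set_range(path: str, offset: int, data: bytes) -> None:
    # Hypothetical store primitive: overwrite len(data) bytes starting at
    # `offset` inside an existing file, leaving all other bytes untouched.
    # A sharding codec could use this to replace a single subchunk's byte
    # range (known from the shard index) without rewriting the shard.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

With a dense shard layout, the shard index gives each subchunk's `(offset, length)`, so updating one subchunk reduces to a single `set_range` call against that range.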
The two PRs that demonstrate these changes are far too large to merge on their own, and reviewing them is burdened by off-target changes made inadvertently by Claude.
I think the performance wins are too great to pass up, and the cost is manageable if we break the changes into smaller PRs. This issue tracks the progress of those changes and can serve as a discussion site as needed.
Here's an outline of the PRs I'd like to open shortly:
- PR1: `SupportsSyncCodec` + `_encode_sync`/`_decode_sync` on all codecs
- PR2: `CodecChain` (pure-compute codec wrapper)
- PR3: store layer: `set_range` + sync methods
- PR4: `PreparedWrite` + prepare/finalize pattern on `ArrayBytesCodec`
- PR5: `BatchedCodecPipeline` refactor (uses `CodecChain`, `read_sync`/`write_sync`)
- PR6: `ShardingCodec` refactor (`prepare_write`, `finalize_write`, `set_range`, dense layout)
- PR7: vectorized shard encoding (`_encode_vectorized`, `_encode_vectorized_sparse`)
- PR8: sync fast path in `Array` (opt-in config flag)
I will update this issue when I actually open these PRs.
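As a rough sketch of what PR1 and PR2 could look like together: a protocol for codecs that can encode/decode without touching the event loop, and a pure-compute chain that composes them, keeping IO entirely out of the compute path. All names and signatures here are hypothetical illustrations based on the outline above, not the actual interfaces the PRs will introduce.

```python
import zlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSyncCodec(Protocol):
    # Hypothetical protocol in the spirit of PR1: a codec that can run
    # its transform synchronously, avoiding per-chunk async overhead.
    def _encode_sync(self, data: bytes) -> bytes: ...
    def _decode_sync(self, data: bytes) -> bytes: ...


class CodecChain:
    """Pure-compute codec wrapper in the spirit of PR2: applies a sequence
    of codecs with no IO, so the pipeline can separate compute from store
    access and batch the IO separately."""

    def __init__(self, codecs: list[SupportsSyncCodec]) -> None:
        self.codecs = codecs

    def encode(self, data: bytes) -> bytes:
        for codec in self.codecs:
            data = codec._encode_sync(data)
        return data

    def decode(self, data: bytes) -> bytes:
        # Decoding applies the codecs in reverse order.
        for codec in reversed(self.codecs):
            data = codec._decode_sync(data)
        return data


class ZlibCodec:
    # Toy bytes-to-bytes codec satisfying the protocol.
    def _encode_sync(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def _decode_sync(self, data: bytes) -> bytes:
        return zlib.decompress(data)
```

The point of the split is that a `CodecChain` is trivially usable from both the async pipeline and a sync fast path (PR8), since it never awaits anything.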