The two PRs #3715 and #3719 demonstrate that we can attain substantial performance improvements and completely new functionality by applying a series of changes to stores, codecs, and the codec pipeline logic. The performance comes from the following changes:
- Avoid async overhead when doing IO against low-latency stores
- Avoid overhead related to creating unnecessary python objects per-chunk
- Formally separate IO from compute in chunk encoding / decoding
- Use range writes, when applicable, to write individual subchunks inside a shard
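To illustrate the last point: a range write lets us update one encoded subchunk in place inside an existing shard, instead of reading, re-encoding, and rewriting the whole shard. Here is a minimal sketch of that primitive for a local-filesystem store; the name `set_range` comes from the outline below, but this signature and implementation are purely illustrative, not zarr's actual API.

```python
def set_range(path: str, offset: int, data: bytes) -> None:
    # Hypothetical store primitive: overwrite len(data) bytes starting at
    # `offset` inside an existing file, leaving all other bytes untouched.
    # A sharding codec could use this to replace a single subchunk's byte
    # range (known from the shard index) without rewriting the shard.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

With a dense shard layout, the shard index gives each subchunk's `(offset, length)`, so updating one subchunk reduces to a single `set_range` call against that range.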
The two PRs that demonstrate these changes are far too large to merge on their own, and reviewing them is burdened by off-target changes made inadvertently by Claude.
I think the performance wins are too great to pass up, and the cost is manageable if we break the changes into smaller PRs. This issue tracks the progress of those changes and can serve as a discussion site as needed.
Here's an outline of the PRs I'd like to open shortly:
- PR1: `SupportsSyncCodec` + `_encode_sync`/`_decode_sync` on all codecs
- PR2: `CodecChain` (pure-compute codec wrapper)
- PR3: store layer: `set_range` + sync methods
- PR4: `PreparedWrite` + prepare/finalize pattern on `ArrayBytesCodec`
- PR5: `BatchedCodecPipeline` refactor (uses `CodecChain`, `read_sync`/`write_sync`)
- PR6: `ShardingCodec` refactor (`prepare_write`, `finalize_write`, `set_range`, dense layout)
- PR7: vectorized shard encoding (`_encode_vectorized`, `_encode_vectorized_sparse`)
- PR8: sync fast path in `Array` (opt-in config flag)
I will update this issue when I actually open these PRs.
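As a rough sketch of what PR1 and PR2 could look like together: a protocol for codecs that can encode/decode without touching the event loop, and a pure-compute chain that composes them, keeping IO entirely out of the compute path. All names and signatures here are hypothetical illustrations based on the outline above, not the actual interfaces the PRs will introduce.

```python
import zlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSyncCodec(Protocol):
    # Hypothetical protocol in the spirit of PR1: a codec that can run
    # its transform synchronously, avoiding per-chunk async overhead.
    def _encode_sync(self, data: bytes) -> bytes: ...
    def _decode_sync(self, data: bytes) -> bytes: ...


class CodecChain:
    """Pure-compute codec wrapper in the spirit of PR2: applies a sequence
    of codecs with no IO, so the pipeline can separate compute from store
    access and batch the IO separately."""

    def __init__(self, codecs: list[SupportsSyncCodec]) -> None:
        self.codecs = codecs

    def encode(self, data: bytes) -> bytes:
        for codec in self.codecs:
            data = codec._encode_sync(data)
        return data

    def decode(self, data: bytes) -> bytes:
        # Decoding applies the codecs in reverse order.
        for codec in reversed(self.codecs):
            data = codec._decode_sync(data)
        return data


class ZlibCodec:
    # Toy bytes-to-bytes codec satisfying the protocol.
    def _encode_sync(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def _decode_sync(self, data: bytes) -> bytes:
        return zlib.decompress(data)
```

The point of the split is that a `CodecChain` is trivially usable from both the async pipeline and a sync fast path (PR8), since it never awaits anything.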