optimize shard writing #3561
Conversation
Writing to sharded arrays was up to 10x slower for largish chunk sizes because the _ShardBuilder object has many calls to np.concatenate. This commit coalesces these into a single concatenate call, and improves write performance by a factor of 10 on the benchmarking script in zarr-developers#3560.

Added a new core.Buffer.combine API.

Resolves zarr-developers#3560

Signed-off-by: Noah D. Brenowitz <nbren12@gmail.com>
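For intuition, here is an illustrative sketch (not the PR's actual code) of why coalescing helps: growing a shard by repeated np.concatenate re-copies all previously accumulated bytes on every call, so n chunk writes cost O(n^2) copies, whereas collecting the parts and concatenating once copies each byte a single time.

import numpy as np

def build_incremental(chunks: list[np.ndarray]) -> np.ndarray:
    # O(n^2): each concatenate re-copies everything accumulated so far
    shard = np.empty(0, dtype=np.uint8)
    for chunk in chunks:
        shard = np.concatenate([shard, chunk])
    return shard

def build_coalesced(chunks: list[np.ndarray]) -> np.ndarray:
    # O(n): a single concatenate copies each byte exactly once
    return np.concatenate(chunks)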
Benchmarking results — After: …
Thanks so much for tackling this problem!
Hmmm, lots of failing tests. I'm having some trouble grokking the implementation of the sharding codec.
I'll have a look. I don't know this section of the codebase very well, but maybe I can figure something out. And BTW, if you see any opportunities to refactor / simplify the logic here, feel free to move in that direction. I think making this code easier to understand would be a big win.
diff --git a/src/zarr/codecs/sharding.py b/src/zarr/codecs/sharding.py
index 0e36f50f..31d68027 100644
--- a/src/zarr/codecs/sharding.py
+++ b/src/zarr/codecs/sharding.py
@@ -227,6 +227,7 @@ class _ShardBuilder(_ShardReader, ShardMutableMapping):
buffers: list[Buffer]
index: _ShardIndex
buf: Buffer
+ _chunk_buffers: dict[tuple[int, ...], Buffer] # Map chunk_coords to buffers
@classmethod
def merge_with_morton_order(
@@ -255,13 +256,24 @@ class _ShardBuilder(_ShardReader, ShardMutableMapping):
obj = cls()
obj.buf = buffer_prototype.buffer.create_zero_length()
obj.buffers = []
+ obj._chunk_buffers = {}
obj.index = _ShardIndex.create_empty(chunks_per_shard)
return obj
+ def __getitem__(self, chunk_coords: tuple[int, ...]) -> Buffer:
+ # Override to use _chunk_buffers instead of self.buf
+ if chunk_coords in self._chunk_buffers:
+ return self._chunk_buffers[chunk_coords]
+ raise KeyError
+
def __setitem__(self, chunk_coords: tuple[int, ...], value: Buffer) -> None:
chunk_start = sum(len(buf) for buf in self.buffers)
chunk_length = len(value)
- self.buffers.append(value)
+ # Store the buffer for later retrieval
+ self._chunk_buffers[chunk_coords] = value
+ # Only append non-empty buffers to avoid messing up offset calculations
+ if chunk_length > 0:
+ self.buffers.append(value)
self.index.set_chunk_slice(chunk_coords, slice(chunk_start, chunk_start + chunk_length))
def __delitem__(self, chunk_coords: tuple[int, ...]) -> None:
@@ -280,7 +292,8 @@ class _ShardBuilder(_ShardReader, ShardMutableMapping):
self.buffers.insert(0, index_bytes)
else:
self.buffers.append(index_bytes)
- self.buf = self.buf.combine(self.buffers)

The diff above makes this work for me locally.
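In outline, the change defers all byte copying to a single combine at serialization time. A self-contained sketch of that pattern (class and method names here are illustrative, not zarr's actual Buffer API):

class ShardAccumulator:
    """Records chunk bytes and offsets; concatenates only once at the end."""

    def __init__(self) -> None:
        self.parts: list[bytes] = []
        self.offsets: dict[tuple[int, ...], tuple[int, int]] = {}

    def set_chunk(self, coords: tuple[int, ...], data: bytes) -> None:
        start = sum(len(p) for p in self.parts)
        self.offsets[coords] = (start, len(data))
        if data:  # skip empty buffers so offset bookkeeping stays consistent
            self.parts.append(data)

    def finalize(self, index_bytes: bytes, index_at_start: bool) -> bytes:
        parts = [index_bytes, *self.parts] if index_at_start else [*self.parts, index_bytes]
        return b"".join(parts)  # the single coalesced copy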
@nbren12 please take a look at https://github.com/d-v-b/zarr-python/blob/chore/sharding-refactor/src/zarr/codecs/sharding.py, it's a refactor of the sharding logic that might be easier to reason about. |
remove inheritance, hide the index attribute and remove some indirection
Signed-off-by: Noah D. Brenowitz <nbren12@gmail.com>

just use dicts
Signed-off-by: Noah D. Brenowitz <nbren12@gmail.com>
Thanks @d-v-b. I ended up pursuing an alternative refactor. I removed the various shard builder objects and things seem to be working now. |
@d-v-b Hopefully my latest commit fixes the tests. For a future PR, the partial write case could definitely use further optimization: when the index is stored at the end, it should be possible to write just the new chunks rather than the whole shard.
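To make that future idea concrete, a rough sketch under the assumption of a store that supports byte-range writes (store.set_partial here is hypothetical, not an existing zarr API): when the index sits at the end of the shard, an update could overwrite the old trailing index with the new chunk bytes and append a fresh index, leaving existing chunks untouched.

def append_to_shard(store, key: str, old_shard_size: int,
                    old_index_size: int, new_chunks: list[bytes],
                    new_index: bytes) -> None:
    # Hypothetical partial-write path: assumes the index is stored at the
    # end of the shard and that the store supports byte-range writes.
    write_offset = old_shard_size - old_index_size  # old index starts here
    payload = b"".join(new_chunks) + new_index
    store.set_partial(key, write_offset, payload)   # hypothetical store API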
Codecov Report ❌ Patch coverage is …
Additional details and impacted files@@ Coverage Diff @@
## main #3561 +/- ##
==========================================
+ Coverage 61.77% 61.87% +0.10%
==========================================
Files 85 85
Lines 10165 10134 -31
==========================================
- Hits 6279 6270 -9
+ Misses 3886 3864 -22
Awesome work! Could you write up a release note? If not, I'm happy to write one.
Done. Thanks for the instructions. |
Are these results expected?

main: …
This PR: …
No, we expect the PR to improve write performance. Here is what I get when running locally: … Is it possible your environment is picking up a different zarr version?
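The benchmark output itself isn't reproduced here; roughly, a script along these lines (array shapes and parameters are placeholders, not the actual #3560 script) exercises the sharded write path being measured:

import time
import numpy as np
import zarr

arr = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(1024, 1024), chunks=(64, 64), shards=(512, 512),
    dtype="float32",
)
data = np.random.default_rng(0).random((1024, 1024), dtype="float32")
t0 = time.perf_counter()
arr[:] = data  # writes 4 shards of 64 chunks each
print(f"sharded write: {time.perf_counter() - t0:.3f}s")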
Yeah. The version in your benchmarking script is marked as zarr==3.1.3, whereas in mine it is …
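A quick way to confirm which build a benchmark environment is actually importing (a source checkout of this PR would typically report a dev version rather than the 3.1.3 release):

import zarr
print(zarr.__version__)  # e.g. a .devN version for a source checkout
print(zarr.__file__)     # shows which installation is on the path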
My bad, I forgot the …
thanks @nbren12! |
Thanks @d-v-b for the quick feedback and merge. It's my first zarr contribution in a while and it seemed pretty seamless.