
Codec does not support buffers of > 2147483647 bytes - reason for this error message #487

Open · mrava87 opened this issue Oct 20, 2019 · 3 comments


mrava87 commented Oct 20, 2019

Hello,
I am having trouble understanding why this error occurs. Is it:

  1. related to the size of the chunks of the zarr file, or
  2. due to the size of the data that is written to a portion of the zarr file, e.g. doing z[start1:end1, start2:end2] = nparray?

If it is due to the chunk size, could the error be raised at initialization rather than when writing to the zarr file? If it is the second case, how would you suggest transferring multiple npz files into a zarr file, given that the user may want to manipulate the input arrays as part of the transfer before writing them to zarr?
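For concreteness, here is a minimal sketch of the write pattern in case 2 (the file names, the "data" key, the shapes, and the manipulation step are all placeholders):

```python
import numpy as np
import zarr

# Placeholder target array; real shapes/chunks would differ.
z = zarr.zeros((20000, 10000), chunks=(1000, 1000), dtype="f8")

for i, fname in enumerate(["part0.npz", "part1.npz"]):  # placeholder file names
    nparray = np.load(fname)["data"]   # "data" is a placeholder key
    nparray = nparray * 2.0            # some manipulation before writing
    start1 = i * nparray.shape[0]
    z[start1:start1 + nparray.shape[0], :] = nparray  # region write
```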

Either way, it would be great to have a more understandable error message (at the very least, converting the number of bytes into a human-readable format would help).

Thank you!

@alimanfoo (Member)

Hi @mrava87, it is due to (1): the chunks are too large for the compressor codec to handle. Some compressor codecs, like the default Blosc codec, have a maximum buffer size that they can accept during encoding.
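For illustration, a minimal sketch that triggers it (note it needs over 2 GiB of free memory to run):

```python
import zarr

# One chunk of 2**31 one-byte elements is 2147483648 bytes, one byte
# over the Blosc limit of 2147483647 (i.e. 2**31 - 1) bytes.
z = zarr.zeros((2**31,), chunks=(2**31,), dtype="u1")

# Creation succeeds because no chunk is encoded yet; the error only
# appears once a write forces the oversized chunk through the codec:
z[:] = 1  # ValueError: Codec does not support buffers of > 2147483647 bytes
```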

The error message originates from here; any suggestions for improving it would be welcome.

It might also be possible to raise an exception earlier, from within zarr at array creation time, at least for codecs that follow the convention of storing the maximum buffer size in a class attribute named max_buffer_size. That is, during array creation, compute the full size of a chunk in bytes, check whether the first codec in the codec chain has a max_buffer_size attribute, and if so raise if the chunk size exceeds it. I would have no objections to a PR in that direction.
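A rough sketch of what that check might look like (illustrative only; check_chunk_fits_codec is a hypothetical helper, not the actual zarr creation code path):

```python
import numpy as np

def check_chunk_fits_codec(chunks, dtype, compressor):
    # Hypothetical early check at array creation time: compare the
    # full chunk size in bytes against the codec's declared limit.
    chunk_nbytes = int(np.prod(chunks, dtype=np.int64)) * np.dtype(dtype).itemsize
    max_size = getattr(compressor, "max_buffer_size", None)
    if max_size is not None and chunk_nbytes > max_size:
        raise ValueError(
            f"chunk size of {chunk_nbytes} bytes exceeds the codec's "
            f"max_buffer_size of {max_size} bytes; use smaller chunks"
        )
```

With the default Blosc codec, max_buffer_size is 2147483647 (2**31 - 1), so an oversized chunk would fail this check at creation time rather than at write time.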


mrava87 commented Oct 21, 2019

Thanks a lot @alimanfoo! That makes more sense now :)

I think adding something like 'Consider reducing the chunk size' to the message would help users understand where the problem is and how to solve it. Perhaps also including the offending arr.nbytes would help users work out how much smaller the chunks need to be.
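For example, a throwaway sketch of the formatting (not a proposed API):

```python
def human_readable(nbytes):
    # Illustrative only: render a byte count such as 2147483647
    # as "2.0 GiB" for a friendlier error message.
    size = float(nbytes)
    for unit in ("bytes", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024

print(human_readable(2147483647))  # -> 2.0 GiB
```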

I agree with raising the exception earlier, provided that does not require many changes to the current code base.

I can try to put together a proposal PR for both if you think that makes sense :)


alimanfoo commented Oct 21, 2019 via email
