
Codec does not support buffers of > 2147483647 bytes - reason for this error message #487

Open · mrava87 opened this issue Oct 20, 2019 · 3 comments


mrava87 commented Oct 20, 2019

Hello,
I am having trouble understanding why this error occurs. Is it:

  1. related to the size of the chunks of the zarr file, or
  2. due to the size of the data that is written to a portion of the zarr file, e.g. doing z[start1:end1, start2:end2] = nparray?

If it is due to the chunk size, could the error be raised at initialization rather than when writing to the zarr file? If it is the second case, how would you suggest transferring multiple npz files into a zarr file, given that the user may want to manipulate the input arrays as part of the transfer before writing them to zarr?
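For concreteness, here is a minimal sketch of the write pattern in case 2 (the file names, the "data" key, the shapes, and the manipulation step are all placeholders):

```python
import numpy as np
import zarr

# Placeholder target array; real shapes/chunks would differ.
z = zarr.zeros((20000, 10000), chunks=(1000, 1000), dtype="f8")

for i, fname in enumerate(["part0.npz", "part1.npz"]):  # placeholder file names
    nparray = np.load(fname)["data"]   # "data" is a placeholder key
    nparray = nparray * 2.0            # some manipulation before writing
    start1 = i * nparray.shape[0]
    z[start1:start1 + nparray.shape[0], :] = nparray  # region write
```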

Either way, it would be great to have a more understandable error message (at the very least, converting the number of bytes into a human-readable format would help).

Thank you!

@alimanfoo (Member)

Hi @mrava87, it is due to (1): the chunks are too large for the compressor codec to handle. Some compressor codecs, like the default Blosc codec, have a maximum buffer size that they can accept during encoding.
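For illustration, a minimal sketch that triggers it (note it needs over 2 GiB of free memory to run):

```python
import zarr

# One chunk of 2**31 one-byte elements is 2147483648 bytes, one byte
# over the Blosc limit of 2147483647 (i.e. 2**31 - 1) bytes.
z = zarr.zeros((2**31,), chunks=(2**31,), dtype="u1")

# Creation succeeds because no chunk is encoded yet; the error only
# appears once a write forces the oversized chunk through the codec:
z[:] = 1  # ValueError: Codec does not support buffers of > 2147483647 bytes
```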

The error message originates from here; any suggestions for improving it would be welcome.

It might also be possible to raise an exception earlier, from within zarr at array creation time, at least for codecs that follow the convention of storing the maximum buffer size in a class attribute named max_buffer_size. That is, during array creation, compute the full size of a chunk in bytes, check whether the first codec in the codec chain has a max_buffer_size attribute, and if so raise if the chunk size exceeds it. I would have no objections to a PR in that direction.
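A rough sketch of what that check might look like (illustrative only; check_chunk_fits_codec is a hypothetical helper, not the actual zarr creation code path):

```python
import numpy as np

def check_chunk_fits_codec(chunks, dtype, compressor):
    # Hypothetical early check at array creation time: compare the
    # full chunk size in bytes against the codec's declared limit.
    chunk_nbytes = int(np.prod(chunks, dtype=np.int64)) * np.dtype(dtype).itemsize
    max_size = getattr(compressor, "max_buffer_size", None)
    if max_size is not None and chunk_nbytes > max_size:
        raise ValueError(
            f"chunk size of {chunk_nbytes} bytes exceeds the codec's "
            f"max_buffer_size of {max_size} bytes; use smaller chunks"
        )
```

With the default Blosc codec, max_buffer_size is 2147483647 (2**31 - 1), so an oversized chunk would fail this check at creation time rather than at write time.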


mrava87 commented Oct 21, 2019

Thanks a lot @alimanfoo! That makes more sense now :)

I think adding something like 'Consider reducing the chunk size' to the message would help users understand where the problem is and how to solve it. Perhaps also including the offending arr.nbytes would help users work out how much smaller the chunks need to be.
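For example, a throwaway sketch of the formatting (not a proposed API):

```python
def human_readable(nbytes):
    # Illustrative only: render a byte count such as 2147483647
    # as "2.0 GiB" for a friendlier error message.
    size = float(nbytes)
    for unit in ("bytes", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024

print(human_readable(2147483647))  # -> 2.0 GiB
```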

I agree with raising the exception earlier, provided that does not require many changes to the current code base.

I can try to put together a proposal PR for both if you think that makes sense :)


alimanfoo commented Oct 21, 2019 via email
