Itemsize cannot be zero in type #255
In numpy, using dtype='bytes' means use a fixed-length byte string dtype, with the length determined by examining the longest string in the initial data. For some reason, when the dask arrays are being created, the dtype is ending up as 'S0', which is a zero-length byte string dtype. This is a bit of a crazy dtype and I'm surprised numpy or dask don't complain at this stage. This dtype ends up getting passed through xarray and down into zarr, at which point zarr tries to use it with np.frombuffer() and then numpy complains.
Do you really want a fixed length byte string dtype? If so, do you know
ahead of time the max string length, and so can provide e.g. dtype='S1'
instead of dtype='bytes'?
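The dtype behaviour described above can be sketched directly in numpy (a minimal illustration, not taken from the thread):

```python
import numpy as np

# With initial data, numpy infers the itemsize from the longest string:
a = np.array([b"abc", b"de"], dtype="bytes")
print(a.dtype)  # S3: 3-byte fixed-length byte strings, shorter ones padded

# A bare 'bytes'/'S' dtype with no length has itemsize 0 ...
print(np.dtype("bytes").itemsize)  # 0

# ... and np.frombuffer refuses zero-itemsize dtypes, which is the
# error that surfaces when zarr reads the chunk back:
try:
    np.frombuffer(b"\x00\x01\x02\x03", dtype="S0")
except ValueError as e:
    print(e)
```

This is why an explicit length such as dtype='S1' avoids the error: the itemsize is then fixed and nonzero before the array ever reaches zarr.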
…On Thu, 19 Apr 2018, 20:28 Tim Crone wrote: [quoted text of the original issue and email footer omitted; the issue body appears in full below]
Thank you @alimanfoo. For some reason I thought that a numpy array could hold strings of different lengths, but now I see that the only way to do this is to store objects, which I am pretty sure zarr would not be able to deal with. After further considering the problem, I am starting to think that zarr might not be the right way for me to go on this. I'll loop back if it looks like I can figure out a way. On a separate note, I have created a mutable mapping for Azure Blob storage, similar to the one that @rabernat created for GCS. I'll start another issue to discuss after we have done more testing.
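The fixed-length vs. object-dtype distinction mentioned above can be shown in a couple of lines (a minimal illustration, not from the thread):

```python
import numpy as np

# Fixed-length: one itemsize for the whole array, inferred from the
# longest initial string; shorter entries are padded, longer ones truncated.
fixed = np.array([b"abc", b"de"])
print(fixed.dtype)  # S3

# Variable-length strings require an object dtype; each element is then
# an ordinary Python bytes object of its own length.
var = np.array([b"abc", b"de"], dtype=object)
print(type(var[0]))  # <class 'bytes'>
```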
No problem. Btw zarr is capable of storing variable-length byte strings (or variable-length unicode strings); see the docs here: http://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays. Not sure if this functionality can be accessed via xarray.
Oh, very cool! I will look into the possibility of building a codec for my data and storing them as objects. That might work out.
Also very cool if you have a mutable mapping for Azure storage; I would be happy to advertise that in the zarr docs once it's ready.
It's here: https://github.com/tjcrone/zarr/blob/abs_store/zarr/storage.py#L2068. Needs docs, more testing, and possibly more functionality before it can be considered. @friedrichknuth and I will create a new issue here when we think it is ready for others to test. |
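For context, the store interface zarr expects is just a MutableMapping from string keys to bytes-like values, which is why backends like the GCS and Azure Blob stores mentioned above can be written as mapping classes. A minimal in-memory sketch of that interface (the class name and dict backing are illustrative, not taken from the linked code):

```python
from collections.abc import MutableMapping

class MemoryStore(MutableMapping):
    """Minimal sketch of a zarr-compatible store: str keys -> bytes values.
    A real Azure Blob store would replace the dict with blob-service calls."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]          # raises KeyError for missing chunks

    def __setitem__(self, key, value):
        self._data[key] = bytes(value)  # store an immutable copy

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Because MutableMapping fills in `keys()`, `items()`, `get()`, etc. from these five methods, implementing just them is enough for zarr to read and write chunks through the store.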
Not sure if this is for xarray or zarr, but I'll start here. I am using xarray/zarr to save a dataset that includes a Dask array with dtype='bytes' containing binary compressed image data. I have lots of questions about whether this is a good idea, how to efficiently work with these data, and reasons why I want to do this. But for now, I will focus on an issue with saving these data to a zarr group and reading them back in.
After saving my dataset to a zarr file using ds.to_zarr(), I get an "Itemsize cannot be zero in type" error when trying to open and load a chunk from this group. I'm sure I'm just doing something wrong. Any thoughts on how to properly read this sort of data back in? A full example of my workflow can be found here: https://github.com/tjcrone/rte-camhd/blob/master/examples/prores_tozarr.ipynb. Any help you can provide would be greatly appreciated! Thank you.