Itemsize cannot be zero in type #255
In numpy, using dtype='bytes' means use a fixed-length byte string dtype, with the length determined by examining the longest string in the initial data. For some reason, when the dask arrays are being created, the dtype is ending up as 'S0', which is a zero-length byte string dtype. This is a bit of a crazy dtype and I'm surprised numpy or dask don't complain at this stage. This dtype ends up getting passed through xarray and down into zarr, at which point zarr tries to use it with np.frombuffer() and then numpy complains.
Do you really want a fixed length byte string dtype? If so, do you know
ahead of time the max string length, and so can provide e.g. dtype='S1'
instead of dtype='bytes'?
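The dtype behaviour described above can be sketched directly in numpy (a minimal illustration, not taken from the thread):

```python
import numpy as np

# With initial data, numpy infers the itemsize from the longest string:
a = np.array([b"abc", b"de"], dtype="bytes")
print(a.dtype)  # S3: 3-byte fixed-length byte strings, shorter ones padded

# A bare 'bytes'/'S' dtype with no length has itemsize 0 ...
print(np.dtype("bytes").itemsize)  # 0

# ... and np.frombuffer refuses zero-itemsize dtypes, which is the
# error that surfaces when zarr reads the chunk back:
try:
    np.frombuffer(b"\x00\x01\x02\x03", dtype="S0")
except ValueError as e:
    print(e)
```

This is why an explicit length such as dtype='S1' avoids the error: the itemsize is then fixed and nonzero before the array ever reaches zarr.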
…On Thu, 19 Apr 2018, 20:28 Tim Crone wrote: [quoted text of the original issue and email footer omitted; the issue body appears in full below]
Thank you @alimanfoo. For some reason I thought that a numpy array could hold strings of different lengths, but now I see that the only way to do this is to store objects, which I am pretty sure zarr would not be able to deal with. After further considering the problem, I am starting to think that zarr might not be the right way for me to go on this. I'll loop back if it looks like I can figure out a way. On a separate note, I have created a mutable mapping for Azure Blob storage, similar to the one that @rabernat created for GCS. I'll start another issue to discuss after we have done more testing.
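The fixed-length vs. object-dtype distinction mentioned above can be shown in a couple of lines (a minimal illustration, not from the thread):

```python
import numpy as np

# Fixed-length: one itemsize for the whole array, inferred from the
# longest initial string; shorter entries are padded, longer ones truncated.
fixed = np.array([b"abc", b"de"])
print(fixed.dtype)  # S3

# Variable-length strings require an object dtype; each element is then
# an ordinary Python bytes object of its own length.
var = np.array([b"abc", b"de"], dtype=object)
print(type(var[0]))  # <class 'bytes'>
```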
No problem. Btw zarr is capable of storing variable-length byte strings (or variable-length unicode strings); see the docs here: http://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays. Not sure if this functionality can be accessed via xarray.
Oh, very cool! I will look into the possibility of building a codec for my data and storing them as objects. That might work out.
Also very cool if you have a mutable mapping for Azure storage; I would be happy to advertise that in the zarr docs once it's ready.
It's here: https://github.com/tjcrone/zarr/blob/abs_store/zarr/storage.py#L2068. Needs docs, more testing, and possibly more functionality before it can be considered. @friedrichknuth and I will create a new issue here when we think it is ready for others to test. |
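For context, the store interface zarr expects is just a MutableMapping from string keys to bytes-like values, which is why backends like the GCS and Azure Blob stores mentioned above can be written as mapping classes. A minimal in-memory sketch of that interface (the class name and dict backing are illustrative, not taken from the linked code):

```python
from collections.abc import MutableMapping

class MemoryStore(MutableMapping):
    """Minimal sketch of a zarr-compatible store: str keys -> bytes values.
    A real Azure Blob store would replace the dict with blob-service calls."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]          # raises KeyError for missing chunks

    def __setitem__(self, key, value):
        self._data[key] = bytes(value)  # store an immutable copy

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Because MutableMapping fills in `keys()`, `items()`, `get()`, etc. from these five methods, implementing just them is enough for zarr to read and write chunks through the store.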
Not sure if this is for xarray or zarr, but I'll start here. I am using xarray/zarr to save a dataset that includes a Dask array with dtype='bytes' containing binary compressed image data. I have lots of questions about whether this is a good idea, how to efficiently work with these data, and reasons why I want to do this. But for now, I will focus on an issue with saving these data to a zarr group and reading them back in.
After saving my dataset to a zarr file using ds.to_zarr(), I get an "Itemsize cannot be zero in type" error when trying to open and load a chunk from this group. I'm sure I'm just doing something wrong. Any thoughts on how to properly read this sort of data back in? A full example of my workflow can be found here: https://github.com/tjcrone/rte-camhd/blob/master/examples/prores_tozarr.ipynb. Any help you can provide would be greatly appreciated! Thank you.