Itemsize cannot be zero in type #255

Open
tjcrone opened this issue Apr 19, 2018 · 6 comments

Comments

@tjcrone
Member

tjcrone commented Apr 19, 2018

Not sure if this is for xarray or zarr, but I'll start here. I am using xarray/zarr to save a dataset that includes a Dask array with dtype='bytes' containing binary compressed image data. I have lots of questions about whether this is a good idea, how to work with these data efficiently, and why I want to do this. For now, though, I will focus on an issue with saving these data to a zarr group and reading them back in.

After saving my dataset to a zarr file using ds.to_zarr(), I get an "Itemsize cannot be zero in type" error when trying to open the group and load a chunk from it. I'm sure I'm just doing something wrong. Any thoughts on how to properly read this sort of data back in? A full example of my workflow can be found here: https://github.com/tjcrone/rte-camhd/blob/master/examples/prores_tozarr.ipynb. Any help you can provide would be greatly appreciated! Thank you.
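For readers hitting the same message: one plausible mechanism, stated here as an assumption rather than a confirmed diagnosis, is that `dtype='bytes'` in NumPy is the unsized byte-string dtype (itemsize 0), and NumPy refuses to interpret a raw buffer with a zero-width dtype. The sketch below reproduces that NumPy behaviour in isolation; it does not call xarray or zarr, and the thread does not confirm that this is the exact code path taken when the chunk is loaded.

```python
import numpy as np

# np.dtype(bytes) is the *unsized* byte-string dtype: no fixed width is set.
unsized = np.dtype(bytes)
print(unsized, unsized.itemsize)  # itemsize is 0

# A buffer cannot be split into elements of width 0, so NumPy rejects it.
raw = b"some compressed image bytes"
try:
    np.frombuffer(raw, dtype=unsized)
except ValueError as exc:
    print("ValueError:", exc)

# Giving the dtype an explicit width (here 1 byte per element) works.
fixed = np.frombuffer(raw, dtype="S1")
print(fixed.dtype, fixed.shape)
```

Giving the dtype an explicit width avoids the zero itemsize, but any fixed width pads every element to the same size, which is the limitation discussed in the replies below.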

@alimanfoo
Member

alimanfoo commented Apr 19, 2018 via email

@tjcrone
Member Author

tjcrone commented Apr 20, 2018

Thank you @alimanfoo. For some reason I thought that a numpy array could hold strings of different lengths, but now I see that the only way to do this is to store objects, which I am pretty sure zarr would not be able to deal with. After further considering the problem, I am starting to think that zarr might not be the right way for me to go on this. I'll loop back if it looks like I can figure out a way.
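A small NumPy illustration of that distinction (the byte values are made up): a fixed-width `S` array pads every element to the length of the longest item and, on access, drops trailing NUL bytes, which can silently truncate binary data; an `object` array keeps each `bytes` value intact at its own length.

```python
import numpy as np

# Two binary blobs of different lengths; the first ends in a NUL byte.
blobs = [b"\x01\x00", b"\x02\x03\x04\x05\x06"]

# Fixed-width bytes dtype: every element is padded to the longest item (5 bytes).
fixed = np.array(blobs)
print(fixed.dtype, fixed.itemsize)

# Reading an element back strips trailing NULs, so the first blob loses a byte.
print(fixed[0], len(fixed[0]))

# Object dtype: each element is a reference to a separate Python bytes object,
# so every blob keeps its own length and content.
objs = np.array(blobs, dtype=object)
print([len(b) for b in objs])
```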

On a separate note, I have created a mutable mapping for Azure Blob storage, similar to the one that @rabernat created for GCS. I'll start another issue to discuss it after we have done more testing.
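Zarr 2.x accepts any `MutableMapping` from string keys (such as `.zarray` or `0.0`) to bytes as a store, which is what makes cloud backends like GCS or Azure Blob storage pluggable. The sketch below is hypothetical and is not the implementation linked later in this thread: `DictBackedBlobStore`, its `prefix` argument, and the use of a plain `dict` in place of a blob container are all assumptions made so the example runs offline.

```python
from collections.abc import MutableMapping


class DictBackedBlobStore(MutableMapping):
    """Hypothetical store mapping zarr keys to 'blob names' under a prefix.

    A real Azure-backed store would call the blob service here; a plain
    dict stands in for the container so the sketch runs offline.
    """

    def __init__(self, container=None, prefix="mygroup"):
        self.container = {} if container is None else container
        self.prefix = prefix.rstrip("/")

    def _blob_name(self, key):
        # e.g. "0.0" -> "mygroup/0.0"
        return f"{self.prefix}/{key}"

    def __getitem__(self, key):
        try:
            return self.container[self._blob_name(key)]
        except KeyError:
            # Missing blobs must surface as KeyError on the zarr-level key.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Accept bytes, memoryview, or anything supporting the buffer protocol.
        self.container[self._blob_name(key)] = bytes(value)

    def __delitem__(self, key):
        try:
            del self.container[self._blob_name(key)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self):
        # Yield zarr-level keys (prefix removed) for blobs under this prefix.
        start = self.prefix + "/"
        for name in list(self.container):
            if name.startswith(start):
                yield name[len(start):]

    def __len__(self):
        return sum(1 for _ in self)


store = DictBackedBlobStore(prefix="camhd")
store[".zarray"] = b"{}"
store["0.0"] = b"\x00\x01"
print(sorted(store))
```

A real backend would replace the dict operations with calls to the blob service client and map a missing blob to `KeyError`, since zarr relies on that to detect absent chunks.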

@alimanfoo
Member

No problem. Btw, zarr is capable of storing variable-length byte strings (or variable-length unicode strings); see the docs here: http://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays. Not sure if this functionality can be accessed via xarray.
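To give a rough idea of what a variable-length bytes codec has to do, here is a minimal sketch of one length-prefixed layout: an item count followed by each item's length and raw bytes. The function names and exact layout are hypothetical; numcodecs' `VLenBytes` (the object codec used in the zarr tutorial linked above) is built on a similar length-prefix idea, but this sketch is not claimed to be byte-compatible with it.

```python
import struct
from typing import List


def encode_vlen_bytes(items: List[bytes]) -> bytes:
    """Pack byte strings of any length into one buffer (hypothetical format).

    Layout: uint32 item count, then for each item a uint32 length followed
    by the raw bytes; all integers are little-endian.
    """
    parts = [struct.pack("<I", len(items))]
    for item in items:
        parts.append(struct.pack("<I", len(item)))
        parts.append(item)
    return b"".join(parts)


def decode_vlen_bytes(buf: bytes) -> List[bytes]:
    """Inverse of encode_vlen_bytes."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    items = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        items.append(bytes(buf[offset:offset + length]))
        offset += length
    if offset != len(buf):
        raise ValueError("trailing bytes after last item")
    return items


# Items of different lengths, including a trailing NUL and an empty item,
# survive the round trip unchanged.
frames = [b"\x01\x00", b"", b"\x02\x03\x04\x05\x06"]
packed = encode_vlen_bytes(frames)
print(len(packed), decode_vlen_bytes(packed) == frames)
```

Unlike a fixed-width `S` dtype, nothing here depends on a common item size, which is why this kind of scheme pairs naturally with `object` arrays.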

@tjcrone
Member Author

tjcrone commented Apr 20, 2018

Oh very cool! I will look into the possibility of building a codec for my data and storing them as objects. That might work out.

@alimanfoo
Member

Also very cool if you have a mutable mapping for Azure storage, would be happy to advertise that in the zarr docs once it's ready.

@tjcrone
Member Author

tjcrone commented Apr 20, 2018

It's here: https://github.com/tjcrone/zarr/blob/abs_store/zarr/storage.py#L2068. Needs docs, more testing, and possibly more functionality before it can be considered. @friedrichknuth and I will create a new issue here when we think it is ready for others to test.

2 participants