ZipStore fails to handle scalar string arrays #551

Open
hmaarrfk opened this issue Mar 26, 2020 · 9 comments
Comments

@hmaarrfk

Minimal, reproducible code sample, a copy-pastable example if possible

import zarr
import numpy as np
name = 'hello'
data = np.array('world', dtype='<U5')
store = zarr.ZipStore('test_store.zip', mode='w')
root = zarr.open(store, mode='w')
zarr_array = root.create_dataset(name, data=data, shape=data.shape, dtype=data.dtype)
zarr_array[...]

# zarr_array = root.create_dataset(name, shape=data.shape, dtype=data.dtype)
# root[name][...] = data
# zarr_array[...]
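For reference, the array in the reproducer is a 0-d ("scalar") NumPy array with a fixed-width unicode dtype, not an array of Python str objects. The short sketch below uses only NumPy to show its shape and storage size (NumPy stores '<U' strings as UCS-4, 4 bytes per character).

```python
import numpy as np

data = np.array('world', dtype='<U5')

# A scalar array has an empty shape and zero dimensions.
print(data.shape, data.ndim)                 # () 0
# Fixed-width unicode: kind 'U', 5 characters * 4 bytes each.
print(data.dtype.kind, data.dtype.itemsize)  # U 20
# The single element can be pulled out as a plain Python str.
print(data.item())                           # world
```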

Problem description

Scalar coordinates are useful in xarray, and likely in other situations. Being able to serialize them with zarr in a ZipStore would be cool!

xref: pydata/xarray#3815

I think this works in the typical directory store.

Version and installation information

Please provide the following:

  • Value of zarr.__version__: 2.4.0
  • Value of numcodecs.__version__: 0.6.4
  • Version of Python interpreter: 3.7
  • Operating system (Linux/Windows/Mac): linux
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda, conda-forge

Also, if you think it might be relevant, please provide the output from pip freeze or
conda env export depending on which was used to install Zarr.

@jakirkham
Member

Ah, I missed that this was string related. Sorry about that. On the bright side, this may have an easy resolution.

Basically we need an object_codec specified for things that are not bytes-like, which includes strings. There's a good example in this string section.
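To make "not bytes-like" concrete: an object codec's job is to turn a sequence of Python str objects into one bytes buffer and back. The sketch below is a hypothetical length-prefixed UTF-8 encoder, loosely modeled on the layout used by numcodecs' VLenUTF8 (a 4-byte item count, then a 4-byte length before each UTF-8 payload). Treat that layout as an assumption for illustration, not as the numcodecs specification; the function names are made up.

```python
import struct
from typing import List


def encode_vlen_utf8(items: List[str]) -> bytes:
    """Hypothetical encoder: item count, then (length, UTF-8 bytes) per item."""
    parts = [struct.pack('<I', len(items))]  # little-endian uint32 count
    for s in items:
        payload = s.encode('utf-8')
        parts.append(struct.pack('<I', len(payload)))  # byte length, not char length
        parts.append(payload)
    return b''.join(parts)


def decode_vlen_utf8(buf: bytes) -> List[str]:
    """Inverse of encode_vlen_utf8."""
    (n_items,) = struct.unpack_from('<I', buf, 0)
    offset = 4
    out = []
    for _ in range(n_items):
        (length,) = struct.unpack_from('<I', buf, offset)
        offset += 4
        out.append(buf[offset:offset + length].decode('utf-8'))
        offset += length
    return out


print(encode_vlen_utf8(['world']))  # b'\x01\x00\x00\x00\x05\x00\x00\x00world'
```

Variable-length encoding like this is what lets arbitrary Python strings be stored without choosing a fixed width up front.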

@jakirkham
Member

Thoughts @hmaarrfk? 🙂

@hmaarrfk
Author

I may be able to work on this stuff after October.

Thanks for looking into this with me.

@hmaarrfk
Author

Honestly, I legitimately might have to revisit this now.

Why is this not a problem with the standard store?

Shouldn't this be defined higher up, rather than being specific to the ZipStore?

@hmaarrfk
Author

I guess the correct place to put this is in normalize_dtype:

diff --git a/zarr/util.py b/zarr/util.py
index 241009c..c432ed3 100644
--- a/zarr/util.py
+++ b/zarr/util.py
@@ -135,6 +135,9 @@ def normalize_chunks(chunks, shape, typesize):
 
 def normalize_dtype(dtype, object_codec):
 
+    # Ensure that all types of numpy unicode strings are treated as strings
+    if np.issubdtype(np.unicode_, dtype):
+        dtype = str
     # convenience API for object arrays
     if inspect.isclass(dtype):
         dtype = dtype.__name__
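The check in the patch can be tried on its own. Below is a hypothetical standalone helper (not zarr's code) that maps any fixed-width NumPy unicode dtype to str, so that the inspect.isclass branch in the patch would then turn it into the name 'str'. Two assumptions differ from the diff: it calls np.issubdtype with the candidate dtype first, the conventional "is a a subtype of b" order, which also avoids matching abstract parents such as np.character; and it uses np.str_, since np.unicode_ is an alias that NumPy 2.0 removed.

```python
import numpy as np


def coerce_unicode_dtype(dtype):
    """Hypothetical helper: return `str` for any NumPy unicode dtype."""
    if dtype is None:
        return dtype
    try:
        dt = np.dtype(dtype)
    except TypeError:
        # Strings NumPy cannot parse (e.g. 'str:vlen') are passed through.
        return dtype
    # Candidate dtype first, abstract type second.
    if np.issubdtype(dt, np.str_):
        return str
    return dtype


print(coerce_unicode_dtype('<U5'))  # <class 'str'>
print(coerce_unicode_dtype('i4'))   # i4
```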

hmaarrfk changed the title from "ZipStore fails to handle scalar arrays" to "ZipStore fails to handle scalar string arrays" on Aug 31, 2020
@jakirkham
Member

Did you try using an object codec as noted here ( #551 (comment) )? That's typically how we recommend handling Python objects (like str).

@hmaarrfk
Author

Unfortunately, it is ignored because dtype != object.
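A toy model of the behaviour described here: an object_codec only comes into play when the dtype is object, so passing one alongside a '<U5' dtype has no effect until the data is cast to object. The function resolve_object_codec is hypothetical, and the string "vlen-utf8" stands in for a real codec instance; this is not zarr's internal logic.

```python
import numpy as np


def resolve_object_codec(dtype, object_codec):
    """Hypothetical: return the codec that would actually be applied."""
    if np.dtype(dtype).kind == 'O':
        return object_codec
    # In this model, non-object dtypes such as '<U5' never consult the codec.
    return None


data = np.array('world', dtype='<U5')
# Codec dropped for the fixed-width unicode dtype...
print(resolve_object_codec(data.dtype, 'vlen-utf8'))                 # None
# ...but used once the same value is cast to object dtype.
print(resolve_object_codec(data.astype(object).dtype, 'vlen-utf8'))  # vlen-utf8
```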

@joshmoore
Member

Recent work by @abergou may have improved the situation with object codecs.

@jakirkham
Member

I'm guessing that refers to PR #813, in Zarr 2.9.4+.
