Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when using attribute keys with mixed types #1037

Closed
malmans2 opened this issue May 25, 2022 · 8 comments · Fixed by #1066
Closed

Error when using attribute keys with mixed types #1037

malmans2 opened this issue May 25, 2022 · 8 comments · Fixed by #1066
Labels
help wanted Issue could use help from someone with familiarity on the topic

Comments

@malmans2
Copy link
Member

Minimal, reproducible code sample, a copy-pastable example if possible

import zarr
z1 = zarr.open('data/example.zarr', mode='w', shape=(10, 10), dtype='i4')
z1.attrs.put({1: "bar"})  # OK
z1.attrs.put({"foo": "bar"})  # OK
z1.attrs.put({1: "bar", "foo": "bar"})  # Error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 z1.attrs.put({1: "bar", "foo": "bar"})

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:109, in Attributes.put(self, d)
    106 def put(self, d):
    107     """Overwrite all attributes with the key/value pairs in the provided dictionary
    108     `d` in a single operation."""
--> 109     self._write_op(self._put_nosync, d)

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:73, in Attributes._write_op(self, f, *args, **kwargs)
     71 # synchronization
     72 if self.synchronizer is None:
---> 73     return f(*args, **kwargs)
     74 else:
     75     with self.synchronizer[self.key]:

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:112, in Attributes._put_nosync(self, d)
    111 def _put_nosync(self, d):
--> 112     self.store[self.key] = json_dumps(d)
    113     if self.cache:
    114         self._cached_asdict = d

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/util.py:38, in json_dumps(o)
     36 def json_dumps(o: Any) -> bytes:
     37     """Write JSON in a consistent, human-readable way."""
---> 38     return json.dumps(o, indent=4, sort_keys=True, ensure_ascii=True,
     39                       separators=(',', ': ')).encode('ascii')

File ~/mambaforge/envs/xarray/lib/python3.9/json/__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
--> 234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
    238     **kw).encode(obj)

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:201, in JSONEncoder.encode(self, o)
    199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
--> 201     chunks = list(chunks)
    202 return ''.join(chunks)

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    429     yield from _iterencode_list(o, _current_indent_level)
    430 elif isinstance(o, dict):
--> 431     yield from _iterencode_dict(o, _current_indent_level)
    432 else:
    433     if markers is not None:

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:353, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    351 first = True
    352 if _sort_keys:
--> 353     items = sorted(dct.items())
    354 else:
    355     items = dct.items()

TypeError: '<' not supported between instances of 'str' and 'int'

Problem description

I'm not able to assign attributes using keys with different types. Looks like the issue is coming from json that is trying to sort the keys. I guess there are 3 options:

  • Only strings are supported: Raise an error if unsupported types are used?
  • Mixed types are not supported: Raise an error if mixed types are used?
  • It's a bug: Do not sort the attributes by default?

Version and installation information

  • Value of zarr.__version__: 2.11.3
  • Value of numcodecs.__version__: 0.9.1
  • Version of Python interpreter: 3.9.12
  • Operating system (Linux/Windows/Mac): Mac
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda
conda env export

name: xarray
channels:

  • conda-forge
    dependencies:
  • affine=2.3.1=pyhd8ed1ab_0
  • aiobotocore=2.3.2=pyhd8ed1ab_0
  • aiohttp=3.8.1=py39h63b48b0_1
  • aioitertools=0.10.0=pyhd8ed1ab_0
  • aiosignal=1.2.0=pyhd8ed1ab_0
  • antlr-python-runtime=4.7.2=py39h6e9494a_1003
  • appdirs=1.4.4=pyh9f0ad1d_0
  • appnope=0.1.3=pyhd8ed1ab_0
  • asciitree=0.3.3=py_2
  • asttokens=2.0.5=pyhd8ed1ab_0
  • async-timeout=4.0.2=pyhd8ed1ab_0
  • attrs=21.4.0=pyhd8ed1ab_0
  • backcall=0.2.0=pyh9f0ad1d_0
  • backports=1.0=py_2
  • backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  • blosc=1.21.1=h97e831e_3
  • boost-cpp=1.74.0=hdbf7018_7
  • boto3=1.21.21=pyhd8ed1ab_0
  • botocore=1.24.21=pyhd8ed1ab_1
  • bottleneck=1.3.4=py39h86b5767_1
  • brotli=1.0.9=h5eb16cf_7
  • brotli-bin=1.0.9=h5eb16cf_7
  • brotlipy=0.7.0=py39h63b48b0_1004
  • bzip2=1.0.8=h0d85af4_4
  • c-ares=1.18.1=h0d85af4_0
  • ca-certificates=2022.5.18.1=h033912b_0
  • cached-property=1.5.2=hd8ed1ab_1
  • cached_property=1.5.2=pyha770c72_1
  • cairo=1.16.0=h9e0e54b_1010
  • cartopy=0.20.2=py39hd74d15e_4
  • cdat_info=8.2.1=pyhd8ed1ab_2
  • cdms2=3.1.5=py39ha4b5fb4_14
  • cdtime=3.1.4=py39hef3cfc1_7
  • certifi=2022.5.18.1=py39h6e9494a_0
  • cf-units=3.0.1=py39hc89836e_2
  • cffi=1.15.0=py39he338e87_0
  • cfgrib=0.9.10.1=pyhd8ed1ab_0
  • cfgv=3.3.1=pyhd8ed1ab_0
  • cfitsio=4.1.0=h2c97ad1_0
  • cftime=1.6.0=py39h86b5767_1
  • charset-normalizer=2.0.12=pyhd8ed1ab_0
  • click=8.1.3=py39h6e9494a_0
  • click-plugins=1.1.1=py_0
  • cligj=0.7.2=pyhd8ed1ab_1
  • cloudpickle=2.1.0=pyhd8ed1ab_0
  • coverage=6.4=py39h701faf5_0
  • cryptography=37.0.2=py39h9c2a9ce_0
  • curl=7.83.1=h372c54d_0
  • cycler=0.11.0=pyhd8ed1ab_0
  • cytoolz=0.11.2=py39h63b48b0_2
  • dask-core=2022.5.1=pyhd8ed1ab_0
  • decorator=5.1.1=pyhd8ed1ab_0
  • distarray=2.12.2=pyhd8ed1ab_2
  • distlib=0.3.4=pyhd8ed1ab_0
  • distributed=2022.5.1=pyhd8ed1ab_0
  • eccodes=2.26.0=hf772d56_0
  • esmf=8.2.0=nompi_h92c19ad_0
  • esmpy=8.2.0=nompi_py39h8ffa011_1
  • execnet=1.9.0=pyhd8ed1ab_0
  • executing=0.8.3=pyhd8ed1ab_0
  • expat=2.4.8=h96cf925_0
  • fasteners=0.17.3=pyhd8ed1ab_0
  • filelock=3.7.0=pyhd8ed1ab_0
  • findlibs=0.0.2=pyhd8ed1ab_0
  • flox=0.5.3=pyhd8ed1ab_0
  • font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  • font-ttf-inconsolata=3.000=h77eed37_0
  • font-ttf-source-code-pro=2.038=h77eed37_0
  • font-ttf-ubuntu=0.83=hab24e00_0
  • fontconfig=2.14.0=h676cef8_0
  • fonts-conda-ecosystem=1=0
  • fonts-conda-forge=1=0
  • fonttools=4.33.3=py39h701faf5_0
  • freetype=2.10.4=h4cff582_1
  • freexl=1.0.6=h0d85af4_0
  • frozenlist=1.3.0=py39h63b48b0_1
  • fsspec=2022.5.0=pyhd8ed1ab_0
  • future=0.18.2=py39h6e9494a_5
  • g2clib=1.6.3=h7284f3b_1
  • geos=3.10.2=he49afe7_0
  • geotiff=1.7.1=had63758_1
  • gettext=0.19.8.1=hd1a6beb_1008
  • giflib=5.2.1=hbcb3906_2
  • h5netcdf=1.0.0=pyhd8ed1ab_0
  • h5py=3.6.0=nompi_py39hbc6cb89_100
  • hdf4=4.2.15=hefd3b78_3
  • hdf5=1.12.1=nompi_ha60fbc9_104
  • heapdict=1.0.1=py_0
  • hypothesis=6.46.8=pyhd8ed1ab_0
  • icu=69.1=he49afe7_0
  • identify=2.5.1=pyhd8ed1ab_0
  • idna=3.3=pyhd8ed1ab_0
  • importlib-metadata=4.11.4=py39h6e9494a_0
  • importlib_metadata=4.11.4=hd8ed1ab_0
  • importlib_resources=5.7.1=pyhd8ed1ab_1
  • iniconfig=1.1.1=pyh9f0ad1d_0
  • ipython=8.3.0=py39h6e9494a_0
  • iris=3.2.1=pyhd8ed1ab_0
  • jasper=2.0.33=h013e400_0
  • jedi=0.18.1=py39h6e9494a_1
  • jinja2=3.1.2=pyhd8ed1ab_0
  • jmespath=1.0.0=pyhd8ed1ab_0
  • jpeg=9e=h5eb16cf_1
  • json-c=0.16=h01d06f9_0
  • jsonschema=4.5.1=pyhd8ed1ab_0
  • jupyter_core=4.10.0=py39h6e9494a_0
  • kealib=1.4.14=ha22a8b1_3
  • kiwisolver=1.4.2=py39h7248d28_1
  • krb5=1.19.3=hb49756b_0
  • lazy-object-proxy=1.7.1=py39h63b48b0_1
  • lcms2=2.12=h577c468_0
  • lerc=3.0=he49afe7_0
  • libaec=1.0.6=he49afe7_0
  • libblas=3.9.0=14_osx64_openblas
  • libbrotlicommon=1.0.9=h5eb16cf_7
  • libbrotlidec=1.0.9=h5eb16cf_7
  • libbrotlienc=1.0.9=h5eb16cf_7
  • libcblas=3.9.0=14_osx64_openblas
  • libcdms=3.1.2=h7d8e03a_117
  • libcf=1.0.3=py39h8d2e2c5_114
  • libcurl=7.83.1=h372c54d_0
  • libcxx=14.0.3=hc203e6f_0
  • libdap4=3.20.6=h3e144a0_2
  • libdeflate=1.10=h0d85af4_0
  • libdrs=3.1.2=h102a9a4_118
  • libdrs_f=3.1.2=h2534672_114
  • libedit=3.1.20191231=h0678c8f_2
  • libev=4.33=haf1e3a3_1
  • libffi=3.4.2=h0d85af4_5
  • libgdal=3.4.3=h6474519_0
  • libgfortran=5.0.0=9_3_0_h6c81a4c_23
  • libgfortran5=9.3.0=h6c81a4c_23
  • libglib=2.70.2=hf1fb8c0_4
  • libiconv=1.16=haf1e3a3_0
  • libkml=1.3.0=h8fd9edb_1014
  • liblapack=3.9.0=14_osx64_openblas
  • libllvm10=10.0.1=h009f743_3
  • libnetcdf=4.8.1=nompi_h6609ca0_102
  • libnghttp2=1.47.0=h942079c_0
  • libopenblas=0.3.20=openmp_hb3cd9ec_0
  • libpng=1.6.37=h7cec526_2
  • libpq=14.2=hea3049e_0
  • librttopo=1.1.0=hec60dd8_9
  • libspatialite=5.0.1=hadde3e2_15
  • libssh2=1.10.0=h52ee1ee_2
  • libtiff=4.3.0=hfca7e8f_4
  • libuuid=2.32.1=h35c211d_1000
  • libwebp=1.2.2=h28dabe5_0
  • libwebp-base=1.2.2=h0d85af4_1
  • libxcb=1.13=h0d85af4_1004
  • libxml2=2.9.12=h7e28ab6_1
  • libxslt=1.1.33=h1acebb3_3
  • libzip=1.8.0=h8b0c345_1
  • libzlib=1.2.11=h6c3fc93_1014
  • llvm-openmp=14.0.3=ha654fa7_0
  • llvmlite=0.36.0=py39h798a4f4_0
  • locket=1.0.0=pyhd8ed1ab_0
  • lxml=4.8.0=py39h63b48b0_2
  • lz4-c=1.9.3=he49afe7_1
  • markupsafe=2.1.1=py39h63b48b0_1
  • matplotlib-base=3.5.2=py39h64a0072_0
  • matplotlib-inline=0.1.3=pyhd8ed1ab_0
  • msgpack-python=1.0.3=py39h7248d28_1
  • multidict=6.0.2=py39h63b48b0_1
  • munkres=1.1.4=pyh9f0ad1d_0
  • nbformat=5.4.0=pyhd8ed1ab_0
  • nc-time-axis=1.4.1=pyhd8ed1ab_0
  • ncurses=6.3=h96cf925_1
  • netcdf-fortran=4.5.4=nompi_h9ed14b0_100
  • netcdf4=1.5.8=nompi_py39he7d1c46_101
  • nodeenv=1.6.0=pyhd8ed1ab_0
  • nspr=4.32=hcd9eead_1
  • nss=3.78=ha8197d3_0
  • numba=0.53.1=py39h32e38f5_1
  • numcodecs=0.9.1=py39h9fcab8e_2
  • numexpr=2.8.0=py39hbd61c47_2
  • numpy=1.22.4=py39h677350a_0
  • numpy_groupies=0.9.16=pyhd8ed1ab_0
  • openblas=0.3.20=openmp_h5ad848b_0
  • openjpeg=2.4.0=h6e7aa92_1
  • openssl=1.1.1o=hfe4f2af_0
  • packaging=21.3=pyhd8ed1ab_0
  • pandas=1.4.2=py39hbd61c47_1
  • parso=0.8.3=pyhd8ed1ab_0
  • partd=1.2.0=pyhd8ed1ab_0
  • patsy=0.5.2=pyhd8ed1ab_0
  • pcre=8.45=he49afe7_0
  • pexpect=4.8.0=pyh9f0ad1d_2
  • pickleshare=0.7.5=py_1003
  • pillow=9.1.1=py39h579eac4_0
  • pint=0.19.2=pyhd8ed1ab_0
  • pip=22.1.1=pyhd8ed1ab_0
  • pixman=0.40.0=hbcb3906_0
  • platformdirs=2.5.1=pyhd8ed1ab_0
  • pluggy=1.0.0=py39h6e9494a_3
  • pooch=1.6.0=pyhd8ed1ab_0
  • poppler=22.04.0=h39497b0_0
  • poppler-data=0.4.11=hd8ed1ab_0
  • postgresql=14.2=he8fe76e_0
  • pre-commit=2.19.0=py39h6e9494a_0
  • proj=9.0.0=h2364a93_1
  • prompt-toolkit=3.0.29=pyha770c72_0
  • pseudonetcdf=3.2.2=pyhd8ed1ab_0
  • psutil=5.9.1=py39h701faf5_0
  • pthread-stubs=0.4=hc929b4f_1001
  • ptyprocess=0.7.0=pyhd3deb0d_0
  • pure_eval=0.2.2=pyhd8ed1ab_0
  • py=1.11.0=pyh6c4a22f_0
  • pycparser=2.21=pyhd8ed1ab_0
  • pygments=2.12.0=pyhd8ed1ab_0
  • pyopenssl=22.0.0=pyhd8ed1ab_0
  • pyparsing=3.0.9=pyhd8ed1ab_0
  • pyproj=3.3.1=py39hfe197cf_0
  • pyrsistent=0.18.1=py39h63b48b0_1
  • pyshp=2.3.0=pyhd8ed1ab_0
  • pysocks=1.7.1=py39h6e9494a_5
  • pytest=7.1.2=py39h6e9494a_0
  • pytest-cov=3.0.0=pyhd8ed1ab_0
  • pytest-env=0.6.2=py_0
  • pytest-forked=1.4.0=pyhd8ed1ab_0
  • pytest-xdist=2.5.0=pyhd8ed1ab_0
  • python=3.9.12=h8b4d769_1_cpython
  • python-dateutil=2.8.2=pyhd8ed1ab_0
  • python-eccodes=1.4.0=py39h86b5767_1
  • python-fastjsonschema=2.15.3=pyhd8ed1ab_0
  • python-xxhash=3.0.0=py39h63b48b0_1
  • python_abi=3.9=2_cp39
  • pytz=2022.1=pyhd8ed1ab_0
  • pyyaml=6.0=py39h63b48b0_4
  • rasterio=1.2.10=py39hc9cfaa8_5
  • readline=8.1=h05e3726_0
  • requests=2.27.1=pyhd8ed1ab_0
  • s3transfer=0.5.2=pyhd8ed1ab_0
  • scipy=1.8.1=py39hfa1a3ab_0
  • seaborn=0.11.2=hd8ed1ab_0
  • seaborn-base=0.11.2=pyhd8ed1ab_0
  • setuptools=62.3.2=py39h6e9494a_0
  • shapely=1.8.2=py39hd471b09_1
  • six=1.16.0=pyh6c4a22f_0
  • snappy=1.1.9=h6e38e02_1
  • snuggs=1.4.7=py_0
  • sortedcontainers=2.4.0=pyhd8ed1ab_0
  • sparse=0.13.0=pyhd8ed1ab_0
  • sqlite=3.38.5=hd9f0692_0
  • stack_data=0.2.0=pyhd8ed1ab_0
  • statsmodels=0.13.2=py39hc89836e_0
  • tblib=1.7.0=pyhd8ed1ab_0
  • tiledb=2.8.3=hf17dc58_1
  • tk=8.6.12=h5dbffcc_0
  • toml=0.10.2=pyhd8ed1ab_0
  • tomli=2.0.1=pyhd8ed1ab_0
  • toolz=0.11.2=pyhd8ed1ab_0
  • tornado=6.1=py39h63b48b0_3
  • traitlets=5.2.1.post0=pyhd8ed1ab_0
  • typing-extensions=4.2.0=hd8ed1ab_1
  • typing_extensions=4.2.0=pyha770c72_1
  • tzcode=2022a=h5eb16cf_0
  • tzdata=2022a=h191b570_0
  • udunits2=2.2.28=h06ef574_0
  • ukkonen=1.0.1=py39h7248d28_2
  • unicodedata2=14.0.0=py39h63b48b0_1
  • urllib3=1.26.9=pyhd8ed1ab_0
  • virtualenv=20.14.1=py39h6e9494a_0
  • wcwidth=0.2.5=pyh9f0ad1d_2
  • wheel=0.37.1=pyhd8ed1ab_0
  • wrapt=1.14.1=py39h701faf5_0
  • xerces-c=3.2.3=h6564042_4
  • xorg-libxau=1.0.9=h35c211d_0
  • xorg-libxdmcp=1.1.3=h35c211d_0
  • xxhash=0.8.0=h35c211d_3
  • xz=5.2.5=haf1e3a3_1
  • yaml=0.2.5=h0d85af4_2
  • yarl=1.7.2=py39h63b48b0_2
  • zarr=2.11.3=pyhd8ed1ab_0
  • zict=2.2.0=pyhd8ed1ab_0
  • zipp=3.8.0=pyhd8ed1ab_0
  • zlib=1.2.11=h6c3fc93_1014
  • zstd=1.5.2=ha9df2e0_1
  • pip:
    • numbagg==0.2.1
      prefix: /Users/mattia/mambaforge/envs/xarray
@joshmoore joshmoore added the help wanted Issue could use help from someone with familiarity on the topic label May 30, 2022
@joshmoore
Copy link
Member

see:

I think I'd tend towards "Only strings are supported" but another sub-option would be to print a warning and automatically stringify.

@clbarnes
Copy link
Contributor

Agree on only supporting strings, much easier for everyone.

@joshmoore
Copy link
Member

@clbarnes (or anyone else): and thoughts on completely disallowing versus stringifying arguments?

@clbarnes
Copy link
Contributor

clbarnes commented Jul 4, 2022

Strong preference for documenting/ annotating that keys must be strings (and that values must be json-serialisable), and not stringifying non-string keys (which would be unpredictable for different types).

JavaScript gets pretty confusing with its string coercion, no need to open ourselves up to that. Python's got a whole lot better to develop in now there's more focus on sane types. Explicitly checking for key string-ness won't check for the JSON-validity the value (which might be a dict with non-string keys), so it's probably best to just leave it all up to the serialiser.

@joshmoore
Copy link
Member

@clbarnes, objection noted. I guess I'm most worried about the impact of the breaking change since at the moment a single update (z.attrs[1] = "foo") gets you:

/tmp $cat test.zarr/.zattrs
{
    "1": "foo"
}

@clbarnes
Copy link
Contributor

Maybe a couple of versions of noisy deprecation warnings, then? Even if it never actually gets patched out until v3.

@joshmoore
Copy link
Member

Thanks all. Rolling @malmans2's #1066 into 2.13 but leaving this open for the deprecation cycle.

@joshmoore
Copy link
Member

Strike that see #1125

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issue could use help from someone with familiarity on the topic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants