
Reading data that was written with deprecated bytes codec #3513

@rabernat

Prior to Zarr-Python 3.1, it was possible to write an array whose metadata looked like this:

```python
doc = {
 'shape': [],
 'data_type': 'bytes',
 'chunk_grid': {'name': 'regular', 'configuration': {'chunk_shape': []}},
 'chunk_key_encoding': {'name': 'default',
  'configuration': {'separator': '/'}},
 'fill_value': [],
 'codecs': [{'name': 'vlen-bytes', 'configuration': {}},
  {'name': 'zstd', 'configuration': {'level': 0, 'checksum': False}}],
 'attributes': {},
 'zarr_format': 3,
 'node_type': 'array',
 'storage_transformers': []
}
```

Attempting to load this metadata raises an error:

```python
import zarr

zarr.core.metadata.ArrayV3Metadata.from_dict(doc)
```

```
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/zarr/core/dtype/registry.py:208, in DataTypeRegistry.match_json(self, data, zarr_format)
    206     except DataTypeValidationError:
    207         pass
--> 208 raise ValueError(f"No Zarr data type found that matches {data!r}")

ValueError: No Zarr data type found that matches 'bytes'
```

The following changes to `doc` make it loadable:

```python
doc["data_type"] = "variable_length_bytes"
doc["fill_value"] = ""
```
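Until the library handles this itself, those two changes can be wrapped in a small helper that upgrades a legacy metadata document before it is parsed. This is a minimal sketch, not part of Zarr-Python: the name `upgrade_legacy_bytes_metadata` is hypothetical, and it only covers the exact case reported here (`data_type == 'bytes'` with `fill_value == []`).

```python
import copy
from typing import Any


def upgrade_legacy_bytes_metadata(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a Zarr v3 array metadata dict in which the legacy
    'bytes' data type is rewritten to 'variable_length_bytes'.

    Hypothetical helper: it only handles the case reported in this issue.
    """
    upgraded = copy.deepcopy(doc)  # leave the caller's dict untouched
    if upgraded.get("data_type") == "bytes":
        upgraded["data_type"] = "variable_length_bytes"
        # The legacy document stored an empty fill value as []; replace it
        # with the empty string, which the issue reports as loadable.
        if upgraded.get("fill_value") == []:
            upgraded["fill_value"] = ""
    return upgraded
```

With Zarr-Python 3.1 installed, the result would then be passed to `zarr.core.metadata.ArrayV3Metadata.from_dict(upgrade_legacy_bytes_metadata(doc))`. Non-empty legacy fill values are deliberately left alone, since their correct new encoding is not established in this issue.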

It would be nice if Zarr-Python:

  1. Recognized the deprecated `bytes` data type as an alias for `variable_length_bytes`
  2. Accepted `fill_value = []` for this data type
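On the library side, request 1 could take the form of an alias table consulted before data-type matching, with a deprecation warning so users know to rewrite their metadata. The sketch below is hypothetical: `LEGACY_DATA_TYPE_ALIASES` and `resolve_data_type_name` are illustrative names, not existing Zarr-Python APIs, and wiring this into `DataTypeRegistry.match_json` would need its own design.

```python
import warnings

# Hypothetical alias table: legacy data type names -> current names.
LEGACY_DATA_TYPE_ALIASES: dict[str, str] = {
    "bytes": "variable_length_bytes",
}


def resolve_data_type_name(name: str) -> str:
    """Map a legacy data type name to its current equivalent.

    Emits a DeprecationWarning when an alias is used; names that are not
    in the alias table are returned unchanged.
    """
    canonical = LEGACY_DATA_TYPE_ALIASES.get(name)
    if canonical is None:
        return name
    warnings.warn(
        f"Data type {name!r} is deprecated; interpreting it as {canonical!r}.",
        DeprecationWarning,
        stacklevel=2,
    )
    return canonical
```

Keeping aliases in a table rather than registering a second data type means the parsed metadata always uses the current name, so arrays rewritten by newer versions would not carry the legacy name forward. Request 2 would need a similar normalization step in the fill-value parsing path.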

Otherwise, data written with older Zarr-Python versions cannot be read by newer ones.

Labels: bug (Potential issues with the zarr-python library)
