
compatibility with the v3 bytes dtype #3517

@d-v-b


There's a Zarr v3 data type definition for a variable-length bytes data type (https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/bytes) that was not on my radar when I added variable-length bytes support in #2874.

The v3 bytes data type is incompatible with the VariableLengthBytes data type that I implemented in #2874. The differences are:

| Data type | Identifier | Fill value |
|---|---|---|
| v3 bytes dtype | "bytes" | array of ints (one per byte) |
| Zarr Python VariableLengthBytes dtype | "variable_length_bytes" | string (base64-encoded bytes) |
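To make the fill value difference concrete, here is a minimal sketch of how the same bytes fill value would be written in each JSON form. It assumes only the two encodings described in the table above; the helper function names are hypothetical and are not part of zarr-python's API.

```python
import base64
import json


def fill_value_v3_bytes(value: bytes) -> list[int]:
    # v3 "bytes" dtype: the fill value is a JSON array with one int (0-255) per byte
    return list(value)


def fill_value_variable_length_bytes(value: bytes) -> str:
    # Zarr Python VariableLengthBytes: the fill value is a base64-encoded string
    return base64.b64encode(value).decode("ascii")


fill = b"\x00\xffab"
print(json.dumps({"data_type": "bytes", "fill_value": fill_value_v3_bytes(fill)}))
print(json.dumps({
    "data_type": "variable_length_bytes",
    "fill_value": fill_value_variable_length_bytes(fill),
}))
```

The two forms carry the same information, so converting between them is lossless; the incompatibility lies only in the JSON spelling.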

As an ecosystem, we should probably not have two nearly identical data types, which argues for consolidating them. Since the VariableLengthBytes data type doesn't have a spec, I think its current behavior should be deprecated, and we should either modify it to comply with the v3 bytes data type spec or introduce a brand-new data type class that complies with that spec.

Either way, we can stay compatible with older data by accepting "vlen-bytes" as an alias for "bytes" and by reading (but not writing) the base64-encoded fill value.
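One way the read-compatible, write-strict behavior described above could look is sketched below. This is a hypothetical illustration, not zarr-python code: the function names are invented, "vlen-bytes" is the alias proposed above, and also accepting "variable_length_bytes" is an assumption based on the identifier listed in the table.

```python
import base64
from typing import Union

V3_BYTES_NAME = "bytes"
# Hypothetical alias set: "vlen-bytes" as proposed in the issue; "variable_length_bytes"
# is an assumption, taken from the current VariableLengthBytes identifier.
LEGACY_ALIASES = frozenset({"vlen-bytes", "variable_length_bytes"})


def parse_bytes_dtype(data_type: str, fill_value: Union[str, list[int]]) -> bytes:
    """Read path: accept the v3 name or a legacy alias, and either fill value form."""
    if data_type != V3_BYTES_NAME and data_type not in LEGACY_ALIASES:
        raise ValueError(f"not a bytes data type: {data_type!r}")
    if isinstance(fill_value, str):
        # Legacy form: base64-encoded string (accepted on read only)
        return base64.b64decode(fill_value, validate=True)
    if isinstance(fill_value, list) and all(
        type(b) is int and 0 <= b <= 255 for b in fill_value
    ):
        # v3 form: one int per byte
        return bytes(fill_value)
    raise ValueError(f"invalid fill value for bytes data type: {fill_value!r}")


def serialize_bytes_dtype(fill_value: bytes) -> dict:
    """Write path: always emit the v3 name and the array-of-ints fill value."""
    return {"data_type": V3_BYTES_NAME, "fill_value": list(fill_value)}


# Legacy metadata is read, then re-serialized in the v3 form
legacy = parse_bytes_dtype("vlen-bytes", "AP8=")
print(serialize_bytes_dtype(legacy))
```

Keeping the legacy handling confined to the read path means new metadata converges on the spec'd form while existing arrays remain readable.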

Any thoughts on, or preferences between, these two options? Modifying the JSON form of the existing data type would prevent older versions of zarr-python from reading the data type metadata, but we did loudly warn about this possibility via warnings on the data type.
