Core protocol v3.0 status #53

Closed
jrbourbeau opened this issue Mar 30, 2020 · 3 comments

@jrbourbeau
Member

Hi All!

I spent some time looking through the work surrounding the v3.0 core protocol over in #16. My goal for this issue is to summarize the current status of this work and help spur conversation in the community. Any feedback can then be used to guide and prioritize future work on the core protocol and protocol extensions.

cc @alimanfoo @jakirkham @joshmoore @ryan-williams

Specification development process document (current status)

  • Defines concept of a core protocol, protocol extensions, stores, and codecs
  • Will define the process for minor/major changes to the core protocol and how decisions are made
  • Could use feedback from the community

Core protocol (current status)

  • Core concepts and terminology

    • E.g. arrays, groups, chunks, etc.
    • These all seem to be well defined and in good shape overall
  • Node names

    • Restrictions on the characters allowed in node names, and on some possible names
    • Case-insensitive uniqueness of sibling names
    • Question: Are the restrictions on node names too restrictive?
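To make the sibling-uniqueness rule concrete, here is a minimal sketch of how an implementation might detect names that collide when compared case-insensitively. The function name and the choice of `str.casefold()` for the comparison are assumptions for illustration; the draft spec may define the comparison differently.

```python
def find_case_collisions(sibling_names):
    """Group sibling names that are equal when compared case-insensitively.

    Hypothetical helper: uses str.casefold() as the comparison, which is an
    assumption and not something taken from the spec.
    """
    groups = {}
    for name in sibling_names:
        groups.setdefault(name.casefold(), []).append(name)
    # Only groups with more than one member violate uniqueness.
    return [names for names in groups.values() if len(names) > 1]


collisions = find_case_collisions(["temperature", "Temperature", "pressure"])
print(collisions)  # [['temperature', 'Temperature']]
```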
  • Data types

  • Chunking

    • The core protocol defines only a regular chunk grid. Other grid types, e.g. non-uniform chunking or unknown chunk sizes, can be defined via protocol extensions
    • Core protocol uses C- and F-order for the memory layout of each chunk. Other layouts, e.g. sparse memory layouts, are possible via protocol extensions
    • Chunk encoding consists of a compressor codec. Note this does not include filters, which can be supported via protocol extensions
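As a rough illustration of the regular grid and the two memory layouts, the sketch below maps an array index to its chunk, then to a flat offset inside that chunk for C or F order. All function names are hypothetical, and edge chunks, codecs, and dtypes are ignored.

```python
import math


def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each axis of a regular grid."""
    return tuple(math.ceil(s / c) for s, c in zip(array_shape, chunk_shape))


def locate(index, chunk_shape):
    """Split an array index into (chunk coordinates, index within the chunk)."""
    chunk_coords = tuple(i // c for i, c in zip(index, chunk_shape))
    within = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_coords, within


def flat_offset(within, chunk_shape, order="C"):
    """Element offset inside a chunk for C (row-major) or F (column-major) order."""
    dims = list(zip(within, chunk_shape))
    if order == "F":
        dims.reverse()  # first axis varies fastest in F order
    elif order != "C":
        raise ValueError("order must be 'C' or 'F'")
    offset = 0
    for i, n in dims:
        offset = offset * n + i
    return offset


print(chunk_grid_shape((10, 10), (3, 4)))  # (4, 3)
print(locate((7, 9), (3, 4)))              # ((2, 2), (1, 1))
print(flat_offset((1, 2), (3, 4), "C"))    # 6
print(flat_offset((1, 2), (3, 4), "F"))    # 7
```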
  • Metadata

    • Three types of metadata documents: bootstrap metadata, array metadata, and group metadata
    • The bootstrap metadata doc must be encoded in JSON, while the array and group metadata docs can use other encodings
    • Bootstrap metadata document contains the protocol specification used (e.g. v3.0, v3.1, etc.), how the array and group metadata documents are encoded (default is JSON), and a list of protocol extensions used
    • Array metadata document contains the array shape, data type, user-defined attributes, etc.
      • Protocol extension points include: data type, chunk grid type, and chunk memory layout
      • The extensions metadata value still needs to be defined in the protocol spec
      • Question: It is still open how to specify the fill_value for dtypes other than bool and int
    • Group metadata document contains protocol extensions and user-defined attributes
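For a sense of what the bootstrap metadata document might carry, here is a small sketch that builds one and round-trips it through JSON. The three key names are placeholders invented for this example; the draft spec defines the actual names and structure.

```python
import json


def make_bootstrap_metadata(protocol_version, metadata_encoding="json", extensions=()):
    """Build a bootstrap metadata document as a plain dict.

    The key names below are hypothetical placeholders, not taken from the spec.
    """
    return {
        "protocol_version": protocol_version,    # e.g. "v3.0"
        "metadata_encoding": metadata_encoding,  # encoding of array/group docs
        "extensions": list(extensions),          # protocol extensions in use
    }


# The bootstrap document itself is always JSON, per the summary above.
doc = json.dumps(make_bootstrap_metadata("v3.0", extensions=["datetime"]), indent=2)
loaded = json.loads(doc)
print(loaded["metadata_encoding"])  # json
```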
  • Stores

    • Defines abstract store interface which can be implemented on top of different storage technology backends
    • Abstract interface methods for operating on keys and values in a store include get, set, delete, etc.
    • Not all abstract methods need to be implemented (e.g. can have a read-only store)
    • Core protocol does not define any store implementations, but gives examples of possible implementations
    • Some protocol operations still need to be filled out
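The abstract store idea could be sketched roughly as follows: a base class where every operation is optional, an in-memory store that implements all of them, and a read-only wrapper that implements only get. The class names are hypothetical and this is not the interface from the spec, only an illustration of the "not all methods need to be implemented" point.

```python
class Store:
    """Hypothetical abstract store: every operation is optional."""

    def get(self, key):
        raise NotImplementedError("store does not support get")

    def set(self, key, value):
        raise NotImplementedError("store does not support set")

    def delete(self, key):
        raise NotImplementedError("store does not support delete")


class MemoryStore(Store):
    """Full implementation backed by a dict of key -> bytes."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = bytes(value)

    def delete(self, key):
        del self._data[key]


class ReadOnlyStore(Store):
    """Exposes only get; set and delete fall through to the base class."""

    def __init__(self, inner):
        self._inner = inner

    def get(self, key):
        return self._inner.get(key)


backing = MemoryStore()
backing.set("meta/root.array", b"{}")
readonly = ReadOnlyStore(backing)
print(readonly.get("meta/root.array"))  # b'{}'
```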
  • Protocol extensions

    • This section needs to be completed

Protocol extensions (current status)

Three protocol extensions are currently in progress:

  • Datetime data types - looks relatively filled out
  • Complex data types - currently a scaffolding
  • Filters - currently a scaffolding

Several other possible extensions are outlined in #49

Stores (current status)

  • Currently one store spec in progress, the file system store

@alimanfoo
Member

Thanks @jrbourbeau, this is a great summary of current status.

One thing I've been meaning to add for a while but haven't got around to yet is an example of a codec spec, e.g., for gzip. It would be just for illustration, so there is a concrete example of what we mean by a codec spec, and what a codec spec might look like.
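A codec spec itself would be a prose document, but the behaviour it pins down could look something like the sketch below: an encode/decode pair plus a small configuration record. The class name, the "codec" and "level" keys, and the default level are assumptions made up for illustration.

```python
import gzip


class GzipCodec:
    """Hypothetical gzip codec: compresses and decompresses chunk bytes."""

    codec_id = "gzip"  # placeholder identifier, not taken from any spec

    def __init__(self, level=1):
        if not 0 <= level <= 9:
            raise ValueError("level must be between 0 and 9")
        self.level = level

    def encode(self, chunk_bytes):
        return gzip.compress(chunk_bytes, compresslevel=self.level)

    def decode(self, encoded):
        return gzip.decompress(encoded)

    def get_config(self):
        # What might be recorded in array metadata so readers can decode.
        return {"codec": self.codec_id, "level": self.level}


codec = GzipCodec(level=5)
raw = bytes(range(16)) * 64
assert codec.decode(codec.encode(raw)) == raw
```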

Re the questions you highlight, maybe we could break them out into separate issues? Could label them as core-protocol-v3.0 so it's clear they're part of discussion of the v3 core protocol, rather than something else? (And we probably need some general way of categorising issues to help with triage?)

@abergou

abergou commented Mar 1, 2021

I have a question regarding non-uniform chunking: are the protocol extensions something that would need to be incorporated into a future version of the zarr spec (say zarr spec 3.5 or spec 4), or is the plan to expose an API that would allow for the definition of "fancier" arrays with spec 3? If the latter, do you know the status of exposing such functionality within zarr?

@jstriebel
Member

jstriebel commented Dec 5, 2022

> are the protocol extensions something that would need to be incorporated into a future version of the zarr spec (say zarr spec 3.5 or spec 4); or is the plan to expose an api that would allow for the definition of "fancier" arrays with spec 3? If the latter then do you know the status of exposing such functionality within zarr?

@abergou v3 now incorporates extension points that can be used by future extensions to provide additional features (e.g. fancier arrays) to v3. An overview of extension points is given here: https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html#extensions-section

We're following the ZEP process now (see ZEP 0), and just posted an update on the next steps: https://zarr.dev/blog/zep1-update. Various example extensions are tracked in issue #169.
Closing this issue for now.
