Define system architecture #8

Closed
alimanfoo opened this issue Mar 13, 2019 · 5 comments
Labels
core-protocol-v3.0 (Issue relates to the core protocol version 3.0 spec), v3-meta


@alimanfoo
Member

Zarr and n5 naturally decouple different components of their architecture, in a way that allows clear and simple interfaces to be defined between them, and that allows for pluggability. For example, both define a storage layer interface, based on storage and retrieval of key/value pairs, which can then have pluggable implementations (file system, S3, MongoDB, ...). Zarr (via numcodecs) also defines a codec API which enables new filters or compressors to be plugged in.
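To make the two interfaces concrete, here is a minimal Python sketch. All class and function names are hypothetical and are not the zarr or numcodecs API, although zarr-python 2.x stores do follow the `MutableMapping` protocol and numcodecs codecs do expose `encode`/`decode` methods. The array layer only talks to the abstract store and codec interfaces, so either side can be swapped out.

```python
import zlib
from abc import ABC, abstractmethod
from collections.abc import MutableMapping


class Codec(ABC):
    """Hypothetical codec interface: a filter or compressor maps bytes to bytes."""

    @abstractmethod
    def encode(self, buf: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, buf: bytes) -> bytes: ...


class DeltaFilter(Codec):
    """Example filter: store each byte as its difference from the previous byte (mod 256)."""

    def encode(self, buf: bytes) -> bytes:
        return bytes((b - a) % 256 for a, b in zip(b"\x00" + buf, buf))

    def decode(self, buf: bytes) -> bytes:
        out, prev = bytearray(), 0
        for d in buf:
            prev = (prev + d) % 256  # running sum undoes the delta
            out.append(prev)
        return bytes(out)


class ZlibCodec(Codec):
    """Example compressor, built on the standard-library zlib module."""

    def __init__(self, level: int = 5):
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)


class MemoryStore(MutableMapping):
    """One pluggable storage implementation: an in-memory dict of key -> bytes."""

    def __init__(self):
        self._data: dict[str, bytes] = {}

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def write_chunk(store: MutableMapping, key: str, raw: bytes, chain: list[Codec]) -> None:
    # Encode through the filter chain in order (filters first, compressor last).
    for codec in chain:
        raw = codec.encode(raw)
    store[key] = raw


def read_chunk(store: MutableMapping, key: str, chain: list[Codec]) -> bytes:
    # Decode by applying the same chain in reverse order.
    buf = store[key]
    for codec in reversed(chain):
        buf = codec.decode(buf)
    return buf
```

Any `MutableMapping` (even a plain `dict`) could stand in for `MemoryStore`, which is the point of keeping the storage interface this narrow.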

It could be very helpful to draw a picture of this system architecture, and then to discuss what aspects of the architecture should be covered in a "core" spec, versus where additional specs can be layered on top or plugged in.

For example, we might have a core spec that formally defines the system architecture, defines abstract APIs for storage and chunk encoding/decoding, and defines how essential metadata are formatted. Then we might have a separate spec for each storage layer implementation, that defines how keys and values are mapped into concrete storage entities like file paths and file contents. And we might have a separate spec for each codec, that defines the encoding process and format.
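As an illustration of what a per-implementation storage spec might pin down, here is a sketch of one possible key-to-path mapping for a file system. The rule (split keys on `/`, map each segment to a directory, write the value as the file's contents) and the `FileSystemStore` name are assumptions for illustration, not taken from any existing spec.

```python
import tempfile
from pathlib import Path


class FileSystemStore:
    """Hypothetical file-system storage mapping: '/'-separated key segments
    become nested directories, and the value becomes the file contents."""

    def __init__(self, root: str):
        self.root = Path(root)

    def key_to_path(self, key: str) -> Path:
        parts = key.split("/")
        # Reject segments that would escape or alias the store root.
        if any(p in ("", ".", "..") for p in parts):
            raise ValueError(f"invalid key: {key!r}")
        return self.root.joinpath(*parts)

    def __setitem__(self, key: str, value: bytes) -> None:
        path = self.key_to_path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)

    def __getitem__(self, key: str) -> bytes:
        try:
            return self.key_to_path(key).read_bytes()
        except FileNotFoundError:
            # Missing files surface as missing keys, as a key/value store should.
            raise KeyError(key) from None


with tempfile.TemporaryDirectory() as tmp:
    store = FileSystemStore(tmp)
    store["foo/bar/0.0"] = b"chunk bytes"
    print(store["foo/bar/0.0"])  # b'chunk bytes'
```

An S3 or MongoDB storage spec would answer the same question (key in, concrete entity out) with its own rule, while the core spec stays unchanged.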

In other words, a modular spec architecture that allows new specs for things like storage or encoding to be plugged in without affecting the core spec.

@alimanfoo changed the title from "Define modular architecture" to "Define system architecture" on Mar 13, 2019
@alimanfoo
Member Author

OK, here's a first go at drawing something, based on our conversation today on the zarr/n5 call:

[Figure: zarr architecture minimal 2019-03-13]

This is incomplete and a rough drawing (apologies), but it hopefully captures a few useful concepts, including:

  • Array metadata
    • Filter chain metadata
  • Codec interface
  • Codec implementations
  • Storage interface
  • Storage transformations
  • Storage implementations

We discussed all of the above today except for storage transformations, which I've added in. This is just to capture the idea that you may want to define transformations on the keys and/or values before they hit the storage implementation.
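A storage transformation can be sketched as a wrapper that exposes the same key/value interface as the store it wraps. The Python below is an illustration under assumptions: `ChecksumTransform` is hypothetical, and the key prefix and CRC32 trailer are arbitrary examples of a key transformation and a value transformation applied before anything reaches the underlying storage implementation.

```python
import struct
import zlib
from collections.abc import MutableMapping


class ChecksumTransform(MutableMapping):
    """Hypothetical storage transformation: prefixes every key and appends a
    CRC32 trailer to every value on write, verifying it on read."""

    def __init__(self, inner: MutableMapping, prefix: str = ""):
        self.inner = inner  # any underlying store, e.g. a plain dict
        self.prefix = prefix

    def _key(self, key: str) -> str:
        return self.prefix + key  # key transformation

    def __setitem__(self, key: str, value: bytes) -> None:
        crc = zlib.crc32(value) & 0xFFFFFFFF
        # Value transformation: payload followed by a little-endian CRC32.
        self.inner[self._key(key)] = bytes(value) + struct.pack("<I", crc)

    def __getitem__(self, key: str) -> bytes:
        stored = self.inner[self._key(key)]
        payload, (crc,) = stored[:-4], struct.unpack("<I", stored[-4:])
        if zlib.crc32(payload) & 0xFFFFFFFF != crc:
            raise ValueError(f"checksum mismatch for key {key!r}")
        return payload

    def __delitem__(self, key: str) -> None:
        del self.inner[self._key(key)]

    def __iter__(self):
        # Undo the key transformation when listing.
        n = len(self.prefix)
        return (k[n:] for k in self.inner if k.startswith(self.prefix))

    def __len__(self) -> int:
        return sum(1 for _ in self)
```

Because the wrapper is itself a `MutableMapping`, transformations can be stacked, and the array layer cannot tell whether it is talking to a raw store or a transformed one.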

@alimanfoo
Member Author

Just adding another thought: the consolidated metadata feature could also be an example of a storage transformation.
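A read-side sketch of that idea: metadata lookups are answered from one consolidated document, fetched once, while chunk keys pass straight through to the underlying store. The key name `consolidated.json`, the rule that metadata keys end in `.json`, and all function and class names are assumptions for this sketch; zarr-python's actual consolidated-metadata format (the `.zmetadata` key in v2) differs in detail.

```python
import json
from collections.abc import Mapping, MutableMapping

CONSOLIDATED_KEY = "consolidated.json"  # hypothetical key name


def is_metadata_key(key: str) -> bool:
    # Hypothetical convention for this sketch: metadata documents end in ".json".
    return key.endswith(".json") and key != CONSOLIDATED_KEY


def consolidate(store: MutableMapping) -> None:
    """Write side: gather every metadata document into a single value."""
    docs = {k: json.loads(store[k]) for k in store if is_metadata_key(k)}
    store[CONSOLIDATED_KEY] = json.dumps(docs).encode()


class ConsolidatedReader(Mapping):
    """Read side: a storage transformation that serves metadata keys from the
    consolidated document and passes all other keys (e.g. chunks) through."""

    def __init__(self, inner: Mapping):
        self.inner = inner
        # One read of the backing store replaces many small metadata reads.
        self._meta = {
            k: json.dumps(v).encode()
            for k, v in json.loads(inner[CONSOLIDATED_KEY]).items()
        }

    def __getitem__(self, key: str) -> bytes:
        if is_metadata_key(key):
            return self._meta[key]
        return self.inner[key]

    def __iter__(self):
        yield from self._meta
        yield from (
            k for k in self.inner
            if not is_metadata_key(k) and k != CONSOLIDATED_KEY
        )

    def __len__(self) -> int:
        return sum(1 for _ in self)
```

The benefit is mostly for high-latency stores such as S3, where listing and opening many small metadata objects is slow.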

@joshmoore
Member

On the call that @alimanfoo refers to, @axtimwalde brought up the idea of having the minimum API be simply the writing of byte arrays/streams. I still find this intriguing, so a few thoughts that have been rattling around:

  • I think I gave up too early on endianness. I disagree that byte order is not important for portability. It should either be specified in the spec even at this low-level or detectable in the byte stream itself.
  • There was a statement that the shape does not apply at this "no-memory-layout" level. At least conceptually, that leads me to believe we're talking about a "zobject" as opposed to a "zarray". chunks still applies and is worded nicely, since you're saying "break this stream into num_chunks = d1 x d2 x ... x dn".
  • "zarrays" then conceptually subclass "zobject" to turn those num_chunks byte arrays into an ND-array by applying a memory layout on top of the streams.
  • But I still have the inclination to say that an array of memory layouts could be placed on top of the stream to produce multiple objects: `List<Object> applyMemoryLayouts(InputStream...)`
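A rough Python sketch of this layering, under stated assumptions: `split_stream`, `MemoryLayout`, and `apply_memory_layouts` are hypothetical names (the last loosely mirrors the `applyMemoryLayouts` signature above), chunks are assumed to be equal-sized, and element types are described with `struct` format codes. Byte order is carried explicitly in the layout, which is one way of specifying it in the spec; detecting it from the byte stream itself is not shown.

```python
import math
import struct
from dataclasses import dataclass
from itertools import product


def split_stream(stream: bytes, chunk_grid: tuple[int, ...]) -> dict[tuple[int, ...], bytes]:
    """'zobject' level: break a byte stream into num_chunks = d1 * d2 * ... * dn
    equal pieces keyed by grid index. No shape or dtype is involved here."""
    num_chunks = math.prod(chunk_grid)
    if len(stream) % num_chunks:
        raise ValueError("stream length must divide evenly into chunks")
    size = len(stream) // num_chunks
    return {
        idx: stream[i * size:(i + 1) * size]
        for i, idx in enumerate(product(*(range(d) for d in chunk_grid)))
    }


@dataclass
class MemoryLayout:
    """Hypothetical 'zarray' layer: how to interpret one chunk's bytes.
    Byte order is part of the layout ('<' little-endian, '>' big-endian),
    so the same bytes decode identically on any host."""
    fmt: str                      # struct format code, e.g. "h" for int16
    byteorder: str                # "<" or ">"
    chunk_shape: tuple[int, ...]

    def apply(self, chunk: bytes) -> list:
        n = math.prod(self.chunk_shape)
        flat = list(struct.unpack(f"{self.byteorder}{n}{self.fmt}", chunk))
        # Rebuild nested lists in C (row-major) order, grouping the
        # innermost dimension first.
        for dim in reversed(self.chunk_shape[1:]):
            flat = [flat[i:i + dim] for i in range(0, len(flat), dim)]
        return flat


def apply_memory_layouts(chunks: dict, layouts: list[MemoryLayout]) -> list[dict]:
    """One set of byte chunks, several interpretations: one object per layout."""
    return [{idx: layout.apply(buf) for idx, buf in chunks.items()} for layout in layouts]
```

Applying a little-endian and a big-endian `int16` layout to the same chunks yields different values, which is exactly the portability concern raised in the first bullet.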

@jstriebel added the core-protocol-v3.0 and v3-meta labels on Nov 16, 2022
@jstriebel
Member

There are now two informative illustrations in the v3 spec that go in this direction:
https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#concepts-and-terminology

Defining a system architecture more concretely in the spec might also impose too many restrictions on implementations.

@alimanfoo @joshmoore Do you feel like the important parts are covered for now?

@joshmoore
Member

Having one image that gets all the base concepts across is probably something to shoot for eventually but need not be exactly the one above nor done immediately. So 👍 if you want to close from my side.

Projects
Status: Done
3 participants