Define system architecture #8

alimanfoo · 2019-03-13T21:40:27Z

Zarr and n5 naturally decouple different components of their architecture, in a way that allows clear and simple interfaces to be defined between them, and that allows for pluggability. For example, both define a storage layer interface, based on storage and retrieval of key/value pairs, which can then have pluggable implementations (file system, S3, mongodb, ...). Zarr (via numcodecs) also defines a codec API which enables new filters or compressors to be plugged in.
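As a rough illustration of the storage layer idea, here is a minimal Python sketch. It assumes the storage interface is nothing more than Python's `MutableMapping` over string keys and bytes values; `DirectoryStoreSketch` and `roundtrip` are hypothetical names made up for this example, not part of zarr or n5.

```python
import os
from collections.abc import MutableMapping


class DirectoryStoreSketch(MutableMapping):
    """Hypothetical store: each key is a relative path, each value the file's bytes."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        # Keys use "/" as separator regardless of platform.
        return os.path.join(self.root, *key.split("/"))

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key)

    def __setitem__(self, key, value):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key)

    def __iter__(self):
        for dirpath, _, filenames in os.walk(self.root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                yield rel.replace(os.sep, "/")

    def __len__(self):
        return sum(1 for _ in self)


def roundtrip(store, key, value):
    """Code written against the interface works with any implementation."""
    store[key] = value
    return store[key]
```

With this shape, a plain `dict()` serves as an in-memory implementation, and `roundtrip` behaves the same whether it is handed a `dict` or a `DirectoryStoreSketch`.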

It could be very helpful to draw a picture of this system architecture, and then to discuss what aspects of the architecture should be covered in a "core" spec, versus where additional specs can be layered on top or plugged in.

For example, we might have a core spec that formally defines the system architecture, defines abstract APIs for storage and chunk encoding/decoding, and defines how essential metadata are formatted. Then we might have a separate spec for each storage layer implementation, that defines how keys and values are mapped into concrete storage entities like file paths and file contents. And we might have a separate spec for each codec, that defines the encoding process and format.

In other words, a modular spec architecture, that allows new specs for things like storage or encoding to be plugged in without affecting the core spec.
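To sketch what an abstract codec API plus pluggable codec specs could look like, here is a small Python example. It is loosely modelled on the encode/decode/config shape of numcodecs, but every name here (`Codec`, `ZlibCodec`, `REGISTRY`, `codec_from_config`, `encode_chain`, `decode_chain`) is an assumption for illustration only.

```python
import zlib
from abc import ABC, abstractmethod


class Codec(ABC):
    """Hypothetical abstract codec API that a core spec might define."""

    codec_id = None

    @abstractmethod
    def encode(self, buf: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, buf: bytes) -> bytes: ...

    def get_config(self):
        # JSON-serialisable description that could live in array metadata.
        return {"id": self.codec_id}


class ZlibCodec(Codec):
    """One pluggable implementation; its format would have its own spec."""

    codec_id = "zlib"

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(buf, self.level)

    def decode(self, buf):
        return zlib.decompress(buf)

    def get_config(self):
        return {"id": self.codec_id, "level": self.level}


# New codecs plug in by registering here, without touching the core.
REGISTRY = {ZlibCodec.codec_id: ZlibCodec}


def codec_from_config(config):
    config = dict(config)
    cls = REGISTRY[config.pop("id")]
    return cls(**config)


def encode_chain(codecs, buf):
    # Apply codecs in order when writing a chunk.
    for codec in codecs:
        buf = codec.encode(buf)
    return buf


def decode_chain(codecs, buf):
    # Undo them in reverse order when reading.
    for codec in reversed(codecs):
        buf = codec.decode(buf)
    return buf
```

The point of the registry is that the core only ever sees `id` strings and the abstract `encode`/`decode` methods; what happens inside each codec is left to that codec's own spec.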

alimanfoo · 2019-03-13T22:32:40Z

OK, here's a first go at drawing something, based on our conversation today on the zarr/n5 call:

This is incomplete and a bad drawing (apologies), but hopefully it captures a couple of useful concepts, including:

- Array metadata
  - Filter chain metadata
- Codec interface
- Codec implementations
- Storage interface
- Storage transformations
- Storage implementations

We discussed all of the above today, except for storage transformations, which I've added in - this is just to capture the idea that you may want to define transformations on the keys and/or values, before they hit the storage implementation.
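A storage transformation in this sense could be sketched as a wrapper that implements the same key/value interface as the store it wraps. The example below is a minimal sketch, assuming stores are `MutableMapping` objects; `PrefixStore` is a hypothetical name, and it only transforms keys (by adding a prefix), not values.

```python
from collections.abc import MutableMapping


class PrefixStore(MutableMapping):
    """Hypothetical key transformation: prepend a prefix before hitting the inner store."""

    def __init__(self, inner, prefix):
        self.inner = inner
        self.prefix = prefix

    def __getitem__(self, key):
        return self.inner[self.prefix + key]

    def __setitem__(self, key, value):
        self.inner[self.prefix + key] = value

    def __delitem__(self, key):
        del self.inner[self.prefix + key]

    def __iter__(self):
        # Only expose inner keys that carry the prefix, with the prefix stripped.
        n = len(self.prefix)
        return (k[n:] for k in self.inner if k.startswith(self.prefix))

    def __len__(self):
        return sum(1 for _ in self)
```

Because the wrapper and the wrapped store share one interface, transformations of this kind can be stacked, and the storage implementation underneath never needs to know about them.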

alimanfoo · 2019-03-14T22:15:04Z

Just adding another thought: the consolidated metadata feature could also be an example of a storage transformation.
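Here is one way that idea might look as code. This is a simplified sketch, not the actual zarr-python implementation: it assumes metadata keys end in `.zarray`, `.zgroup` or `.zattrs`, stores one JSON document under a `.zmetadata` key, and uses the hypothetical names `consolidate` and `ConsolidatedView`. The JSON layout is invented for the example.

```python
import json
from collections.abc import Mapping

META_SUFFIXES = (".zarray", ".zgroup", ".zattrs")
CONSOLIDATED_KEY = ".zmetadata"  # layout of the value below is an assumption


def consolidate(store):
    """Copy every metadata document into a single key."""
    meta = {k: store[k].decode("utf-8") for k in store if k.endswith(META_SUFFIXES)}
    store[CONSOLIDATED_KEY] = json.dumps(meta).encode("utf-8")


class ConsolidatedView(Mapping):
    """Read-only transformation: metadata reads never touch the inner store again."""

    def __init__(self, inner):
        self.inner = inner
        # One read up front; all later metadata lookups are served from memory.
        self.meta = json.loads(inner[CONSOLIDATED_KEY])

    def __getitem__(self, key):
        if key.endswith(META_SUFFIXES):
            return self.meta[key].encode("utf-8")
        return self.inner[key]  # chunk data is passed straight through

    def __iter__(self):
        return (k for k in self.inner if k != CONSOLIDATED_KEY)

    def __len__(self):
        return sum(1 for _ in self)
```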

joshmoore · 2019-03-16T16:40:41Z

On the call that @alimanfoo refers to, @axtimwalde brought up the idea of having the minimum API be simply the writing of byte arrays/streams. I still find this intriguing, so a few thoughts that have been rattling around:

- I think I gave up too early on endianness. I disagree that byte order is not important for portability. It should either be specified in the spec, even at this low level, or be detectable in the byte stream itself.
- There was a statement that the shape does not apply at this "no-memory-layout" level. At least conceptually, that leads me to believe we're talking about a "zobject" as opposed to a "zarray". chunks still applies and is worded nicely, since you're saying, "break this stream into num_chunks = d1 x d2 x ... x dn".
- "zarrays" then conceptually subclass "zobject" to turn those num_chunks byte arrays into an ND-array by applying a memory layout on top of the streams.
- But I still have the inclination to say that an array of memory layouts could be placed on top of the stream to produce multiple objects: List<Object> applyMemoryLayouts(InputStream...)
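To illustrate the points about byte order and layering memory layouts over raw bytes, here is a small Python sketch using only the standard library. The function names (`apply_memory_layout`, `apply_memory_layouts`) and the `(byteorder, typecode, shape)` layout tuple are assumptions made for this example; type codes are those of Python's `struct` module.

```python
import struct
from math import prod


def _nest(flat, shape):
    # Row-major reshape of a flat sequence into nested lists.
    if len(shape) <= 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [_nest(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def apply_memory_layout(buf, byteorder, typecode, shape):
    """Interpret raw bytes as an N-D array; byteorder is '<' or '>'."""
    count = prod(shape)
    flat = struct.unpack(f"{byteorder}{count}{typecode}", buf)
    return _nest(flat, shape)


def apply_memory_layouts(buf, layouts):
    """Place several layouts on top of the same byte stream."""
    return [apply_memory_layout(buf, *layout) for layout in layouts]


# Six 32-bit integers written little-endian: the "stream" itself has no shape.
raw = struct.pack("<6i", 0, 1, 2, 3, 4, 5)

as_2x3 = apply_memory_layout(raw, "<", "i", (2, 3))      # [[0, 1, 2], [3, 4, 5]]
wrong_order = apply_memory_layout(raw, ">", "i", (6,))   # same bytes, different values
```

Reading the same bytes with the wrong byte order silently yields different numbers (the second element becomes 16777216 instead of 1), which is why the byte order has to be either specified or detectable.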

jstriebel · 2023-02-22T12:43:48Z

There are now two informative illustrations in the v3 spec that go in this direction:
https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#concepts-and-terminology

Defining a system architecture in the spec more concretely might also impose too many restrictions on implementations.

@alimanfoo @joshmoore Do you feel like the important parts are covered for now?

joshmoore · 2023-03-01T08:30:01Z

Having one image that gets all the base concepts across is probably something to shoot for eventually, but it need not be exactly the one above, nor be done immediately. So 👍 from my side if you want to close.

alimanfoo changed the title ~~Define modular architecture~~ Define system architecture Mar 13, 2019

alimanfoo mentioned this issue Mar 14, 2019

Implement N5Store as a transformation layer over other stores zarr-developers/n5py#9

Open

alimanfoo mentioned this issue Mar 26, 2019

Docs file structure #11

Closed

joshmoore mentioned this issue May 8, 2019

Core protocol v3.0 - chunk grids #22

Merged

jstriebel added the core-protocol-v3.0 (Issue relates to the core protocol version 3.0 spec) and v3-meta labels Nov 16, 2022

jstriebel closed this as completed Mar 2, 2023
