Define system architecture #8
Comments
OK, here's a first go at drawing something, based on our conversation today on the zarr/n5 call: This is not complete and a bad drawing (apologies) but hopefully captures a couple of useful concepts, including:
We discussed all of the above today, except for storage transformations, which I've added in. This is just to capture the idea that you may want to define transformations on the keys and/or values before they hit the storage implementation.
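To make the storage transformation idea a bit more concrete, here is a minimal sketch in Python. It assumes the storage layer is just a dict-like mapping from string keys to bytes. The class name `TransformingStore`, the `data/` key prefix, and the choice of zlib as the value transformation are all hypothetical, picked only for illustration and not taken from zarr or n5.

```python
import zlib
from collections.abc import MutableMapping


class TransformingStore(MutableMapping):
    """Hypothetical wrapper that transforms keys and values
    before they reach the underlying storage implementation."""

    def __init__(self, inner, key_prefix="data/"):
        self.inner = inner            # any dict-like key/value store
        self.key_prefix = key_prefix  # assumed key transformation: add a prefix

    def _encode_key(self, key):
        return self.key_prefix + key

    def __getitem__(self, key):
        # Reverse the value transformation on the way out.
        return zlib.decompress(self.inner[self._encode_key(key)])

    def __setitem__(self, key, value):
        # Apply the key and value transformations on the way in.
        self.inner[self._encode_key(key)] = zlib.compress(value)

    def __delitem__(self, key):
        del self.inner[self._encode_key(key)]

    def __iter__(self):
        # Expose only keys under the prefix, with the prefix stripped.
        n = len(self.key_prefix)
        return (k[n:] for k in self.inner if k.startswith(self.key_prefix))

    def __len__(self):
        return sum(1 for _ in self)


backing = {}                       # stands in for a real storage implementation
store = TransformingStore(backing)
store["0.0"] = b"chunk bytes"
```

Code using `store` sees the key `0.0` and the original bytes, while `backing` only ever sees the transformed key and value, so the storage implementation itself stays unaware of the transformation.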
Just adding another thought: the consolidated metadata feature could also be an example of a storage transformation.
On the call that @alimanfoo refers to, @axtimwalde brought up the idea of having the minimum API be simply the writing of byte arrays/streams. I still find this intriguing, so a few thoughts that have been rattling around:
There are now two informative illustrations in the v3 spec which go in this direction. Defining a system architecture more concretely in the spec might also impose too many restrictions on implementations. @alimanfoo @joshmoore Do you feel like the important parts are covered for now?
Having one image that gets all the base concepts across is probably something to shoot for eventually, but it need not be exactly the one above, nor done immediately. So 👍 from my side if you want to close.
Zarr and n5 naturally decouple different components of their architecture, in a way that allows clear and simple interfaces to be defined between them, and that allows for pluggability. For example, both define a storage layer interface, based on storage and retrieval of key/value pairs, which can then have pluggable implementations (file system, S3, mongodb, ...). Zarr (via numcodecs) also defines a codec API which enables new filters or compressors to be plugged in.
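As a rough illustration of the two interfaces described above, here is a minimal Python sketch. It assumes a store is any mutable mapping from string keys to bytes and a codec is any object with `encode` and `decode` methods. The names `Codec`, `ZlibCodec`, `write_chunk` and `read_chunk` are hypothetical and are not meant to reproduce the actual zarr, n5 or numcodecs APIs.

```python
import zlib
from collections.abc import MutableMapping
from typing import Protocol


class Codec(Protocol):
    """Assumed codec interface: bytes in, bytes out, in both directions."""

    def encode(self, buf: bytes) -> bytes: ...
    def decode(self, buf: bytes) -> bytes: ...


class ZlibCodec:
    """One pluggable codec; any object with encode/decode would do."""

    def __init__(self, level: int = 1):
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)


def write_chunk(store: MutableMapping[str, bytes], key: str,
                raw: bytes, codec: Codec) -> None:
    # The store only ever sees an opaque key and encoded bytes.
    store[key] = codec.encode(raw)


def read_chunk(store: MutableMapping[str, bytes], key: str,
               codec: Codec) -> bytes:
    return codec.decode(store[key])


store = {}            # a plain dict stands in for file system, S3, ...
codec = ZlibCodec()
write_chunk(store, "0.0", b"\x00" * 64, codec)
```

Because `write_chunk` and `read_chunk` depend only on the two interfaces, either the store or the codec can be swapped out without touching the other.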
It could be very helpful to draw a picture of this system architecture, and then to discuss what aspects of the architecture should be covered in a "core" spec, versus where additional specs can be layered on top or plugged in.
For example, we might have a core spec that formally defines the system architecture, defines abstract APIs for storage and chunk encoding/decoding, and defines how essential metadata are formatted. Then we might have a separate spec for each storage layer implementation, that defines how keys and values are mapped into concrete storage entities like file paths and file contents. And we might have a separate spec for each codec, that defines the encoding process and format.
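To illustrate what a per-storage-layer spec would have to pin down, here is a small sketch of one possible mapping from keys and values to file paths and file contents. The class name `DirectoryStore` and the rule "split the key on `/` and use the parts as path components" are assumptions made for this example only, not something either project's spec is known to mandate in this form.

```python
import tempfile
from pathlib import Path


class DirectoryStore:
    """Hypothetical file-system store: each key maps to a relative
    file path, and each value becomes that file's contents."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, key):
        # Assumed mapping rule: "a/b/0.0" -> <root>/a/b/0.0
        return self.root.joinpath(*key.split("/"))

    def __setitem__(self, key, value):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)

    def __getitem__(self, key):
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            # Present a missing file as a missing key, like a dict would.
            raise KeyError(key) from None


root = tempfile.mkdtemp()
store = DirectoryStore(root)
store["group/array/0.0"] = b"chunk bytes"
```

A spec for a different backend (an object store, a database) would define a different mapping rule behind the same get/set interface, which is what keeps the core spec unaffected.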
In other words, a modular spec architecture, that allows new specs for things like storage or encoding to be plugged in without affecting the core spec.