Store type of store in top level metadata ? #65

Open
Carreau opened this issue May 6, 2020 · 7 comments

Comments

@Carreau
Contributor

Carreau commented May 6, 2020

I'm not 100% sure whether this is a spec question or an implementation one; it is mostly driven by this morning's questions on the community call.

Right now it looks to me as though users need to specify which store they want to use, and that some guessing can be done based on file extensions (normalize_store_arg?).

Currently this prevents deep changes to, or experimentation with, stores of similar structure unless one knows which kind of store one is working with.

Would it be useful for the (top-level?) metadata to include a description of the kind of store that should be expected?

Obviously this is hard for some stores, but for URL-based or directory-based stores it should be pretty easy, and it would give some flexibility with respect to changes of internal data structure and/or bug fixes.

@alimanfoo
Member

alimanfoo commented May 7, 2020 via email

@Carreau
Contributor Author

Carreau commented May 7, 2020

Thanks,

Let me withdraw the "store in top level metadata" part and rephrase:

  • For very similar stores, for example across bug fixes between store versions, how can a project like intake detect what kind/version of a store it is talking to?

While for low level usage it is reasonable to specify a store/protocol, having a robust way to detect might be nice. It might be that a store has (had) a bug, and you might need to know which version of a store created a hierarchy/dataset and follow a different codepath.

Even if we set this use case aside, I feel that having an agreed way of saying "I am a ..." would be really useful for non-technical users who just want to call "open()" or drag and drop folders/files onto a GUI.

Now it might not be .zgroup, and it might not be available for all stores, but what about a convention like having a .store that holds store-specific information?
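To make the suggestion concrete, here is a minimal sketch of what reading such an entry could look like. Everything in it is an assumption for illustration: the `.store` key name comes from the sentence above, while the JSON encoding, the `kind` and `version` fields, and the `read_store_info` helper are hypothetical and not part of zarr or its spec.

```python
import json
from typing import Mapping, Optional

# Hypothetical key name, taken from the suggestion above; not defined by zarr.
STORE_INFO_KEY = ".store"


def read_store_info(store: Mapping[str, bytes]) -> Optional[dict]:
    """Return the parsed store description, or None if the key is absent."""
    raw = store.get(STORE_INFO_KEY)
    if raw is None:
        return None
    return json.loads(raw)


# A plain dict stands in for a directory- or URL-backed store.
store = {
    ".zgroup": json.dumps({"zarr_format": 2}).encode(),
    # Hypothetical payload: field names are made up for this sketch.
    ".store": json.dumps({"kind": "directory", "version": 1}).encode(),
}

info = read_store_info(store)
```

A caller such as intake could then branch on `info["kind"]` or `info["version"]`, and treat `None` as "no self-description available".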

@alimanfoo
Member

Hi @Carreau, I see where you're coming from I think, and these are valid considerations. I think we just need to figure out where they should live within the zarr architecture.

In the zarr architecture, a "store" is something that implements key/value operations, where keys are strings and values are arbitrary byte sequences. That is it. A "store" is completely agnostic to what is stored there. I.e., you could use a "store" to store any kind of data, not necessarily zarr data. It is just a common abstraction over a set of storage technologies, which includes file systems, cloud object stores, and key/value databases.
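As an illustration of this definition, the sketch below implements a toy store on top of Python's `MutableMapping`. The class name `InMemoryStore` is made up for this example and is not a zarr class; the point is only that the interface deals in string keys and byte values and knows nothing about what those bytes mean.

```python
from collections.abc import MutableMapping


class InMemoryStore(MutableMapping):
    """Toy key/value store: string keys, bytes values, nothing else."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        # Values are stored as opaque bytes; the store never inspects them.
        self._data[key] = bytes(value)

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = InMemoryStore()
store[".zgroup"] = b'{"zarr_format": 2}'  # happens to be zarr metadata
store["notes.txt"] = b"not zarr at all"   # arbitrary, unrelated bytes
```

Both entries are handled identically, which is why nothing inside the store itself can say "I contain zarr data" unless a convention is layered on top.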

Now it might not be .zgroup, and it might not be available for all stores, but what about a convention like having a .store that holds store-specific information?

This probably just needs some clarification regarding exactly what we mean by "store".

E.g., I use "store" to simply mean something that implements key/value operations. So you need to know what type of store it is before you can start retrieving any data from it.

@Carreau
Contributor Author

Carreau commented May 11, 2020

Thanks, yes, that makes sense, and I think we need a better separation between the store as an API and the internals of the storage system.

I think it is perfectly fine to have to know which store we are dealing with before opening a Zarr "connection" to it. But can we come up with a better mechanism for discovering the type of store when it sits behind a URL or filesystem path?

I feel that a high-level zarr.open() should be able to carry some extra logic so that the user need not be aware of the type of store, whereas zarr.core.open() must be given a type of store explicitly.

@alimanfoo
Member

Inferring the type of the store from a URL-like string would seem like a reasonable approach to me, and should work for at least some store types. It could get a bit tricky in some cases. Also relevant is fsspec on URL chaining.
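A rough sketch of what such inference could look like, using only the standard library. The function name `guess_store_kind`, the returned labels, and the particular schemes and extensions checked are assumptions chosen for illustration; they are not zarr's actual dispatch rules.

```python
import os
from urllib.parse import urlparse

# Hypothetical set of schemes treated as remote stores in this sketch.
REMOTE_SCHEMES = ("s3", "gs", "http", "https")


def guess_store_kind(url: str) -> str:
    """Guess a store kind label from a URL-like string (illustrative only)."""
    parsed = urlparse(url)
    scheme = parsed.scheme
    if scheme in REMOTE_SCHEMES:
        return "remote:" + scheme
    # Anything else is treated as a local path; "file://" URLs carry the
    # path in parsed.path, bare paths are used as given.
    path = parsed.path if scheme == "file" else url
    ext = os.path.splitext(path)[1].lower()
    if ext == ".zip":
        return "zip"
    if ext in (".sqlite", ".db"):
        return "sqlite"
    return "directory"
```

The tricky cases show up quickly: a string like `s3://bucket/data.zip` is both remote and a zip archive, and this sketch simply reports the scheme and ignores the extension.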

@joshmoore
Member

For what it's worth, I've been pondering recently whether Zarr v2 couldn't be made to (optionally) detect consolidated metadata and nested chunk layouts, or at least to try one location and then fall back to the other.
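The "try one location, then fall back" idea can be sketched against any mapping-like store. The helper names below are hypothetical; the sketch assumes the conventional v2 keys, namely `.zmetadata` for consolidated metadata, `.zgroup` for plain group metadata, and chunk keys joined by either `.` (flat) or `/` (nested).

```python
import json


def read_root_metadata(store):
    """Try the consolidated metadata key first, then the plain group key."""
    if ".zmetadata" in store:
        return "consolidated", json.loads(store[".zmetadata"])
    if ".zgroup" in store:
        return "plain", json.loads(store[".zgroup"])
    raise KeyError("no zarr v2 group metadata found")


def find_chunk_key(store, array_path, index):
    """Try the flat chunk key (e.g. 'a/0.1'), then the nested one ('a/0/1')."""
    for sep in (".", "/"):
        key = array_path + "/" + sep.join(str(i) for i in index)
        if key in store:
            return key
    raise KeyError("chunk %r not found for array %r" % (index, array_path))


# A dict stands in for a store that uses nested chunks and no consolidation.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "a/0/1": b"chunk bytes",
}

kind, meta = read_root_metadata(store)
chunk_key = find_chunk_key(store, "a", (0, 1))
```

The cost of this approach is an extra lookup on every miss, which matters more for remote stores than for local ones.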

@jstriebel
Member

For v3 the goal is to have a clearly addressable URI (see #132), and all relevant information on how to open the hierarchy/group/array should be stored in clearly defined metadata. This should help to avoid specifying storage details when opening an array, without needing to specify the type of store in the metadata itself. The store type would rather be encoded in the URI, and further store-specific settings would be part of the metadata, e.g. as storage transformers.

@joshmoore changed the title from "Store type of store in top level medata ?" to "Store type of store in top level metadata ?" on Nov 17, 2022