Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[V3] Should Group be dict-like? #1787

Closed
d-v-b opened this issue Apr 11, 2024 · 2 comments
Closed

[V3] Should Group be dict-like? #1787

d-v-b opened this issue Apr 11, 2024 · 2 comments
Labels
design discussion V3 Related to compatibility with V3 spec
Milestone

Comments

@d-v-b
Copy link
Contributor

d-v-b commented Apr 11, 2024

Although the V3 Group does not inherit from Mapping or MutableMapping, it has methods that are mapping-like, e.g. __contains__, __getitem__, __iter__, __delitem__ etc. I think this makes sense in terms of continuity with zarr v2, where Group explicitly implemented MutableMapping, but I'm not sure we ultimately want to do this.

Here I will argue against making Group to be dict-like.

First, an argument from fairness: conceptually, a Group can be expressed as the intersection of "a thing that contains sub-groups and sub-arrays" and "a thing that contains JSON-like attributes". Both "contains sub-groups and sub-arrays" and "contains JSON" can be implemented with the MutableMapping API. So a Group is two mutable mappings, not one. But if Group is itself a MutableMapping, then said methods have to pick one of the two possible sub-mappings to target. I don't feel confident in stating that the members of a group are more or less "important" than the attributes of the group. So I don't feel confident in picking one of these two mappings as the target for MutableMapping methods on Group. Thus I think using MutableMapping for the members of a Group, while delegating the same API for attributes under a .attrs is unfair (it's an arbitrary, and unnecessary, decision), and probably a mistake in V2, and also a mistake in h5py (where I think this API originated). There's a simple solution here: treat Group.members and Group.attributes each as their own MutableMapping, and don't pretend that Group is a dict, because it's not -- if anything, it's two dicts.

Second, an argument from ergonomics. We want to support something like x = Group["subarray"], or x = Group.members["subarray"]. This expands to x = (access the metadata for "subarray", then return an in-memory Array instance from it). Creating that in-memory Array instance will almost certainly take runtime parameters, e.g. write_empty_chunks, caching control, etc. But __getitem__ is hard to parametrize. If we rely on Group["subarray"], we are forced to choose from a few unattractive options:

  • we can define those Array parameters in a global configuration object, or as properties of the Group instance
  • we can ask users to modify the Array instance after assigning it.
  • we can remove those runtime parameters from the Array constructor and push them down to the array methods (which honestly I kind of like)
    But if we leave behind the dict API, then we just have regular functions that can be parametrized as needed, which gives us a lot of flexibility. So I would be curious to know what we get from the dict API that we would stand to lose here, besides continuity with v2, which I don't value too much at this point.

So, my conclusion is that we should not implement Group.__getitem__, or any other Mapping-like method on Group. There might be a good argument for implementing the MutableMapping API for Group.attrs and Group.members, but maybe not. Curious to hear whether anyone has strong objections to this direction.

cc @jhamman @normanrz

@d-v-b d-v-b added the V3 Related to compatibility with V3 spec label Apr 11, 2024
@d-v-b d-v-b mentioned this issue Apr 12, 2024
6 tasks
@normanrz
Copy link
Contributor

I guess the alternative to implementing Group.__getitem__ would be to have Group.open_array methods? I think that is ok and allows additional arguments, such as runtime_configuration, to be passed.

@jhamman jhamman added this to the 3.0.0.alpha milestone Apr 22, 2024
@jhamman jhamman closed this as completed May 17, 2024
@d-v-b
Copy link
Contributor Author

d-v-b commented May 17, 2024

the current state of v3 is that the top-level Group object does not implement MutableMapping, i.e. it is not dict-like. We do anticipate that this may inconvenience anyone who used the MutableMapping methods from the v2 codebase; accordingly, we would be open to adding these methods (with deprecation warnings) temporarily in early v3 releases, should demand arise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design discussion V3 Related to compatibility with V3 spec
Projects
Status: Done
Development

No branches or pull requests

3 participants