You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Although the V3 Group does not inherit from Mapping or MutableMapping, it has methods that are mapping-like, e.g. __contains__, __getitem__, __iter__, __delitem__ etc. I think this makes sense in terms of continuity with zarr v2, where Group explicitly implemented MutableMapping, but I'm not sure we ultimately want to do this.
Here I will argue against making Group to be dict-like.
First, an argument from fairness: conceptually, a Group can be expressed as the intersection of "a thing that contains sub-groups and sub-arrays" and "a thing that contains JSON-like attributes". Both "contains sub-groups and sub-arrays" and "contains JSON" can be implemented with the MutableMapping API. So a Group is two mutable mappings, not one. But if Group is itself a MutableMapping, then said methods have to pick one of the two possible sub-mappings to target. I don't feel confident in stating that the members of a group are more or less "important" than the attributes of the group. So I don't feel confident in picking one of these two mappings as the target for MutableMapping methods on Group. Thus I think using MutableMapping for the members of a Group, while delegating the same API for attributes under a .attrs is unfair (it's an arbitrary, and unnecessary, decision), and probably a mistake in V2, and also a mistake in h5py (where I think this API originated). There's a simple solution here: treat Group.members and Group.attributes each as their own MutableMapping, and don't pretend that Group is a dict, because it's not -- if anything, it's two dicts.
Second, an argument from ergonomics. We want to support something like x = Group["subarray"], or x = Group.members["subarray"]. This expands to x = (access the metadata for "subarray", then return an in-memory Array instance from it). Creating that in-memory Array instance will almost certainly take runtime parameters, e.g. write_empty_chunks, caching control, etc. But __getitem__ is hard to parametrize. If we rely on Group["subarray"], we are forced to choose from a few unattractive options:
we can define those Array parameters in a global configuration object, or as properties of the Group instance
we can ask users to modify the Array instance after assigning it.
we can remove those runtime parameters from the Array constructor and push them down to the array methods (which honestly I kind of like)
But if we leave behind the dict API, then we just have regular functions that can be parametrized as needed, which gives us a lot of flexibility. So I would be curious to know what we get from the dict API that we would stand to lose here, besides continuity with v2, which I don't value too much at this point.
So, my conclusion is that we should not implement Group.__getitem__, or any other Mapping-like method on Group. There might be a good argument for implementing the MutableMapping API for Group.attrs and Group.members, but maybe not. Curious to hear whether anyone has strong objections to this direction.
I guess the alternative to implementing Group.__getitem__ would be to have Group.open_array methods? I think that is ok and allows additional arguments, such as runtime_configuration, to be passed.
the current state of v3 is that the top-level Group object does not implement MutableMapping, i.e. it is not dict-like. We do anticipate that this may inconvenience anyone who used the MutableMapping methods from the v2 codebase; accordingly, we would be open to adding these methods (with deprecation warnings) temporarily in early v3 releases, should demand arise.
Although the V3
Group
does not inherit fromMapping
orMutableMapping
, it has methods that are mapping-like, e.g.__contains__
,__getitem__
,__iter__
,__delitem__
etc. I think this makes sense in terms of continuity with zarr v2, whereGroup
explicitly implementedMutableMapping
, but I'm not sure we ultimately want to do this.Here I will argue against making
Group
to be dict-like.First, an argument from fairness: conceptually, a
Group
can be expressed as the intersection of "a thing that contains sub-groups and sub-arrays" and "a thing that contains JSON-like attributes". Both "contains sub-groups and sub-arrays" and "contains JSON" can be implemented with theMutableMapping
API. So aGroup
is two mutable mappings, not one. But ifGroup
is itself aMutableMapping
, then said methods have to pick one of the two possible sub-mappings to target. I don't feel confident in stating that the members of a group are more or less "important" than the attributes of the group. So I don't feel confident in picking one of these two mappings as the target forMutableMapping
methods onGroup
. Thus I think usingMutableMapping
for the members of a Group, while delegating the same API for attributes under a.attrs
is unfair (it's an arbitrary, and unnecessary, decision), and probably a mistake in V2, and also a mistake in h5py (where I think this API originated). There's a simple solution here: treatGroup.members
andGroup.attributes
each as their ownMutableMapping
, and don't pretend thatGroup
is a dict, because it's not -- if anything, it's two dicts.Second, an argument from ergonomics. We want to support something like
x = Group["subarray"]
, orx = Group.members["subarray"]
. This expands tox = (access the metadata for "subarray", then return an in-memory Array instance from it)
. Creating that in-memoryArray
instance will almost certainly take runtime parameters, e.g.write_empty_chunks
, caching control, etc. But__getitem__
is hard to parametrize. If we rely onGroup["subarray"]
, we are forced to choose from a few unattractive options:Array
parameters in a global configuration object, or as properties of theGroup
instanceArray
instance after assigning it.Array
constructor and push them down to the array methods (which honestly I kind of like)But if we leave behind the dict API, then we just have regular functions that can be parametrized as needed, which gives us a lot of flexibility. So I would be curious to know what we get from the dict API that we would stand to lose here, besides continuity with v2, which I don't value too much at this point.
So, my conclusion is that we should not implement
Group.__getitem__
, or any otherMapping
-like method onGroup
. There might be a good argument for implementing theMutableMapping
API forGroup.attrs
andGroup.members
, but maybe not. Curious to hear whether anyone has strong objections to this direction.cc @jhamman @normanrz
The text was updated successfully, but these errors were encountered: