use __reduce__ for pickling instead of __setstate__ / __getstate__ #1089

Open: wants to merge 4 commits into base: main
Changes from 2 commits
19 changes: 12 additions & 7 deletions zarr/core.py
@@ -2353,13 +2353,18 @@ def hexdigest(self, hashname="sha1"):

         return checksum

-    def __getstate__(self):
-        return (self._store, self._path, self._read_only, self._chunk_store,
-                self._synchronizer, self._cache_metadata, self._attrs.cache,
-                self._partial_decompress, self._write_empty_chunks, self._version)
-
-    def __setstate__(self, state):
-        self.__init__(*state)
+    def __reduce__(self):
Member

Thoughts on implementing __getnewargs_ex__ instead?

In particular this Python doc text seems relevant:

Although powerful, implementing __reduce__() directly in your classes is error prone. For this reason, class designers should use the high-level interface (i.e., __getnewargs_ex__(), __getstate__() and __setstate__()) whenever possible.

Same question for the other changes below.

Contributor Author

How does implementing __getnewargs_ex__ avoid the need to call __init__ directly in __setstate__?

Member @jakirkham, Jul 20, 2022

Because __getnewargs_ex__ only returns the arguments that are passed to __new__. It does not return a function to call, nor does it call one itself. __setstate__ would also be dropped in that case.

Member

Note: if we only use positional arguments, we may be able to use __getnewargs__ instead of __getnewargs_ex__, which the docs recommend for compatibility.

From __getnewargs_ex__ docs:

You should implement this method if the __new__() method of your class requires keyword-only arguments. Otherwise, it is recommended for compatibility to implement __getnewargs__().
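
For reference, a minimal sketch of the keyword-only case that the quoted docs describe. The Tagged class and its tag argument are hypothetical and not part of zarr; the sketch assumes the default pickle protocol of a recent Python 3.

```python
import pickle


class Tagged:
    """Hypothetical class whose __new__ takes a keyword-only argument."""

    def __new__(cls, *, tag):
        obj = super().__new__(cls)
        obj.tag = tag
        return obj

    def __getnewargs_ex__(self):
        # Returns (args, kwargs); pickle passes both to __new__ on load.
        return (), {"tag": self.tag}


t = Tagged(tag="a")
t2 = pickle.loads(pickle.dumps(t))
assert t2.tag == "a"
```

Without __getnewargs_ex__, loading would call __new__ with no arguments and fail here, because tag is required.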

Contributor Author

But how is passing arguments to __new__ helpful here? We need to run routines contained in __init__ to properly set up the class instance.

Contributor Author

If you could point me to an example of this pattern, I think it would be helpful.

Member

__new__ is run before __init__. The arguments would go to __init__ as well. We just wouldn't have to do that ourselves. Details in these docs.

Sure, this would work:

import pickle


class Value:
    def __init__(self, value):
        self.value = value

    def __getnewargs__(self):
        return (self.value,)


v = Value(5)
v2 = pickle.loads(pickle.dumps(v))
assert 5 == v.value == v2.value
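
One detail worth checking with this pattern is which methods pickle actually invokes on load. The sketch below is a variant of the example above with a hypothetical calls list and an explicit __new__ added purely for tracing. As the pickle documentation notes, __init__ is usually not invoked when an instance is unpickled: the __getnewargs__ tuple goes to __new__, and the saved instance __dict__ is restored afterwards.

```python
import pickle

calls = []  # hypothetical tracing aid, not part of the original example


class Value:
    def __new__(cls, *args):
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value

    def __getnewargs__(self):
        return (self.value,)


v = Value(5)
calls.clear()  # ignore the calls made by normal construction

v2 = pickle.loads(pickle.dumps(v))
calls_on_load = list(calls)

# The value survives because the instance __dict__ is pickled as state,
# not because __init__ ran again.
assert v2.value == 5
assert calls_on_load == ["__new__"]
```

So any setup that only happens inside __init__ would need to be covered some other way if this approach is used.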

Contributor Author

OK, that's very helpful! I'll look into using this instead of __reduce__

+        args = (self.store,
Member

Is there a reason for using self.store instead of self._store as used before? Same question for the other lines below.

Contributor Author

I wasn't sure which to use, but the public properties seemed more intuitive to me. Happy to change to using the private properties if there's a material difference.

Member

I'd stick with what is already here

+                self.path,
+                self.read_only,
+                self.chunk_store,
+                self.synchronizer,
+                self._cache_metadata,
+                self.attrs.cache,
+                self._partial_decompress,
+                self.write_empty_chunks,
+                self._version)
+        return (self.__class__, args)
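
As a rough illustration of what returning (self.__class__, args) from __reduce__ does on load, here is a minimal sketch. The Handle class, its arguments, and the init_calls counter are hypothetical stand-ins and not zarr code. With a two-item tuple, pickle calls the class with args, so __init__ runs again and rebuilds any derived state.

```python
import pickle

init_calls = []  # hypothetical tracing aid


class Handle:
    """Hypothetical stand-in for a class whose __init__ derives state."""

    def __init__(self, store, path, read_only=False):
        init_calls.append(path)
        self._store = store
        self._path = path
        self._read_only = read_only
        # Derived attribute that only __init__ sets up.
        self._key_prefix = (path.rstrip("/") + "/") if path else ""

    def __reduce__(self):
        # pickle will call self.__class__(*args) when loading.
        args = (self._store, self._path, self._read_only)
        return (self.__class__, args)


h = Handle({"a": 1}, "group/array", True)
h2 = pickle.loads(pickle.dumps(h))

assert len(init_calls) == 2  # once for h, once during loading
assert h2._key_prefix == "group/array/"
assert h2._read_only is True
```

Because no state is returned as a third item, nothing outside the constructor arguments is carried over.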

     def _synchronized_op(self, f, *args, **kwargs):

15 changes: 9 additions & 6 deletions zarr/hierarchy.py
@@ -350,12 +350,15 @@ def typestr(o):

         return items

-    def __getstate__(self):
-        return (self._store, self._path, self._read_only, self._chunk_store,
-                self.attrs.cache, self._synchronizer)
-
-    def __setstate__(self, state):
-        self.__init__(*state)
+    def __reduce__(self):
+        args = (self.store,
+                self.path,
+                self.read_only,
+                self.chunk_store,
+                self.attrs.cache,
+                self.synchronizer,
+                self._version)
+        return (self.__class__, args)

     def _item_path(self, item):
         absolute = isinstance(item, str) and item and item[0] == '/'