Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use __reduce__ for pickling instead of __setstate__ / __getstate__ #1089

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Jul 19, 2022

In main, Array and Group have __setstate__ methods that directly call __init__, which isn't great (mypy complains about this). We can avoid calling __init__ by defining __reduce__ for these classes instead of __setstate__ / __getstate__.

Note that there are a lot of store classes that use __setstate__ / __getstate__ with direct calls of __init__. Clearing that up might be in scope for additional commits to this branch.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@codecov
Copy link

codecov bot commented Jul 19, 2022

Codecov Report

Merging #1089 (2c6c779) into main (ece1810) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 2c6c779 differs from pull request most recent head 364d793. Consider uploading reports for the commit 364d793 to get more accurate results

@@           Coverage Diff           @@
##             main    #1089   +/-   ##
=======================================
  Coverage   99.94%   99.94%           
=======================================
  Files          34       34           
  Lines       13846    13844    -2     
=======================================
- Hits        13839    13837    -2     
  Misses          7        7           
Impacted Files Coverage Δ
zarr/core.py 100.00% <100.00%> (ø)
zarr/hierarchy.py 99.78% <100.00%> (-0.01%) ⬇️

@joshmoore
Copy link
Member

Things are green and reading https://docs.python.org/3/library/pickle.html#object.__reduce__ this seems to follow sensibly, but open to other opinions.

@d-v-b: can you coordinate the entries you want for docs/release.rst for this?

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Davis! 🙏

Agree this can be better. Have some suggestions on how to approach that below


def __setstate__(self, state):
self.__init__(*state)
def __reduce__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on implementing __getnewargs_ex__ instead?

In particular this Python doc text seems relevant:

Although powerful, implementing __reduce__() directly in your classes is error prone. For this reason, class designers should use the high-level interface (i.e., __getnewargs_ex__(), __getstate__() and __setstate__()) whenever possible.

Same question with other changes below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does implementing __getnewargs_ex__ avoid the need to call __init__ directly in __setstate__?

Copy link
Member

@jakirkham jakirkham Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this only returns the arguments that are passed to __new__ and not a function to call (nor does it call one) itself. Also because __setstate__ would be dropped then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: We may be able to use __getnewargs__ instead of __getnewargs_ex__ if we only use positional arguments, which is preferred

From __getnewargs_ex__ docs:

You should implement this method if the __new__() method of your class requires keyword-only arguments. Otherwise, it is recommended for compatibility to implement __getnewargs__().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how is passing arguments to __new__ helpful here? We need to run routines contained in __init__ to properly set up the class instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you could point me to an example of this pattern it would be helpful i think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__new__ is run before __init__. The arguments would go to __init__ as well. We just wouldn't have to do that ourselves. Details in these docs.

Sure this would work

import pickle


class Value:
    def __init__(self, value):
        self.value = value

    def __getnewargs__(self):
        return (self.value,)


v = Value(5)
v2 = pickle.loads(pickle.dumps(v))
assert 5 == v.value == v2.value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's very helpful! I'll look into using this instead of __reduce__

def __setstate__(self, state):
self.__init__(*state)
def __reduce__(self):
args = (self.store,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for using self.store instead of self._store as used before? Same question with other lines below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure which to use, but the public properties seemed more intuitive to me. Happy to change to using the private properties if there's a material difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd stick with what is already here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants