Skip to content

Conversation

@jakirkham
Copy link
Member

@jakirkham jakirkham commented Dec 2, 2018

On Python 3, pickle.loads is able to take anything that conforms to the (new) buffer protocol. So there has been no issue giving it an ndarray to work with. Unfortunately, on Python 2, pickle.loads requires a bytes object specifically and is not able to take any type implementing the (new) buffer protocol. So we have been going ahead and coercing everything to bytes on Python 2.

However it turns out that cStringIO's StringIO on Python 2 does support the buffer protocol. Thus a StringIO object can be created on Python 2 without copying the data. While this still cannot be used with pickle.loads, it can be used with pickle.load, which special cases reading from StringIO leveraging the read function, which amounts to sharing a pointer between StringIO and pickle.loads. Thus achieving a no copying unpickler for Python 2.

ref: http://www.hydrogen18.com/blog/unpickling-buffers.html

ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L716
ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L681
ref: https://github.com/python/cpython/blob/2.7/Modules/cPickle.c#L614
ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L160

TODO:

  • Unit tests and/or doctests in docstrings
  • tox -e py37 passes locally
  • tox -e py27 passes locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • tox -e docs passes locally
  • AppVeyor and Travis CI passes
  • Test coverage to 100% (Coveralls passes)

On Python 3, `pickle.loads` is able to take anything that conforms to
the (new) buffer protocol. So there has been no issue giving it an
`ndarray` to work with. Unfortunately, on Python 2, `pickle.loads`
requires a `bytes` object specifically and is not able to take any type
implementing the (new) buffer protocol. So we have been going ahead and
coercing everything to `bytes` on Python 2.

However it turns out that `cStringIO`'s `StringIO` on Python 2 does
support the buffer protocol. Thus a `StringIO` object can be created on
Python 2 without copying the data. While this still cannot be used with
`pickle.loads`, it can be used with `pickle.load`, which special cases
reading from `StringIO` leveraging the read function, which amounts to
sharing a pointer between `StringIO` and `pickle.loads`. Thus achieving
a no copying unpickler for Python 2.

ref: http://www.hydrogen18.com/blog/unpickling-buffers.html

ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L716
ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L681
ref: https://github.com/python/cpython/blob/2.7/Modules/cPickle.c#L614
ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L160
@jakirkham jakirkham added this to the v0.6.2 milestone Dec 2, 2018
@jakirkham jakirkham requested a review from alimanfoo December 2, 2018 21:30
@jakirkham jakirkham changed the title Use (new) buffer protocol in Pickle.decode Use (new) buffer protocol in Pickle.decode on Python 2 Dec 2, 2018
Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More good stuff! Thanks, LGTM.

@alimanfoo alimanfoo merged commit 52c93c3 into zarr-developers:master Dec 3, 2018
@jakirkham jakirkham deleted the avoid_decode_copy_pickle_py2 branch December 3, 2018 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants