Use a higher pickle protocol for serializing objects on Python 2 #179
Previously protocol 1 was used. The higher protocol is more efficient for new-style classes (all persistent objects are new-style), according to the docs, at the cost of being very slightly less space efficient for old-style classes.

In tests of a persistent object with two trivial numeric attributes, the higher protocol was 12 bytes smaller, and serialized and deserialized 1us faster. Introducing a reference to another new-style class (with a small dict and a list of strings for attributes) for a more realistic test made the higher protocol twice as fast to serialize (20.5 vs 10.3us), about a third smaller (215 vs 142 bytes), and 30% faster to deserialize (6.5 vs 4.6us).

On Python 2, this will now allow open ``file`` objects to be pickled (loading the object will result in a closed file); previously this would raise a ``TypeError`` (as it does under Python 3). We had tests asserting that you couldn't do that with a BlobFile, so I updated them to keep that true. I wouldn't recommend serializing arbitrary open files under Python 2 (for one thing, they can't trivially be deserialized in Python 3), but I didn't take any steps to prevent it either. Since this hasn't been possible before, there shouldn't be code in the wild trying to do it---and it wouldn't be forward compatible with Python 3 either.
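The size difference is easy to reproduce with a quick sketch. This is plain Python 3 comparing protocols 1 and 2 (the PR itself measured persistent objects on Python 2, and the class names and attribute values here are invented for illustration), but the mechanism is the same: protocol 1 reconstructs new-style instances through a ``copyreg._reconstructor`` reduce call, while protocol 2 and up use the compact ``NEWOBJ`` opcode.

```python
import pickle

class Child(object):  # new-style class standing in for a referenced persistent object
    def __init__(self):
        self.data = {"a": 1}
        self.names = ["x", "y"]

class Parent(object):  # the object under test, with two trivial numeric attributes
    def __init__(self):
        self.a = 1
        self.b = 2.0
        self.child = Child()

obj = Parent()
for proto in (1, 2):
    data = pickle.dumps(obj, proto)
    # protocol 1 spells out copyreg._reconstructor and builtins.object by name;
    # protocol 2 replaces all of that with a single NEWOBJ opcode
    print("protocol %d: %d bytes" % (proto, len(data)))
```

Running it shows the protocol-2 pickle is noticeably smaller, and the gap grows with each additional new-style instance in the graph.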
Another advantage would be that …
Python 2's pickle/cPickle don't support protocol 3, so this change ties us more closely to zodbpickle, which is a maintenance burden that I'd like to get rid of at some point.
Protocol 2 would have the same benefit, so that could be changed if you'd like. I just thought it was nice to be able to use the same protocol under both versions, especially since eventually Python 2 will go away. (FWIW, it seems like under Python 2, other than the version number, protocols 2 and 3 are the same... that was why zodbpickle's …)

Regarding getting rid of zodbpickle entirely, I would still eventually like to get around to porting protocol 4 back to it, which would make large objects much more efficient. It would also be useful for Python 2 and 3.3... which I now remember we just dropped, so that's a bit less important.
D'oh! I forgot the main reason we're committed to zodbpickle: Python's noload is broken, which breaks parts of ZODB (e.g., zodbdgc).

I suppose we could try to get my patch that fixed noload for zodbpickle upstreamed (I don't *think* it reintroduces the regression that caused its removal), but the earliest that would happen would be 3.7.
On Sat, Aug 26, 2017 at 10:36 AM, Jason Madden ***@***.***> wrote:

> D'oh! I forgot the main reason we're committed to zodbpickle: Python's
> noload is broken, which breaks parts of ZODB (e.g., zodbdgc).

Yes. That was my fault. I regret it. :(

> I suppose we could try to get my patch that fixed noload for zodbpickle
> upstreamed (I don't *think* it reintroduces the regression that caused
> its removal), but the earliest that would happen would be 3.7.

Noload was a hack. The purpose of noload is to get persistent references. I don't see noload being accepted upstream. I *do* see a possibility for an optimized API to get references. That would probably be a useful thing to do.

It might be interesting to have a stab at an API to extract references based on pickletools.genops. It would likely allow an implementation that avoids a number of the allocations/instantiations performed by a regular pickle load.

I should note, however, that noload is an optimization. If using something like RelStorage or zc.zodbdgc, that optimization is less important because the computation is performed outside of the database server and doesn't block anything else, so I consider this optimization less important than I once did.
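A toy version of that genops idea, to make the suggestion concrete. All names here are invented, and real ZODB persistent ids are tuples written with the ``BINPERSID`` opcode, which would require a small stack machine to recover; protocol 0's text ``PERSID`` opcode carries the id inline as the opcode argument, which keeps the sketch short.

```python
import io
import pickle
import pickletools

class PersistentStub(object):
    """Stand-in for a persistent object; 'oid' is an invented attribute."""
    def __init__(self, oid):
        self.oid = oid

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Protocol 0's PERSID opcode needs a newline-free ASCII string id.
        if isinstance(obj, PersistentStub):
            return obj.oid
        return None

def dumps_with_refs(obj):
    buf = io.BytesIO()
    RefPickler(buf, protocol=0).dump(obj)
    return buf.getvalue()

def extract_refs(data):
    # Walk the opcode stream without constructing any objects at all:
    # every PERSID opcode carries its persistent id as the opcode argument.
    return [arg for op, arg, pos in pickletools.genops(data)
            if op.name == 'PERSID']

data = dumps_with_refs({'x': PersistentStub('oid-1'), 'y': PersistentStub('oid-2')})
print(sorted(extract_refs(data)))  # ['oid-1', 'oid-2']
```

The point of the exercise: ``extract_refs`` never instantiates the pickled objects or calls a ``persistent_load`` hook, which is exactly the class of allocations a genops-based reference scanner would avoid.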
True, it's an optimization. But it's a substantial one, the last time I measured; IIRC it was something like an order of magnitude (but that was 2014, so I may be misremembering). It's also extremely helpful if not everything can be unpickled, either because a class has gone missing or because it relies on infrastructure that isn't set up without loading a full application (e.g., zc.baseregistry, which unpickles by doing …).

Another way to accomplish that optimization would be cool!

So, given all that about our ties to zodbpickle, but also the fact that on Python 2 the protocols are pretty much the same, I'm happy to either leave it or change it; it's up to you.
I would prefer to use protocol 2 on Python 2. This means the records would remain readable with standard pickle on Python 2, which I think has value.
This way the records typically stay readable with stdlib pickle.
Makes sense. Done.

Is there anything else I can do to get this ready to be approved?

Update the changelog? (It still describes a change for protocol 3.)

Thanks! I have that locally, I just haven't pushed it yet.
Thank you!

Thank you! :)
As it happens, we're just now hitting a case where protocol 2 might be useful (for …)
Released. |
On Aug 30, 2017, at 07:55, Jim Fulton ***@***.***> wrote:

> Released

Thank you!
I'm an idiot. Using protocol 2 defeats zodbpickle's binary mechanism. :( Whimper.
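The underlying issue is visible from Python 3's side of the fence, with the stdlib alone: native ``bytes`` opcodes only exist from protocol 3 on, so a protocol-2 stream has to smuggle bytes through a ``codecs.encode`` reduce call instead. A quick check (stdlib only, nothing zodbpickle-specific):

```python
import pickle
import pickletools

def opcode_names(data):
    # Collect the set of opcode names used in a pickle stream.
    return {op.name for op, arg, pos in pickletools.genops(data)}

ops2 = opcode_names(pickle.dumps(b"some bytes", 2))
ops3 = opcode_names(pickle.dumps(b"some bytes", 3))

# Protocol 3 has a dedicated short-bytes opcode; protocol 2 has to fall
# back to a REDUCE call (a latin-1 round trip through codecs.encode).
print("SHORT_BINBYTES" in ops2, "SHORT_BINBYTES" in ops3)  # False True
print("REDUCE" in ops2)  # True
```

Which is presumably why a bytes-preserving scheme like zodbpickle's needs protocol 3's opcodes to work.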
See #193 as a follow-up to this ticket.