Pickling issues between C and Python implementations. #19
Comments
This sounds like a serious bug that ought to be fixed. |
It may be as simple as reversing the names of the Python implementations. Instead of:
    class OOBTreePy(BTree):
        ...
    try:
        from ._OOBTree import OOBucket
    except ImportError as e: #pragma NO COVER w/ C extensions
        OOBTree = OOBTreePy
        ...
    else: #pragma NO COVER w/o C extensions
        from ._OOBTree import OOBTree
        ...
Do this:
    class OOBTree(BTree):
        ...
    OOBTreePy = OOBTree # NEW; expose the Python implementation always
    try:
        from ._OOBTree import OOBucket
    except ImportError as e: #pragma NO COVER w/ C extensions
        pass # Nothing to do here anymore
    else: #pragma NO COVER w/o C extensions
        from ._OOBTree import OOBTree
        ...
I haven't really thought this through... |
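For readers following the renaming idea, here is a minimal, self-contained sketch of the pattern. The names `Tree`, `TreePy` and the module `_hypothetical_c_tree` are made up for illustration and are not part of BTrees; the point is only that the class statement fixes the name a pickle would record. Later comments in this thread report that the real change caused test failures, so this shows the naming mechanics and nothing more.

```python
class Tree(object):
    """Pure-Python stand-in; it carries the plain, suffix-free name."""

    implementation = "python"


# Keep an explicit alias so the Python version can always be reached.
TreePy = Tree

try:
    # Hypothetical C extension module (an assumption for this sketch).
    # It does not exist, so the import fails and the pure-Python class
    # keeps the public name.
    from _hypothetical_c_tree import Tree
except ImportError:
    pass

# pickle refers to a class by its module and qualified name, so the name
# given in the class statement is what ends up in the pickle.
print(TreePy.__name__)
```

With this layout the alias `TreePy` and the class it points to both report the name `Tree`, so a pickle written by the Python implementation would no longer carry a `Py` suffix.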
That does appear to work. The second issue (pickling empty BTrees) appears in the plain pure-python implementation, without any cross-implementation factors in play. I'm working on a branch to fix these. |
Argh. No, no that doesn't work. I get test failures as soon as ZODB is available with the C implementation. |
Maybe rather something like this from the persistent module? https://github.com/zopefoundation/persistent/blob/master/persistent/__init__.py |
@ml31415 I'm not sure what you're suggesting. Persistent doesn't need to do anything special to make sure that both the C and Python implementations pickle and unpickle the same way (cross implementation) because people usually only pickle subclasses of Persistent. So the module and/or class name differences don't matter much. |
Well, but it applies to PersistentList and PersistentMapping. If I remember correctly, they both got the same class location, independent of implementation. |
PersistentList and PersistentMapping only have one implementation, in Python. |
doh. Ok, I'm out and leave this to the experts :) |
By adding a
This has the minor problem that any "legacy" pickles that still have the Python class name in them will unpickle as the Python class (so Michael's DB would still be corrupted). I don't see any way to get around that while maintaining pickle compatibility, short of doing some nasty things with stack introspection (which is slow on PyPy). But that seems livable, especially because it could be worked around during unpickling by setting
However, there's a substantial problem that turns out to be a deal breaker: the approach of returning a different class from
So far, I'm at a loss for any way to make the C and Python pickles always result in the C version if it's available (and preferably being equal) that doesn't change the pickle format, meaning older versions couldn't unpickle new pickles. Thoughts? |
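One standard-library way to redirect legacy class names while unpickling is to override `pickle.Unpickler.find_class`. The sketch below assumes Python 3 and uses made-up stand-in classes (`Tree`, `TreePy`, `RemappingUnpickler`); it shows the general idea only and is not necessarily the mechanism the comment above has in mind.

```python
import io
import pickle


class Tree(object):
    """Stand-in for the preferred implementation (hypothetical name)."""

    def __init__(self):
        self.items = {}


class TreePy(Tree):
    """Stand-in for a class whose name appears in legacy pickles."""


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that swaps legacy class names for preferred ones."""

    # Keyed on the class name only to keep the sketch short; real code
    # would probably want to match on (module, name) pairs.
    _RENAMES = {"TreePy": "Tree"}

    def find_class(self, module, name):
        name = self._RENAMES.get(name, name)
        return super().find_class(module, name)


legacy = pickle.dumps(TreePy(), protocol=2)
restored = RemappingUnpickler(io.BytesIO(legacy)).load()
print(type(restored).__name__)
```

The legacy payload still names `TreePy`, but the object comes back as a `Tree`. This only helps code that controls how the unpickler is created.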
FWIW, the changes so far are at master...NextThought:issue_2_pickle |
Fixed my db by doing lots of undos this afternoon, so as long as no one else complains ... I wouldn't bother about me or reverting any messed up objects. Besides that, would it be so bad to use a custom constructor? |
A custom constructor means at least these things:
|
Here's a subversive thought: When performing the class check for the protocol >= 2, all implementations of
This means we can lie about the
Downsides: calling the property getter function may add overhead to accessing |
What if there's no C implementation? E.g. the user didn't have the right libraries/headers when they installed BTrees? |
My proof-of-concept handles this by making sure that the Python implementation has the same |
With the
There's no need for a metaclass to make
>>> class X(object):
...     pass
...
>>> class Y(object):
...     @property
...     def __class__(self):
...         return X
...
>>> y = Y()
>>> isinstance(y, type(y)) # Y
True
>>> isinstance(y, y.__class__) # X
True
In |
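Applied to the two-implementation case, the same property might look like the sketch below. `FastTree` and `TreePy` are hypothetical stand-ins, not BTrees classes; the sketch only shows that `type()` and `__class__` can disagree while `isinstance` accepts both.

```python
class FastTree(object):
    """Stand-in for a C implementation (hypothetical name)."""


class TreePy(object):
    """Stand-in for a pure-Python implementation (hypothetical name)."""

    @property
    def __class__(self):
        # Report the preferred class to anything that asks via __class__.
        return FastTree


t = TreePy()
real_type = type(t)     # unaffected by the property
reported = t.__class__  # the class the instance claims to be

print(real_type.__name__, reported.__name__)
print(isinstance(t, TreePy), isinstance(t, FastTree))
```

Code that inspects `obj.__class__` sees `FastTree`, while code that calls `type(obj)` still sees `TreePy`.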
I've opened PR #20 for the |
Looks good, simpler than expected! |
This grows out of the discussion in zopefoundation/persistent#32. The user had been using the C implementation of BTrees and had many saved pickles. Upon updating dependencies, the user discovered he was using the Python implementation. This was slow, so the user switched back to the C implementation.
This leads to two related issues. The first, most serious, issue is that a BTree pickled by the Python implementation will only ever unpickle as the Python implementation, though a BTree pickled by the C implementation will unpickle as whichever implementation is available (but see below). Because of the tree-like structure of BTree objects, this can lead to AttributeErrors when the parent object is (re)pickled as the Python implementation and still has children that are pickled as the C implementation.
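To make the first issue concrete: a pickle stores the class as a module plus a name, and that name is looked up again at load time. The sketch below assumes Python 3 and uses a made-up `Tree` class; rebinding the module-level name between dump and load stands in for switching implementations.

```python
import pickle


class Tree(object):
    """First binding of the name (think: one implementation)."""

    implementation = "first"


payload = pickle.dumps(Tree(), protocol=2)


class Tree(object):  # deliberately rebinds the module-level name
    """Second binding of the same name (think: the other implementation)."""

    implementation = "second"


# The payload only says "the class called Tree in this module", so it is
# resolved against whatever that name is bound to when loading.
restored = pickle.loads(payload)
print(type(restored).implementation)
```

A pickle that names a suffix-free class follows whichever implementation currently owns that name. A pickle that names a `Py`-suffixed class can only ever resolve to that class.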
The second issue I discovered when testing the first issue. An empty C BTree unpickled with the Python implementation isn't initialized correctly, leading to its own AttributeErrors.
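One plausible way a failure of this kind can arise, shown here with made-up classes rather than the actual BTrees code: unpickling does not call `__init__`, and when `__getstate__` returns `None` the pickle carries no state, so `__setstate__` is never called either. Whether this is the exact mechanism in BTrees is an assumption.

```python
import pickle


class FragileTree(object):
    """Hypothetical container that sets up its storage only in __init__."""

    def __init__(self):
        self._data = {}

    def __getstate__(self):
        # An empty tree reports no state at all.
        return self._data or None

    def __setstate__(self, state):
        self._data = dict(state)

    def __len__(self):
        return len(self._data)


class RobustTree(FragileTree):
    """Same container, but storage is created in __new__, which
    unpickling does call."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        self._data = {}
        return self


broken = pickle.loads(pickle.dumps(FragileTree(), protocol=2))
try:
    len(broken)
    fragile_failed = False
except AttributeError:
    fragile_failed = True

fixed = pickle.loads(pickle.dumps(RobustTree(), protocol=2))
print(fragile_failed, len(fixed))
```

The fragile version comes back without `_data`. The robust version works because its attributes no longer depend on `__init__` or `__setstate__` running.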
In environments like my own organization's, which are slowly rolling out PyPy, it will be common for developers and production/test environments to share pickles while everyone is on some combination of PyPy and CPython. Different production (micro)services may even be on PyPy and CPython at the same time. So the first AttributeError could be a real problem.
It seems to me like the two implementations should produce pickles that always result in loading the "best" available implementation, i.e., the Python implementation should not specify the Py suffix in pickle names. I think that's the only way to avoid the first issue. Though, there is an argument to be made that someone might explicitly want to pickle the Python implementation, but I'm not sure why. I'm not sure how to fix this without changing the pickling format in a backwards-incompatible way, but I'm not a pickle expert. I can try to look into that a bit.
I suspect the second issue is easier to fix.
If there's consensus that something needs to be done on this I can do the work to submit a PR.