Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeDecodeError when loading pickle from ZODB that was migrated from Python 2 to 3 #285

Closed
pbauer opened this issue Nov 18, 2019 · 4 comments

Comments

@pbauer
Copy link
Member

pbauer commented Nov 18, 2019

I have 18 seemingly broken OISet pickles in by ZODB after migrating it from 2.7 to 3.7.
In py2 all seems to be ok and I can load them from the DB. The same pickle throws a UnicodeDecodeError when I try to load it in Python 3.

I don't know what the actual object is or where in the tree it is used (it's from a medium-sized Plone site). The transaction-description points to plone.app.iterate (working-copies for content) and to content that no longer exists. So I think I could simply remove them.

I migrated the ZODB with zodbupdate: ./bin/zodbupdate --convert-py3 --file=var/filestorage/Data.fs --encoding=utf8. I've done this multiple times so far for various projects and never had this issue.

Below is information I gathered while debugging in py2 and py3.

Python 2:

>>> from ZODB.utils import p64
>>> from plone import api
>>> portal = api.portal.get()
>>> app = portal.__parent__
>>> oid = 0x2c0ab6
>>> oid = p64(oid)
>>> oid
'\x00\x00\x00\x00\x00,\n\xb6'
>>> obj = app._p_jar.get(oid)
>>> obj
OISet([
 <InterfaceClass OFS.EtagSupport.EtagBaseInterface>,
 <InterfaceClass Acquisition.interfaces.IAcquirer>,
 <SchemaClass plone.app.dexterity.behaviors.discussion.IAllowDiscussion>,
 <InterfaceClass zope.annotation.interfaces.IAnnotatable>,
 <InterfaceClass zope.annotation.interfaces.IAttributeAnnotatable>,
 <InterfaceClass plone.uuid.interfaces.IAttributeUUID>,
 <InterfaceClass Products.CMFDynamicViewFTI.interfaces.IBrowserDefault>,
 <InterfaceClass Products.CMFCore.interfaces.ICatalogAware>,
 <InterfaceClass Products.CMFCore.interfaces.ICatalogableDublinCore>,
 <InterfaceClass zope.location.interfaces.IContained>,
 <InterfaceClass Products.CMFCore.interfaces.IContentish>,
 <class 'grokcore.component.interfaces.IContext'>,
 <InterfaceClass OFS.interfaces.ICopySource>,
 <InterfaceClass webdav.interfaces.IDAVResource>,
 <InterfaceClass plone.dexterity.interfaces.IDexterityContent>,
 <InterfaceClass plone.app.relationfield.interfaces.IDexterityHasRelations>,
 <InterfaceClass plone.dexterity.interfaces.IDexterityItem>,
 <InterfaceClass plone.dexterity.interfaces.IDexteritySchema>,
 <InterfaceClass plone.app.contenttypes.interfaces.IDocument>,
 <InterfaceClass Products.CMFCore.interfaces.IDublinCore>,
 <InterfaceClass Products.CMFCore.interfaces.IDynamicType>,
 <SchemaClass plone.app.dexterity.behaviors.exclfromnav.IExcludeFromNavigation>,
 <InterfaceClass webdav.interfaces.IFTPAccess>,
 <InterfaceClass z3c.relationfield.interfaces.IHasIncomingRelations>,
 <InterfaceClass z3c.relationfield.interfaces.IHasOutgoingRelations>,
 <InterfaceClass z3c.relationfield.interfaces.IHasRelations>,
 <InterfaceClass plone.namedfile.interfaces.IImageScaleTraversable>,
 <InterfaceClass OFS.interfaces.IItem>,
 <InterfaceClass plone.app.iterate.interfaces.IIterateAware>,
 <InterfaceClass plone.app.contenttypes.behaviors.leadimage.ILeadImage>
])

>>> pickle, state = app._p_jar.db()._storage.load(oid)
>>> pickle
'cBTrees.OIBTree\nOISet\nq\x01.((cwebdav.EtagSupport\nEtagBaseInterface\nq\x02cAcquisition.interfaces\nIAcquirer\nq\x03cplone.app.dexterity.behaviors.discussion\nIAllowDiscussion\nq\x04czope.annotation.interfaces\nIAnnotatable\nq\x05czope.annotation.interfaces\nIAttributeAnnotatable\nq\x06cplone.uuid.interfaces\nIAttributeUUID\nq\x07cProducts.CMFDynamicViewFTI.interfaces\nIBrowserDefault\nq\x08cProducts.CMFCore.interfaces._content\nICatalogAware\nq\tcProducts.CMFCore.interfaces._content\nICatalogableDublinCore\nq\nczope.location.interfaces\nIContained\nq\x0bcProducts.CMFCore.interfaces._content\nIContentish\nq\x0ccgrokcore.component.interfaces\nIContext\nq\rcOFS.interfaces\nICopySource\nq\x0ecwebdav.interfaces\nIDAVResource\nq\x0fcplone.dexterity.interfaces\nIDexterityContent\nq\x10cplone.app.relationfield.interfaces\nIDexterityHasRelations\nq\x11cplone.dexterity.interfaces\nIDexterityItem\nq\x12cplone.dexterity.interfaces\nIDexteritySchema\nq\x13cplone.app.contenttypes.interfaces\nIDocument\nq\x14cProducts.CMFCore.interfaces._content\nIDublinCore\nq\x15cProducts.CMFCore.interfaces._content\nIDynamicType\nq\x16cplone.app.dexterity.behaviors.exclfromnav\nIExcludeFromNavigation\nq\x17cOFS.interfaces\nIFTPAccess\nq\x18cz3c.relationfield.interfaces\nIHasIncomingRelations\nq\x19cz3c.relationfield.interfaces\nIHasOutgoingRelations\nq\x1acz3c.relationfield.interfaces\nIHasRelations\nq\x1bcplone.namedfile.interfaces\nIImageScaleTraversable\nq\x1ccOFS.interfaces\nIItem\nq\x1dcplone.app.iterate.interfaces\nIIterateAware\nq\x1ecplone.app.contenttypes.behaviors.leadimage\nILeadImage\nq\x1ft(U\x08\x00\x00\x00\x00\x00,\n\xb7q h\x01tQtq!.'

>>> state
'\x03\xa8\x11W\x01h%\xcc'

>>> from io import BytesIO
>>> wrap = BytesIO(pickle)
>>> import pickletools
>>> memo = {}
>>> pickletools.dis(wrap, memo=memo)
    0: c    GLOBAL     'BTrees.OIBTree OISet'
   22: q    BINPUT     1
   24: .    STOP
>>> pickletools.dis(wrap, memo=memo)
   25: (    MARK
   26: (        MARK
   27: c            GLOBAL     'webdav.EtagSupport EtagBaseInterface'
   65: q            BINPUT     2
   67: c            GLOBAL     'Acquisition.interfaces IAcquirer'
  101: q            BINPUT     3
  103: c            GLOBAL     'plone.app.dexterity.behaviors.discussion IAllowDiscussion'
  162: q            BINPUT     4
  164: c            GLOBAL     'zope.annotation.interfaces IAnnotatable'
  205: q            BINPUT     5
  207: c            GLOBAL     'zope.annotation.interfaces IAttributeAnnotatable'
  257: q            BINPUT     6
  259: c            GLOBAL     'plone.uuid.interfaces IAttributeUUID'
  297: q            BINPUT     7
  299: c            GLOBAL     'Products.CMFDynamicViewFTI.interfaces IBrowserDefault'
  354: q            BINPUT     8
  356: c            GLOBAL     'Products.CMFCore.interfaces._content ICatalogAware'
  408: q            BINPUT     9
  410: c            GLOBAL     'Products.CMFCore.interfaces._content ICatalogableDublinCore'
  471: q            BINPUT     10
  473: c            GLOBAL     'zope.location.interfaces IContained'
  510: q            BINPUT     11
  512: c            GLOBAL     'Products.CMFCore.interfaces._content IContentish'
  562: q            BINPUT     12
  564: c            GLOBAL     'grokcore.component.interfaces IContext'
  604: q            BINPUT     13
  606: c            GLOBAL     'OFS.interfaces ICopySource'
  634: q            BINPUT     14
  636: c            GLOBAL     'webdav.interfaces IDAVResource'
  668: q            BINPUT     15
  670: c            GLOBAL     'plone.dexterity.interfaces IDexterityContent'
  716: q            BINPUT     16
  718: c            GLOBAL     'plone.app.relationfield.interfaces IDexterityHasRelations'
  777: q            BINPUT     17
  779: c            GLOBAL     'plone.dexterity.interfaces IDexterityItem'
  822: q            BINPUT     18
  824: c            GLOBAL     'plone.dexterity.interfaces IDexteritySchema'
  869: q            BINPUT     19
  871: c            GLOBAL     'plone.app.contenttypes.interfaces IDocument'
  916: q            BINPUT     20
  918: c            GLOBAL     'Products.CMFCore.interfaces._content IDublinCore'
  968: q            BINPUT     21
  970: c            GLOBAL     'Products.CMFCore.interfaces._content IDynamicType'
 1021: q            BINPUT     22
 1023: c            GLOBAL     'plone.app.dexterity.behaviors.exclfromnav IExcludeFromNavigation'
 1089: q            BINPUT     23
 1091: c            GLOBAL     'OFS.interfaces IFTPAccess'
 1118: q            BINPUT     24
 1120: c            GLOBAL     'z3c.relationfield.interfaces IHasIncomingRelations'
 1172: q            BINPUT     25
 1174: c            GLOBAL     'z3c.relationfield.interfaces IHasOutgoingRelations'
 1226: q            BINPUT     26
 1228: c            GLOBAL     'z3c.relationfield.interfaces IHasRelations'
 1272: q            BINPUT     27
 1274: c            GLOBAL     'plone.namedfile.interfaces IImageScaleTraversable'
 1325: q            BINPUT     28
 1327: c            GLOBAL     'OFS.interfaces IItem'
 1349: q            BINPUT     29
 1351: c            GLOBAL     'plone.app.iterate.interfaces IIterateAware'
 1395: q            BINPUT     30
 1397: c            GLOBAL     'plone.app.contenttypes.behaviors.leadimage ILeadImage'
 1452: q            BINPUT     31
 1454: t            TUPLE      (MARK at 26)
 1455: (        MARK
 1456: U            SHORT_BINSTRING '\x00\x00\x00\x00\x00,\n\xb7'
 1466: q            BINPUT     32
 1468: h            BINGET     1
 1470: t            TUPLE      (MARK at 1455)
 1471: Q        BINPERSID
 1472: t        TUPLE      (MARK at 25)
 1473: q    BINPUT     33
 1475: .    STOP
highest protocol among opcodes = 1

Here the same in Python 3:

>>> from ZODB.utils import p64
>>> from plone import api
>>> portal = api.portal.get()
>>> app = portal.__parent__
>>> oid = 0x2c0ab6
>>> oid = p64(oid)
>>> oid
b'\x00\x00\x00\x00\x00,\n\xb6'

>>> pickle, state = app._p_jar.db()._storage.load(oid)
>>> pickle
b'cBTrees.OIBTree\nOISet\nq\x01.((cwebdav.EtagSupport\nEtagBaseInterface\nq\x02cAcquisition.interfaces\nIAcquirer\nq\x03cplone.app.dexterity.behaviors.discussion\nIAllowDiscussion\nq\x04czope.annotation.interfaces\nIAnnotatable\nq\x05czope.annotation.interfaces\nIAttributeAnnotatable\nq\x06cplone.uuid.interfaces\nIAttributeUUID\nq\x07cProducts.CMFDynamicViewFTI.interfaces\nIBrowserDefault\nq\x08cProducts.CMFCore.interfaces._content\nICatalogAware\nq\tcProducts.CMFCore.interfaces._content\nICatalogableDublinCore\nq\nczope.location.interfaces\nIContained\nq\x0bcProducts.CMFCore.interfaces._content\nIContentish\nq\x0ccgrokcore.component.interfaces\nIContext\nq\rcOFS.interfaces\nICopySource\nq\x0ecwebdav.interfaces\nIDAVResource\nq\x0fcplone.dexterity.interfaces\nIDexterityContent\nq\x10cplone.app.relationfield.interfaces\nIDexterityHasRelations\nq\x11cplone.dexterity.interfaces\nIDexterityItem\nq\x12cplone.dexterity.interfaces\nIDexteritySchema\nq\x13cplone.app.contenttypes.interfaces\nIDocument\nq\x14cProducts.CMFCore.interfaces._content\nIDublinCore\nq\x15cProducts.CMFCore.interfaces._content\nIDynamicType\nq\x16cplone.app.dexterity.behaviors.exclfromnav\nIExcludeFromNavigation\nq\x17cOFS.interfaces\nIFTPAccess\nq\x18cz3c.relationfield.interfaces\nIHasIncomingRelations\nq\x19cz3c.relationfield.interfaces\nIHasOutgoingRelations\nq\x1acz3c.relationfield.interfaces\nIHasRelations\nq\x1bcplone.namedfile.interfaces\nIImageScaleTraversable\nq\x1ccOFS.interfaces\nIItem\nq\x1dcplone.app.iterate.interfaces\nIIterateAware\nq\x1ecplone.app.contenttypes.behaviors.leadimage\nILeadImage\nq\x1ft(U\x08\x00\x00\x00\x00\x00,\n\xb7q h\x01tQtq!.'
>>> state
b'\x03\xa8\x11W\x01h%\xcc'

>>> obj = app._p_jar.get(oid)
>>> obj
ERROR:ZODB.Connection:Couldn't load state for BTrees.OIBTree.OISet 0x2c0ab6
Traceback (most recent call last):
  File "/Users/pbauer/workspace/dipf-intranet/src-mrd/ZODB/src/ZODB/Connection.py", line 795, in setstate
    self._reader.setGhostState(obj, p)
  File "/Users/pbauer/workspace/dipf-intranet/src-mrd/ZODB/src/ZODB/serialize.py", line 633, in setGhostState
    state = self.getState(pickle)
  File "/Users/pbauer/workspace/dipf-intranet/src-mrd/ZODB/src/ZODB/serialize.py", line 626, in getState
    return unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 7: ordinal not in range(128)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/pbauer/workspace/dipf-intranet/src-mrd/ZODB/src/ZODB/Connection.py", line 795, in setstate
    self._reader.setGhostState(obj, p)
  File "/Users/pbauer/workspace/dipf-intranet/src-mrd/ZODB/src/ZODB/serialize.py", line 633, in setGhostState
    state = self.getState(pickle)
  File "/Users/pbauer/workspace/dipf-intranet/src-mrd/ZODB/src/ZODB/serialize.py", line 626, in getState
    return unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 7: ordinal not in range(128)

>>> from io import BytesIO
>>> wrap = BytesIO(pickle)
>>> import pickletools
>>> memo = {}
>>> pickletools.dis(wrap, memo=memo)
    0: c    GLOBAL     'BTrees.OIBTree OISet'
   22: q    BINPUT     1
   24: .    STOP
highest protocol among opcodes = 1
>>> pickletools.dis(wrap, memo=memo)
   25: (    MARK
   26: (        MARK
   27: c            GLOBAL     'webdav.EtagSupport EtagBaseInterface'
   65: q            BINPUT     2
   67: c            GLOBAL     'Acquisition.interfaces IAcquirer'
  101: q            BINPUT     3
  103: c            GLOBAL     'plone.app.dexterity.behaviors.discussion IAllowDiscussion'
  162: q            BINPUT     4
  164: c            GLOBAL     'zope.annotation.interfaces IAnnotatable'
  205: q            BINPUT     5
  207: c            GLOBAL     'zope.annotation.interfaces IAttributeAnnotatable'
  257: q            BINPUT     6
  259: c            GLOBAL     'plone.uuid.interfaces IAttributeUUID'
  297: q            BINPUT     7
  299: c            GLOBAL     'Products.CMFDynamicViewFTI.interfaces IBrowserDefault'
  354: q            BINPUT     8
  356: c            GLOBAL     'Products.CMFCore.interfaces._content ICatalogAware'
  408: q            BINPUT     9
  410: c            GLOBAL     'Products.CMFCore.interfaces._content ICatalogableDublinCore'
  471: q            BINPUT     10
  473: c            GLOBAL     'zope.location.interfaces IContained'
  510: q            BINPUT     11
  512: c            GLOBAL     'Products.CMFCore.interfaces._content IContentish'
  562: q            BINPUT     12
  564: c            GLOBAL     'grokcore.component.interfaces IContext'
  604: q            BINPUT     13
  606: c            GLOBAL     'OFS.interfaces ICopySource'
  634: q            BINPUT     14
  636: c            GLOBAL     'webdav.interfaces IDAVResource'
  668: q            BINPUT     15
  670: c            GLOBAL     'plone.dexterity.interfaces IDexterityContent'
  716: q            BINPUT     16
  718: c            GLOBAL     'plone.app.relationfield.interfaces IDexterityHasRelations'
  777: q            BINPUT     17
  779: c            GLOBAL     'plone.dexterity.interfaces IDexterityItem'
  822: q            BINPUT     18
  824: c            GLOBAL     'plone.dexterity.interfaces IDexteritySchema'
  869: q            BINPUT     19
  871: c            GLOBAL     'plone.app.contenttypes.interfaces IDocument'
  916: q            BINPUT     20
  918: c            GLOBAL     'Products.CMFCore.interfaces._content IDublinCore'
  968: q            BINPUT     21
  970: c            GLOBAL     'Products.CMFCore.interfaces._content IDynamicType'
 1021: q            BINPUT     22
 1023: c            GLOBAL     'plone.app.dexterity.behaviors.exclfromnav IExcludeFromNavigation'
 1089: q            BINPUT     23
 1091: c            GLOBAL     'OFS.interfaces IFTPAccess'
 1118: q            BINPUT     24
 1120: c            GLOBAL     'z3c.relationfield.interfaces IHasIncomingRelations'
 1172: q            BINPUT     25
 1174: c            GLOBAL     'z3c.relationfield.interfaces IHasOutgoingRelations'
 1226: q            BINPUT     26
 1228: c            GLOBAL     'z3c.relationfield.interfaces IHasRelations'
 1272: q            BINPUT     27
 1274: c            GLOBAL     'plone.namedfile.interfaces IImageScaleTraversable'
 1325: q            BINPUT     28
 1327: c            GLOBAL     'OFS.interfaces IItem'
 1349: q            BINPUT     29
 1351: c            GLOBAL     'plone.app.iterate.interfaces IIterateAware'
 1395: q            BINPUT     30
 1397: c            GLOBAL     'plone.app.contenttypes.behaviors.leadimage ILeadImage'
 1452: q            BINPUT     31
 1454: t            TUPLE      (MARK at 26)
 1455: (        MARK
 1456: U            SHORT_BINSTRING '\x00\x00\x00\x00\x00,\n·'
 1466: q            BINPUT     32
 1468: h            BINGET     1
 1470: t            TUPLE      (MARK at 1455)
 1471: Q        BINPERSID
 1472: t        TUPLE      (MARK at 25)
 1473: q    BINPUT     33
 1475: .    STOP

The only noteable difference I see is the SHORT_BINSTRING where \xb7 is turned into a middle dot · but I've no idea if that means something.

I worked around the whole problem by emptying the problematic OISets manually in Python 2 and then redoing the migration to Python 3. Packing and fsrefs then worked again:

def empty_broken_sets():
    portal = api.portal.get()
    app = portal.__parent__
    broken_oids = [0x2c0ab6, 0x2c0ab8, 0x2c56c4, 0x2c56c6, 0x2c57ca, 0x2c57cc, 0x2c5896, 0x2c5898, 0x2e929e, 0x2e92a0, 0x330780, 0x330782, 0x33835f, 0x338361, 0x356df9, 0x356dfb, 0x35907c, 0x35907e]
    from ZODB.utils import p64
    for oid in broken_oids:
        oid = p64(oid)
        obj = app._p_jar.get(oid)
        obj.clear()
        obj._p_changed = 1

It would be better to fix the underlying issue but so far I don't know what it is.

@jamadden
Copy link
Member

The only noteable difference I see is the SHORT_BINSTRING where \xb7 is turned into a middle dot · but I've no idea if that means something.

That's just a display difference; the byte value is the same. In the opcodeSHORT_BINSTRING, the SHORT part means "less than 255 characters" (so its length only needs one byte to encode), "BIN" means "binary pickle protocol", aka protocol 1 or higher, and "STRING" means "native string" (kinda). Python 3's strings are Unicode, so that's how pickletools displays the string (don't ask me how it guesses an encoding). Python 2's strings are 8-bit bytes, so that's how it displays the string.

Speaking of "kinda": BINSTRING isn't output by Python 3 (ever), it's only output by Python 2. Python 2 just happily unpickles the arbitrary data back into the 8-bit str, but Python 3 tries to decode it to its native Unicode string using the encoding given to the Unpickler; if the encoding is "bytes" then it skips the decoding and returns an instance of bytes. The default encoding is ASCII, which of course can't represent \xb7.

Python 2 had no way to produce a pickle of an str in a way that meant "always unpickle this as an 8-bit byte string". That's where zodbpickle comes in: objects of type zodbpickle.binary are pickled in such a way that they unpickle as bytes on Python 3.

This specific pickle is of an OIDSet, or rather, it's the pickle of what an OIDSet returned from __getstate__. For the leaf nodes of the tree, that state is (data, next_node). data is itself a tuple, and next_node is a persistent object. Persistent objects are written out as their persistent ID (the BINPERSID opcode), and for ZODB, that (usually) means the _p_oid: an 8-byte 8-bit string.

It was only starting with ZODB 5.4.0 that zodbpickle.binary was used to write _p_oid values. Prior to that, they were written as the plain str on Python 2, leading to the BINSTRING opcode and the problem we see here. So we know this pickle was written with ZODB 5.3 or earlier on Python 2.

ZODB 5.3 switched to using pickle protocol 2, which starts with a distinct header byte. This pickle doesn't have that, so we know that it was written with ZODB 5.2 or earlier. That means this object hasn't been modified since early 2017. Which doesn't actually tell us much 😄

Other than that, I guess the question is: should zodbupdate have been expected to locate this particular _p_oid and re-write it to use the same opcodes that zodbpickle.binary would have produced (SHORT_BINBYTES)? To me, that seems like a reasonable thing for it to do, but I don't know anything about how zodbupdate works.

@gotcha
Copy link
Member

gotcha commented Nov 20, 2019

Thanks @pbauer @jamadden for detailed description and answer. Will help in the future !

@ewohnlich
Copy link

I ran into this same problem and was also unable to run fsrefs to get all problem oids: UnboundLocalError: local variable 'oid' referenced before assignment here https://github.com/zopefoundation/ZODB/blob/master/src/ZODB/serialize.py#L707. That may or not be related to the underlying issue, but it certainly looks like a logic with oid assignment there. Perhaps bytes, str should assign oid, klass instead of data, klass?

@sallner
Copy link
Member

sallner commented Mar 30, 2021

Fixed by #295 but needs a release now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants