Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ZODB pickle corruption on python-3.7 #47

Merged
merged 1 commit into from
Jun 11, 2019

Conversation

kedder
Copy link
Member

@kedder kedder commented May 31, 2019

This is an attempt to fix a data corruption issue, that manifests in a
random byte replacing a correct one in a pickle, which makes a pickle unreadable.

I couldn't figure out the exact reason for it, but max_output_len in the fixed expression might become 0 under extreme conditions (when output_len is 0 and n is 1). The patch fixed the issue and was tested by our team for about a month.

Copy link
Member

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be possible to write a test case for this?

_pickle_33.c was taken from the Python 3.3 standard library. Was the bug fixed in Python at some latter point? Should we report an upstream bug? Or find and backport an upstream patch? Or at least link to the relevant upstream bug/commit?

@kedder
Copy link
Member Author

kedder commented May 31, 2019

I'll try to repro it in a test, but it is pretty difficult - the bug was very hard to trigger, I suspect it happens on a very specific set of data.

I noticed that changing MAX_WRITE_BUF_SIZE constant also affects the issue. From what I can tell, C code tries to keep buffer size below MAX_WRITE_BUF_SIZE and if buffer exceeds it, it clears it and starts growing it again. This constant and its handling was removed in future versions (see https://github.com/python/cpython/blob/master/Modules/_pickle.c), so I suspect this bug is not applicable for future versions of python.

We tried to port _pickle_33.c from the further upstream versions, but it received too many changes to make it feasible.

Unfortunately I lost the original traceback that we were seeing. Python version of the pickle code (that will be enabled if you delete the _pickle.cpython-37m-x86_64-linux-gnu.so is not affected by this bug, so that's a temporary workaround.

Another changes that seems to fix the issue (at least for one particular data set I had at hand):

  • Changing WRITE_BUF_SIZE to be equal to MAX_WRITE_BUF_SIZE, so buffer doesn't grow
  • Changing suspicious (self->output_len + n) / 2 * 3 expression to (self->output_len + n) * 2 to avoid corner cases of buffer size not growing when self->output_len + n is equal to 1.
  • The one suggested in this MR, which I find to be the simplest.

@kedder
Copy link
Member Author

kedder commented May 31, 2019

Another detail: this bug produces error on reading pickles. I.e. this change doesn't fix the corrupted pickles that are already stored in ZODB.

I was able to fix one corrupted pickle by carefully inspecting contents with pickletools and fixing one byte in hex editor.

@dwt
Copy link

dwt commented Jun 4, 2019

This is great!

At ZODB/issues/271 I reported a problem that after a zodbupdate run during the --pack phase we had reproducible explosions under python 3.7 that didn't happen on python 3.6.

This patch seems to fix that, i.e. I can no longer reproduce that explosion. Also the error I have seen looks quite similar, in that a single byte was written wrongly somewhere in the middle of a pickle (usually an append instruction was changed to a setitem instruction), which I also was able to fix by manually editing the pickle in question. (In my case though, pickles later in the ZODB had more problems that I wasn't able to work around).

Sadly the only way I can reliably reproduce it is zodbupdate-ing a 4.5 GB ZODB from python 2 to 3, and I have no clue how to even beginn to make a reproducible test case from that. :-( Sadly the DB is full of data I cannot share, so I cannot give it to somebody with more experience minimising the problem.

Still I hope that this helps supporting this fix in that it provides at least some evidence that it does something sensible.

@dwt
Copy link

dwt commented Jun 4, 2019

If it helps, I can share a pickle object with the corruption privately to people I trust.

@icemac
Copy link
Member

icemac commented Jun 6, 2019

If is is too hard to write a test at least a change log entry would be welcome.

@kedder
Copy link
Member Author

kedder commented Jun 6, 2019

Funny thing, using some brute force I was able to repro the issue and confirm this PR actually fixes it:

>>> import io
>>> from zodbpickle import pickle
>>> inp = ['a'] * 34991
>>> f = io.BytesIO()
>>> pickle.dump(inp, f)
>>> len(f.getvalue())
70064
>>> f.seek(0)
0
>>> outp = pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: invalid load key, 'H'.

The 34991 is a magic number for (default) protocol 3. Other protocols are affected too, just with different list sizes. The error is not consistent - sometimes it is EOFError, sometimes exception is not raised, but returned unpickled value doesn't match original one.

Now, the script to find this case runs for several minutes, trying all lists of sizes up to 100K. This doesn't feel like a good fit for the otherwise quite fast test suite. Hardcoding this particular instance in a test doesn't seem to be very useful either.

What do you think, what would be the sanest way to release this?

@mgedmin
Copy link
Member

mgedmin commented Jun 6, 2019

Hardcoding this particular instance in a test doesn't seem to be very useful either.

I think it's fine -- and much better than not having any test case at all!

This is an attempt to fix a data corruption issue, that manifests in a
random byte replacing a correct one in a pickle.
@kedder
Copy link
Member Author

kedder commented Jun 10, 2019

I've added the test and a changelog entry.

@kedder kedder changed the title Make sure buffer size is more than 0 Fix ZODB pickle corruption on python-3.7 Jun 11, 2019
Copy link
Member

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kedder kedder merged commit f27eafa into zopefoundation:master Jun 11, 2019
@icemac
Copy link
Member

icemac commented Jun 12, 2019

Released as 1.0.4 to PyPI, see https://pypi.org/project/zodbpickle/1.0.4/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants