Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TransactionMetaData in charge of (de)serializing extension data #207

Merged
merged 1 commit into from Sep 23, 2019

Conversation

jmuchemb
Copy link
Member

@jmuchemb jmuchemb commented May 31, 2018

IStorage implementations used to do this task themselves which leads to code
duplication and sometimes bugs (one was fixed recently in NEO). Like for object
serialization, this should be done by the upper layer (Connection).

This commit also provides a way to get raw extensions data while iterating
over transactions (this is actually the original purpose[2]). So far, extension
data could only be retrieved unpickled, which caused several problems:

  • tools like zodb dump [1] cannot dump data exactly as stored on a
    storage. This makes database potentially not bit-to-bit identical to
    its original after restoring from such dump.

  • zodb dump output could be changing from run to run on the same
    database. This comes from the fact that e.g. python dictionaries are
    unordered and so when pickling a dict back to bytes the result could
    be not the same as original.

    ( this problem can be worked-around partly to work reliably for e.g.
    dict with str keys - by always emitting items in key sorted order,
    but it is hard to make it work reliably for arbitrary types )

Both issues make it hard to verify integrity of database at the lowest
possible level after restoration, and make it hard to verify bit-to-bit
compatibility with non-python ZODB implementations.

For this, TransactionMetaData gets a new 'extension_bytes' attribute and
and common usage becomes:

  • Application committing a transaction:

    • 'extension' is set with a dictionary
    • the storage gets the bytes via 'extension_bytes'
  • Iteration:

    • the storage passes bytes as 'extension' parameter of TransactionMetaData
    • the application can get extension data either as bytes ('extension_bytes')
      or deserialized ('extension'): in the former case, no deserialization
      happens and the returned value is exactly what was passed by the storage

[1] https://lab.nexedi.com/nexedi/zodbtools
[2] #183

Co-Authored-By: Kirill Smelkov kirr@nexedi.com

@jmuchemb
Copy link
Member Author

I recently got more interested by #183 while doing 2 things in NEO: check that a FS DB was correctly imported (using zodbtools), and fight with bugs and code duplication about serialization of extension data.

But like Jim, I prefer to serialize between extension and the new extension_bytes when necessary. I do it lazily with properties.

Since extension_bytes is not optional anymore, many comments in #183 become irrelevant and it looks less controversial to define a new attribute IStorageTransactionMetaData, so I still do it.

@jmuchemb jmuchemb requested a review from jimfulton July 6, 2018 18:13
@jamadden
Copy link
Member

I think I like this (though I'm not fond of the magic that the decorator uses to get the attribute names from the argspec)!

@jmuchemb
Copy link
Member Author

jmuchemb commented Sep 3, 2018

though I'm not fond of the magic that the decorator uses to get the attribute names from the argspec

I found important to minimize code duplication here, so that both properties remain always symmetric. I wish I could find something better. Despite code that is a little harder to read, there's still little some duplication at the places _extension_property is used.

I don't want to wait too long before merging this one, as I have other changes I'd like to submit.

I plan to merge mid-September, unless anyone wants more time to review.

src/ZODB/Connection.py Outdated Show resolved Hide resolved
@icemac
Copy link
Member

icemac commented Dec 14, 2018

What is the status of this PR? What needs to be done to get it merged?

NexediGitlab pushed a commit to Nexedi/zodbtools that referenced this pull request Jan 11, 2019
Currently we exercise zodbdump and zodbcommit+zodbdump with non-empty
extensions, which works if ZODB is patched for txn.extension_bytes
support, but fails on pristine ZODB.

Support for txn.extension_bytes cannot get into upstream ZODB for
more than a year:

zopefoundation/ZODB#183
zopefoundation/ZODB#207

and  even if it somehow will make it, it will likely be only in ZODB5,
while we still care to support ZODB4 and ZODB3.

Skipping zodbdump / zodbcommit tests, if a ZODB does not have
txn.extension_bytes support, would result in significant reduction of
zodbtools test coverage, because practically that is the current
situation with all upstream ZODB{3,4,5}. Dropping test coverage for
non-empty extensions is neither a good option.

For those reason, let's rework the tests and test both zodbdump and
zodbcommit with two scenarios:

1. on a test database where transactions extensions are always empty.
   This should work on all ZODB irregardless of whether
   txn.extension_bytes patch is there or not.

2. on a test database where transactions extensions are present.
   This should work if ZODB has txn.extension_bytes support, but if not,
   we can mark this case as xfail, since the failure is expected.

This way we make the testsuite pass irregardless of whether
txn.extension_bytes support is there, and we don't abandon dump/commit
testing coverage.

/helped-by Jérome Perrin <jerome@nexedi.com>
@jmuchemb jmuchemb force-pushed the extension_bytes branch 2 times, most recently from a89b480 to c8ce8a4 Compare August 29, 2019 10:10
@jmuchemb
Copy link
Member Author

First, I rebased the original commit. But even with the small diff I suggested above, it would remain possible to mutate extensions after having kept a reference to them.

Magic does not look a good idea if the result is not perfect, especially when there's no need.

So I changed for something simpler, following tseaver comment partially. See IStorageTransactionMetaData.__doc__ for more details.

@jmuchemb
Copy link
Member Author

I'll merge next monday.

IStorage implementations used to do this task themselves which leads to code
duplication and sometimes bugs (one was fixed recently in NEO). Like for object
serialization, this should be done by the upper layer (Connection).

This commit also provides a way to get raw extensions data while iterating
over transactions (this is actually the original purpose[2]). So far, extension
data could only be retrieved unpickled, which caused several problems:

- tools like `zodb dump` [1] cannot dump data exactly as stored on a
  storage. This makes database potentially not bit-to-bit identical to
  its original after restoring from such dump.

- `zodb dump` output could be changing from run to run on the same
  database. This comes from the fact that e.g. python dictionaries are
  unordered and so when pickling a dict back to bytes the result could
  be not the same as original.

  ( this problem can be worked-around partly to work reliably for e.g.
    dict with str keys - by always emitting items in key sorted order,
    but it is hard to make it work reliably for arbitrary types )

Both issues make it hard to verify integrity of database at the lowest
possible level after restoration, and make it hard to verify bit-to-bit
compatibility with non-python ZODB implementations.

For this, TransactionMetaData gets a new 'extension_bytes' attribute and
and common usage becomes:

* Application committing a transaction:

  - 'extension' is set with a dictionary
  - the storage gets the bytes via 'extension_bytes'

* Iteration:

  - the storage passes bytes as 'extension' parameter of TransactionMetaData
  - the application can get extension data either as bytes ('extension_bytes')
    or deserialized ('extension'): in the former case, no deserialization
    happens and the returned value is exactly what was passed by the storage

[1] https://lab.nexedi.com/nexedi/zodbtools
[2] #183

Co-Authored-By: Kirill Smelkov <kirr@nexedi.com>
@jmuchemb jmuchemb merged commit 2f8cc67 into master Sep 23, 2019
@jmuchemb jmuchemb deleted the extension_bytes branch September 23, 2019 14:57
jmuchemb added a commit to zopefoundation/ZEO that referenced this pull request Jun 11, 2020
jmuchemb added a commit to zopefoundation/ZEO that referenced this pull request Jun 11, 2020
navytux added a commit to navytux/zc.zlibstorage that referenced this pull request Nov 15, 2021
If test class does not indicate that underlying storage supports
extension_bytes (see zopefoundation/ZODB#207),
then ZODB.tests.IteratorStorage will verify that extension_bytes is
computed from extension instead of taken from on-storage raw value:

https://github.com/zopefoundation/ZODB/blob/5.6.0-48-gbc13ca74b/src/ZODB/tests/IteratorStorage.py#L93-L115

However when underlying storage supports extension_bytes, so is
zlibstorage that wraps it.

Many classes in zc.zlibstorage tests inherit from classes test classes
for FileStorage

https://github.com/zopefoundation/zc.zlibstorage/blob/1.2.0-4-g51cca6b/src/zc/zlibstorage/tests.py#L314-L338

Those classes will have .use_extension_bytes properly set because ZODB
sets so for FileStorageTests:

https://github.com/zopefoundation/ZODB/blob/5.6.0-48-gbc13ca74b/src/ZODB/tests/testFileStorage.py#L44-L62

However for tests that inherit from ZEO tests, .use_extension_bytes is
not uniformly set for ZEO over FileStorage, and so with ZODB >= 5.6.0
zc.zlibstorage fails with

    Failure in test checkTransactionExtensionFromIterator (zc.zlibstorage.tests.FileStorageClientZlibZEOServerZlibTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/IteratorStorage.py", line 115, in checkTransactionExtensionFromIterator
        self.assertNotEqual(extension_bytes, txn.extension_bytes)
      File "/usr/lib/python2.7/unittest/case.py", line 522, in assertNotEqual
        raise self.failureException(msg)
    AssertionError: '\x80\x03}q\x01U\x03fooK\x01s.' == '\x80\x03}q\x01U\x03fooK\x01s.'

and

    Failure in test checkTransactionExtensionFromIterator (zc.zlibstorage.tests.FileStorageClientZlibZEOZlibTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/IteratorStorage.py", line 115, in checkTransactionExtensionFromIterator
        self.assertNotEqual(extension_bytes, txn.extension_bytes)
      File "/usr/lib/python2.7/unittest/case.py", line 522, in assertNotEqual
        raise self.failureException(msg)
    AssertionError: '\x80\x03}q\x01U\x03fooK\x01s.' == '\x80\x03}q\x01U\x03fooK\x01s.'

Fix it similarly to zopefoundation/ZEO@db24ed2f
icemac pushed a commit to zopefoundation/zc.zlibstorage that referenced this pull request Nov 16, 2021
If test class does not indicate that underlying storage supports
extension_bytes (see zopefoundation/ZODB#207),
then ZODB.tests.IteratorStorage will verify that extension_bytes is
computed from extension instead of taken from on-storage raw value:

https://github.com/zopefoundation/ZODB/blob/5.6.0-48-gbc13ca74b/src/ZODB/tests/IteratorStorage.py#L93-L115

However when underlying storage supports extension_bytes, so is
zlibstorage that wraps it.

Many classes in zc.zlibstorage tests inherit from classes test classes
for FileStorage

https://github.com/zopefoundation/zc.zlibstorage/blob/1.2.0-4-g51cca6b/src/zc/zlibstorage/tests.py#L314-L338

Those classes will have .use_extension_bytes properly set because ZODB
sets so for FileStorageTests:

https://github.com/zopefoundation/ZODB/blob/5.6.0-48-gbc13ca74b/src/ZODB/tests/testFileStorage.py#L44-L62

However for tests that inherit from ZEO tests, .use_extension_bytes is
not uniformly set for ZEO over FileStorage, and so with ZODB >= 5.6.0
zc.zlibstorage fails with

    Failure in test checkTransactionExtensionFromIterator (zc.zlibstorage.tests.FileStorageClientZlibZEOServerZlibTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/IteratorStorage.py", line 115, in checkTransactionExtensionFromIterator
        self.assertNotEqual(extension_bytes, txn.extension_bytes)
      File "/usr/lib/python2.7/unittest/case.py", line 522, in assertNotEqual
        raise self.failureException(msg)
    AssertionError: '\x80\x03}q\x01U\x03fooK\x01s.' == '\x80\x03}q\x01U\x03fooK\x01s.'

and

    Failure in test checkTransactionExtensionFromIterator (zc.zlibstorage.tests.FileStorageClientZlibZEOZlibTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/IteratorStorage.py", line 115, in checkTransactionExtensionFromIterator
        self.assertNotEqual(extension_bytes, txn.extension_bytes)
      File "/usr/lib/python2.7/unittest/case.py", line 522, in assertNotEqual
        raise self.failureException(msg)
    AssertionError: '\x80\x03}q\x01U\x03fooK\x01s.' == '\x80\x03}q\x01U\x03fooK\x01s.'

Fix it similarly to zopefoundation/ZEO@db24ed2f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants