Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Add test for open vs invalidation race #345

Merged
merged 3 commits into from Apr 21, 2021

Commits on Apr 21, 2021

  1. tests: Add test for open vs invalidation race

    Add test that exercises open vs invalidation race condition that, if
    happen, leads to data corruption. We are seeing such race happening on
    storage level in ZEO (zopefoundation/ZEO#166),
    and previously we've seen it also to happen on Connection level
    (zopefoundation#290). By adding this test
    to be exercised wrt all storages we make sure that all storages stay
    free from this race.
    
    And it payed out. Besides catching original problems from
    zopefoundation#290 and
    zopefoundation/ZEO#166 , this test also
    discovered a concurrency bug in MVCCMappingStorage:
    
        Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests)
        Traceback (most recent call last):
          File "/usr/lib/python2.7/unittest/case.py", line 329, in run
            testMethod()
          File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate
            self.fail(failure[0])
          File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
            raise self.failureException(msg)
        AssertionError: T1: obj1.value (24)  !=  obj2.value (23)
    
    The problem with MVCCMappingStorage was that instance.poll_invalidations
    was correctly taking main_lock with intention to make sure main data is
    not mutated during analysis, but instance.tpc_finish and
    instance.tpc_abort did _not_ taken main lock, which was leading to
    committed data to be propagating into main storage in non-atomic way.
    
    This bug was also observable if both obj1 and obj2 in the added test
    were always loaded from the storage (added obj2._p_invalidate after
    obj1._p_invalidate).
    
    -> Fix MVCCMappingStorage by correctly locking main MVCCMappingStorage
    instance when processing transaction completion.
    
    /cc @d-maurer, @jamadden, @jmuchemb
    /reviewed-on zopefoundation#345
    navytux committed Apr 21, 2021
    Copy the full SHA
    bf82319 View commit details
    Browse the repository at this point in the history
  2. tests: Add test for load vs external invalidation race

    For ZEO this data corruption bug was reported at
    zopefoundation/ZEO#155 and fixed at
    zopefoundation/ZEO#169.
    
    Without that fix the failure shows e.g. as follows when running ZEO test
    suite:
    
        Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
        Traceback (most recent call last):
          File "/usr/lib/python2.7/unittest/case.py", line 329, in run
            testMethod()
          File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 621, in check_race_load_vs_external_invalidate
            self.fail([_ for _ in failure if _])
          File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
            raise self.failureException(msg)
        AssertionError: ['T1: obj1.value (7)  !=  obj2.value (8)']
    
    Even if added test is somewhat similar to
    check_race_loadopen_vs_local_invalidate, it is added anew without trying
    to unify code. The reason here is that the probability to catch load vs
    external invalidation race is significantly reduced when there are only
    1 modify and 1 verify workers. The unification with preserving both
    tests semantic would make test for "load vs local invalidate" harder to
    follow. Sometimes a little copying is better than trying to unify too
    much.
    
    For the test to work, test infrastructure is amended with
    ._new_storage_client() method that complements ._storage attribute:
    client-server storages like ZEO, NEO and RelStorage allow several
    storage clients to be connected to single storage server. For
    client-server storages test subclasses should implement
    _new_storage_client to return new storage client that is connected to
    the same storage server self._storage is connected to.
    
    For ZEO ._new_storage_client() is added by zopefoundation/ZEO#170
    
    Other client-server storages can follow to implement ._new_storage_client()
    and this way automatically activate this "load vs external invalidation"
    test when their testsuite is run.
    
    Contrary to test for "load vs local invalidate" N is set to lower value (100),
    because with 8 workers the bug is usually reproduced at not-so-high iteration
    number (5-10-20).
    
    /cc @d-maurer, @jamadden, @jmuchemb
    /reviewed-on zopefoundation#345
    navytux committed Apr 21, 2021
    Copy the full SHA
    cf8056c View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    c4166fd View commit details
    Browse the repository at this point in the history