tests: Add test for open vs invalidation race #345

Merged
merged 3 commits into zopefoundation:master from y/openrace on Apr 21, 2021

Conversation

navytux (Contributor) commented Apr 14, 2021

Add a test that exercises the open vs invalidation race condition which, if
it happens, leads to data corruption. We have seen such a race happen at the
storage level in ZEO (zopefoundation/ZEO#166),
and previously also at the Connection level
(#290). By exercising this test
against all storages we make sure that all storages stay free from this
race.

And it paid off. Besides catching the original problems from
#290 and
zopefoundation/ZEO#166, this test also
discovered a concurrency bug in MVCCMappingStorage:

Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests)
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate
    self.fail(failure[0])
  File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
    raise self.failureException(msg)
AssertionError: T1: obj1.value (24)  !=  obj2.value (23)

The problem with MVCCMappingStorage was that instance.poll_invalidations
correctly took main_lock with the intention of making sure the main data is
not mutated during analysis, but instance.tpc_finish and
instance.tpc_abort did not take the main lock, which allowed committed
data to propagate into the main storage in a non-atomic way.

This bug was also observable if both obj1 and obj2 in the added test
were always loaded from the storage (by adding obj2._p_invalidate after
obj1._p_invalidate).

-> Fix MVCCMappingStorage by correctly locking the main MVCCMappingStorage
instance when processing transaction completion.
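
For illustration, here is a toy model of the invariant the fix restores (hypothetical names, not the actual MVCCMappingStorage code): transaction completion must publish committed data under the same lock that poll_invalidations reads under, so that a commit becomes visible atomically.

    import threading

    class ToyMVCCStorage(object):
        def __init__(self):
            self.main_lock = threading.Lock()
            self.main_data = {}  # committed state, shared with instances
            self._pending = {}   # changes of the transaction being committed

        def store(self, oid, value):
            self._pending[oid] = value

        def tpc_finish(self):
            with self.main_lock:  # this lock was missing before the fix
                # publish the whole transaction atomically wrt readers
                self.main_data.update(self._pending)
                self._pending = {}

        def poll_invalidations(self):
            with self.main_lock:  # was already taken before the fix
                return dict(self.main_data)

Without the lock in tpc_finish/tpc_abort, a concurrent poll_invalidations could observe obj1 already updated while obj2 is not yet - exactly the obj1.value != obj2.value failure above.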

/cc @d-maurer, @jamadden, @jmuchemb

navytux (Contributor, Author) commented Apr 14, 2021

CI is failing with:

Version and requirements information containing docutils:
  Base installation request: 'Sphinx', 'docutils', 'ZODB', 'sphinxcontrib_zopeext', 'j1m.sphinxautozconfig', 'sphinx_rtd_theme'
  Requirement of sphinx_rtd_theme: docutils<0.17
  Requirement of j1m.sphinxautozconfig: docutils
  Requirement of Sphinx: docutils<0.17,>=0.12
While:
  Installing sphinx.
Error: There is a version conflict.
We already have: docutils 0.17
but sphinx-rtd-theme 0.5.2 requires 'docutils<0.17'.

which should be unrelated to my patch.

navytux added a commit to navytux/ZEO that referenced this pull request Apr 14, 2021
Currently loadBefore and prefetch spawn an async protocol.load_before task
and, after waking up on its completion, populate the cache with the received
data. But between the time when the protocol.load_before handler is run
and the time when the protocol.load_before caller wakes up, there is a
window in which the event loop might be running some other code, including
the code that handles invalidateTransaction messages from the server.

This means that cache updates and cache invalidations can be processed on
the client in a different order than the server sent them. And such a
difference in order can lead to data corruption if e.g. the server sent

	<- loadBefore oid serial=tid1 next_serial=None
	<- invalidateTransaction tid2 oid

and the client processed it as

	invalidateTransaction tid2 oid
	cache.store(oid, tid1, next_serial=None)

because here the end effect is that the invalidation for oid@tid2 is not
applied to the cache.

The fix is simple: perform cache updates right after the loadBefore reply is
received.

Fixes: zopefoundation#155

The fix is based on analysis and initial patch by @jmuchemb:

zopefoundation#155 (comment)

For tests, similarly to zopefoundation/ZODB#345,
I wanted to include a general test for this issue in ZODB, so that all
storages - not just ZEO - are exercised for this race scenario. However,
in the ZODB test infrastructure there is currently no established general
way to open several client storage connections to one storage server.
For this reason the test for this issue currently lives in the wendelin.core
repository (and exercises both NEO and ZEO there):

https://lab.nexedi.com/nexedi/wendelin.core/commit/c37a989d

I understand there is room for improvement. For reference, my
original ZEO-specific data corruption reproducer is here:

zopefoundation#155 (comment)
https://lab.nexedi.com/kirr/wendelin.core/blob/ecd0e7f0/zloadrace5.py

/cc @d-maurer, @jamadden, @dataflake, @jimfulton
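
A runnable toy model of the reordering hazard described above (hypothetical names, not actual ZEO code): the awaiting caller stores into the cache only on a later event-loop turn, after the invalidation handler may already have run.

    import asyncio

    cache = {}

    async def buggy_load(oid, fut):
        data = await fut   # wakes up on a later event-loop turn...
        cache[oid] = data  # ...so this store can land after a newer invalidation

    async def main():
        fut = asyncio.get_running_loop().create_future()
        task = asyncio.ensure_future(buggy_load('oid1', fut))
        await asyncio.sleep(0)       # let buggy_load start awaiting
        # two messages arrive back-to-back from the server, in this order:
        fut.set_result('data@tid1')  # <- loadBefore reply (handler runs now)
        cache.pop('oid1', None)      # <- invalidateTransaction for newer tid2
        await task                   # the caller wakes up only here
        print(cache)  # {'oid1': 'data@tid1'} - the invalidation was lost

    asyncio.run(main())

The fix, in spirit, performs the cache store synchronously at the point where the reply is processed (where fut.set_result is called in this model), so that cache updates happen in server-message order.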
jmuchemb (Member)

Ouch, that would be the longest ZODB test. I ran them for NEO on my laptop:

  • before
         Test Module |  run  | unexpected | expected | skipped |  time    
    -----------------+-------+------------+----------+---------+----------
               Basic |   16  |       .    |      .   |     .   |   1.88s   
            Conflict |    5  |       .    |      .   |     .   |   0.54s   
             History |    1  |       .    |      .   |     .   |   0.15s   
            Iterator |    9  |       .    |      .   |     .   |   1.01s   
                  MT |    5  |       .    |      .   |     .   |   5.31s   
                Pack |   14  |       .    |      2   |     .   |  17.49s   
          Persistent |    1  |       .    |      .   |     .   |   0.15s   
            ReadOnly |    2  |       .    |      .   |     .   |   0.30s   
            Revision |    7  |       .    |      .   |     .   |   1.61s   
     Synchronization |    7  |       .    |      .   |     .   |   0.50s   
                Undo |   40  |       .    |      2   |     .   |  11.84s   
                ZODB |   22  |       .    |      1   |     .   |   4.09s   
    -----------------+-------+------------+----------+---------+----------
      neo.tests.zodb |       |            |          |         |          
    -----------------+-------+------------+----------+---------+----------
             Summary |  129  |       .    |      5   |     .   |  44.86s   
    -----------------+-------+------------+----------+---------+----------
    
  • after
         Test Module |  run  | unexpected | expected | skipped |  time
    -----------------+-------+------------+----------+---------+----------
               Basic |   17  |       .    |      .   |     .   |  17.29s
            Conflict |    5  |       .    |      .   |     .   |   0.54s
             History |    1  |       .    |      .   |     .   |   0.13s
            Iterator |    9  |       .    |      .   |     .   |   0.99s
                  MT |    5  |       .    |      .   |     .   |   5.27s
                Pack |   14  |       .    |      2   |     .   |  17.25s
          Persistent |    1  |       .    |      .   |     .   |   0.17s
            ReadOnly |    2  |       .    |      .   |     .   |   0.27s
            Revision |    7  |       .    |      .   |     .   |   1.65s
     Synchronization |    7  |       .    |      .   |     .   |   0.51s
                Undo |   40  |       .    |      2   |     .   |  11.92s
                ZODB |   22  |       .    |      1   |     .   |   4.04s
    -----------------+-------+------------+----------+---------+----------
      neo.tests.zodb |       |            |          |         |
    -----------------+-------+------------+----------+---------+----------
             Summary |  130  |       .    |      5   |     .   |  60.03s
    -----------------+-------+------------+----------+---------+----------
    

So ~15s.

navytux (Contributor, Author) commented Apr 15, 2021

We can reduce N somewhat, but the timings highlight that NEO/SQLite is 2x slower than ZEO here. And for ZEO with N=1000 it is only 2 seconds on my machine (ZEO 5.2.2-1-g3d90ed42 + zopefoundation/ZEO#167; NEO v1.12-77-gde0feb4e):

# stable with 3 repeats
kirr@deco:~/src/wendelin/wendelin.core/lib/tests/testprog$ time WENDELIN_CORE_TEST_DB='<zeo>' ./zopenrace.py
OK

real    0m2,340s
user    0m2,561s
sys     0m0,505s
# stable with 3 repeats
kirr@deco:~/src/wendelin/wendelin.core/lib/tests/testprog$ time WENDELIN_CORE_TEST_DB='<neo>' ./zopenrace.py
Using temp directory /tmp/neo_a8sq1P
OK

real    0m4,435s
user    0m4,238s
sys     0m1,026s

Reducing N will reduce the probability of catching problems and so will make CI less useful.

I would say that spending even 10 seconds to reliably catch data corruption bugs is worth it.

I will reduce N to 500. On my machine from 2016 (5 years ago) this makes the test run with ZEO in ~1s and with NEO in ~2s. I believe this is reasonable.

Kirill

navytux added a commit to navytux/ZODB that referenced this pull request Apr 15, 2021
@jmuchemb reports that this test is noticeably slow on his laptop:

zopefoundation#345 (comment)
navytux (Contributor, Author) commented Apr 15, 2021

Travis CI failures are, again, completely unrelated to my patch.

jmuchemb (Member)

the timings highlight that NEO/SQLite is 2x slower than ZEO here.

For those who wonder, the ZODB tests in NEO run by default in a single-process mode with a special scheduler. Compared to a normal NEO DB:

  • a test DB is slower
  • the test has more control over the DB and some operations like setting up the DB are much faster

In normal mode, the ZODB tests of NEO take 100.97s vs 109.18s, so the added test is about twice as fast, which would be about the same as ZEO.

I will reduce N to 500.

With which value can you almost always reproduce all the bugs that have been found?

In fact, my previous comment was mainly informative, with little hope that something could be done.

More generally, a "classic" test suite is not really the place for this kind of test. In NEO, we put such tests in a stress test suite that we choose to run for, for example, 1 hour. It would be much more work to do the same in ZODB.

Depending on the value of a useful N, this PR can be fine for now.

navytux (Contributor, Author) commented Apr 15, 2021

@jmuchemb, in #345 (comment) I ran zopenrace.py, not the ZODB test itself (which is modelled after zopenrace.py). That zopenrace.py uses wendelin.lib.testing.getTestDB to set up the test database. With WENDELIN_CORE_TEST_DB='<neo>' that getTestDB uses neo.tests.functional to set up a test NEO cluster. Do I read you right that neo.tests.functional employs that special scheduler instead of regular OS processes? I have always thought that - contrary to neo.tests.threaded - neo.tests.functional uses regular OS processes, because I see os.fork in its code and I see forked processes in ps output when zopenrace.py is being run. I might be missing something, but does your argument really explain the speed difference?

For reference, here is how the ZEO test cluster is set up when zopenrace.py is run with WENDELIN_CORE_TEST_DB='<zeo>'.

With which value can you almost always reproduce all the bugs that have been found?

N=1000. Even 500 is somewhat of a compromise, because I remember that sometimes some tests were failing around iter=700 or 800.

More generally, a "classic" test suite is not really the place for this kind of test. In NEO, we put such tests in a stress test suite that we choose to run for, for example, 1 hour. It would be much more work to do the same in ZODB.

It is a reasonable approach when reproducing problems takes a non-trivial amount of time. But if it takes just several seconds to reproduce, then in my view the place for such tests is in the regular test suite. The synergy of those two approaches could be that such tests live in the regular test suite and use a reasonably small N by default, so that the run takes seconds, and when the test suite is run with something like a --stress or --long mode, those N are raised by a factor of, say, 10, 100 or more to exercise problems much more thoroughly - something like the sketch below.
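
As a sketch of that idea (names invented for illustration):

    import os

    # Keep the default run fast; let an opt-in stress mode multiply N.
    N = 500 * int(os.environ.get('ZODB_TEST_STRESS_FACTOR', '1'))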

Depending on the value of a useful N, this PR can be fine for now.

I'm fine with N=1000; I'm somewhat ok with N=500. In my view the test becomes less useful if N is reduced further.

Kirill

navytux added a commit to navytux/ZEO that referenced this pull request Apr 16, 2021
…neric _new_storage_client()

This allows ZODB tests to recognize ZEO as a client-server storage, and so
lets the "load vs external invalidate" test added in
zopefoundation/ZODB#345 reproduce the data
corruption problem reported at
zopefoundation#155.

For reference: that problem should be fixed by
zopefoundation#169.

We drop

    # It's hard to find the actual address.
    # The rpc mgr addr attribute is a list.  Each element in the
    # list is a socket domain (AF_INET, AF_UNIX, etc.) and an
    # address.

note because at the time it was added (81f586c) it came with

    addr = self._storage._rpc_mgr.addr[0][1]

but nowadays, after 0386718, the server address is available directly as ClientStorage._addr.
navytux (Contributor, Author) commented Apr 16, 2021

I have updated this pull request with another test, which exercises the "load vs external invalidation" data corruption discovered in zopefoundation/ZEO#155. Quoting 9601756:

---- 8< ----
tests: Add test for load vs external invalidation race

For ZEO this data corruption bug was reported at
zopefoundation/ZEO#155 and fixed at
zopefoundation/ZEO#169.

Without that fix, the failure shows up e.g. as follows when running the ZEO
test suite:

Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run 
    testMethod()
  File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 621, in check_race_load_vs_external_invalidate
    self.fail([_ for _ in failure if _]) 
  File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
    raise self.failureException(msg)
AssertionError: ['T1: obj1.value (7)  !=  obj2.value (8)']

Even though the added test is somewhat similar to
check_race_loadopen_vs_local_invalidate, it is added anew, without trying
to unify the code. The reason is that the probability of catching the load
vs external invalidation race is significantly reduced when there is only
1 modify worker and 1 verify worker. Unification while preserving both
tests' semantics would make the test for "load vs local invalidate" harder
to follow. Sometimes a little copying is better than trying to unify too
much.

For the test to work, the test infrastructure is amended with a
._new_storage_client() method that complements the ._storage attribute:
client-server storages like ZEO, NEO and RelStorage allow several
storage clients to be connected to a single storage server. For
client-server storages, test subclasses should implement
_new_storage_client to return a new storage client that is connected to
the same storage server that self._storage is connected to.

For ZEO ._new_storage_client() is added by zopefoundation/ZEO#170

Other client-server storages can follow by implementing ._new_storage_client(),
and this way automatically activate this "load vs external invalidation"
test when their test suite is run.

Contrary to the test for "load vs local invalidate", N is set to a lower value
(100), because with 8 workers the bug is usually reproduced at a not-so-high
iteration number (5-10-20).

/cc @d-maurer, @jamadden, @jmuchemb
---- 8< ----

So now this pull-request, together with zopefoundation/ZEO#170, adds tests for all data corruption bugs discovered so far in the ZODB5 stack.
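
A minimal sketch of the hook's contract, assuming the structure described in the quoted commit message (ZEO's real implementation is in zopefoundation/ZEO#170):

    import unittest

    class RaceTests(unittest.TestCase):
        def _new_storage_client(self):
            """Return a new storage client connected to the same server
            as self._storage; client-server storage tests override this."""
            raise NotImplementedError

        def check_race_load_vs_external_invalidate(self):
            try:
                zstor2 = self._new_storage_client()
            except NotImplementedError:
                self.skipTest("storage does not support multiple clients")
            # ... drive modify/verify workers against self._storage and zstor2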

navytux (Contributor, Author) commented Apr 16, 2021

Travis CI failures - all with an error installing sphinx - are, again, completely unrelated to my patch.

d-maurer (Contributor)

Earlier, we observed that race conditions are much harder to detect on Python 3 than on Python 2. The experience from zopefoundation/ZEO#168 (comment) explains this observation: obviously, the race condition probability increases with the context switch frequency; by default, Python 3 switches contexts every 5 ms, compared to every 100 interpreter instructions (about 0.01 ms on modern hardware) for Python 2; thus, under normal settings, Python 3 is almost sequential compared to Python 2.

We may want to control the context switch frequency for race condition tests.
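
The relevant standard-library knobs, for reference (defaults as stated above):

    import sys

    print(sys.getswitchinterval())  # Python 3 default: 0.005 (5 ms)
    sys.setswitchinterval(1e-5)     # switch ~500x more often - Python-2-like

    # Python 2 counterpart (bytecode instructions between switches):
    # sys.setcheckinterval(100)     # the default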

navytux (Contributor, Author) commented Apr 19, 2021

@d-maurer, thanks, this might be a good idea. I see the corresponding changes at https://github.com/zopefoundation/ZEO/pull/167/files/2c98d943d7e191abd64cdac53876267dffac9c74..4f59226553e651e5cc8597e734a084f63501c16e and I've tried to see whether it indeed makes the test fail with python3. Unfortunately I still cannot trigger the test to fail, even with something like

--- a/lib/tests/testprog/zopenrace.py
+++ b/lib/tests/testprog/zopenrace.py
@@ -106,6 +106,10 @@ def __init__(self, i):
 
 @func
 def main():
+    import sys
+    #sys.setcheckinterval(10)
+    sys.setswitchinterval(5*1e-5)
+
     tdb = getTestDB()
     tdb.setup()
     defer(tdb.teardown)

I've also tried 1e-6, 1e-5 and 1e-4 - the test (zopenrace.py) always passes for me with CPython 3.9.2.

For reference, "100 instructions" on my machine is somewhere between 1e-6 and 1e-5 seconds.

I also see that CPython itself uses sys.setswitchinterval(1e-6) (the lowest possible value) in tests for threading bugs, so this might still be something we want to consider.

https://github.com/python/cpython/blob/68ba0c67cac10c2545ea3b62d8b161e5b3594edd/Lib/test/test_concurrent_futures.py#L687-L694
https://github.com/python/cpython/blob/68ba0c67cac10c2545ea3b62d8b161e5b3594edd/Lib/test/_test_multiprocessing.py#L4280-L4295
https://github.com/python/cpython/blob/68ba0c67cac10c2545ea3b62d8b161e5b3594edd/Lib/test/test_importlib/test_threaded_import.py#L261-L273
...

However, on python2, if I lower the "check interval" from the default 100 down to 1 (again the lowest possible value), zopenrace.py stops reproducing the problem as well.

We might indeed move race tests to a layer (as in your changes), or mark them with something like a @with_high_concurrency decorator, but it is still not completely clear to me which values to use for those settings on python3. And python2 "works" out of the box with the defaults.

d-maurer (Contributor) commented Apr 20, 2021 via email

navytux (Contributor, Author) commented Apr 20, 2021

@d-maurer, thanks. I have marked my race-reproducing tests with a @with_high_concurrency decorator that makes sure to run the wrapped function with appropriate sys.set{switch,check}interval settings. Details are here: acace75.
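
For reference, a sketch of the decorator's idea (the exact structure of the real code in acace75 differs and is itself discussed in the review below):

    import functools
    import sys

    import six

    def with_high_concurrency(f):
        """Run f with the interpreter switching threads as often as
        possible, to raise the probability of hitting race conditions."""
        @functools.wraps(f)
        def _(*argv, **kw):
            if six.PY3:
                previous = sys.getswitchinterval()
                try:
                    sys.setswitchinterval(5e-6)  # ~1000x the 5 ms default
                    return f(*argv, **kw)
                finally:
                    sys.setswitchinterval(previous)
            else:
                previous = sys.getcheckinterval()
                try:
                    sys.setcheckinterval(100)  # python2 default, made explicit
                    return f(*argv, **kw)
                finally:
                    sys.setcheckinterval(previous)
        return _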

Would you please have a look?
I believe this PR should be ready to go in. And even if we later find something to improve, we can do that as a follow-up step.

Thanks beforehand,
Kirill

navytux added a commit to navytux/ZEO that referenced this pull request Apr 20, 2021
Add entry to changelog.

Note: the fix now has a corresponding test, which should be coming in through

zopefoundation#170  and
zopefoundation/ZODB#345
navytux (Contributor, Author) commented Apr 20, 2021

CI is ok, modulo the already-seen, unrelated "version conflict installing sphinx" kind of problems.

"""
@functools.wraps(f)
def _(*argv, **kw):
    if six.PY3:
jmuchemb (Member)

Can you move this to an upper level, so we can have:

previous = sys.getswitchinterval()
try:
    sys.setswitchinterval(5e-6)
    return f(*argv, **kw)
finally:
    sys.setswitchinterval(previous)

and

previous = sys.getcheckinterval()
try:
    sys.setcheckinterval(100)
    return f(*argv, **kw)
finally:
    sys.setcheckinterval(previous)

navytux (Contributor, Author)

It makes the code less clear to me. I prefer to keep my original version.

jmuchemb (Member)

One could argue about the duplication of the try/return/finally lines, even if for me it's so small that I'd call it "overfactoring". But less clear? There's nothing more pythonic than the following pattern:

save state
try:
    mess up (and ideally, do it after the try - see PEP 419)
    ...
finally:
    restore

No need to "name" the restore part by moving it to a function.

When we drop support for Python 2, that's anyway how it would be rewritten.

navytux (Contributor, Author)

One could argue about the duplication of the try/return/finally lines

Yes, that duplication of f call sites makes it less clear in my view.

When we drop support for Python 2, that's anyway how it would be rewritten.

So let's rewrite it that way if/when Python 2 support is dropped. And until it is dropped, let's keep the code clearly structured.

src/ZODB/tests/util.py
    break
except:
    failed.set()
    raise
jmuchemb (Member)

I prefer:

  1. move _modify/_verify out of modify/verify
  2. merge modify & verify into a single function (you could still have a nested function with code that is common to _modify/_verify)
  3. rename _modify/_verify to modify/verify
  4. make threading.Thread pass an extra argument that is either modify or verify
  5. for the commented print, you can use __name__ of the passed function

On the other hand, there is no need for threading.Thread to pass N.

navytux (Contributor, Author) Apr 20, 2021

ok, let's have it this way

On the other side, no need for threading.Thread to pass N.

except this - I prefer to pass parameters explicitly instead of implicitly referring to shared globals.

jmuchemb (Member)

Since you capitalized it, it's semantically not a variable but a constant.

navytux (Contributor, Author)

It's just a major parameter.

src/ZODB/tests/BasicStorage.py
for x in range(nwork):
    t = threading.Thread(name='T%d' % x, target=T, args=(x, N))
    t.start()
    tg.append(t)
jmuchemb (Member)

what about

tg = [threading.Thread(name='T%d' % x, target=T, args=(x, N))
      for x in range(nwork)]
for t in tg:
    t.start()

navytux (Contributor, Author)

it is a bit less clear to me, so I prefer to leave my version of this code.

jmuchemb (Member)

Besides style (here, I won't insist), I wonder whether it makes a difference to minimize the difference in start times between threads, and whether it would have an impact on reproducing the failure. Or maybe it's completely negligible.

navytux (Contributor, Author)

I've tried this change and ran the tests on unfixed ZEO. The failure is reproducible with both versions. The iteration number at which the failure is reproduced is scattered between 1-5-6-30-... in both versions. So I believe this change is completely orthogonal to the test's semantics.

src/ZODB/tests/BasicStorage.py
d-maurer (Contributor)

To get rid of the Sphinx problem, you could try to add, in buildout.cfg under the versions:python2 section: docutils = < 0.17

navytux (Contributor, Author) commented Apr 20, 2021

To get rid of the Sphinx problem, you could try to add, in buildout.cfg under the versions:python2 section: docutils = < 0.17

Thanks, @d-maurer. I've created #346 to address this.

navytux added a commit to zopefoundation/ZEO that referenced this pull request Apr 20, 2021
…neric _new_storage_client() (#170)

/reviewed-on #170
/reviewed-by @d-maurer
navytux added a commit to navytux/ZODB that referenced this pull request Apr 20, 2021
Address review feedback provided by @jmuchemb:

1. move _modify/_verify out of modify/verify
2. merge modify & verify into a single function (you could still have a nested function with code that is common to _modify/_verify)
3. rename _modify/_verify to modify/verify
4. make threading.Thread pass an extra argument that is either modify or verify
5. for the commented print, you can use __name__ of the passed function

zopefoundation#345 (comment)
navytux added a commit to navytux/ZODB that referenced this pull request Apr 20, 2021
Address review feedback:

Move nwork up so that there is no longer a misleading failure=[] coming
before failure=[None]*nwork. Also add a space in `[None] * nwork`, as
suggested.

zopefoundation#345 (comment)
zopefoundation#345 (comment)
navytux added a commit to navytux/ZODB that referenced this pull request Apr 20, 2021
navytux (Contributor, Author) commented Apr 20, 2021

Changes updated according to review feedback. Unless #346 is merged, some CI will fail; so, if ok, let's merge #346 first and then I will incorporate it into this PR.

navytux (Contributor, Author) commented Apr 20, 2021

#346 was merged. I synced the changes in this pull-request with master to get to a green CI state. Let's see how it goes this time.

navytux added a commit to zopefoundation/ZEO that referenced this pull request Apr 20, 2021
Currently loadBefore and prefetch spawn an async protocol.load_before task
and, after waking up on its completion, populate the cache with the received
data. But between the time when the protocol.load_before handler is run
and the time when the protocol.load_before caller wakes up, there is a
window in which the event loop might be running some other code, including
the code that handles invalidateTransaction messages from the server.

This means that cache updates and cache invalidations can be processed on
the client in a different order than the server sent them. And such a
difference in order can lead to data corruption if e.g. the server sent

	<- loadBefore oid serial=tid1 next_serial=None
	<- invalidateTransaction tid2 oid

and the client processed it as

	invalidateTransaction tid2 oid
	cache.store(oid, tid1, next_serial=None)

because here the end effect is that the invalidation for oid@tid2 is not
applied to the cache.

The fix is simple: perform cache updates right after the loadBefore reply is
received.

Fixes: #155

The fix is based on analysis and initial patch by @jmuchemb:

#155 (comment)

A test corresponding to the fix is coming through

#170  and
zopefoundation/ZODB#345

For reference, my original ZEO-specific data corruption reproducer
is here:

#155 (comment)
https://lab.nexedi.com/kirr/wendelin.core/blob/ecd0e7f0/zloadrace5.py

/cc @jamadden, @dataflake, @jimfulton
/reviewed-by @jmuchemb, @d-maurer
/reviewed-on #169
navytux (Contributor, Author) commented Apr 20, 2021

CI is green.

navytux (Contributor, Author) commented Apr 21, 2021

I believe it is time to merge this.

navytux (Contributor, Author) commented Apr 21, 2021

(patches rebased to master, with fixups squashed into the corresponding commits)

navytux merged commit ef9e02d into zopefoundation:master Apr 21, 2021
navytux deleted the y/openrace branch April 21, 2021 19:01