Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError on releasing resources of a Connection when closing the DB. #208

Closed
sunew opened this Issue Jun 24, 2018 · 16 comments

Comments

Projects
None yet
4 participants
@sunew
Copy link
Contributor

sunew commented Jun 24, 2018

UPDATE: see this comment for an up to date description: #208 (comment)

When running tests (in plone.restapi) we sometimes get the following KeyError. It is related to timing issues - inserting a time.sleep(2) in the test setup causes the problem to occur more frequently.

It can also be reproduced outside of a testing environment, when zope is shutting down in the middle of ongoing requests. See a comment below for a method of easy reproduction.

ZODB 5.4.0
It does not happen with ZODB 5.3.0

It is related to this commit: 1b9475d

In line 948 we see: c.close(False)

This is part of the traceback:
File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources c.close(False)
In 5.3.0 the corresponding line just sets the transaction manager to None:
c.transaction_manager = None

  Set up plone.restapi.testing.PloneRestApiDXLayer:Functional in 0.000 seconds.
  Running:
    1/77 (1.3%) test_comments_add_root (plone.restapi.tests.test_documentation.TestCommenting)/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/PubCore/ZServerPublisher.py:22: DeprecationWarning: publish_module is deprecated. Please import from ZServer.ZPublisher.Publish.
  from ZPublisher.Publish import publish_module
    8/77 (10.4%) test_documentation_expansion (plone.restapi.tests.test_documentation.TestCommenting)/work/buildout_cache/eggs/zope.deprecation-4.3.0-py2.7.egg/zope/deprecation/deprecation.py:88: DeprecationWarning: isDefaultPage is deprecated. Import from Products.CMFPlone.defaultpage instead
  name)
    22/77 (28.6%) test_documentation_file (plone.restapi.tests.test_documentation.TestDocumentation)/work/buildout_cache/eggs/Products.PortalTransforms-3.1.3-py2.7.egg/Products/PortalTransforms/libtransforms/commandtransform.py:105: DeprecationWarning: os.popen4 is deprecated.  Use the subprocess module.
  cin, couterr = os.popen4(command, 'b')
    32/77 (41.6%) test_documentation_jwt_login (plone.restapi.tests.test_documentation.TestDocumentation) (1.199 s)Traceback (most recent call last):
  File "./bin/test", line 442, in <module>
    '--test-path', '/work/buildout_cache/eggs/repoze.xmliter-0.6-py2.7.egg',
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 60, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 73, in run_internal
    runner.run()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 189, in run
    self.run_tests()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 291, in run_tests
    self.skipped, self.import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 460, in run_layer
    import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 383, in run_tests
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 370, in run
    result.stopTest(self)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 897, in stopTest
    self.testTearDown()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 821, in testTearDown
    layer.testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.app.testing/plone/app/testing/layers.py", line 288, in testTearDown
    super(PloneTestLifecycle, self).testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.testing/src/plone/testing/z2.py", line 945, in testTearDown
    self['zodbDB'].close()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 651, in close
    @self._connectionMap
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 518, in _connectionMap
    self.pool.map(f)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 213, in map
    self.all.map(f)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 62, in map
    f(elt)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 659, in _
    conn._release_resources()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources
    c.close(False)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 304, in close
    self.transaction_manager.unregisterSynch(self)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py", line 106, in unregisterSynch
    self._synchs.remove(synch)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 51, in remove
    del self.data[id(obj)]
  File "/usr/lib/python2.7/weakref.py", line 86, in __delitem__
    del self.data[key]
KeyError: 139902031255440


@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jun 24, 2018

@pbauer same error as in coredev Jenkins 5.2 pull reguest build here:
https://jenkins.plone.org/job/pull-request-5.2/891/consoleFull

@jmuchemb

This comment has been minimized.

Copy link
Member

jmuchemb commented Jun 24, 2018

In #181, I wrote the following comment about the change from c.transaction_manager = None to c.close(False):

I didn't plan to change this at the beginning but while running tests, I discovered that closing a DB left Connections registered to the global transaction manager. Before, these Connections were ignored by patching their afterCompletion/newTransaction.

But the call to unregisterSynch depends on the condition self.opened and self.transaction_manager is not None. Can you debug to see why your connection is in an inconsistent state ? For example, Is it possible that the call to Connection.open was interrupted between self.opened = time.time() and transaction_manager.registerSynch(self) ?

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jun 25, 2018

@jmuchemb Thanks, I will take a look!

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jun 28, 2018

Hi @jmuchemb , I did some debugging, and things are beginning to make sense.

Context:
It happens when running tests with plone.testing, using the FUNCTIONAL_TESTING layer.

There are two threads:

  1. There is a thread (MainThread), starting the application (Zope 4.0b5), opening the DB.
  2. A http request (from a testcase) is made to Zope, resulting in a new thread (Dummy-1) to process the request.
  3. The http request succeeds.
  4. The testcase is done, testTeardown is called.
  5. On tear down of the test, the database is closed, with ZODB.DB.DB.close()

In some cases, depending on timing, machine speed, etc., we get the failure described above in the previous comments.

A quick walkthrough (I skip over some bits) of the teardown:
MainThread:
DB.close(), (lines 636-668 ZODB.DB.py):
iterates over connections in the pool,
calls conn._release_resources for each connection.

_release_resources calls connection.close()
close() calls self.transaction_manager.unregisterSynch(self)
and this calls self._synchs.remove(synch)

Meanwhile, with really bad timing ;) Dummy-1 is finishing the request (see 2.) and closing its connection.

t1: MainThread: DB.Close -> get connections in pool and begin the iteration of all the DB connections
t2: Dummy-1: connection.close() and unregisterSynch
t3: MainThread: during close() on the connection of Dummy-1 - The synch will be gone from the transactionmanager._synchs
Causing the KeyError.

To put it short:
The problem happens when the DB is being closed and is closing all the registered connections, while another thread is finishing a request and closing its connection.

Pattern:
1 mainthread registers 7f16efd94550
2 dummy1 registers 7f16efd73110
3 dummy1 unregisters 7f16efd73110
4 mainthread unregisters 7f16efd94550
5 mainthread tries to unregister the connection of dummy1 7f16efd73110

  • and we have failure

Raw debug output:

registerSynch: <Connection at 7f16efd94550> id: 139736490001744 thread:MainThread
synchs before register: [] 

registerSynch: <Connection at 7f16efd73110> id: 139736489865488 thread:Dummy-1
synchs before register: [] 

unregisterSynch: <Connection at 7f16efd73110> id: 139736489865488 thread:Dummy-1
synchs before unregister: [<Connection at 7f16efd73110>] 

unregisterSynch: <Connection at 7f16efd94550> id: 139736490001744 thread:MainThread
synchs before unregister: [<Connection at 7f16efd94550>] 

unregisterSynch: <Connection at 7f16efd73110> id: 139736489865488 thread:MainThread
synchs before unregister: [] 

Traceback (most recent call last):
  File "./bin/test", line 436, in <module>
    '--test-path', '/work/buildout_cache/eggs/zope.globalrequest-1.4-py2.7.egg',
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 60, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 73, in run_internal
    runner.run()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 189, in run
    self.run_tests()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 291, in run_tests
    self.skipped, self.import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 460, in run_layer
    import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 383, in run_tests
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 370, in run
    result.stopTest(self)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 897, in stopTest
    self.testTearDown()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 821, in testTearDown
    layer.testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.app.testing/plone/app/testing/layers.py", line 288, in testTearDown
    super(PloneTestLifecycle, self).testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.testing/src/plone/testing/z2.py", line 950, in testTearDown
    self['zodbDB'].close()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 651, in close
    @self._connectionMap
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 518, in _connectionMap
    self.pool.map(f)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 213, in map
    self.all.map(f)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 62, in map
    f(elt)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 659, in _
    conn._release_resources()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources
    c.close(False)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 304, in close
    self.transaction_manager.unregisterSynch(self)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py", line 120, in unregisterSynch
    self._synchs.remove(synch)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 51, in remove
    del self.data[id(obj)]
  File "/usr/lib/python2.7/weakref.py", line 86, in __delitem__
    del self.data[key]
KeyError: 139736489865488

Stack of Dummy-1

(Pdb)   /work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/PubCore/ZServerPublisher.py(32)__init__()
-> response=b)
  /work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/ZPublisher/Publish.py(405)publish_module()
-> environ, debug, request, response)
  /work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/ZPublisher/Publish.py(291)publish_module_standard()
-> request.close()
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/ZPublisher/BaseRequest.py(224)close()
-> self.clear()
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/ZPublisher/HTTPRequest.py(211)clear()
-> BaseRequest.clear(self)
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/ZPublisher/BaseRequest.py(216)clear()
-> self._held = None
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/App/ZApplication.py(90)__del__()
-> self._jar.close()
> /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py(304)close()
-> self.transaction_manager.unregisterSynch(self)
  /work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py(123)unregisterSynch()

Stack of MainThread

(Pdb)  /work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py(60)run()
-> failed = run_internal(defaults, args, script_parts=script_parts)
  /work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py(73)run_internal()
-> runner.run()
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(189)run()
-> self.run_tests()
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(291)run_tests()
-> self.skipped, self.import_errors)
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(460)run_layer()
-> import_errors)
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(383)run_tests()
-> test(result)
  /usr/lib/python2.7/unittest/case.py(393)__call__()
-> return self.run(*args, **kwds)
  /usr/lib/python2.7/unittest/case.py(370)run()
-> result.stopTest(self)
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(897)stopTest()
-> self.testTearDown()
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(821)testTearDown()
-> layer.testTearDown()
  /work/playground/plone/plone52devel/src/plone.app.testing/plone/app/testing/layers.py(288)testTearDown()
-> super(PloneTestLifecycle, self).testTearDown()
  /work/playground/plone/plone52devel/src/plone.testing/src/plone/testing/z2.py(950)testTearDown()
-> self['zodbDB'].close()
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(651)close()
-> @self._connectionMap
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(518)_connectionMap()
-> self.pool.map(f)
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(213)map()
-> self.all.map(f)
  /work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py(62)map()
-> f(elt)
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(659)_()
-> conn._release_resources()
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py(948)_release_resources()
-> c.close(False)
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py(304)close()
-> self.transaction_manager.unregisterSynch(self)
> /work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py(126)unregisterSynch()
-> self._synchs.remove(synch)

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jun 28, 2018

I can reproduce it outside of a testing environment.

  1. insert time.sleep(10) somewhere in code run by a http request
  2. start the request from a browser
  3. before the 10 seconds have passed, stop zope (gracefully with "kill `cat var/instance.pid`")

This is the output incl. my debugging statements:

registerSynch: <Connection at 7f419b096810> id: 139919750686736 thread:Dummy-1
synchs before register: [] 

/work/buildout_cache/eggs/zope.deprecation-4.3.0-py2.7.egg/zope/deprecation/deprecation.py:88: DeprecationWarning: isDefaultPage is deprecated. Import from Products.CMFPlone.defaultpage instead
  name)
/work/buildout_cache/eggs/Chameleon-3.2-py2.7.egg/chameleon/tal.py:471: DeprecationWarning: Using len() is deprecated. Use the `length` attribute for the size of the current page, which is what we return now. Use the `sequence_length` attribute for the size of the entire sequence. 
  iterable = list(iterable) if iterable is not None else ()
2018-06-28 23:57:02 INFO SignalHandler Caught signal SIGTERM
2018-06-28 23:57:02 INFO Z2 Shutting down fast
2018-06-28 23:57:02 INFO ZServer closing HTTP to new connections

unregisterSynch: <Connection at 7f419b096810> id: 139919750686736 thread:MainThread
synchs before unregister: [] 

Traceback (most recent call last):
  File "/work/playground/plone/plone52devel/parts/instance/bin/interpreter", line 281, in <module>
    exec(compile(__file__f.read(), __file__, "exec"))
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/run.py", line 64, in <module>
    run()
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/run.py", line 29, in run
    starter.run()
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/starter.py", line 98, in run
    self.shutdown()
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/starter.py", line 103, in shutdown
    db.close()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 651, in close
    @self._connectionMap
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 518, in _connectionMap
    self.pool.map(f)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 213, in map
    self.all.map(f)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 62, in map
    f(elt)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 659, in _
    conn._release_resources()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources
    c.close(False)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 304, in close
    self.transaction_manager.unregisterSynch(self)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py", line 126, in unregisterSynch
    self._synchs.remove(synch)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 51, in remove
    del self.data[id(obj)]
  File "/usr/lib/python2.7/weakref.py", line 86, in __delitem__
    del self.data[key]
KeyError: 139919750686736
@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jun 28, 2018

my debug statements:
(transaction/_manager.py)

    def registerSynch(self, synch):
        """ See ITransactionManager.
        """
        import threading
        # if threading.currentThread().getName() == 'Dummy-1':
        #     from pdb import set_trace; set_trace()
        print '\n\x1b[1;33;40m' + \
            'registerSynch: ' + str(synch) + \
            ' id: ' + str(id(synch)) + \
            ' thread:' + threading.currentThread().getName() + \
            '\n' + 'synchs before register: ' + \
            str([ref() for ref in self._synchs.as_weakref_list()]) + '\x1b[0m \n'
        self._synchs.add(synch)
        if self._txn is not None:
            synch.newTransaction(self._txn)

    def unregisterSynch(self, synch):
        """ See ITransactionManager.
        """
        import threading
        # if threading.currentThread().getName() == 'Dummy-1':
        #     from pdb import set_trace; set_trace()
        print '\n\x1b[1;33;40m' + \
            'unregisterSynch: ' + str(synch) + \
            ' id: ' + str(id(synch)) + \
            ' thread:' + threading.currentThread().getName() + \
            '\n' + 'synchs before unregister: ' + \
            str([ref() for ref in self._synchs.as_weakref_list()]) + '\x1b[0m \n'
        # if synch not in self._synchs:
        #     from pdb import set_trace; set_trace()
        self._synchs.remove(synch)

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jul 4, 2018

@jmuchemb Hi, did you have time to take a look at my results?

@sunew sunew changed the title intermittent KeyError on releasing resources of a Connection KeyError on releasing resources of a Connection when closing the DB. Jul 5, 2018

@jmuchemb

This comment has been minimized.

Copy link
Member

jmuchemb commented Jul 6, 2018

So to sum up, Connection.close() is not thread-safe and from your report, it looks legitimate to close a connection at the same time as the DB is being closed. And it worked before because DB.close didn't actually close connections.

It should be noted also that the bug is not limited to unregisterSynch(). We could also have race conditions while processing __onCloseCallbacks or else.

For this kind of issue, the first idea that comes to mind is to protect Connection.close() with a new lock on the Connection object. There may be better solutions so I keep thinking about this. I doubt that reverting my change is correct: leaving connections registered to a transaction was really bad.

@jmuchemb jmuchemb added the bug label Jul 6, 2018

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Jul 6, 2018

@jmuchemb
Your summary: Yes, I agree with your conclusion.

And yes, there could be other similar race conditions. Sorry for the overload of data in my report 😄, but I was reporting while digging deeper.

And I agree with not reverting your commit, it is just making the issue surface, it is not real cause of the issue.

Maybe @jimfulton has an opinion on the protection of Connection.close() ?

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Aug 14, 2018

any news on this one?

sunew added a commit to sunew/transaction that referenced this issue Sep 15, 2018

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Oct 2, 2018

I have investigated a bit more:

We are in a zope4 setting, running the application normally, not as part of a test suite. (The test suite runs mentioned above made the issue surface more frequently).

There is a main thread and a thread per request.

A) A new request thread opens a connection, that registers with the transaction manager (in Connection.open() - calls transaction/_manager.py - registerSynch).
The connection is added to _synchs WeakSet of the transaction_manager.

B) The application is shut down - perhaps with SIGINT or SIGTERM. The request of A) is long running, so is still active while the application begins to shutdown. This means the request thread has not yet closed its connection.
The connection is still in DB pool, while shutdown calls ZODB.DB.close()

C) DB.close() iterates over all connections in the pool (with _connectionMap): For each connection:
conn._release_resources() -> conn.close() -> self.transaction_manager.unregisterSynch(self)

D) But the transaction_manager is a ThreadTransactionManager, a subclass of threading.local, so the _synchs WeakSet is local for each thread - this means the main thread sees a different _synchs than the request thread, causing the keyerror when trying to unregister the connection with the transaction manager from the main thread during db.close()

I'm not sure what a good solution would be.
@jimfulton any ideas or hints?

sunew added a commit that referenced this issue Oct 3, 2018

@sunew

This comment has been minimized.

Copy link
Contributor Author

sunew commented Oct 3, 2018

@jimfulton

This comment has been minimized.

Copy link
Member

jimfulton commented Oct 14, 2018

Here's an update. @sunew thanks for the test, that helped a lot. Your initial assessment was correct and mine was wrong. :) I'm pondering a fix.

@jimfulton

This comment has been minimized.

Copy link
Member

jimfulton commented Oct 14, 2018

I've added a fix on https://github.com/zopefoundation/ZODB/tree/sunew-failing-test-for-208 that depends on zopefoundation/transaction#68. When the transaction change is released, then I can release a ZODB fix.

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 17, 2018

@pbauer

This comment has been minimized.

Copy link

pbauer commented Oct 17, 2018

I tested that against Plone and found that the fix also requires changes in Zope where the transaction is used like this (I already added the additional .manager):

@contextmanager
def load_app(module_info):
    app_wrapper, realm, debug_mode = module_info
    # Loads the 'OFS.Application' from ZODB.
    app = app_wrapper()

    try:
        yield (app, realm, debug_mode)
    finally:
        if transaction.manager.manager._txn is not None:
            # Only abort a transaction, if one exists. Otherwise the
            # abort creates a new transaction just to abort it.
            transaction.abort()
        app._p_jar.close()
@jmuchemb

This comment has been minimized.

Copy link
Member

jmuchemb commented Oct 17, 2018

That becomes quite invasive, no ?

(actually, on performance side, I'm not fond of all the extra method calls in transaction)

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 17, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 18, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 20, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 24, 2018

icemac added a commit to zopefoundation/Zope that referenced this issue Oct 24, 2018

icemac added a commit to zopefoundation/Zope that referenced this issue Oct 24, 2018

@icemac icemac closed this in #226 Oct 25, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 26, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 28, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.