add_callback() not entirely thread-safe #635

Closed
pitrou opened this Issue Nov 13, 2012 · 7 comments

Comments

Projects
None yet
2 participants
Contributor

pitrou commented Nov 13, 2012

It seems that when io_loop.add_callback() is called in a thread while io_loop.close() is being called in another thread, you can get the following race condition:

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "surycat/utils/sockclient.py", line 256, in _run
    self._loop.close()
  File "/usr/lib/python2.7/dist-packages/tornado/ioloop.py", line 195, in close
    self._waker.close()
  File "/usr/lib/python2.7/dist-packages/tornado/platform/posix.py", line 68, in close
    self.writer.close()
IOError: close() called during concurrent operation on the same file object.
Contributor

pitrou commented Nov 13, 2012

Here is a patch with tests: http://dpaste.com/hold/830236/
Sorry for uploading it to an external service, but I couldn't find a button to attach a file to this issue (??).

Owner

bdarnell commented Nov 14, 2012

Github's issue tracker doesn't have any kind of attachment support (except for patches via the fork-and-pull-request system), so an external pastebin-type (or gist.github.com) is the way to go.

What's motivating this change? Since close is essentially a destructor it's not really safe to have other threads calling in to the IOLoop while it's being closed (in practice close is mostly used for unit tests and other isolated IOLoops where this isn't really an issue).

Contributor

pitrou commented Nov 14, 2012

Since close is essentially a destructor it's not really safe to have other threads calling in to the IOLoop while it's being closed

Well, my patch is essentially about making it safe :-)

in practice close is mostly used for unit tests and other isolated IOLoops where this isn't really an issue

In my case, I have an isolated IO loop where other threads can still add callbacks while the loop is being closed, because stopping and closing the loop itself is the result of some incoming events.

To be clear, I'm writing a tornado-based UDP handler which must integrate in an application written in a synchronous style, with other threads doing other stuff. The UDP handler can be started or torn down at the user's requests. The error above appeared sporadically in my unit tests, before I worked around the issue by adding the moral equivalent of this patch.

Owner

bdarnell commented Nov 15, 2012

And these other threads are OK with the fact that their callbacks will never be run? I'd have probably made add_callback on a closed/closing IOLoop throw an exception instead of accepting the callback without running it (I guess there's an analogy with the fact that destructors are never guaranteed to run, but it feels weird to not throw the exception when there's a clear place to do so).

If you're running in to this problem in unittests it sounds like maybe you need to join the other threads before closing the IOLoop? If the IOLoop is local to a single test you don't really want threads referring to it outlasting the test.

Contributor

pitrou commented Nov 15, 2012

Well, the IOLoop is really local to the UDP handler (it is created when starting the handler, and torn down when the handler is disposed of), which isn't related to the fact that the problem appeared during unittests (arguably, it appeared in unittests merely because that class isn't used elsewhere yet :-)).
You're right though that raising an exception may be cleaner and more explicit. How about RuntimeError, or perhaps a subclass thereof?

Owner

bdarnell commented Nov 15, 2012

Yeah, RuntimeError makes sense. One more thing about the patch: in close(), I think it's actually the assignment to self._closing and not the call to _waker.close() that needs to be protected by _callback_lock.

Contributor

pitrou commented Nov 15, 2012

Well, the call to _waker.close() is what produces an exception when there is a another thread running add_callback() (either _waker.close() fails with "concurrent I/O operation", or add_callback() fails with EBADF), which is why you have to have some mutual exclusion there.
OTOH, merely setting an attribute is atomic in Python; you don't need a lock for that.

bdarnell closed this in b724652 Nov 17, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment