libzmq assertion at exit under Python 3 #173
Comments
Interesting. I can confirm the issue with Python 3.2.2 on OSX 10.7.3, and pyzmq master. Asserts in libzmq generally mean bugs in libzmq itself, but it is certainly strange that pyzmq could hit it differently in Python 2 and 3. Adding an explicit
I tried to dig into this a little this evening without a lot of success. The assertion is being thrown because the poller still has one file descriptor that wasn't removed (i.e., Should I post this as an issue over on libzmq instead?
I think I tracked the problem down to this change in Python 3.2: http://hg.python.org/cpython/rev/c892b0321d23 I believe this is the approximate sequence of events:
I have no idea how to get started fixing this, but it definitely appears to be a Python/pyzmq issue and not a libzmq one. I suspect this "PyThread_exit_thread() if trying to acquire the GIL during shutdown" could cause all kinds of unclean shutdown issues with libzmq threads running.
which used copy=False by default. It doesn't make sense to use different defaults than Socket.send, and non-copying sends can cause problems on shutdown without manually terminating the context in Python 3. ref: zeromq#173
Thanks for tracking that down! Your analysis is correct: the free-function is called from the IO thread, and that's causing the problem. This is unavoidable[1] for non-copying sends, but fortunately it only affects non-copying sends, and for that matter only those that will be discarded at termination due to LINGER and a failure to clean up the context in your own code. Understanding that, here's a simpler repro script:

    import zmq
    ctx = zmq.Context.instance()
    socket = ctx.socket(zmq.REQ)
    socket.linger = 0
    socket.connect('tcp://127.0.0.1:12345') # nothing listening
    socket.send(b'abc', copy=False)
    # explicit close&term prior to Python exit eliminates the issue
    # ctx.destroy()
    print("begin cleanup")
    # if destroy was not called, this will crash in Python3

ZMQStream does non-copying sends by default for some reason, which is a bad choice, so I am going to revert that in PR #172. For now, the answer is going to be: you must use explicit cleanup if you are doing non-copying sends and might discard them at shutdown due to LINGER. In your sample script, that amounts to a simple

I'm not sure this is a huge deal, because it only affects already exiting processes, but there have been other places where grabbing the GIL from the io_thread has caused weirdness.

[1] The only solution I have in my head is for the free_fn to actually be called from yet another thread that pyzmq creates. The io_thread's callback would somehow GIL-lessly put the necessary info in a queue and notify (or spawn) that thread, which grabs the GIL and deletes the object. That should avoid this problem, because it doesn't really matter that the free_fn is aborted at shutdown; it only matters that the io_thread is killed, which will not happen if it never grabs the GIL. But that means I would have to look up how to spawn and communicate with a C-thread in a platform-independent way, which isn't going to happen soon.
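For readers trying to picture the design in footnote [1], here is a rough pure-Python model of it. This is only a sketch under assumptions: `DeferredReleaser` and its methods are hypothetical names, not pyzmq API, and a real version would need the io_thread side written in C so that it never touches the GIL. The point it illustrates is that the callback only enqueues a handle, while a separate thread owns the actual release.

```python
import queue
import threading


class DeferredReleaser:
    """Hypothetical model of a dedicated 'free' thread (not part of pyzmq)."""

    _STOP = object()  # sentinel that tells the releaser thread to exit

    def __init__(self):
        self._refs = {}                 # handle -> buffer kept alive for a send
        self._lock = threading.Lock()
        self._next_handle = 0
        self._pending = queue.Queue()   # handles whose buffers can be dropped
        self.released = []              # handles released so far, for inspection
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def hold(self, buf):
        """Keep buf alive and return an integer handle for it."""
        with self._lock:
            handle = self._next_handle
            self._next_handle += 1
            self._refs[handle] = buf
        return handle

    def notify_done(self, handle):
        """Stand-in for the io_thread callback: enqueue only, never free here."""
        self._pending.put(handle)

    def _run(self):
        # Only this thread drops references to the held buffers.
        while True:
            handle = self._pending.get()
            if handle is self._STOP:
                break
            with self._lock:
                self._refs.pop(handle, None)
            self.released.append(handle)

    def stop(self):
        """Ask the releaser thread to exit after draining earlier handles."""
        self._pending.put(self._STOP)
        self._thread.join()


releaser = DeferredReleaser()
handle = releaser.hold(bytearray(b"abc"))

# A separate thread plays the role of the io_thread finishing with the buffer.
io_thread = threading.Thread(target=releaser.notify_done, args=(handle,))
io_thread.start()
io_thread.join()

releaser.stop()
print(releaser.released)
```

In this model, nothing bad happens if the releaser thread is cut short at interpreter exit, because the thread standing in for the io_thread never does the release itself.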
Thanks for the explanation. I'm surprised that the stream.close() in the original example doesn't take care of the problem. If stream.close() executes, why is there still a message hanging around to be cleaned up at ctx.term()?
That's because close is an asynchronous event. It starts the LINGER countdown, but does not ensure that the messages are cleaned up, because that happens in the io_threads. The message will be discarded very soon after the socket is closed, but not before
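To make the close/term distinction concrete, here is a minimal sketch of explicit cleanup around a non-copying send. It assumes pyzmq is installed and that nothing is listening on the endpoint, as in the repro script above; the helper name `send_then_shut_down` is made up for illustration.

```python
import zmq


def send_then_shut_down(endpoint, payload):
    """Hypothetical helper: non-copying send followed by explicit cleanup."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.linger = 0                      # discard unsent messages once closed
    try:
        sock.connect(endpoint)           # assumed: nothing is listening here
        sock.send(payload, copy=False)   # buffer may be released from an io_thread
    finally:
        sock.close()                     # asynchronous: only starts the teardown
        ctx.term()                       # blocks until the context has shut down
    return ctx.closed


done = send_then_shut_down('tcp://127.0.0.1:12345', b'abc')
print(done)
```

Calling `ctx.term()` while the interpreter is still fully alive is what gives the io_threads a chance to discard the message before Python starts shutting down.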
closed by #408
The following code works fine (i.e., exits cleanly) under Python 2, but under Python 3.2 with pyzmq-2.1.11 prints:
It's possible I'm doing something wrong that just happens to work under Python 2, but I thought this might be a bug in pyzmq. Could someone with a little more experience take a look?