
try piggy-backing on tornado for proactor loop support #1524

Merged
merged 4 commits into zeromq:main from tornado-asyncio on May 13, 2021

Conversation

@minrk (Member) commented May 9, 2021

Tornado 6.1 added support for the proactor event loop by running a separate selector loop in a thread.

This PR tries piggy-backing on that functionality by using tornado's AddThreadSelectorEventLoop when someone attempts to use zmq.asyncio with the proactor loop.

I went with vendoring SelectorThread from tornadoweb/tornado#3029, so no new dependency is added.

closes #1521
closes #1423
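
A rough usage sketch (not part of the changeset itself) of what this enables: zmq.asyncio on the default Windows proactor loop, with the selector thread registered automatically behind the scenes.

    import asyncio

    import zmq
    import zmq.asyncio

    async def main():
        ctx = zmq.asyncio.Context()
        pull = ctx.socket(zmq.PULL)
        push = ctx.socket(zmq.PUSH)
        pull.bind("tcp://127.0.0.1:5555")
        push.connect("tcp://127.0.0.1:5555")
        await push.send(b"hello")
        # recv works even though the proactor loop lacks add_reader,
        # because a selector thread handles the readiness callbacks
        print(await pull.recv())
        push.close()
        pull.close()
        ctx.term()

    # On Windows, Python 3.8+ defaults to the proactor event loop
    asyncio.run(main())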

minrk added 2 commits May 10, 2021 13:02
use vendored copy of tornado's AddThread as a separate SelectorThread object
try to avoid leaking loop closers
minrk merged commit 3faf9e4 into zeromq:main on May 13, 2021
minrk deleted the tornado-asyncio branch on May 13, 2021 at 12:45
@Jeducious

@minrk Quick question: when using asyncio on Windows with the proactor event loop, I get a warning even though I have tornado 6.1 installed. Is this expected?

RuntimeWarning: Proactor event loop does not implement add_reader family of methods required for zmq. Registering an additional selector thread for add_reader support via tornado. Use 'asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())' to avoid this warning

I am having a persistent issue on windows where some tasks seem to suddenly stop receiving messages. I am implementing the MDP protocol in python, and I have automated tests that create multiple workers as asyncio tasks to simulate a busy server.

As the first few workers complete, the rest of the workers suddenly stop reporting heartbeats and the test hangs forever.

I imagine this is something I have done, painting myself into a corner somehow. But it would be great to know if any of this sounds suspect ;)

@minrk (Member, Author) commented Aug 29, 2021

You can try calling asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy()) before invoking any asyncio methods to see if that helps. If it does, that would indicate this change is relevant.
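
For example (a minimal sketch, assuming Windows and Python 3.8+), run this before any event loop is created:

    import asyncio
    import sys

    # Must run before the first event loop is created (e.g. before asyncio.run()
    # or creating a zmq.asyncio.Context), or the proactor loop may already exist.
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())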

@Jeducious

Thanks! I did try that; the warning goes away, at least. I'm still having issues where the test progresses for a while but eventually hangs. I can't say for sure that it's due to this, though, since I'm also having problems on Linux, so for the moment I can't really prove that the hang is due to the event loop policy.

If that changes I'll report back.

@minrk (Member, Author) commented Aug 30, 2021

If it still hangs after changing the policy, then I think it's probably not that but something else, possibly related to edge-triggering issues. These things can be hard to track down!

@Jeducious

Indeed! I am digging, but the problem is difficult to reproduce reliably. There are other things in here besides pyzmq, for example the Python logging module. I am currently removing all logging to check that it is not a factor.

So I'm proceeding to eliminate things by removing them where I can. Will let you know if anything points back at ZMQ.

@Jeducious

@minrk

OK, I have a question: I am seeing an error on Linux now which suggests I am exhausting the file descriptor quota. I had a look at the offending process, and it is indeed accumulating fds, but I wondered if you could tell me whether this looks like something the asyncio pyzmq sockets might use. The majority of the fds in use are of type eventfd.

The man page on eventfd is here.

It basically says these are used as an event wait/notify mechanism by user-space applications, so I am guessing this is either:

  1. Asyncio tasks doing this
  2. Pyzmq sockets... maybe?
  3. Something else entirely that I am missing (a catch-all, had to throw that in to cover my ignorance).

It seems like they are not being released, but I can't confirm. The process hung, so they might have been released if it had closed cleanly :)

python3 33202 ubuntu   60u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   61u  a_inode               0,14        0  10299 [eventpoll]
python3 33202 ubuntu   62u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   63u  a_inode               0,14        0  10299 [eventpoll]
python3 33202 ubuntu   64u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   65u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   66u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   67u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   68u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   69u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   70u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   71u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   72u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   73u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   74u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   75u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   76u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   77u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   78u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   79u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   80u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   81u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   82u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   83u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   84u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   85u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   86u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   87u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   88u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   89u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   90u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   91u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   92u  a_inode               0,14        0  10299 [eventfd]
python3 33202 ubuntu   93u  a_inode               0,14        0  10299 [eventfd]
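
For reference, a rough sketch (just an illustration, not something from my test code) of counting eventfd descriptors from inside the process on Linux, to watch whether they keep accumulating:

    import os

    def count_eventfds() -> int:
        """Count open eventfd descriptors for the current process (Linux only)."""
        count = 0
        for fd in os.listdir("/proc/self/fd"):
            try:
                # eventfd descriptors resolve to "anon_inode:[eventfd]"
                target = os.readlink(f"/proc/self/fd/{fd}")
            except OSError:
                continue  # fd closed between listdir and readlink
            if "eventfd" in target:
                count += 1
        return count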

@minrk (Member, Author) commented Aug 31, 2021

Certainly possible, but I can't be sure. I don't know exactly what operations create these.

You might check asyncio.all_tasks() to see all the asyncio tasks you have running.

It's conceivable you have launched some task/future and lost track of it without awaiting or cancelling it. This could be due to your code, or even a pyzmq bug.
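
For example, a quick sketch (assuming Python 3.8+) that dumps whatever the running loop still knows about, from inside a coroutine:

    import asyncio

    async def dump_tasks():
        # Anything unexpected here may be a task that was launched
        # and never awaited or cancelled.
        for task in asyncio.all_tasks():
            state = "done" if task.done() else "pending"
            print(task.get_name(), state, task.get_coro())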

@Jeducious

OK, I think I am getting close (though honestly, concurrent programming can certainly prove me wrong, it seems).

I have several workers, each of which runs as an asyncio task. Each worker has a zmq.DEALER socket, plus I create a monitor socket for each dealer using get_monitor_socket.

During shutdown I call cancel on each worker task; this triggers a shutdown handler which calls the disable_monitor() method on the dealer socket. This is where the loop hangs.

It seems a little bit random: sometimes a few workers are all able to shut down cleanly, but then one hangs the loop on the call to disable_monitor.

I get the feeling that I may have abused disable_monitor, or sockets, or both here.

Is there a right way to clean up a socket and its monitor socket? I am willing to bet that when multiple sockets with monitors attached are involved, I am probably not doing it right.

@Jeducious

Jeducious commented Aug 31, 2021

OK, so, no need to wait: I decided to simply comment out the line that called disable_monitor and "give it a ripper of a go", so to speak.

Now the loop no longer hangs; in fact, the entire test suite seems to be passing consistently. So it seems calling disable_monitor was the wrong thing to do? I just don't know why.

Should I:

  1. Just leave disable_monitor commented out and live on in blissful ignorance now that it apparently works?
  2. Call close on the monitor socket instead of disable?
  3. Something else?

@minrk (Member, Author) commented Aug 31, 2021

If disable_monitor causes a hang, this suggests to me that there is a LINGER or ordering issue - that perhaps there are some messages not yet consumed by the monitor socket receiver, and the sender is blocking waiting for messages to be delivered.

That's a bit of a guess, though.

From this discussion, you need to call disable before close on the monitor socket (disable closes the socket that bound, which is handled internally by libzmq, while you need to manage closing the socket that connects to listen for monitor messages).

@Jeducious

Thanks :)

I eventually got the test suite to pass on macOS and Windows using the following:

        self.zmq_socket.disable_monitor()   # stop monitoring; closes the internal socket that libzmq bound
        self.mon_sock.close(linger=0)       # close the listener socket returned by get_monitor_socket
        self.zmq_socket.close(linger=0)     # finally close the DEALER socket itself

This matches the discussion you just referred to, which I noticed I have actually been part of. Seems not paying attention to it came back to bite me!

Confirming this now works fine on Windows, macOS, and Linux.

I still have a runaway condition with too many open fds on Linux, but that's a story for another day, I think.
