DEALER-REP hangs forever #29
Comments
I've noticed that if I add
Then I get the next message from the socket after the script gets locked. Or my theory may be completely wrong :)
Use DEALER-DEALER. REQ/REP sockets are basic types that are generally discouraged for real applications. My guess is that the REP socket is getting out of sync, and since it HAS to follow the recv, send, recv, send order, it gets stuck at that point. Using DEALER-DEALER worked without issue.
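To make the "recv, send, recv, send" constraint concrete, here is a minimal sketch in plain Python. `ToyRep` is a hypothetical stand-in, not a real socket and not part of ZMQ::FFI; it only models the strict alternation that a REP socket enforces (libzmq reports a state error, EFSM, when the order is violated).

```python
class ToyRep:
    """Hypothetical model of REP's strict recv/send alternation (not a real socket)."""

    def __init__(self, requests):
        self.requests = list(requests)  # requests waiting to be received
        self.awaiting_reply = False     # True between recv() and send()
        self.replies = []               # replies "sent" so far

    def recv(self):
        if self.awaiting_reply:
            raise RuntimeError("REP must send a reply before the next recv")
        self.awaiting_reply = True
        return self.requests.pop(0)

    def send(self, reply):
        if not self.awaiting_reply:
            raise RuntimeError("REP must recv a request before send")
        self.awaiting_reply = False
        self.replies.append(reply)


# Correct lockstep: every recv is followed by exactly one send.
rep = ToyRep(["a", "b"])
rep.send(rep.recv().upper())
rep.send(rep.recv().upper())

# Out-of-order use: a second recv without a send in between is rejected.
broken = ToyRep(["a", "b"])
broken.recv()
try:
    broken.recv()
    out_of_order = False
except RuntimeError:
    out_of_order = True
```

A synchronous server loop that always replies before reading again never leaves this lockstep, so a server written that way should not be the side that desynchronizes.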
The REP socket on the server is following the recv, send, recv states correctly. When the lock happens I can see that the server produced 4 more messages than were received by the client. It's the client that never gets the IO callback, despite those messages being available on the socket later. Besides, DEALER-REP is a valid pattern and should never get "out of sync" on the REP side (and in my case should not exceed the HWM either, because of throttling). I cannot imagine how such a desync could happen, because REP has its own buffer. I'll try DEALER-DEALER; however, I do not want an event loop in the server. That would mean receiving the task in an AE callback, which exponentially complicates the task execution flow because I cannot use
Ah, also in DEALER-DEALER there can be only one peer; that's another reason why I chose DEALER-REP.
OK, here is where stuff gets interesting... I disabled throttling. So now I'm really confused. Why does the code work when there is one message or a bunch of messages published on the socket at the same time, but lock when there are only a few? I've tested it on 1_000_000 messages.
@bbkr a bit slammed with work at the moment, but I'll take a deeper look at this just as soon as I have a chance. I've certainly run into weird behavior using event loops + zeromq's virtual fd in the past. Usually this is down to not handling zeromq's edge-triggered semantics in exactly the right way. Is it possible this is the issue? If you aren't familiar with edge-triggered vs. level-triggered behavior, this article seems like a nice overview of the issues:
So basically, in the edge-triggered model I must consume all messages to get the next "IO is readable" callback. So the scenario that leads to the lock:
(it may also happen after consuming a few messages while in the callback) That means many examples linked in the ZMQ guide, and even this library's PUSH/PULL synopsis, are prone to this error. I have no idea how to fix it in code that needs an AnyEvent loop. The obvious hack is to give up on IO monitors and use timers, but that is very inefficient.
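The edge-triggered behaviour described above can be illustrated with a small simulation. This is a sketch in plain Python under stated assumptions, not ZMQ::FFI or AnyEvent code: `EdgeTriggeredQueue`, `run`, `read_one` and `drain` are hypothetical names, and the model assumes a readiness event fires only when the queue goes from empty to non-empty.

```python
from collections import deque


class EdgeTriggeredQueue:
    """Toy model: a readiness event fires only on the empty -> non-empty edge."""

    def __init__(self):
        self.messages = deque()
        self.pending_event = False

    def push(self, msg):
        was_empty = not self.messages
        self.messages.append(msg)
        if was_empty:
            self.pending_event = True  # no new event while messages remain queued

    def take_event(self):
        fired, self.pending_event = self.pending_event, False
        return fired

    def has_pollin(self):
        return bool(self.messages)

    def recv(self):
        return self.messages.popleft()


def run(callback, batches):
    """Deliver messages batch by batch; call the callback once per readiness event."""
    queue = EdgeTriggeredQueue()
    received = []
    for batch in batches:
        for msg in batch:
            queue.push(msg)
        if queue.take_event():
            callback(queue, received)
    return received


def read_one(queue, received):
    # Prone to hanging: reads a single message per event.
    received.append(queue.recv())


def drain(queue, received):
    # Consumes everything that is pending before returning to the event loop.
    while queue.has_pollin():
        received.append(queue.recv())


batches = [[1], [2, 3], [4], [5]]
stuck = run(read_one, batches)  # message 3 is left queued, so no further events fire
ok = run(drain, batches)        # every message is delivered
```

In this model the single-read callback stalls as soon as two messages arrive between callbacks, which would also explain why one message at a time works. With a real ZeroMQ socket, the analogue of `drain` would be to keep receiving inside the IO callback until the socket's `ZMQ_EVENTS` state no longer reports pending input.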
@bbkr I haven't forgotten about this, just been extremely busy... I should be able to look at this in the next few weeks though.
I was unable to reproduce the hang using the example client/server in your initial comment. Either this was an issue with older versions or it is specific to your local system. I let it run for several minutes, sending/receiving tens of thousands of messages without issue.
Hi
I have a DEALER-to-REP flow. The client has throttling (no more than 4 async requests at the same time) and monitors the socket for asynchronous replies. The server is synchronous.
The client script gets stuck at a random point. For example, the client produced 27 requests and the server processed 27 requests, but only 23 replies were received by the client, so throttling kicks in and locks the script forever. I've debugged this, and the IO callback on the socket is no longer called after the 23rd message is received.
Environment:
Server:
Client:
I'm new to 0MQ, so please forgive me if this is not a bug in ZMQ::FFI but bad logic in my code.