Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zmq_send does not timeout when ZAP authentication fails. #882

Closed
hashstat opened this issue Feb 11, 2014 · 13 comments
Closed

zmq_send does not timeout when ZAP authentication fails. #882

hashstat opened this issue Feb 11, 2014 · 13 comments

Comments

@hashstat
Copy link
Contributor

When a server performing ZAP authentication rejects a client with a 400 (or other non-200) error, the client, sending data with zmq_send (and presumably all send functions), reattempts the connection and authentication indefinitely, unless ZMQ_RECONNECT_IVL is set to -1. There is currently no way to limit the number of authentication failures nor is there a way for the client to tell that authentication failed (because zmq_send succeeds and performs the authentication in the background).

It seems that there are three stages to the connection: socket connection, ZMQ handshake, and ZAP handshake. A failure in any three of these should probably be considered a connection failure. Does it make sense to treat them differently? If ZMQ_IMMEDIATE is set, it seems that zmq_send (and others) should block until all three steps are performed successfully. After looking at the code, it is not immediately apparent to me how to implement any of this. Any thoughts, suggestions, etc.?

@drewcrawford
Copy link
Contributor

For the record, I spent three (3!) days tracking down a bug that was obscured by the behavior here--e.g., nothing user-visible happens when authentication fails (keeps retrying).

This can also occur when the client has the wrong key for the server. In that case the behavior is especially bad since no ZAP authentication ever occurs, the client is never rejected, and it is not at all obvious why the connection is not succeeding. You pretty much have to resort to Wireshark to figure out why it is failing in that case.

In my opinion, the right thing to do is to report an error at zmq_connect. I realize that zmq_connect is not in the habit of reporting TCP errors to the application (for example, it seems to be by-design behavior that servers can be brought up after clients), however, an authentication error is likely a permanent failure, and it is not going to resolve itself without deliberate human intervention. And the human will not know to intervene if no error is reported anywhere.

However, the behavior I am advocating is contrary to the current tests and so I suspect may be controversial. I am probably biased by the time I spent investigating this, but in my view the benefits far outweigh the change in behavior.

@hintjens
Copy link
Member

On Sun, Feb 16, 2014 at 1:19 PM, Drew Crawford notifications@github.com wrote:

For the record, I spent three (3!) days tracking down a bug that was obscured by the behavior here--e.g., nothing user-visible happens when authentication fails (keeps retrying).

Sorry about that, it's really not good. Let's make sure it doesn't hit
more people.

This can also occur when the client has the wrong key for the server.

Indeed. What about, in the short term, explicit stderr output in
libzmq when such things happen on CURVE authentication? It's not a
nice long term solution but will help as we work through the different
cases.

In my opinion, the right thing to do is to report an error at zmq_connect.

Right. At least in the connector, when it gets cut off prematurely.

I'll make a patch for some error reporting at both sides that we can
hide away later as needed.

Really sorry for the lost time... :-/

-Pieter

@hintjens
Copy link
Member

I've added temporary tracing to the server side, which should be
enough. We'll fix the reconnect behavior itself separately.

#898

On Sun, Feb 16, 2014 at 5:58 PM, Pieter Hintjens ph@imatix.com wrote:

On Sun, Feb 16, 2014 at 1:19 PM, Drew Crawford notifications@github.com wrote:

For the record, I spent three (3!) days tracking down a bug that was obscured by the behavior here--e.g., nothing user-visible happens when authentication fails (keeps retrying).

Sorry about that, it's really not good. Let's make sure it doesn't hit
more people.

This can also occur when the client has the wrong key for the server.

Indeed. What about, in the short term, explicit stderr output in
libzmq when such things happen on CURVE authentication? It's not a
nice long term solution but will help as we work through the different
cases.

In my opinion, the right thing to do is to report an error at zmq_connect.

Right. At least in the connector, when it gets cut off prematurely.

I'll make a patch for some error reporting at both sides that we can
hide away later as needed.

Really sorry for the lost time... :-/

-Pieter

@drewcrawford
Copy link
Contributor

Im not attacking you personally Peter! No need to apologize.. :-). You can't really complain about the quality of free software. All I meant to say was "this is a major usability problem".

Logging something cf. your patch is a big step in the right direction, some indication is infinitely better than no indication. So this patch would have taken me from three days to about 1.

However I can make a wish list of things that would make this better:

  • adding output on the client side. As I understand it, your current patch only logs on the server-side. (I may be misreading it). In my situation, the client-side was a more obvious place to look.
  • long-term there should be some way to catch and handle this case. If you think about the use case, an authentication failure is evidence that the user has missed type their password or similar (picked the wrong keyfile... picked the wrong server identity from a list...) The right thing for application to do here is to have the user retype the password or otherwise reconfigure the authentication options.

@hintjens
Copy link
Member

Well, I did design this, so the fault is mine. Let's see how we can make
this a totally painless thing. Next step is to log in the client, to cover
at least the main errors like "you set a bad server key".

I'm not sure where in the client to log, which is why I didn't do that. In
fact we have to implement an error response to do this properly. For that I
need Martin Hurton's help.

I agree about returning the error up to the calling application.

@Asmod4n
Copy link
Contributor

Asmod4n commented Mar 10, 2014

Hitting the same problem, the server never stops asking the authentication handler when the handler answers with a non 200 reply, can't release my software that way :(

This happens with all authentication methods.

https://github.com/Asmod4n/celluloid-zmq-zap/blob/master/examples/curve_redis.rb

comment out line 14 and it will ask the redis server endlessly

@Asmod4n
Copy link
Contributor

Asmod4n commented Mar 10, 2014

@hintjens yeah, this needs a new error code, something like "i don't like you, don't come back again" so the requesting socket doesn't attempt to connect again.

@hintjens
Copy link
Member

I'm closing this issue to make a more explicit one.

@davidblewett
Copy link

@hintjens what ticket were you referring to? #883? If so, that ticket doesn't appear to reference some kind of feedback to the client that an auth error occurred.

Currently, you can set ZMQ_RECONNECT_IVL to force the socket to not try to reconnect on authentication failure, but the client code is never made aware of any potential issues.

@hintjens
Copy link
Member

This issue was fixed in ZeroMQ 4.1.0

@johnnykv
Copy link

johnnykv commented Jun 27, 2016

@hintjens How was this fixed in 4.1.0? Was it only the reconnect strategy that was fixed, or is there also implemented visibility (logging) when auth fails (bad key, invalid key, etc)

@hintjens
Copy link
Member

From memory, there is still debugging output whenever there's an
exceptional condition during authentication (at the libzmq side). We also
catch and deal with the timeout in libzmq, as far as I remember.
On 27 Jun 2016 10:19 pm, "Johnny Vestergaard" notifications@github.com
wrote:

@hintjens https://github.com/hintjens How was this fixed in 4.1.0? Was it
only the reconnect strategy what was fixed, or is there also implemented
visibility (logging) when auth fails (bad key, invalid key, etc)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#882 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AASzCA_SSy_gYgeRceONPCZ7QMWszndCks5qQDA9gaJpZM4BhI1G
.

@sigiesec
Copy link
Member

sigiesec commented Aug 23, 2017

@johnnykv Monitoring events for encryption and authentication errors were recently added by #2645 and #2665 and subsequent PRs.

Regarding the reconnect behavior: the client is sent an ERROR for ZAP 400 errors. The CURVEZMQ spec demands that the client does not reconnect in case it received an ERROR. However, the current implementation does not warrant this, to my knowledge. This is requested by #990

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants