-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client fails to clean up session after a authentication failure #1033
Comments
OK, I now see what the issue is. Here is the scenario:
We can see that this logic has many undesirable effect among which the default retry window is far too long. Essentially, all requests that started with a bad certificate during login are prohibited by the user who got that error for 30 minutes., even if this was a transient error. OK, we don't know how transient it really is. While this may be OK for a regular client it wreaks havoc with a proxy server which essentially unusable for a very long time if a) a certificate is not renewed on time, or b) a transient authentication error occurs. I suppose the expedient thing here is to reduce the time window but that really is not a good solution in the long run. I suppose I'm looking for alternative suggestions here. |
Regular clients attached to a tty are given the possibility to renew the proxy. In this case that's not possible. Perhaps the classification as 'fatal error' is excessive in this case? What other options do we have? |
@abh3 : the length of time that needs to elapse after a fatal error before the client may attempt to reconnect to the server is controlled by http://xrootd.org/doc/xrdcl-docs/www/xrdcldocs.html#x1-410004.2.31 The default value of (alternatively, we can make the |
Yes, even making it authentication error aware doesn't really addess the
fundamental issue that any value above 0 allows a client to disable a
proxy server by just causing an error to happen when connecting. So, it
may be that a proxy simply needs to set it to zero.
…On Mon, 5 Aug 2019, simonmichal wrote:
@abh3 : the length of time that needs to elapse after a fatal error before the client may attempt to reconnect to the server is controlled by `XRD_STREAMERRORWINDOW` and not the `XRD_CONNECTIONWINDOW`, have a look at:
http://xrootd.org/doc/xrdcl-docs/www/xrdcldocs.html#x1-410004.2.31
http://xrootd.org/doc/xrdcl-docs/www/xrdcldocs.html#x1-550004.3.3
The default value of `StreamErrorWindow` as you correctly pointed out is 1800 seconds, you just need to tune it.
(alternatively, we can make the `StreamErrorWindow` authentication error aware)
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#1033 (comment)
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
|
It should be relatively safe to disable it, e.g. the EOS data servers run with the |
I think the solution here is to close the socket when authentication fails as you won't be able to do anything on that stream anyway. |
Hey Michal, any movement on this? |
Got a confirmation from @abh3 that is works, hence I'm closing the ticket. |
We start a proxy server that needs to GSI authenticate to the origin. We
start it with a certificate that is about to expire. We wait after the
expiration time and then do the following:
user 'u7' (as in u7@host) attempts to open a file (or do anything that
requires login/auth).
The client opens a connection and logs in. All well and good.
The client attempts to authenticate but authentication fails. The
specific message is rather opaque/ The log here contains:
[2019-07-02 16:05:59.204401 -0700][Debug ][XRootDTransport ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Logged in, session:
01000000682200000700000001000000
[2019-07-02 16:05:59.204448 -0700][Debug ][XRootDTransport ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Authentication is required:
&P=gsi,v:10400,c:ssl,ca:ba240aa8.0|f5f0dfc2.0
[2019-07-02 16:05:59.204483 -0700][Debug ][XRootDTransport ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Sending authentication data
[2019-07-02 16:05:59.206272 -0700][Debug ][XRootDTransport ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Trying to authenticate using gsi
[2019-07-02 16:05:59.238792 -0700][Debug ][XRootDTransport ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Cannot get credentials for
protocol gsi: Secgsi: ErrParseBuffer: error getting user proxies: kXGS_init
[2019-07-02 16:05:59.238898 -0700][Error ][XRootDTransport ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] No protocols left to try
[2019-07-02 16:05:59.238929 -0700][Error ][AsyncSock ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Socket error while handshaking:
[FATAL] Auth failed
[2019-07-02 16:05:59.238951 -0700][Debug ][AsyncSock ]
[u7@griddev04.slac.stanford.edu:1094 #0.0] Closing the socket
[2019-07-02 16:05:59.238980 -0700][Debug ][Poller ]
<[::ffff:134.79.120.194]:43466><--><[::ffff:134.79.129.117]:1094> Removing
socket from the poller
[2019-07-02 16:05:59.239066 -0700][Error ][PostMaster ]
[u7@griddev04.slac.stanford.edu:1094 #0] elapsed = 0, pConnectionWindow =
120 seconds.
[2019-07-02 16:05:59.239120 -0700][Error ][PostMaster ]
[u7@griddev04.slac.stanford.edu:1094 #0] Unable to recover: [FATAL] Auth
failed.
The client closes the connection and returns auth failed (this has been
verified by looking at the origin server, u7 no longer has a connection).
Since the request came through the Posix layer the auth failed is
translated to "permission denied" as that's the only errno code available.
This will explain why authentication errors may appear as permission errors.
The proxy cert is refreshed. We run the test again using user u20, as
expected, it succeeds. The client creates a socket, does the handshake,
a login, and a successful authentication and proceeds to open the file.
Now, we run the the test again but using user u7 (the one that failed to
authenticate).
The following sequence appears in the log:
190702 16:06:47 1227 yangw.19150:7@atlint01 XrootdProtocol: 0100 open rat
/tmp/README
190702 16:06:47 1227 yangw.19150:7@atlint01 ofs_open: 0-600 fn=/tmp/README
[2019-07-02 16:06:47.251377 -0700][Debug ][File ]
[0x1600@root://u7@griddev04.slac.stanford.edu:1094//tmp/README?oss.lcl=1&pss.tid=yangw.19150:7@atlint01]
Sending an open command
[2019-07-02 16:06:47.251433 -0700][Debug ][ExDbgMsg ]
[u7@griddev04.slac.stanford.edu:1094] MsgHandler created: 0x1e00 (message:
kXR_open (file: /tmp/README?oss.lcl=1&pss.tid=yangw.19150:7@atlint01, mode:
00, flags: kXR_open_read kXR_async kXR_retstat ) ).
[2019-07-02 16:06:47.251463 -0700][Error ][XRootD ]
[u7@griddev04.slac.stanford.edu:1094] Unable to send the message kXR_open
(file: /tmp/README?oss.lcl=1&pss.tid=yangw.19150:7@atlint01, mode: 00,
flags: kXR_open_read kXR_async kXR_retstat ): [FATAL] Auth failed
[2019-07-02 16:06:47.251478 -0700][Debug ][ExDbgMsg ]
[u7@griddev04.slac.stanford.edu:1094] Destroying MsgHandler: 0x1e00.
190702 16:06:47 1227 Posix_Open: [FATAL] Auth failed open
root://u7@griddev04.slac.stanford.edu:1094//tmp/README?pss.tid=yangw.19150:7@atlint01&oss.lcl=1
190702 16:06:47 1227 ofs_open: yangw.19150:7@atlint01 Unable to open
/tmp/README; permission denied
190702 16:06:47 1227 yangw.19150:7@atlint01 XrootdResponse: 0100 sending err
3010: Unable to open /tmp/README; permission denied
Note that the client never tried to open a socket and do the login auth
sequence. I should have since there is no socket for u7. However, it
remembers that the auth failed and refuses to do anything else but return
an authentication error. In essence, user u7 is permanently dead.
The 'u7' situation might be reasonable if an actual physical socket
existed to the server. In that case, one can't force re-authentication
and the only choice is to simply fail or force the user to do a forced
disconnect. However, in this case, the client closed the socket and removed
it from the poller yet it thinks it still has a session. There was no
attempt to connect and login. It immediately fails (note the origin server
never received a connection).
Oddly, this was partly corrected by patch
70569c3#diff-860d6aebc8e74302575c6f5ffa954c68
but that only helps to reload the client creds on a new connection. Prior to
that it would use whatever it did on the previous connection to the server
(which would likely fail). This problem is similar in that the
authentication failed but the client remembers that it did even though it
still needs to subsequently connect and login.
For completeness, here is a successful login/auth after the failed one.
[2019-07-02 16:06:31.320314 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Sending out kXR_login request,
username: u20, cgi:
?xrd.cc=us&xrd.tz=-8&xrd.appname=xrootd&xrd.info=&xrd.hostname=rhel6-64n.slac.stanford.edu&xrd.rn=v20190702-5cdfe18,
dual-stack: true, private IPv4: false, private IPv6: false
[2019-07-02 16:06:31.320970 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Logged in, session:
02000000682200001800000002000000
[2019-07-02 16:06:31.321024 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Authentication is required:
&P=gsi,v:10400,c:ssl,ca:ba240aa8.0|f5f0dfc2.0
[2019-07-02 16:06:31.321061 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Sending authentication data
[2019-07-02 16:06:31.321134 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Trying to authenticate using gsi
[2019-07-02 16:06:31.407977 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Sending more authentication data
for gsi
[2019-07-02 16:06:31.453803 -0700][Debug ][XRootDTransport ]
[u20@griddev04.slac.stanford.edu:1094 #0.0] Authenticated with gsi.
[2019-07-02 16:06:31.453918 -0700][Debug ][PostMaster ]
[u20@griddev04.slac.stanford.edu:1094 #0] Stream 0 connected.
[2019-07-02 16:06:31.454005 -0700][Debug ][ExDbgMsg ]
[u20@griddev04.slac.stanford.edu:1094] Moving MsgHandler: 0x4c004180
(message: kXR_open (file:
/tmp/README?oss.lcl=1&pss.tid=yangw.19139:20@atlint01, mode: 00, flags:
kXR_open_read kXR_async kXR_retstat ) ) from out-queu to in-queue.
[2019-07-02 16:06:31.455264 -0700][Debug ][ExDbgMsg ]
[u20@griddev04.slac.stanford.edu:1094] Calling MsgHandler: 0x4c004180
(message: kXR_open (file:
/tmp/README?oss.lcl=1&pss.tid=yangw.19139:20@atlint01, mode: 00, flags:
kXR_open_read kXR_async kXR_retstat ) ) with status: [SUCCESS]
The text was updated successfully, but these errors were encountered: