XrdTpc curl nss/sssd strange interaction part 1 #1428
Comments
Talking with @esindril: the current working theory is that the remote server hangs up the TCP connection because the client is unresponsive and
I think anything above 10 sec triggers this. For example:
I think this has been solved, hasn't it? Can we close the issue?
@simonmichal are you sure? How? From which version?
The current 5.2 version converts directory-based certs into a single file-based cert, which plays much nicer with curl. So I would suggest going to 5.2 to avoid all the problems in this ticket. Mind you, there is still a problem depending on which filesystem you choose to hold the certs and, according to some people, the CentOS release being used. That problem has been corrected in an upcoming patch or feature release. Regardless, I suspect going to 5.2 will solve the problem.
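For readers unfamiliar with the idea, here is a rough sketch of what turning a certificate directory into a single bundle file can look like. This is not XRootD's actual implementation; the function name `bundle_ca_directory` and the paths are hypothetical, and the sketch assumes the directory holds PEM-encoded certificates and that the bundle is written outside that directory.

```python
import os
import tempfile
from pathlib import Path

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def bundle_ca_directory(ca_dir, bundle_path):
    """Concatenate all PEM certificates found in ca_dir into one file.

    Hypothetical helper for illustration only. Returns the number of
    source files that were copied into the bundle.
    """
    ca_dir = Path(ca_dir)
    bundle_path = Path(bundle_path)

    # List the sources first, so the temporary file created below is
    # never picked up as an input.
    sources = sorted(p for p in ca_dir.iterdir() if p.is_file())

    # Write to a temporary file next to the target and rename it at the
    # end, so a reader never sees a half-written bundle.
    fd, tmp_name = tempfile.mkstemp(dir=str(bundle_path.parent),
                                    prefix=bundle_path.name + ".")
    copied = 0
    try:
        with os.fdopen(fd, "wb") as out:
            for src in sources:
                data = src.read_bytes()
                if PEM_MARKER not in data:
                    continue  # skip CRLs, signing policies, etc.
                out.write(data if data.endswith(b"\n") else data + b"\n")
                copied += 1
        os.replace(tmp_name, bundle_path)  # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return copied
```

A client would then be pointed at the resulting file (for curl, the `--cacert` option) instead of at the directory (`--capath`). The paths to use depend entirely on the local setup.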
These issues have been resolved.
While running XrdTpc in production in EOSLHCB we've noticed some strange behavior during HTTP TPC transfers. We are running the following configuration, as an example:
So this is the latest CentOS7 with all the packages up to date.
We see that curl sometimes fails to connect, throwing the following error:
This is not 100% reproducible: sometimes it works, sometimes it doesn't. I suspect it is related to the load on the machine and the pressure on the NSS subsystem.
The exact same command run a few seconds later works as expected:
This shows up in the FTS logs as the following error:
and in the XRootD logs as the following:
I read a bit about this type of error coming from NSS, but there seems to be no good solution for it. Most of the time an upgrade of both curl and sssd is suggested, but apparently this is not a bulletproof solution either. I tried various things like clearing the sssd cache and restarting the daemon, but nothing seems to help.
Has anyone been confronted with this? This is starting to affect 5-10% of the transfers on busy disk nodes in our instances.
Thanks,
Elvin