XrdHttp loses requests under modest concurrency #810
I should mention the workaround to get
Not a complete fix to #809, but good enough to enable testing.
The ofs_Stall is not coming from the OSS but from here:
https://github.com/xrootd/xrootd/blob/master/src/XrdOfs/XrdOfsHandle.cc#L203
It appears there's a modest contention on the file descriptor table -- one that does not particularly play well with what appears to be an ad-hoc implementation of a timed lock:
https://github.com/xrootd/xrootd/blob/master/src/XrdOfs/XrdOfsHandle.cc#L504
It's not obvious why one would utilize that instead of a wrapper around pthread_mutex_timedlock; it appears the hand-rolled version has similar guarantees as the standard function but worse performance.
Well, because at the time the code was written (oh, about 18 years ago), pthread_mutex_timedlock didn't exist in Solaris.
From: Brian Bockelman
Sent: Tuesday, August 28, 2018 6:43 PM
To: xrootd/xrootd
Cc: Subscribed
Subject: Re: [xrootd/xrootd] XrdHttp loses requests under modest concurrency (#810)
The ofs_Stall is not coming from the OSS but from here:
https://github.com/xrootd/xrootd/blob/master/src/XrdOfs/XrdOfsHandle.cc#L203
It appears there's a modest contention on the file descriptor table -- one that does not particularly play well with what appears to be an ad-hoc implementation of a timed lock:
https://github.com/xrootd/xrootd/blob/master/src/XrdOfs/XrdOfsHandle.cc#L504
It's not obvious why one would utilize that instead of a wrapper around pthread_mutex_timedlock; it appears the hand-rolled version has similar guarantees as the standard function but worse performance.
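For illustration, a wrapper of the kind suggested above might look roughly like the following. This is only a sketch: the helper name timedLock and its millisecond parameter are hypothetical and are not taken from the XRootD sources, and it assumes a platform that provides pthread_mutex_timedlock.

```cpp
#include <pthread.h>
#include <time.h>

// Hypothetical helper (not part of XRootD): try to acquire `mtx`, giving up
// after `timeout_ms` milliseconds. Returns true if the lock was acquired.
bool timedLock(pthread_mutex_t &mtx, long timeout_ms)
{
    // pthread_mutex_timedlock expects an absolute CLOCK_REALTIME deadline,
    // so start from the current time and add the relative timeout.
    struct timespec deadline;
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) return false;

    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        // Normalize: tv_nsec must stay below one second.
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    // 0 means the lock is now held; ETIMEDOUT means the deadline passed first.
    return pthread_mutex_timedlock(&mtx, &deadline) == 0;
}
```

A caller that fails to get the lock within the timeout could then report a stall to the client, instead of polling the mutex in a loop.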
Ok, found the culprit. It's here: https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdTransit.cc#L418 If a stall occurs (you can tweak the However, it doesn't appear to invoke the appropriate callback for
After applying a workaround for issue #809, I notice that ab will quite consistently fail with only a small number of repeated requests.
This is a fairly vanilla setup; it should be reproducible with the HTTP module on top of a POSIX filesystem.
I haven't been able to diagnose it precisely, but it seems to only occur when this line pops into the log:
I admit, I don't understand why the file would ever be considered as being staged with the default OFS plugin. However, the behavior is very much as if there's a callback not occurring.
NOTE: this issue is based on an investigation into user complaints about the service. It may be a synthetic benchmark, but it reflects behavior actually observed in the wild.