-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release Stream lock before invoking callbacks. #216 #247
Conversation
OnReadTimeout will call OnError with the Stream's lock held. This causes OnError to invoke the user's callback with the Stream lock held, possibly causing a lock ordering issue. This fixes an observed deadlock within CMSSW over the following objects: Thread 1 FileStateHandler::VectorRead takes FileStateHandler lock Stream::Send tries to take Stream lock Thread 5 FileTimer::Run takes FileTimer lock FileStateHandler::Tick tries to take FileStateHandler Thread 6 Stream::OnReadTimeout takes Stream lock DelayedClose::HandleResponseWithHosts (this is callback code in CMSSW); this deletes a File object, which eventually calls FileTimer::UnRegisterFileObject tries to take FileTimer lock
To avoid these you need to delegate all the callback processing to the worker thread pool. Currently error callbacks are processed in the poller thread. |
Indeed, I am hesitant to apply the patch because it may not fix all Andy On Mon, 6 Jul 2015, Lukasz Janyst wrote:
|
This patch is a no-op really because the first thing OnError does is lock the mutex again. The mutex is recursive because some of these methods may be called from both locked and unlocked contexts. I don't think the proper fix will be much code if done right, but it may be fairly complex. I would have to look at the code to say for sure though. |
@ljanyst - OnError does lock the mutex, but then it unlocks it before calling the handlers. When called from OnReadTimeout, the mutex is already locked by the callee so on that call path, the handlers are called with the Stream mutex held (resulting in deadlock for us). All other call paths I could find drop that particular mutex before invoking handlers. I'd be super-happy to have a more thorough patch, but I'd be sufficiently happy to have the Xrootd client not deadlock the Tier-0. |
I did not look at the code and it looks like my memory of it is fading :( That's definitely a strong point for applying the patch. |
Release Stream lock before invoking callbacks. #216
I will merge the pull request because it fixes the problem in this particular case. It seems obvious to me that the mutex should be unlocked before calling the handlers, that's why it is done so (line 778), but since the mutex is recursive it has no effect in this particular case. Unlocking the mutex in the callee class wont have any side effects because it will anyway go out of scope. I will also create a separate issue because the current implementation is error prone and problem has to be addressed. |
OnReadTimeout will call OnError with the Stream's lock held.
This causes OnError to invoke the user's callback with the
Stream lock held, possibly causing a lock ordering issue.
This fixes an observed deadlock within CMSSW over the following
objects (originally reported by Chris Jones):