Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XrdCl] Avoid race condition in AsyncSocketHander on use of reader/writer objects after link is re-enabled #1722

Merged
merged 1 commit into from
Jun 20, 2022

Conversation

smithdh
Copy link
Contributor

@smithdh smithdh commented Jun 20, 2022

Hi,

This patch is intended to fix a crash seen on EOS MGMs. The crash is consistent with the race between reset() and use of the unique_ptr rspreader and reqwriter. A trace obtained indicated a reset was happening in AsyncSocketHandler::OnReadTimeout() while the unique_ptr was being sued concurrently in AsyncMsgWriter::Write().

The race is understood to be started in AsyncSocketHandler::OnReadTimeout(), triggered by the pStream->OnReadTimeout call (XrdClAsyncSocketHandler.cc:698). On this timeout condition the XrdClStream may first Close() the AsyncSocketHandler. Close() calls pPoller->RemoveSocket(), which eventually calls XrdSys::IOEvents::Channel::Delete(), which will remove the socket from the IOEvents poller with XrdSys::IOEvents::Poller::Detach().

The above calls are done under the OnReadTimeout(), inside a callback from within the poller's event loop. The Detach correctly handles the situation where it is called from the poller's event loop, without blocking.

A later call by XrdClStream (in the same thread) reenables the link with EnableLink() in Stream::OnError(), which will call AsyncSocketHandler::Connect(). This will then re-add the socket to a poller by calling PollerBuiltIn::AddSocket(). AddSocket will assign the socket's XrdSys::IOEvents::Channel to a XrdSys::IOEvents::Poller from its poller pool. SInce EOS uses XRD_PARALLELEVTLOOP=4 the new poller may not be the same as the original (assuming the closed stream was the only one registered for the channelId). The AddSocket() is still being called from within the original poller's event loop, but if it is re-added to a different poller the AsyncSocketHandler event callbacks can now be called concurrently with the ongoing execution of the original

@amadio
Copy link
Member

amadio commented Jun 20, 2022

@smithdh I'd consider adding the explanation above to the commit message, it's very good. Cheers,

@simonmichal
Copy link
Contributor

@smithdh : thank you very much for the PR, it's superb :-) !!!

@simonmichal simonmichal merged commit 225c593 into xrootd:master Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants