XrdCl dead-locks during shut-down in v4.11 #1095

Closed
apeters1971 opened this issue Dec 3, 2019 · 2 comments

Comments

@apeters1971
Contributor

The symptom is that a process gets stuck during application shutdown, which creates a severe problem on our automounted FUSE mounts:

Thread 2 (Thread 0x7fd24c7ff700 (LWP 25552)):
#0 0x00007fd26260850d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd262603e5b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007fd262603d28 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd266272b1b in Lock (this=0x7fd24fcea230) at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysPthread.hh:220
#4 XrdSys::IOEvents::Poller::Detach (this=0x7fd24fcea1e0, cP=0x7fd248cd7b00, isLocked=@0x7fd24c7fe47f: true,
keep=) at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:809
#5 0x00007fd266272caf in XrdSys::IOEvents::Channel::Delete (this=0x7fd248cd7b00)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:300
#6 0x00007fd26653a2f9 in XrdCl::PollerBuiltIn::RemoveSocket (this=0x7fd24fc9a740, socket=0x7fd2491e5c00)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClPollerBuiltIn.cc:340
#7 0x00007fd2665a5e4b in XrdCl::AsyncSocketHandler::Close (this=0x7fd248d28580)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClAsyncSocketHandler.cc:197
#8 0x00007fd26653f7cf in XrdCl::Stream::ForceError (this=0x7fd249482a00, status=...)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClStream.cc:863
#9 0x00007fd26653d809 in XrdCl::Channel::ForceDisconnect (this=0x7fd249482400)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClChannel.cc:362
#10 0x00007fd26653b6f7 in XrdCl::PostMaster::ForceDisconnect (this=0x7fd24fc39200, url=...)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClPostMaster.cc:292
#11 0x00007fd266540b1c in XrdCl::Stream::OnReadTimeout (this=, substream=,
isBroken=@0x7fd24c7fe6af: false) at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClStream.cc:1013
#12 0x00007fd2665a67cf in XrdCl::AsyncSocketHandler::OnReadTimeout (this=this@entry=0x7fd248d28580)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClAsyncSocketHandler.cc:932
#13 0x00007fd2665a7c09 in XrdCl::AsyncSocketHandler::Event (this=0x7fd248d28580, type=)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClAsyncSocketHandler.cc:243
#14 0x00007fd266539327 in (anonymous namespace)::SocketCallBack::Event (this=0x7fd248c11640, chP=,
cbArg=, evFlags=) at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClPollerBuiltIn.cc:82
#15 0x00007fd26627462d in XrdSys::IOEvents::Poller::CbkXeq (this=this@entry=0x7fd24fcea1e0,
cP=cP@entry=0x7fd248cd7b00, events=events@entry=2, eNum=eNum@entry=0, eTxt=eTxt@entry=0x0)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:693
#16 0x00007fd266274946 in XrdSys::IOEvents::Poller::CbkTMO (this=0x7fd24fcea1e0)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:598
#17 0x00007fd266275998 in XrdSys::IOEvents::PollE::Begin (this=0x7fd24fcea1e0, syncsem=,
retcode=, eTxt=)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEventsPollE.icc:217
#18 0x00007fd26627237d in XrdSys::IOEvents::BootStrap::Start (parg=0x7fd250bfd830)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:131
#19 0x00007fd26627a9f7 in XrdSysThread_Xeq (myargs=0x7fd24fc3e6c0)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysPthread.cc:86
#20 0x00007fd262601e65 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fd26232a88d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fd266e3fe00 (LWP 25506)):
#0 0x00007fd262607afb in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007fd262607b8f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007fd262607c2b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x00007fd266273932 in Wait (this=) at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysPthread.hh:419
#4 XrdSys::IOEvents::Poller::SendCmd (this=this@entry=0x7fd24fcea1e0, cmd=...)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:974
#5 0x00007fd2662749c0 in XrdSys::IOEvents::Poller::Stop (this=this@entry=0x7fd24fcea1e0)
at /usr/src/debug/xrootd-4.11.0/src/XrdSys/XrdSysIOEvents.cc:1021
#6 0x00007fd266539727 in XrdCl::PollerBuiltIn::Stop (this=0x7fd24fc9a740)
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClPollerBuiltIn.cc:221
#7 0x00007fd2665255fa in XrdCl::DefaultEnv::Finalize ()
at /usr/src/debug/xrootd-4.11.0/src/XrdCl/XrdClDefaultEnv.cc:705
#8 0x00007fd26226600a in __cxa_finalize () from /lib64/libc.so.6
#9 0x00007fd266518e13 in __do_global_dtors_aux () from /opt/eos/xrootd/lib64/libXrdCl.so.2
#10 0x00007ffcfe6cd480 in ?? ()
#11 0x00007fd266c5703a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: frame did not save the PC
(gdb)

@simonmichal
Contributor

simonmichal commented Dec 3, 2019

@apeters1971 : thanks for reporting the problem!

From what I see, we are running into the following deadlock scenario:

  • IOEvents::Poller::Stop() locks adMutex and then calls SendCmd, which sends a semaphore along a pipe and waits on it while adMutex is still held

  • IOEvents::Poller::Detach() is waiting on adMutex, so IOEvents::PollE::Process() cannot proceed and post the semaphore

@abh : can we unlock adMutex while waiting on the semaphore, or replace the whole construct with a condition variable?
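
To make the suggestion concrete, here is a minimal sketch of the "unlock before waiting" idea. It is not the XRootD code and not necessarily what the eventual fix does: `ToyPoller`, `Run`, `StopCompletes` and the `cmdMutex`/`cmdCv` pair are hypothetical names, and a standard condition variable stands in for the semaphore sent along the pipe. Only `adMutex`, `Stop` and `Detach` echo names from the trace.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the poller; these are not the XRootD classes.
class ToyPoller {
public:
  // Poller thread body: wait for a stop command, run a callback that needs
  // adMutex (as Detach() does in the trace), then acknowledge the command.
  void Run() {
    std::unique_lock<std::mutex> cmd(cmdMutex);
    cmdCv.wait(cmd, [this] { return stopRequested; });
    cmd.unlock();

    Detach();  // would block forever if Stop() still held adMutex

    cmd.lock();
    stopAcked = true;
    cmd.unlock();
    cmdCv.notify_all();
  }

  // Caller side: update state under adMutex, then release it BEFORE
  // blocking on the acknowledgement from the poller thread.
  void Stop() {
    std::unique_lock<std::mutex> ad(adMutex);
    stopping = true;
    ad.unlock();

    std::unique_lock<std::mutex> cmd(cmdMutex);
    stopRequested = true;
    cmdCv.notify_all();
    cmdCv.wait(cmd, [this] { return stopAcked; });
  }

  bool Detached() {
    std::lock_guard<std::mutex> ad(adMutex);
    return detached;
  }

private:
  void Detach() {
    std::lock_guard<std::mutex> ad(adMutex);
    detached = true;
  }

  std::mutex adMutex;   // guards stopping and detached
  bool stopping = false;
  bool detached = false;

  std::mutex cmdMutex;  // guards the command/acknowledge flags
  std::condition_variable cmdCv;
  bool stopRequested = false;
  bool stopAcked = false;
};

// Runs one shutdown and reports whether the poller thread got through Detach().
bool StopCompletes() {
  ToyPoller poller;
  std::thread worker([&poller] { poller.Run(); });
  poller.Stop();
  worker.join();
  assert(!worker.joinable());
  return poller.Detached();
}
```

If `ad.unlock()` in `Stop()` is moved after the wait, the toy reproduces the hang: `Run()` blocks in `Detach()` and never sets `stopAcked`.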

@simonmichal
Contributor

simonmichal commented Dec 4, 2019

@abh : I just created a PR with a fix; could you please review it: #1096

simonmichal added a commit that referenced this issue Dec 9, 2019
[XrdSys] Avoid deadlock on poller stop, fixes #1095