
Built-in poller occasionally not invoking callbacks on OSX #5

Closed
jlsalmon opened this issue Apr 16, 2013 · 12 comments

@jlsalmon
Contributor

Hi,

When using the built-in poller on OSX, a client thread (thread 1) that attempts to connect a socket asynchronously occasionally does not get called back. This locks up the client until the connect operation times out.

This doesn't happen with the libevent poller. The frequency of the problem can be increased by adding time delays in the client code, so it's more than likely a timing issue (maybe the poller invoking the callback too early?).

Here are the stack traces:

Thread 6 (process 17596):
#0  0x00007fff8d9da112 in sem_wait ()
#1  0x00000001000736c1 in XrdSysSemaphore::Wait (this=0x100512950) at XrdSysPthread.hh:335
#2  0x00000001000da19f in XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x100512a78) at XrdClSyncQueue.hh:66
#3  0x00000001000d9fa3 in XrdCl::JobManager::RunJobs (this=0x100512a60) at XrdClJobManager.cc:133
#4  0x00000001000d9cbd in RunRunnerThread (arg=0x100512a60) at XrdClJobManager.cc:33
#5  0x00007fff913497a2 in _pthread_start ()
#6  0x00007fff913361e1 in thread_start ()

Thread 5 (process 17596):
#0  0x00007fff8d9da112 in sem_wait ()
#1  0x00000001000736c1 in XrdSysSemaphore::Wait (this=0x100512950) at XrdSysPthread.hh:335
#2  0x00000001000da19f in XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x100512a78) at XrdClSyncQueue.hh:66
#3  0x00000001000d9fa3 in XrdCl::JobManager::RunJobs (this=0x100512a60) at XrdClJobManager.cc:133
#4  0x00000001000d9cbd in RunRunnerThread (arg=0x100512a60) at XrdClJobManager.cc:33
#5  0x00007fff913497a2 in _pthread_start ()
#6  0x00007fff913361e1 in thread_start ()

Thread 4 (process 17596):
#0  0x00007fff8d9da112 in sem_wait ()
#1  0x00000001000736c1 in XrdSysSemaphore::Wait (this=0x100512950) at XrdSysPthread.hh:335
#2  0x00000001000da19f in XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x100512a78) at XrdClSyncQueue.hh:66
#3  0x00000001000d9fa3 in XrdCl::JobManager::RunJobs (this=0x100512a60) at XrdClJobManager.cc:133
#4  0x00000001000d9cbd in RunRunnerThread (arg=0x100512a60) at XrdClJobManager.cc:33
#5  0x00007fff913497a2 in _pthread_start ()
#6  0x00007fff913361e1 in thread_start ()

Thread 3 (process 17596):
#0  0x00007fff8d9d9386 in __semwait_signal ()
#1  0x00007fff913d3800 in nanosleep ()
#2  0x00007fff913d368a in sleep ()
#3  0x000000010008c33f in XrdCl::TaskManager::RunTasks (this=0x100512980) at XrdClTaskManager.cc:238
#4  0x000000010008b49d in RunRunnerThread (arg=0x100512980) at XrdClTaskManager.cc:36
#5  0x00007fff913497a2 in _pthread_start ()
#6  0x00007fff913361e1 in thread_start ()

Thread 2 (process 17596):
#0  0x00007fff8d9d9f96 in poll ()
#1  0x00000001002e8be7 in XrdSys::IOEvents::PollPoll::Begin (this=0x100512d20, syncsem=0x7fff5fbfec98, retcode=@0x7fff5fbfec90, eTxt=0x7fff5fbfec88) at XrdSysIOEventsPollPoll.icc:191
#2  0x00000001002e6181 in XrdSys::IOEvents::BootStrap::Start (parg=0x7fff5fbfec80) at /Users/jussy/repos/xrootd/src/XrdSys/XrdSysIOEvents.cc:110
#3  0x00000001002e3f55 in XrdSysThread_Xeq (myargs=0x100512e60) at /Users/jussy/repos/xrootd/src/XrdSys/XrdSysPthread.cc:86
#4  0x00007fff913497a2 in _pthread_start ()
#5  0x00007fff913361e1 in thread_start ()

Thread 1 (process 17596):
#0  0x00007fff8d9da112 in sem_wait ()
#1  0x00000001000736c1 in XrdSysSemaphore::Wait (this=0x100511640) at XrdSysPthread.hh:335
#2  0x00000001000993f9 in XrdCl::SyncResponseHandler::WaitForResponse (this=0x7fff5fbff3c0) at XrdClMessageUtils.hh:85
#3  0x0000000100097eda in XrdCl::MessageUtils::WaitForResponse<XrdCl::StatInfo> (handler=0x7fff5fbff3c0, response=@0x7fff5fbff5d0) at XrdClMessageUtils.hh:135
#4  0x000000010009363b in XrdCl::FileSystem::Stat (this=0x100512530, path=@0x7fff5fbff600, response=@0x7fff5fbff5d0, timeout=0) at XrdClFileSystem.cc:709
#5  0x0000000100008344 in DoStat (fs=0x100512530, env=0x100512380, args=@0x7fff5fbff7e0) at XrdClFS.cc:740
#6  0x0000000100010ae7 in XrdCl::FSExecutor::Execute (this=0x1005124f0, commandline=@0x7fff5fbff930) at XrdClFSExecutor.cc:100
#7  0x000000010000ad0a in ExecuteCommand (ex=0x1005124f0, commandline=@0x7fff5fbff930) at XrdClFS.cc:1023
#8  0x000000010000b690 in ExecuteCommand (url=@0x7fff5fbff9c8, argc=2, argv=0x7fff5fbffad8) at XrdClFS.cc:1143
#9  0x000000010000ba0c in main (argc=4, argv=0x7fff5fbffac8) at XrdClFS.cc:1179

Cheers,
Justin

@ljanyst
Contributor

ljanyst commented Apr 16, 2013

The async socket handler in the client issues a connect(2) on a non-blocking socket via XrdNetConnect::Connect (which returns EINPROGRESS) and subscribes to write notifications for the socket in order to get notified when connect completes. This is done in accordance with the manual, which says:

The socket is nonblocking and the connection cannot be completed immediately. It is possible to
select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability,
use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether
connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the
usual error codes listed here, explaining the reason for the failure).

The write notification arrives erratically when using the built-in poller on MacOSX; we haven't observed this issue with the built-in poller on Linux, nor with the libevent poller on any platform.
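
For reference, here is a minimal, self-contained sketch of the pattern the manual describes (illustrative only, not XRootD code; the host, port, and timeout are made-up values): issue a non-blocking connect(2), wait for writability with poll(2), then read SO_ERROR to learn the outcome.

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

  sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port   = htons(1094);                    // example port
  inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  // example host

  int rc = connect(fd, (sockaddr*)&addr, sizeof(addr));
  if (rc == 0)
    printf("connect completed immediately\n");      // possible for local connects
  else if (errno != EINPROGRESS)
    { perror("connect"); close(fd); return 1; }

  // In either case, wait for writability and then check SO_ERROR,
  // exactly as the connect(2) manual suggests.
  pollfd pfd = { fd, POLLOUT, 0 };
  if (poll(&pfd, 1, 5000) > 0 && (pfd.revents & POLLOUT))
    {
      int err = 0; socklen_t len = sizeof(err);
      getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
      printf("connect result: %s\n", err ? strerror(err) : "success");
    }
  close(fd);
  return 0;
}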

@ghost ghost assigned abh3 Apr 16, 2013
@jlsalmon
Contributor Author

To reproduce on OSX 10.8.3 with a clean xrootd build from master, run any client command such as xrdcopy or xrdfs with any arguments. It happens 95% of the time.

@jlsalmon
Contributor Author

After some more debugging, I have additional information:

In XrdSys::IOEvents::Poller::CbkXeq(), a check to see whether the "effective poller" (chPoller) is equal to the "real poller" (chPollXQ) fails:

// Verify that there is a callback here and the channel is ready. If not,
// disable this channel for the events being reflected unless the event is a
// fatal error. In this case we need to abandon the channel since error events
// may continue to be generated as we can't always disable them.
//
   if (!(cP->chCB) || cP->chPoller != cP->chPollXQ)
      {if (eNum)
          {cP->chPoller = &pollErr1; cP->chFault = eNum;
           cP->inPSet   = 0;
           return false;
          }
       oldEvents = cP->chEvents;
       cP->chEvents = 0;
       Modify(cP, eNum, 0, isLocked);
       if (!isLocked) cP->chMutex.Lock();
       cP->chEvents = oldEvents;
       return true;
      }

This obviously then returns without invoking the callback. Note that eNum is not set in this case, so the poller is not set to the error object.

I'm not sure why this happens - I'll keep digging.

Edit:

The effective poller is still set to a PollerInit object:

(gdb) p &pollWait
$23 = ('XrdSys::IOEvents::PollerInit' *) 0x100366a88
(gdb) p cP->chPoller
$24 = ('XrdSys::IOEvents::Poller' *) 0x100366a88
(gdb) 

@jlsalmon
Contributor Author

So it looks like there is a race condition in the poller initialisation. I am working on localhost, so what is happening is that the first handshake request comes back before the poller can initialise itself (see my edited post above).

The issue is with the recursive mutex (chMutex) in XrdSys::IOEvents::Channel. If a context switch happens in the poller thread after PollPoll::Include unlocks the channel mutex, but before it gets locked again in PollPoll::Init, then the callback thread can lock the mutex in PollPoll::CbkXeq and try to do stuff with the poller before the channel is ready.

The channel mutex gets locked and unlocked all over the place, and I'm finding it a little difficult to see a way of avoiding this at the moment.
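
To make the window concrete, here is a hypothetical stand-alone sketch (not the XRootD code; Channel, initDone and the sleep are invented for illustration) of the interleaving described above: the poller thread drops the mutex between "Include" and "Init", and a callback thread that grabs it in that gap sees a channel whose effective poller is still the init placeholder.

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

struct Channel {
  std::recursive_mutex mtx;
  bool initDone = false;   // stands in for chPoller == chPollXQ
};

void pollerThread(Channel& ch)
{
  {
    std::lock_guard<std::recursive_mutex> g(ch.mtx);  // "Include": channel added, then unlocked
  }
  // <-- the race window: the mutex is free here, so a context switch can occur
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  {
    std::lock_guard<std::recursive_mutex> g(ch.mtx);  // "Init": finish setting up the channel
    ch.initDone = true;
  }
}

void callbackThread(Channel& ch)
{
  std::lock_guard<std::recursive_mutex> g(ch.mtx);    // "CbkXeq": may run inside the window
  if (!ch.initDone)
    std::cout << "callback fired before the channel was ready\n";
}

int main()
{
  Channel ch;
  std::thread p(pollerThread, std::ref(ch));
  std::thread c(callbackThread, std::ref(ch));
  p.join();
  c.join();
  return 0;
}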

@abh3
Member

abh3 commented Apr 26, 2013

Thanks Justin! That saves me a huge amount of work. I think I can come up with something here to avoid this problem. Of course, the nagging question is why this doesn't show up in Linux. Anyway, I will work on it tomorrow. Oh yes, the reason that the channel mutex gets unlocked all over the place is that there was a desire to be able to call any of the channel methods while you were in a callback, including deleting the channel itself. It becomes rather messy when you liberalize the calls that way.

@abh3
Member

abh3 commented Apr 26, 2013

OK, this is clearly a client bug. The client does not properly handle the case when XrdNet::Connect() immediately returns with success instead of EINPROGRESS. You can see this in the following two traces (I added some print statements). This only happens in Solaris and MacOS when connecting to a local socket (Linux seems to always return EINPROGRESS). Everything works fine as long as you get EINPROGRESS status. It appears that there is a logic issue when testing for connection status on the first callback (which always happens). The other issue I saw is that the callback does not handle the case when more than one event is reflected (e.g. readytoread along with readytowrite). This also seems to cause problems. In any case, the builtin poller is working exactly as advertised.

Non working trace:

[2013-04-26 15:15:03 -0700][Debug  ][PostMaster        ] Creating new channel to: sysdev4500:1094 1 stream(s)
[2013-04-26 15:15:03 -0700][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: sysdev4500:1094" to be run at: [2013-04-26 15:15:18 -0700]
[2013-04-26 15:15:03 -0700][Dump   ][PostMaster        ] [sysdev4500:1094 #0] Sending message kXR_locate (path: */tmp/abh, flags: none) through substream 0 expecting answer at 0
[2013-04-26 15:15:03 -0700][Debug  ][PostMaster        ] [sysdev4500:1094] Found 1 address(es): [::172.23.16.108]:1094
[2013-04-26 15:15:03 -0700][Debug  ][AsyncSock         ] [sysdev4500:1094 #0.0] Attempting connection to [::172.23.16.108]:1094
ClSocket Connect satus=0
[2013-04-26 15:15:03 -0700][Debug  ][Poller            ] Adding socket 0x482280 to the poller
Init called ev=4 lock=1
SysIOEvents Callback ev=4 toq=1 cbk=1 ini=0
SysIOEvents Callback return ckok=1
[2013-04-26 15:15:04 -0700][Dump   ][TaskMgr           ] Running task: "FileTimer task"
[2013-04-26 15:15:04 -0700][Dump   ][TaskMgr           ] Will rerun task "FileTimer task" at [2013-04-26 15:15:19 -0700]

Working trace:

[2013-04-26 15:17:30 -0700][Debug  ][PostMaster        ] Creating new channel to: noric22:1094 1 stream(s)
[2013-04-26 15:17:30 -0700][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: noric22:1094" to be run at: [2013-04-26 15:17:45 -0700]
[2013-04-26 15:17:30 -0700][Dump   ][PostMaster        ] [noric22:1094 #0] Sending message kXR_locate (path: */tmp/abh, flags: none) through substream 0 expecting answer at 0
[2013-04-26 15:17:30 -0700][Debug  ][PostMaster        ] [noric22:1094] Found 1 address(es): [::134.79.200.115]:1094
[2013-04-26 15:17:30 -0700][Debug  ][AsyncSock         ] [noric22:1094 #0.0] Attempting connection to [::134.79.200.115]:1094
ClSocket Connect satus=150
[2013-04-26 15:17:30 -0700][Debug  ][Poller            ] Adding socket 0x482280 to the poller
Init called ev=4 lock=1
SysIOEvents Callback ev=4 toq=1 cbk=1 ini=0
[2013-04-26 15:17:30 -0700][Debug  ][AsyncSock         ] [noric22:1094 #0.0] Async connection call returned
[2013-04-26 15:17:30 -0700][Debug  ][XRootDTransport   ] [noric22:1094 #0.0] Sending out the initial hand shake + kXR_protocol
SysIOEvents Callback return ckok=1
SysIOEvents Callback ev=4 toq=1 cbk=1 ini=0
[2013-04-26 15:17:30 -0700][Dump   ][AsyncSock         ] [noric22:1094 #0.0] Wrote a message: , 44 bytes
SysIOEvents Callback return ckok=1
SysIOEvents Callback ev=1 toq=1 cbk=1 ini=0
[2013-04-26 15:17:30 -0700][Dump   ][AsyncSock         ] [noric22:1094 #0.0] Received message header, size: 8
[2013-04-26 15:17:30 -0700][Dump   ][AsyncSock         ] [noric22:1094 #0.0] Received a message of 16 bytes
[2013-04-26 15:17:30 -0700][Debug  ][XRootDTransport   ] [noric22:1094 #0.0] Got the server hand shake response (type: server [], protocol version 297)
SysIOEvents Callback return ckok=1
SysIOEvents Callback ev=1 toq=1 cbk=1 ini=0
[2013-04-26 15:17:30 -0700][Dump   ][AsyncSock         ] [noric22:1094 #0.0] Received message header, size: 8
[2013-04-26 15:17:30 -0700][Dump   ][AsyncSock         ] [noric22:1094 #0.0] Received a message of 16 bytes
[2013-04-26 15:17:30 -0700][Debug  ][XRootDTransport   ] [noric22:1094 #0.0] kXR_protocol successful (type: server [], protocol version 297)
[2013-04-26 15:17:30 -0700][Debug  ][XRootDTransport   ] [noric22:1094 #0.0] Sending out kXR_login request, username: abh

@ljanyst
Contributor

ljanyst commented Apr 27, 2013

  1. The client only needs to check for fatal errors at this point. The poller is supposed to be level-triggered, so if the connection was successful the client should be notified as soon as the socket becomes ready to write, because it has subscribed to write notifications for this socket.
  2. The client doesn't expect a read notification for this socket at this point, because it has not asked to be notified about read events yet.

The problem is a race on poller init.
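
As a side note, the level-triggered behaviour point 1 relies on is easy to check with a toy program (an illustration under that assumption, not XRootD code): a connected socket with an empty send buffer keeps reporting POLLOUT on every poll(2) call, so a write notification is never "missed", only deferred.

#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main()
{
  int sv[2];
  socketpair(AF_UNIX, SOCK_STREAM, 0, sv);   // an already-connected pair
  pollfd pfd = { sv[0], POLLOUT, 0 };
  for (int i = 0; i < 3; ++i)
    {
      int rc = poll(&pfd, 1, 0);
      printf("poll #%d: rc=%d writable=%d\n", i, rc, (pfd.revents & POLLOUT) != 0);
    }
  close(sv[0]);
  close(sv[1]);
  return 0;
}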

@abh3
Member

abh3 commented Apr 27, 2013

I respectfully disagree and can show you the exact trace that shows why. After calling connect, the client explicitly enables write notifications. The poller happily obliges and sends a "readytowrite" notification because, in fact, this is true. There is no race condition at all; I can show you explicit traces that prove it. The sequence is absolutely correct. The only difference is that when the Connect method indicates that connect immediately succeeded, the client does not correctly handle the readytowrite notification. This is likely because that notification should not have been enabled at that point (but always is).

@abh3
Member

abh3 commented Apr 28, 2013

Let me explain further what the logic problem is in the client's built-in poller method. When a socket is immediately connected, the socket status is marked as connected. The readytowrite callback is invoked after the enable for write notification is called, and the enable is called regardless of whether or not the connect immediately succeeded. The callback method checks if there is a write outstanding or if the status is connecting. However, in this case we have a status of connected but no handshake outstanding, because the logic is that the handshake will be driven via the callback. In this instance, the callback ignores that notification because there is no write outstanding. So we have a condition where the handshake is never consummated when the connect immediately succeeds. I can produce explicit traces that show this. This is clearly an issue in the client. There is no race condition; everything happens in the exact sequence that the client enables. It is merely a wrong assumption on the client's part when the connect immediately succeeds.
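
In pseudo-code, the condition described above boils down to something like the following reconstruction (names and structure are mine, not the actual XrdCl source): on READYTOWRITE the handler only acts when a write is outstanding or the stream is still connecting, so an immediately-connected stream with no handshake queued falls through and disables write events.

enum class StreamStatus { Connecting, Connected };

// Returns true if the notification is acted upon, false if it is ignored
// and write notifications end up disabled.
bool HandleReadyToWrite(StreamStatus status, bool writeOutstanding)
{
  if (status == StreamStatus::Connecting)
    return true;                 // finish the async connect, then start the handshake
  if (writeOutstanding)
    return true;                 // flush the pending message
  // status == Connected and nothing queued: the notification is dropped,
  // write events are disabled, and the handshake is never started.
  return false;
}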

@abh3
Member

abh3 commented Apr 28, 2013

Well, it's a small consolation but it appears that we were both correct. Indeed, there was a race condition that would affect MacOS. That problem was masked in my initial testing by the immediate connect problem. Adding sufficient tracing via cerr changed the timing to make the race issue obvious. So, the fix has been pushed. However, the immediate connect problem remains and can best be seen on Solaris when connecting to 'localhost' (or its equivalent). So, as a byproduct I now have a complete trace that points out the issue. When the client gets an immediate connect it fails to start the handshake but enables write events. When the event callback is invoked it sees that there is no outstanding write, so it simply disables write notifications. That means the handshake is never performed and the connection eventually times out. Below is the trace (note that lines starting with 'Cl' are in the client code and correspond to the first few characters of the associated file). The code was complicated enough that I couldn't see how to fix it. Again, you need to try this on Solaris as it always immediately connects to a server running on the same machine; MacOS sometimes does and sometimes doesn't, and Linux never does.

sysdev4500> ./xrdfs sysdev4500 ls /tmp/abh
ClStr Calling Connect(120)
ClSoc Connect status=0
ClSoc Immediate connect; pStatus=connected
ClPB enabling write events!
IOE fd 10 Enable: ->Enable(4,60)
IOE fd 10 Init: begin events=4
IOE fd 10 Init: Include() returned true channel now locked
IOE fd 10 Init: Modify(4) == true [prior events=0] channel now locked
IOE fd 10 Modify: Init() returned true
IOE fd 10 Enable: Modify(4) == true [prior events=0] channel now locked
ClStr Connect status OK, pSubStreams[0]->status=Connecting
IOE fd 10 CbkXeq: callback events=4 toq=1 erc=0 callback present poller=norm
IOE fd 10 CbkXeq: invoking callback; events=4
ClPB In callback; evflags=4 ev=4
ClAsyn ReadyToWrite & socket connected & handshakes NOT Done; ->OnWriteWhileHandshaking()
ClAsyn pOutgoing=false; ->DisableUplink()
ClPB disabling write events!
IOE fd 10 Disable: Modify(0) == true [prior events=4] channel now locked
ClAsyn pOutgoing=false; simply return
IOE fd 10 CbkXeq: callback returned true
******* HANG *******

On the server side:

130427 22:43:24 001 XrdInet: Accepted connection from 22@sysdev4500.slac.stanford.edu
130427 22:43:24 005 XrdSched: running ?:22@sysdev4500 inq=0
130427 22:43:54 005 XrdProtocol: ?:22@sysdev4500 terminated handshake not received

@ljanyst
Contributor

ljanyst commented May 6, 2013

Well, I respectfully disagree :) From the client perspective, three things can happen after XrdNetConnect::Connect returns:

  1. Immediate error (errno != 0 && errno != EINPROGRESS) - the client runs/schedules recovery
  2. Async operation has been scheduled and the outcome is not known at that point (errno == EINPROGRESS) - the client declares a temporary success and expects to receive READYTOWRITE should the status change; in this case any possible error is handled in the callback, and if there is no error it proceeds with the handshake
  3. Connect succeeded immediately (errno == 0) - the client treats it as the case above. This is because, if you think about it, it really is the same. Since the socket is new and nothing has been written to it yet, it should become writable at some point, and since the poller is supposed to be level-triggered it should send READYTOWRITE notifications as long as the socket remains writable. So, the client should receive the callback, and when it does, it again invokes getsockopt SOL_SOCKET SO_ERROR, which returns success (it's an overhead to check for the same error twice, but performance-wise it's negligible), and then it proceeds with the handshake. The only "issue" here is that success is not declared immediately, but in the callback instead (see the sketch below).

So there is indeed a logic error. It is in the poller.
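
A minimal sketch of the three-way handling above (illustrative only; the names are made up and this is not the actual XrdCl code) would classify the result of the non-blocking connect like this:

#include <cerrno>

enum class ConnectOutcome { Fatal, Pending };

// rc is the return value of the non-blocking ::connect(), err the errno it set.
ConnectOutcome ClassifyConnect(int rc, int err)
{
  if (rc == 0)
    return ConnectOutcome::Pending;   // case 3: immediate success, treated like EINPROGRESS;
                                      // the level-triggered poller still reports writability
  if (err == EINPROGRESS)
    return ConnectOutcome::Pending;   // case 2: async connect in flight, wait for READYTOWRITE
  return ConnectOutcome::Fatal;       // case 1: immediate error, run/schedule recovery
}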

@ljanyst
Contributor

ljanyst commented May 7, 2013

After carefully reading all the tickets and emails I have fixed my bug as well. I think we now indeed respectfully agree :)

Fixed:
8337b6a
3e08502

@ljanyst ljanyst closed this as completed May 7, 2013
bbockelm pushed a commit to bbockelm/xrootd that referenced this issue Jul 5, 2015
amadio pushed a commit to amadio/xrootd that referenced this issue May 12, 2023
Fixed and expanded statistics in ceph_close log.