Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle EINTR when closing sockets #56

Closed
daurnimator opened this issue Feb 24, 2015 · 10 comments
Closed

Handle EINTR when closing sockets #56

daurnimator opened this issue Feb 24, 2015 · 10 comments

Comments

@daurnimator
Copy link
Collaborator

In cqs_closefd, close is retried on EINTR.
You should not do this (from man page):

In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed.

Related links:

@daurnimator
Copy link
Collaborator Author

So originally I filed this bug in the other direction: that so_closesocket should retry; however after reading a few articles, it seems that that is the worse solution.

Looking even further, to be safe, we may need to do different things based on operating system...

  • HP-UX you try again
  • Linux, Solaris(?) and AIX you don't want to retry again
  • OSX can be either; though if you use close_nocancel instead you should not try again.

Making things even trickier is glibc, see this comment https://code.google.com/p/chromium/issues/detail?id=269623#c13

@wahern
Copy link
Owner

wahern commented Mar 2, 2015

On Mon, Feb 23, 2015 at 06:40:00PM -0800, daurnimator wrote:

So originally I filed this bug in the other direction: that so_closesocket should retry; however after reading a few articles, it seems that that is the worse solution.

Looking even further, to be safe, we may need to do different things based on operating system...

  • HP-UX you try again
  • Linux, Solaris(?) and AIX you don't want to retry again
  • OSX can be either; though if you use close_nocancel instead you should not try again.

I read this thread awhile ago

http://austingroupbugs.net/view.php?id=529

but forgot that OS X is complicated.

@daurnimator
Copy link
Collaborator Author

ps. i have edited/improved the original post, check it out on github :)

wahern added a commit that referenced this issue Jun 2, 2015
@daurnimator
Copy link
Collaborator Author

So that fixes the mac case; what about linux+glibc; should we use the raw syscall there?

@wahern
Copy link
Owner

wahern commented Jun 3, 2015

On Linux EINTR can be safely ignored. close will always close the descriptor as long as it's valid. See http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-09/3000.html and the comments at http://austingroupbugs.net/view.php?id=529.

AIX behaves the way as Linux. A comment in the Austin Group thread suggests that Solaris behaves like Linux, but the Solaris 11.1 man page says the state of the descriptor is unspecified. Regardless, except for the hack for OS X, there's no API on any platform that allows to work around the issue in a thread-safe manner. If the platform does the wrong thing, then we're forced to simply leak the descriptor.

@daurnimator
Copy link
Collaborator Author

On Linux EINTR can be safely ignored. close will always close the descriptor as long as it's valid. See http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-09/3000.html and the comments at http://austingroupbugs.net/view.php?id=529.

Yes.. however glibc wraps it to make it non-safe again ==> it calls __pthread_enable_asynccancel before it calls close syscall.
So I think we need to use the raw syscall directly?

@wahern
Copy link
Owner

wahern commented Jun 4, 2015

I think there are two completely separate issues here. The first is what to do if close has been interrupted by a signal and returns EINTR. The second is properly handling thread cancellation. Note that in the second case close will never return to the application.

For the first case, Linux and OS X already do the correct thing: if close returns EINTR to the application because of a signal then the file descriptor has already been closed. Even on OS X there's no need to use close_nocancel.

For the second case, both glibc and OS X have bugs where if a thread is marked for cancellation, the cancellation might happen before or after the descriptor has actually been closed, which means it's impossible for applications to know whether the descriptor has already been closed when running their cancellation handlers. For glibc[1] this is entirely a userspace bug because glibc tests for cancellation before calling the close syscall; but cancellation could also happen after the kernel has closed the descriptor. For OS X it's mostly a kernel issue[2]: the kernel close first tests to see if the thread was cancelled, and if so returns EINTR before the descriptor was invalidated. Otherwise, if cancellation happens during the close operation, it will return EINTR after the descriptor was invalidated. In either event, upon seeing EINTR the userspace wrapper tests for cancellation.

It seems many of the comments out in the wild are confusing the two issues. If and when we want to address the cancellation problem, the solution for both[3] platforms (and theoretically all) is easy: just disable cancellations with pthread_setcancelstate before calling close.

But currently cqueues cannot handle cancellations at all. It doesn't setup cancellation handlers to destroy the Lua context, and it doesn't disable cancellations in critical sections that might call into a cancellation point. We're going to leak everything, anyhow. That's definitely a worthwhile feature, but we should open a whole new issue tracker for that. I think we can close this issue out because (AFAICT) on all the supported platforms we can safely ignore EINTR from close and assume the descriptor has been closed.

[1] See this bug report, which describes everything in detail: https://sourceware.org/bugzilla/show_bug.cgi?id=12683

[2] It took me forever to track down exactly what OS X was doing. Here's the relevant code:

[3] Assuming the code is running in "conforming" mode (which is the default for 64-bit), where pthread_setcancelstate calls into the kernel (__pthread_canceled). Otherwise, we would have to use close_nocancel to avoid the race between closing the file descriptor and cancellation causing EINTR, where we can't know which happened first.

@daurnimator
Copy link
Collaborator Author

Consider this closed then :)

@daurnimator
Copy link
Collaborator Author

btw, this still exists in the (unused) kpoll library: https://github.com/wahern/cqueues/blob/master/src/lib/kpoll.c#L118 I suppose that's issue #24 though.

@1Hyena
Copy link

1Hyena commented Oct 14, 2015

Would it make sense to block all signals prior to any call to close and unblock the signals after close has returned? Would this be a perfect workaround to the EINTR possibility?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants