New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core.Stream uses select() and it is prone to its limitations #182

Closed
glpatcern opened this Issue Dec 10, 2015 · 7 comments

Comments

Projects
None yet
2 participants
@glpatcern
Contributor

glpatcern commented Dec 10, 2015

The Stream class is supposed to use the lib.compat layer, in order to make use of poll() or select() depending on the OS. But the Stream.poll() method actually uses the builtin select() function straight, thus being prone to the select limitation of not handling file descriptors higher than 1024.

To prove this, you can try the following code (on Linux):

import rpyc
import traceback
import time

# make sure we open enough fd and keep them to go beyond 1024
c = []
for i in range(1030):
  c.append(open('/dev/null'))

# establish an RPyC connection
rpcconn = rpyc.connect('yourserver.org', YOURPORT)
remote_root_ref = rpcconn.async_request(rpyc.core.consts.HANDLE_GETROOT)
remote_root_ref.set_expiry(1.0)
c.append(rpcconn)
try:
  # use it - this calls Stream.poll(), which actually is a select() ...
  rpcconn._remote_root = remote_root_ref.value
  print 'Successfully connected, waiting 10 secs before terminating...'
  time.sleep(10)
except Exception, e:
  print i, e
  traceback.print_exc()

The stack trace includes the following error:

error: filedescriptor out of range in select()

The following is a simple patch to fix this bug, can you please review it? Thanks!

diff --git a/rpyc/core/stream.py b/rpyc/core/stream.py
index eed2d87..1e680a7 100644
--- a/rpyc/core/stream.py
+++ b/rpyc/core/stream.py
@@ -8,7 +8,7 @@ import socket
 import time
 import errno
 from rpyc.lib import safe_import
-from rpyc.lib.compat import select, select_error, BYTES_LITERAL, get_exc_errno, maxint
+from rpyc.lib.compat import poll, select_error, BYTES_LITERAL, get_exc_errno, maxint
 win32file = safe_import("win32file")
 win32pipe = safe_import("win32pipe")
 msvcrt = safe_import("msvcrt")
@@ -36,9 +36,11 @@ class Stream(object):
         """indicates whether the stream has data to read (within *timeout*
         seconds)"""
         try:
+            p = poll()   # from lib.compat, it may be a select object on non-Unix platforms
+            p.register(self.fileno(), "r")
             while True:
                 try:
-                    rl, _, _ = select([self], [], [], timeout)
+                    rl = p.poll(timeout)
                 except select_error:
                     ex = sys.exc_info()[1]
                     if ex.args[0] == errno.EINTR:
@@ -48,8 +50,10 @@ class Stream(object):
                 else:
                     break
         except ValueError:
-            # i get this some times: "ValueError: file descriptor cannot be a negative integer (-1)"
-            # let's translate it to select.error
+            # if the underlying call is a select(), then the following errors may happen:
+            # - "ValueError: filedescriptor cannot be a negative integer (-1)"
+            # - "ValueError: filedescriptor out of range in select()"
+            # let's translate them to select.error
             ex = sys.exc_info()[1]
             raise select_error(str(ex))
         return bool(rl)

@glpatcern glpatcern changed the title from core.Stream uses select() and is prone to its limitations to core.Stream uses select() and it is prone to its limitations Dec 10, 2015

glpatcern pushed a commit to glpatcern/rpyc that referenced this issue Dec 11, 2015

tomerfiliba added a commit that referenced this issue Feb 10, 2016

Merge pull request #183 from glpatcern/master
Fixed bug #182: Stream uses select() and it is prone to its limitations

glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016

glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016

glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016

Fixed usage of system-level poll().
The bug in compat.py was unveiled after fixing bug tomerfiliba#182 and observing spinning processes.
After that, the poll() call in server.py became too slow (1s instead of 1ms),
leading to timeouts. The new value allows for not spinning too fast yet
reacting quickly to new incoming connections.

glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016

Fixed usage of system-level poll().
The bug in compat.py was unveiled after fixing bug tomerfiliba#182 and
observing spinning processes. After that, the poll() call
in server.py became too slow (1s instead of 1ms),
leading to timeouts. The new value allows for not spinning
too fast yet reacting quickly to new incoming connections.

glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016

Fixed usage of system-level poll().
The bug in compat.py was unveiled after fixing bug tomerfiliba#182 and observing spinning processes. After that, the poll() call in server.py became too slow (1s instead of 1ms), leading to timeouts. The new value allows for not spinning too fast yet reacting quickly to new incoming connections.

yonie8storage added a commit to E8-Storage/rpyc that referenced this issue Dec 26, 2016

coldfix added a commit that referenced this issue May 29, 2017

Merge pull request #191 from glpatcern/master
Further fixes unveiled after fix for bug #182
@coldfix

This comment has been minimized.

Collaborator

coldfix commented Jul 26, 2017

Hi, can't test this since I get

Traceback (most recent call last):
  File "issues/i182.py", line 8, in <module>
OSError: [Errno 24] Too many open files: '/dev/null'

already while opening the files locally before doing any server related stuff.

According to the commit messages, the issue is supposed to be fixed, so closing this. Reopen if needed.

@coldfix coldfix closed this Jul 26, 2017

@glpatcern

This comment has been minimized.

Contributor

glpatcern commented Jul 26, 2017

Hi, it's true that we have this fix in production for over one year and we did not see any issue, but nevertheless can you send me please the test you're executing and the full stack trace?

@glpatcern

This comment has been minimized.

Contributor

glpatcern commented Jul 26, 2017

ops, sorry - you mean you executed my test case I guess. After >1 year I forgot I posted it. Now the 'Too many open files' is precisely the error you get before the patch, in principle you should not get it after it.

@coldfix

This comment has been minimized.

Collaborator

coldfix commented Jul 26, 2017

ops, sorry - you mean you executed my test case I guess. After >1 year I forgot I posted it.

:)

Now the 'Too many open files' is precisely the error you get before the patch, in principle you should not get it after it.

No, I mean the error I get is not due to rpyc. I get an error not during select but already when opening the files, here:

for i in range(1030):
  c.append(open('/dev/null'))

Therefore, I can't really verify whether the error in rpyc itself still exists if it were possible on my platform to open so many files. (archlinux, tried py2.7 and 3.6)

@glpatcern

This comment has been minimized.

Contributor

glpatcern commented Jul 26, 2017

OK, I see. This is most likely because recent platforms limit the number of open files to 1024 (I could reproduce your behavior on my desktop, while on our production platforms, py2.7.5 on CentOS 7, I can still run the full test). You can try and run ulimit -n unlimited prior to execute python and run the test: on my desktop the hard limit is 4096, plenty to validate the case.

@coldfix

This comment has been minimized.

Collaborator

coldfix commented Jul 26, 2017

Great, thanks. Seems to work!

@glpatcern

This comment has been minimized.

Contributor

glpatcern commented Jul 26, 2017

Cool, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment