Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core.Stream uses select() and it is prone to its limitations #182

Closed
glpatcern opened this issue Dec 10, 2015 · 7 comments
Closed

core.Stream uses select() and it is prone to its limitations #182

glpatcern opened this issue Dec 10, 2015 · 7 comments

Comments

@glpatcern
Copy link
Contributor

@glpatcern glpatcern commented Dec 10, 2015

The Stream class is supposed to use the lib.compat layer, in order to make use of poll() or select() depending on the OS. But the Stream.poll() method actually uses the builtin select() function straight, thus being prone to the select limitation of not handling file descriptors higher than 1024.

To prove this, you can try the following code (on Linux):

import rpyc
import traceback
import time

# make sure we open enough fd and keep them to go beyond 1024
c = []
for i in range(1030):
  c.append(open('/dev/null'))

# establish an RPyC connection
rpcconn = rpyc.connect('yourserver.org', YOURPORT)
remote_root_ref = rpcconn.async_request(rpyc.core.consts.HANDLE_GETROOT)
remote_root_ref.set_expiry(1.0)
c.append(rpcconn)
try:
  # use it - this calls Stream.poll(), which actually is a select() ...
  rpcconn._remote_root = remote_root_ref.value
  print 'Successfully connected, waiting 10 secs before terminating...'
  time.sleep(10)
except Exception, e:
  print i, e
  traceback.print_exc()

The stack trace includes the following error:

error: filedescriptor out of range in select()

The following is a simple patch to fix this bug, can you please review it? Thanks!

diff --git a/rpyc/core/stream.py b/rpyc/core/stream.py
index eed2d87..1e680a7 100644
--- a/rpyc/core/stream.py
+++ b/rpyc/core/stream.py
@@ -8,7 +8,7 @@ import socket
 import time
 import errno
 from rpyc.lib import safe_import
-from rpyc.lib.compat import select, select_error, BYTES_LITERAL, get_exc_errno, maxint
+from rpyc.lib.compat import poll, select_error, BYTES_LITERAL, get_exc_errno, maxint
 win32file = safe_import("win32file")
 win32pipe = safe_import("win32pipe")
 msvcrt = safe_import("msvcrt")
@@ -36,9 +36,11 @@ class Stream(object):
         """indicates whether the stream has data to read (within *timeout*
         seconds)"""
         try:
+            p = poll()   # from lib.compat, it may be a select object on non-Unix platforms
+            p.register(self.fileno(), "r")
             while True:
                 try:
-                    rl, _, _ = select([self], [], [], timeout)
+                    rl = p.poll(timeout)
                 except select_error:
                     ex = sys.exc_info()[1]
                     if ex.args[0] == errno.EINTR:
@@ -48,8 +50,10 @@ class Stream(object):
                 else:
                     break
         except ValueError:
-            # i get this some times: "ValueError: file descriptor cannot be a negative integer (-1)"
-            # let's translate it to select.error
+            # if the underlying call is a select(), then the following errors may happen:
+            # - "ValueError: filedescriptor cannot be a negative integer (-1)"
+            # - "ValueError: filedescriptor out of range in select()"
+            # let's translate them to select.error
             ex = sys.exc_info()[1]
             raise select_error(str(ex))
         return bool(rl)
@glpatcern glpatcern changed the title core.Stream uses select() and is prone to its limitations core.Stream uses select() and it is prone to its limitations Dec 10, 2015
glpatcern pushed a commit to glpatcern/rpyc that referenced this issue Dec 11, 2015
tomerfiliba added a commit that referenced this issue Feb 10, 2016
Fixed bug #182: Stream uses select() and it is prone to its limitations
glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016
glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016
glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016
The bug in compat.py was unveiled after fixing bug tomerfiliba-org#182 and observing spinning processes.
After that, the poll() call in server.py became too slow (1s instead of 1ms),
leading to timeouts. The new value allows for not spinning too fast yet
reacting quickly to new incoming connections.
glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016
The bug in compat.py was unveiled after fixing bug tomerfiliba-org#182 and
observing spinning processes. After that, the poll() call
in server.py became too slow (1s instead of 1ms),
leading to timeouts. The new value allows for not spinning
too fast yet reacting quickly to new incoming connections.
glpatcern added a commit to glpatcern/rpyc that referenced this issue Mar 31, 2016
The bug in compat.py was unveiled after fixing bug tomerfiliba-org#182 and observing spinning processes. After that, the poll() call in server.py became too slow (1s instead of 1ms), leading to timeouts. The new value allows for not spinning too fast yet reacting quickly to new incoming connections.
coldfix added a commit that referenced this issue May 29, 2017
Further fixes unveiled after fix for bug #182
@coldfix
Copy link
Contributor

@coldfix coldfix commented Jul 26, 2017

Hi, can't test this since I get

Traceback (most recent call last):
  File "issues/i182.py", line 8, in <module>
OSError: [Errno 24] Too many open files: '/dev/null'

already while opening the files locally before doing any server related stuff.

According to the commit messages, the issue is supposed to be fixed, so closing this. Reopen if needed.

@coldfix coldfix closed this Jul 26, 2017
@glpatcern
Copy link
Contributor Author

@glpatcern glpatcern commented Jul 26, 2017

Hi, it's true that we have this fix in production for over one year and we did not see any issue, but nevertheless can you send me please the test you're executing and the full stack trace?

@glpatcern
Copy link
Contributor Author

@glpatcern glpatcern commented Jul 26, 2017

ops, sorry - you mean you executed my test case I guess. After >1 year I forgot I posted it. Now the 'Too many open files' is precisely the error you get before the patch, in principle you should not get it after it.

@coldfix
Copy link
Contributor

@coldfix coldfix commented Jul 26, 2017

ops, sorry - you mean you executed my test case I guess. After >1 year I forgot I posted it.

:)

Now the 'Too many open files' is precisely the error you get before the patch, in principle you should not get it after it.

No, I mean the error I get is not due to rpyc. I get an error not during select but already when opening the files, here:

for i in range(1030):
  c.append(open('/dev/null'))

Therefore, I can't really verify whether the error in rpyc itself still exists if it were possible on my platform to open so many files. (archlinux, tried py2.7 and 3.6)

@glpatcern
Copy link
Contributor Author

@glpatcern glpatcern commented Jul 26, 2017

OK, I see. This is most likely because recent platforms limit the number of open files to 1024 (I could reproduce your behavior on my desktop, while on our production platforms, py2.7.5 on CentOS 7, I can still run the full test). You can try and run ulimit -n unlimited prior to execute python and run the test: on my desktop the hard limit is 4096, plenty to validate the case.

@coldfix
Copy link
Contributor

@coldfix coldfix commented Jul 26, 2017

Great, thanks. Seems to work!

@glpatcern
Copy link
Contributor Author

@glpatcern glpatcern commented Jul 26, 2017

Cool, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants