Gets disconnected when receiving HIGH number of msgs per second #786
I don't see any error messages related to your issue, which limits how much help I can provide. To handle large numbers of connections on Linux, which is probably what is used if redis is involved, you should make sure the value of
Are you setting a ping interval and ping timeout in your code? Can you share the lines of code where you set these values? The error looks like the message from the server is delayed and the ping/pong timeout ends the connection. The delay might be happening on your local system if it cannot keep up with the high amount of server traffic sent to it. Modifying the ping/pong timeout to allow for a greater delay should help.
Yes, 21 and 20 seconds (interval and timeout respectively). These are set in the library, HERE.
Yes, that's what I mentioned in the title itself. For a test, I just printed whatever message came in inside the message handler. I don't think there is anything more lightweight to do with the message, so I tried nothing but a print statement, and it still disconnects with the same error.
What values do you recommend instead of 21 and 20 seconds?
Follow-up: I tried the rel approach, reusing the example linked above, and it appeared to time out after a while when no network was available:

```
Traceback (most recent call last):
  File "/home/pssolanki/projects/tests/lib_test/test.py", line 35, in <module>
    rel.dispatch()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/rel.py", line 205, in dispatch
    registrar.dispatch()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/registrar.py", line 72, in dispatch
    if not self.loop():
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/registrar.py", line 81, in loop
    e = self.check_events()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/registrar.py", line 236, in check_events
    self.handle_error(fd)
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/registrar.py", line 131, in handle_error
    self.callback('read', fd)
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/registrar.py", line 125, in callback
    self.events[etype][fd].callback()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/rel/listener.py", line 108, in callback
    if not self.cb(*self.args) and not self.persist and self.active:
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_app.py", line 335, in read
    op_code, frame = self.sock.recv_data_frame(True)
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_core.py", line 396, in recv_data_frame
    frame = self.recv_frame()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_core.py", line 435, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_abnf.py", line 337, in recv_frame
    self.recv_header()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_abnf.py", line 293, in recv_header
    header = self.recv_strict(2)
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_abnf.py", line 372, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_core.py", line 519, in _recv
    return recv(self.sock, bufsize)
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_socket.py", line 113, in recv
    bytes_ = _recv()
  File "/home/pssolanki/projects/openSource/polygon/venv/lib/python3.8/site-packages/websocket/_socket.py", line 90, in _recv
    return sock.recv(bufsize)
  File "/usr/lib/python3.8/ssl.py", line 1226, in recv
    return self.read(buflen)
  File "/usr/lib/python3.8/ssl.py", line 1101, in read
    return self._sslobj.read(len)
TimeoutError: [Errno 110] Connection timed out
```

This is not very critical to me at the moment though.
@bubbleboy14 can probably tell us whether this error is coming from websocket-client or rel - I can't tell at first glance. On a different topic, a suggestion for bubbleboy14: if you are willing to migrate or clone the rel repository to GitHub, you would probably 1. get more feedback on the project and 2. find it easier to improve documentation and take PRs. I'd like to add more documentation around using rel with websocket-client, but Google Code isn't used much anymore.
I don't know what the server timeout is for your APIs, but one idea is to set the ping timeout values closer to that. For example, ping_interval=50 and ping_timeout=49 if the server timeout is 60 seconds. If the server timeout is 5 minutes, try ping_interval=290 and ping_timeout=280, etc. If you're dealing with large streams of network data, your network connection could be the bottleneck. In that case, reducing the amount of websocket data or improving your system's ability to handle large volumes of network data are solutions that are outside the scope of this library.
Hey @engn33r - ah yes, indeed! There's a rel Google Code project from waaaay back in the day -- the GitHub repo just isn't called rel for some reason -- don't remember why, but I called it registeredeventlistener, here: https://github.com/bubbleboy14/registeredeventlistener

Anyway, @pssolanki111 - the traceback points to sock.recv(bufsize) on line 90 of _socket.py: https://github.com/websocket-client/websocket-client/blob/master/websocket/_socket.py

So I think the TimeoutError (hit on line 1101 of ssl.py) is being caught on line 93 of _socket.py (in _recv(), called on line 113), and then raised anyway due to line 97. Line 97 seems a little off, right? Assuming errno.EAGAIN and errno.EWOULDBLOCK are two different values, line 97 would evaluate to true (and therefore raise the exception) no matter what - maybe that should be an "and" instead of "or"? I mean, I think it's correct to raise it in this case, but I still think the logic of line 97 should be revisited.

Also, it looks like the error raised on line 98 is not caught by lines 114 or 117, possibly because line 98 simply reads "raise"? Not sure, but this may need to be replaced with "raise exc", possibly.

It should also be noted that the error in the screenshot correctly reported a WebSocketTimeoutException, which in that (non-rel) case is raised on line 366 of _app.py. I must confess that the check() function that contains line 366 is not utilized by the rel dispatcher.

So I can see three ways to correct this:

option 1) _socket.py
option 2) your app -> lib_test/test.py
option 3) rel integration

So I like option 1. Thoughts?
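A minimal sketch of what option 1 might look like: the names mirror websocket/_socket.py, but this is illustrative, not the shipped patch, and the exception class is a stand-in for the library's own:

```python
import socket

# Stand-in for websocket._exceptions.WebSocketTimeoutException; inside the
# library, the real class would be imported instead of defined here.
class WebSocketTimeoutException(Exception):
    pass

def recv(sock, bufsize):
    """Option-1 style wrapper: translate the socket-level timeout into
    the library's own exception type, matching what the non-rel code
    path (check() in _app.py) already reports."""
    try:
        return sock.recv(bufsize)
    except (socket.timeout, TimeoutError) as exc:
        # preserve the message so callers can distinguish a ping/pong
        # timeout from other socket errors
        raise WebSocketTimeoutException(str(exc))
```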
Thank you bubbleboy14! I forgot there was a GitHub repo and the search engine showed me the old Google Code link. I added a link in this project's README with commit e49e0e8 so it is easier to find in the future. I will respond to the other details you shared later.
Awesome, thanks @engn33r! I should note that although it's possible that the rel dispatcher facilitates a websocket-client reconnect mechanism, rel itself provides no such mechanism - it looked like you were hitting the TimeoutError in Python's core ssl library, a socket-level timeout that would understandably fire at some point (we should probs handle this case by raising the correct WebSocketTimeoutException, as outlined in option 1 above).

The big selling point of rel (in my book / with regard to websocket-client) is that you can use it to run as many WebSocketApps as you want w/o resorting to (complications such as) threads. There's a pretty decent example of what this looks like here:

Anyway, with regard to the auto-reconnect, the last time I tested this (by turning my wifi off and on), it worked fine. I'd be interested (@pssolanki111) in hearing how well things work on your end - LMK if you think any of the options mentioned above are worth pursuing!

Also, I would further note that a socket-level TimeoutError isn't necessarily something we'd want to "automatically" handle, beyond notifying the application (via a WebSocketTimeoutException, or possibly supporting a user-defined callback for it). I can imagine wanting to handle it in a variety of different ways, depending on the particulars of the app: perhaps an admin should be notified; perhaps reconnects should be attempted (with exponential backoff) up to a certain point; and after that point, should some other logic be triggered? Seems to me it largely depends on the app. But of course, I'm for considering all options. Thoughts?
Hey @bubbleboy14, highly appreciate the inputs. Didn't know you maintained rel.
So I tested this in a few ways.
Ah, I see. I made a couple of tweaks to address these issues - check it out: I adjusted lines 95-97 in _socket.py, which actually may not be related, but is nevertheless (I think) a logical improvement (correct me if I'm wrong @engn33r). Also, _socket.recv() now catches TimeoutErrors, which are now expressed (for consistency) as WebSocketTimeoutExceptions.

On the custom dispatcher side of things, I also added a WrappedDispatcher to _app.py, which incorporates support for ping_interval/ping_timeout (by way of the check() function in _app), restores pyevent compatibility, and simplifies rel integration - rel.safe_read() is no longer required. Also, I noticed that 4 checks failed on the pull request -- do you know what that's about @engn33r? I tested in the first two (of the three) ways you mentioned @pssolanki111:

Anyway, I think that in the second and third cases, you'll now get a WebSocketTimeoutException, which your app can use to trigger whatever logic is appropriate -- switch to a different websocket feed, notify admins, attempt to resume after some period, etc. LMK what yall think!
I think the changes sound good. Some checks were unsuccessful on the pull request, but the changes seem good.
Thanks for the detailed analysis @bubbleboy14! The git history shows that the logic bug you found on line 97 of _socket.py has been there since 2011, so very nice catch. My understanding of the dispatcher code isn't as thorough as it should be, so thanks for the PR introducing the WrappedDispatcher. The failing checks were mostly due to some public WebSocket endpoint URLs used in tests going offline, so I replaced those. The only failing check remaining is the code coverage check, which can be handled later. One point you bring up which should get fixed later is the bare raise.
Right on! Hey @engn33r, I should mention that the docs should be updated in a couple places: https://github.com/websocket-client/websocket-client#long-lived-connection and https://websocket-client.readthedocs.io/en/latest/examples.html#dispatching-multiple-websocketapps

Basically, "rel.safe_read()" is no longer necessary - now that we have the WrappedDispatcher, rel and pyevent will work fine without it.
Good catch again, I removed the rel.safe_read() references from the docs.
Hey there,
Appreciate all the work creating this amazing client implementation.
I used this to write the callback-based streaming client in my polygon API package and it has been doing very good work.
A few days back, one of the users of my lib reported disconnection issues when the number of messages being received per second was high. I made sure they weren't doing any heavy lifting in the handler itself; in fact, they were just pushing messages to a redis queue from threads.
Now, on to some questions:
What could be the reasons it's not able to process the messages quickly enough, and what could be done to resolve it?
Further, have we found a reliable and robust way to reconnect in case it does disconnect? @engn33r
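On the first question, one common pattern is to keep on_message trivially cheap and hand each message to a worker thread via a queue, so the socket read loop never stalls. A stdlib-only sketch; the redis push mentioned above would replace the processing stub in worker():

```python
import queue
import threading

# Bounded buffer between the websocket callback and the worker thread.
messages = queue.Queue(maxsize=100_000)

def on_message(ws, message):
    try:
        messages.put_nowait(message)  # never block the socket read loop
    except queue.Full:
        pass  # dropping (or counting) beats stalling the connection

def worker():
    while True:
        msg = messages.get()
        if msg is None:  # sentinel: shut down
            break
        # heavy processing (e.g. the redis push) goes here
        messages.task_done()

threading.Thread(target=worker, daemon=True).start()
```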
Thanks : )