Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket disconnect and reconnect every 10 minutes #3258

Open
Reinering opened this issue Apr 24, 2023 · 3 comments · May be fixed by #3376
Open

websocket disconnect and reconnect every 10 minutes #3258

Reinering opened this issue Apr 24, 2023 · 3 comments · May be fixed by #3376

Comments

@Reinering
Copy link

code:
Application(urlpatterns,
websocket_ping_interval=20,
websocket_ping_timeout=60)

Connection is disconnect and reconnect every 10 minutes
image

If:
delete websocket_ping_interval and websocket_ping_timeout

The connection has been normal

@bdarnell
Copy link
Member

bdarnell commented May 2, 2023

Hmm, I can't think of anything in Tornado that would close a connection every 10 minutes (which suggests to me the possibility of proxy or NAT issues). What did the traffic look like in between these close packets? Were pings being sent back and forth ever 20 seconds as configured?

@Reinering
Copy link
Author

happening every 20 seconds
image

LAN without nat and proxy

@oliver-sanders
Copy link

I have also encountered this, I think there may be an issue with the way ping/pong responses are timed.

E.G, when I use this diff:

diff --git a/tornado/websocket.py b/tornado/websocket.py
index fbfd7008..0de0a439 100644
--- a/tornado/websocket.py
+++ b/tornado/websocket.py
@@ -11,6 +11,8 @@ defined in `RFC 6455 <http://tools.ietf.org/html/rfc6455>`_.
    Removed support for the draft 76 protocol version.
 """
 
+from time import time
+
 import abc
 import asyncio
 import base64
@@ -1230,6 +1232,7 @@ class WebSocketProtocol13(WebSocketProtocol):
             self.close(self.close_code)
         elif opcode == 0x9:
             # Ping
+            print(f':ping: {time()}')
             try:
                 self._write_frame(True, 0xA, data)
             except StreamClosedError:
@@ -1237,6 +1240,7 @@ class WebSocketProtocol13(WebSocketProtocol):
             self._run_callback(self.handler.on_ping, data)
         elif opcode == 0xA:
             # Pong
+            print(f':pong: {time()}')
             self.last_pong = IOLoop.current().time()
             return self._run_callback(self.handler.on_pong, data)
         else:
@@ -1326,13 +1330,16 @@ class WebSocketProtocol13(WebSocketProtocol):
         since_last_ping = now - self.last_ping
         assert self.ping_interval is not None
         assert self.ping_timeout is not None
+        print(f'{since_last_ping=} {since_last_pong=} {self.ping_interval=} {self.ping_timeout=}')
         if (
             since_last_ping < 2 * self.ping_interval
             and since_last_pong > self.ping_timeout
         ):
+            print('CLOSE')
             self.close()
             return
 
+        print(f':ping: {time()}')
         self.write_ping(b"")
         self.last_ping = now

With ping_interval = ping_timeout = 10 I get the following output:

since_last_ping=10.00104022026062 since_last_pong=10.00104022026062 self.ping_interval=10 self.ping_timeout=10
CLOSE
since_last_ping=10.001275062561035 since_last_pong=10.001275062561035 self.ping_interval=10 self.ping_timeout=10
CLOSE
since_last_ping=10.001463174819946 since_last_pong=10.001463174819946 self.ping_interval=10 self.ping_timeout=10
CLOSE

The connection is closed before the first ping is even sent. When the connection is re-established it does the same thing again.

I think the startup logic is causing this. When I repeat this experiment with the following diff:

diff --git a/tornado/websocket.py b/tornado/websocket.py
index 0de0a439..9e4230ba 100644
--- a/tornado/websocket.py
+++ b/tornado/websocket.py
@@ -1307,7 +1307,7 @@ class WebSocketProtocol13(WebSocketProtocol):
         """Start sending periodic pings to keep the connection alive"""
         assert self.ping_interval is not None
         if self.ping_interval > 0:
-            self.last_ping = self.last_pong = IOLoop.current().time()
+            self.last_ping = self.last_pong = IOLoop.current().time() + self.ping_interval  # we won't get a pong until *after* the first ping is sent
             self.ping_callback = PeriodicCallback(
                 self.periodic_ping, self.ping_interval * 1000
             )

I get different results:

since_last_ping=0.0011792182922363281 since_last_pong=0.0011792182922363281 self.ping_interval=10 self.ping_timeout=10
:ping: 1714471603.2959082
:pong: 1714471603.2985868
since_last_ping=9.999458312988281 since_last_pong=9.996686458587646 self.ping_interval=10 self.ping_timeout=10
:ping: 1714471613.2953525
:pong: 1714471613.2958899
since_last_ping=9.999933958053589 since_last_pong=9.999330759048462 self.ping_interval=10 self.ping_timeout=10
:ping: 1714471623.2952828
:pong: 1714471623.2957668
since_last_ping=10.000473499298096 since_last_pong=9.999927759170532 self.ping_interval=10 self.ping_timeout=10
:ping: 1714471633.295772
:pong: 1714471633.2964153
since_last_ping=10.001248121261597 since_last_pong=10.000519037246704 self.ping_interval=10 self.ping_timeout=10
CLOSE

Now, we get ping/pong responses as expected, however, the connection is eventually closed by the timeout logic despite the pong being received a fraction of a second within the ping (well within the timeout).

I think the way ping/pong timings are worked out is a bit off.

oliver-sanders added a commit to oliver-sanders/tornado that referenced this issue Apr 30, 2024
* Closes tornadoweb#3258
* Fixes an issue with the calculation of ping timeout interval that
  could cause connections to be erroneously timed out and closed
  from the server end.
oliver-sanders added a commit to oliver-sanders/tornado that referenced this issue Apr 30, 2024
* Closes tornadoweb#3258
* Fixes an issue with the calculation of ping timeout interval that
  could cause connections to be erroneously timed out and closed
  from the server end.
oliver-sanders added a commit to oliver-sanders/tornado that referenced this issue Apr 30, 2024
* Closes tornadoweb#3258
* Fixes an issue with the calculation of ping timeout interval that
  could cause connections to be erroneously timed out and closed
  from the server end.
oliver-sanders added a commit to oliver-sanders/tornado that referenced this issue Apr 30, 2024
* Closes tornadoweb#3258
* Fixes an issue with the calculation of ping timeout interval that
  could cause connections to be erroneously timed out and closed
  from the server end.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants