Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection stuck in readyState 0 (CONNECTING) #1948

Closed
1 task done
nolimitdev opened this issue Sep 16, 2021 · 20 comments
Closed
1 task done

Connection stuck in readyState 0 (CONNECTING) #1948

nolimitdev opened this issue Sep 16, 2021 · 20 comments

Comments

@nolimitdev
Copy link

nolimitdev commented Sep 16, 2021

  • I've searched for any related issues and avoided creating a duplicate issue.

Description

I have never seen this and I use "ws" module for years but always in server mode (not client mode). But now there is a difference that I use it not on regular server/hosting as server with high speed internet connection but I use it on raspberry PI with NB-IoT hat modem (there is a limited internet speed; pings in about 400ms are common) as client and there are often disconnections (sometimes I have uptime 5 hours without disconnect but sometimes I have in about 10 disconnects per hour). I already have implemented heartbeat and reconnection (after 2 secs after close event). I send message (max. 20 bytes - values from sensor) each 5 minutes but now happens that it could not be send for more then 30 hours because of readyState 0 (CONNECTING). See log below. Of course on websocket server side this stuck client is 30 hours as offline. I know it seems to be rare what happened but question is - is it a bug? How long can be ws client in readyState 0 (CONNECTING)? I read here #1125 that in readyState 2 (CLOSING) it can be max. 30 second. But what about readyState 0 (CONNECTING)? Or should I implement some checking of readyState and if it is several minutes in readyState 0 (CONNECTING) I need to terminate connection which calls close event and fires my reconnect mechanism? Thanks.

ws version

7.5.1

Node.js Version

10.24.1

System

OS: Linux 5.10 Raspbian GNU/Linux 10 (buster) 10 (buster)
CPU: (1) arm ARMv6-compatible processor rev 7 (v6l)
Memory: 177.87 MB / 430.09 MB
Container: Yes
Shell: 5.0.3 - /bin/bash

Expected result

Connection can not be in readyState 0 (CONNECTING) so long (several days).

Actual result

Connection is in readyState 0 (CONNECTING) too long (for 30 hours and still continues, it seems I will have to stop nodejs script)

Attachments

2021-09-15 14:39:54 WS Init
2021-09-15 14:40:09 WS Connected
2021-09-15 14:42:12 WS Terminated (zombie)
2021-09-15 14:42:12 WS Message Sending Error: WebSocket is not open: readyState 3 (CLOSED)
2021-09-15 14:42:14 WS Init
2021-09-15 14:42:15 WS Connected
2021-09-15 14:42:52 WS Terminated (zombie)
2021-09-15 14:42:54 WS Init
2021-09-15 14:42:56 WS Connected
2021-09-15 14:47:12 WS Message Sent
2021-09-15 14:48:32 WS Terminated (zombie)
2021-09-15 14:48:34 WS Init
2021-09-15 14:48:34 WS Connected
2021-09-15 14:51:32 WS Terminated (zombie)
2021-09-15 14:51:34 WS Init
2021-09-15 14:52:12 WS Message Sending Error: WebSocket is not open: readyState 0 (CONNECTING)
2021-09-15 14:52:38 WS Connected
2021-09-15 14:54:52 WS Terminated (zombie)
2021-09-15 14:54:54 WS Init
2021-09-15 14:57:12 WS Message Sending Error: WebSocket is not open: readyState 0 (CONNECTING)
2021-09-15 15:02:12 WS Message Sending Error: WebSocket is not open: readyState 0 (CONNECTING)
...
and the same each 5 minutes;
btw internet via NB-IoT works well (I have ping.log);
I switched system from NB-IoT to LAN cable but still in readyState 0 (CONNECTING), so this stuck/hangs seems to be permanent
...
2021-09-16 10:42:27 WS Message Sending Error: WebSocket is not open: readyState 0 (CONNECTING)
2021-09-16 10:47:27 WS Message Sending Error: WebSocket is not open: readyState 0 (CONNECTING)
...
etc.

UPDATE: I restarted nodejs script and ws error event was fired now and logged...
2021-09-16 11:48:29 Received signal SIGTERM
2021-09-16 11:48:29 WS Error WebSocket was closed before the connection was established
2021-09-16 11:48:33 WS Init
2021-09-16 11:48:33 WS Connected

@lpinca
Copy link
Member

lpinca commented Sep 16, 2021

It does not seem to be a ws bug. It seems that the handshake request does not complete. Verify that the initial HTTP GET request body is received on the server and the response body is received on the client (see #1946 (comment)) You can also use the handshakeTimeout option to specify a custom timeout.

@nolimitdev
Copy link
Author

nolimitdev commented Sep 16, 2021

Sorry I do not mention that ws client is connecting to simple ws:// not wss://
Btw. and also ws client and ws server are the same version v7.5.1

@besworks
Copy link

You can also use the handshakeTimeout option to specify a custom timeout.

What is the default value of handshakeTimeout? As far as I can tell it's undefined if not specified in the WebSocket constructor. So, under typical usage this code never runs:

ws/lib/websocket.js

Lines 717 to 721 in 055949f

if (opts.timeout) {
req.on('timeout', () => {
abortHandshake(websocket, req, 'Opening handshake has timed out');
});
}

Is that true, or am I missing something? I also can't find where that timeout event is emitted.

@lpinca
Copy link
Member

lpinca commented Sep 16, 2021

Correct. There is no default. The client will wait indefinitely unless there are some other OS settings to close inactive sockets. See https://nodejs.org/api/http.html#http_http_request_url_options_callback.

@lpinca
Copy link
Member

lpinca commented Sep 22, 2021

I'm closing this. Discussion can continue if needed.

@lpinca lpinca closed this as completed Sep 22, 2021
@nolimitdev
Copy link
Author

nolimitdev commented Sep 23, 2021

So is code example bellow correct to avoid this issue?

var ws = require('ws');
var conn = new ws('...url...', { handshakeTimeout : 10000 });

How can I please simulate it works?

In piece of code shared by @besworks opts.timeout is just used as condition - where is value used? In script https://github.com/websockets/ws/blob/055949fd23b0d7f6a23ba9b9532b7834909df192/lib/websocket.js it seems it is not used.

Could you please define some default handshakeTimeout to avoid this CONNECTING stuck/hangs issue by default? Im asking because readyState 2 (CLOSING) has default timeout...

const closeTimeout = 30 * 1000;
and there is not possible stuck/hangs but readyState 0 (CONNECTING) has no default timeout...
timeout: undefined,
Is there please some reason for undefined opts.timeout? Thanks.

@nolimitdev
Copy link
Author

The last question.... when I define handshakeTimeout will be close event fired in case of timeout? Because in close event only I have reconnection mechanism.

@nolimitdev
Copy link
Author

I have simulated handshakeTimeout by setting low value e.g. 10 ms and it works and both error and close events are fired so my reconnect in close callback works. I think you should set default handshakeTimeout e.g. to 30 sec such as close timeout to avoid this issue ;)

@legendhimself
Copy link

legendhimself commented Feb 28, 2022

I have simulated handshakeTimeout by setting low value e.g. 10 ms and it works and both error and close events are fired so my reconnect in close callback works. I think you should set default handshakeTimeout e.g. to 30 sec such as close timeout to avoid this issue ;)

For me, it still doesn't work. It's stuck on CLOSING readyState. I am using ws v 8.5.0.
ws.close() never emits close event when the ws was asked to reconnect randomly , and according to my logic of code, I reconnect when the ws emits a close event. Since there are no emits It never reconnects. I tried ws.terminate() as well and the handshakeTimeout of 30s and nothing emits close event. To just confirm the readyState, after calling ws.close() I started to log the state every 5s and its indefinitely stuck on CLOSING

ws connection code

    const ws = (this.connection = WebSocket.create(gateway, wsQuery, { handshakeTimeout : 30000 }));
    // Set it to 10ms if you want to test the timeout, it does work but not when it hits 4009 randomly

interval to check if ready state ever changes from CLOSING

if (closeCode === 4009) {
      setInterval(() => {
        this.debug(`WS State [Interval 5s]: ${CONNECTION_STATE[this.connection?.readyState]}`);
      }, 5000);
    }

Any fixes? @lpinca @besworks

@nolimitdev
Copy link
Author

@legendhimslef I think that your issue does not belong to this my issue. This issue is about CONNECTING state stuck because of as found out it unfortunately has no default value which is bad I think and I suggested to define some default but no answer from maintainer.

But you have problem with CLOSING state. I think you are working something wrong because I have never had problem with stuck in CLOSING state and as mentioned above it have 30 sec timeout by default so close event should be fired and this state should not take longer then 30 sec. I have implemented terminating of zombie connections and reconnect mechanism is based on close event in many different projects and it works well long years - never happens that reconnect does not work. Maybe provide some minimum repo example and we will find some problem in your code.

@legendhimself
Copy link

legendhimself commented Mar 1, 2022

@legendhimslef I think that your issue does not belong to this my issue. This issue is about CONNECTING state stuck because of as found out it unfortunately has no default value which is bad I think and I suggested to define some default but no answer from maintainer.

But you have problem with CLOSING state. I think you are working something wrong because I have never had problem with stuck in CLOSING state and as mentioned above it have 30 sec timeout by default so close event should be fired and this state should not take longer then 30 sec. I have implemented terminating of zombie connections and reconnect mechanism is based on close event in many different projects and it works well long years - never happens that reconnect does not work. Maybe provide some minimum repo example and we will find some problem in your code.

I found this issue from CLOSING state issue, you mentioned this issues there soo yeah..
here is my issue.
discordjs/discord.js#7450 (comment)
the library code https://github.com/discordjs/discord.js/blob/stable/src/client/websocket/WebSocketShard.js#L704
The one I re-wrote and works https://sourceb.in/CWJ1RQI9IY
the pr: discordjs/discord.js#7581

WebSocket close does not work only on a specific closeCode [4009] ie session timeout and that too its rarely with others and frequent with mine.

@lpinca
Copy link
Member

lpinca commented Mar 1, 2022

Is the close code received in the close frame? If so it does not make sense because the only way to read it is from a listener of the 'close' event.

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 0 }, function () {
  const ws = new WebSocket(`ws://localhost:${wss.address().port}`);

  ws.on('close', function (code) {
    console.log(code);
    wss.close();
  });
});

wss.on('connection', function (ws) {
  ws.close(4009);
});

@imranbarbhuiya
Copy link

Is the close code received in the close frame? If so it does not make sense because the only way to read it is from a listener of the 'close' event.

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 0 }, function () {
  const ws = new WebSocket(`ws://localhost:${wss.address().port}`);

  ws.on('close', function (code) {
    console.log(code);
    wss.close();
  });
});

wss.on('connection', function (ws) {
  ws.close(4009);
});

If I'm not wrong the ws.onClose and ws.on("close", cb) are the same.

In our case (@legendhimslef ) the onClose event isn't getting emitted. The readyState is stuck at CLOSING state (30s+).

here's our ws implementation by djs https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L260.

We've tried both close(4009) and terminate() but it's not working.

@legendhimself
Copy link

legendhimself commented Mar 1, 2022

Is the close code received in the close frame? If so it does not make sense because the only way to read it is from a listener of the 'close' event.

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 0 }, function () {
  const ws = new WebSocket(`ws://localhost:${wss.address().port}`);

  ws.on('close', function (code) {
    console.log(code);
    wss.close();
  });
});

wss.on('connection', function (ws) {
  ws.close(4009);
});

If I'm not wrong the ws.onClose and ws.on("close", cb) are the same.

In our case (@legendhimslef ) the onClose event isn't getting emitted. The readyState is stuck at CLOSING state (30s+).

here's our ws implementation by djs https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L260.

We've tried both close(4009) and terminate() but it's not working.

@lpinca
Yea, So we now closing it using

ws._socket.destroy();
ws.emitClose();

its not the best solution but we couldn't find any working solution So we tried this and it works. We only do it when the readyState is stuck at CLOSING for more than 30s.

@lpinca
Copy link
Member

lpinca commented Mar 1, 2022

ws.terminate() calls socket.destroy(). The websocket then emits the 'close' event after the socket emits the 'close' event. Make sure listeners are not removed before the 'close' event is emitted.

@imranbarbhuiya
Copy link

ws.terminate() calls socket.destroy(). The websocket then emits the 'close' event after the socket emits the 'close' event. Make sure listeners are not removed before the 'close' event is emitted.

If you check https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L567
and
https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L720

we aren't removing any listeners when getting 4009 code. We are removing listeners only in
https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L768

and this method isn't getting used in 4009 code.

@lpinca
Copy link
Member

lpinca commented Mar 1, 2022

If you write a reproducible test case using only ws (no external dependencies), then I can try to take a look at it. I'm sorry but I do not have the time do debug external code.

@legendhimself
Copy link

legendhimself commented Mar 1, 2022

ws.terminate() calls socket.destroy(). The websocket then emits the 'close' event after the socket emits the 'close' event. Make sure listeners are not removed before the 'close' event is emitted.

we only removeListeners for below events. I can assure you that none of the event were emitted which could have removed the listeners, cleanup() removed the listeners.
image

Here is a log after calling ws.close() and then ws.terminate()
image
Here onSetIntervals of 5s we checked for the readyState and its indefinetaly on stuck on CLOSING for 10mins+ until I restarted the process.

and what do you think about my fixes by directly using socket.destroy() then emitClose(). So far for 24hrs, it reconnected back without a sweat on 4009.

@lpinca
Copy link
Member

lpinca commented Mar 1, 2022

I have no idea, but doesn't this https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L725 removes the listeners before the the 'close' event is emitted? I mean can't WebSocketShard#destroy() be called while the websocket is in "CLOSING" state?

Again, please try to reproduce the issue using only ws, otherwise I can't be of any help.

@legendhimself
Copy link

legendhimself commented Mar 1, 2022

I have no idea, but doesn't this https://github.com/discordjs/discord.js/blob/988a51b7641f8b33cc9387664605ddc02134859d/src/client/websocket/WebSocketShard.js#L725 remove the listeners before the the 'close' event is emitted? I mean can't WebSocketShard#destroy() be called while the websocket is in "CLOSING" state?

Again, please try to reproduce the issue using only ws, otherwise I can't be of any help.

Yes but that is in the else block [connection not open] and that is not executed in my case, it executes the if block ie if the connection object exists and is probably opened, and then tries to close it in the next few steps . Thank you very much for the replies though. I'll try to recreate it, But on my side ws. close() won't emit the close event, I added console logs in both ws.close() and ws.terminate() and it doesn't seem it emits the close event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants