Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if this[kWebSocket] = undefined , then must check it exists else where #1606

Closed
wants to merge 1 commit into from

Conversation

amark
Copy link

@amark amark commented Jul 12, 2019

ws is fantastic, and devs/users of my Open Source library have been using my library with ws in it for years in production, they currently have like ~14M monthly active users! None of these devs/users are paying me commercially though, so this patch is on my own OSS time for their sake, but I've been very thankful that ws is also OSS like my library is, it is very much appreciated.

I'm issuing this patch because I cannot catch these errors that happen when I (1) run a websocket server on localhost:8765, (2) connect browser to websocket server (3) connect websocket server to a non-existent localhost:8080 & retry every 2 seconds (4) refresh browser & reconnect to localhost:8765 . Sometimes I may need a dgram socket running in the same instance, but it also crashes when I disable that (though maybe flooding the router with a bunch of UDP packets may be necessary to replicate).

node_modules/ws/lib/websocket.js:837
  websocket.readyState = WebSocket.CLOSING;
                       ^
TypeError: Cannot set property 'readyState' of undefined
    at Socket.socketOnClose

Either way, setting this[kWebSocket] = undefined seems dangerous (when most of the listeners start with a this[kWebSocket]) unless it is impossible for the listeners to be called again (which clearly is not the case, since I'm getting a crash where this[kWebSocket] is undefined which can only be possible if socketOnClose was called (and clearly this.removeListener('close', socketOnClose); is not working correctly).

I suspect that somehow the listeners are still active, so that should separately be fixed (but I'm not sure how), but the if(!websocket){ return } are safeguards. If the rest of the code works as intended, these lines will not break anything. If the code does not work as intended, these lines at least prevent an un-catch-able crash on production servers, which is really really important for me - so I hope this can be pulled & published as the solution, or pulled & published until another fix comes along.

Thanks again!

`ws` is fantastic, and devs/users of my Open Source library have been using my library with `ws` in it for years in production, they currently have like ~14M monthly active users! None of these devs/users are paying me commercially though, so this patch is on my own OSS time for their sake, but I've been very thankful that `ws` is also OSS like my library is, it is very much appreciated.

I'm issuing this patch because I cannot catch these errors that happen when I (1) run a websocket server on localhost:8765, (2) connect browser to websocket server (3) connect websocket server to a non-existent localhost:8080 & retry every 2 seconds (4) refresh browser & reconnect to localhost:8765 . Sometimes I may need a `dgram` socket running in the same instance, but it also crashes when I disable that (though maybe flooding the router with a bunch of UDP packets may be necessary to replicate).

```
node_modules/ws/lib/websocket.js:837
  websocket.readyState = WebSocket.CLOSING;
                       ^
TypeError: Cannot set property 'readyState' of undefined
    at Socket.socketOnClose
```

Either way, setting `this[kWebSocket] = undefined` seems dangerous (when most of the listeners start with a `this[kWebSocket]`) unless it is **impossible** for the listeners to be called again (which clearly is not the case, since I'm getting a crash where `this[kWebSocket]` is `undefined` which can only be possible if `socketOnClose` was called (and clearly `this.removeListener('close', socketOnClose);` is not working correctly).

I suspect that somehow the listeners are still active, so that should separately be fixed (but I'm not sure how), but the `if(!websocket){ return }` are safeguards. If the rest of the code works as intended, these lines will not break anything. If the code does not work as intended, these lines at least **prevent** an un-catch-able crash on production servers, which is really really important for me - so I hope this can be pulled & published as the solution, or pulled & published until another fix comes along.

Thanks again!
@amark
Copy link
Author

amark commented Jul 12, 2019

It does not seem to like the code format - I leave it to you to run whatever prettifier you want on the code/patch (I have no clue what configs you have. Why doesn't it just re-format automatically? Seems silly to leave it up to clumsy humans).

@lpinca
Copy link
Member

lpinca commented Jul 12, 2019

I'm not going to merge this without a way to reproduce the issue. This is masking a possible bug either in ws, or Node.js, not fixing it.

@amark
Copy link
Author

amark commented Jul 12, 2019

I tried to explain in my post why this isn't a mask, would you be willing to re-read it again? Thanks!

It is not a matter of a mask, the problem is doing a this[kWebSocket] = undefined in code that frequently uses this[kWebSocket] without checking if it exists is dangerous. @lpinca for instance, would you consider this code to be buggy?

var object = {foo: {bar: 1}};
setTimeout(function(){
  object.foo = undefined;
}, Math.random() * 1000);

setTimeout(function(){
  var a = object.foo.bar * 2;
}, Math.random() * 1000);

Figuring out why removeListener seems to not be working should be handled by some other unit test (that is what the bug is, and my code is not masking that or preventing it). My code would not mask any actual unit test for this. All it does is provide safety type/existence checks in case anything else fails or a race condition happens.

To reproduce, please try steps (1) ~ (4).

Would you be willing to put yourself into my shoes? My OSS user's are using this in production with millions of users (I'm not getting paid for this, and I assume you aren't either, so you can at least empathize?), and they need a fix that will work now/ASAP - I am doing this for them, I took the time to handle it, and I'm certainly not an expert in your code - I think these checks should exist even if the other unit tests pass, but I'm totally fine if you delete my code after that, but until then, this is crashing stuff and it currently cannot be caught by me on my end.

Or at least could you make this catch-able in the meanwhile? So it won't crash servers?

Thanks so much.

@lpinca
Copy link
Member

lpinca commented Jul 12, 2019

It is not a matter of a mask, the problem is doing a this[kWebSocket] = undefined in code that frequently uses this[kWebSocket] without checking if it exists is dangerous.

It is done by design. None of the listeners for which you added the guard should be called after this[kWebSocket] = undefined is set. If they do it's a bug and this patch is masking it.

The 'close' event on the {net,tls}.Socket should not be emitted more than once. Furthermore we remove the listener itself after it is called. So the real question is why the socketOnClose listener is called more than once? What is causing it?

Would you be willing to put yourself into my shoes? My OSS user's are using this in production with millions of users (I'm not getting paid for this, and I assume you aren't either, so you can at least empathize?), and they need a fix that will work now/ASAP

I'm also not payed to maintain ws, and this is completely irrelevant. gun.js is only one of the thousands of projects that use ws.

I think these checks should exist even if the other unit tests pass

No, read above.

@amark
Copy link
Author

amark commented Jul 12, 2019

None of the listeners for which you added the guard should be called

@lpinca , yet it is. Isn't a bug defined as "something that happens when it should not"?

thousands of projects that use ws.

Right, this benefits all of them.

I'm also not payed to maintain ws, and this is completely irrelevant.

Exactly, which is why I made the patch for you!

I don't expect you have the time to work on this bug.

I don't have time to figure out why removeListener fails (which again: my code doesn't mask this).

Now this is just a lose-lose-lose situation. You have an issue left open, I + others are stuck with a server that crashes, and who else is gonna fix except you or me (but my fix is not allowed)?

Sigh. I tried. Rather than just complaining. Have a good weekend. Let me know if/when/ever you'll think this'll get fixed. Bye.

@lpinca
Copy link
Member

lpinca commented Jul 12, 2019

Let me know if/when/ever you'll think this'll get fixed.

Probably never if the underlying issue is not identified and explained. I don't want to repeat myself but in my opinion this is a blind fix that does not actually fixes the underlying issue. It masks it.

We received only one bug report with the same stack trace and that user was using guess what? gun.js #1564. I have no clue why the issue happens only with gun.js.

If you really need this patch you can monkey patch ws in gun.js or temporarily fork ws and use it until a proper fix is found.

I can't merge this, sorry.

@amark
Copy link
Author

amark commented Jul 12, 2019

Did you see @nooop3 's comment #1564 (comment) on that thread?

@lpinca
Copy link
Member

lpinca commented Jul 12, 2019

Yes, but I don't understand it? What is its meaning?

@nooop3
Copy link

nooop3 commented Jul 13, 2019 via email

@jadbox
Copy link

jadbox commented Jul 13, 2019

@amark @lpinca
Maybe the reason why it seems that removeListener is not working is because the same listener is actually being added multiple times (which is allowable by node)?

"removeListener will remove, at most, one instance of a listener from the listener array. If any single listener has been added multiple times to the listener array for the specified eventName, then removeListener must be called multiple times to remove each instance"

@lpinca
Copy link
Member

lpinca commented Jul 13, 2019

It is not added multiple times, at least not by ws.

socket.on('close', socketOnClose);

@jadbox
Copy link

jadbox commented Jul 13, 2019

Is it possible that req.on('upgrade', (res, socket, head) => { is being triggered multiple times, which calls setSocket which adds the listeners? I'm not sure the flow here- when does the 'upgrade' event happen on req object and should it recall setSocket every time?

@amark
Copy link
Author

amark commented Jul 14, 2019

I do see a double emit on connection, but I'm guessing that piece is my fault - however that should cause it to register as separate sockets (in which case the close event is only fired once), but it seems like they're on the same socket, that is causing OnClose to be called a 2nd time.

The fact it is coded to support multiple listeners is cool, but assuming the opposite, that OnClose won't be called multiple times, seems like a bug waiting to happen.

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

Is it possible that req.on('upgrade', (res, socket, head) => { is being triggered multiple times, which calls setSocket which adds the listeners? I'm not sure the flow here- when does the 'upgrade' event happen on req object and should it recall setSocket every time?

The 'upgrade' event is emitted only once. It is emitted when the Upgrade: websocket and Connection: Upgrade headers are received. It requires a whole HTTP request/response roundtrip.

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

'connection' is not emitted multiple times for the same socket. There is a 1-1 correspondence between 'connection' events and WebSocket instances and another 1-1 correspondence between WebSocket instances and {net,tls}.Socket instances.

If user code changes this behavior all kind of issues may arise.

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

To reproduce, please try steps (1) ~ (4).

I'm trying to create a test case based on this.

(3) connect websocket server to a non-existent localhost:8080 & retry every 2 seconds

What does this mean?

'use strict';

const WebSocket = require('ws');
const http = require('http');

const data1 = `<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    <script>
      (function () {
        var ws = new WebSocket('ws://localhost:8080');

        ws.onopen = function () {
          console.log('open');
        };
      })();
    </script>
  </body>
</html>`;

const data2 = `<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    <script>
      (function () {
        function connect() {
          var ws = new WebSocket('ws://localhost:3000');

          ws.onclose = function() {
            setTimeout(connect, 2000);
          };
        }

        connect();
      })();
    </script>
  </body>
</html>`;

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/foo') {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(data1);
  } else if (req.url === '/bar') {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(data2);
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end(http.STATUS_CODES[404]);
  }
});

const wss = new WebSocket.Server({ server });

server.listen(8080, () => {
  console.log('Listening on *:8080');
});

Is this code correct?

I open /foo and establish a WebSocket connection. (1) and (2)
I open /bar in another tab and watch it fails to connect several times. (3)
I refresh the /foo tab. (4) I also tried closing the whole browser and reopening /foo again.

No error is thrown.

@amark
Copy link
Author

amark commented Jul 14, 2019

@lpinca I nailed it!

Turns out it was not a race condition with pinging another server, its a shared socket on receiving servers.

Here are options for a fix:

  • (What I hate, but guessing you'll like best) Crash on 2+ WebSocket Servers being attached to the same port.
  • (What I recommend) Use type & existence checks to safe guard your code.
  • (What is probably the most "pure" option) Don't use shared references.
  • (What is probably the most correct option) Rewrite your removeListeners to remove all listeners from all servers that share same socket.

Test to instantly replicate:

var WebSocket = require('ws');

function server(){
	return require('http').createServer().listen(8765);
}
var opt = {server: server()};

function sockets(){
	var wss = new WebSocket.Server(opt);

	wss.on('connection', function connection(ws) {

		ws.onmessage = function incoming(data) {
		  console.log('1msg', data);
		};

		ws.onerror = function(err){
			console.log('1err', ''+err);
		};

		ws.onclose = function(){
			console.log('1bye');
			ws.close();
		};

	});

	return wss;
}

var wsServer1 = sockets();
var wsServer2 = sockets();

setTimeout(function(){

	var ws = new WebSocket('ws://localhost:8765');
	ws.onclose = function(){
		console.log("bye");
	};
	ws.onerror = function(err){
		console.log("err", ''+err);
	};
	ws.onopen = function(){
		console.log("open");
	}
	ws.onmessage = function(msg){
		console.log("msg", msg);
	};

}, 100);

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

The example above is attaching the same HTTP server on two different WebSocketServer. This is not ok. It is invalid usage as the 'upgrade' event will be handled by both WebSocketServer.

If you need a shared HTTP server between multiple WebSocketServer use this https://github.com/websockets/ws#multiple-servers-sharing-a-single-https-server.

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

Closing as we now have an explanation and it's not an issue in ws. Feel free to submit a patch that detects this invalid usage and throws an error. I'm more than happy to merge it.

@lpinca lpinca closed this Jul 14, 2019
@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

For reference #885.

@amark
Copy link
Author

amark commented Jul 14, 2019

@lpinca

(A) saying it is not OK but the code being OK with it is a bug.

(B) it being documented as allowed but it then being possible to crash because no safe guards exists on those shared sockets is a bug.

If I were to submit a patch for (A) to throw on two socket servers with same http server, then it would contradict your documented use case of such (which is very valid!).

Please consider that (B) is a very reasonable case of exactly why safeguards should be added, so the use case is possible and doesn't crash.

It is your project, so you can do as you wish. Just please recognize you claimed my bug/patch was not a concern because your code only adds a listener once, yet it actually will add it more than once, and you said my fix is a mask to a deeper issue, but that deeper issue is actually a documented feature. It is fine to reject, just please note to others that either ws theory does not match to implementation, or that documented features can produce known unstable buggy side-effects in production.

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

If I were to submit a patch for (A) to throw on two socket servers with same http server, then it would contradict your documented use case of such (which is very valid!).

What does it contradict specifically?

const server = http.createServer();

const wss1 = new WebSocket.Server({ server });
const wss2 = new WebSocket.Server({ server });

This is invalid usage, not documented anywhere and we should throw an error if the user does this.

const server = http.createServer();
const wss1 = new WebSocket.Server({ noServer: true });
const wss2 = new WebSocket.Server({ noServer: true });

server.on('upgrade', function(request, socket, head) {
  const pathname = url.parse(request.url).pathname;

  if (pathname === '/foo') {
    wss1.handleUpgrade(request, socket, head, function(ws) {
      wss1.emit('connection', ws, request);
    });
  } else if (pathname === '/bar') {
    wss2.handleUpgrade(request, socket, head, function(ws) {
      wss2.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

This is valid usage and does not crash anything if used.

Just please recognize you claimed my bug/patch was not a concern because your code only adds a listener once, yet it actually will add it more than once, and you said my fix is a mask to a deeper issue, but that deeper issue is actually a documented feature.

Please show me where we document this usage:

const server = http.createServer();

const wss1 = new WebSocket.Server({ server });
const wss2 = new WebSocket.Server({ server });

Yes, exactly your patch would have masked the invalid usage.

In my opinion broken code should crash and not keep running in a broken state.

@amark
Copy link
Author

amark commented Jul 14, 2019

I missed the new WebSocket.Server({ server }); vs new WebSocket.Server({ noServer: true });, fair enough point.

Where we disagree: It is not OK to crash other people's in-production servers. Maybe if you throw them an error they can catch, then at least they can opt-out. But this bug was not emit to the .onerror callback (what is the point of that callback then?) and it cannot be caught. That forces everybody else to suffer simply because you have a particular preference - I personally have experienced sleepless nights while my parents were visiting from out of town, trying to get this salvaged on behalf of my dev community so it is fixed before their launches go out. Some of these machines I have limited access to and therefore cannot update after-the-fact, or have millions of users, or are running in an offline self-sustaining mesh network that is a template for refugee/disaster survival relief camps. This hurts other people, real world humans, that have no clue about code or tech, and they especially don't care about my (or yours) theoretical desire for code "purity".

The analogy here is code is like a building people use. The philosophy that code should crash rather than running in a broken state, is literally akin to planting layers of C4 bombs to explode if any 1 of 10 elevators goes out of service, or if a toilet gets clogged. A crash blows up the entire building and everyone in it and all the session/memory, it brings the whole server down, potentially indefinitely, on machines I may or may not have control to auto-restart. That sucks for me, but I can debug it, what really sucks is for them, they're the ones who suffer the most.

@lpinca
Copy link
Member

lpinca commented Jul 14, 2019

For what is worth here is the patch I was referring to 3641266. No footgun (pun intended) for other developers.

Yes, we disagree. Patches like this are not useful and if you keep adding them, in the long run they will make your code impossible to debug.

Furthermore this was affecting only a small percentage of ws users (only gun users?) and this made me think from the beginning that there was something wrong with how ws was used (#1564 (comment)).

This wasn't an emergency patch to fix production servers from crashing. This was a blind patch to fix an invalid usage of a dependency in gun and the blind patch was submitted upstream...

That said even if not documented, the invalid usage is allowed and that is my fault. I hope 3641266 will help in this regard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants