Close the connection cleanly when an error occurs #1899

lpinca · 2021-06-08T10:22:28Z

Instead of destroying the socket, try to close the connection cleanly
if an error (such as a data framing error) occurs after the opening
handshake has completed.

Also, to comply with the specification, use the 1006 status code if no
close frame is received, even if the connection is closed due to an
error.

Fixes #1898

Instead of destroying the socket, try to close the connection cleanly if an error (such as a data framing error) occurs after the opening handshake has completed. Also, to comply with the specification, use the 1006 status code if no close frame is received, even if the connection is closed due to an error. Fixes #1898

lpinca · 2021-06-08T10:52:06Z

@pimterry This should address #1898 with one caveat/gotcha. The streaming interface destroys the stream when an error occurs

ws/lib/stream.js

Line 84 in 05b8ccd

duplex.destroy(err);

so the close frame might not be sent as the socket is immediately destroyed, similarly to #1892 (comment).

What do you think?

pimterry

It seems like the correct code is emitted now after errors, and the close frame is correctly sent back, which is great, thank you! I'm not sure the connection teardown is totally 100% solid in some edge cases though (see comments).

Regarding websocket streams - yes, I agree, the behaviour there is slightly different now. It would be nice to make that send the close frame too if possible...

That looks easy to do though - in general AFAICT you don't ever need to call ws.terminate() on the websocket (in _destroy) if it has already thrown an error. In that case you always know that it's already tearing itself down, since that's the contract of an error event.

I'm not sure if there's an easy way to check whether an error was thrown from outside, but you could add a wsErrored variable inside createWebSocketStream to track this without much trouble and skip ws.terminate() in that case. Would that work?

Looks great overall otherwise, very neatly done, thanks for looking into this so quickly! 👍

pimterry · 2021-06-08T11:18:54Z

lib/websocket.js

@@ -808,11 +808,10 @@ function receiverOnError(err) {
  const websocket = this[kWebSocket];

  websocket._socket.removeListener('data', socketOnData);
+  websocket._socket.resume();


Just to make sure I've understood how this works - this dumps all incoming data (and so ignores the echoed close frame) because the socket is still flowing but there's now no data listener (previous line), is that right?

Seems correct, just checking I've understood.

Yes, it allows the 'end' event on socket to be emitted, so that socket.end() is then called on it.

pimterry · 2021-06-08T11:26:57Z

lib/websocket.js

  websocket.emit('error', err);
-  websocket._socket.destroy();


And here, this no longer destroys the socket - instead it's now either closed cleanly when the remote end closes its end of the socket (by socketOnEnd) or it's destroyed after the close timeout. Right?

I'm not sure this is totally safe as-is. As far as I can tell, we don't explicitly call .end() on the socket when sending (sender never calls it anywhere I can see). That means we send the close frame, but then we keep the outgoing socket fully open even though we're not sending any more data.

That's normally OK when there are no errors, because the remote peer will echo the close frame, our receiver will parse it, emit conclude, then we go to websocket.close(), and that calls socket.end() because we have two close frames. All good 👍

With this change though, that won't happen, because we've disconnected the receiver.

If the remote peer is healthy and well behaved it is still fine: they'll receive a close frame from this error, so they'll send a close frame back, then they'll end() their socket (because they have two close frames) so socketOnEnd will fire here, and we'll end our socket too. Great 👍

If the remote peer is not OK though, or if it just doesn't close its end of the socket for some reason, then we're in trouble, and we will keep the socket open until the timeout unnecessarily.

Imagine two ws peers both using this code who both hit an error at the same time (the connection gets generally corrupted somewhere, who knows).

Both peers will send each other a close frame with an error code, and call socket.resume() to ignore all incoming traffic.

Neither peer will call .end() yet.

Both peers will never parse the other peer's close frame, so the receiver never concludes, so we never call websocket.close() with two close frames, so we never call .end() there.

Both peers didn't call .end() so we never call socketOnEnd.

We sit in a deadlock for a while with the sockets fully open, until we timeout 👎

Not a disaster, but unnecessarily messy. I think this is easy to fix though: I think the sender should always call socket.end() after it sends a close frame, so that it half-closes the socket. It seems like that's always the right choice, even when there are no errors, becacuse we know for sure that we're not sending any data after the close frame. In that case, in the above example both peers would go to socketOnEnd and everything works fine.

We still need to wait for the timeout if the remote peer never closes the socket regardless, but that's OK I think, that's when the timeout really is intended.

Does that make sense? I think I've followed this though, but let me know if I'm missing something.

And here, this no longer destroys the socket - instead it's now either closed cleanly when the remote end closes its end of the socket (by socketOnEnd) or it's destroyed after the close timeout. Right?

Correct.

...

Not a disaster, but unnecessarily messy. I think this is easy to fix though: I think the sender should always call socket.end() after it sends a close frame, so that it half-closes the socket. It seems like that's always the right choice, even when there are no errors, becacuse we know for sure that we're not sending any data after the close frame. In that case, in the above example both peers would go to socketOnEnd and everything works fine.

We still need to wait for the timeout if the remote peer never closes the socket regardless, but that's OK I think, that's when the timeout really is intended.

We can't do this because the data might be compressed. If a peer sends compressed data, a close frame, and then calls socket.end(), the 'end' event on the socket of the receiving peer might be emitted before the data is decompressed because that is done asynchronously.

It sounds like that's an issue with the receiving peer, rather than an issue with ending the socket, isn't it? Ending the socket should not necessarily immediately emit end on the websocket, if there's still asynchronous data being decompressed.

Does that problem exist today without this change? If a non-ws websocket client connects to a ws server, sends compressed data and then immediately shuts down the connection, what happens?

Does that problem exist today without this change? If a non-ws websocket client connects to a ws server, sends compressed data and then immediately shuts down the connection, what happens?

All data is read but the closing handshake will not work as expected. The close frame is not sent because the socket is closed before we had a chance to write to it.

FWIW it was working as per your suggestion before support for permessage-deflate was added and then fixed with various patches.

The close frame is not sent because the socket is closed before we had a chance to write to it.

That's fixable though, I think... It's the receiving peer who is incorrectly closing the outgoing half of the socket that it needs to send back the close frame. It just needs to wait for its own processing before it does so. I think that's possible with the normal streams events.

Right now the logic (mostly in socketOnError) is just:

Incoming socket end event -> immediate receiver.end() and outgoing socket.end()

What would happen if this was instead:

Incoming socket end event -> receiver.end()

Receiver finish event -> outgoing socket.end()

I.e. wait until the receiver is finished writing everything (including any outstanding close frames, if we've received a close frame) before the outgoing socket shuts down.

This only affects the compression case, where the first .end() call might not actually end the receiver immediately. I think the logic here to handle errors on the receiver will handle any cases where this goes wrong.

I think (I hope?) that after end() a writable will always either eventually emit 'finish' or 'error' so this ensures that one way or another we always clean up the socket, but also should mean that the data is fully decompressed and the close frame is sent before we close the outgoing half of the socket, if everything goes well (or the error will close the socket, if it doesn't).

Does that make sense? I'm not sure I've got the details 100% right, but I think this is close to something that would improve the compression behaviour in general, and there should be a route through here where we slightly decouple socket events from receiver events to allow this kind of async processing.

(This is getting a bit off topic here sorry, it seems that this is an existing bug even without this change, but it would be good to fix it up at the same time since they're related)

For completeness, with the current version of Node.js (v16.3.0), the server always gives us a socket allowed to be half-open, but the client, by default, gives us a socket not allowed to be half-open. We create the socket ourselves via the createConnection option so we could create a socket with the allowHalfOpen option set to true, but we also allow users to use a custom agent, so we would still need to handle the case where the client socket is not allowed to be half-open.

const http = require('http'); const server = http.createServer(); server.on('upgrade', function (request, socket) { console.log('Server', socket.allowHalfOpen); socket.write( 'HTTP/1.1 101 Switching Protocols\r\n' + 'Upgrade: websocket\r\n' + 'Connection: Upgrade\r\n' + '\r\n' ); }); server.listen(function () { const request = http.get({ headers: { Connection: 'Upgrade', Upgrade: 'websocket' }, port: server.address().port }); request.on('upgrade', function (response, socket) { console.log('Client', socket.allowHalfOpen); }); });

See also nodejs/node@c3b8e50. It was added in Node.js v12.9.0 but not backported to the 10.x and 8.x release lines.

That's really useful, thank you!

I think that was what I was missing, that makes it clearer why some of the current behaviour is useful. My comments above apply for the "half-open allowed" case, but I see how lots of that is more complicated when sockets might not allow that.

Even with the change in node that you linked, I agree we still have to deal with this, in both directions, because either a server or a client might provide a socket that we didn't initialize ourselves.

I think there might be an interesting improvement, where we just take over every socket when we receive it and change them to always allow half-open connections? I.e. socket.allowHalfOpen = true. That would make everything more consistent and easier to test - now all WS sockets now behave the same, and we can always properly complete the closing handshake with no risk of data loss. I don't think it would matter to code using ws - ws is going to take over .end() for the stream with or without this, it's just an implementation detail from their POV (AFAICT?).

That allowHalfOpen property is a duplex property (present at least back to node 0.10) that isn't used until the connection is closed, so this looks safe to me if it's set immediately. It's not formally documented though, I've filed an issue on node to confirm how this works and get that documented (if it is indeed safe to use it): nodejs/node#38989

I'll play with this and open a pull request in the next few days, once I've investigated everything more thoroughly, watch this space! Thanks for all the info, this is really interesting & useful.

Before you spend time on investigating and preparing a PR I have to tell you that I have just remembered another reason why the "closing handshake" is implemented in its current form (that is socket.end() is called only after a close frame is both received and sent):

Other clients/languages might not work like ws or support half-open sockets. For example see https://gist.github.com/lpinca/be56024eca9d4a1efc701364e6ec9513.

Run it and open a browser (I tried with Chrome and Safari) then wait for some data to be buffered (I waited until ~20 MiB of data was buffered).

Hit Ctrl-C. A close frame will be sent to the browser client.

Wait until the data is written, then the node process will exit by its own.

Check the file. The data includes a close frame and is framed but despite this, its size is smaller than the buffered data.

The browser client simply drops all buffered data as soon as it detects that the socket is "half-closed".

It might still worth investigating as removing socket.end() from the gist above does not work as expected on Chrome but it does on Safari.

Call `ws.terminate()` only if `duplex.destroy()` is called directly by the user and not indirectly by the listener of the `'error'` event of the `WebSocket` object. Calling `ws.terminate()` right after the `'error'` event is emitted on the `WebSocket` object, might prevent the close frame from being sent to the other peer.

lpinca · 2021-06-09T07:50:55Z

lib/stream.js

@@ -108,7 +119,8 @@ function createWebSocketStream(ws, options) {
      if (!called) callback(err);
      process.nextTick(emitClose, duplex);
    });
-    ws.terminate();
+
+    if (terminateOnDestroy) ws.terminate();


I'm not sure how I feel about this. Waiting up to 30s does not seems good, especially because calling duplex.destroy() again has no effect. When not using the Duplex wrapper users can call websocket.terminate() if they do not wait the timeout. They actually can still do that but, meh, they have to use the WebSocket object instead of the Duplex wrapper.

I don't think it's so bad, this looks good to me 👍

Users shouldn't forcibly close the websocket on error anyway - they should follow proper websocket teardown instead. If they do really want to then it's still possible by using the websocket directly, as you say, and I think it's reasonable that the stream wrapper does correct cleanup by default.

The timeout only comes into play if there's an error and sending the close frame is successful (so there's no underlying socket error) and then the remote peer never replies or closes the socket. Should be rare, impact is limited AFAICT, and closing the socket immediately instead would create the risk of the remote peer losing data we might've already sent (the last paragraph of https://datatracker.ietf.org/doc/html/rfc6455#section-1.4 covers this in more detail).

- When the socket emits the `'end'` event, do not call `socket.end()`. End only the `receiver` stream. - Do not wait for a close frame to be received and sent before calling `socket.end()`. Call it right after the close frame is sent. - When the `receiver` stream emits `'finish'`, send a close frame if no close frame is received. The assumption is that the socket is allowed to be half-open. On the server side this is always true (unless the user explicitly sets the `allowHalfOpen` property of the socket to `false`). On the client side the user might use an agent so we set `socket.allowHalfOpen` to `true` when the `http.ClientRequest` object emits the `'upgrade'` event. Refs: #1899

- When the socket emits the `'end'` event, do not call `socket.end()`. End only the `receiver` stream. - Do not wait for a close frame to be received and sent before calling `socket.end()`. Call it right after the close frame is written. - When the `receiver` stream emits `'finish'`, send a close frame if no close frame is received. The assumption is that the socket is allowed to be half-open. On the server side this is always true (unless the user explicitly sets the `allowHalfOpen` property of the socket to `false`). On the client side the user might use an agent so we set `socket.allowHalfOpen` to `true` when the `http.ClientRequest` object emits the `'upgrade'` event. Refs: #1899

pimterry reviewed Jun 8, 2021

View reviewed changes

lpinca commented Jun 9, 2021

View reviewed changes

pimterry approved these changes Jun 9, 2021

View reviewed changes

lpinca merged commit 074e6a8 into master Jun 9, 2021

lpinca deleted the gh-1898 branch June 9, 2021 19:31

lpinca mentioned this pull request Jun 14, 2021

Simplify the closing handshake #1902

Closed

pimterry mentioned this pull request Jun 22, 2021

Add some extra closing handshake tests #1906

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Close the connection cleanly when an error occurs #1899

Close the connection cleanly when an error occurs #1899

lpinca commented Jun 8, 2021

lpinca commented Jun 8, 2021 •

edited

pimterry left a comment

pimterry Jun 8, 2021

lpinca Jun 8, 2021

pimterry Jun 8, 2021 •

edited

lpinca Jun 8, 2021

pimterry Jun 8, 2021

lpinca Jun 8, 2021 •

edited

pimterry Jun 8, 2021

lpinca Jun 10, 2021 •

edited

lpinca Jun 10, 2021 •

edited

pimterry Jun 10, 2021

lpinca Jun 10, 2021 •

edited

lpinca Jun 11, 2021 •

edited

lpinca Jun 9, 2021

pimterry Jun 9, 2021

Close the connection cleanly when an error occurs #1899

Close the connection cleanly when an error occurs #1899

Conversation

lpinca commented Jun 8, 2021

lpinca commented Jun 8, 2021 • edited

pimterry left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pimterry Jun 8, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lpinca Jun 8, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lpinca Jun 10, 2021 • edited

Choose a reason for hiding this comment

lpinca Jun 10, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lpinca Jun 10, 2021 • edited

Choose a reason for hiding this comment

lpinca Jun 11, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lpinca commented Jun 8, 2021 •

edited

pimterry Jun 8, 2021 •

edited

lpinca Jun 8, 2021 •

edited

lpinca Jun 10, 2021 •

edited

lpinca Jun 10, 2021 •

edited

lpinca Jun 10, 2021 •

edited

lpinca Jun 11, 2021 •

edited