New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebSocket disconnect from Slack after ~1m #1471

Closed
mathias-lang-sociomantic opened this Issue Apr 11, 2016 · 6 comments

Comments

Projects
None yet
2 participants
@mathias-lang-sociomantic
Contributor

mathias-lang-sociomantic commented Apr 11, 2016

I am trying to interface with slack using Vibe.d.
I am successfully connecting, sending messages and receiving them.
However after less than a minute (Usually from 26 to 50s), I get disconnected.

[...] Connection accepted
[...] Nothing happen (or something happen, doesn't matter).

[8B4F3F15:00000000 trc] socket 9 read event!
[8B4F3F15:8B545D15 dbg] Read frame: ping true
[8B4F3F15:8B545D15 trc] evbuffer_add (fd 9): 85 B
[8B4F3F15:8B545D15 trc] bufferevent_flush
[8B4F3F15:00000000 trc] socket 9 write event (false)!
[8B4F3F15:00000000 trc] socket 9 read event!
[8B4F3F15:8B545D15 dbg] Read frame: text true
Received: {"type":"reconnect_url","url":"wss://mpmulti-a1ni.slack-msgs.com/websocket/XXXXX"}
[8B4F3F15:00000000 trc] socket 9 read event!
[8B4F3F15:8B545D15 dia] Error while reading websocket message: Reading from TLS stream was unsuccessful with ret 0
[8B4F3F15:8B545D15 dia] Closing connection.
[8B4F3F15:8B545D15 trc] OpenSSLStream finalize
[8B4F3F15:8B545D15 trc] evbuffer_add (fd 9): 85 B
[8B4F3F15:8B545D15 dbg] TCP close request true open
[8B4F3F15:8B545D15 dia] Actively closing TCP connection
[8B4F3F15:8B545D15 trc] Closing socket 9...
[8B4F3F15:00000000 dbg] Socket event on fd 9: 17 (1E78298 vs 1E78298)
[8B4F3F15:00000000 dbg] Connection was closed by remote peer (fd 9).
[8B4F3F15:00000000 trc] resuming corresponding task...
[8B4F3F15:8B545D15 trc] ...socket 9 closed.
[8B4F3F15:8B4FE715 CRITICAL] Task terminated with uncaught exception: Connection closed while reading message.
[8B4F3F15:8B4FE715 dbg] Full error: vibe.http.websockets.WebSocketException@../../.dub/packages/vibe-d-0.7.28/source/vibe/http/websockets.d(462): Connection closed while reading message.
[8B4F3F15:8B4FE715 dbg] ----------------
/usr/include/dmd/phobos/std/exception.d:617 bool std.exception.enforceEx!(vibe.http.websockets.WebSocketException).enforceEx!(bool).enforceEx(bool, lazy immutable(char)[], immutable(char)[], ulong) [0x80080e]
../../.dub/packages/vibe-d-0.7.28/source/vibe/http/websockets.d:462 _D4vibe4http10websockets9WebSocket7receiveMFMDFMC4vibe4http10websockets24IncomingWebSocketMessageZvZ9__lambda2MFZv [0x7ea567]
../../.dub/packages/vibe-d-0.7.28/source/vibe/core/sync.d:101 void vibe.core.sync.performLocked!(_D4vibe4http10websockets9WebSocket7receiveMFMDFMC4vibe4http10websockets24IncomingWebSocketMessageZvZ9__lambda2MFZv, vibe.core.sync.InterruptibleTaskMutex).performLocked(vibe.core.sync.InterruptibleTaskMutex) [0x7ea4dd]
../../.dub/packages/vibe-d-0.7.28/source/vibe/http/websockets.d:460 void vibe.http.websockets.WebSocket.receive(scope void delegate(scope vibe.http.websockets.IncomingWebSocketMessage)) [0x7e892e]
../../.dub/packages/vibe-d-0.7.28/source/vibe/http/websockets.d:446 immutable(char)[] vibe.http.websockets.WebSocket.receiveText(bool) [0x7e87f2]
source/slackdot/app.d:57 void app.connectToSlack() [0x72d486]
source/slackdot/app.d:25 void app._sharedStaticCtor2().__lambda1() [0x72d2b8]
../../.dub/packages/vibe-d-0.7.28/source/vibe/core/core.d:486 void vibe.core.core.makeTaskFuncInfo!(void delegate()).makeTaskFuncInfo(ref void delegate()).callDelegate(vibe.core.core.TaskFuncInfo*) [0x72b679]
../../.dub/packages/vibe-d-0.7.28/source/vibe/core/core.d:1084 void vibe.core.core.CoreTask.run() [0x8122a9]
??:? void core.thread.Fiber.run() [0x90b8c9]
??:? fiber_entryPoint [0x90b64b]
??:? [0xffffffff]

Slack very rarely closes WebSocket connection so I am assuming it is something on Vibe.d side.

The code is however very simple ATM:

shared static this ()
{
    runTask(() => connectToSlack());
}

void connectToSlack ()
{
    Json infos;
    requestHTTP("https://slack.com/api/rtm.start?token=" ~ auth_token,
                (scope HTTPClientRequest req) {},
                (scope HTTPClientResponse res)
                {
                    infos = res.readJson;
                });
    if (!infos["ok"].get!bool)
    {
        logInfo("Error while connecting to Slack!");
        assert(0);
    }
    auto url = infos["url"].get!istring;
    logInfo("Successfully connected to slack, provided URL: %s", url);
    auto ws_url = URL(url);
    auto ws = connectWebSocket(ws_url);
    assert(ws.receiveText() == `{"type":"hello"}`);
    logInfo("WebSocket connected");

    while (ws.connected())
    {
        auto txt = ws.receiveText;
        logInfo("Received: %s", txt);
    }
    logFatal("Connection to Slack lost!");
}

Any suggestion on how to debug this ?

@mathias-lang-sociomantic mathias-lang-sociomantic changed the title from WebSocket disconnect after to WebSocket disconnect from Slack after ~1m Apr 11, 2016

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Apr 21, 2016

Member

Could it be that it expects keep-alive messages? Since nothing is sent, I don't know either how to debug this apart from trying to send something. But you should use waitForData() instead of connected to check for read-readyness, otherwise it will be most likely be stuck in receiveText when the connection is closed rather than gracefully leaving the loop.

Member

s-ludwig commented Apr 21, 2016

Could it be that it expects keep-alive messages? Since nothing is sent, I don't know either how to debug this apart from trying to send something. But you should use waitForData() instead of connected to check for read-readyness, otherwise it will be most likely be stuck in receiveText when the connection is closed rather than gracefully leaving the loop.

@mathias-lang-sociomantic

This comment has been minimized.

Show comment
Hide comment
@mathias-lang-sociomantic

mathias-lang-sociomantic Apr 22, 2016

Contributor

The log I posted doesn't include message to make it more readable.
However, I tried to make slack send data way more often (every 3 to 5 seconds), and it didn't have any visible effect, that is I got disconnected in less than a minute.

It could somehow be related to the ping functionality, as most of the time the close happen in the vicinity of this even, though not all the time.

But you should use waitForData() instead of connected to check for read-readyness, otherwise it will be most likely be stuck in receiveText when the connection is closed rather than gracefully leaving the loop.

That was my initial though, however its not what the example shows.

Contributor

mathias-lang-sociomantic commented Apr 22, 2016

The log I posted doesn't include message to make it more readable.
However, I tried to make slack send data way more often (every 3 to 5 seconds), and it didn't have any visible effect, that is I got disconnected in less than a minute.

It could somehow be related to the ping functionality, as most of the time the close happen in the vicinity of this even, though not all the time.

But you should use waitForData() instead of connected to check for read-readyness, otherwise it will be most likely be stuck in receiveText when the connection is closed rather than gracefully leaving the loop.

That was my initial though, however its not what the example shows.

@mathias-lang-sociomantic

This comment has been minimized.

Show comment
Hide comment
@mathias-lang-sociomantic

mathias-lang-sociomantic Apr 28, 2016

Contributor

After one of my colleague rolled out his own slack bot, it comes down to the fact that Slack believe the client hung up and closes the connection.

Slack support WebSocket ping, but know that some browser might not, so they have a fallback "ping" packet, which I sent every 5 seconds in order to keep the connection kickin'.
However, the Slack documentation doesn't mention ping as a requirement, but as a recommendation (for UX). So I have reasons to believe (albeit no proof) that the ping implementation might be incomplete.

Trying to leave a bot open did however uncover another issue. I came back to my bot after a couple of hour, and had the following output:

Received: {"type":"pong","reply_to":1012}
Received: {"type":"pong","reply_to":1013}
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Connection to Slack lost!
Program exited with code -11
dub --compiler=gdc  12,38s user 0,87s system 0% cpu 1:26:04,40 total

Looking at the WebSocket code quickly, the only place SystemRNG is instantiated from is here. This is only called from (writeFrame)[https://github.com/rejectedsoftware/vibe.d/blob/85178cfd3f1d8299c6e73956ef9eb4a31225b488/source/vibe/http/websockets.d#L716], which in turn is called from close, and... sendPing.

So currently the WebSocket implementation open a new FD every [ping interval].

Contributor

mathias-lang-sociomantic commented Apr 28, 2016

After one of my colleague rolled out his own slack bot, it comes down to the fact that Slack believe the client hung up and closes the connection.

Slack support WebSocket ping, but know that some browser might not, so they have a fallback "ping" packet, which I sent every 5 seconds in order to keep the connection kickin'.
However, the Slack documentation doesn't mention ping as a requirement, but as a recommendation (for UX). So I have reasons to believe (albeit no proof) that the ping implementation might be incomplete.

Trying to leave a bot open did however uncover another issue. I came back to my bot after a couple of hour, and had the following output:

Received: {"type":"pong","reply_to":1012}
Received: {"type":"pong","reply_to":1013}
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Task terminated with uncaught exception: Cannot init SystemRNG: Error id is 24, Error message is: "Cannot open file `/dev/urandom' in mode `rb' (Too many open files)"
Connection to Slack lost!
Program exited with code -11
dub --compiler=gdc  12,38s user 0,87s system 0% cpu 1:26:04,40 total

Looking at the WebSocket code quickly, the only place SystemRNG is instantiated from is here. This is only called from (writeFrame)[https://github.com/rejectedsoftware/vibe.d/blob/85178cfd3f1d8299c6e73956ef9eb4a31225b488/source/vibe/http/websockets.d#L716], which in turn is called from close, and... sendPing.

So currently the WebSocket implementation open a new FD every [ping interval].

@s-ludwig s-ludwig added the http label Jul 5, 2017

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Jul 5, 2017

Member

With the SystemRNG allocation issue being fixed and the pings in place, did the original issue still occur?

Member

s-ludwig commented Jul 5, 2017

With the SystemRNG allocation issue being fixed and the pings in place, did the original issue still occur?

@mathias-lang-sociomantic

This comment has been minimized.

Show comment
Hide comment
@mathias-lang-sociomantic

mathias-lang-sociomantic Jul 5, 2017

Contributor

Last time I tried, it still did, but I didn't spend time investigating. I'll try to run the test code again soon ™️

Contributor

mathias-lang-sociomantic commented Jul 5, 2017

Last time I tried, it still did, but I didn't spend time investigating. I'll try to run the test code again soon ™️

s-ludwig added a commit that referenced this issue Jul 17, 2017

Fix WebSocket ping logic. Fixes #1471.
- Fixes server initiated PINGs to be properly masked
- Fixes the detection logic for matching PONG messages
- Uses send() to send control messages instead of manually assembling frames
- Streamlines processing of control messages

@s-ludwig s-ludwig closed this in 9c11b67 Jul 17, 2017

s-ludwig added a commit that referenced this issue Jul 17, 2017

@mathias-lang-sociomantic

This comment has been minimized.

Show comment
Hide comment
@mathias-lang-sociomantic

mathias-lang-sociomantic Oct 20, 2017

Contributor

Looks like everything is working correctly, thanks again !

Contributor

mathias-lang-sociomantic commented Oct 20, 2017

Looks like everything is working correctly, thanks again !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment