New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebSocket connections remain alive and merge with subsequent web socket connections. #407

Closed
ohjames opened this Issue Nov 26, 2013 · 14 comments

Comments

Projects
None yet
3 participants
@ohjames

ohjames commented Nov 26, 2013

Oh you're going to love this one...

So lets say I have a websocket server which sends "tick" every 1 second over the web socket.

Sometimes when I disconnect the websocket connection (by reloading the page) then my second connection will start getting two ticks per second instead of one. Maybe after a few more reload I'll start getting 4.

Essentially what's happened is "connected" on the previous web socket keeps returning true so the first websocket fibre remains active. And then when the TCP connection is re-used the timer event on the first socket is still causing websocket data from a previous websocket connection to bleed into the next web socket connection that has used the same TCP socket.

After many reloads sometimes I get up to 4 sets of duplicate ticks... If I leave the browser window open for a long time and reload eventually some of the additional connections might go away, but the only way to terminate them for sure is to restart the browser.

@ohjames

This comment has been minimized.

Show comment
Hide comment
@ohjames

ohjames Nov 26, 2013

I think ultimately this bug is a result of "connected" on a web socket being used by a fibre failing to realise the connection is over and thus persisting it beyond the browser re-using the TCP connection.

ohjames commented Nov 26, 2013

I think ultimately this bug is a result of "connected" on a web socket being used by a fibre failing to realise the connection is over and thus persisting it beyond the browser re-using the TCP connection.

@ohjames

This comment has been minimized.

Show comment
Hide comment
@ohjames

ohjames Nov 26, 2013

Some code to illustrate this:

void newWebsocketConnection(WebSocket sock) {
  auto run = true;

  runTask(() {
    while (run) {
      sleep(1.seconds);
      sock.send("tick");
    }
  })

  while (sock.connected) {
    // read data etc.
  }

  // stop the ticking fibre
  // ERROR: Sometimes a new websocket connection can be handled on the same TCP connection before we get here
  //        and thus we have two tick tasks running on the same TCP connection (one that should have been terminated).
  run = false;
}

ohjames commented Nov 26, 2013

Some code to illustrate this:

void newWebsocketConnection(WebSocket sock) {
  auto run = true;

  runTask(() {
    while (run) {
      sleep(1.seconds);
      sock.send("tick");
    }
  })

  while (sock.connected) {
    // read data etc.
  }

  // stop the ticking fibre
  // ERROR: Sometimes a new websocket connection can be handled on the same TCP connection before we get here
  //        and thus we have two tick tasks running on the same TCP connection (one that should have been terminated).
  run = false;
}
@schancel

This comment has been minimized.

Show comment
Hide comment
@schancel

schancel Nov 26, 2013

Definitely seen some similar behavior. I am writing a chat client and a couple times I have gotten it to end up with getting duplicate copies of messages. I could never track it down, and it's very difficult to reproduce. I thought that the fix for Issue #390 also fixed this issue?

schancel commented Nov 26, 2013

Definitely seen some similar behavior. I am writing a chat client and a couple times I have gotten it to end up with getting duplicate copies of messages. I could never track it down, and it's very difficult to reproduce. I thought that the fix for Issue #390 also fixed this issue?

@ohjames

This comment has been minimized.

Show comment
Hide comment
@ohjames

ohjames Nov 26, 2013

@schancel The version of vibed I'm using is from after that fix (3 days old) so this must be a different issue if not similar.

ohjames commented Nov 26, 2013

@schancel The version of vibed I'm using is from after that fix (3 days old) so this must be a different issue if not similar.

@schancel

This comment has been minimized.

Show comment
Hide comment
@schancel

schancel Nov 26, 2013

I do not believe that the error happens for the reasons you are stating. It's unlikely that the sockets are recycled fast enough for the ID to be reused in time to cause this. I think something more nefarious is happening with the way WebSocket messages are being buffered and sent. But, with that said, I haven't been able to get this to happen since the patch or Issue #390.

schancel commented Nov 26, 2013

I do not believe that the error happens for the reasons you are stating. It's unlikely that the sockets are recycled fast enough for the ID to be reused in time to cause this. I think something more nefarious is happening with the way WebSocket messages are being buffered and sent. But, with that said, I haven't been able to get this to happen since the patch or Issue #390.

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Nov 26, 2013

Member

I wonder if the browser behaves correctly here. If it reuses the same connection, it should (AFAICS) send a close frame before that and in that case connected should also return false. Assuming latest master of course. I'll try to reproduce locally later.

Member

s-ludwig commented Nov 26, 2013

I wonder if the browser behaves correctly here. If it reuses the same connection, it should (AFAICS) send a close frame before that and in that case connected should also return false. Assuming latest master of course. I'll try to reproduce locally later.

@ohjames

This comment has been minimized.

Show comment
Hide comment
@ohjames

ohjames Nov 26, 2013

@s-ludwig I'll verify later by trying multiple browsers and using tcpdump to monitor what they are sending, I'll also update my vibed from the 3-day old master to the latest master.

ohjames commented Nov 26, 2013

@s-ludwig I'll verify later by trying multiple browsers and using tcpdump to monitor what they are sending, I'll also update my vibed from the 3-day old master to the latest master.

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Nov 26, 2013

Member

There is an issue in the code - if any exception is thrown in the read loop (like for example because the connection has been terminated), run = false; will never be executed. Is it possible that this is the case and the ticks still arrive on the same web socket connection somehow?

I'd change the code as follows:

void newWebsocketConnection(WebSocket sock) {
  runTask(() {
    // sock.connected doesn't block anymore, so can be used safely for sending data
    while (sock.connected) {
      sleep(1.seconds);
      sock.send("tick");
    }
  })

  // using waitForData avoids sock.receive() to throw when the connection is closed remotely
  while (sock.waitForData()) {
    // read data etc.
  }
}

But of course only after the cause of the error has been verified.

Member

s-ludwig commented Nov 26, 2013

There is an issue in the code - if any exception is thrown in the read loop (like for example because the connection has been terminated), run = false; will never be executed. Is it possible that this is the case and the ticks still arrive on the same web socket connection somehow?

I'd change the code as follows:

void newWebsocketConnection(WebSocket sock) {
  runTask(() {
    // sock.connected doesn't block anymore, so can be used safely for sending data
    while (sock.connected) {
      sleep(1.seconds);
      sock.send("tick");
    }
  })

  // using waitForData avoids sock.receive() to throw when the connection is closed remotely
  while (sock.waitForData()) {
    // read data etc.
  }
}

But of course only after the cause of the error has been verified.

@schancel

This comment has been minimized.

Show comment
Hide comment
@schancel

schancel Nov 26, 2013

Definitely possible. Is it possible that FireFox reuses the HTTP 1.1 connection when the WebSocket itself fails?

schancel commented Nov 26, 2013

Definitely possible. Is it possible that FireFox reuses the HTTP 1.1 connection when the WebSocket itself fails?

@schancel

This comment has been minimized.

Show comment
Hide comment
@schancel

schancel Nov 26, 2013

Actually, IIRC. Sönke what you say is probably exactly what is happening. Prior to the fix for Issue #390 my send handlers were not getting closed down properly when the read thread failed in my read-task. I thought somehow vibe.concurrency was sending the same task two messages since I was seeing two messages send out via WebSockets according to the my log file. However, I wasn't able to reproduce it -- because fixing issue #370 fixed my exception handling so I properly interrupted() my writeTask.

I am really confused though as to how the data manages to make it all the way back to the webbrowser though? Is it really possible the socket # gets reused so quickly?

schancel commented Nov 26, 2013

Actually, IIRC. Sönke what you say is probably exactly what is happening. Prior to the fix for Issue #390 my send handlers were not getting closed down properly when the read thread failed in my read-task. I thought somehow vibe.concurrency was sending the same task two messages since I was seeing two messages send out via WebSockets according to the my log file. However, I wasn't able to reproduce it -- because fixing issue #370 fixed my exception handling so I properly interrupted() my writeTask.

I am really confused though as to how the data manages to make it all the way back to the webbrowser though? Is it really possible the socket # gets reused so quickly?

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Nov 26, 2013

Member

Hard to say. Did you get the change to test with a different browser? But anywhere, there has to be a hole somewhere allowing messages to be sent after the original HTTP request has been handled. I'll inspect the code a bit manually.

Member

s-ludwig commented Nov 26, 2013

Hard to say. Did you get the change to test with a different browser? But anywhere, there has to be a hole somewhere allowing messages to be sent after the original HTTP request has been handled. I'll inspect the code a bit manually.

@ohjames

This comment has been minimized.

Show comment
Hide comment
@ohjames

ohjames Nov 26, 2013

I'm at work at the moment so I haven't had a chance to test, but I can confirm you're right about the exception causing the issue. I can work around it using waitForData or with an exception handler and then the issue goes away.

So I guess after all this is just a case of lacking graceful error condition handling.

ohjames commented Nov 26, 2013

I'm at work at the moment so I haven't had a chance to test, but I can confirm you're right about the exception causing the issue. I can work around it using waitForData or with an exception handler and then the issue goes away.

So I guess after all this is just a case of lacking graceful error condition handling.

s-ludwig added a commit that referenced this issue Nov 26, 2013

Ensure that no messages can be sent on the same connection by a WebSo…
…cket whose handler has thrown an exception. See #407.

s-ludwig added a commit that referenced this issue Nov 26, 2013

@ohjames

This comment has been minimized.

Show comment
Hide comment
@ohjames

ohjames Nov 26, 2013

Thanks for fixing this so fast (again). There's no reason for anyone to fear using bleeding edge tech like vibed with such a responsive community! I've confirmed that the issue was replicable before the change (in Chromium and Firefox). I have been unable to replicate the issue after those commits, I think it's fixed it! Feel free to close.

ohjames commented Nov 26, 2013

Thanks for fixing this so fast (again). There's no reason for anyone to fear using bleeding edge tech like vibed with such a responsive community! I've confirmed that the issue was replicable before the change (in Chromium and Firefox). I have been unable to replicate the issue after those commits, I think it's fixed it! Feel free to close.

@ohjames ohjames closed this Nov 26, 2013

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Nov 26, 2013

Member

Thanks for confirming!

Member

s-ludwig commented Nov 26, 2013

Thanks for confirming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment