Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when using an event bus bridge hook (events handler), between socket closed event and other events #1380

Closed
julien3 opened this issue Sep 9, 2019 · 4 comments
Assignees
Labels
Milestone

Comments

@julien3
Copy link
Contributor

julien3 commented Sep 9, 2019

Version

  • vert.x core: 3.7.1 (should apply to 3.8.1, master)
  • vert.x web: 3.7.1 (should apply to 3.8.1, master)

Context

We started to see that some messages were not delivered to the same event bus (bridge) address. After multiple re-connections, a client can't listen anymore to some particular address as it receives nothing or receives too few messages out of all the messages sent. The server needs to be restarted.

When an event bus bridge hook (events handler) is used, it seems like there's a race condition where io.vertx.ext.web.handler.sockjs.impl.EventBusBridgeImpl.java may call handleSocketClosed while we're still in internalHandleRegister (or unregister, sendorpub). This can happen before checkCallHook's okAction is run as we're still waiting for the hook (event handlers) to accept the event...

Do you have a reproducer?

See steps below

Steps to reproduce

We reproduced this with custom tools but I think this scenario is sufficient to reproduce it:

  1. ... Set an event bus bridge events handler
  2. ... Intercept "REGISTER" event and wait enough time before accepting the event
  3. ... Meanwhile, disconnect the client
  4. ... A MessageConsumer is created for the given address although socket is already closed and any message sent to this address might now land into this MessageConsumer associated to nothing

Extra

Should apply to 3.8.1 and master as nothing was changed that could fix this issue in this file (EventBusBridgeImpl.java).

We fixed this internally by just checking for
if (!sockInfos.containsKey(sock)) return;
at the beginning of checkCallHook's okAction handlers.

It could probably explain #1327: the message consumer's handler references the closed socket when the race condition issue happens while a REGISTER event is being handled.

@julien3
Copy link
Contributor Author

julien3 commented Jun 14, 2021

Still happening on release v4.1.

A quick reproducer based on https://github.com/vert-x3/vertx-examples/tree/4.x/web-examples/src/main/java/io/vertx/example/web/realtime:

package io.vertx.example.web.realtime;

import io.vertx.core.AbstractVerticle;
import io.vertx.example.util.Runner;
import io.vertx.ext.bridge.BridgeEventType;
import io.vertx.ext.bridge.PermittedOptions;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.handler.StaticHandler;
import io.vertx.ext.web.handler.sockjs.SockJSBridgeOptions;
import io.vertx.ext.web.handler.sockjs.SockJSHandler;

/*
 * @author <a href="http://tfox.org">Tim Fox</a>
 */
public class Server extends AbstractVerticle {
  private static int _counter = 1;

  // Convenience method so you can run it in your IDE
  public static void main(String[] args) {
    Runner.runExample(Server.class);
  }

  @Override
  public void start() throws Exception {

    Router router = Router.router(vertx);

    // Allow outbound traffic to the news-feed address

    SockJSBridgeOptions options = new SockJSBridgeOptions()
      .addOutboundPermitted(new PermittedOptions().setAddress("news-feed"));
    SockJSHandler sockJSHandler = SockJSHandler.create(vertx);
    Router subRouter = sockJSHandler.bridge(options, event -> {

      // You can also optionally provide a handler like this which will be passed any events that occur on the bridge
      // You can use this for monitoring or logging, or to change the raw messages in-flight.
      // It can also be used for fine grained access control.

      if (event.type() == BridgeEventType.SOCKET_CREATED) {
        System.out.println("A socket was created");
        event.complete(true);
      } else if(event.type() == BridgeEventType.SOCKET_CLOSED) {
        System.out.println("A socket was closed");
        event.complete(true);
      } else if(event.type() == BridgeEventType.REGISTER) {
        System.out.println("Delaying REGISTER event...");
        vertx.executeBlocking(future -> {
          try {
            Thread.sleep(5000);
            future.complete();
          } catch (InterruptedException e) {
            future.fail(e);
          }
        }, blocking_res -> {
          if(blocking_res.succeeded()) {
            System.out.println("Accept REGISTER event");
            event.complete(true);
          } else {
            System.err.println("Couldn't sleep after receiving a REGISTER bridge event" +
              (blocking_res.cause() != null ? ": " + blocking_res.cause().getMessage() : ""));
            event.complete(false);
          }
        });
      } else {
        event.complete(true);
      }
    });
    router.mountSubRouter("/eventbus", subRouter);

    // Serve the static resources
    router.route().handler(StaticHandler.create());

    vertx.createHttpServer().requestHandler(router).listen(8080);
    System.out.println("Server is listening on port 8080");

    // Publish a message to the address "news-feed" every second
    vertx.setPeriodic(1000, t -> vertx.eventBus().send("news-feed", "news #" + _counter++ + " from the server!"));
  }
}

Steps to reproduce

  • Run the Server verticle
  • Open your browser to http://localhost:8080
  • Refresh the page within 5 seconds
  • What the server will print out to the console:

Server is listening on port 8080
Jun 14, 2021 3:57:33 PM io.vertx.core.impl.launcher.commands.VertxIsolatedDeployer
INFO: Succeeded in deploying verticle
A socket was created
Delaying REGISTER event...
A socket was closed
A socket was created
Delaying REGISTER event...
Accept REGISTER event
Accept REGISTER event

  • What the browser shows:

Latest news:

news 25 from the server!
news 23 from the server!
news 21 from the server!
news 19 from the server!
news 17 from the server!
news 15 from the server!

  • What's expected: the browser should show continuous news #number messages
  • What's going on: the first socket was closed although the REGISTER bridge event was being processed. When the event was processed, a message consumer was added to the list along with the already closed socket. news 16, 18, etc. messages are sent to that zombie consumer.

@pmlopes pmlopes self-assigned this Jun 14, 2021
@pmlopes pmlopes added the bug label Jun 14, 2021
@pmlopes pmlopes added this to the 4.1.1 milestone Jun 14, 2021
@vietj
Copy link
Contributor

vietj commented Jun 17, 2021

can you investigate this @pmlopes it looks related to a websocket potential leak

@pmlopes
Copy link
Member

pmlopes commented Jun 18, 2021

@julien3 I've created PR to address the issue. Like you researched, this is a timing issue, given that the interceptor is run as an asyncrhonous call, it can happen that the call completes the socket state has changed. The map that holds the data is still valid, there are no issues there, so the fix, is about asserting that the socket state (more exactly, the presence of it) is the same at the beginning and at the end of the interceptor call.

@pmlopes
Copy link
Member

pmlopes commented Jun 18, 2021

@vietj I believe that the issue is fixed in the referred PR. I don't think there's any leak as it turned out that it was a timing bug.

pmlopes added a commit that referenced this issue Jun 21, 2021
Fix #1380: Only run success action if socket map state is still valid
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants