Hazelcast "smart client" vert.x cluster member is not removed after forced shutdown #24

Closed
imperatorx opened this issue Feb 17, 2016 · 7 comments

Comments

@imperatorx

When using a "smart client" ClusterManager, if the vert.x instance is killed or disconnected from the network, the event bus registrations of this instance are not removed.

The memberRemoved(MembershipEvent membershipEvent) method is never called on surviving instances, and messages are delivered to dead addresses.

The Hazelcast MembershipListener interface, which is used by the cluster manager, provides only member disconnection events and no client events.

@cescoffier
Member

Could you provide a simple reproducer ?

@imperatorx
Author

Sure.

Frontend (Hazelcast member) https://github.com/imperatorx/reproducer-frontend
Backend (Hazelcast client) https://github.com/imperatorx/reproducer-backend

  • Compile and run frontend first. It will try to send a message to the backend every second. It logs errors and successes to stdout.
  • Compile and run backend when frontend is fully functional. You should see the messages are successfully delivered, and the reply is logged.
  • Now shut down backend ungracefully (SIGKILL, Task Manager kill on Windows, etc.).
  • You should see errors in the frontend log.
  • Restart the backend
  • You should see alternating errors and successes: every second message gets lost because the killed vert.x node is never removed, so the failure goes undetected. Member disconnections would be detected normally.
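
The alternating pattern in the last step can be illustrated with a small stand-alone simulation. This is only a sketch: it assumes messages are distributed round-robin over all registrations for an address, and the node names "backend-old" and "backend-new" are hypothetical stand-ins for the killed and the restarted backend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleRegistrationDemo {

    /**
     * Simulates sending `count` messages round-robin over the registered
     * node IDs. A send succeeds unless it targets the dead node, whose
     * registration was never removed.
     */
    public static List<Boolean> send(List<String> registrations, String deadNode, int count) {
        List<Boolean> results = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String target = registrations.get(i % registrations.size());
            results.add(!target.equals(deadNode));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical node names: the killed backend and its replacement.
        List<String> registrations = Arrays.asList("backend-old", "backend-new");
        // Prints [false, true, false, true]: every second message is lost.
        System.out.println(send(registrations, "backend-old", 4));
    }
}
```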

Smart client nodes are a great fit for some scenarios (where backend nodes need a higher security level), even if Hazelcast 3.6 will bring back lite members.

@imperatorx
Author

I think the solution would be to make HazelcastClusterManager implement not only the MembershipListener, but also the com.hazelcast.core.ClientListener interface, and register as a client listener. Ungracefully stopped clients cause the clientDisconnected(Client) event to be raised, and the UUID of the node can be extracted.
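
A rough sketch of that idea, kept free of the Hazelcast dependency so it compiles on its own: the `Client` and `ClientListener` interfaces below are local stand-ins modelled on the callbacks named above, and `Registry` with its address "backend.echo" is a hypothetical simplification, not the actual HazelcastClusterManager code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ClientDisconnectSketch {

    /** Local stand-in for a connected client; only the UUID matters here. */
    public interface Client {
        String getUuid();
    }

    /** Local stand-in with the two callbacks discussed in this thread. */
    public interface ClientListener {
        void clientConnected(Client client);
        void clientDisconnected(Client client);
    }

    /** Hypothetical registry: event bus address -> IDs of nodes with a handler there. */
    public static class Registry implements ClientListener {
        private final Map<String, Set<String>> subs = new HashMap<>();

        public void register(String address, String nodeId) {
            subs.computeIfAbsent(address, a -> new HashSet<>()).add(nodeId);
        }

        public Set<String> nodesFor(String address) {
            return subs.getOrDefault(address, new HashSet<>());
        }

        @Override
        public void clientConnected(Client client) {
            // Nothing to clean up when a client joins.
        }

        @Override
        public void clientDisconnected(Client client) {
            // Extract the UUID of the dead client and drop all its registrations.
            String deadNode = client.getUuid();
            for (Set<String> nodes : subs.values()) {
                nodes.remove(deadNode);
            }
        }
    }
}
```

In the real cluster manager the listener would be registered with Hazelcast as a client listener, which is the part this sketch leaves out.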

@cescoffier
Member

Fancy a PR ?

@imperatorx
Author

I have investigated the usage of the ClientListener as seen at https://github.com/imperatorx/vertx-hazelcast/tree/hazelcast-client-disconnect-detection

However, sadly this workaround only works if there is at most one smart client node in the cluster. Member nodes can register to detect client events, but clients cannot register to see them (ClientService not available).

Even if a client disconnection is detected by a member using this patch, if there is more than one client node alive, there is a chance that the removal of the dead client node will not happen. On crash, each node that receives the notification calculates a hash and checks whether it is itself responsible for calling the "nodeCrashedHandler" of the HaManager. If the hash happens to belong to a client node (which did not get any clientDisconnected notification), that client will do nothing (it doesn't even know about the event), and the members will also do nothing (since the hash doesn't belong to them).
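
The gap can be shown with a toy model. The hash below (`String.hashCode` modulo the number of survivors) is purely illustrative and is not the scheme the HaManager uses; the `Node` class and the node IDs are hypothetical. The point is only that cleanup is skipped whenever the node picked as responsible never saw the disconnect event.

```java
import java.util.Arrays;
import java.util.List;

public class ResponsibilityGapDemo {

    public static class Node {
        final String id;
        // true for members (they get clientDisconnected), false for smart clients
        final boolean receivesClientEvents;

        public Node(String id, boolean receivesClientEvents) {
            this.id = id;
            this.receivesClientEvents = receivesClientEvents;
        }
    }

    /** Picks the survivor responsible for cleaning up after deadId (illustrative hash only). */
    public static Node responsibleFor(String deadId, List<Node> survivors) {
        int index = Math.floorMod(deadId.hashCode(), survivors.size());
        return survivors.get(index);
    }

    /** Cleanup happens only if the responsible node actually received the event. */
    public static boolean cleanupHappens(String deadId, List<Node> survivors) {
        return responsibleFor(deadId, survivors).receivesClientEvents;
    }

    public static void main(String[] args) {
        List<Node> survivors = Arrays.asList(
            new Node("member-1", true),
            new Node("client-2", false));
        for (int k = 0; k < 4; k++) {
            String dead = "dead-" + k;
            System.out.println(dead + " cleaned up: " + cleanupHappens(dead, survivors));
        }
    }
}
```

With one member and one client surviving, roughly half of the possible dead-node IDs in this model map to the client and are never cleaned up.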

I suggest removing this feature (smart client nodes), if reliable disconnection detection cannot be achieved, as this violates the requirements for a ClusterManager defined in the header of the HaManager.java file.

@cescoffier
Member

Yes, I think it's a reasonable drop.
Do you have experience with the new light client of HZ 3.6 ?

@imperatorx
Author

HZ 3.6 brings back Lite Members. These nodes own no keys and store no data, so if you build a Hazelcast config with setLiteMember(true), that member will not receive any load from storing Hazelcast data.

The bad thing is that it can still be used as an IExecutorService, so any member can send it any Runnable class that is present on the lite member's classpath, and it will execute it. To my knowledge, there is no option to disable the IExecutorService.

Vert.x is functioning OK with some nodes as lite members (at least one has to be a normal member).


2 participants