
Investigate the blocked thread when leaving the cluster. #30

Closed
cescoffier opened this issue May 1, 2016 · 1 comment · Fixed by #39

@cescoffier
Member

It may be normal, but it is worth investigating.

When leaving the cluster, the event loop thread is blocked, probably because the rebalancing of the data is done in the background (see the sketch after the stack trace below).

io.vertx.core.VertxException: Thread blocked
    at java.lang.Object.wait(Native Method)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.pollResponse(InvocationFuture.java:300)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.waitForResponse(InvocationFuture.java:245)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:222)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:202)
    at com.hazelcast.concurrent.lock.LockProxySupport.tryLock(LockProxySupport.java:136)
    at com.hazelcast.concurrent.lock.LockProxySupport.tryLock(LockProxySupport.java:125)
    at com.hazelcast.concurrent.lock.LockProxy.tryLock(LockProxy.java:93)
    at io.vertx.spi.cluster.hazelcast.HazelcastClusterManager.beforeLeave(HazelcastClusterManager.java:315)
    at io.vertx.core.impl.VertxImpl.closeClusterManager(VertxImpl.java:439)
    at io.vertx.core.impl.VertxImpl.lambda$null$420(VertxImpl.java:471)
    at io.vertx.core.impl.VertxImpl$$Lambda$78/776405214.handle(Unknown Source)
    at io.vertx.core.eventbus.impl.clustered.ClusteredEventBus.lambda$null$144(ClusteredEventBus.java:163)
    at io.vertx.core.eventbus.impl.clustered.ClusteredEventBus$$Lambda$88/1936993836.handle(Unknown Source)
    at io.vertx.core.net.impl.NetServerImpl.lambda$executeCloseDone$30(NetServerImpl.java:375)
    at io.vertx.core.net.impl.NetServerImpl$$Lambda$96/1795172180.handle(Unknown Source)
    at io.vertx.core.impl.ContextImpl.lambda$wrapTask$3(ContextImpl.java:337)
    at io.vertx.core.impl.ContextImpl$$Lambda$13/2099574098.run(Unknown Source)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:339)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:373)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
    at java.lang.Thread.run(Thread.java:745)

[127.0.0.1]:5702 [dev] [3.6.1] Address[127.0.0.1]:5702 is SHUTTING_DOWN 
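
The trace shows the Hazelcast lock acquisition in HazelcastClusterManager.beforeLeave running directly on the Netty event loop thread, which is what triggers the blocked-thread warning. Below is a minimal sketch of one way to keep the event loop free, assuming the blocking call can simply be offloaded with executeBlocking; the lock name, timeout, and helper name are illustrative only, and this is not necessarily what #39 does.

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ILock;
import io.vertx.core.Vertx;

import java.util.concurrent.TimeUnit;

public class NonBlockingLeaveSketch {

  // Hypothetical helper: acquire the cluster-wide lock used during shutdown on a
  // worker thread instead of the event loop. Lock name and timeout are made up.
  static void beforeLeave(Vertx vertx, HazelcastInstance hazelcast, Runnable next) {
    vertx.<Boolean>executeBlocking(future -> {
      // ILock.tryLock(...) can wait on cluster operations (e.g. while data is
      // being rebalanced), so it must not run on the event loop.
      ILock lock = hazelcast.getLock("__vertx.shutdown");
      try {
        future.complete(lock.tryLock(10, TimeUnit.SECONDS));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        future.fail(e);
      }
    }, res -> {
      // Continue the shutdown sequence once the blocking call has completed,
      // whether or not the lock was obtained.
      next.run();
    });
  }
}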
cescoffier added this to the 3.3.0 milestone May 1, 2016
@cescoffier
Member Author

Also, lots of:

[127.0.0.1]:5703 [dev] [3.6.1] Committing/rolling-back alive transactions of Member [127.0.0.1]:5702, UUID: 8545de7d-6390-4cb7-b952-d3dd3cd9e696 
[127.0.0.1]:5701 [dev] [3.6.1] Committing/rolling-back alive transactions of Member [127.0.0.1]:5702, UUID: 8545de7d-6390-4cb7-b952-d3dd3cd9e696 
[127.0.0.1]:5702 [dev] [3.6.1] Connection [Address[127.0.0.1]:5703] lost. Reason: Socket explicitly closed 
[127.0.0.1]:5702 [dev] [3.6.1] Connection [Address[127.0.0.1]:5703] lost. Reason: Socket explicitly closed 
[127.0.0.1]:5703 [dev] [3.6.1] Connection [Address[127.0.0.1]:5702] lost. Reason: java.io.EOFException[Remote socket closed!] 
[127.0.0.1]:5703 [dev] [3.6.1] hz._hzInstance_401_dev.IO.thread-in-1 Closing socket to endpoint Address[127.0.0.1]:5702, Cause:java.io.EOFException: Remote socket closed! 
[127.0.0.1]:5702 [dev] [3.6.1] Shutting down node engine... 
[127.0.0.1]:5703 [dev] [3.6.1] Address[127.0.0.1]:5703 is SHUTTING_DOWN 
[127.0.0.1]:5702 [dev] [3.6.1] Destroying node NodeExtension. 
[127.0.0.1]:5702 [dev] [3.6.1] Hazelcast Shutdown is completed in 19018 ms. 
[127.0.0.1]:5702 [dev] [3.6.1] Address[127.0.0.1]:5702 is SHUTDOWN 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5703 [dev] [3.6.1] Waiting for the master node to complete remaining migrations! 
[127.0.0.1]:5701 [dev] [3.6.1] Partition balance is ok, no need to re-partition cluster data...  
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 584999ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 583998ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 582997ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 581996ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 580995ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 579994ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 578993ms 
[127.0.0.1]:5703 [dev] [3.6.1] Some backup replicas are inconsistent with primary, waiting for synchronization. Timeout: 577992ms 
[127.0.0.1]:5703 [dev] [3.6.1] Shutting down multicast service... 
[127.0.0.1]:5701 [dev] [3.6.1] Connection [Address[127.0.0.1]:5703] lost. Reason: Socket explicitly closed 
[127.0.0.1]:5701 [dev] [3.6.1] Removing Member [127.0.0.1]:5703 
[127.0.0.1]:5701 [dev] [3.6.1] hz._hzInstance_276_dev.IO.thread-out-2 Closing socket to endpoint Address[127.0.0.1]:5703, Cause:java.nio.channels.CancelledKeyException 
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at com.hazelcast.nio.tcp.nonblocking.AbstractHandler.unregisterOp(AbstractHandler.java:73)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter.unschedule(NonBlockingSocketWriter.java:295)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter.handle(NonBlockingSocketWriter.java:344)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter.run(NonBlockingSocketWriter.java:423)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.executeTask(NonBlockingIOThread.java:236)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.processTaskQueue(NonBlockingIOThread.java:229)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.runSelectLoop(NonBlockingIOThread.java:201)
    at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.run(NonBlockingIOThread.java:163)

[127.0.0.1]:5703 [dev] [3.6.1] Connection [Address[127.0.0.1]:5701] lost. Reason: java.io.EOFException[Remote socket closed!] 
[127.0.0.1]:5703 [dev] [3.6.1] Shutting down connection manager... 
[127.0.0.1]:5703 [dev] [3.6.1] hz._hzInstance_401_dev.IO.thread-in-0 Closing socket to endpoint Address[127.0.0.1]:5701, Cause:java.io.EOFException: Remote socket closed! 
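
The ~585 s countdown on the replica-synchronization messages is consistent with Hazelcast's default graceful-shutdown wait of 600 seconds, so a slow leave can hold a node for several minutes. A minimal sketch, assuming the standard hazelcast.graceful.shutdown.max.wait property (value in seconds) is the knob governing this wait; shortening it is shown only for illustration and is not the actual fix:

import com.hazelcast.config.Config;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class ShortGracefulShutdownSketch {

  public static void main(String[] args) {
    // Cap the graceful-shutdown wait at 30 seconds instead of the 600-second
    // default (illustrative value only).
    Config hazelcastConfig = new Config();
    hazelcastConfig.setProperty("hazelcast.graceful.shutdown.max.wait", "30");

    HazelcastClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);
    Vertx.clusteredVertx(new VertxOptions().setClusterManager(mgr), ar -> {
      // Use ar.result() as the clustered Vert.x instance.
    });
  }
}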
