Investigate CI issue #26

cescoffier · 2016-07-18T12:17:15Z

The JGroups cluster manager has not had a successful run for quite some time now:
https://vertx.ci.cloudbees.com/view/vert.x-3/job/vert.x3-jgroups/

fmarinelli · 2016-07-19T14:37:35Z

If you execute a single test, it completes successfully; when you execute all tests at once, the run fails. We should probably check whether something is not being cleaned up properly on cluster shutdown.

cescoffier · 2016-07-29T08:05:51Z

Is it because of the last JGroups update?

fmarinelli · 2016-07-29T10:06:08Z

No, I don't think so. But I'll have to investigate in every direction.

Sanne · 2016-08-04T16:36:43Z

I've sent PR #27 to improve the tests, although it's unlikely to be enough to fix them all, as I'm still investigating some deeper issues.

fmarinelli · 2016-08-04T20:41:27Z

I don't want to say that I found it, because I don't understand it.
If I remove testSubsRemovedForKilledNode2 from JGroupsClusteredEventbusTest, everything works.
With that test enabled, I get an unhandled exception about mypojoencoder2 not being found (probably just a matter of execution order).

Sanne · 2016-08-05T11:18:25Z

Indeed, there are some leaks.

With my PR #27 I've introduced a JUnit rule to identify tests which don't close the JGroups channel, and there were many. I fixed some, but for the others which I couldn't fix I had to introduce a "lenient" option that makes the rule forgiving (i.e. it does not fail the test), although it will still nuke the state of the SHARED_LOOPBACK protocol.
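
To illustrate the idea only: the sketch below is not the rule from PR #27 and uses neither the JUnit nor the JGroups API. `FakeChannel`, `ChannelLeakCheck` and `LeakCheckDemo` are hypothetical names, and the check is assumed to hand out the channels itself so it can track them. It shows a strict mode that fails when a test leaves a channel open and a lenient mode that only cleans up.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo: a test body that leaks one channel, run in lenient mode.
final class LeakCheckDemo {
    public static void main(String[] args) {
        ChannelLeakCheck lenient = new ChannelLeakCheck(true);
        int leaks = lenient.run(() -> lenient.newChannel()); // channel is never closed
        System.out.println("lenient run cleaned up " + leaks + " leaked channel(s)");
    }
}

// Hypothetical stand-in for a JGroups channel; it only tracks open/closed state.
final class FakeChannel implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}

// Sketch of a leak check: every channel is created through the check,
// so anything still open after the test body has run counts as a leak.
final class ChannelLeakCheck {
    private final boolean lenient;
    private final List<FakeChannel> tracked = new ArrayList<>();

    ChannelLeakCheck(boolean lenient) {
        this.lenient = lenient;
    }

    FakeChannel newChannel() {
        FakeChannel channel = new FakeChannel();
        tracked.add(channel);
        return channel;
    }

    // Runs the test body, force-closes whatever it left open and returns the
    // number of leaks. In strict mode a leak fails the test; in lenient mode
    // the state is cleaned up but the test is allowed to pass.
    int run(Runnable testBody) {
        int leaked = 0;
        try {
            testBody.run();
        } finally {
            for (FakeChannel channel : tracked) {
                if (channel.isOpen()) {
                    leaked++;
                    channel.close();
                }
            }
            tracked.clear();
        }
        if (leaked > 0 && !lenient) {
            throw new AssertionError(leaked + " channel(s) left open by the test");
        }
        return leaked;
    }
}
```

In both modes the cleanup runs, so a leaking test cannot pollute the next one; only the decision whether to fail differs.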

This isn't enough though: while I'm now ensuring that the network channel is clean, a JChannel encapsulates other resources which aren't being closed correctly.

Some of the vert.x core tests, like io.vertx.test.core.ComplexHATest, which are extended by the vertx-jgroups module, invoke a helper method to "kill" the node.
This kill method is, however, implemented as a simple "graceful" shutdown of the ClusterManager; moreover, the JGroupsClusterManager doesn't close the JChannel if it was passed to the constructor.

I think it is correct for the JGroupsClusterManager not to close the JChannel, as that's not its responsibility, so it looks like this "simulateKill" in vertx-core could use a bit of redesigning.
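
A rough sketch of that ownership rule, with stand-in types rather than the real JGroupsClusterManager or JChannel. `StubChannel`, `StubClusterManager` and `OwnershipDemo` are hypothetical names, and closing a channel that the manager created itself is an assumption made for the illustration.

```java
// Small demo: a channel passed in from outside survives the manager's shutdown.
final class OwnershipDemo {
    public static void main(String[] args) {
        StubChannel external = new StubChannel();
        StubClusterManager manager = new StubClusterManager(external);
        manager.leave();
        System.out.println("external channel closed by manager: " + external.isClosed());
        external.close(); // whoever created the channel has to close it
    }
}

// Hypothetical stand-in for a JGroups channel.
final class StubChannel {
    private boolean closed;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// The manager only closes a channel it created itself (assumed behaviour);
// a channel handed to the constructor stays under the caller's control.
final class StubClusterManager {
    private final StubChannel channel;
    private final boolean ownsChannel;

    StubClusterManager() {
        this.channel = new StubChannel();
        this.ownsChannel = true;
    }

    StubClusterManager(StubChannel externalChannel) {
        this.channel = externalChannel;
        this.ownsChannel = false;
    }

    StubChannel channel() {
        return channel;
    }

    // Graceful shutdown: release only what this manager owns.
    void leave() {
        if (ownsChannel) {
            channel.close();
        }
    }
}
```

Under this rule, a test helper that passes its own channel and then only shuts the manager down leaves that channel open, which is the kind of leak described above.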

Also, JGroups has some tools to kill the connection in a less graceful way, e.g. use the DISCARD protocol to block all message flow first, then tear down the rest.

See:

io.vertx.core.impl.VertxInternal.simulateKill()
io.vertx.core.impl.HAManager.simulateKill()

In short: yes, there are more leaks which need to be cleaned up.

luengnat · 2016-08-30T18:51:14Z

Is this still not resolved? #27 seems to be committed already.

cescoffier · 2016-08-31T06:49:46Z

@luengnat nope, still failing on the CI.

cescoffier added the bug label Jul 18, 2016

pmlopes mentioned this issue Jul 18, 2016

Vert.x 3.3.3 umbrella vert-x3/issues#136

Closed

60 tasks

vietj closed this as completed Feb 15, 2017
