
Add Hazelcast Example #6117

Merged: 16 commits into testcontainers:main on Nov 4, 2022

Conversation

@tomazfernandes (Contributor)

A simple Hazelcast example with both single container and cluster tests that I created with @eddumelendez.

Please let me know if there's anything you'd like me to change.

Thanks!

Closes #6115

@tomazfernandes requested a review from a team as a code owner on November 3, 2022 01:00
@tomazfernandes (Contributor Author)

I clicked the Update Branch button and it merged main into my branch - I'll rebase to remove this merge commit if that's ok.

@eddumelendez (Member) left a comment

Thanks @tomazfernandes ! Can you add a link here, please?

}

dependencies {
    testImplementation 'org.testcontainers:testcontainers:1.17.5'
Member

Suggested change:
- testImplementation 'org.testcontainers:testcontainers:1.17.5'
+ testImplementation 'org.testcontainers:testcontainers'

Member

This works without a version since we are doing a Gradle composite build with the examples.

@eddumelendez (Member)

also, please do not forget the logback-test.xml file

@tomazfernandes (Contributor Author)

Though the Hazelcast test passed, the output looked strange; I'll take a look.

@tomazfernandes (Contributor Author) commented Nov 3, 2022

Ok, so there were two issues with the cluster example.

The first is that sometimes the client would connect to the cluster before both nodes were registered. Given that the purpose of the example is to have a cluster with two nodes, I added a Wait.forLogMessage condition to wait for both nodes to be registered during container startup.
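
For context, a minimal sketch of what such a wait condition can look like (the image tag, cluster name env var, and timeout are illustrative assumptions, not necessarily the exact values in the example):

import java.time.Duration;

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.containers.wait.strategy.Wait;
import org.testcontainers.utility.DockerImageName;

class HazelcastClusterWaitSketch {

    private static final Network NETWORK = Network.newNetwork();

    // Each node is only considered started once its log shows a member list with two entries,
    // e.g. "Members {size:2, ver:2}", so the client connects only after the cluster is formed.
    static GenericContainer<?> newClusterNode() {
        return new GenericContainer<>(DockerImageName.parse("hazelcast/hazelcast:5.2.0"))
            .withNetwork(NETWORK)
            .withExposedPorts(5701)
            .withEnv("HZ_CLUSTERNAME", "test-cluster")
            .waitingFor(
                Wait.forLogMessage(".*Members \\{size:2, ver:2\\}.*", 1)
                    .withStartupTimeout(Duration.ofMinutes(2))
            );
    }
}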

The other issue is a warning message that shows:
WARNING: hz.client_1 [test-cluster] [5.2.0] Could not connect to member 3cb04aeb-658c-4cf3-8f73-55653af94f82, reason com.hazelcast.core.HazelcastException: java.io.IOException: Connect timed out to address /192.168.16.3:5701

The problem is that once the client connects to the first node, it fetches the cluster's member list, and those members only know their addresses inside the Docker network, not the host and mapped port we provided via Testcontainers.

There's this logic executed by the client:

TcpClientConnectionManager.java

    public void tryConnectToAllClusterMembers(boolean sync) {
        if (!isSmartRoutingEnabled) {
            return;
        }

        if (sync) {
            for (Member member : client.getClientClusterService().getMemberList()) {
                try {
                    getOrConnectToMember(member, false);
                } catch (Exception e) {
                    EmptyStatement.ignore(e);
                }
            }
        }

        executor.scheduleWithFixedDelay(new ConnectToAllClusterMembersTask(), 1, 1, TimeUnit.SECONDS);
    }

It'll basically try to connect to all members every second, and just log if it can't:

ConnectToAllClusterMembersTask.java

try {
    if (!client.getLifecycleService().isRunning()) {
        return;
    }
    getOrConnectToMember(member, false);
} catch (Exception e) {
    if (logger.isFineEnabled()) {
        logger.warning("Could not connect to member " + uuid, e);
    } else {
        logger.warning("Could not connect to member " + uuid + ", reason " + e);
    }
} finally {
    connectingAddresses.remove(uuid);
}

I'm not sure what the purpose of this attempt to connect to the members while ignoring failures is. Nevertheless, it doesn't affect the actual functionality of the test.

(EDIT: Well, probably the purpose is to log if it can't connect 😄)

But it might affect more complex scenarios where the client needs to make a request to a node that isn't the original one we configured the client with, since the client won't know the address exposed by the container for that node.

A simple Hazelcast example using both a single container and a cluster with two nodes.

Fixes testcontainers#6115
Sometimes the client would connect to the first container before the second one registers in the cluster.

Add a condition to wait for both nodes to be registered when starting the containers.
@eddumelendez (Member) left a comment

@tomazfernandes thanks for looking at it! I found a very interesting setup here but I didn't make it work. Also, found hazelcast/hazelcast#21096 (comment) and makes me think it is not that bad :D

clientConfig
    .setClusterName(TEST_CLUSTER_NAME)
    .getNetworkConfig()
    .addAddress(container1.getHost() + HOST_PORT_SEPARATOR + container1.getFirstMappedPort());
Member

We need to register both instances so that if one of them is down the client can switch to the other - just as a reference, given that the example will make sure there are 2 members in the cluster.

Suggested change:
- .addAddress(container1.getHost() + HOST_PORT_SEPARATOR + container1.getFirstMappedPort());
+ .addAddress(
+     container1.getHost() + HOST_PORT_SEPARATOR + container1.getFirstMappedPort(),
+     container2.getHost() + HOST_PORT_SEPARATOR + container2.getFirstMappedPort()
+ );


private static final int DEFAULT_EXPOSED_PORT = 5701;

private static final String CLUSTER_STARTUP_LOG_MESSAGE_REGEX = ".*Members \\{size:2, ver:2\\}.*";
Member

size and ver are interesting. According to the docs

Here, you can see the size of your cluster (size) and member list version (ver). The member list version is incremented when changes happen to the cluster, e.g., a member leaving from or joining to the cluster.

I would rely only on Members \\{size:2.

Source: https://docs.hazelcast.com/imdg/4.2/starting-members-clients
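
In practice that means matching only on the member count, along these lines (mirroring what the example's constant can be relaxed to):

class ClusterWaitPattern {

    // "ver" is left out of the pattern since it changes on any membership event;
    // matching only on the member count keeps the wait condition stable.
    static final String CLUSTER_STARTUP_LOG_MESSAGE_REGEX = ".*Members \\{size:2.*";
}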

@kiview (Member) commented Nov 3, 2022

@tomazfernandes Regarding:

I'll rebase it to remove this merge commit if that's ok.

No need to rebase PRs in most cases, since we squash when merging 🙂

@tomazfernandes (Contributor Author)

@tomazfernandes thanks for looking at it!

No worries, thank you both for the suggestions!

I found a very interesting setup here but I didn't make it work.

If I understand correctly this is a way of programmatically creating the cluster, so it would be an alternative to using a container, rather than something we can use alongside it, right?

Also, found hazelcast/hazelcast#21096 (comment) and makes me think it is not that bad :D

Nice! As far as I understand, with smart routing enabled the client will try to connect directly to the node that holds the data it's asking for. So it's not only about resilience, but also efficiency.

I've checked, and indeed setting smart routing to false removes the warning - but since users might prefer to keep smart routing on and live with the WARNING, I've left that setting commented out. Let me know if you prefer a different approach.
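
For illustration, roughly where that commented-out option sits on the client config (the surrounding helper method is just a sketch; setSmartRouting is the Hazelcast client API call in question):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

class SmartRoutingSketch {

    static HazelcastInstance newClient(String clusterName, String memberAddress) {
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.setClusterName(clusterName);
        clientConfig.getNetworkConfig().addAddress(memberAddress);
        // Disabling smart routing makes the client talk only to the configured address,
        // which silences the "Could not connect to member" warning in this setup,
        // at the cost of no longer routing requests directly to the owning node.
        // clientConfig.getNetworkConfig().setSmartRouting(false);
        return HazelcastClient.newHazelcastClient(clientConfig);
    }
}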

Tests always pass here locally - let's see if, with the regex change, they pass on the CI as well.

@eddumelendez (Member)

If I understand correctly this is a way of programmatically creating the cluster, so it would be an alternative to using a container, rather than something we can use alongside it, right?

You can do it via env vars too, but I didn't succeed.

I've checked, and indeed setting smart routing to false removes the warning - but since users might prefer to keep smart routing on and live with the WARNING, I've left that setting commented out. Let me know if you prefer a different approach.

I think it is safe.

Thanks again!

Cluster test is failing in the CI, attempt to fix it by starting containers sequentially instead of in parallel.
@tomazfernandes (Contributor Author)

You can do it via env vars too, but I didn't succeed.

Just for the record, I've also looked into some alternatives to make Hazelcast's automatic discovery work using network aliases and the HZ_NETWORK_PUBLICADDRESS env var, but we don't know the mapped port yet when creating the container, and we can't access the aliases from outside the Docker network, so the client couldn't reach the nodes that way anyway.

An alternative for using automatic discovery without knowing the ports is the host network, but that didn't work with GenericContainer, since it expects ports to be published and with host networking ports are not published. Not sure if I'm missing something there, but since using the host network doesn't seem like a great idea overall, I didn't pursue it much further.
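
For reference, roughly what that attempt looks like and why it falls short (the alias name is made up; the key point is that the mapped port isn't known at configuration time):

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.utility.DockerImageName;

class PublicAddressSketch {

    static GenericContainer<?> node(Network network) {
        return new GenericContainer<>(DockerImageName.parse("hazelcast/hazelcast:5.2.0"))
            .withNetwork(network)
            // Resolvable only inside the Docker network, so the client on the host can't use it.
            .withNetworkAliases("hazelcast-node-1")
            .withExposedPorts(5701);
        // .withEnv("HZ_NETWORK_PUBLICADDRESS", host + ":" + mappedPort)
        //   -> can't be done here: the mapped port only exists after the container has started.
    }
}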

IMHO the solution is robust as is with the last changes, and in the future if a user asks for something this example doesn't cover we can look into that.

Let's see if the CI tests pass.

Thanks!

@eddumelendez (Member)

Just for the record, I've also looked into some alternatives to make Hazelcast's automatic discovery work using network aliases and the HZ_NETWORK_PUBLICADDRESS env var, but we don't know the mapped port yet when creating the container

I was just doing some hacky things last night but hit another issue related to protocols 🤷🏽‍♂️

    static class HazelcastContainer extends GenericContainer<HazelcastContainer> {

        private static final String CLUSTER_STARTUP_LOG_MESSAGE_REGEX = ".*Members \\{size:2.*";

        private static final String STARTER_SCRIPT = "/testcontainers_start.sh";

        HazelcastContainer(DockerImageName image) {
            super(image);
            addExposedPort(5701);
            withCreateContainerCmdModifier(cmd -> {
                cmd.withEntrypoint("sh");
            });
            withCommand("-c", "while [ ! -f " + STARTER_SCRIPT + " ]; do sleep 0.1; done; " + STARTER_SCRIPT);
            waitingFor(Wait.forLogMessage(CLUSTER_STARTUP_LOG_MESSAGE_REGEX, 1));
        }

        @Override
        protected void containerIsStarting(InspectContainerResponse containerInfo) {
            super.containerIsStarting(containerInfo);
            String host;
            try {
                host = InetAddress.getLocalHost().getHostAddress();
            } catch (UnknownHostException e) {
                throw new RuntimeException(e);
            }
            String networkAlias = getNetworkAliases().get(0);

            String ip = getContainerInfo().getNetworkSettings().getNetworks().get(((Network.NetworkImpl) getNetwork()).getName()).getIpAddress();
            String command = "#!/bin/bash\n";
//            command += "export HZ_NETWORK_PUBLICADDRESS=" + host + ":" + getMappedPort(5701) + "\n";
            command += "export HZ_ADVANCEDNETWORK_ENABLED=true\n";
//            command += "export HZ_NETWORK_PORT_AUTOINCREMENT=false\n";
//            command += "export HZ_ADVANCEDNETWORK_MEMBERSERVERSOCKETENDPOINTCONFIG_PUBLICADDRESS=" + ip + "\n";
            command += "export HZ_ADVANCEDNETWORK_MEMBERSERVERSOCKETENDPOINTCONFIG_PORT_PORT=5701\n";
            command += "export HZ_ADVANCEDNETWORK_CLIENTSERVERSOCKETENDPOINTCONFIG_PUBLICADDRESS=" + host + "\n";
            command += "export HZ_ADVANCEDNETWORK_CLIENTSERVERSOCKETENDPOINTCONFIG_PORT_PORT=" + getMappedPort(5701) + "\n";
            command += "/opt/hazelcast/bin/hz start";

            copyFileToContainer(Transferable.of(command, 0777), STARTER_SCRIPT);
        }
    }

So far looks good and the comment will cover it. No need to rush on this. I am pretty sure people with more experience on Hazelcast will contribute later

@tomazfernandes (Contributor Author)

I was just doing some hacky things last night but hit another issue related to protocols 🤷🏽‍♂️

Cool stuff!

So far looks good and the comment will cover it. No need to rush on this. I am pretty sure people with more experience on Hazelcast will contribute later

Nice!

Regarding the CI, I've changed the test to start the containers in sequence rather than in parallel - it looked like there was a race condition where both containers would emit the log before joining the cluster.

I had also looked into waiting for client.getCluster().getMembers() to have size 2, but the client fetches this info when it first connects and doesn't seem to update it later, so it got messy. But if the log solution doesn't work on the CI, maybe it's worth looking into that again.
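
Roughly what that check could look like (a sketch; as noted above, the client's member view didn't seem to refresh, which is why this approach was dropped):

import java.time.Duration;
import java.time.Instant;

import com.hazelcast.core.HazelcastInstance;

class ClusterSizeWaitSketch {

    // Polls the client's view of the cluster until it reports the expected member count
    // or the timeout elapses.
    static boolean waitForMembers(HazelcastInstance client, int expectedMembers, Duration timeout)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (client.getCluster().getMembers().size() >= expectedMembers) {
                return true;
            }
            Thread.sleep(500);
        }
        return false;
    }
}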

@eddumelendez (Member)

Run ./gradlew :hazelcast:spotlessApply, please.

@tomazfernandes (Contributor Author)

Seems like even when running the containers sequentially, the second one still emits the log before joining the cluster in the CI 😕

Unless you have any other suggestions, I'll look for a different approach.

@eddumelendez (Member)

I would suggest to use the same wait strategy for server 2

@tomazfernandes (Contributor Author)

I would suggest to use the same wait strategy for server 2

You mean looking for Members: size: 1? The thing is, when I run locally the second node emits the log with two members, so it fails:

[Screenshot: local test output, 2022-11-03 18:20]

@eddumelendez (Member)

You mean looking for Members: size: 1?

I totally sent the wrong message - as it is, it's ok. I wonder if it is related to the startup timeout and whether it should be increased.

Change strategy back to parallel
@tomazfernandes (Contributor Author)

You mean looking for Members: size: 1?

I totally sent the wrong message - as it is, it's ok. I wonder if it is related to the startup timeout and whether it should be increased.

Might be. I changed back to starting the containers in parallel and now they both wait for size:2. It works locally and I've set the timeout to 4 minutes to see if it succeeds in the CI.
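
For reference, starting the two nodes in parallel roughly amounts to the following (a sketch, not the exact test code; each node is assumed to carry the "Members {size:2" log wait strategy):

import java.util.stream.Stream;

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.lifecycle.Startables;

class ParallelStartSketch {

    // Starts both nodes in parallel so they can discover each other while booting.
    // Each node is assumed to be configured with the "Members {size:2" log wait strategy
    // and an increased startup timeout, e.g. Duration.ofMinutes(4).
    static void startCluster(GenericContainer<?> node1, GenericContainer<?> node2) {
        Startables.deepStart(Stream.of(node1, node2)).join();
    }
}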

Seems I might have broken the first test though, I'm checking.

@tomazfernandes (Contributor Author)

I think it's fine - let's see the next CI execution. Hopefully it'll pass this time! 🤞🏼

@tomazfernandes (Contributor Author)

I can try to increase the timeout even further, or maybe go back to starting the containers sequentially.

But I wonder if maybe something in the CI environment is preventing the containers from finding each other - I don't think we ever saw two members in the CI runs.

Also, since this is an example and not production test code, perhaps for now we can remove the wait and not assert on the cluster.

@tomazfernandes (Contributor Author)

But I wonder if maybe something in the CI environment is preventing the containers from finding each other - I don't think we ever saw two members in the CI runs.

I think we might have something along these lines.

In the CI we have these logs:

2022-11-03 22:48:50,823 [ INFO] [main] [c.h.s.d.i.DiscoveryService]: [172.18.0.2]:5701 [test-cluster] [5.2.0] Auto-detection selected discovery strategy: class com.hazelcast.azure.AzureDiscoveryStrategyFactory

2022-11-03 22:48:52,642 [ WARN] [main] [c.h.a.AzureDiscoveryStrategy]: No Azure credentials found! Starting standalone. To use Hazelcast Azure discovery, configure properties (client-id, tenant-id, client-secret) or assign a managed identity to the Azure Compute instance

While locally:

2022-11-04 01:04:22,780 [ INFO] [main] [c.h.i.i.Node]: [192.168.80.2]:5701 [test-cluster] [5.2.0] Using Multicast discovery

I'll investigate this further.

@eddumelendez (Member)

Weird! We can try with an env var HZ_NETWORK_JOIN_AZURE_ENABLED=false

@tomazfernandes (Contributor Author)

HZ_NETWORK_JOIN_AZURE_ENABLED

Are you sure about this name? Couldn't find anything online 😄

@eddumelendez (Member)

reference here

Use HZ for server and HZ_CLIENT for client :D All things we can learn in one night 🥱

@eddumelendez (Member)

./gradlew :hazelcast:spotlessApply pretty please :)

@tomazfernandes (Contributor Author) commented Nov 4, 2022

reference here

Use HZ for server and HZ_CLIENT for client :D All things we can learn in one night 🥱

Oh, I see, you can use the yaml / xml configuration as environment variables. Cool!

I had run spotless, but changed the code before committing 🤦🏻‍♂️

@eddumelendez (Member)

more info about env vars here

@tomazfernandes (Contributor Author)

No change:

2022-11-04 01:54:26,720 [ WARN] [main] [c.h.a.AzureDiscoveryStrategy]: No Azure credentials found! Starting standalone. To use Hazelcast Azure discovery, configure properties (client-id, tenant-id, client-secret) or assign a managed identity to the Azure Compute instance

@eddumelendez (Member)

Azure Auto Detection
To start Hazelcast on Azure Virtual Machines, you have to perform the following steps for each VM:

https://hazelcast.com/blog/hazelcast-discovery-auto-detection/

now, it makes sense ☝🏽

I think we are close, can we do one last test with HZ_NETWORK_JOIN_MULTICAST_ENABLED=true, please?
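
For reference, applying those flags to the containers is just a couple of env vars (Hazelcast maps its YAML config keys to HZ_*-prefixed variables); a sketch:

import org.testcontainers.containers.GenericContainer;

class DiscoveryEnvVarsSketch {

    // GitHub Actions runners are Azure VMs, so Hazelcast's auto-detection picks the Azure
    // discovery strategy instead of multicast. These flags disable Azure discovery and
    // force multicast so the two nodes can find each other inside the Docker network.
    static <T extends GenericContainer<T>> T withDiscoveryOverrides(T node) {
        return node
            .withEnv("HZ_NETWORK_JOIN_AZURE_ENABLED", "false")
            .withEnv("HZ_NETWORK_JOIN_MULTICAST_ENABLED", "true");
    }
}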

@eddumelendez (Member)

🥳

@eddumelendez (Member)

can we remove the custom startup? I think the default one will work. Also, can we add a comment noting that the flags are needed because we are running on GHA, which uses an Azure VM -.-

@tomazfernandes (Contributor Author)

can we remove the custom startup? I think the default one will work.

You mean the log waiting?

Also, can we add a comment noting that the flags are needed because we are running on GHA, which uses an Azure VM -.-

Sure!

@eddumelendez (Member)

remove this, please

Remove custom wait timeout
@eddumelendez added this to the next milestone on Nov 4, 2022
@eddumelendez merged commit 2b61912 into testcontainers:main on Nov 4, 2022
@eddumelendez (Member)

Thanks @tomazfernandes ! this is now in main branch and it will be available soon in https://www.testcontainers.org/examples/

@tomazfernandes (Contributor Author)

Thanks @tomazfernandes !

Thanks @eddumelendez !

this is now in main branch and it will be available soon in https://www.testcontainers.org/examples/

It's available already, cool! 🥳

Linked issue: [Enhancement]: Add Hazelcast Example (#6115)
3 participants