This repository has been archived by the owner on Aug 20, 2021. It is now read-only.

RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException #3

Closed
gitizenme opened this issue Nov 27, 2019 · 18 comments
Labels
information-requested question Further information is requested

Comments

@gitizenme

Running

helm install zeebe-full zeebe/zeebe-full

The cluster fails to transition into the running state:

NAME                                                        READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                      0/1     Pending   0          25s
elasticsearch-master-1                                      0/1     Pending   0          25s
elasticsearch-master-2                                      0/1     Pending   0          25s
zeebe-full-nginx-ingress-controller-6c689bb4cc-b5mlw        1/1     Running   0          26s
zeebe-full-nginx-ingress-default-backend-849f468f76-gjzg8   1/1     Running   0          26s
zeebe-full-operate-84c9c66d8-kj85v                          1/1     Running   0          26s
zeebe-full-zeebe-0                                          0/1     Pending   0          26s
zeebe-full-zeebe-1                                          0/1     Running   0          25s
zeebe-full-zeebe-2                                          0/1     Pending   0          25s

and the log for one of the zeebe-full nodes reports:

2019-11-27 19:56:30.529 [] [zb-blocking-task-runner-1-zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501] INFO io.zeebe.gateway - Version: 0.21.1
2019-11-27 19:56:30.531 [] [zb-blocking-task-runner-1-zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501] INFO io.zeebe.gateway - Starting gateway with configuration {
  "enable": true,
  "network": {
    "host": "0.0.0.0",
    "port": 26500
  },
  "cluster": {
    "contactPoint": "zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26502",
    "maxMessageSize": "4M",
    "requestTimeout": "15s",
    "clusterName": "zeebe-cluster",
    "memberId": "gateway",
    "host": "0.0.0.0",
    "port": 26502
  },
  "threads": {
    "managementThreads": 1
  },
  "monitoring": {
    "enabled": false,
    "host": "zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local",
    "port": 9600
  },
  "security": {
    "enabled": false
  }
}
2019-11-27 19:56:31.201 [] [atomix-cluster-events] DEBUG io.zeebe.broker.clustering - Member 1 received event ClusterMembershipEvent{type=MEMBER_ADDED, subject=Member{id=1, address=zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26502, properties={brokerInfo=EADJAAAAAQABAAAAA
2019-11-27 19:56:31.203 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.gateway - Received membership event: ClusterMembershipEvent{type=MEMBER_ADDED, subject=Member{id=1, addr
2019-11-27 19:56:31.212 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501-zb-actors-0] INFO io.zeebe.transport.endpoint - Registering endpoint for node '1' with address 'zeebe-full-zeebe-1.zeebe-full-zeebe
2019-11-27 19:56:36.683 [] [raft-server-system-partition-1] WARN io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException
2019-11-27 19:56:36.684 [] [raft-server-system-partition-1] WARN io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException

Version info:

helm version
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Any guidance on how to fix the issue causing the exception?

@salaboy
Collaborator

salaboy commented Nov 27, 2019

@gitizenme hi there, thanks a lot for reporting this. Can you please share more information about where you are trying to run the charts? Which cloud provider?

@salaboy
Collaborator

salaboy commented Nov 28, 2019

@gitizenme if you can provide more information we should be able to help.

@salaboy
Collaborator

salaboy commented Nov 30, 2019

@gitizenme does this still apply? Can you provide more information?

@salaboy added the question (Further information is requested) and information-requested labels on Nov 30, 2019
@gitizenme
Author

Sorry, I was out on the long holiday. Yes, the issue is still present. I'm running on macOS 10.15.1 using Docker Desktop

@salaboy
Collaborator

salaboy commented Dec 3, 2019

@gitizenme I haven't had time to test with Kubernetes in Docker for Mac, but the same charts work in Kubernetes KIND, so I bet there is a small difference somewhere that we need to tune for Docker for Mac. Can you update the charts and try again? I've released a new version of the charts.

@gitizenme
Author

@salaboy no change, still receiving the same error.

@salaboy
Collaborator

salaboy commented Dec 3, 2019

@gitizenme ok, give me until tomorrow so I can try it locally, see if I can find the problem, and create a new release. Docker for Mac was not in my immediate plans, but since you ask, I will give it a go.

@gitizenme
Author

gitizenme commented Dec 3, 2019

@salaboy Sounds good, thanks for following up so quickly.
FYI - our use cases are:

  • local dev on macOS / Docker Desktop / Helm
  • Deploy to Amazon EKS via Terraform / Helm

@salaboy
Collaborator

salaboy commented Dec 4, 2019

@gitizenme You are using Helm 3, right?

@salaboy
Collaborator

salaboy commented Dec 4, 2019

@gitizenme Running the same charts in Docker for Mac with Helm 2, in a default setup, I get:

2019-12-04 09:54:13.307 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.gateway - Received membership event: ClusterMembershipEvent{type=METADATA_CHANGED, subject=Member{id=0, address=salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26502, properties={brokerInfo=EADJAAAAAQAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk9AAAAc2FsYWJveS16ZWViZS0wLnNhbGFib3ktemVlYmUuZGVmYXVsdC5zdmMuY2x1c3Rlci5sb2NhbDoyNjUwMQUAAQMAAAAB}}, time=1575453251550} with BrokerInfo{nodeId=0, partitionsCount=3, clusterSize=3, replicationFactor=3, partitionRoles={3=FOLLOWER}} 
2019-12-04 09:54:13.307 [service-controller] [salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26501-zb-actors-1] ERROR io.zeebe.util.actor - Actor failed in phase 'STARTED'. Continue with next job.
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(Unknown Source) ~[?:?]
	at java.nio.ByteBuffer.allocate(Unknown Source) ~[?:?]
	at io.zeebe.distributedlog.restore.log.impl.DefaultLogReplicationRequestHandler.<init>(DefaultLogReplicationRequestHandler.java:34) ~[zeebe-logstreams-0.21.1.jar:0.21.1]
	at io.zeebe.distributedlog.restore.log.impl.DefaultLogReplicationRequestHandler.<init>(DefaultLogReplicationRequestHandler.java:29) ~[zeebe-logstreams-0.21.1.jar:0.21.1]
	at io.zeebe.broker.logstreams.restore.BrokerRestoreServer.start(BrokerRestoreServer.java:62) ~[zeebe-broker-0.21.1.jar:0.21.1]
	at io.zeebe.broker.clustering.base.partitions.Partition.startRestoreServer(Partition.java:142) ~[zeebe-broker-0.21.1.jar:0.21.1]
	at io.zeebe.broker.clustering.base.partitions.Partition.start(Partition.java:116) ~[zeebe-broker-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.onDependenciesAvailable(ServiceController.java:260) ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept(ServiceController.java:213) ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept(ServiceController.java:207) ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController.onServiceEvent(ServiceController.java:105) ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$$Lambda$139/0x0000000100259040.run(Unknown Source) ~[?:?]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) ~[zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:127) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:107) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:91) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:195) [zeebe-util-0.21.1.jar:0.21.1]

I will start looking into how to tune Docker for Mac to make sure that there are no resource problems.

@salaboy
Collaborator

salaboy commented Dec 4, 2019

@gitizenme after sorting out the resources problem:

salaboy-nginx-ingress-controller-844d5784d7-t7h2z       1/1     Running   1          48m
salaboy-nginx-ingress-default-backend-f66f7758b-trlf2   1/1     Running   1          48m
salaboy-operate-5d7bd95f44-gphbc                        1/1     Running   15         48m
salaboy-zeebe-0                                         1/1     Running   1          48m
salaboy-zeebe-1                                         1/1     Running   1          48m
salaboy-zeebe-2                                         1/1     Running   0          72s

The only big difference that I can think of is Helm 3.

@salaboy
Collaborator

salaboy commented Dec 4, 2019

To get Elasticsearch working in Docker for Mac, you only need a few tweaks to your values file, as follows:

zeebe:
  elasticsearch:
    imageTag: 6.8.3
    # Permit co-located instances for solitary minikube virtual machines.
    antiAffinity: "soft"

    # Shrink default JVM heap.
    esJavaOpts: "-Xmx128m -Xms128m"

    # Allocate smaller chunks of memory per pod.
    resources:
      requests:
        cpu: "100m"
        memory: "512M"
      limits:
        cpu: "1000m"
        memory: "512M"

    # Request smaller persistent volumes.
    volumeClaimTemplate:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "hostpath"
      resources:
        requests:
          storage: 100M

That configuration comes from the official examples of the Elasticsearch chart.

@salaboy
Collaborator

salaboy commented Dec 4, 2019

Everything is running here. @gitizenme can you please double-check that you don't have a nasty java.lang.OutOfMemoryError: Java heap space in your pod logs?

NAME                                                    READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                  1/1     Running   0          9m9s
elasticsearch-master-1                                  1/1     Running   0          9m9s
elasticsearch-master-2                                  1/1     Running   0          9m9s
salaboy-nginx-ingress-controller-844d5784d7-sgfgf       1/1     Running   0          9m9s
salaboy-nginx-ingress-default-backend-f66f7758b-2vrql   1/1     Running   0          9m9s
salaboy-operate-5d7bd95f44-f245b                        1/1     Running   3          9m9s
salaboy-zeebe-0                                         1/1     Running   0          4m36s
salaboy-zeebe-1                                         1/1     Running   0          9m9s
salaboy-zeebe-2                                         1/1     Running   0          9m9s
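
One way to run that check is to search each pod's logs for the error string. The sketch below is only an illustration: has_oom and check_pods are hypothetical helper names (not part of the charts), and it assumes kubectl already points at the Docker Desktop cluster.

```shell
# Hypothetical helper: reads log text on stdin and succeeds only if it
# contains the heap-space error quoted in this thread.
has_oom() {
  grep -q 'java.lang.OutOfMemoryError: Java heap space'
}

# Hypothetical helper: checks the current logs of each pod named as an argument.
check_pods() {
  for pod in "$@"; do
    if kubectl logs "$pod" | has_oom; then
      echo "$pod: OutOfMemoryError found"
    else
      echo "$pod: no OutOfMemoryError found"
    fi
  done
}

# Usage (pod names taken from the original report):
# check_pods zeebe-full-zeebe-0 zeebe-full-zeebe-1 zeebe-full-zeebe-2
```

If a pod has already restarted, kubectl logs --previous <pod> shows the logs of the previous container instance, which may be where the error was written.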

@gitizenme
Author

I'll have time to check the value changes tomorrow @salaboy

@gitizenme
Author

Which values file?

@salaboy
Collaborator

salaboy commented Dec 7, 2019

@gitizenme just put the content that I listed in my earlier comment into a .yaml file, then pass that file to helm install with the -f flag, as described in the docs for KIND.
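
As a rough sketch of that step: the file name docker-for-mac-values.yaml is arbitrary, the release and chart names (zeebe-full, zeebe/zeebe-full) are reused from the original report, and install_with_values is a hypothetical wrapper, not part of Helm or the charts.

```shell
# Hypothetical wrapper: install the chart with an overrides file.
install_with_values() {
  values_file="$1"
  # Fail early with a clear message if the overrides file is missing.
  if [ ! -f "$values_file" ]; then
    echo "values file not found: $values_file" >&2
    return 1
  fi
  # -f (long form: --values) layers the file over the chart's default values.
  helm install zeebe-full zeebe/zeebe-full -f "$values_file"
}

# Usage (commented out so that sourcing this snippet does not start an install):
# install_with_values docker-for-mac-values.yaml
```

The positional release name is Helm 3 syntax; with Helm 2 the name is passed through --name instead.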

@salaboy
Collaborator

salaboy commented Dec 12, 2019

@gitizenme did you manage to try with that values file? I am currently updating the charts and testing again. I would appreciate feedback on whether you still see this issue.

@salaboy
Collaborator

salaboy commented Jan 29, 2021

inactive for too long
