RaftServer{system-partition-1}{role=FOLLOWER} - #3

gitizenme opened this issue Nov 27, 2019 · 18 comments


helm install zeebe-full zeebe/zeebe-full

The cluster fails to transition into the running state:

NAME                                                        READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                      0/1     Pending   0          25s
elasticsearch-master-1                                      0/1     Pending   0          25s
elasticsearch-master-2                                      0/1     Pending   0          25s
zeebe-full-nginx-ingress-controller-6c689bb4cc-b5mlw        1/1     Running   0          26s
zeebe-full-nginx-ingress-default-backend-849f468f76-gjzg8   1/1     Running   0          26s
zeebe-full-operate-84c9c66d8-kj85v                          1/1     Running   0          26s
zeebe-full-zeebe-0                                          0/1     Pending   0          26s
zeebe-full-zeebe-1                                          0/1     Running   0          25s
zeebe-full-zeebe-2                                          0/1     Pending   0          25s

and the log for one of the zeebe-full nodes reports:

2019-11-27 19:56:30.529 [] [zb-blocking-task-runner-1-zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501] INFO io.zeebe.gateway - Version: 0.21.1
2019-11-27 19:56:30.531 [] [zb-blocking-task-runner-1-zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501] INFO io.zeebe.gateway - Starting gateway with configuration { "enable": true, "network": { "host": "", "port": 26500 }, "cluster": { "contactPoint": "zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26502", "maxMessageSize": "4M", "requestTimeout": "15s", "clusterName": "zeebe-cluster", "memberId": "gateway", "host": "", "port": 26502 }, "threads": { "managementThreads": 1 }, "monitoring": { "enabled": false, "host": "zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local", "port": 9600 }, "security": { "enabled": false } }
2019-11-27 19:56:31.201 [] [atomix-cluster-events] DEBUG - Member 1 received event ClusterMembershipEvent{type=MEMBER_ADDED, subject=Member{id=1, address=zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26502, properties={brokerInfo=EADJAAAAAQABAAAAA
2019-11-27 19:56:31.203 [] [zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.gateway - Received membership event: ClusterMembershipEvent{type=MEMBER_ADDED, subject=Member{id=1, addr
2019-11-27 19:56:31.212 [] [zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501-zb-actors-0] INFO io.zeebe.transport.endpoint - Registering endpoint for node '1' with address 'zeebe-full-zeebe-1.zeebe-full-zeebe
2019-11-27 19:56:36.683 [] [raft-server-system-partition-1] WARN io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} -
2019-11-27 19:56:36.684 [] [raft-server-system-partition-1] WARN io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} -

Version info:
helm version
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Any guidance on how to fix the issue causing the exception?

salaboy commented Nov 27, 2019

@gitizenme hi there, thanks a lot for reporting this. Can you please share more information about where you are trying to run the charts? Which cloud provider?

salaboy commented Nov 28, 2019

@gitizenme if you can provide more information, we should be able to help.

salaboy commented Nov 30, 2019

@gitizenme does this still apply? Can you provide more information?

Sorry, I was out on the long holiday. Yes, the issue is still present. I'm running on macOS 10.15.1 using Docker Desktop

salaboy commented Dec 3, 2019

@gitizenme I haven't had the time to check with Kubernetes in Docker for Mac, but the same charts are working in Kubernetes KIND, so I bet there is a small difference somewhere that we need to tune for Docker for Mac. Can you update the charts and try again? I've released a new version of the charts.

@salaboy no change, still receiving the same error.

salaboy commented Dec 3, 2019

@gitizenme ok.. give me until tomorrow so I can try it locally to see if I can find what the problem is and create a new release. Docker for Mac was not in my immediate plans.. but since you ask.. I will give it a go

gitizenme commented Dec 3, 2019

@salaboy Sounds good, thanks for following up so quickly.
FYI - our use cases are:

  • local dev on macOS / Docker Desktop / Helm
  • Deploy to Amazon EKS via Terraform / Helm

salaboy commented Dec 4, 2019

@gitizenme You are using Helm 3, right?

salaboy commented Dec 4, 2019

@gitizenme running the same in docker for Mac with Helm2, in a default setup I am getting:

2019-12-04 09:54:13.307 [] [salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.gateway - Received membership event: ClusterMembershipEvent{type=METADATA_CHANGED, subject=Member{id=0, address=salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26502, properties={brokerInfo=EADJAAAAAQAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk9AAAAc2FsYWJveS16ZWViZS0wLnNhbGFib3ktemVlYmUuZGVmYXVsdC5zdmMuY2x1c3Rlci5sb2NhbDoyNjUwMQUAAQMAAAAB}}, time=1575453251550} with BrokerInfo{nodeId=0, partitionsCount=3, clusterSize=3, replicationFactor=3, partitionRoles={3=FOLLOWER}} 
2019-12-04 09:54:13.307 [service-controller] [salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26501-zb-actors-1] ERROR - Actor failed in phase 'STARTED'. Continue with next job.
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(Unknown Source) ~[?:?]
	at java.nio.ByteBuffer.allocate(Unknown Source) ~[?:?]
	at io.zeebe.distributedlog.restore.log.impl.DefaultLogReplicationRequestHandler.<init>( ~[zeebe-logstreams-0.21.1.jar:0.21.1]
	at io.zeebe.distributedlog.restore.log.impl.DefaultLogReplicationRequestHandler.<init>( ~[zeebe-logstreams-0.21.1.jar:0.21.1]
	at ~[zeebe-broker-0.21.1.jar:0.21.1]
	at ~[zeebe-broker-0.21.1.jar:0.21.1]
	at ~[zeebe-broker-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.onDependenciesAvailable( ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept( ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept( ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController.onServiceEvent( ~[zeebe-service-container-0.21.1.jar:0.21.1]
	at io.zeebe.servicecontainer.impl.ServiceController$$Lambda$139/ Source) ~[?:?]
	at io.zeebe.util.sched.ActorJob.invoke( ~[zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorJob.execute( [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorTask.execute( [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask( [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.doWork( [zeebe-util-0.21.1.jar:0.21.1]
	at [zeebe-util-0.21.1.jar:0.21.1]

I will start looking into how to tune Docker for Mac to make sure that there are no resource problems.

salaboy commented Dec 4, 2019

@gitizenme after sorting out the resources problem:

salaboy-nginx-ingress-controller-844d5784d7-t7h2z       1/1     Running   1          48m
salaboy-nginx-ingress-default-backend-f66f7758b-trlf2   1/1     Running   1          48m
salaboy-operate-5d7bd95f44-gphbc                        1/1     Running   15         48m
salaboy-zeebe-0                                         1/1     Running   1          48m
salaboy-zeebe-1                                         1/1     Running   1          48m
salaboy-zeebe-2                                         1/1     Running   0          72s

The only big difference that I can think of .. is helm 3

salaboy commented Dec 4, 2019

In order to get ElasticSearch working in docker for Mac you only need to do some tweaks to your values file as follows:

    # Assumed parent key (lost in the paste): zeebe-full pulls in Elasticsearch as a subchart.
    elasticsearch:
      imageTag: 6.8.3
      # Permit co-located instances for solitary minikube virtual machines.
      antiAffinity: "soft"

      # Shrink default JVM heap.
      esJavaOpts: "-Xmx128m -Xms128m"

      # Allocate smaller chunks of memory per pod.
      resources:
        requests:
          cpu: "100m"
          memory: "512M"
        limits:
          cpu: "1000m"
          memory: "512M"

      # Request smaller persistent volumes.
      volumeClaimTemplate:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "hostpath"
        resources:
          requests:
            storage: 100M

That configuration is coming from the elasticsearch chart official examples.

salaboy commented Dec 4, 2019

Everything is running here.. @gitizenme can you please double check that you don't have a nasty java.lang.OutOfMemoryError: Java heap space in your pod logs?

NAME                                                    READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                  1/1     Running   0          9m9s
elasticsearch-master-1                                  1/1     Running   0          9m9s
elasticsearch-master-2                                  1/1     Running   0          9m9s
salaboy-nginx-ingress-controller-844d5784d7-sgfgf       1/1     Running   0          9m9s
salaboy-nginx-ingress-default-backend-f66f7758b-2vrql   1/1     Running   0          9m9s
salaboy-operate-5d7bd95f44-f245b                        1/1     Running   3          9m9s
salaboy-zeebe-0                                         1/1     Running   0          4m36s
salaboy-zeebe-1                                         1/1     Running   0          9m9s
salaboy-zeebe-2                                         1/1     Running   0          9m9s
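
One way to run that check, as a rough sketch: the helper name `has_oom` is made up for illustration, and the pod name in the usage comments is the one from the original report, so adjust it to your release.

```shell
# has_oom: hypothetical helper. Reads log text on stdin and succeeds
# (exit status 0) if it contains a JVM OutOfMemoryError.
has_oom() {
  grep -qF 'java.lang.OutOfMemoryError'
}

# Usage against a running cluster:
#   kubectl logs zeebe-full-zeebe-1 | has_oom && echo "OOM found"
# If the container has already restarted, check the previous instance:
#   kubectl logs zeebe-full-zeebe-1 --previous | has_oom && echo "OOM found"
```

A match only says that the heap was exhausted at some point; the stack trace in the log still has to be read to see which component ran out of memory.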

I'll have time to check the value changes tomorrow @salaboy

Which values file?

salaboy commented Dec 7, 2019

@gitizenme just put the content that I listed in the comment above into a .yaml file, and pass that file to the install command with -f, as described in the docs for KIND.
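
A minimal sketch of that install step, assuming Helm 3 syntax (as in the original report). The file name `docker-for-mac-values.yaml` is a placeholder, not something the chart docs prescribe.

```shell
# Placeholder file name: save the Elasticsearch values from the earlier comment here.
VALUES_FILE="docker-for-mac-values.yaml"
# Release name used in the original report.
RELEASE="zeebe-full"

# -f (--values) layers the file on top of the chart's default values.
if [ -f "$VALUES_FILE" ] && command -v helm >/dev/null 2>&1; then
  helm install "$RELEASE" zeebe/zeebe-full -f "$VALUES_FILE"
else
  # helm or the values file is missing: just print the command that would run.
  echo "helm install $RELEASE zeebe/zeebe-full -f $VALUES_FILE"
fi
```

With Helm 2 the release name is passed via `--name` instead of as the first argument.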

salaboy commented Dec 12, 2019

@gitizenme did you manage to try with that values file? I am currently updating the charts and testing again. I would appreciate feedback on whether you still see this issue.

salaboy commented Jan 29, 2021

inactive for too long

