Zookeeper persistent data? #26

Closed
gytisgreitai opened this issue Apr 6, 2017 · 9 comments

@gytisgreitai

Hi,

from the Readme:

Zookeeper runs as a Deployment without persistent storage:

but looking at zookeeper it looks like you are using a StatefulSet with local storage. Maybe I'm missing something, but I could probably use the same persistent storage approach as with kafka, e.g.:

<snip>
          volumeMounts:
            - name: zookeeper-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-data
        annotations:
          volume.beta.kubernetes.io/storage-class: slow
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 500Mi
<snip>

Haven't tested much, but seems to be working?

@solsson
Contributor

solsson commented Apr 7, 2017

Yes you can. That's part of the reason why we use StatefulSet, the other being that we get predictable host names from which we can deduce an identity number.

We have automated topic creation, so if all zookeeper nodes went down we would re-run that automation and kafka would pick up all topics. If you don't want to run the risk of having to do so, persistent storage the way you suggest is preferable.
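
For illustration, a minimal sketch of how the identity number can be derived from the StatefulSet's predictable host names; the pod name prefix, data path and start command here are assumptions, not necessarily what this repo's manifests do:

      # Hypothetical container command in a zookeeper StatefulSet:
      # pod names like zoo-2 are stable, so the trailing ordinal becomes the myid.
      command:
        - sh
        - -c
        - |
          ORD=${HOSTNAME##*-}                                # e.g. zoo-2 -> 2
          echo $((ORD + 1)) > /var/lib/zookeeper/data/myid   # myid must be 1..255
          exec bin/zkServer.sh start-foreground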

@solsson
Contributor

solsson commented Aug 9, 2017

Done in https://github.com/Yolean/kubernetes-kafka/releases/tag/v2.0.0. To make Zookeeper more robust in the face of zone outages I made (by default) two pods ephemeral and three with persistent volumes.
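
Roughly, one way to express that split is two StatefulSets joining the same ensemble: one with volumeClaimTemplates and one backed by emptyDir. The names, sizes and elided fields below are a sketch under that assumption, not copied from the release:

  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: pzoo                  # hypothetical name for the persistent members
  spec:
    serviceName: zookeeper
    replicas: 3
    # ... zookeeper container template ...
    volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 1Gi
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: zoo                   # hypothetical name for the ephemeral members
  spec:
    serviceName: zookeeper
    replicas: 2
    # ... same container template, but with an emptyDir volume named data ...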

solsson closed this as completed Aug 9, 2017
@deitch

deitch commented Dec 11, 2017

To make Zookeeper more robust in the face of zone outages I made (by default) two pods ephemeral and three with persistent volumes.

I was wondering about that; it looks like an interesting way to handle zone failures (e.g. most cloud block storage is zone-specific).

What failure scenarios does having a mix of persistent and ephemeral nodes handle better than just having, e.g., 3 or 5 persistent nodes?

@solsson
Contributor

solsson commented Dec 11, 2017

Let's start with Kafka. We use three zones, so we start with three Kafka brokers and a default replication factor of 3. To increase throughput we can scale to 6, 9, 12 ... brokers and add 1 partition per such step. We can run with producer acks=2 and continue writing despite a zone outage.
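
(For reference: the newer Java producer only accepts acks=0, 1 or all, so the same "two of three replicas" guarantee is typically expressed with the settings sketched below; this is an illustrative sketch, not taken from this repo's configs.)

  # Producer config: wait for all in-sync replicas to acknowledge.
  acks=all
  # Topic or broker config: writes fail unless at least 2 replicas are in sync,
  # so losing one zone (one replica) does not stop producers.
  min.insync.replicas=2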

With Zookeeper the maths are less appealing, given that three zones is a good choice. We've accepted the recommendation from the Kafka: The Definitive Guide book: use 5 or 7 instances. Zookeeper is configured statically, so in the event of an extended zone outage for one of the two zones that host two instances, it can neither be reconfigured nor rescheduled.
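
Concretely, "configured statically" means every member starts with a fixed peer list along the lines of the zoo.cfg entries below (hostnames are assumptions following the usual headless-service pattern, not copied from the repo), which is why members lost with a zone cannot simply be pointed at new addresses:

  server.1=pzoo-0.zookeeper:2888:3888
  server.2=pzoo-1.zookeeper:2888:3888
  server.3=pzoo-2.zookeeper:2888:3888
  server.4=zoo-0.zookeeper:2888:3888
  server.5=zoo-1.zookeeper:2888:3888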

To be honest I have neither tested nor studied zookeeper sufficiently to know which failure modes we can handle. I have just tried to ensure that - as with kafka - only one instance at a time is gone.

@deitch

deitch commented Dec 11, 2017

OK, thanks.

@deitch

deitch commented Dec 11, 2017

I have struggled way too much with getting zookeeper and kafka to behave nicely in a cloud environment.

@StevenACoffman
Contributor

@solsson I noticed this comment you made earlier:

We have automated topic creation, so if all zookeeper nodes went down we would re-run that automation and kafka would pick up all topics.

Is that automated topic creation captured in an open source repo somewhere? I'd be interested in hearing more of your thoughts on that subject.

@solsson
Contributor

solsson commented Dec 13, 2017

@StevenACoffman We run Jobs with kafka-topics.sh commands over and over again. In the long run that won't be maintainable though, which is why I opened #101.
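
For the curious, such a Job boils down to something like the sketch below; the image, zookeeper address and topic parameters are placeholders rather than this repo's actual definitions:

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: topic-create-example
  spec:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: kafka-topics
            image: solsson/kafka          # placeholder: any image with the Kafka CLI tools
            command:
              - ./bin/kafka-topics.sh
              - --zookeeper
              - zookeeper:2181
              - --create
              - --if-not-exists
              - --topic
              - example-topic
              - --partitions
              - "3"
              - --replication-factor
              - "3"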

@solsson
Contributor

solsson commented Dec 18, 2017

Update: With #107 merged we'll no longer maintain these definitions. This means that we can probably not recover topics from kafka volumes alone.
