Zookeeper persistent data? #26

Closed
gytisgreitai opened this issue Apr 6, 2017 · 9 comments

@gytisgreitai

Hi,

from the Readme:

Zookeeper runs as a Deployment without persistent storage:

but looking at zookeeper it looks like you are using a StatefulSet with local storage. Maybe I'm missing something, but I could probably use the same persistent storage approach as with kafka, e.g.:

<snip>
          volumeMounts:
            - name: zookeeper-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-data
        annotations:
          volume.beta.kubernetes.io/storage-class: slow
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 500Mi
<snip>

Haven't tested much, but seems to be working?

@solsson
Contributor

solsson commented Apr 7, 2017

Yes you can. That's part of the reason why we use StatefulSet, the other being that we get predictable host names from which we can deduce an identity number.

We have automated topic creation, so if all zookeeper nodes went down we would re-run that automation and kafka would pick up all topics. If you don't want to run the risk of having to do so, persistent storage the way you suggest is preferable.
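
For illustration, a minimal sketch of how the identity number can be derived from the StatefulSet's predictable host names; the pod name prefix, data path and start command here are assumptions, not necessarily what this repo's manifests do:

      # Hypothetical container command in a zookeeper StatefulSet:
      # pod names like zoo-2 are stable, so the trailing ordinal becomes the myid.
      command:
        - sh
        - -c
        - |
          ORD=${HOSTNAME##*-}                                # e.g. zoo-2 -> 2
          echo $((ORD + 1)) > /var/lib/zookeeper/data/myid   # myid must be 1..255
          exec bin/zkServer.sh start-foreground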

@solsson
Contributor

solsson commented Aug 9, 2017

Done in https://github.com/Yolean/kubernetes-kafka/releases/tag/v2.0.0. To make Zookeeper more robust in the face of zone outages I made (by default) two pods ephemeral and three with persistent volumes.
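
Roughly, one way to express that split is two StatefulSets joining the same ensemble: one with volumeClaimTemplates and one backed by emptyDir. The names, sizes and elided fields below are a sketch under that assumption, not copied from the release:

  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: pzoo                  # hypothetical name for the persistent members
  spec:
    serviceName: zookeeper
    replicas: 3
    # ... zookeeper container template ...
    volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 1Gi
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: zoo                   # hypothetical name for the ephemeral members
  spec:
    serviceName: zookeeper
    replicas: 2
    # ... same container template, but with an emptyDir volume named data ...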

solsson closed this as completed Aug 9, 2017
@deitch

deitch commented Dec 11, 2017

To make Zookeeper more robust in the face of zone outages I made (by default) two pods ephemeral and three with persistent volumes.

I was wondering about that; it looks like an interesting way to handle zone failures (e.g. most cloud block storage is zone-specific).

What failure scenarios does having a mix of persistent and ephemeral nodes handle better than just having, e.g., 3 or 5 persistent nodes?

@solsson
Contributor

solsson commented Dec 11, 2017

Let's start with Kafka. We use three zones, so we start with three Kafka brokers and a default replication factor of 3. To increase throughput we can scale to 6, 9, 12 ... brokers and add 1 partition per such step. We can run with producer acks=2 and continue writing despite a zone outage.
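
(For reference: the newer Java producer only accepts acks=0, 1 or all, so the same "two of three replicas" guarantee is typically expressed with the settings sketched below; this is an illustrative sketch, not taken from this repo's configs.)

  # Producer config: wait for all in-sync replicas to acknowledge.
  acks=all
  # Topic or broker config: writes fail unless at least 2 replicas are in sync,
  # so losing one zone (one replica) does not stop producers.
  min.insync.replicas=2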

With Zookeeper the maths are less appealing, given that three zones is a good choice. We've accepted the recommendation from the Kafka: The Definitive Guide book: use 5 or 7 instances. Zookeeper is configured statically, so in the event of an extended zone outage for one of the two zones that host two instances, it can neither be reconfigured nor rescheduled.
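
Concretely, "configured statically" means every member starts with a fixed peer list along the lines of the zoo.cfg entries below (hostnames are assumptions following the usual headless-service pattern, not copied from the repo), which is why members lost with a zone cannot simply be pointed at new addresses:

  server.1=pzoo-0.zookeeper:2888:3888
  server.2=pzoo-1.zookeeper:2888:3888
  server.3=pzoo-2.zookeeper:2888:3888
  server.4=zoo-0.zookeeper:2888:3888
  server.5=zoo-1.zookeeper:2888:3888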

To be honest I have neither tested nor studied zookeeper sufficiently to know which failure modes we can handle. I have just tried to ensure that - as with kafka - only one instance at a time is gone.

@deitch

deitch commented Dec 11, 2017

OK, thanks.

@deitch

deitch commented Dec 11, 2017

I have struggled way too much with getting zookeeper and kafka to behave nicely in a cloud environment.

@StevenACoffman
Contributor

@solsson I noticed this comment you made earlier:

We have automated topic creation, so if all zookeeper nodes went down we would re-run that automation and kafka would pick up all topics.

Is that automated topic creation captured in an open source repo somewhere? I'd be interested in hearing more of your thoughts on that subject.

@solsson
Contributor

solsson commented Dec 13, 2017

@StevenACoffman We run Jobs with kafka-topics.sh commands over and over again. In the long run that won't be maintainable though, which is why I opened #101.
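
For the curious, such a Job boils down to something like the sketch below; the image, zookeeper address and topic parameters are placeholders rather than this repo's actual definitions:

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: topic-create-example
  spec:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: kafka-topics
            image: solsson/kafka          # placeholder: any image with the Kafka CLI tools
            command:
              - ./bin/kafka-topics.sh
              - --zookeeper
              - zookeeper:2181
              - --create
              - --if-not-exists
              - --topic
              - example-topic
              - --partitions
              - "3"
              - --replication-factor
              - "3"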

@solsson
Contributor

solsson commented Dec 18, 2017

Update: With #107 merged we'll no longer maintain these definitions. This means that we can probably not recover topics from kafka volumes alone.
