This repository has been archived by the owner on Dec 15, 2021. It is now read-only.

Use existing Kafka cluster #196

Closed
solsson opened this issue Jul 8, 2017 · 25 comments

Comments

@solsson

solsson commented Jul 8, 2017

I'm a maintainer of https://github.com/Yolean/kubernetes-kafka, so naturally we have a cluster already :) Also https://github.com/kubernetes/charts/tree/master/incubator/kafka is quite widely adopted, judging by the number of pulls from https://hub.docker.com/r/solsson/kafka/.

I disagree with #32. Kafka is an excellent choice of events backend, with semantics that fit nicely with serverless function execution (can strive for exactly once). Also it makes it easy to integrate with other services in a streaming platform.

Can Kubeless use an existing Kafka cluster? You'd basically only need to specify any requirements on kafka config, and decouple kafka from the rest of Kubeless through a bootstrap brokers config. Maybe have some naming convention for topics.

@sebgoa
Contributor

sebgoa commented Jul 8, 2017

Hi @solsson good to see you here, I have checked out your Yolean images of Kafka :)

I am glad you agree with the use of Kafka. I think it is a matter of expertise and taste.

To your issue: yes, you can totally reuse an existing Kafka cluster. The coupling is very light.

Currently, when you install Kubeless with the manifest we generate for each release (https://github.com/kubeless/kubeless/releases/download/0.0.16/kubeless-0.0.16.yaml), you will see that we launch a very simple Kafka setup with single-node StatefulSets.

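The coupling happens in two locations:

  • the cli which provides a convenience wrapper kubeless topic to create,ls,delete topics
  • the runtime for events. It is basically a kafka consumer. This kafka consumer listens for events on kafka.kubeless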

Currently, events are only supported for the Python runtime. We should be able to get to the Node.js runtime within the next two weeks.

So say you have an existing Kafka cluster: if you can expose it within k8s under the kafka.kubeless DNS entry, then you are up and running.

We should actually make that kafka endpoint a bit more generic to avoid hard-coding the dns name.

@solsson
Author

solsson commented Jul 8, 2017

Thanks for pointing out the built manifest. Without it the setup was a bit opaque to me.

I hope to get time to test this as a branch from Yolean/kubernetes-kafka#30, now that I have vacation :) I haven't yet considered how/if we'll use Kubeless in production, but it'll be interesting to see where discussions in #186 and #148 are heading as we already have build infrastructure - and obviously events - in our clusters.

> the cli which provides a convenience wrapper kubeless topic to create,ls,delete topics

I think this mechanism can be in Kubeless independent of how Kafka is hosted. It too needs name+port to brokers. For example we used a simple java client at first, but more often these days we use the shell script in separate Jobs using a /bin/bash -c command with the Kafka image. To Kafka it makes no difference, and I think execing into the production image has no advantage over using a separate container.

> the runtime for events. It is basically a kafka consumer. This kafka consumer listens for events on kafka.kubeless

I guess a service+endpoint could map to the actual service, for example kafka.kafka. However, due to Yolean/kubernetes-kafka#21 I've decided to remove the Service from kafka (see referencing commits).

Maybe the kafka.kubeless service could use arbitrary broker(s) as endpoint if Kubeless uses it as "bootstrap", i.e. to get the actual names of online brokers. Thanks to StatefulSet those are cluster-global already, for example kafka-0.broker.kafka.svc.cluster.local:9092.
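To illustrate the naming scheme mentioned above, here is a minimal sketch that builds the per-pod broker addresses a StatefulSet exposes. The function name and its defaults (StatefulSet `kafka`, headless service `broker`, namespace `kafka`) are assumptions taken from the example address in this comment, not anything Kubeless ships.

```python
def statefulset_broker_addresses(replicas, statefulset="kafka", service="broker",
                                 namespace="kafka", port=9092,
                                 cluster_domain="svc.cluster.local"):
    """Return the stable DNS address of each broker pod in a StatefulSet.

    Pods of a StatefulSet are named <statefulset>-<ordinal> and are resolvable
    through the governing headless service.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.{cluster_domain}:{port}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # Hypothetical three-broker cluster; any of these can serve as bootstrap.
    print(",".join(statefulset_broker_addresses(3)))
```

A client only needs one reachable address from this list to bootstrap; the brokers then advertise their own listener names.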

@sebgoa
Contributor

sebgoa commented Jul 8, 2017

Would love to help you if you want to try this out, and any feedback would be very welcome.

and definitely using your existing kafka configuration should not be a huge hurdle

@solsson
Author

solsson commented Jul 9, 2017

I'll be on vacation for two weeks. Will give this a shot after that.

@solsson
Author

solsson commented Aug 9, 2017

We've released kubernetes-kafka v2.0.0 now, and I just confirmed how easy it is to set up Kubeless on top of it. Summarized in Yolean/kubernetes-kafka#48.

As you predicted @sebgoa the CLI's topic support fails with FATA[0000] Can't find any kafka pod. Maybe it could use the kafka.kubeless service to find brokers, or does it need an equivalent zookeeper service? Anyway users of kubernetes-kafka will likely be proficient in creating topics so it's not a big deal.

I suggest that you release a separate manifest, like you've done with openshift and rbac, for Kubeless with existing Kafka. I've only deleted from the release manifest, not edited anything. The externalName, https://github.com/Yolean/kubernetes-kafka/pull/48/files#diff-7c5fe8f6bc7c0bdce8d75d7387f4cd08R9, will be custom.

I guess the service can also rely on a kind: Endpoints resource, for Kafka outside Kubernetes, so maybe it's best left to the user to define kafka.kubeless.

This assumes that the Kafka setup provides resolvable names for the actual listeners through the service.
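As a rough sketch of the ExternalName approach described above, the following generates a Service manifest named kafka in the kubeless namespace that aliases an existing cluster. The target `broker.kafka.svc.cluster.local` is a hypothetical placeholder; the actual value used in the linked pull request is not reproduced here and will differ per setup.

```python
import json


def external_name_service(target, name="kafka", namespace="kubeless"):
    """Build a Service manifest that makes <name>.<namespace> a DNS alias of target."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "ExternalName", "externalName": target},
    }


if __name__ == "__main__":
    # Hypothetical target: replace with the DNS name of your own Kafka service.
    manifest = external_name_service("broker.kafka.svc.cluster.local")
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the output can be piped to `kubectl apply -f -`. An ExternalName service is only a DNS alias, so it does not remap ports.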

@sebgoa
Contributor

sebgoa commented Aug 15, 2017

@solsson great, thanks for trying this.

If I recall correctly, the kubeless topic command finds the Kafka broker pod via a label query for kubeless:kafka, so assuming you label your broker with that, the command should work... I think...

cc/ @ngtuna see the comment about creating a manifest without a Kafka setup (kubeless-no-kafka.yaml?). We probably also need to be able to point to a different DNS name than kafka.kubeless.

@solsson
Author

solsson commented Aug 17, 2017

Tested with the label in Yolean/kubernetes-kafka@fe3a6b3, but still getting FATA[0000] Can't find any kafka pod. Does it assume that the label is in the kubeless namespace?

@sebgoa
Contributor

sebgoa commented Aug 18, 2017

@ngtuna can you check this ? thx

@solsson
Author

solsson commented Aug 18, 2017

Took the time to investigate this myself now :) Yes it assumes kafka is in the kubeless namespace, and it also assumes an install path with /opt/bitnami/kafka/bin/kafka-topics.sh. But as I suggested before we can scope this out because those who have a kafka cluster already will be comfortable with creating topics in some other way.

@ngtuna
Contributor

ngtuna commented Aug 18, 2017

@solsson sorry for the slight delay in responding. To summarize:

  • the current deployment places the kafka and zk StatefulSets in the kubeless namespace and hard-codes the DNS name kafka.kubeless
  • kubeless topic executes this command /opt/bitnami/kafka/bin/kafka-topics.sh in the kafka pod. It finds the kafka pod by label kubeless: kafka

So:

  • it's easy to make a kubeless-without-kafka/zk manifest, and we should do that (in an upcoming release)
  • the external kafka should have this label kubeless:kafka
  • We should not fix the namespace, so that the pubsub runtime can find it via the injected envvar KUBELESS_KAFKA_NAMESPACE
  • the external kafka will handle topic creation and publishing, the kubeless runtime will do the consumption.

Are you fine with that ? @sebgoa @solsson

@solsson
Author

solsson commented Aug 18, 2017

> • the external kafka should have this label kubeless:kafka
> • We should not fix the namespace, so that the pubsub runtime can find it via the injected envvar KUBELESS_KAFKA_NAMESPACE

Does this mean that there won't be a Service with name:kafka in the kubeless namespace, but that pubsub will look up brokers directly? I think that would be great, because k8s services obscure the actual broker listeners.

Adding a label to your kafka pods is a convenient way to qualify them for Kubeless.

@ngtuna
Contributor

ngtuna commented Aug 18, 2017

> This means that there won't be a Service with name:kafka in the kubeless namespace

Even if there is a kafka in the kubeless namespace, it will still be fine, because the pubsub runtime checks the KUBELESS_KAFKA_NAMESPACE envvar first.

> Adding a label to your kafka pods is a convenient way to qualify them for Kubeless.

Yeah. But for kubeless topic, a tricky point remains when someone else also deploys a kafka and labels it kubeless: kafka, because we just pick the first pod in the returned list.
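The selection behaviour described in this exchange can be sketched roughly as follows. This is an illustrative Python model over plain dictionaries shaped like Kubernetes pod objects, not the actual Kubeless CLI code; the function name and the sample pod names are hypothetical.

```python
def find_kafka_pod(pods, namespace="kubeless", label_key="kubeless", label_value="kafka"):
    """Return the name of the first pod in `namespace` carrying the given label.

    Mirrors the described behaviour: filter by namespace and label, then take
    whichever match comes first, even if several pods qualify.
    """
    matches = [
        pod for pod in pods
        if pod["metadata"].get("namespace") == namespace
        and pod["metadata"].get("labels", {}).get(label_key) == label_value
    ]
    if not matches:
        raise LookupError("Can't find any kafka pod")
    return matches[0]["metadata"]["name"]


if __name__ == "__main__":
    # Hypothetical pod list: the labelled pod lives outside the kubeless namespace.
    pods = [
        {"metadata": {"name": "kafka-0", "namespace": "kafka",
                      "labels": {"kubeless": "kafka"}}},
    ]
    try:
        find_kafka_pod(pods)
    except LookupError as err:
        print(err)
    # Searching the namespace where the broker actually runs succeeds.
    print(find_kafka_pod(pods, namespace="kafka"))
```

The example shows why a correctly labelled broker in another namespace is still not found when the namespace is fixed, and why two unrelated labelled deployments would be ambiguous.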

@arjunrao87
Contributor

@ngtuna is this available in the latest release?

sebgoa added the starter label Oct 10, 2017
@sebgoa
Contributor

sebgoa commented Oct 10, 2017

@nikhita this one should be relatively easy to start with.

You should run a kafka deployment outside kubeless...then figure out how to start kubeless to use that specific kafka. Verify that pubsub functions work.

It should give you a smooth intro to running kubeless and using pubsub functions.

@lghinet

lghinet commented Nov 17, 2017

hello,
i am running kafka on a different port: 9094

Best Practices

Reserve port 9092 for INSIDE listeners.
Reserve port 9093 for BROKER listeners.
Reserve port 9094 for OUTSIDE listeners.
from https://github.com/wurstmeister/kafka-docker/blob/master/README.md

is there any config where i can change that ?

@andresmgot
Contributor

Hello,

Unfortunately, it is not yet possible to run Kafka on a different port and make it work with Kubeless. I am curious about why that port setup is a best practice; could you elaborate?

@lghinet

lghinet commented Nov 20, 2017

because you want different security levels,
https://github.com/wurstmeister/kafka-docker/blob/master/README.md#listener-configuration

@sebgoa
Contributor

sebgoa commented Feb 7, 2018

Pretty sure we fixed this cc/ @andresmgot @ngtuna

@andresmgot
Contributor

Not really, PubSub functions still expect Kafka in the kubeless namespace, listening on port 9092.

@ngtuna
Contributor

ngtuna commented Feb 8, 2018

No, we have fixed this already. The pubsub function reads the KUBELESS_KAFKA_SVC and KUBELESS_KAFKA_NAMESPACE envvars, which default to kafka and kubeless. So if the function is deployed with the --env flag to provide those envvars, it will work. The missing bit is that the port 9092 is fixed. We should consider supporting @lghinet's case.
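A minimal sketch of the lookup described here, assuming the address is simply composed as `<service>.<namespace>:9092`. The function name is hypothetical, and the real runtime code may build the address differently.

```python
import os

# The port is fixed, as noted in the comment above.
KAFKA_PORT = 9092


def kafka_bootstrap_address(env=None):
    """Compose the broker address from the two envvars, using the stated defaults."""
    env = os.environ if env is None else env
    service = env.get("KUBELESS_KAFKA_SVC", "kafka")
    namespace = env.get("KUBELESS_KAFKA_NAMESPACE", "kubeless")
    return f"{service}.{namespace}:{KAFKA_PORT}"


if __name__ == "__main__":
    # With neither variable set this yields the default in-cluster address.
    print(kafka_bootstrap_address({}))
    # Hypothetical override pointing at a broker service in another namespace.
    print(kafka_bootstrap_address({"KUBELESS_KAFKA_SVC": "broker",
                                   "KUBELESS_KAFKA_NAMESPACE": "kafka"}))
```

Because the port is a constant in this model, a cluster listening on 9094 cannot be reached by changing these two variables alone.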

@andresmgot
Contributor

I thought that the idea was to support an external Kafka system (so it doesn't even need to be in the same cluster).

@arapulido
Contributor

In the case of Kafka being on the same cluster, do we have documentation on how to use it?

@andresmgot
Contributor

Not yet. We need to document it.

@arapulido
Contributor

I have created #587

@andresmgot
Contributor

It is now possible to set the environment variable KAFKA_BROKERS on the Kafka consumer in order to use a different Kafka installation, but the documentation at http://kubeless.io/docs/use-existing-kafka/ still needs to be adapted.
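For illustration only, here is a sketch of how a consumer might interpret such a variable. The comma-separated `host:port` format and the fallback to `kafka.kubeless:9092` are assumptions made for this example; the thread does not specify the format the Kubeless consumer expects.

```python
import os


def broker_list(env=None, default="kafka.kubeless:9092"):
    """Parse KAFKA_BROKERS into a list of host:port strings.

    Assumes a comma-separated value; falls back to `default` when the
    variable is unset or empty.
    """
    env = os.environ if env is None else env
    raw = env.get("KAFKA_BROKERS") or default
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


if __name__ == "__main__":
    # Hypothetical brokers listening on a non-default port.
    example = {"KAFKA_BROKERS": "kafka-0.broker.kafka:9094, kafka-1.broker.kafka:9094"}
    print(broker_list(example))
    print(broker_list({}))
```

Carrying the port inside each entry is what lets a setup like the 9094 listener discussed earlier work without a separate port setting.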
