Use existing Kafka cluster #196

solsson · 2017-07-08T05:19:18Z

I'm a maintainer of https://github.com/Yolean/kubernetes-kafka, so naturally we have a cluster already :) Also https://github.com/kubernetes/charts/tree/master/incubator/kafka is quite widely adopted, judging by the number of pulls from https://hub.docker.com/r/solsson/kafka/.

I disagree with #32. Kafka is an excellent choice of events backend, with semantics that fit nicely with serverless function execution (can strive for exactly once). Also it makes it easy to integrate with other services in a streaming platform.

Can Kubeless use an existing Kafka cluster? You'd basically only need to specify any requirements on kafka config, and decouple kafka from the rest of Kubeless through a bootstrap brokers config. Maybe have some naming convention for topics.

sebgoa · 2017-07-08T08:57:01Z

Hi @solsson good to see you here, I have checked out your Yolean images of Kafka :)

I am glad you agree with the use of Kafka. I think it is a matter of expertise and taste.

To your issue: yes, you can totally reuse an existing Kafka cluster. The coupling is very light.

Currently, when you install Kubeless with the manifest we generate for the release (https://github.com/kubeless/kubeless/releases/download/0.0.16/kubeless-0.0.16.yaml), you will see that we launch a very simple Kafka setup with single-node StatefulSets.

The coupling happens in two locations:

The CLI, which provides a convenience wrapper, kubeless topic, to create, ls, and delete topics. It is essentially a kubectl exec into the Kafka pod that runs the Kafka bash script to manage topics. This is hackish and needs to be improved. https://github.com/kubeless/kubeless/blob/cba55476923ab38d5454b07be37396d9885fb4f3/cmd/kubeless/topicCreate.go#L30
The runtime for events, which is basically a Kafka consumer that listens for events on kafka.kubeless. See: https://github.com/kubeless/kubeless/blob/master/docker/runtime/python-2.7/event-trigger/kubeless.py#L38

Currently, events are only supported for the Python runtime. We should be able to add Node.js support within the next two weeks.

So say you have an existing Kafka cluster: if you can expose it within k8s under the kafka.kubeless DNS entry, then you are up and running.

We should actually make that kafka endpoint a bit more generic to avoid hard-coding the dns name.
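One way to expose an existing cluster under the kafka.kubeless name is an ExternalName Service, as a later comment in this thread does. The sketch below only builds such a manifest; the function name and the target host broker.kafka.svc.cluster.local are placeholders for illustration, not something Kubeless ships.

```python
import json


def external_name_service(target_host, name="kafka", namespace="kubeless"):
    """Build a Service manifest that aliases <name>.<namespace> to target_host."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "ExternalName", "externalName": target_host},
    }


if __name__ == "__main__":
    # Placeholder target: replace with the DNS name of your own brokers.
    manifest = external_name_service("broker.kafka.svc.cluster.local")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON as well as YAML manifests.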

solsson · 2017-07-08T13:12:10Z

Thanks for pointing out the built manifest. Without it the setup was a bit opaque to me.

I hope to get time to test this as a branch from Yolean/kubernetes-kafka#30, now that I have vacation :) I haven't yet considered how/if we'll use Kubeless in production, but it'll be interesting to see where discussions in #186 and #148 are heading as we already have build infrastructure - and obviously events - in our clusters.

the cli which provides a convenience wrapper kubeless topic to create,ls,delete topics

I think this mechanism can be in Kubeless independent of how Kafka is hosted. It too needs name+port to brokers. For example we used a simple java client at first, but more often these days we use the shell script in separate Jobs using a /bin/bash -c command with the Kafka image. To Kafka it makes no difference, and I think execing into the production image has no advantage over using a separate container.
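A rough sketch of the separate-Job approach described above: it builds a Job manifest whose container runs the topic script through /bin/bash -c. The image, script path, ZooKeeper address, and namespace are all assumptions supplied by the caller. The --zookeeper flag matches Kafka releases of this era; Kafka 3.0 and later use --bootstrap-server instead.

```python
import json


def topic_job(topic, zookeeper, image, script="./bin/kafka-topics.sh",
              partitions=1, replication=1, namespace="kafka"):
    """Build a Job manifest that creates one topic from a throwaway container."""
    command = (
        f"{script} --create --zookeeper {zookeeper} --topic {topic} "
        f"--partitions {partitions} --replication-factor {replication}"
    )
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # Assumes the topic name is also a valid Kubernetes object name.
        "metadata": {"name": f"topic-create-{topic}", "namespace": namespace},
        "spec": {"template": {"spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "topic-create",
                "image": image,
                "command": ["/bin/bash", "-c", command],
            }],
        }}},
    }


if __name__ == "__main__":
    # Hypothetical values: adjust the ZooKeeper address and image to your setup.
    print(json.dumps(topic_job("orders", "zookeeper.kafka:2181", "solsson/kafka"), indent=2))
```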

the runtime for events. It is basically a kafka consumer. This kafka consumer listens for events on kafka.kubeless

I guess a service+endpoint could map to the actual service, for example kafka.kafka. However, due to Yolean/kubernetes-kafka#21 I've decided to remove the Service from kafka (see referencing commits).

Maybe the kafka.kubeless service could use arbitrary broker(s) as endpoint if Kubeless uses it as "bootstrap", i.e. to get the actual names of online brokers. Thanks to StatefulSet those are cluster-global already, for example kafka-0.broker.kafka.svc.cluster.local:9092.
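To illustrate the naming scheme, here is a small sketch that derives the per-broker DNS names a StatefulSet provides. The defaults mirror the example name above; the function itself is hypothetical and only shows what a bootstrap list could look like.

```python
def broker_addresses(replicas, statefulset="kafka", service="broker",
                     namespace="kafka", port=9092):
    """Return the stable DNS address of each broker pod in a StatefulSet."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # A Kafka client would typically take this as its bootstrap server list.
    print(",".join(broker_addresses(3)))
```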

sebgoa · 2017-07-08T14:15:12Z

Would love to help you if you want to try this out, and any feedback would be very welcome.

And using your existing Kafka configuration should definitely not be a huge hurdle.

solsson · 2017-07-09T18:28:36Z

I'll be on vacation for two weeks. Will give this a shot after that.

solsson · 2017-08-09T19:32:37Z

We've released kubernetes-kafka v2.0.0 now, and I just confirmed how easy it is to set up Kubeless on top of it. Summarized in Yolean/kubernetes-kafka#48.

As you predicted @sebgoa the CLI's topic support fails with FATA[0000] Can't find any kafka pod. Maybe it could use the kafka.kubeless service to find brokers, or does it need an equivalent zookeeper service? Anyway users of kubernetes-kafka will likely be proficient in creating topics so it's not a big deal.

I suggest that you release a separate manifest, like you've done with openshift and rbac, for Kubeless with existing Kafka. I've only deleted from the release manifest, not edited anything. The externalName, https://github.com/Yolean/kubernetes-kafka/pull/48/files#diff-7c5fe8f6bc7c0bdce8d75d7387f4cd08R9, will be custom.

I guess the service can also rely on a kind: Endpoints resource, for Kafka outside Kubernetes, so maybe it's best left to the user to define kafka.kubeless.

This assumes that the kafka setup will provide actual listener resolvable names through the service.
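For Kafka running outside the cluster, a selector-less Service paired with an Endpoints object of the same name is the usual Kubernetes pattern. The sketch below builds both manifests; the IP addresses come from the documentation range and the function name is made up for illustration. As noted above, the brokers must still advertise listener names that clients inside the cluster can resolve.

```python
import json


def external_kafka_objects(ips, port=9092, name="kafka", namespace="kubeless"):
    """Build a selector-less Service and a matching Endpoints object."""
    metadata = {"name": name, "namespace": namespace}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": dict(metadata),
        # No selector: Kubernetes will not manage the endpoints itself.
        "spec": {"ports": [{"port": port, "targetPort": port}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # The Endpoints name must match the Service name.
        "metadata": dict(metadata),
        "subsets": [{
            "addresses": [{"ip": ip} for ip in ips],
            "ports": [{"port": port}],
        }],
    }
    return [service, endpoints]


if __name__ == "__main__":
    # Placeholder broker IPs from the 192.0.2.0/24 documentation range.
    for obj in external_kafka_objects(["192.0.2.10", "192.0.2.11"]):
        print(json.dumps(obj, indent=2))
```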

sebgoa · 2017-08-15T10:28:56Z

@solsson great, thanks for trying this.

If I recall correctly, the kubeless topic command finds the Kafka broker pod via a label query for kubeless:kafka, so assuming you label your broker with that, the command should work... I think...

cc/ @ngtuna see the comment about creating a manifest without a Kafka setup (kubeless-no-kafka.yaml?). We probably also need to be able to point to a different DNS name than kafka.kubeless.

solsson · 2017-08-17T12:59:45Z

Tested with the label in Yolean/kubernetes-kafka@fe3a6b3, but still getting FATA[0000] Can't find any kafka pod. Does it assume that the label is in the kubeless namespace?

sebgoa · 2017-08-18T07:18:50Z

@ngtuna can you check this ? thx

solsson · 2017-08-18T07:19:20Z

Took the time to investigate this myself now :) Yes it assumes kafka is in the kubeless namespace, and it also assumes an install path with /opt/bitnami/kafka/bin/kafka-topics.sh. But as I suggested before we can scope this out because those who have a kafka cluster already will be comfortable with creating topics in some other way.

ngtuna · 2017-08-18T07:49:58Z

@solsson sorry for the slight delay in responding. Let me summarize:

The current deployment places the Kafka and ZooKeeper StatefulSets in the kubeless namespace and hard-codes the DNS name to kafka.kubeless.
kubeless topic executes the command /opt/bitnami/kafka/bin/kafka-topics.sh in the Kafka pod. It finds the Kafka pod by the label kubeless: kafka.

So:

It's easy to make a kubeless-without-kafka/zk manifest, and we should do that (in an upcoming release).
The external Kafka should have the label kubeless:kafka.
We should not hard-code the namespace; the pubsub runtime can find it via the injected envvar KUBELESS_KAFKA_NAMESPACE.
The external Kafka will handle topic creation and publishing; the Kubeless runtime will do the consumption.

Are you fine with that? @sebgoa @solsson

solsson · 2017-08-18T07:57:37Z

the external kafka should have this label kubeless:kafka

We should not fixing the namespace, so the pubsub runtime can find it via the injected envvar KUBELESS_KAFKA_NAMESPACE

Does this mean that there won't be a Service with name: kafka in the kubeless namespace, and that pubsub will look up brokers directly? I think that would be great, because k8s services obscure the actual broker listeners.

Adding a label to your kafka pods is a convenient way to qualify them for Kubeless.

ngtuna · 2017-08-18T08:17:52Z

This means that there won't be a Service with name: kafka in the kubeless namespace

Even if there is a Kafka in the kubeless namespace, it will still be fine, because the pubsub runtime gives priority to the KUBELESS_KAFKA_NAMESPACE envvar.

Adding a label to your kafka pods is a convenient way to qualify them for Kubeless.

Yeah. But for kubeless topic there is still a tricky case: someone else may deploy a Kafka and label it kubeless: kafka. What we do right now is simply pick the first pod in the returned list.
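The lookup described in this exchange can be sketched as follows. This is not the actual CLI code (which is written in Go); it is a simplified Python model over plain pod dictionaries, assuming the label kubeless: kafka, the kubeless namespace, and first-match selection as stated above.

```python
def find_kafka_pod(pods, namespace="kubeless", label=("kubeless", "kafka")):
    """Return the first pod in `namespace` carrying the given label."""
    key, value = label
    matches = [
        pod for pod in pods
        if pod["metadata"].get("namespace") == namespace
        and pod["metadata"].get("labels", {}).get(key) == value
    ]
    if not matches:
        # Mirrors the error message reported earlier in the thread.
        raise LookupError("Can't find any kafka pod")
    # Only the first match is used, so a second labelled Kafka is ignored.
    return matches[0]


if __name__ == "__main__":
    # Hypothetical pod list: the labelled pod lives outside the kubeless namespace.
    pods = [{"metadata": {"name": "kafka-0", "namespace": "kafka",
                          "labels": {"kubeless": "kafka"}}}]
    try:
        find_kafka_pod(pods)
    except LookupError as err:
        print(err)
```

With the namespace fixed to kubeless, a correctly labelled broker in another namespace is still not found, which matches the behaviour reported above.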

arjunrao87 · 2017-09-01T18:53:04Z

@ngtuna is this available in the latest release?

sebgoa · 2017-10-10T09:04:53Z

@nikhita this one should be relatively easy to start with.

You should run a kafka deployment outside kubeless...then figure out how to start kubeless to use that specific kafka. Verify that pubsub functions work.

It should give you a smooth intro to running kubeless and using pubsub functions.

lghinet · 2017-11-17T14:44:58Z

Hello,
I am running Kafka on a different port: 9094

Best Practices

Reserve port 9092 for INSIDE listeners.
Reserve port 9093 for BROKER listeners.
Reserve port 9094 for OUTSIDE listeners.
from https://github.com/wurstmeister/kafka-docker/blob/master/README.md

Is there any config where I can change that?

andresmgot · 2017-11-20T08:36:52Z

Hello,

Unfortunately, it is not yet possible to run Kafka on a different port and make it work with Kubeless. I am curious about why that port setup is a best practice; could you elaborate?

lghinet · 2017-11-20T09:04:31Z

Because you want different security levels:
https://github.com/wurstmeister/kafka-docker/blob/master/README.md#listener-configuration

sebgoa · 2018-02-07T16:00:30Z

Pretty sure we fixed this cc/ @andresmgot @ngtuna

andresmgot · 2018-02-08T08:40:15Z

Not really; PubSub functions still expect Kafka in the kubeless namespace, listening on port 9092.

ngtuna · 2018-02-08T09:36:24Z

No, we have fixed this already. The pubsub function reads the KUBELESS_KAFKA_SVC and KUBELESS_KAFKA_NAMESPACE envvars, which default to kafka and kubeless. So if the function is deployed with the --env flag providing those envvars, it will work. The missing bit is that port 9092 is fixed. We should consider supporting @lghinet's case.
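As a rough model of the behaviour described here (not the runtime's actual source), the endpoint resolution could look like the sketch below. The envvar names and defaults come from this comment; the function name and the svc.namespace:port address format are assumptions.

```python
import os

# Per the comment above, the port was not configurable at this point.
KAFKA_PORT = 9092


def kafka_endpoint(env=None):
    """Resolve the Kafka address from the KUBELESS_KAFKA_* envvars."""
    env = os.environ if env is None else env
    service = env.get("KUBELESS_KAFKA_SVC", "kafka")
    namespace = env.get("KUBELESS_KAFKA_NAMESPACE", "kubeless")
    return f"{service}.{namespace}:{KAFKA_PORT}"


if __name__ == "__main__":
    print(kafka_endpoint({}))
    print(kafka_endpoint({"KUBELESS_KAFKA_SVC": "broker",
                          "KUBELESS_KAFKA_NAMESPACE": "kafka"}))
```

Passing the environment as a dictionary keeps the function easy to test; with no argument it reads the process environment.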

andresmgot · 2018-02-08T13:41:13Z

I thought that the idea was to support an external Kafka system (so it doesn't even need to be in the same cluster).

arapulido · 2018-02-08T13:42:13Z

In the case of Kafka being on the same cluster, do we have documentation on how to use it?

andresmgot · 2018-02-08T14:34:22Z

We do not. We need to document it.

arapulido · 2018-02-08T16:05:25Z

I have created #587

andresmgot · 2018-04-23T15:42:53Z

It is now possible to use the environment variable KAFKA_BROKERS with the Kafka consumer in order to use a different Kafka installation, but http://kubeless.io/docs/use-existing-kafka/ needs to be updated accordingly.
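A minimal sketch of how a consumer might interpret KAFKA_BROKERS. The comma-separated host:port format and the fallback to kafka.kubeless:9092 are assumptions made for illustration; the thread does not specify the format, so check the linked documentation for the real one.

```python
import os


def bootstrap_servers(env=None, default="kafka.kubeless:9092"):
    """Return the broker list from KAFKA_BROKERS, or the default address."""
    env = os.environ if env is None else env
    raw = env.get("KAFKA_BROKERS", "")
    # Assumed format: comma-separated host:port entries; blanks are skipped.
    brokers = [item.strip() for item in raw.split(",") if item.strip()]
    return brokers or [default]


if __name__ == "__main__":
    print(bootstrap_servers({"KAFKA_BROKERS": "kafka-0.broker.kafka:9094"}))
```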

solsson mentioned this issue Jul 27, 2017

Addon: Kubeless serverless functions with PubSub Yolean/kubernetes-kafka#48

Closed

sebgoa added the starter label Oct 10, 2017

andresmgot mentioned this issue Feb 7, 2018

Support for other namespace and other KAFKA services! vmware-archive/kubeless-ui#53

Open

This was referenced Feb 13, 2018

add a doc for existing kafka #591

Merged

investigate capability of extending kubeless topic command #593

Open

andresmgot added documentation v1.0.0-alpha.1 labels Apr 24, 2018

andresmgot mentioned this issue Apr 25, 2018

Update docs for using an existing Kafka #720

Merged

2 tasks

andresmgot closed this as completed in #720 Apr 26, 2018
