This repository has been archived by the owner on May 13, 2019. It is now read-only.

[WIP] Reimplementation #72

Closed
wants to merge 24 commits into from

Conversation

wvanbergen
Owner

This reimplements the Kafka high-level consumer. It addresses a couple of limitations of the old implementation, and also makes the code, and especially the multithreading model, easier to grok (I was young and inexperienced when I wrote the initial implementation ;).

Changes

  1. It now uses Zookeeper to discover all topic/partition metadata, instead of Kafka. The primary reason is that this allows us to set watches, so we can detect changes.
  2. Using 1., it now automatically starts to consume new partitions when they become available.
  3. When claiming a partition that is still claimed by another instance, it uses a Zookeeper watch to wait until the partition becomes available.
  4. It is smarter during a rebalance operation. Instead of stopping and restarting everything, it now only stops partition consumers it no longer needs, and only starts partition consumers that are not already running. Because it does all of this in parallel, rebalance operations are a lot faster.
  5. Many operations are more resilient because they implement retries.
  6. It uses a Subscription interface to describe what topics to consume. This would allow us to implement a regular-expression-based blacklist or whitelist approach, as well as a static list of topics.
  7. It uses an interface for the main Consumer type, so unit testing apps that use this library is now possible using dependency injection. (A rough sketch of both interfaces follows this list.)
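
To make 6. and 7. a bit more concrete, here is a rough sketch of what those two interfaces could look like. The method names, signatures, and the package name are assumptions for illustration, not the final API:

package consumergroup

import (
  "github.com/Shopify/sarama"
  "github.com/wvanbergen/kazoo-go"
)

// Subscription describes what topics the group should consume.
// The method shape here is hypothetical; the real interface may differ.
type Subscription interface {
  // WatchTopics returns the currently subscribed topics plus a channel that
  // signals when the subscription changes (e.g. a new topic that matches a
  // whitelist regexp appears in Zookeeper).
  WatchTopics(kz *kazoo.Kazoo) (topics []string, changed <-chan struct{}, err error)
}

// Consumer is the interface implemented by the main consumer type, so
// applications can inject a mock in their unit tests.
type Consumer interface {
  Messages() <-chan *sarama.ConsumerMessage
  Errors() <-chan error
  Close() error
}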

Implementation notes

  • I have implemented it as a separate package, because the API has changed somewhat, and I want to keep the old one around for now.
  • We have three main types (a rough skeleton sketch follows this list):
    1. consumerManager: runs a goroutine that figures out what partitions to consume, and starts/stops partition managers for them. Afterwards, it waits for changes in the subscription or in the list of running instances, and then does it all again. Implements the Consumer interface.
    2. partitionManager: runs a goroutine that manages a single sarama.PartitionConsumer, claiming the partition in Zookeeper and managing offsets.
    3. Subscription: describes what partitions the entire group should be consuming, and watches Zookeeper for potential changes.
  • This depends on some Kazoo changes: Consumergoup additions kazoo-go#10
  • This depends on sarama's offset manager, which has not yet landed in master: OffsetManager Implementation IBM/sarama#461
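
An illustrative skeleton of how the two manager types might relate. All field names below are assumptions based on the description above, not the actual implementation:

// Illustrative skeleton only; the real structs in this PR may differ.
type consumerManager struct {
  client       sarama.Client
  consumer     sarama.Consumer
  kz           *kazoo.Kazoo
  group        *kazoo.Consumergroup
  subscription Subscription

  // partition managers currently running, keyed by "topic/partition"
  partitionManagers map[string]*partitionManager
}

type partitionManager struct {
  parent    *consumerManager
  topic     string
  partition int32

  // sarama's (WIP) offset manager for this partition
  offsetManager sarama.PartitionOffsetManager

  // closed once every consumed offset has been marked as processed
  processingDone chan struct{}
}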

TODO

  • Kafka offset management.
  • Zookeeper offset management?
  • Add whitelist / blacklist Subscription types. Maybe move the Subscription type to Kazoo?
  • Unit tests - feasible now that sarama has mock types.
  • Functional tests

@wvanbergen
Owner Author

@horkhe @nemothekid @aaronkavlie-wf @kvs: your input on this is very welcome!

@wvanbergen
Owner Author

Also pinging @eapache as always :)

}

instances, instancesChanged, err := cm.group.WatchInstances()
if err != nil {
Owner Author

TODO: retry this on error.
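
A minimal sketch of what that retry could look like; maxRetries and retryBackoff are hypothetical names, and the WatchInstances call is taken from the diff above:

// Hypothetical retry loop around kazoo's WatchInstances.
instances, instancesChanged, err := cm.group.WatchInstances()
for attempt := 1; err != nil && attempt <= maxRetries; attempt++ {
  log.Printf("Failed to watch consumergroup instances (attempt %d): %s. Retrying...", attempt, err)
  time.Sleep(retryBackoff)
  instances, instancesChanged, err = cm.group.WatchInstances()
}
if err != nil {
  return err // still failing after maxRetries attempts; give up
}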

@eapache
Collaborator

eapache commented Aug 16, 2015

I am not sure whether to use Zookeeper for offset management, or use Kafka instead

Dunno how similar the ZK version is, but this seems like an ideal place for a swappable interface.
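
One possible shape for such a swappable interface; this is purely illustrative and not sarama's actual OffsetManager API:

// Hypothetical pluggable offset store, so a Zookeeper-backed and a
// Kafka-backed (sarama OffsetManager) implementation could be swapped.
type OffsetStore interface {
  // InitialOffset returns the last committed offset for the partition,
  // or a negative value if nothing has been committed yet.
  InitialOffset(topic string, partition int32) (int64, error)

  // CommitOffset records the given offset for the partition.
  CommitOffset(topic string, partition int32, offset int64) error

  // Close flushes any pending commits and releases resources.
  Close() error
}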

@wvanbergen
Owner Author

I will go ahead and try to implement it with sarama's OffsetManager. I can rework the ZK version to match the sarama interface.

@wvanbergen
Owner Author

I implemented offset management using sarama's WIP offset managers. It appears to work well.

@wvanbergen
Owner Author

I also added Whitelist and Blacklist subscription types. This means you can subscribe to topics (not) matching a regular expression. Any new topics that are created in the cluster will automatically be consumed.
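
As a usage sketch (the constructor names below are hypothetical placeholders, not necessarily the API in this branch):

// Subscribe to every topic matching the regexp, including topics created later.
whitelist := NewRegexpWhitelistSubscription(regexp.MustCompile(`^events\..*`))

// Subscribe to every topic except the ones matching the regexp.
blacklist := NewRegexpBlacklistSubscription(regexp.MustCompile(`^internal\..*`))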

}

eventCount += 1
if offsets[message.Topic][message.Partition] != 0 && offsets[message.Topic][message.Partition] != message.Offset-1 {
Collaborator

this should be unnecessary (sarama does these checks) and it's wrong anyways for compacted topics where the offsets might not be monotonic

Owner Author

This was copy-pasted from the previous example app and is shitty. It also fails when you start consuming a partition, then another instance takes it over, and later the partition gets assigned back to you.

Will remove.

@eapache
Collaborator

eapache commented Aug 17, 2015

Quick skim looks good 👍

When this is fully ready I'll do a deep-dive review.

@wvanbergen
Owner Author

This is getting pretty close to being feature complete. Let's work to get IBM/sarama#461 merged so we can pick this up.

@horkhe

horkhe commented Aug 18, 2015

@wvanbergen good job! I only wish you guys had all of this production-ready a couple of months ago, then I would not have needed to implement it myself :)

}

// Initialize sarama consumer
if consumer, err := sarama.NewConsumerFromClient(cm.client); err != nil {

Consumer and OffsetManager share a client instance, but consumer requests can stay blocked doing long polling. Maybe it makes sense to use separate clients, so that the OffsetManager would never be affected by long polls?
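
A sketch of that suggestion, assuming the OffsetManager constructors from IBM/sarama#461 as they eventually landed; brokers, config, and groupName are placeholders:

// Use separate clients so long-polling fetch requests from the consumer can
// never hold up offset commit traffic from the offset manager.
consumerClient, err := sarama.NewClient(brokers, config)
if err != nil {
  return err
}
offsetClient, err := sarama.NewClient(brokers, config)
if err != nil {
  return err
}

consumer, err := sarama.NewConsumerFromClient(consumerClient)
if err != nil {
  return err
}
offsetManager, err := sarama.NewOffsetManagerFromClient(groupName, offsetClient)
if err != nil {
  return err
}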

// partition consumer resumes consuming from this partition later.
func (pm *partitionManager) waitForProcessing() {
nextOffset, _ := pm.offsetManager.NextOffset()
lastProcessedOffset := nextOffset - 1
Owner Author

Not really super happy about this implementation, but not sure how else to do this.

We only want to wait for offsets if a) we actually consumed any messages at all, and b) we haven't already processed all consumed offsets. If either of those conditions doesn't hold, pm.processingDone will never be closed.

Collaborator

seems reasonable to me... early returns would make it much prettier though, e.g.

if lastConsumedOffset == -1 {
  return
}
// ...
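
Applying that suggestion, the whole method could read roughly like this; lastConsumedOffset, processingDone, and the timeout field are assumed names based on this thread, not necessarily the real ones:

func (pm *partitionManager) waitForProcessing() {
  if pm.lastConsumedOffset == -1 {
    return // nothing was ever consumed on this partition
  }

  nextOffset, _ := pm.offsetManager.NextOffset()
  if nextOffset-1 >= pm.lastConsumedOffset {
    return // everything we consumed has already been processed
  }

  // Otherwise, wait until processing catches up, or until the timeout expires.
  select {
  case <-pm.processingDone:
  case <-time.After(pm.processingTimeout):
  }
}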

}
ts.offsetTotal += offset - 1

request.AddBlock(topic, partition, offset-1, 0, "")

Timestamp 0 means the beginning of the Unix epoch. As a result, all committed offsets expire immediately (in my setup Kafka kept them around for 1 minute). So you need to use ReceiveTime (-1) instead.
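
Applied to the AddBlock call above, the fix would look like this; sarama exposes a ReceiveTime constant (-1) for this:

// Use ReceiveTime (-1) so the broker stamps the commit with its own receive
// time instead of treating the offset as expired at the Unix epoch.
request.AddBlock(topic, partition, offset-1, sarama.ReceiveTime, "")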

@F21

F21 commented Nov 9, 2015

Any update on this one?

@wvanbergen
Owner Author

It's mostly done; I need to get the functional tests to work on Travis (they work fine on my machine). Any input on that is welcome.

@s7anley

s7anley commented Jan 11, 2016

@wvanbergen Hi, any news about this PR? Do you need any help? The last commit is from Oct 7, so I'm not sure whether I should wait or rather choose a different approach for consumer groups.

@wvanbergen
Owner Author

Hey. It looks like I will not have a lot of time available to maintain this library, or finish this PR. If anybody is interested in taking it over, I will be glad to help out and get you started.

@s7anley

s7anley commented Jan 11, 2016

And does it still make sense once IBM/sarama@66d77e1 is merged? Or only as support for clusters still running 0.8.x?

@wvanbergen
Owner Author

Yeah, this will be primarily for people that are stuck on 0.8 for the time being.

caihua-yin pushed a commit to caihua-yin/kafka that referenced this pull request Apr 19, 2016
This is a complementary fix for
wvanbergen#68
(issue: wvanbergen#62), before the
re-implementation (wvanbergen#72) is ready.

In my use case, the message-consuming logic is sometimes time consuming;
even with the 3 retries from the fix in pull #68, it is still easy to run
into issue #62. Further checking the current logic in
consumer_group.go:partitionConsumer(), it may take
as long as cg.config.Offsets.ProcessingTimeout to ReleasePartition
so that the partition can be claimed by a new consumer during a rebalance.
So simply set the max retry time to the same value as
cg.config.Offsets.ProcessingTimeout, which is 60s by default.

Verified the system including this fix with frequent rebalance
operations; the issue does not occur again.
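
For reference, the timeout the commit message ties the retry budget to lives on the consumer group configuration; a minimal sketch of setting it explicitly (the value shown is just the documented default):

// Configure the old consumergroup package's processing timeout explicitly.
config := consumergroup.NewConfig()
config.Offsets.ProcessingTimeout = 60 * time.Second // the library's default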
@wvanbergen closed this Aug 28, 2017