Refresh metadata after restart #404

Merged
merged 6 commits into from
Jun 25, 2019


@tulios (Owner) commented Jun 24, 2019

We recently upgraded to Kafka 2 and are currently changing some broker configurations, which leads to a lot of broker restarts. During these restarts, some consumers take too long to recover because they keep retrying against the old brokers and don't pick up the change.

This PR does three things:

  1. refresh metadata on the "wrong coordinator" error instead of relying on the crash/restart flow
  2. refresh metadata on the "broker not found" error instead of relying on the crash/restart flow
  3. re-connect the cluster and refresh metadata on restart

1 and 2 improve the recovery time: instead of trashing the consumer and restarting it, the consumer refreshes its metadata and connects to the right broker.
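A minimal TypeScript sketch of items 1 and 2, under stated assumptions: `ClusterError`, its `kind` values, the `Cluster` interface, and `withMetadataRefresh` are hypothetical names for illustration, not KafkaJS's actual internals. It only shows the control flow: a stale-metadata error triggers a metadata refresh and a retry, while any other error still propagates to the crash/restart flow.

```typescript
// Hypothetical error kinds modelling the two cases from the PR description.
// These are illustrative assumptions, NOT KafkaJS's real error classes.
type ClusterErrorKind = "NOT_COORDINATOR" | "BROKER_NOT_FOUND" | "OTHER";

class ClusterError extends Error {
  readonly kind: ClusterErrorKind;
  constructor(kind: ClusterErrorKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

// Hypothetical minimal view of a cluster client (assumption).
interface Cluster {
  refreshMetadata(): Promise<void>;
}

// "Wrong coordinator" and "broker not found" both mean our metadata is stale.
function isStaleMetadataError(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const kind = (err as ClusterError).kind;
  return kind === "NOT_COORDINATOR" || kind === "BROKER_NOT_FOUND";
}

// Runs `operation`; on a stale-metadata error, refreshes metadata and retries
// up to `maxRetries` times. Any other error is rethrown unchanged, so it still
// goes through the regular crash/restart flow.
async function withMetadataRefresh<T>(
  cluster: Cluster,
  operation: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isStaleMetadataError(err) || attempt >= maxRetries) throw err;
      await cluster.refreshMetadata();
    }
  }
}

// Demo: a fake cluster whose coordinator "moves" after a broker restart.
// Until metadata is refreshed once, every call sees the stale coordinator.
function makeFakeCluster() {
  let metadataVersion = 0;
  const cluster: Cluster = {
    async refreshMetadata() {
      metadataVersion++;
    },
  };
  const joinGroup = async (): Promise<string> => {
    if (metadataVersion === 0) {
      throw new ClusterError("NOT_COORDINATOR", "stale coordinator");
    }
    return `joined via metadata v${metadataVersion}`;
  };
  return { cluster, joinGroup };
}

const demo = makeFakeCluster();
withMetadataRefresh(demo.cluster, demo.joinGroup).then(console.log);
// prints "joined via metadata v1"
```

Capping the retries in this sketch keeps a persistently failing operation from looping forever; once the cap is hit, the error falls back to the existing crash/restart path.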

3 explicitly prepares the consumer group for the new run instead of relying on errors deep in the consumer group flow; this should improve restart speed, since consumers no longer have to fail these operations before they recover.
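Item 3 can be sketched the same way. Again, `RestartableCluster`, `ConsumerGroup`, and `restartConsumer` are hypothetical names chosen for illustration, not KafkaJS's API; the point is only the ordering: reconnect and refresh metadata first, then rejoin the group, so the first group operation doesn't have to fail on stale state.

```typescript
// Hypothetical interfaces (assumptions), not KafkaJS internals.
interface RestartableCluster {
  connect(): Promise<void>;
  refreshMetadata(): Promise<void>;
}

interface ConsumerGroup {
  join(): Promise<void>;
}

// Prepare connections and metadata explicitly before rejoining, instead of
// waiting for the group flow to hit "missing metadata" errors on its own.
async function restartConsumer(
  cluster: RestartableCluster,
  group: ConsumerGroup,
): Promise<void> {
  await cluster.connect(); // (re)establish broker connections
  await cluster.refreshMetadata(); // learn the current brokers and coordinator
  await group.join(); // only now rejoin the consumer group
}

// Demo: record the order in which the steps run.
const steps: string[] = [];
const cluster: RestartableCluster = {
  async connect() {
    steps.push("connect");
  },
  async refreshMetadata() {
    steps.push("refreshMetadata");
  },
};
const group: ConsumerGroup = {
  async join() {
    steps.push("join");
  },
};

restartConsumer(cluster, group).then(() => console.log(steps.join(" -> ")));
// prints "connect -> refreshMetadata -> join"
```

In this sketch a failure in any step rejects immediately, so the group is never joined with stale state and the error goes back to whatever restart policy wraps `restartConsumer`.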

When the consumer crashes, the broker is restarted, but it relies on the lack of metadata to re-connect. This error happens deep in the consumer group flow, and it isn't clear

This was turned on when we used Travis as the CI server.
@tulios tulios added the bug label Jun 24, 2019
@tulios tulios requested a review from Nevon June 24, 2019 16:13
@Nevon Nevon merged commit 37b211b into master Jun 25, 2019
@Nevon Nevon deleted the refresh-metadata-after-restart branch June 25, 2019 07:38

2 participants