Handle the case where the coordinator is replaced with a new host #456

gotascii · 2017-10-27T01:09:50Z

If the group coordinator is replaced with a different host, but the broker id remains the same, the client will go into an endless reconnection loop. This PR refreshes the cluster data if there is a ConnectionError when joining a group. The issue is reproducible by following these steps:

Start up a cluster with 3 nodes.
Publish some messages to a topic.
Connect to the topic and start an each_message loop.
A broker, say #0 for example, becomes memoized in @coordinator in ConsumerGroup.
Stop the each_message loop but do not exit the process.
Kill broker 0 and bring back a new host with a different IP as broker 0.
With the same consumer instance, run the each_message loop again.

When the above steps are taken:

ConsumerGroup#join is called.
Then coordinator.join_group on ConsumerGroup L:117 fails with ConnectionError.
ConsumerGroup#join sets @coordinator = nil.
Cluster#get_group_coordinator asks a broker for the broker id of the coordinator which is 0.
connect_to_broker pulls cached info for id 0 (i.e. the old IP).
Then coordinator.join_group on ConsumerGroup L:117 fails with ConnectionError restarting the loop.

Seeing as the retry for a ConnectionError is guarded by a sleep 1, I'm hoping this is a pretty safe place to refresh metadata.
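
For anyone who wants to see the loop without standing up a cluster, here is a toy model of the sequence above. It is only a sketch: `ToyCluster`, `join_group`, `replace_broker`, and the `refresh_on_error:` flag are hypothetical stand-ins invented for illustration and are not ruby-kafka's real classes or signatures. The `sleep 1` between retries is left out so the example runs instantly.

```ruby
# Toy model only: hypothetical stand-ins, not ruby-kafka's real classes.
class ConnectionError < StandardError; end

class ToyCluster
  def initialize(addresses)
    @live_addresses = addresses.dup    # where the brokers really are right now
    @cached_addresses = addresses.dup  # what the client remembers
    @stale = false
  end

  # Flags the cache as outdated; the refresh itself happens on the next lookup.
  def mark_as_stale!
    @stale = true
  end

  # Simulates a broker coming back on a new host under the same broker id.
  def replace_broker(id, new_address)
    @live_addresses[id] = new_address
  end

  # Returns the cached address, refreshing first if the cache was marked stale.
  def address_for(id)
    if @stale
      @cached_addresses = @live_addresses.dup
      @stale = false
    end
    @cached_addresses.fetch(id)
  end

  def reachable?(address)
    @live_addresses.value?(address)
  end
end

# Tries to "join" via the coordinator, retrying on ConnectionError.
# Returns the address that worked, or nil once max_attempts is exhausted.
def join_group(cluster, coordinator_id, refresh_on_error:, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    address = cluster.address_for(coordinator_id)
    raise ConnectionError, "cannot reach #{address}" unless cluster.reachable?(address)
    address
  rescue ConnectionError
    # The idea of this PR: mark the metadata stale before retrying.
    cluster.mark_as_stale! if refresh_on_error
    retry if attempts < max_attempts
    nil
  end
end

cluster = ToyCluster.new({ 0 => "10.0.0.1:9092" })
cluster.replace_broker(0, "10.0.0.2:9092")

without_fix = join_group(cluster, 0, refresh_on_error: false)
with_fix = join_group(cluster, 0, refresh_on_error: true)
```

Without the stale mark, every retry resolves broker 0 to the old cached address and fails again, so `without_fix` ends up `nil`. With it, the second attempt picks up the new address and `with_fix` is the replacement host.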

dasch · 2017-10-27T09:06:41Z

❤️

mark_as_stale! doesn't by itself refresh the metadata; it just means that a subsequent query of metadata will cause a refresh rather than serving cached info.
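
To illustrate the lazy behaviour described above, here is a minimal sketch of a cache with a stale flag. `LazyMetadata`, `data`, and `fetch_count` are hypothetical names made up for this example; only the `mark_as_stale!` name comes from the discussion, and this is not how the library's cluster class is actually written.

```ruby
# Hypothetical sketch of a lazily refreshed cache; not ruby-kafka's actual code.
class LazyMetadata
  attr_reader :fetch_count

  # The block stands in for an expensive metadata request to a broker.
  def initialize(&fetcher)
    @fetcher = fetcher
    @stale = true      # nothing cached yet, so the first query must fetch
    @data = nil
    @fetch_count = 0
  end

  # Only flips a flag; no request is made here.
  def mark_as_stale!
    @stale = true
  end

  # Fetches only when the cache is stale, otherwise serves the cached value.
  def data
    if @stale
      @data = @fetcher.call
      @fetch_count += 1
      @stale = false
    end
    @data
  end
end

metadata = LazyMetadata.new do
  { 0 => "10.0.0.2:9092" }
end

metadata.data            # first query fetches
metadata.mark_as_stale!  # no fetch happens here
metadata.mark_as_stale!  # marking twice costs nothing extra
metadata.data            # this query triggers the refresh
metadata.data            # served from cache again
```

The point of the design is that marking stale is cheap and safe to call from an error path, since the cost of the refresh is only paid by the next lookup that actually needs the data.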

Justin Marney added 3 commits October 26, 2017 16:37

Handle the case where the coordinator is replaced with a new host

9b6e9d9

Merge branch 'master' into jm-handle-replaced-coordinator

3f82e61

Fix missing !

e5eb76f

dasch merged commit a506abd into zendesk:master Oct 27, 2017
