This repository has been archived by the owner on May 13, 2019. It is now read-only.

Group doesn't recover after internet connectivity issue #96

Open
bhirbec opened this issue Jun 1, 2016 · 4 comments

Comments

@bhirbec
Contributor

bhirbec commented Jun 1, 2016

Hi,

When the internet connection goes down for some time and then comes back up, the consumers stop consuming incoming messages. The caller isn't notified of this, so it cannot take any action.

You can reproduce this behavior the following way:

  • start a consumer group on your local machine
  • turn off your wifi
  • turn on your wifi
  • send some messages to your topic and you'll see that they are not consumed

Here's my log right after I turned on my wifi:

2016-05-31T23:42:22Z Unstructured Log Line,file=structs.go:21,text= Failed to connect to 10.101.206.42:2181: dial tcp 10.101.206.42:2181: connect: network is unreachable
2016-05-31T23:42:23Z Unstructured Log Line,text= Connected to 10.102.206.12:2181,file=structs.go:21
2016-05-31T23:42:23Z Unstructured Log Line,file=structs.go:21,text= Authentication failed: zk: session has been expired by the server
2016-05-31T23:42:23Z [My-Group/80b66df0c462] Triggering rebalance due to consumer list change
2016-05-31T23:42:23Z [My-Group/80b66df0c462] mytopic/0 :: Stopping partition consumer at offset -1
2016-05-31T23:42:23Z [My-Group/80b66df0c462] mytopic/1 :: Stopping partition consumer at offset -1
2016-05-31T23:42:23Z [My-Group/80b66df0c462] mytopic/2 :: Stopping partition consumer at offset -1
2016-05-31T23:42:23Z consumer/broker/3 closed dead subscription to mytopic/0
2016-05-31T23:42:23Z consumer/broker/2 closed dead subscription to mytopic/2
2016-05-31T23:42:24Z consumer/broker/1 closed dead subscription to mytopic/1
2016-05-31T23:42:24Z [My-Group/80b66df0c462] mytopic :: Stopped topic consumer
2016-05-31T23:42:24Z [My-Group/80b66df0c462] Currently registered consumers: 0
2016-05-31T23:42:24Z [My-Group/80b66df0c462] mytopic :: Started topic consumer
2016-05-31T23:42:24Z [My-Group/80b66df0c462] mytopic :: Claiming 0 of 3 partitions
2016-05-31T23:42:24Z [My-Group/80b66df0c462] mytopic :: Stopped topic consumer

The znode at /someroot/consumers/My-Group/ids is cleared after the connection is lost (the log above shows the ZooKeeper session being expired by the server). From the Kafka protocol documentation:

The coordinator considers the consumer dead if it receives no heartbeat after this timeout in ms.

When the group gets its connection back, there are no registered consumers. It stops consuming but does not report this to the caller.

Sending an error on the errors channel would be a possible fix: it would notify the caller, who could then shut down the group and restart it.

Any thoughts?

@WangXijue

I also ran into this problem. I added a goroutine that checks the registration and registers again after the connection breaks; it seems to work.

@bhirbec
Contributor Author

bhirbec commented Jun 9, 2016

@WangXijue
I solved my problem by sending an error into the cg.errors channel like this (around consumer_group.go#L264):

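cg.Logf("Currently registered consumers: %d\n", len(cg.consumers))
if len(cg.consumers) == 0 {
    // No consumer is registered, so no partition can be claimed.
    // Report this to the caller instead of stopping silently.
    cg.errors <- &sarama.ConsumerError{
        Topic:     "-", // placeholder: the error is not tied to a topic
        Partition: -1,  // placeholder: the error is not tied to a partition
        Err:       fmt.Errorf("no consumer registered"),
    }
    return
}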

This lets the caller receive a notification by reading from Errors(). When that happens, I close the group and restart it. It works OK.

Side note: I found out later that cg.errors doesn't receive any errors from Sarama because config.Consumer.Return.Errors is false. Setting it to true should make Errors() receive connection errors as well.

@WangXijue

Uh... did you use the latest version?
I found that the problem has already been fixed in this commit:
0479015

@bhirbec
Contributor Author

bhirbec commented Jun 11, 2016

@WangXijue
I did miss that. Thanks for the update.
