Join request timeout doesn't consider rebalance_timeout #941

Closed
abicky opened this issue May 17, 2022 · 2 comments

abicky commented May 17, 2022

  • Version of Ruby: ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-darwin20]
  • Version of Kafka: 6.1.1-ccs (Commit:c209f70c6c2e52ae)
  • Version of ruby-kafka: 1.4.0 (555281e)

Steps to reproduce

require 'logger'
require 'kafka'

GROUP_ID = ENV.fetch('GROUP_ID')
TOPIC = ENV.fetch('TOPIC')
BROKERS = %w[127.0.0.1:9092]

PARTITIONS = 4

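# Join the consumer group and consume forever. heartbeat_interval differs per
# consumer so that members notice a rebalance at different times.
def create_consumer(kafka, heartbeat_interval: 10)
  consumer = kafka.consumer(group_id: GROUP_ID, session_timeout: 60, heartbeat_interval: heartbeat_interval)
  consumer.subscribe(TOPIC)
  consumer.each_message do |message|
    # No-op: only the group membership behavior matters here.
  end
end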

kafka = Kafka.new(BROKERS, logger: Logger.new($stdout).tap { |l| l.level = Logger::WARN })
unless kafka.topics.include?(TOPIC)
  kafka.create_topic(TOPIC, num_partitions: PARTITIONS)
end

logger = Logger.new($stdout)
Thread.new do
  command = "kafka-consumer-groups --bootstrap-server #{BROKERS.first} --describe --group #{GROUP_ID}"
  loop do
    logger.info(command)
    system(command)
    sleep 5
  end
end

logger.info 'Create the first consumer'
Thread.new { create_consumer(kafka, heartbeat_interval: 15) }

sleep 1

logger.info 'Create the second consumer'
Thread.new { create_consumer(kafka) }

sleep 20

logger.info 'Create the third consumer'
create_consumer(kafka)

Expected outcome

Join requests wait longer than rebalance_timeout for a response, so consumers don't resend join requests over and over, as the following log shows:

% TOPIC=test GROUP_ID=test ruby -Ilib ~/ghq/src/github.com/zendesk/ruby-kafka/join.rb
I, [2022-05-18T05:11:39.793155 #73498]  INFO -- : Create the first consumer
I, [2022-05-18T05:11:39.793328 #73498]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
I, [2022-05-18T05:11:40.797705 #73498]  INFO -- : Create the second consumer
I, [2022-05-18T05:11:46.171240 #73498]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
I, [2022-05-18T05:11:52.604724 #73498]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T05:11:54.869682 #73498]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
W, [2022-05-18T05:11:54.880615 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
I, [2022-05-18T05:11:58.944404 #73498]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-f15c8b56-11e9-44eb-b7e7-49d029d196ce /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-f15c8b56-11e9-44eb-b7e7-49d029d196ce /192.168.10.6   ruby-kafka
I, [2022-05-18T05:12:00.803351 #73498]  INFO -- : Create the third consumer
W, [2022-05-18T05:12:04.924389 #73498]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
I, [2022-05-18T05:12:05.292280 #73498]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T05:12:09.945293 #73498]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
W, [2022-05-18T05:12:09.971232 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
W, [2022-05-18T05:12:09.971736 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
W, [2022-05-18T05:12:09.971804 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
W, [2022-05-18T05:12:09.971899 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
W, [2022-05-18T05:12:09.971994 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
W, [2022-05-18T05:12:09.972171 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
W, [2022-05-18T05:12:09.972233 #73498]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
I, [2022-05-18T05:12:11.593278 #73498]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-99547132-a573-4fa8-9589-0db839c4bcf7 /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-2e0f4603-81ea-4f7a-8e51-05321c8a0bfc /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-f15c8b56-11e9-44eb-b7e7-49d029d196ce /192.168.10.6   ruby-kafka

In fact, the Java consumer waits rebalance.timeout.ms + 5 seconds for the join response.
cf. https://github.com/confluentinc/kafka/blob/c8fbe26f3bd3a7c018e7619deba002ee454208b9/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L568
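
A minimal Ruby sketch of that rule, for illustration only. `join_group_timeout` is a hypothetical helper and not part of ruby-kafka, the 60-second and 10-second values are assumptions picked for the example, and taking the larger of the two timeouts is an assumption of this sketch so that the result is never shorter than the ordinary socket timeout:

```ruby
# Hypothetical helper, not part of ruby-kafka: how long a client could wait
# for a join response so that the wait outlasts the coordinator's rebalance.
JOIN_GROUP_TIMEOUT_LAPSE = 5 # seconds of slack, mirroring the "+ 5 seconds" above

def join_group_timeout(rebalance_timeout:, socket_timeout:)
  # Never go below the ordinary socket timeout (assumption of this sketch).
  [socket_timeout, rebalance_timeout + JOIN_GROUP_TIMEOUT_LAPSE].max
end

# Assumed example values: rebalance_timeout of 60 s, socket_timeout of 10 s.
puts join_group_timeout(rebalance_timeout: 60, socket_timeout: 10) # prints 65
```

With a timeout computed this way, a join request stays pending for the whole rebalance instead of being abandoned after socket_timeout.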

The following sequence diagram describes the behavior:

sequenceDiagram
  participant member0
  participant member1
  participant member2
  participant coordinator

  member0->>+coordinator: join request
  Note right of coordinator: member0
  coordinator-->>-member0: member_id: member0

  member0->>+member0: wait 15 seconds (heartbeat_interval)

  member1->>+coordinator: join request
  Note right of coordinator: member0<br>member1

  member0->>+coordinator: join request (re-join)

  coordinator-->>-member1: member_id: member1
  coordinator-->>-member0: member_id: member0

  member0->>+member0: wait 15 seconds (heartbeat_interval)
  member1->>+member1: wait 10 seconds (heartbeat_interval)

  member2->>+coordinator: join request
  Note right of coordinator: member0<br>member1<br>member2

  member1->>+coordinator: join request
  member0->>+coordinator: join request

  coordinator-->>-member2: member_id: member2
  coordinator-->>-member1: member_id: member1
  coordinator-->>-member0: member_id: member0
Actual outcome

Join requests wait only socket_timeout for a response, so consumers resend join requests over and over. As a result, it takes a long time for the third consumer to join the consumer group, as the following log shows:

% TOPIC=test GROUP_ID=test ruby -Ilib join.rb
I, [2022-05-18T04:23:43.694969 #67039]  INFO -- : Create the first consumer
I, [2022-05-18T04:23:43.695116 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-27fcef6d-1bac-4cc1-baf6-c17825478b47 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-27fcef6d-1bac-4cc1-baf6-c17825478b47 /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-27fcef6d-1bac-4cc1-baf6-c17825478b47 /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-27fcef6d-1bac-4cc1-baf6-c17825478b47 /192.168.10.6   ruby-kafka
I, [2022-05-18T04:23:44.700268 #67039]  INFO -- : Create the second consumer
I, [2022-05-18T04:23:50.017389 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:23:54.716501 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:23:54.719050 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:23:56.329268 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T04:23:58.773595 #67039]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
W, [2022-05-18T04:23:58.784054 #67039]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
I, [2022-05-18T04:24:02.672127 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-27fcef6d-1bac-4cc1-baf6-c17825478b47 /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-27fcef6d-1bac-4cc1-baf6-c17825478b47 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-e56f93b0-a91b-43a5-9b93-d7369974b092 /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-1aee89a5-e92c-447c-98e2-925a4578e9e4 /192.168.10.6   ruby-kafka
I, [2022-05-18T04:24:04.704826 #67039]  INFO -- : Create the third consumer
W, [2022-05-18T04:24:08.822660 #67039]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
I, [2022-05-18T04:24:09.016520 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T04:24:13.844439 #67039]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
E, [2022-05-18T04:24:14.715577 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:14.716407 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:15.364225 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:18.828278 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 8
E, [2022-05-18T04:24:18.829447 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:21.686261 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:23.849696 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 13
E, [2022-05-18T04:24:23.850371 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
E, [2022-05-18T04:24:25.730622 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:25.731780 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:28.010968 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:29.841175 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:29.841614 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:34.336420 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test
E, [2022-05-18T04:24:34.861060 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:34.861540 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:36.745722 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:36.746393 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:40.653087 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test
E, [2022-05-18T04:24:40.855358 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:40.855543 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:45.873195 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:45.874013 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:46.956079 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test
E, [2022-05-18T04:24:47.761925 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:47.762241 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:51.869976 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:51.870366 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:24:53.261250 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:24:56.886507 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:56.887105 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
E, [2022-05-18T04:24:58.772598 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:24:58.773041 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
W, [2022-05-18T04:24:58.784093 #67039]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
-- snip --
W, [2022-05-18T04:24:58.791049 #67039]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
I, [2022-05-18T04:24:59.565488 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
I, [2022-05-18T04:25:05.906850 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T04:25:08.823248 #67039]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
E, [2022-05-18T04:25:09.782510 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:09.782876 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:12.219462 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T04:25:13.834707 #67039]  WARN -- : [[test] {}:] Error sending heartbeat: Kafka::RebalanceInProgress
I, [2022-05-18T04:25:18.513595 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test
E, [2022-05-18T04:25:18.829260 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 7
E, [2022-05-18T04:25:18.829779 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:25:20.794087 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:20.794752 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
E, [2022-05-18T04:25:23.839824 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 8
E, [2022-05-18T04:25:23.840392 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:24.851720 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:25:29.840465 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:29.842326 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:31.174084 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test
E, [2022-05-18T04:25:31.804583 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:31.804929 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:25:34.852361 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:34.852849 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:37.496698 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:25:40.851438 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:40.852258 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
E, [2022-05-18T04:25:42.815102 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:42.815827 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:43.788145 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:25:45.862835 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:45.863856 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:50.076594 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

Warning: Consumer group 'test' is rebalancing.
E, [2022-05-18T04:25:51.865437 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:51.868033 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
E, [2022-05-18T04:25:53.826510 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:53.826818 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...
I, [2022-05-18T04:25:56.663265 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test
E, [2022-05-18T04:25:56.879188 #67039] ERROR -- : [[test] {}:] [join_group] Timed out while waiting for response 2
E, [2022-05-18T04:25:56.879733 #67039] ERROR -- : [[test] {}:] Connection error while trying to join group `test`; retrying...

Warning: Consumer group 'test' is rebalancing.
W, [2022-05-18T04:25:58.790792 #67039]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
-- snip --
W, [2022-05-18T04:25:58.799624 #67039]  WARN -- : [[test] {}:] Skipping stale messages buffered prior to reset
I, [2022-05-18T04:26:02.984722 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-d0188b7a-4c5e-402b-a09e-a55f8919a4a9 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-8561c724-1d76-4274-907b-599fde60ff98 /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-e56f93b0-a91b-43a5-9b93-d7369974b092 /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-e9540f4c-38a9-45a2-965a-ea0e8d881d4a /192.168.10.6   ruby-kafka
I, [2022-05-18T04:26:09.323409 #67039]  INFO -- : kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
test            test            0          -               0               -               ruby-kafka-d0188b7a-4c5e-402b-a09e-a55f8919a4a9 /192.168.10.6   ruby-kafka
test            test            1          -               0               -               ruby-kafka-8561c724-1d76-4274-907b-599fde60ff98 /192.168.10.6   ruby-kafka
test            test            2          -               0               -               ruby-kafka-e56f93b0-a91b-43a5-9b93-d7369974b092 /192.168.10.6   ruby-kafka
test            test            3          -               0               -               ruby-kafka-e9540f4c-38a9-45a2-965a-ea0e8d881d4a /192.168.10.6   ruby-kafka
^CW, [2022-05-18T04:26:10.580681 #67039]  WARN -- : [[test] {}:] Received signal , shutting down

The following sequence diagram describes the behavior:

sequenceDiagram
  participant member0
  participant member1
  participant member2
  participant coordinator

  member0->>+coordinator: join request
  Note right of coordinator: member0
  coordinator-->>-member0: member_id: member0

  member0->>+member0: wait 15 seconds (heartbeat_interval)

  member1->>+coordinator: join request
  Note right of coordinator: member0<br>member1

  member1->>member1: wait 10 seconds (socket_timeout)

  member1->>coordinator: join request (re-send)
  Note right of coordinator: member0<br>member1<br>member1'

  member0->>+coordinator: join request (re-join)

  coordinator-->>-member1: member_id: member1'
  coordinator-->>-member0: member_id: member0

  member0->>+member0: wait 15 seconds (heartbeat_interval)
  member1->>+member1: wait 10 seconds (heartbeat_interval)

  member2->>+coordinator: join request
  Note right of coordinator: member0<br>member1<br>member1'<br>member2

  member1->>+coordinator: join request
  member0->>+coordinator: join request

  coordinator->>+coordinator: wait for member1 to join (rebalance_timeout or session_timeout)
  Note right of coordinator: member0<br>member1'<br>member2

  coordinator-->>-member2: member_id: member2
  coordinator-->>-member1: member_id: member1'
  coordinator-->>-member0: member_id: member0
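
The extra member1' entry in the diagram can be reproduced with a toy model. This is a sketch under stated assumptions, not Kafka's or ruby-kafka's actual implementation: `ToyCoordinator` is hypothetical, it models only the member table, and the round-robin assignment is a stand-in chosen for simplicity.

```ruby
# Toy model of a group coordinator's member table, for illustration only.
class ToyCoordinator
  attr_reader :members

  def initialize
    @members = []
    @next_id = 0
  end

  # A join request with an empty member ID registers a brand-new member,
  # which is what a re-sent first join request looks like to the coordinator.
  def join(member_id = '')
    if member_id.empty?
      member_id = "id#{@next_id}"
      @next_id += 1
    end
    @members << member_id unless @members.include?(member_id)
    member_id
  end
end

coordinator = ToyCoordinator.new
first_consumer = coordinator.join
coordinator.join                   # second consumer's first attempt; the client times out and never sees this ID
second_consumer = coordinator.join # the re-sent request is registered under a different ID

# Members that no live client knows about.
ghosts = coordinator.members - [first_consumer, second_consumer]

# Naive round-robin over 4 partitions (an assumption; not ruby-kafka's assignor).
assignment = (0...4).group_by { |partition| coordinator.members[partition % coordinator.members.size] }
orphaned = ghosts.flat_map { |ghost| assignment.fetch(ghost, []) }

puts "ghost members: #{ghosts.inspect}, partitions nobody consumes: #{orphaned.inspect}"
```

In this model the abandoned ID stays in the group and is still handed a partition, which matches the logs above where two different consumer IDs appear for the second consumer.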
abicky added a commit to abicky/ruby-kafka that referenced this issue May 17, 2022
This commit resolves zendesk#941.

As the coordinator waits for each member to rejoin when rebalancing
the group, the timeout of join request should be greater than
rebalance_timeout. Otherwise, each member might send join requests
over and over, and as a result, the following problems occur:

* Some partitions are never processed until session_timeout passes
* It takes a long time for rebalances that occur within a short period to finish
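
As a rough back-of-the-envelope illustration of "over and over", the sketch below counts how many join requests one client sends. It assumes the coordinator holds the response for a fixed time and that the client re-sends immediately after each timeout; both are simplifications, and `join_requests_sent` is a hypothetical helper, not a ruby-kafka method.

```ruby
# Hypothetical helper: number of join requests a client sends when the
# coordinator answers after `hold` seconds and the client gives up on each
# request after `timeout` seconds (immediate re-send assumed).
def join_requests_sent(hold:, timeout:)
  [(hold.to_f / timeout).ceil, 1].max
end

# Assumed example: the coordinator answers after 15 s.
puts join_requests_sent(hold: 15, timeout: 10) # prints 2 (one re-send)
puts join_requests_sent(hold: 15, timeout: 65) # prints 1 (no re-send)
```

In practice each re-send can also prolong the rebalance itself, so the real number of requests may be higher than this estimate.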

abicky commented May 17, 2022

I've created PR #943.

abicky added a commit to abicky/ruby-kafka that referenced this issue May 18, 2022
abicky changed the title from "Join request doesn't consider rebalance_timeout" to "Join request timeout doesn't consider rebalance_timeout" on May 18, 2022
abicky added a commit to abicky/ruby-kafka that referenced this issue Jun 11, 2022
@github-actions

Issue has been marked as stale due to a lack of activity.
