[Question] Specified group generation id is not valid #1009

Closed
vicmerlis opened this issue Jan 28, 2021 · 8 comments · Fixed by patrykwegrzyn/kafkajs#1

@vicmerlis

Describe the bug
Consumers sometimes encounter the following exception: Specified group generation id is not valid. The consumers that encounter that error become a kind of zombie. They are still connected as consumers to a partition but not consuming messages.

To Reproduce
Can't reproduce

Expected behavior
The consumer should reconnect to the consumer group.

Observed behavior
Logs:

  1. [ConsumerGroup] Consumer has joined the group
  2. The group is rebalancing, so a rejoin is needed
  3. Specified group generation id is not valid

Environment:

  • OS: [alpine]
  • KafkaJS version [1.15.0]
  • Kafka version [2.4.1]
  • NodeJS version [14]

Additional context
It's probably not related to KafkaJS, but the error is mentioned in error.js, so maybe you have an idea why it's happening?

@tulios
Owner

tulios commented Jan 28, 2021

That's a server error, how are you generating your group ids?

@vicmerlis
Author

the groupId looks like: ${Environment}-staticString

this._Kafka = new KafkaJs({brokers: [host], ...clientOptions});
const groupId = options.groupId;
const currentOptions = Object.assign({}, options, {topic: topic});
const consumer = this._Kafka.consumer(currentOptions);
await consumer.connect();
await consumer.subscribe(currentOptions);
consumer.run({
    eachMessage: async ({topic, partition, message}) => {
        return this._handleConsumerMessages({topic, message, groupId, partition});
    }, ...currentOptions
});

@Nevon
Collaborator

Nevon commented Jan 28, 2021

The groupGenerationId is something we get from the broker (generation_id) in the JoinGroup response, and it's just a number that increments with each generation in the group.

You'll get this error when you try to commit after having been kicked out of the consumer group. This could for example happen if you spend too long in between heartbeats (because you're processing a single message for too long, for example). What should happen is that you should re-join the group and get the new generation id to use.
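
For handlers that can run long, one way to avoid going silent between heartbeats is to heartbeat between messages inside `eachBatch`. The following is a minimal sketch, not KafkaJS's own code: `makeEachBatch` and `processMessage` are hypothetical names introduced for illustration, while the payload fields (`batch`, `resolveOffset`, `heartbeat`, `isRunning`, `isStale`) are the ones KafkaJS documents for `eachBatch`.

```javascript
// Sketch only: `makeEachBatch` and `processMessage` are hypothetical names.
// The destructured payload fields are the ones KafkaJS documents for eachBatch.
function makeEachBatch(processMessage) {
  return async ({ batch, resolveOffset, heartbeat, isRunning, isStale }) => {
    for (const message of batch.messages) {
      // Stop early if the consumer is stopping or the batch is no longer valid.
      if (!isRunning() || isStale()) break;

      await processMessage(batch.topic, batch.partition, message);

      // Mark this message as processed so its offset is eligible for commit.
      resolveOffset(message.offset);

      // Give KafkaJS a chance to heartbeat between messages.
      await heartbeat();
    }
  };
}

// Usage, assuming `consumer` is already connected and subscribed and
// `handleMessage` is your own async function:
// await consumer.run({
//   eachBatchAutoResolve: false, // only offsets passed to resolveOffset count
//   eachBatch: makeEachBatch(handleMessage),
// });
```

Setting `eachBatchAutoResolve: false` matters here: otherwise breaking out of the loop early would still resolve the last offset of the batch.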

It would be helpful if you could run with DEBUG log level so that we can see what requests are being made when this happens.
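
For reference, DEBUG logging is enabled through the `logLevel` option of the client. This is a small sketch under assumptions: `createDebugClient` is a hypothetical helper, the `kafkajs` module is passed in as an argument so the snippet has no hard dependency, and the broker address in the usage line is a placeholder.

```javascript
// Sketch only: `createDebugClient` is a hypothetical helper.
// `kafkajs` is the module object, e.g. the result of require("kafkajs").
function createDebugClient(kafkajs, brokers) {
  const { Kafka, logLevel } = kafkajs;
  // logLevel.DEBUG makes the client log the requests it sends to the broker.
  return new Kafka({ brokers, logLevel: logLevel.DEBUG });
}

// Usage (placeholder broker address):
// const kafka = createDebugClient(require("kafkajs"), ["localhost:9092"]);
```

The KafkaJS docs also describe a `KAFKAJS_LOG_LEVEL` environment variable, which may be more convenient if you don't want to redeploy with a code change.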

> The consumers that encounter that error become a kind of zombie. They are still connected as consumers to a partition but not consuming messages.

This is a shot in the dark, but do you maybe have more consumer instances than you do partitions? If so, some of your consumers will not be assigned any partitions, and thus won't be doing any work.
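
One way to check this from the application side is to look at the assignment each member receives when it joins the group. A rough sketch, assuming the `GROUP_JOIN` instrumentation event and its `memberAssignment` payload field as described in the KafkaJS docs; `countAssignedPartitions` and `warnOnIdleMembers` are hypothetical helper names.

```javascript
// Sketch only: both function names are hypothetical helpers.
// `memberAssignment` maps each topic name to an array of partition numbers.
function countAssignedPartitions(memberAssignment) {
  return Object.values(memberAssignment || {})
    .reduce((total, partitions) => total + partitions.length, 0);
}

// Logs a warning whenever this consumer joins the group without any partitions.
// Returns whatever consumer.on returns (in KafkaJS, a function that removes the listener).
function warnOnIdleMembers(consumer, log = console.warn) {
  return consumer.on(consumer.events.GROUP_JOIN, ({ payload }) => {
    if (countAssignedPartitions(payload.memberAssignment) === 0) {
      log(`member ${payload.memberId} of group ${payload.groupId} has no partitions assigned`);
    }
  });
}

// Usage, assuming `consumer` was created with kafka.consumer({ groupId }):
// warnOnIdleMembers(consumer);
```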

@vicmerlis
Author

I'll enable debug log level and will post once the error occurs.

Regarding more consumers than partitions: we are running on ECS with autoscaling, max tasks = 60 (each task = 1 consumer). The topic is configured with 60 partitions. I'm pretty sure that we didn't reach the max number of tasks, but I'll check that as well once the error occurs.

@mremick

mremick commented Jan 28, 2021

Hello, I'm having a similar issue. Could increasing the heartbeatInterval and sessionTimeout potentially fix the issue? I think I'm taking too long while processing when making a network request with high latency.

@Nevon
Collaborator

Nevon commented Jan 29, 2021

> Could increasing the heartbeatInterval and sessionTimeout potentially fix the issue?

Yes. You have to tweak those to fit your application behavior.
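
As a rough illustration of what that tweaking might look like, here is a sketch that derives both values from the slowest message you expect to process. `buildConsumerConfig` is a hypothetical helper, and the 2x headroom factor is an arbitrary assumption. `sessionTimeout` and `heartbeatInterval` are real KafkaJS consumer options (documented defaults of 30000 ms and 3000 ms), and the heartbeat interval has to stay below the session timeout.

```javascript
// Sketch only: `buildConsumerConfig` is a hypothetical helper and the
// 2x headroom factor is an assumption, not a KafkaJS recommendation.
function buildConsumerConfig(groupId, maxProcessingMs) {
  // Keep the session alive for longer than the slowest expected message,
  // and never go below the documented KafkaJS default of 30000 ms.
  const sessionTimeout = Math.max(30000, maxProcessingMs * 2);

  // Common Kafka guidance: heartbeat at no more than a third of the session timeout.
  const heartbeatInterval = Math.floor(sessionTimeout / 3);

  return { groupId, sessionTimeout, heartbeatInterval };
}

// Usage, assuming `kafka` is a KafkaJS client and 45 s is your worst case:
// const consumer = kafka.consumer(buildConsumerConfig("my-group", 45000));
```

Keep in mind that the broker enforces its own lower and upper bounds on the session timeout, so a very large value may be rejected when joining the group.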

@mremick

mremick commented Jan 29, 2021

Thanks. It fixed my issue.

tulios closed this as completed Feb 15, 2021
jakewins added a commit to jakewins/kafkajs that referenced this issue Nov 2, 2022
This error is indicating that the consumer is trying to commit
offsets, but the consumer group has changed to a new generation.

Retrying within the existing session will indeed not work, but
rejoining the group and re-trying should be successful.

Fixes tulios#1009
@guiestimoneon

Hello guys

I am having this issue when I scale my application horizontally. The pod is processing normally and out of nowhere I get this error:

image

I suspect a rebalance has occurred and the pod still tries to commit a message.
I've tried the above solutions but to no avail.

Nevon pushed a commit to jakewins/kafkajs that referenced this issue Feb 27, 2023
This error is indicating that the consumer is trying to commit
offsets, but the consumer group has changed to a new generation.

Retrying within the existing session will indeed not work, but
rejoining the group and re-trying should be successful.

Fixes tulios#1009