
KAFKA-19357: AsyncConsumer#close hangs during closing because the commitAsync request never completes due to a missing coordinator #19914

Open
wants to merge 1 commit into
base: trunk

Mirai1129 (Contributor) commented Jun 6, 2025

Problem:
While AsyncConsumer is closing, CoordinatorRequestManager stops looking for the coordinator: its poll() method returns EMPTY whenever the closing flag is true.
As a result, commitAsync() and other coordinator-dependent operations cannot complete if the coordinator is unknown at that point, and close() hangs indefinitely.

Solution:
Remove the closing flag check from CoordinatorRequestManager's poll() method so that it keeps looking for the coordinator when one is needed, even in the closing state. This lets pending coordinator-dependent operations complete during shutdown.
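To make the described change concrete, here is a minimal sketch of the gating condition in poll(). It is a simplified stand-in and not the real CoordinatorRequestManager: the class name `CoordinatorPollSketch`, the `skipLookupWhenClosing` switch, and the use of `Optional<String>` in place of `NetworkClientDelegate.PollResult` are all assumptions made for illustration.

```java
import java.util.Optional;

// Hypothetical, simplified model of the gating logic in poll(); not the real Kafka class.
final class CoordinatorPollSketch {
    // true models the existing check (closing || coordinator != null),
    // false models the proposed check (coordinator != null only).
    private final boolean skipLookupWhenClosing;
    private boolean closing;
    private String coordinator; // null while the coordinator is unknown

    CoordinatorPollSketch(boolean skipLookupWhenClosing) {
        this.skipLookupWhenClosing = skipLookupWhenClosing;
    }

    void signalClose() {
        closing = true;
    }

    void onCoordinatorFound(String node) {
        coordinator = node;
    }

    // Returns the name of the request to send, or empty (standing in for EMPTY) when no lookup is sent.
    Optional<String> poll() {
        if ((skipLookupWhenClosing && closing) || coordinator != null) {
            return Optional.empty();
        }
        return Optional.of("FindCoordinator");
    }

    public static void main(String[] args) {
        CoordinatorPollSketch existing = new CoordinatorPollSketch(true);
        existing.signalClose();
        System.out.println("existing check sends lookup while closing: " + existing.poll().isPresent());

        CoordinatorPollSketch proposed = new CoordinatorPollSketch(false);
        proposed.signalClose();
        System.out.println("proposed check sends lookup while closing: " + proposed.poll().isPresent());
    }
}
```

In this model, an operation that waits for the coordinator can never be unblocked under the existing check once `signalClose()` has run, which is the hang the description refers to.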

@github-actions github-actions bot added triage PRs from the community consumer clients small Small PRs labels Jun 6, 2025
@@ -99,7 +99,7 @@ public void signalClose() {
*/
@Override
public NetworkClientDelegate.PollResult poll(final long currentTimeMs) {
if (closing || this.coordinator != null)
Member

This makes the consumer find the coordinator during closing, right? If the consumer doesn't have a coordinator running, does it make sense to find one during closing?

Member

Well, we do need to be careful. There is also code to make sure that finding a coordinator does not block the progress of closing. If you start a consumer when there are no running brokers, close needs to complete promptly. If you stop the brokers when a consumer has been running and then attempt a close, it also needs to complete promptly.
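The constraint in this comment can be sketched as a deadline-bounded close loop: keep polling while work is pending, but stop as soon as the close timeout elapses, whether or not a coordinator was ever found. This is only an illustration of the idea. `BoundedCloseSketch`, `closeWithin`, and its parameters are hypothetical names and do not correspond to the consumer's actual close path.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch of a close loop that is bounded by a deadline.
final class BoundedCloseSketch {

    // Polls until pendingWorkDone reports true or timeoutMs elapses on the given clock.
    // Returns true if the pending work completed, false if the deadline was reached first.
    static boolean closeWithin(long timeoutMs,
                               LongSupplier clockMs,
                               BooleanSupplier pendingWorkDone,
                               Runnable pollOnce) {
        long deadlineMs = clockMs.getAsLong() + timeoutMs;
        while (!pendingWorkDone.getAsBoolean()) {
            if (clockMs.getAsLong() >= deadlineMs) {
                return false; // give up: close must not wait forever for a coordinator
            }
            pollOnce.run();
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated clock: each poll advances time by 10 ms and no broker ever answers.
        long[] nowMs = {0L};
        boolean completed = closeWithin(100L, () -> nowMs[0], () -> false, () -> nowMs[0] += 10L);
        System.out.println("pending work completed: " + completed + ", elapsed ms: " + nowMs[0]);
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic, so the no-brokers case can be exercised without real waiting.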

@@ -99,7 +99,7 @@ public void signalClose() {
*/
@Override
public NetworkClientDelegate.PollResult poll(final long currentTimeMs) {
- if (closing || this.coordinator != null)
+ if (this.coordinator != null)
Contributor

Thanks for the PR; I left a comment.
Should we add an integration test (IT) for this scenario: call commitAsync while closing the consumer, and verify that the consumer shuts down properly?
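One possible shape for such a test is a hang guard: run the close on a separate thread and fail if it does not return within a bound. The sketch below shows only that guard, using plain `java.util.concurrent`. `CloseHangGuardSketch` and `completesWithin` are hypothetical names, and the `Runnable` stands in for whatever the real test would do (for example, issuing a commitAsync and then closing a consumer).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for asserting that a close action does not hang.
final class CloseHangGuardSketch {

    // Runs closeAction on its own thread and reports whether it finished within timeoutMs.
    static boolean completesWithin(Runnable closeAction, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = executor.submit(closeAction);
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the action is still running: treat as a hang
        } catch (ExecutionException e) {
            throw new IllegalStateException("close action failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // interrupt the worker so a hung action does not leak a thread
        }
    }

    public static void main(String[] args) {
        System.out.println("fast close completes: " + completesWithin(() -> { }, 1_000L));
    }
}
```

A test built on this guard would fail on a hang instead of stalling the whole build until a global timeout.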

@github-actions github-actions bot removed the triage PRs from the community label Jun 7, 2025
4 participants