Improve Kafka Management API performance and fix consumer thread-safety by switching to async AdminClient calls and batching lag calculation#276
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request description
This PR refactors Kafka Management endpoints to provide async
/v2variants and removes usage ofKafkaConsumer#endOffsets(...)inside multi-threaded request handling, which previously caused consumer multi-thread access issues. Consumer group lag calculation is now performed viaAdminClientand is optimized by batching offset/end-offset requests to reduce the number of broker round-trips.Motivation / Problem
The original consumer-groups implementation:
KafkaConsumer#endOffsets(...)per group,Changes
Added async REST endpoints:
GET /cluster-info/v2GET /kafka-topics/v2GET /consumer-groups/v2These return
CompletableFuture<...>and preserve existing/endpoints for backward compatibility.Refactored backend service methods to async equivalents:
getClusterInfoAsync(),getTopicsAsync(PageLink),getConsumerGroupsAsync(PageLink)Replaced lag calculation based on
KafkaConsumer#endOffsets(...)withAdminClient#listOffsets(...)to avoid non-thread-safe consumer usage.Optimized lag computation by batching:
listConsumerGroupOffsets(Map<groupId, spec>))Preserved existing pagination/filtering/sorting behavior and error handling via
CompletionExceptionwrapping for controllercheckNotNull(...).Performance impact
Reduces lag computation from
O(number_of_groups)Admin round-trips (per-group offsets + per-group listOffsets) to:Expected to significantly reduce API latency on clusters with many consumer groups.
General checklist
Front-End feature checklist
Back-End feature checklist