
Improve Kafka Management API performance and fix consumer thread-safety by switching to async AdminClient calls and batching lag calculation #276

Merged
dmytro-landiak merged 5 commits into main from kafka-admin on Dec 15, 2025



dmytro-landiak (Contributor) commented on Dec 15, 2025

Pull Request description

This PR refactors the Kafka Management endpoints to provide async /v2 variants and removes the use of KafkaConsumer#endOffsets(...) from multi-threaded request handling, where concurrent calls previously caused multi-threaded consumer access errors. Consumer group lag is now calculated via AdminClient, and the calculation batches the committed-offset and end-offset requests to reduce the number of broker round-trips.
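For reference, consumer group lag is the sum, over the partitions a group has committed offsets for, of (end offset minus committed offset). The sketch below shows only that arithmetic in plain Java so it runs without a broker. `PartitionKey` is a hypothetical stand-in for Kafka's `TopicPartition`, and skipping partitions without an end offset and clamping negative values to 0 are assumptions of this sketch, not necessarily what the PR does.

```java
import java.util.Map;

/** Minimal sketch of consumer-group lag arithmetic (JDK only, no Kafka client). */
public class GroupLag {

    /** Hypothetical stand-in for org.apache.kafka.common.TopicPartition. */
    public record PartitionKey(String topic, int partition) {}

    /**
     * Sums per-partition lag (end offset minus committed offset) for one group.
     * Partitions with no known end offset are skipped; negative differences are
     * clamped to 0 because committed and end offsets come from separate calls.
     */
    public static long groupLag(Map<PartitionKey, Long> committed,
                                Map<PartitionKey, Long> endOffsets) {
        long total = 0;
        for (Map.Entry<PartitionKey, Long> e : committed.entrySet()) {
            Long end = endOffsets.get(e.getKey());
            if (end == null) {
                continue; // end offset unavailable for this partition
            }
            total += Math.max(0L, end - e.getValue());
        }
        return total;
    }

    public static void main(String[] args) {
        PartitionKey p0 = new PartitionKey("orders", 0);
        PartitionKey p1 = new PartitionKey("orders", 1);
        Map<PartitionKey, Long> committed = Map.of(p0, 90L, p1, 200L);
        Map<PartitionKey, Long> ends = Map.of(p0, 100L, p1, 250L);
        System.out.println(groupLag(committed, ends)); // 10 + 50 = 60
    }
}
```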

Motivation / Problem

The original consumer-groups implementation:

  • computed lag by calling KafkaConsumer#endOffsets(...) per group,
  • ran in a flow that could be executed concurrently, resulting in KafkaConsumer multi-thread access issues (KafkaConsumer is not thread-safe),
  • performed many sequential broker calls per request, leading to high latency.

Changes

  • Added async REST endpoints:

    • GET /cluster-info/v2
    • GET /kafka-topics/v2
    • GET /consumer-groups/v2
    These return CompletableFuture<...>; the existing endpoints without the /v2 suffix are preserved for backward compatibility.
  • Refactored backend service methods to async equivalents:

    • getClusterInfoAsync(), getTopicsAsync(PageLink), getConsumerGroupsAsync(PageLink)
  • Replaced lag calculation based on KafkaConsumer#endOffsets(...) with AdminClient#listOffsets(...) to avoid non-thread-safe consumer usage.

  • Optimized lag computation by batching:

    • fetch committed offsets for all groups in one request (listConsumerGroupOffsets(Map<groupId, spec>))
    • fetch end offsets for the union of partitions (with optional batching by partition count)
  • Preserved the existing pagination, filtering, and sorting behavior, as well as error handling, by wrapping errors in CompletionException for the controller's checkNotNull(...).
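The batching strategy above can be sketched with plain `CompletableFuture` composition: one call for the committed offsets of all groups, then the union of their partitions split into fixed-size batches, one end-offset call per batch. This is an illustrative sketch, not the PR's code. `fetchCommitted` and `fetchEnds` are hypothetical stand-ins; a real implementation would presumably wrap `AdminClient#listConsumerGroupOffsets(...)` and `AdminClient#listOffsets(...)` with `OffsetSpec.latest()`, for example via `KafkaFuture#toCompletionStage()`. `PartitionKey` again stands in for `TopicPartition`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch of batched, async lag calculation with hypothetical fetch functions. */
public class BatchedLag {

    /** Hypothetical stand-in for org.apache.kafka.common.TopicPartition. */
    public record PartitionKey(String topic, int partition) {}

    /** Splits items into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    public static CompletableFuture<Map<String, Long>> lagPerGroup(
            Set<String> groupIds,
            Function<Set<String>, CompletableFuture<Map<String, Map<PartitionKey, Long>>>> fetchCommitted,
            Function<List<PartitionKey>, CompletableFuture<Map<PartitionKey, Long>>> fetchEnds,
            int batchSize) {
        // One call for the committed offsets of every requested group.
        return fetchCommitted.apply(groupIds).thenCompose(committed -> {
            // Union of partitions across groups, so each partition is looked up once.
            List<PartitionKey> union = committed.values().stream()
                    .flatMap(offsets -> offsets.keySet().stream())
                    .distinct()
                    .collect(Collectors.toList());
            // One end-offset call per batch; allOf waits for all of them.
            List<CompletableFuture<Map<PartitionKey, Long>>> calls = chunk(union, batchSize).stream()
                    .map(fetchEnds)
                    .collect(Collectors.toList());
            return CompletableFuture.allOf(calls.toArray(new CompletableFuture<?>[0]))
                    .thenApply(ignored -> {
                        Map<PartitionKey, Long> ends = new HashMap<>();
                        calls.forEach(call -> ends.putAll(call.join()));
                        Map<String, Long> lag = new TreeMap<>();
                        committed.forEach((group, offsets) -> {
                            long total = 0;
                            for (Map.Entry<PartitionKey, Long> e : offsets.entrySet()) {
                                Long end = ends.get(e.getKey());
                                if (end != null) {
                                    total += Math.max(0L, end - e.getValue());
                                }
                            }
                            lag.put(group, total);
                        });
                        return lag;
                    });
        });
    }

    public static void main(String[] args) {
        PartitionKey a0 = new PartitionKey("orders", 0);
        PartitionKey a1 = new PartitionKey("orders", 1);
        PartitionKey b0 = new PartitionKey("payments", 0);
        Map<String, Map<PartitionKey, Long>> committed = Map.of(
                "billing", Map.of(a0, 5L, b0, 1L),
                "audit", Map.of(a0, 8L, a1, 3L));
        Map<PartitionKey, Long> latest = Map.of(a0, 10L, a1, 4L, b0, 1L);
        Map<String, Long> lag = lagPerGroup(
                committed.keySet(),
                ids -> CompletableFuture.completedFuture(committed),
                batch -> CompletableFuture.supplyAsync(() -> batch.stream()
                        .collect(Collectors.toMap(p -> p, latest::get))),
                2).join();
        System.out.println(lag); // {audit=3, billing=5}
    }
}
```

With three distinct partitions and a batch size of 2, the demo issues one committed-offsets call and two end-offset calls, regardless of how many groups share those partitions.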

Performance impact

  • Reduces lag computation from O(number_of_groups) Admin round-trips (a committed-offsets call plus a listOffsets call for each group) to:

    • 1 request to fetch committed offsets for all groups
    • 1..N requests to fetch end offsets (depending on partition batching)
  • Expected to significantly reduce API latency on clusters with many consumer groups.
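The round-trip counts above can be written as a small formula: before, about 2 × groups calls; after, 1 + ceil(partitions / batchSize) calls. The sketch below is a hypothetical model of those counts only, with illustrative numbers that are not measurements from this PR. It counts Admin API calls, not broker requests; the AdminClient may itself send several requests per call, for example one per group coordinator or partition leader.

```java
/** Hypothetical model of Admin call counts before and after batching. */
public class RoundTrips {

    /** Before: one committed-offsets call and one listOffsets call per group. */
    public static long before(long groups) {
        return 2 * groups;
    }

    /** After: one call for all committed offsets plus ceil(partitions / batchSize) end-offset calls. */
    public static long after(long partitions, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return 1 + (partitions + batchSize - 1) / batchSize; // integer ceiling division
    }

    public static void main(String[] args) {
        // Illustrative only: 200 groups over 1,500 distinct partitions, batched 500 at a time.
        System.out.println(before(200));       // 400
        System.out.println(after(1_500, 500)); // 1 + 3 = 4
    }
}
```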

General checklist

  • You have reviewed the guidelines document.
  • Labels that classify your pull request have been added.
  • The milestone is specified and corresponds to fix version.
  • Description references specific issue.
  • Description contains human-readable scope of changes.
  • Description contains brief notes about what needs to be added to the documentation.
  • No merge conflicts, commented blocks of code, code formatting issues.
  • Changes are backward compatible or upgrade script is provided.

Front-End feature checklist

  • Screenshots with affected component(s) are added. The best option is to provide 2 screens: before and after changes;
  • If you change the widget or other API, ensure it is backward-compatible or upgrade script is present.

Back-End feature checklist

  • Added corresponding unit and/or integration test(s). Provide written explanation in the PR description if you have failed to add tests.
  • If new dependency was added: the dependency tree is checked for conflicts.

dmytro-landiak added this to the 2.3.0 milestone on Dec 15, 2025
dmytro-landiak added the Bug (Something isn't working) and Core (Minor improvement to Core services) labels on Dec 15, 2025
dmytro-landiak changed the title from "Kafka admin" to "Improve Kafka Management API performance and fix consumer thread-safety by switching to async AdminClient calls and batching lag calculation" on Dec 15, 2025
dmytro-landiak merged commit 01bbe92 into main on Dec 15, 2025
0 of 2 checks passed
dmytro-landiak deleted the kafka-admin branch on December 15, 2025 at 14:35

Labels

Bug (Something isn't working), Core (Minor improvement to Core services)
