KAFKA-19397: Ensure consistent metadata usage in produce request and response #19964

Open · wants to merge 13 commits into trunk
Conversation

@OmniaGM (Contributor) commented Jun 13, 2025

  • Metadata doesn't have the full view of the topic name → id mapping during
    client rebootstrap, or when a topic has been deleted and recreated. The
    solution is to pass the topic id down and stop trying to derive it later
    in the logic.
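The idea can be sketched standalone (the types below are simplified stand-ins for Kafka's TopicIdPartition and metadata snapshot, not the real classes): resolve the topic id once from the metadata snapshot used to build the produce request, and carry it with the batch so the response handler never re-resolves the name → id mapping.

```java
import java.util.Map;
import java.util.UUID;

public class TopicIdBindingSketch {
    // Simplified stand-in for Kafka's TopicIdPartition.
    record TopicIdPartition(UUID topicId, String topic, int partition) {}

    static final UUID ZERO_UUID = new UUID(0L, 0L);

    // Bind the topic id once, from the metadata snapshot used to build the request.
    static TopicIdPartition bindAtSendTime(Map<String, UUID> topicIdsSnapshot,
                                           String topic, int partition) {
        return new TopicIdPartition(
            topicIdsSnapshot.getOrDefault(topic, ZERO_UUID), topic, partition);
    }

    public static void main(String[] args) {
        Map<String, UUID> snapshot = Map.of("test-topic", UUID.randomUUID());
        TopicIdPartition tp = bindAtSendTime(snapshot, "test-topic", 0);
        // The response handler matches on the bound id; no fresh metadata lookup,
        // so a rebootstrap or delete/recreate in between cannot change the mapping.
        System.out.println(tp.topicId().equals(snapshot.get("test-topic"))); // prints "true"
    }
}
```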

@junrao (Contributor) left a comment

@OmniaGM : Thanks for the PR. Overall, it looks good. A few comments below. Also, could we add a unit test?

@lucasbru : Could you test this PR with the stream job?

@github-actions github-actions bot removed the triage PRs from the community label Jun 15, 2025
@lucasbru (Member)

@junrao I redeployed the soak with the fix. I will report back if the problem reoccurs

@lucasbru (Member)

@OmniaGM @junrao I have been running the soak for 24h, and it's looking good.

@chia7712 (Member) left a comment

@OmniaGM Thanks for this patch. A couple of comments are left. PTAL.

// topic name in the response might be empty.
ProducerBatch batch = batches.entrySet().stream()
.filter(entry ->
entry.getKey().same(new TopicIdPartition(r.topicId(), p.index(), r.name()))
Member:

Should we create the TopicIdPartition outside the stream to avoid creating many temporary objects?

@OmniaGM (Author):

Moved the initialisation out of the stream.

ProducerBatch batch = batches.entrySet().stream()
.filter(entry ->
entry.getKey().same(new TopicIdPartition(r.topicId(), p.index(), r.name()))
).map(Map.Entry::getValue).findFirst().orElse(null);
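The suggestion in this exchange — build the lookup key once, outside the stream — can be illustrated standalone; the types below are simplified stand-ins, not the real Kafka classes:

```java
import java.util.Map;
import java.util.Optional;

public class HoistProbeSketch {
    // Simplified stand-in key type (the real code uses TopicIdPartition).
    record Key(String topicId, int partition) {}

    // Build the probe key once, outside the stream, so each filtered entry
    // compares against the same object instead of allocating a new one per element.
    static Optional<String> findBatch(Map<Key, String> batches, String topicId, int partition) {
        Key probe = new Key(topicId, partition); // created once
        return batches.entrySet().stream()
                .filter(e -> e.getKey().equals(probe))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<Key, String> batches = Map.of(
            new Key("id-1", 0), "batch-a",
            new Key("id-1", 1), "batch-b");
        System.out.println(findBatch(batches, "id-1", 1).orElse("none")); // prints "batch-b"
    }
}
```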
Member:

It is possible to have a null batch, right? For example, if the topic is recreated after the batch is generated.

@OmniaGM (Author) commented Jun 17, 2025:

We always had this potential for batch to be null; this is why I raised the comment here #19964 (comment) asking whether we should throw IllegalStateException. I updated this to fail with IllegalStateException instead of leaving it as it was.

@github-actions github-actions bot added core Kafka Broker and removed small Small PRs labels Jun 17, 2025
@OmniaGM (Author) commented Jun 17, 2025

I have added some tests in ProducerSendWhileDeletionTest to cover recreation while producing as well; I hope this will be enough to cover these cases.

@kirktrue (Contributor) left a comment

Thanks for the PR @OmniaGM!

A few very minor comments.

Thanks!

@lucasbru (Member) left a comment

Needs a rebase, but LGTM

OmniaGM added 2 commits June 18, 2025 14:56
- Metadata doesn't have the full view of topicNames to ids during rebootstrap of client or when topic has been deleted/recreated. The solution is to pass down topic id and stop trying to figure it out later in the logic.
@junrao (Contributor) left a comment

@OmniaGM : Thanks for the updated PR. A few more comments.

return topicId.equals(tpId.topicId) &&
topicPartition.partition() == tpId.partition();
} else {
return topicPartition.equals(tpId.topicPartition());
Contributor:

In the rare case that Sender::topicIdsForBatches returns 0 topic id (e.g. topic is deleted), we will pass along topicName -> 0 to handleProduceResponse(). The response will include empty topic and 0 topic id. It's important that we find a match in this case to avoid IllegalStateException. I am thinking that we should first try to do the comparison on topic name, if it's not empty. Otherwise, just do the comparison on topic id even if it's zero.
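A rough sketch of the matching rule proposed above (compare on topic name when the response carries one, otherwise fall back to the topic id, even a zero one); the types are simplified stand-ins for the real Kafka classes:

```java
import java.util.UUID;

public class SameMatchSketch {
    static final UUID ZERO_UUID = new UUID(0L, 0L);

    record TopicIdPartition(UUID topicId, String topic, int partition) {
        // Name-first matching: if the other side carries a topic name, trust it;
        // otherwise compare topic ids, which may both be zero for old brokers.
        boolean same(TopicIdPartition other) {
            if (partition != other.partition)
                return false;
            if (other.topic != null && !other.topic.isEmpty())
                return topic.equals(other.topic);
            return topicId.equals(other.topicId);
        }
    }

    public static void main(String[] args) {
        TopicIdPartition local = new TopicIdPartition(ZERO_UUID, "orders", 0);
        // Response from an old broker: name present, id zero -> still matches.
        TopicIdPartition fromResponse = new TopicIdPartition(ZERO_UUID, "orders", 0);
        System.out.println(local.same(fromResponse)); // prints "true"
    }
}
```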

@OmniaGM (Author):

What about changing ProducerBatch to use TopicIdPartition instead of TopicPartition? This would simplify things, replacing all of these checks and metadata lookups, and I also wouldn't need to filter batches to find the topicId.

@OmniaGM (Author) commented Jun 23, 2025:

> In the rare case that Sender::topicIdsForBatches returns 0 topic id (e.g. topic is deleted), we will pass along topicName -> 0 to handleProduceResponse(). The response will include empty topic and 0 topic id. It's important that we find a match in this case to avoid IllegalStateException. I am thinking that we should first try to do the comparison on topic name, if it's not empty. Otherwise, just do the comparison on topic id even if it's zero.

Thinking out loud here: shouldn't this also be true for TopicIdPartition::equals, since we might hit this rare situation in other places in the code where the topic id partition metadata doesn't fully exist? Especially since not all of Kafka is topic-id aware yet.

Contributor:

Actually, what I said about topic deletion wasn't correct. sendProduceRequest() is called in the Sender thread and at that point, the metadata cached in the client won't change. We drain ProducerBatch based on the topic/partition in the metadata. The topicIds map created is based on the same metadata. So, if a partition is included in a produce request, the topicIds should always include that topic (assuming the topic Id is supported on the server side), even though the server may have deleted that topic. So, this code is fine.

Regarding whether to use TopicIdPartition in ProducerBatch. In the rare case, topicId could change over time for the same topic. So, we probably can't store TopicIdPartition as a final field when ProducerBatch is created. The easiest way to do this is probably to bind the topicId when the ProducerBatch is drained, based on the metadata at that time. This is more or less what this PR does.

topicMetadata(admin, topic).topicId()
} else Uuid.ZERO_UUID
// don't wait for the physical delete
deleteTopicWithAdminRaw(admin, topic)
Contributor:

Hmm, deleteTopicWithAdminRaw() doesn't wait for the metadata propagation to the brokers. However, the producer only sees the deleted topic after the metadata is propagated. Is this test effective?

@OmniaGM (Author) commented Jun 22, 2025:

Deleting a topic is usually fast because the metadata is deleted first; however, deleteTopic without the raw admin waits for the hard deletion of the partitions, which can take longer.

// We need to find batch based on topic id and partition index only as
// topic name in the response might be empty.
TopicIdPartition tpId = new TopicIdPartition(r.topicId(), p.index(), r.name());
ProducerBatch batch = batches.entrySet().stream()
Contributor:

This changes a map lookup to an iteration. Could we do some produce perf test (with multiple topic/partitions) to verify there is no performance degradation?

Contributor:

An alternative is for handleProduceResponse() to take a Map<TopicPartition, ProducerBatch> and a Map<UUID, String>. If the response has non-zero topicId, we look up the second map to find the topic name and then use the first map to find the batch. Otherwise, we look up the first map using the topic name.
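The alternative described above keeps every lookup O(1): the batch map is keyed by TopicPartition, and a small id → name map built from the same metadata resolves the name when the response carries a non-zero topic id. A minimal sketch with simplified stand-in types (not the real Kafka classes):

```java
import java.util.Map;
import java.util.UUID;

public class TwoMapLookupSketch {
    static final UUID ZERO_UUID = new UUID(0L, 0L);

    // Simplified stand-in for Kafka's TopicPartition.
    record TopicPartition(String topic, int partition) {}

    // If the response carries a non-zero topic id, resolve the name through the
    // id -> name map first; otherwise use the topic name from the response directly.
    static String findBatch(Map<TopicPartition, String> batchesByPartition,
                            Map<UUID, String> topicNamesById,
                            UUID responseTopicId, String responseTopicName, int partition) {
        String topic = responseTopicId.equals(ZERO_UUID)
            ? responseTopicName
            : topicNamesById.get(responseTopicId);
        return batchesByPartition.get(new TopicPartition(topic, partition));
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        Map<TopicPartition, String> batches = Map.of(new TopicPartition("orders", 0), "batch-a");
        Map<UUID, String> names = Map.of(id, "orders");
        // Topic-id-aware response: empty name, non-zero id.
        System.out.println(findBatch(batches, names, id, "", 0));          // prints "batch-a"
        // Pre-topic-id response: name present, zero id.
        System.out.println(findBatch(batches, names, ZERO_UUID, "orders", 0)); // prints "batch-a"
    }
}
```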

@OmniaGM (Author):

> This changes a map lookup to an iteration. Could we do some produce perf test (with multiple topic/partitions) to verify there is no performance degradation?
The perf results:

5000000 records sent

| metric | pr | trunk |
| --- | --- | --- |
| records/sec | 2709.9 | 2454.8 |
| MB/sec | 2.65 | 2.40 |
| avg latency | 11110.81 ms | 12260.82 ms |
| max latency | 47484.00 ms | 46715.00 ms |
| 50th | 10147 ms | 11341 ms |
| 95th | 22677 ms | 24620 ms |
| 99th | 27950 ms | 30265 ms |
| 99.9th | 33468 ms | 36318 ms |

> An alternative is for handleProduceResponse() to take a Map<TopicPartition, ProducerBatch> and a Map<UUID, String>. If the response has non-zero topicId, we look up the second map to find the topic name and then use the first map to find the batch. Otherwise, we look up the first map using the topic name.

I updated the code to do this instead, as we seem to already have all of this data ready to send to handleProduceResponse. I am running the perf test one last time with this update and will report back.

@OmniaGM (Author):

For 5000000 records sent, record-size 1024, linger.ms=100, batch.size=10000; topic with 1000 partitions, 3 replicas:

| metric | old implementation in the PR | second approach | trunk |
| --- | --- | --- | --- |
| records/sec | 2709.9 | 2316.2 | 2454.8 |
| MB/sec | 2.65 | 2.26 | 2.40 |
| avg latency | 11110.81 ms | 12993.45 ms | 12260.82 ms |
| max latency | 47484.00 ms | 51446.00 ms | 46715.00 ms |
| 50th | 10147 ms | 11952 ms | 11341 ms |
| 95th | 22677 ms | 26238 ms | 24620 ms |
| 99th | 27950 ms | 32202 ms | 30265 ms |
| 99.9th | 33468 ms | 38580 ms | 36318 ms |

@github-actions github-actions bot added the small Small PRs label Jun 22, 2025
@chia7712 (Member)

@OmniaGM could you please fix the conflicts?


recreateTopicFuture.join();
producerFutures.forEach(CompletableFuture::join);
assertTrue(Math.abs(maxNumRecreatTopicAttempts - topicIds.size()) <= 5);
assertEquals(20, numSuccess.intValue());
Contributor:

20 => 2 * numRecords ?

})).toList();
recreateTopicFuture.join();
producerFutures.forEach(CompletableFuture::join);
assertTrue(Math.abs(maxNumRecreatTopicAttempts - topicIds.size()) <= 5);
Member:

Hmm, why doesn't each topic creation create a new topic Id?

@OmniaGM (Author):

It happens when the delete takes longer; when the topic gets created again, we hit TopicExistsException.

Contributor:

Could we create admin with cluster.admin(Map.of(), true)? This way, all admin requests are sent to the controller. Since the controller always has the latest metadata, it seems that we should never hit TopicExistsException.

Member:

or we could keep recreating the topic until there are "enough" new topic ids?

        var recreateTopicFuture = CompletableFuture.supplyAsync(() -> {
            var topicIds = new HashSet<Uuid>();
            while (topicIds.size() < maxNumRecreatTopicAttempts) {
                try (var admin = cluster.admin()) {
                    if (admin.listTopics().names().get().contains(topic)) {
                        admin.deleteTopics(List.of(topic)).all().get();
                    }
                    topicIds.add(admin.createTopics(List.of(new NewTopic(topic, 1, (short) 1))).topicId(topic).get());
                } catch (Exception e) {
                    // ignore
                }
            }
            return topicIds;
        });
        var topicIds = recreateTopicFuture.join();
        assertEquals(maxNumRecreatTopicAttempts, topicIds.size());

@OmniaGM (Author):

> Could we create admin with cluster.admin(Map.of(), true)? This way, all admin requests are sent to the controller. Since the controller always has the latest metadata, it seems that we should never hit TopicExistsException.

I don't believe we can use an admin client with controller bootstrap for create, delete, or even list topics, as these use LeastLoadedNodeProvider instead of ControllerNodeProvider.

Recreating until we have enough topic ids is the better approach here, I think.

@junrao (Contributor) commented Jun 23, 2025

Also, regarding the title of the PR. "Not relaying on metadata to map between topic id and name".

We are still relying on the metadata to map topic id and name. We just want to use consistent metadata between generating the produce request and handing the produce response.

@OmniaGM OmniaGM changed the title KAFKA-19397: Not relaying on metadata to map between topic id and name. KAFKA-19397: Ensure consistent metadata usage in produce request and response Jun 26, 2025
@junrao (Contributor) left a comment

@OmniaGM : Thanks for the updated PR. A couple of more comments. Also, are the test failures related to this PR?

@kirktrue (Contributor) left a comment

Thanks for the PR @OmniaGM!

Left a few comments, mostly minor.

int maxNumRecreatTopicAttempts = 10;
List<Uuid> topicIds = new CopyOnWriteArrayList<>();
var recreateTopicFuture = CompletableFuture.runAsync(() -> {
for (int i = 1; i <= maxNumRecreatTopicAttempts; i++) {
Contributor:

Nitpicky, but is there a reason not to start i at 0? I don't mind, I just want to make sure I'm not missing something.

@OmniaGM (Author):

I was following the pattern in the rest of the file, which appends numbers from 1 to 10 to the record value.

@OmniaGM (Author) commented Jun 27, 2025

I will run the perf test one more time, this time with some topic recreation in the background while I am waiting for the pipeline to finish, to ensure everything is okay.

@OmniaGM (Author) commented Jun 27, 2025

Last perf test results:
5000000 records sent, 2378.4 records/sec (2.32 MB/sec), 12647.78 ms avg latency, 55251.00 ms max latency, 11555 ms 50th, 25864 ms 95th, 32131 ms 99th, 39096 ms 99.9th.

@junrao (Contributor) left a comment

@OmniaGM : Thanks for the updated PR. One more comment.

if (metadata != null) {
numSuccess.incrementAndGet();
}
}).get();
Contributor:

It's possible for get() to throw an exception like TopicNotExist, right? If that happens, the assertion on numRecords will fail.
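The concern is general to mixing a success-counting callback with a blocking get(): if the future completes exceptionally, get() throws and can abort the loop before the assertion on the counter is reachable. A Kafka-free sketch of the pattern, using CompletableFuture as a stand-in for the producer's send future:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class CallbackCountSketch {
    public static void main(String[] args) {
        AtomicInteger numSuccess = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
            boolean fail = (i == 1); // simulate one send failing, e.g. topic deleted mid-test
            CompletableFuture<String> future = fail
                ? CompletableFuture.failedFuture(new RuntimeException("UnknownTopicOrPartition"))
                : CompletableFuture.completedFuture("metadata-" + i);
            try {
                future.get();              // throws ExecutionException on failure...
                numSuccess.incrementAndGet();
            } catch (Exception e) {
                // ...so swallow (or retry) here, or the loop aborts early and the
                // assertion on numSuccess can never be satisfied.
            }
        }
        System.out.println(numSuccess.get()); // prints "2" (one of three sends failed)
    }
}
```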
