Skip to content

[improve][broker] Improve the performance of TopicName #24463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

BewareMyPower
Copy link
Contributor

@BewareMyPower BewareMyPower commented Jun 25, 2025

Motivation

The TopicName's constructor has poor performance:

  • NamespaceName#get is very slow
  • Splitter.on("/").splitToList(rest) is slow
  • String.format is slower than the + operation on strings and completeTopicName is unnecessarily created again

Modifications

  • Initialize NamespaceName in a lazy way
  • Replace Splitter with a manually written splitBySlash introduced from [fix][proxy] Fix proxy OOM by replacing TopicName with a simple conversion method #24465. Actually, StringUtils#split has good performance as well. But it will split "//xxx/yyy/zzz" to ["xxx", "yyy", "zzz"] without reporting any error.
  • Initialize completeTopicName from the argument directly without any concentrate operation
  • Replace String.format with + in fromPersistenceNamingEncoding
  • Add TopicNameBenchmark for benchmark

Expose the constructor, so the downstream can choose not to use the buggy TopicName cache or establish its custom cache mechanism for TopicName.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: BewareMyPower#44

@BewareMyPower BewareMyPower self-assigned this Jun 25, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jun 25, 2025
@BewareMyPower
Copy link
Contributor Author

Before 48ffb7a, there is a constructor parameter that determines whether to initialize the NamespaceName. I also compared with the existing TopicName constructor, which can be found here: https://gist.github.com/BewareMyPower/6b83c897552c110f336e51965cc91c24

The benchmark result is:

TopicNameBenchmark.testConstruct                          thrpt       312.983          ops/s
TopicNameBenchmark.testConstructWithNamespaceInitialized  thrpt       101.973          ops/s
TopicNameBenchmark.testLegacyTopicName                    thrpt        53.944          ops/s
TopicNameBenchmark.testReadFromCache                      thrpt       721.392          ops/s

As you can see

  • Accessing it from the cache is only 2.3x faster than the latest implementation, which is not significant. However, to avoid debate on whether to remove cache in this PR, I only exposed the public constructor.
  • Skipping initialization of NamespaceName is 3x faster than initializing it immediately
  • It's 6x faster than the original implementation. Even if initializing NamespaceName immediately, it's still about 2x faster.

@lhotari
Copy link
Member

lhotari commented Jun 25, 2025

Before 48ffb7a, there is a constructor parameter that determines whether to initialize the NamespaceName. I also compared with the existing TopicName constructor, which can be found here: https://gist.github.com/BewareMyPower/6b83c897552c110f336e51965cc91c24

The benchmark result is:

TopicNameBenchmark.testConstruct                          thrpt       312.983          ops/s
TopicNameBenchmark.testConstructWithNamespaceInitialized  thrpt       101.973          ops/s
TopicNameBenchmark.testLegacyTopicName                    thrpt        53.944          ops/s
TopicNameBenchmark.testReadFromCache                      thrpt       721.392          ops/s

As you can see

  • Accessing it from the cache is only 2.3x faster than the latest implementation, which is not significant. However, to avoid debate on whether to remove cache in this PR, I only exposed the public constructor.
  • Skipping initialization of NamespaceName is 3x faster than initializing it immediately
  • It's 6x faster than the original implementation. Even if initializing NamespaceName immediately, it's still about 2x faster.

I wouldn't trust JMH results from benchmarking on Mac OS. In the case of PR #24457, the results are very different when benchmarking on Linux x86_64 with Intel i9 processor (Dell XPS 2019 laptop).
If we'd completely remove the cache, there would be more duplicate TopicName and NamespaceName instances and more duplicate string instance. For very long tenant, namespace and topic names, that could add up a significant amount of wasted heap memory.

@BewareMyPower
Copy link
Contributor Author

@lhotari So could you help run benchmark in your Linux environment to see the difference? I can also create a workflow via GitHub Actions to see the result.

@BewareMyPower
Copy link
Contributor Author

If we'd completely remove the cache, there would be more duplicate TopicName and NamespaceName instances and more duplicate string instance. For very long tenant, namespace and topic names, that could add up a significant amount of wasted heap memory.

I don't want to debate about it in this PR. If you have a chance to look at how TopicName is used in Pulsar, you will find a lot of temporary TopicName instances just for conversion. I know near all Pulsar constributors are very sensitive to the topic about GC, so I don't change the TopicName#get implementation.

The ultimate solution is to create some utils methods instead of leveraging the TopicName class.

@BewareMyPower
Copy link
Contributor Author

Updated test results to compare it with the legacy implementation in https://github.com/BewareMyPower/pulsar/actions/runs/15872434081/job/44752001433?pr=45

TopicNameBenchmark.testConstruct                             thrpt       258.932          ops/s
TopicNameBenchmark.testConstructWithNamespaceInitialization  thrpt       158.876          ops/s
TopicNameBenchmark.testLegacyConstruct                       thrpt        44.290          ops/s
TopicNameBenchmark.testReadFromCache                         thrpt       340.613          ops/s
  • The constructor is 5.8x faster than the existing one
  • The constructor with NamespaceName initialization is 3.6x faster than the existing one

@BewareMyPower BewareMyPower force-pushed the bewaremypower/improve-topic-name-parse branch from 48ffb7a to 355140a Compare July 14, 2025 02:46
@BewareMyPower
Copy link
Contributor Author

@lhotari @codelipenghui @nodece @dao-jun @poorbarcode @coderzc This PR is now ready to review, PTAL

@BewareMyPower BewareMyPower changed the title [improve][common] Improve the performance of TopicName [improve][broker] Improve the performance of TopicName Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants