Description
Context
Currently, I'm working on issue #3067.
To implement it, I used `PatternMatchUtils`.
If #3874 is merged, it would be worth introducing a cache to optimize the pattern matching.
A consumer will subscribe to only a few topics, and it's very likely that all `ConsumerRecord`s for those topics will carry the same headers.
Therefore, the following assumptions are reasonable:
- The number of distinct header types won’t grow without bound (i.e., low cardinality).
- Once a header has been resolved via a pattern match, the result can be reused thereafter without any issues.
- The number of records that need to be processed per second may be in the thousands, tens of thousands, or even hundreds of thousands.

So, if we cache the result of each pattern match and reuse it on subsequent iterations, the performance of multi-value headers will improve drastically.
As @artembilan mentioned before, an unbounded cache could cause a memory leak problem.
I think it would be better to introduce Caffeine for an LFU cache.
If this issue makes sense to you, I will take a crack at it. 🙇‍♂️
Simple Performance Result
https://gist.github.com/chickenchickenlove/8c0935883bab446a1ff2efe728b6f99b
| Case | Without Cache (ms) | With Cache (ms) |
|---|---|---|
| First Match – 10,000,000 | 91 | 0 |
| First Match – 50,000,000 | 479 | 0 |
| First Match – 100,000,000 | 480 | 0 |
| Last Match – 10,000,000 | 2500 | 0 |
| Last Match – 50,000,000 | 627 | 0 |
| Last Match – 100,000,000 | 1112 | 0 |
| Last Match and more patterns – 10,000,000 | 202 | 0 |
| Last Match and more patterns – 50,000,000 | 1023 | 0 |
| Last Match and more patterns – 100,000,000 | 2148 | 0 |
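To make the idea behind the benchmark concrete, here is a minimal sketch of memoizing per-header-name match results. All names are hypothetical, and `matchesPattern` is a simplified stand-in for Spring's `PatternMatchUtils.simpleMatch`, not the real implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: because header-name cardinality is low, the result of
// matching a header name against the configured patterns can be computed once
// and reused for every subsequent ConsumerRecord.
public class HeaderMatchCacheDemo {

    private static final String[] PATTERNS = {"x-trace-*", "x-span-*", "content-type"};
    private static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();

    // Simplified prefix-wildcard matcher ("foo*" style); a stand-in for
    // PatternMatchUtils.simpleMatch, which supports richer patterns.
    static boolean matchesPattern(String pattern, String name) {
        return pattern.endsWith("*")
                ? name.startsWith(pattern.substring(0, pattern.length() - 1))
                : pattern.equals(name);
    }

    // Uncached: scans every pattern for every record.
    static boolean matchesAny(String headerName) {
        for (String pattern : PATTERNS) {
            if (matchesPattern(pattern, headerName)) {
                return true;
            }
        }
        return false;
    }

    // Cached: the pattern scan runs once per distinct header name.
    static boolean matchesAnyCached(String headerName) {
        return CACHE.computeIfAbsent(headerName, HeaderMatchCacheDemo::matchesAny);
    }

    public static void main(String[] args) {
        System.out.println(matchesAnyCached("x-trace-id")); // true
        System.out.println(matchesAnyCached("x-trace-id")); // true, served from cache
        System.out.println(matchesAnyCached("unrelated"));  // false
    }
}
```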
Activity
artembilan commented on May 6, 2025
Well, no. We develop a library here for target projects, therefore any extra dependency has to be really well justified.
It doesn't look like the whole of Caffeine should be pulled in just for this simple optimization.
Consider using `LinkedHashMap` and its `removeEldestEntry()` implementation.

sobychacko commented on May 6, 2025
While we can't introduce dependencies like Caffeine in Spring Kafka proper, I am thinking aloud here to see if we can provide a hook from the framework so that, if users want to bring a third-party caching solution, that is possible (without any direct dependency on it, of course).
artembilan commented on May 6, 2025
I don't see how that is useful for just header-name caching.
Everything is done in memory, so there is no need for the overhead of a fully-blown cache implementation.
sobychacko commented on May 6, 2025
I don't remember the details of Caffeine, but it is also an in-memory cache with advanced algorithms for caching (some applications prefer it). Spring Pulsar has some use cases for it: https://github.com/spring-projects/spring-pulsar/tree/bd9aea58589a290f0fcb434383d26b7914db6196/spring-pulsar-cache-provider-caffeine.
I am not saying that we have to do it, but if some applications prefer to use it, we may want to think about having a way to support those use cases without explicitly bringing that as a dependency in the framework.
chickenchickenlove commented on May 9, 2025
Thanks for your comments!
Let me suggest something.
A header-mapping cache isn't efficient for certain use cases; in those scenarios, caching adds overhead rather than helping.
To let users tune the behaviour, I suggest we provide:
- An opt-in flag to enable/disable the cache.
- A cache interface: users can supply their own implementation and inject it into `KafkaHeaderMapper`.
- A default LRU implementation based on `LinkedHashMap` using `removeEldestEntry()`.

As @artembilan mentioned earlier, the header-pattern matching logic will be invoked for every `ConsumerRecord`.
Given the properties of Kafka, there will be a large number of `ConsumerRecord`s, so I believe the performance of the header mapper is critical. To optimize it for each use case, I think it would be better to provide a pluggable cache interface.
I would love to hear your thoughts!
Thanks for reading this. 🙇‍♂️
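For illustration, the proposal above might look something like the following sketch. None of these types or methods exist in spring-kafka; the names (`HeaderMatchCache`, `CachingHeaderMapperSketch`, `setHeaderMatchCache`) are purely hypothetical:

```java
import java.util.function.Function;

// Hypothetical pluggable cache contract: given a header name and the
// (expensive) pattern-match function, return the match result.
interface HeaderMatchCache {
    boolean matchResult(String headerName, Function<String, Boolean> matcher);
}

// A mapper could then expose an optional hook; leaving the cache unset
// (null) is the opt-out, preserving today's behavior.
class CachingHeaderMapperSketch {

    private HeaderMatchCache cache; // null => pattern match runs on every record

    public void setHeaderMatchCache(HeaderMatchCache cache) {
        this.cache = cache;
    }

    boolean headerMatches(String name, Function<String, Boolean> patternMatch) {
        return (cache != null)
                ? cache.matchResult(name, patternMatch)
                : patternMatch.apply(name);
    }
}
```

A user-supplied implementation could be backed by anything (a plain map, an LRU map, or a third-party cache) without the framework depending on it.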
artembilan commented on May 9, 2025
If I were developing a target application, I'm not sure I would bother with an extra cache abstraction to be injected into a framework component for some tiny CPU optimization for reasons known only to my application.
I would argue that if there is a problem in a framework component, it has to be fixed at the framework level; otherwise I would go with my own `KafkaHeaderMapper` implementation.
We use header pattern matching in many other projects, and no one has ever raised a question about caching what is just the result of a pattern match.
Just my opinion, because there is too much else to do rather than trying to come up with some externalization of cached pattern matching.
chickenchickenlove commented on May 13, 2025
I agree 👍
Your points about avoiding heavy dependencies make sense to me! 🙇‍♂️
What do you think about an opt-in, lightweight LRU cache based on `LinkedHashMap` (using `removeEldestEntry(...)`)?
Given that `spring-kafka` supports enterprise environments, IMHO it might be helpful to provide an LRU cache in the framework component to optimize pattern matching.
In my earlier experience, there were cases where an input topic's rate was approximately 2M records/sec.
In such cases, I suspect an LRU cache for header pattern matching would make a noticeable difference in CPU usage.
What do you think?
sobychacko commented on May 13, 2025
Yeah, that was my original thought when you raised it in the first place. It would be nice to support things like that as long as there is minimal impact on Spring Kafka. The problem arises when someone tries to opt in for some enterprise caching, runs into issues, and then thinks it is Spring Kafka-related. It could potentially become a maintenance burden on us in the long term. Although it could be a minor change, we need to consider the pros/cons more prudently. I like the overall idea, but is it worth it? You made the case for CPU usage benefits in heavy data-intensive apps, but let's continue to explore any downsides since @artembilan has other thoughts about this.
artembilan commented on May 13, 2025
Why opt-in?
A `LinkedHashMap` with `removeEldestEntry(...)` should always be enough for this internal use case.
There is no external interaction, so no need for any enterprise cache.
The CPU burden is noted, so `LinkedHashMap` must do the trick.
For example, in Spring Integration, we have this feature: https://docs.spring.io/spring-integration/reference/router/dynamic-routers.html.
This one relies internally on the:
And that was enough for years. No one complained that there is no enterprise cache support.
That's because we are just optimizing some CPU ticks for the same bean resolution.
We have a similar solution over there for caching local `Lock` instances in the `JdbcLockRegistry`:
We also have a cache for `SimpleJdbcCall` instances in the `StoredProcExecutor`.
I don't mind `LinkedHashMap` for this header-names use case of ours, but I don't see a reason for an external cache abstraction over here.

chickenchickenlove commented on May 13, 2025
Apologies for the lack of clarity in my previous explanation. 🙇‍♂️
What I meant by "opt-in" was simply the ability to turn the header-cache functionality on or off.
Also, I wasn't suggesting introducing any enterprise-level caching solution.
Rather, my point was that in enterprise environments where a large number of messages are consumed, even a simple `LinkedHashMap`-based cache can offer meaningful CPU optimizations.
To summarize:
What do you think?
Please let me know your opinion!
artembilan commented on May 13, 2025
My suggestion: always cache via a `LinkedHashMap` with some meaningful size like `1000`.
No need for opt-in.
Thanks
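The always-on variant suggested here is small enough to sketch in full. This is a minimal illustration of the `removeEldestEntry()` idiom with the size of 1000 floated above; `BoundedHeaderNameCache` and `resolve` are made-up names, not spring-kafka API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of an always-on, bounded internal cache: a LinkedHashMap that
// evicts its eldest entry once it holds more than MAX_SIZE header names,
// so it cannot leak memory even with unbounded header-name cardinality.
public class BoundedHeaderNameCache {

    private static final int MAX_SIZE = 1000; // size floated in the discussion

    private final Map<String, Boolean> matched = new LinkedHashMap<String, Boolean>() {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > MAX_SIZE;
        }
    };

    // Runs the (expensive) matcher only on a cache miss.
    public boolean resolve(String headerName, Predicate<String> matcher) {
        return matched.computeIfAbsent(headerName, matcher::test);
    }
}
```

With insertion order (the `LinkedHashMap` default) this evicts the oldest-inserted name; passing `accessOrder = true` to the three-argument constructor would make it a true LRU instead.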
chickenchickenlove commented on May 14, 2025
Thanks for your comments!
That makes sense!
Would it be okay if I create a PR for the header pattern cache you suggested?
artembilan commented on May 14, 2025
Happy to review and merge!
Thanks