'publish' performance optimization on high parallelism, prevent lock if not needed #1185

Open
wants to merge 11 commits into master
Conversation

assaf-xm

Every message 'publish' goes through 'addMultipleTargetTopics', which always takes an async lock.

The lock becomes slow at high publish parallelism (a few thousand waiters or more) and can cause errors like:
KafkaJSLockTimeout: Timeout while acquiring lock (2162 waiting locks): "updating target topics"

This PR avoids taking the lock when it is not needed, which increases throughput by ~20% for medium loads and by more for high loads, and reduces the need to increase requestTimeout at high parallelism.
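For illustration, here is a minimal, self-contained sketch of the fast-path check (not the actual kafkajs source; the Set name previousTopics mirrors the diff discussed below, and the example topics are made up):

// Sketch only: topics whose metadata refresh already succeeded are kept in a Set,
// and a publish that only touches those topics can skip the async lock entirely.
const previousTopics = new Set(['orders']) // topics from the last successful refresh

const canSkipLock = (topics, metadataLoaded) =>
  metadataLoaded && previousTopics.size > 0 && topics.every(t => previousTopics.has(t))

console.log(canSkipLock(['orders'], true))              // true  -> no lock, no refreshMetadata
console.log(canSkipLock(['orders', 'payments'], true))  // false -> fall back to the locked update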

@assaf-xm
Author

@tulios, need to fix a few tests that check the number of calls to 'refreshMetadata', which is now reduced.
Does this change make sense?
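For reference, a hypothetical Jest-style adjustment of the kind of assertion that changes (the actual kafkajs tests, spies, and counts will differ; createTestCluster is a made-up helper):

// Hypothetical test adjustment (illustrative only): with the fast path, a second
// call for an already-known topic no longer triggers another metadata refresh,
// so the expected call count drops.
const refreshMetadata = jest.fn().mockResolvedValue()
const cluster = createTestCluster({ refreshMetadata }) // hypothetical test helper

await cluster.addMultipleTargetTopics(['topic-a'])
await cluster.addMultipleTargetTopics(['topic-a'])

expect(refreshMetadata).toHaveBeenCalledTimes(1) // previously expected more calls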

if (
  this.brokerPool.metadata &&
  this.previousTopics &&
  topics.every(topic => this.previousTopics.has(topic))
)
Contributor

Why do we need to use previousTopics here? Why not just targetTopics? And related: why do we need to make previousTopics an object property?

Author

@t-d-d, saving 'previousTopics' as an object property is needed because we are not holding the lock here: another flow could already be inside the lock and have modified 'targetTopics' before calling 'refreshMetadata', and 'targetTopics' might be reverted if 'refreshMetadata' fails. So 'previousTopics' is kept as the latest set of topics for which 'refreshMetadata' completed successfully.
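To make that reasoning concrete, here is a self-contained sketch (not the kafkajs implementation; the lock is a naive promise chain) of why previousTopics is only updated after refreshMetadata succeeds, while targetTopics can be mutated and reverted by concurrent flows holding the lock:

class TargetTopicsTracker {
  constructor(refreshMetadata) {
    this.refreshMetadata = refreshMetadata // async (topics) => Promise, injected
    this.targetTopics = new Set()          // may be mutated and reverted under the lock
    this.previousTopics = null             // topics from the last *successful* refresh
    this.lockTail = Promise.resolve()      // naive async lock via promise chaining
  }

  async addMultipleTargetTopics(topics) {
    // Fast path (no lock): every topic already completed a successful refresh
    if (this.previousTopics && topics.every(t => this.previousTopics.has(t))) {
      return
    }

    // Slow path: serialize updates behind the "lock"
    const run = this.lockTail.then(async () => {
      const before = new Set(this.targetTopics)
      topics.forEach(t => this.targetTopics.add(t))
      try {
        await this.refreshMetadata([...this.targetTopics])
        // Only after a successful refresh are the topics recorded as known
        this.previousTopics = new Set(this.targetTopics)
      } catch (error) {
        this.targetTopics = before // revert, so targetTopics cannot be trusted as "refreshed"
        throw error
      }
    })
    this.lockTail = run.catch(() => {}) // keep the chain alive after failures
    return run
  }
}

Because targetTopics can be reverted when a refresh fails in another flow, the lock-free check has to rely on previousTopics, the last set known to have refreshed successfully.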

@t-d-d
Contributor

t-d-d commented Sep 18, 2021

@assaf-xm I think you should wait on this until we know what's happening with #667.

@assaf-xm
Author

@t-d-d, if 'addMultipleTargetTopics' is going to be removed then, yes, there is no need for this PR, but I see that #667 has been in code review for months.

@suvorovis

suvorovis commented Oct 8, 2021

@t-d-d
Any updates here? I have faced the described issue in production at a producing rate of 6k messages/sec.

@assaf-xm
Author

@t-d-d, I don't see any progress on #667.

@atiquefiroz

Facing a similar issue:

0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18 {
0|ludo-ws-s1 | name: 'KafkaJSNonRetriableError',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: KafkaJSLockTimeout: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at Timeout. (/home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/utils/lock.js:48:23)
0|ludo-ws-s1 | at Timeout.wrapped [as _onTimeout] (/home/ec2-user/zupee-ludo/ludo.service/node_modules/wtfnode/index.js:197:27)
0|ludo-ws-s1 | at listOnTimeout (internal/timers.js:549:17)
0|ludo-ws-s1 | at processTimers (internal/timers.js:492:7) {
0|ludo-ws-s1 | name: 'KafkaJSLockTimeout',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: undefined
0|ludo-ws-s1 | }
0|ludo-ws-s1 | } >>>>>>>>>>>>>>>> 22 Unhandled Rejection at Promise >>>>>>>>>>>>>>>> Promise {
0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18 {
0|ludo-ws-s1 | name: 'KafkaJSNonRetriableError',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: KafkaJSLockTimeout: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at Timeout. (/home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/utils/lock.js:48:23)
0|ludo-ws-s1 | at Timeout.wrapped [as _onTimeout] (/home/ec2-user/zupee-ludo/ludo.service/node_modules/wtfnode/index.js:197:27)
0|ludo-ws-s1 | at listOnTimeout (internal/timers.js:549:17)
0|ludo-ws-s1 | at processTimers (internal/timers.js:492:7) {
0|ludo-ws-s1 | name: 'KafkaJSLockTimeout',
0|ludo-ws-s1 | retriable: false,
0|ludo-ws-s1 | helpUrl: undefined,
0|ludo-ws-s1 | cause: undefined
0|ludo-ws-s1 | }
0|ludo-ws-s1 | }
0|ludo-ws-s1 | }
0|ludo-ws-s1 | This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason:
0|ludo-ws-s1 | KafkaJSNonRetriableError: Timeout while acquiring lock (2 waiting locks): "updating target topics"
0|ludo-ws-s1 | at /home/ec2-user/zupee-ludo/ludo.service/node_modules/commons/node_modules/kafkajs/src/retry/index.js:55:18
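Until something like this PR lands, the stopgap the description alludes to is raising the client requestTimeout (the PR notes it reduces the need to do so at high parallelism). A minimal configuration sketch, with placeholder client id, brokers, and value:

// Stopgap only: give lock acquisition more headroom under heavy publish
// parallelism by raising requestTimeout (the kafkajs default is 30000 ms).
const { Kafka } = require('kafkajs')

const kafka = new Kafka({
  clientId: 'my-app',          // placeholder
  brokers: ['broker1:9092'],   // placeholder
  requestTimeout: 60000,       // raised from the 30000 ms default
})

const producer = kafka.producer()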

@emorneau

I am also facing this issue during high-scale performance testing that simulates our production scenario. Without forking this repo and making this simple code change, we can't use this library. Please merge it in!

@delarosaj

delarosaj commented Apr 17, 2024

I'm facing the exact same issue. Could these changes be merged? I saw that #667 is closed.
