-
-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix network buffering performance problems #669
Conversation
src/network/connection.js
Outdated
this.bufferQueue.bytesTotal += Buffer.byteLength(rawData) | ||
|
||
const bytesTotal = Buffer.byteLength(this.buffer) + this.bufferQueue.bytesTotal | ||
if (this.bufferQueue.bytesAwaiting <= bytesTotal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you inverse this statement please. It would read a lot clearer if it's just "if there are more bytes to read, return" than the way it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I rewrote the patch to make it easier to follow
I will properly review this and provide some real feedback as soon as I can. The TL;DR: cool! |
This looks cool. Do you have any indication of the gains from this? |
Cool! 🚀 Big fan of what you do. The issue with copying buffers is a known one which I started addressing in #475, but I didn't have time to drive it to the finish line. Your approach is a bit simpler, which should make it easier to get in, even if it doesn't address every place where there's excessive buffer copying (the protocol encoding and decoding is another place). It would be really nice if you were able to share some kind of before and after benchmarks, so that we can see if it actually has an impact (it should, but it would be great to see regardless). On the actual fix itself, it looks pretty good to me. Ideally I would like to encapsulate this into a separate |
I think the logic of keeping As for metrics, we only observed this behavior in production, since it is so dependent on the way Kafka chooses to batch data. In our case, once a process started lagging behind, it would spike the CPU usage from ~15% to 80-100% and stay there indefinitely. After we deployed a fix, the CPU usage has stabilized back to ~15% immediately and stayed there. We use single-core ECS containers. |
3789061
to
034bab1
Compare
034bab1
to
26f21f8
Compare
I reworked the patch again to add more clarity and pass the tests. |
You need to merge in latest master, and then we can merge this. Thanks! |
@Nevon done, thank you for a quick turnaround! |
We use Kafka.js at Figma and we have been observing systematic performance problems with our node.js services consuming from Kafka using this library.
During one of these incidents we took a CPU profile and identified a performance issue in the library, specifically the code performing network reads and low-level protocol parsing.
The culprit is the use of
Buffer.concat
on every chunk read from the network, resulting into O(n^2) work copying the old buffer to a new buffer of bigger size. This effects are more obvious when the message size is larger, we noticed it when our services lagged behind and started fetching large batches from Kafka.The proposed fix is to avoid
Buffer.concat
operations until necessary, accumulating a list of Buffer references instead (a well optimized operation in V8).As a first-time contributor to Kafka.js I realize that this patch might not be in the perfect shape for merging as is, but happy to discuss with the maintainers how to fix this issue upstream.