How about the performance compared with node-rdkafka? #398
We have done benchmarks in isolation just to see what kind of throughput we could achieve, and we are running it in production under heavy load, but we've never done any direct comparisons with other libraries. So far, it has been fast enough for our needs, so we haven't put any serious effort into optimizations. I would welcome a real comparison between the different libraries to see how we stack up.
If necessary, the same strategies could be applied to kafkajs. The reality is that you probably won't need them: the official Java Kafka client doesn't employ strategies to that extreme either, and it is still used in production everywhere by big tech. We haven't moved our highest-volume topics from node-rdkafka to kafkajs yet, but not out of worry about performance. Unless you already have a big deployment and know exactly what kind of throughput you'll be facing, I wouldn't worry too much about it either.
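[Editor's note: one of the strategies alluded to above, client-side buffering with periodic flushes the way librdkafka does internally, can be sketched in plain Node.js. Everything here (`MessageBuffer`, `flushHandler`, the thresholds) is a hypothetical illustration, not part of either library.]

```javascript
// Minimal sketch of client-side message buffering, similar in spirit to
// what librdkafka does internally. All names here are hypothetical.
class MessageBuffer {
  constructor(flushHandler, { maxMessages = 100, maxDelayMs = 50 } = {}) {
    this.flushHandler = flushHandler; // called with each accumulated batch
    this.maxMessages = maxMessages;   // size threshold for an early flush
    this.maxDelayMs = maxDelayMs;     // max time a message waits in the buffer
    this.queue = [];
    this.timer = null;
  }

  push(message) {
    this.queue.push(message);
    if (this.queue.length >= this.maxMessages) {
      this.flush(); // size threshold reached: flush immediately
    } else if (!this.timer) {
      // otherwise flush after at most maxDelayMs
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  flush() {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.flushHandler(batch);
  }
}
```

The trade-off is the same one discussed in this thread: messages sit in memory until a flush, so a crash can lose the buffered tail, which is exactly why fire-and-forget batching is fast but harder to use safely.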
In recent days I tried kafkajs and also node-rdkafka. Our scenario is writing a Kafka sink connector that saves data from Kafka to MongoDB; it needs to detect new Kafka topics and call the MongoDB native driver to write the data. The result is that we chose node-rdkafka. It can handle 100 topics at a 150–200 ms interval at the same time; in other words, it can consume 500–600 topics in 1 second. kafkajs showed much lower performance when subscribing to many topics.
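[Editor's note: the sink-connector shape described above, consume a batch, group by topic, bulk-write per topic, can be sketched independently of any driver. `consumeBatch` and `store` below are hypothetical stand-ins, not the kafkajs, node-rdkafka, or MongoDB APIs.]

```javascript
// Hypothetical sketch of one iteration of a Kafka-to-database sink loop.
// `consumeBatch` stands in for polling messages from Kafka; `store` stands
// in for a database client with a bulk insert (e.g. MongoDB's insertMany).
async function runSinkOnce(consumeBatch, store) {
  const batch = await consumeBatch();
  // group messages by topic so each topic gets one bulk write
  const byTopic = new Map();
  for (const msg of batch) {
    if (!byTopic.has(msg.topic)) byTopic.set(msg.topic, []);
    byTopic.get(msg.topic).push(msg.value);
  }
  for (const [topic, values] of byTopic) {
    await store.insertMany(topic, values); // one round-trip per topic
  }
  return byTopic.size; // number of topics written this iteration
}
```

Grouping before writing is what makes the "many topics per second" number above achievable: per-message writes to the database would dominate long before the Kafka client becomes the bottleneck.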
Write

I made a comparison.

Usage difference

node-rdkafka

Config
And you can't just produce as much as you want: when the internal queue is full you will get an error, which means you need to back off and retry.
kafkajs
Runs and results
- Sending a single message
- Sending 10 at once (in node-rdkafka it is still one by one, because there is no bulk method there)
- Sending 100
- Sending 1000
- Tried 10000: the run got stuck for several minutes, so I removed it
In fact, the write operation is much faster in node-rdkafka, maybe because of its internal buffer and async delivery, or maybe because I found better options for the node-rdkafka producer and did not find any performance-improving options for kafkajs (except …).

Read

I don't have a pretty benchmark, so I don't want to attach my clumsy code, but the result is that both implementations read messages equally fast (more than 100,000 per second).

Conclusion

I will try to use node-rdkafka for sending messages and kafkajs for receiving, because node-rdkafka has an issue with consuming (in the underlying librdkafka lib).
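[Editor's note: the write-speed gap attributed above to "internal buffer and async delivery" can be illustrated without Kafka at all. `fakeSend` below is a hypothetical stand-in for a producer call with about 1 ms of broker latency; the point is purely the difference between awaiting each send and firing all sends before awaiting once.]

```javascript
// Toy comparison: awaiting every send vs. firing them all, then awaiting
// once. fakeSend simulates a producer round-trip with ~1 ms of latency.
const fakeSend = (msg) =>
  new Promise((resolve) => setTimeout(() => resolve(msg), 1));

async function sequential(n) {
  const start = Date.now();
  for (let i = 0; i < n; i++) await fakeSend(i); // one round-trip per message
  return Date.now() - start;
}

async function concurrent(n) {
  const start = Date.now();
  const pending = [];
  for (let i = 0; i < n; i++) pending.push(fakeSend(i)); // fire everything first
  await Promise.all(pending); // then wait once for all acknowledgements
  return Date.now() - start;
}

async function main() {
  const seq = await sequential(200);
  const conc = await concurrent(200);
  console.log({ seq, conc }); // concurrent should finish far faster
}
main();
```

This is why a benchmark that awaits every produce call measures round-trip latency, not throughput, and why buffered async delivery "wins" so heavily in write tests like the one above.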
Interesting results! I'm quite sure that serialization and deserialization is a huge bottleneck for us; that could be improved relatively easily by utilizing buffers in a less wasteful way. I'm very surprised to see that we read messages equally fast as node-rdkafka. I would suspect that it's actually due to limitations on the broker side, rather than on the consumer side.
Having a bit of experience with producing messages in node-rdkafka: the producing API is fully asynchronous and batched, with no promises or callbacks to indicate whether producing was successful. Although I'm sure there are other bottlenecks in KafkaJS, this fire-and-forget strategy wins you a lot (as described in the reasoning for it here), but it also makes using it way more difficult. The only practical way I've found to get any kind of feedback from the producer is to listen for delivery reports, tagging each message with some unique id and matching on that. It might be interesting to try that "synchronous" kind of producing with node-rdkafka and see how that benchmarks.
Well… producing 10,000 messages and then waiting for delivery callbacks for all of them works like this (node-rdkafka producing became 2 times slower).

Making this kind of benchmark is not really correct, because …
@JaapRood Lately I, like some others, have ended up here looking for an alternative to node-rdkafka and to look through related issues. I just wanted to help out with what I did to make the producer aware of delivery reports:

const generateAcknowledgment = (err, ackMessage) => {
if (typeof ackMessage.opaque === 'function') {
ackMessage.opaque.call(null, err, ackMessage);
}
};
producer.on('delivery-report', generateAcknowledgment);
producer.produce(
topic,
partition,
message,
key,
Date.now(),
(err, deliveryReport) => {
if (err) {
return reject(`Error publishing message and error is ${err}`);
}
if (deliveryReport) {
const stringifiedValue: string = deliveryReport.value.toString();
resp = customResponse(201, {
partition: deliveryReport.partition,
offset: deliveryReport.offset,
key: deliveryReport.key.toString(),
messageType: 'JSON',
value: JSON.parse(stringifiedValue),
});
producer.removeListener(
kafkaEvents.PUBLISH_ACKNOWLEDGMENT,
generateAcknowledgment
);
return resolve(resp);
}
}
);

My producer configuration options include:

dr_msg_cb: true, // Enable delivery reports with the message payload
Hello! Sorry for waking up such an old thread, but I think it can use a few more hints about what to do.

I'm using kafkajs to produce messages with some level of batching (I'm publishing to between 2 and 5 topics at once). I'm using acks=-1, so each message is really persisted before returning.

myLogger.info(`publishing ${msgTotal} msgs to kafka processingTime ${processingTime} encodingTime ${encodingTime} ${JSON.stringify(msgByTopic)}`);
const beforeSend = Date.now();
await kafkaProducer.sendBatch({
topicMessages: topicMessages,
acks: -1, // all insync replicas
//compression: CompressionTypes.GZIP
});
myLogger.info(`published ${msgTotal} to kafka after ${Date.now() - beforeSend} ms`);

processingTime/encodingTime are usually < 2 ms, so in theory each worker should process at least 500 messages/s. But when I stop the workers, let the queue (RabbitMQ) grow a little, and then let the Node process start again, some of the publishes to Kafka take 9 seconds:
I changed the number of Kafka partitions to 40 and set 16 workers, and I had no improvement in throughput.

I'm running on DigitalOcean k8s; each node has 8 CPUs and 16 GB RAM. I have 3 nodes, hosting the 3 Kafka brokers and the Node.js processes. What should I look for? Is the latency related just to kafkajs, or should I have more Kafka brokers? Any help is welcome. Maybe I should try to batch more messages? Or just give up on acks=-1.

Edit: with acks=0 and 4 Node processes, I can reach 1600 raw messages/s (this load publishes on average 10 messages per original message, so that is 16k msg/s from the Kafka perspective), so the previous 500 msg/s is actually 5000/s from the Kafka perspective.
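[Editor's note: one way to bound latency spikes like the 9-second publishes described above is to cap how many messages go into a single sendBatch call, so a backlog burst never becomes one giant acks=-1 request. `chunkMessages` below is a hypothetical helper, not part of kafkajs; only the `topicMessages` shape (`{ topic, messages }`) matches the kafkajs sendBatch input.]

```javascript
// Hypothetical helper: split a kafkajs-style topicMessages array into
// chunks containing at most maxMessages total messages each.
function chunkMessages(topicMessages, maxMessages) {
  const chunks = [[]];
  let count = 0;
  for (const tm of topicMessages) {
    for (const message of tm.messages) {
      if (count === maxMessages) {
        chunks.push([]); // start a new chunk once the cap is reached
        count = 0;
      }
      const current = chunks[chunks.length - 1];
      const last = current[current.length - 1];
      // merge consecutive messages for the same topic into one entry
      if (last && last.topic === tm.topic) last.messages.push(message);
      else current.push({ topic: tm.topic, messages: [message] });
      count++;
    }
  }
  return chunks;
}

// Usage sketch (kafkaProducer assumed to be a kafkajs producer):
// for (const chunk of chunkMessages(topicMessages, 500)) {
//   await kafkaProducer.sendBatch({ topicMessages: chunk, acks: -1 });
// }
```

Bounded chunks keep each acks=-1 round-trip small and predictable; they also make it possible to log per-chunk timings, which helps tell broker-side replication delay apart from client-side queuing.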
I am looking at Kafka client libraries in Node.js. This library seems awesome and lightweight, and my only worry is about performance, so has anyone done any benchmarks yet?

I am curious that it is not a wrapper around native librdkafka; it just uses Node.js 'net' and 'tls' to connect to the Kafka cluster.