Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make murmur2 implementation compatible with Java version #358

Merged
merged 5 commits into from
May 8, 2019
Merged

Conversation

Nevon
Copy link
Collaborator

@Nevon Nevon commented May 8, 2019

Fixes #357

There were two main issues. The first one being that Java does integer division, so length4 would be 2.25 in JS vs 2 in Java, which meant that we would do an additional iteration in the loop compared to the Java version.

The second, and far more annoying issue, is that Java's int and Javascripts Number type behave quite differently, so we'd get wildly different results during multiplication of large numbers. I changed the implementation to use Long, which solved the problem. I'm not very happy about the part where I convert back to an int to do a zero fill right shift, but I found that Long's k.shiftRightUnsigned(r) behaves differently to k >>> r (Java), but maybe I misunderstood something.

I haven't updated the existing murmur2 tests, since they will now all fail. This feels like a super breaking change, so I'm not sure how we should roll it out. I'm opening this PR for the sake of discussion.

@pimpelsang

@Nevon Nevon added the bug label May 8, 2019
@Nevon Nevon requested a review from tulios May 8, 2019 09:23
@Nevon
Copy link
Collaborator Author

Nevon commented May 8, 2019

My concern with changing this in the default partitioner is that new messages produced with this version will end up in different partitions, which may break some applications that (quite reasonably) expect all messages for a given key to be assigned to the same partition.

One option could be to expose this as a separate partitioner that @pimpelsang and anyone else requiring compatibility with the Java partitioner can use, and then in a later major version change switch to that being the default.

EDIT: Github thinks I'm from the future. This comment was actually posted right after creating the PR, long before there were any comments. 🤔

@tulios
Copy link
Owner

tulios commented May 8, 2019

I agree @Nevon, we can't change the current murmur implementation without breaking compatibility. I think we should create a murmur2Java implementation and a defaultJava partitioner that uses the new algorithm and then document that default doesn't follow the same logic as the Java partitioner, but it's easy to switch to defaultJava. When I read the issue I was sure that it was a number overflow, Long got into the project quite late. I think the murmur implementation was already in place and never got reviewed.

@Nevon
Copy link
Collaborator Author

Nevon commented May 8, 2019

Sounds good. I'll go for that solution. Will update the PR.

@Nevon
Copy link
Collaborator Author

Nevon commented May 8, 2019

Not sure why Github thinks I live in the future (the comment after this was actually posted before Tulio's...)

I've exposed both the default partitioner and the Java compatible one now, and documented how to use them and why you might want to:

const { Partitioners } = require('kafkajs')
kafka.producer({ createPartitioner: Partitioners.JavaCompatiblePartitioner })

@Nevon Nevon merged commit 0444a2a into master May 8, 2019
@Nevon Nevon deleted the repro-357 branch May 8, 2019 16:09
@t-d-d t-d-d mentioned this pull request Feb 14, 2021
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KafkaJS and java producer partitioning returning different results
2 participants