-
-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix record batch compression masking #912
Conversation
Fixes #911 We were accidentally using the masking value for the Message type rather than the RecordBatch type (3 instead of 7). This worked for other compression types than zstd, but failed for zstd. The reason is because to get the last 3 bytes of an int16, we need to AND it with 7, not 3. But for certain values, it will work anyway. Using Snappy as an example. Snappy = 2 (010 in binary). The compression masking value for the Message type is 3 (011 in binary). 2 & 3 = 2. 010 011 === 010 The correct masking value = 7. 2 & 7 = 2. 010 111 === 010 Now Zstd is 4 (100 in binary). 4 & 3 = 0. 100 011 === 000 The correct masking value = 7. 4 & 7 = 4. 100 111 === 000
This reverts commit d32ff09.
… into fix-zstandard-compression
First of all, nice find!
I found this curious, so wanted to check. Looking at http://kafka.apache.org/documentation/#messageset it still specifies the last 3 bits for the compression ( Am I missing something? |
I originally thought that we wanted the three last bits also for the legacy message type (which is why I updated it in d32ff09, but once I saw failing tests with INVALID_RECORD errors I figured I was mistaken and reverted. But indeed, you are correct, the specification does say that it should be the same also for the older version. In practice, it won't really make a difference since only compression codecs 1-3 are available with the old message type, all of which fit into 2 bits, but now I want to understand why it doesn't work when I change it. Will investigate further. |
I must have made some mistake last night when things were failing after fixing the codec mask for message set. I tried it again now, and it worked just fine. I even ran the produce tests with 0_11, and while the tests failed due to the expectations being written for newer protocol versions, the actual producing worked. |
Debugging this almost killed me...
Fixes #911
We were accidentally using the wrong masking value for getting the compression codec out of the recordbatch attributes (3 instead of 7). This worked for other compression types than zstd, but failed for zstd.
The reason is because to get the last 3 bytes, we need to AND it with 7 (
111
), not 3 (010
). But for certain values, it will work anyway. Using Snappy as an example. Snappy = 2 (010 in binary). The compression masking value for the Message type is 3 (011 in binary). 2 & 3 = 2.The correct masking value = 7. 2 & 7 = 2.
Now Zstd is 4 (100 in binary). 4 & 3 = 0.
The correct masking value = 7. 4 & 7 = 4.
For the old Message type, the attribute value is int8, rather than int16, which is why the correct value is 3 (
011
), since we want to get the last two bits - not three.