Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use lookup table when calculating CRC16 XMODEM #733

Merged
merged 2 commits into from
Sep 12, 2014

Conversation

HeartSaVioR
Copy link
Contributor

Related to #729.

I borrowed it from redis/lettuce@b921931
Thanks @mp911de!
And thanks @allanwax for considering better algorithm!

@HeartSaVioR
Copy link
Contributor Author

@mp911de @allanwax @marcosnils @xetorthio
Please review and comment! Thanks!

@marcosnils @xetorthio
Btw, can we include this PR to 2.6 before releasing?

public class JedisClusterCRC16 {
public final static int polynomial = 0x1021; // Represents x^16+x^12+x^5+1

public static final int LOOKUP_TABLE[] = { 0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7, 0x8108, 0x9129,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this private. I've missed to make this constant private in lettuce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mp911de Oh, others can break table. Thanks!

@mp911de
Copy link
Contributor

mp911de commented Sep 12, 2014

Looks good.

@marcosnils
Copy link
Contributor

👍 awesome!. It should definitely go into 2.6

@marcosnils marcosnils added this to the 2.6.0 milestone Sep 12, 2014
marcosnils added a commit that referenced this pull request Sep 12, 2014
Use lookup table when calculating CRC16 XMODEM
@marcosnils marcosnils merged commit 2f18da6 into redis:master Sep 12, 2014
@marcosnils
Copy link
Contributor

Merged to master and 2.6

@HeartSaVioR
Copy link
Contributor Author

@marcosnils Thanks for merging!

@allanwax
Copy link

crc = ((crc << 8) ^ LOOKUP_TABLE[((crc >> 8) ^ (b & 0xFF)) & 0xFF]) &
0xFFFF;

you need >>> instead of >> or sign extension will happen. 'C' and java
differ in how they shift.

Allan Wax

On 9/12/2014 7:00 AM, Jungtaek Lim wrote:

@marcosnils https://github.com/marcosnils Thanks for merging!


Reply to this email directly or view it on GitHub
#733 (comment).

@allanwax
Copy link

You can probably eliminate the final '& 0xFFFF'

crc = (crc << 8) ^ LOOKUP_TABLE[((crc >>> 8) ^ bytes[i]) & 0x00FF];

On 9/12/2014 1:59 AM, Mark Paluch wrote:

Looks good.


Reply to this email directly or view it on GitHub
#733 (comment).

@HeartSaVioR
Copy link
Contributor Author

@allanwax Would you please provide "edge-case" examples of current implementation?
I'll apply your suggestion if we found fail cases. Thanks!

@HeartSaVioR
Copy link
Contributor Author

Btw, 16384 is a power of 2, so we can apply optimization to modulo, "% 16384" to "& (16384 - 1)", when we really need to apply optimizations as much as possible.
https://www.chrisnewland.com/high-performance-modulo-operation-317

@allanwax
Copy link

I can't provide any edge cases where there is a failure but I've tested the code modification above and I get the same results. It saves the final '& 0xFFFF' in the loop. Now '&' is a very fast instruction but millions of them will add up. As long as there are no mathematical operations (i.e. only boolean operations) then there are no functional issues.

I have no problem leaving the code as is since it does work and is faster than before. Lots of other things to work on.

@HeartSaVioR
Copy link
Contributor Author

@allanwax Actually I don't mean final '& 0xFFFF' but sign extension.
Do you mean sign extension don't affect result itself but affect performance?

@allanwax
Copy link

Sign extension MAY affect the result in the shift since a double shift
'>>' will fill the 'word' on the left with ones if there is a '1' in the
leftmost bit. A triple shift '>>>' will not. BUT since we are dealing
with a multiple of 2 it may not matter. I don't know. The problem is
currently not seen since the last operation is to 'and' it with 0xFFFF.
If that is taken away, the results may change. AGAIN, I don't know
without a great deal of testing. By the way, if that last '&' is left
in, the and on the return statement can be taken away since it was
already done in the loop.

The performance issue is only that we're adding an and at the end of
each partial crc calculation. This only makes a difference if we're
doing this (maybe) millions of times. Again, this is opinion rather
than fact.

On 9/15/2014 6:03 PM, Jungtaek Lim wrote:

@allanwax https://github.com/allanwax I don't mean final '& 0xFFFF'
but sign extension.
(I also think last &0xFFFF can be removed.)
Do you mean sign extension don't affect result itself but affect
performance?


Reply to this email directly or view it on GitHub
#733 (comment).

@HeartSaVioR
Copy link
Contributor Author

@allanwax @mp911de
Let me explain about calculation inside a loop.
(Sorry I didn’t have time to look deeply.)

(revised : I made a mistake during calculation with "crc >> 8", revised one time)

crc = ((crc << 8) ^ LOOKUP_TABLE[((crc >> 8) ^ (b & 0xFF)) & 0xFF]) & 0xFFFF

Using uint16_t in C, (crc << 8) discards previous CRC value’s upper 1 byte, but our code just shifts byte
(from 9 ~ 16 bit to 17 ~ 24 bit, starts from 1 bit, count from rightmost) so it can be survived, and it can be reached to sign bit.
So we should look deeply about it whether it can change our result or not.

At first, we names (crc << 8) to (A).

Inside of calculation of array index, (crc >> 8) occurs sign extension so crc’s leftmost 8 bit fills with sign bit and other bits shifts, especially 9 ~ 16 bit shifts to 1 ~ 8 bit. -- (B)
We apply XOR to (B) and lower 1 byte (1 ~ 8 bit) of b. -- (C)
And we select lower 1 byte from (C). -- (D).
We used crc only lower 1 byte from (C) and (D), and discarded other bytes.
(including leftmost 1 byte affected by sign extension)
It means that sign extension doesn't affect result.

We reference LOOKUP_TABLE with (D), -- (E).
(E) should be 2 bytes values, so when applying XOR to (A) and (E), 17 ~ 24 bit of (A) survives.
And with loop it can be reached to sign bit.

BUT, we already confirmed that sign extension doesn’t affect result, and core calculation uses only 2 bytes of previous CRC.
So we can delegate discarding upper than 2 bytes (I means last & 0xFFFF in loop) and finally select 2 bytes before returning last CRC calculation result.

tl;dr.
(1) sign extension doesn’t affect whole calculation, but I think using ">>>" can let us feel more safely.
(2) we can get rid of last & 0xFFFF in loop but should apply & 0xFFFF before returning.

So, applying @allanwax suggestion makes same result to current, and it's faster.

Please correct me if I am wrong.
Thanks!

@mp911de
Copy link
Contributor

mp911de commented Sep 17, 2014

Yep, some 5-10%

@allanwax
Copy link

Sounds good to me

@allanwax
Copy link

Also, the compiler may or may not be smart enough to transform 'return getCRC16(key) % 16384' into 'return getCRC16(key) & (16384 - 1) // 0x3FFF'

If not, the above change may help speed things up by a tiny amount.

@HeartSaVioR
Copy link
Contributor Author

I've ran a benchmark to see it helps.
(I've applied modulo optimization, @allanwax's suggestion - unsigned shift, remove last & 0xFFFF)
My dev environment is i7 2.3G, 16G DDR3, OSX 10.9.4, JDK 1.6.0_65.
I didn't turn off other processes so it may not accurate.

Before

100883 ops
100716 ops
99987 ops
101439 ops
99553 ops

After

104535 ops
105828 ops
105433 ops
102387 ops
105005 ops

It seems to help!
I'll apply it to master by hand. Thanks all!

@allanwax
Copy link

'Q a new' ???

@mp911de
Copy link
Contributor

mp911de commented Oct 18, 2014

Actually, I did not know that I sent this comment. So just ignore it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants