Use lookup table when calculating CRC16 XMODEM #733

HeartSaVioR · 2014-09-12T08:53:53Z

Related to #729.

I borrowed it from redis/lettuce@b921931
Thanks @mp911de!
And thanks @allanwax for raising the idea of a better algorithm!

HeartSaVioR · 2014-09-12T08:56:23Z

@mp911de @allanwax @marcosnils @xetorthio
Please review and comment! Thanks!

@marcosnils @xetorthio
Btw, can we include this PR in 2.6 before the release?

mp911de · 2014-09-12T08:57:58Z

src/main/java/redis/clients/util/JedisClusterCRC16.java

 public class JedisClusterCRC16 {
-    public final static int polynomial = 0x1021; // Represents x^16+x^12+x^5+1
-
+    public static final int LOOKUP_TABLE[] = { 0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7, 0x8108, 0x9129,


Make this private. I forgot to make this constant private in lettuce.

@mp911de Oh, right: otherwise others could modify the table and break it. Thanks!
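
To illustrate the review point above, here is a minimal sketch of a private lookup table. It assumes the table can be regenerated from the polynomial 0x1021 that the removed `polynomial` constant referred to. The class name `Crc16TableSketch` and the accessor `tableEntry` are hypothetical and are not part of Jedis or lettuce.

```java
// Hypothetical sketch; class and method names are illustrative, not Jedis API.
final class Crc16TableSketch {
    private static final int POLYNOMIAL = 0x1021; // x^16 + x^12 + x^5 + 1

    // Private on purpose: a final array reference still has mutable elements,
    // so a public table could be overwritten by any caller.
    private static final int[] LOOKUP_TABLE = buildTable();

    private Crc16TableSketch() {
    }

    // Computes the CRC of each possible byte value bit by bit.
    private static int[] buildTable() {
        int[] table = new int[256];
        for (int i = 0; i < 256; i++) {
            int crc = i << 8; // place the byte in the top half of a 16-bit register
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ POLYNOMIAL : crc << 1;
                crc &= 0xFFFF; // keep the register 16 bits wide
            }
            table[i] = crc;
        }
        return table;
    }

    // Read-only access: hands out one entry, never the array itself.
    static int tableEntry(int index) {
        return LOOKUP_TABLE[index & 0xFF];
    }
}
```

The first generated entries match the values visible in the diff (0x0000, 0x1021, 0x2042, ...), which is a quick way to sanity-check a hard-coded table.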

mp911de · 2014-09-12T08:58:56Z

Looks good.

marcosnils · 2014-09-12T13:41:59Z

👍 Awesome! It should definitely go into 2.6.

marcosnils · 2014-09-12T13:55:18Z

Merged to master and 2.6

HeartSaVioR · 2014-09-12T14:00:16Z

@marcosnils Thanks for merging!

allanwax · 2014-09-12T16:45:57Z

crc = ((crc << 8) ^ LOOKUP_TABLE[((crc >> 8) ^ (b & 0xFF)) & 0xFF]) & 0xFFFF;

You need >>> instead of >>, or sign extension will happen. C and Java
differ in how they shift: Java's >> is an arithmetic shift that copies the sign bit.

Allan Wax
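
A small sketch of the difference described above, assuming a negative `int` as input. The class and method names are hypothetical and only serve the illustration.

```java
// Hypothetical helper showing Java's two right-shift operators on an int.
final class ShiftSketch {
    private ShiftSketch() {
    }

    // Arithmetic shift: the vacated high bits are filled with the sign bit.
    static int arithmeticShift(int value) {
        return value >> 8;
    }

    // Logical shift: the vacated high bits are filled with zeros.
    static int logicalShift(int value) {
        return value >>> 8;
    }

    // The table index only keeps the low byte of the shifted value.
    static int indexByte(int shifted) {
        return shifted & 0xFF;
    }
}
```

For the input -256 (0xFFFFFF00), `arithmeticShift` yields -1 while `logicalShift` yields 0x00FFFFFF, yet `indexByte` returns 0xFF for both, because the bits that differ are masked away.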

allanwax · 2014-09-12T17:00:58Z

You can probably eliminate the final '& 0xFFFF'

crc = (crc << 8) ^ LOOKUP_TABLE[((crc >>> 8) ^ bytes[i]) & 0x00FF];

HeartSaVioR · 2014-09-15T01:57:58Z

@allanwax Could you please provide edge-case examples where the current implementation fails?
I'll apply your suggestion if we find failing cases. Thanks!

HeartSaVioR · 2014-09-15T02:40:03Z

Btw, 16384 is a power of 2, so we can replace the modulo "% 16384" with "& (16384 - 1)" if we really need to optimize as much as possible.
https://www.chrisnewland.com/high-performance-modulo-operation-317
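
A minimal sketch of that replacement, assuming the CRC is a non-negative `int` in the range 0 to 0xFFFF. The class and method names are hypothetical; only the constant 16384 comes from the thread.

```java
// Hypothetical sketch comparing the two ways of reducing a CRC to a slot number.
final class SlotSketch {
    static final int SLOT_COUNT = 16384; // a power of two: 2^14

    private SlotSketch() {
    }

    // Remainder operator, as in "getCRC16(key) % 16384".
    static int slotByModulo(int crc) {
        return crc % SLOT_COUNT;
    }

    // Bit mask, equivalent to the remainder only for non-negative inputs.
    static int slotByMask(int crc) {
        return crc & (SLOT_COUNT - 1); // 0x3FFF
    }
}
```

The two methods agree for every value from 0 to 0xFFFF. They differ for negative inputs (for -1, the remainder is -1 and the mask gives 16383), so the mask is only a drop-in replacement as long as the CRC has already been reduced to 16 bits.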

allanwax · 2014-09-15T15:58:40Z

I can't provide any edge cases where there is a failure, but I've tested the code modification above and I get the same results. It saves the final '& 0xFFFF' in the loop. '&' is a very fast instruction, but millions of them add up. As long as only bitwise operations are involved (no arithmetic), there are no functional issues.

I have no problem leaving the code as is since it does work and is faster than before. Lots of other things to work on.

HeartSaVioR · 2014-09-16T01:03:17Z

@allanwax Actually I didn't mean the final '& 0xFFFF' but the sign extension.
Do you mean that sign extension doesn't affect the result itself but does affect performance?

allanwax · 2014-09-16T16:12:20Z

Sign extension MAY affect the result in the shift since a double shift
'>>' will fill the 'word' on the left with ones if there is a '1' in the
leftmost bit. A triple shift '>>>' will not. BUT since we are dealing
with a multiple of 2 it may not matter. I don't know. The problem is
currently not seen since the last operation is to 'and' it with 0xFFFF.
If that is taken away, the results may change. AGAIN, I don't know
without a great deal of testing. By the way, if that last '&' is left
in, the and on the return statement can be taken away since it was
already done in the loop.

The performance issue is only that we're adding an and at the end of
each partial crc calculation. This only makes a difference if we're
doing this (maybe) millions of times. Again, this is opinion rather
than fact.

On 9/15/2014 6:03 PM, Jungtaek Lim wrote:

@allanwax https://github.com/allanwax I don't mean final '& 0xFFFF'
but sign extension.
(I also think last &0xFFFF can be removed.)
Do you mean sign extension don't affect result itself but affect
performance?

HeartSaVioR · 2014-09-17T02:14:30Z

@allanwax @mp911de
Let me explain the calculation inside the loop.
(Sorry, I didn't have time to look into it deeply before.)

(Revised once: I made a mistake in the calculation with "crc >> 8".)

crc = ((crc << 8) ^ LOOKUP_TABLE[((crc >> 8) ^ (b & 0xFF)) & 0xFF]) & 0xFFFF

With uint16_t in C, (crc << 8) discards the upper byte of the previous CRC value, but our code only shifts it
(from bits 9 ~ 16 to bits 17 ~ 24, counting from 1 at the rightmost bit), so that byte survives and can eventually reach the sign bit.
So we should check carefully whether this can change our result.

First, let's call (crc << 8) (A).

In the calculation of the array index, (crc >> 8) performs sign extension, so the leftmost 8 bits of the result are filled with the sign bit and the other bits shift right; in particular, bits 9 ~ 16 move to bits 1 ~ 8. -- (B)
We XOR (B) with the low byte (bits 1 ~ 8) of b. -- (C)
Then we select the low byte of (C). -- (D)
In (C) and (D) we use only the low byte of the shifted crc and discard the other bytes
(including the leftmost byte affected by sign extension).
This means that sign extension doesn't affect the result.

We look up LOOKUP_TABLE with (D). -- (E)
(E) is a 2-byte value, so when we XOR (A) and (E), bits 17 ~ 24 of (A) survive.
Over further loop iterations they can reach the sign bit.

BUT we already confirmed that sign extension doesn't affect the result, and the core calculation uses only the low 2 bytes of the previous CRC.
So we can defer discarding everything above the low 2 bytes (I mean the last & 0xFFFF in the loop) and select the low 2 bytes once, before returning the final CRC.

tl;dr
(1) Sign extension doesn't affect the calculation, but I think using ">>>" feels safer.
(2) We can get rid of the last & 0xFFFF in the loop, but we must apply & 0xFFFF before returning.

So applying @allanwax's suggestion gives the same result as the current code, and it's faster.

Please correct me if I am wrong.
Thanks!
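
The argument above can be checked with a minimal sketch that runs the three loop variants side by side. It assumes a table generated from the polynomial 0x1021 rather than the hard-coded LOOKUP_TABLE from the PR, and the class and method names are hypothetical.

```java
// Hypothetical sketch comparing three variants of the table-driven CRC16 (XMODEM) loop.
final class Crc16LoopSketch {
    private static final int[] TABLE = new int[256];

    static {
        // Assumption: the table is generated here instead of being hard-coded.
        for (int i = 0; i < 256; i++) {
            int crc = i << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
            TABLE[i] = crc;
        }
    }

    private Crc16LoopSketch() {
    }

    // Variant discussed in the PR: signed shift, mask on every iteration.
    static int maskedEveryIteration(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc = ((crc << 8) ^ TABLE[((crc >> 8) ^ (b & 0xFF)) & 0xFF]) & 0xFFFF;
        }
        return crc;
    }

    // Suggested variant: unsigned shift, mask once before returning.
    static int maskedOnce(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc = (crc << 8) ^ TABLE[((crc >>> 8) ^ (b & 0xFF)) & 0xFF];
        }
        return crc & 0xFFFF;
    }

    // Same as maskedOnce but with the signed shift: high bits may reach the
    // sign bit, yet the index mask (& 0xFF) removes any sign-extended bits.
    static int maskedOnceSignedShift(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc = (crc << 8) ^ TABLE[((crc >> 8) ^ (b & 0xFF)) & 0xFF];
        }
        return crc & 0xFFFF;
    }
}
```

All three variants return 0x31C3 for the ASCII bytes of "123456789", the standard check value for CRC16 XMODEM. This is consistent with points (1) and (2) above, though agreement on sample inputs is a check, not a proof.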

mp911de · 2014-09-17T05:55:50Z

Yep, some 5-10%

allanwax · 2014-09-17T15:17:27Z

Sounds good to me

allanwax · 2014-09-17T15:24:07Z

Also, the compiler may or may not be smart enough to transform 'return getCRC16(key) % 16384' into 'return getCRC16(key) & (16384 - 1) // 0x3FFF'

If not, the above change may help speed things up by a tiny amount.

HeartSaVioR · 2014-09-18T00:30:46Z

I ran a benchmark to see whether it helps.
(I applied the modulo optimization and @allanwax's suggestions: unsigned shift, and removing the last & 0xFFFF.)
My dev environment is i7 2.3G, 16G DDR3, OSX 10.9.4, JDK 1.6.0_65.
I didn't turn off other processes, so the numbers may not be accurate.

Before

100883 ops
100716 ops
99987 ops
101439 ops
99553 ops

After

104535 ops
105828 ops
105433 ops
102387 ops
105005 ops

It seems to help!
I'll apply it to master by hand. Thanks all!

allanwax · 2014-10-17T16:00:53Z

'Q a new' ???

mp911de · 2014-10-18T10:42:25Z

Actually, I did not know that I sent this comment. So just ignore it.

Use lookup table when calculating CRC16 XMODEM

ad10f91

* I borrowed it from redis/lettuce@b921931

HeartSaVioR added the wait for more reviews label Sep 12, 2014

mp911de reviewed Sep 12, 2014
Hide lookup table to prevent broken

10307ec

marcosnils added ready to merge and removed wait for more reviews labels Sep 12, 2014

marcosnils added this to the 2.6.0 milestone Sep 12, 2014

marcosnils added a commit that referenced this pull request Sep 12, 2014

Merge pull request #733 from HeartSaVioR/crc16-lookup-table

2f18da6

Use lookup table when calculating CRC16 XMODEM

marcosnils merged commit 2f18da6 into redis:master Sep 12, 2014

HeartSaVioR removed the ready to merge label Sep 12, 2014

marcosnils mentioned this pull request Sep 12, 2014

Consider using faster CRC16 algorithm #729

Closed

HeartSaVioR mentioned this pull request Sep 18, 2014

Optimize CRC16 calculation (with optimization of slot decision) #741

Merged
