-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRC64 perf improvements from Redis patches #350
Conversation
* 53-73% faster on Xeon 2670 v0 @ 2.6ghz * 2-2.5x faster on Core i3 8130U @ 2.2 ghz * 1.6-2.46 bytes/cycle on i3 8130U * likely >2x faster than crcspeed on newer CPUs with more resources than a 2012-era Xeon 2670 * crc64 combine function runs in <50 nanoseconds typical with vector + cache optimizations (~8 *microseconds* without vector optimizations, ~80 *microseconds without cache, the combination is extra effective) * still single-threaded * valkey-server test crc64 --help (requires `make distclean && make SERVER_TEST=yes`) Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
ec8fdf2
to
4cfe03d
Compare
What do we use crc64 for? RDB files? I want to understand the practical benefit. |
RDB and slot migration. (Since the dump + restore includes the CRC checksumming). I mentioned before improving crc16 would be a bigger win for cluster mode. |
OK. It would be impossible to change to SHA1 or something else for RDB, DUMP/RESTORE, right? Because of backwards compatibility. I think we should merge this then. It's more complex code, but as long as it's well tested and isolated in its own files, that's no problem to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superficial review. I didn't proof read the bit fiddling, but I'm not intending to.
When producing or loading a large RDB dump, how much of the CPU time is CRC64 computation and how much is other stuff?
If the CRC computation is haft of the CPU time, it's good, but if the CRC computation is only 1%, then this optimization is useless IMO.
I found your blog post. 😃 https://www.dr-josiah.com/2023/04/crc64-in-redis-is-getting-even-faster.html |
Let me rephrase your question; why is doing CRC64 important for Redis? Before loading a snapshot from disk, or received over the wire from a master server, the loading server will take a pass over the whole RDB file, and call out roughly once every 2 megabytes by default (see When creating RDB snapshots, any data segment written gets a call to CRC64 for up to 2 megabytes by default (also uses That said, with additional small changes, you can update on write the same 2 meg segment all the time (instead of at maximum), and the effects of this (along with eliminating fork overhead) can be seen in 20 seconds of reduced startup / snapshot time for a 10 gig resident dataset from the last slide here: https://www.slideshare.net/secret/sLTnWqcPV7G0HK , of which ~8-10 seconds can be attributed to CRC64 performance improvements. |
Sounds good. I'm convinced. :) FYI: You need Signed-off-by also for commits done using suggestions in the web gui. You can add that using |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #350 +/- ##
============================================
+ Coverage 68.37% 68.48% +0.10%
============================================
Files 108 109 +1
Lines 61555 61669 +114
============================================
+ Hits 42091 42235 +144
+ Misses 19464 19434 -30
|
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
6c58141
to
302cb2a
Compare
Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Assuming tests pass, I'm happy with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Btw, I just want to test it on another arm machine. I'll do that tomorrow and merge it if the performance makes sense. |
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, builds and some quick tests show it's faster than crcspeed.
Improve the performance of crc64 for large batches by processing large number of bytes in parallel and combining the results. ## Performance * 53-73% faster on Xeon 2670 v0 @ 2.6ghz * 2-2.5x faster on Core i3 8130U @ 2.2 ghz * 1.6-2.46 bytes/cycle on i3 8130U * likely >2x faster than crcspeed on newer CPUs with more resources than a 2012-era Xeon 2670 * crc64 combine function runs in <50 nanoseconds typical with vector + cache optimizations (~8 *microseconds* without vector optimizations, ~80 *microseconds without cache, the combination is extra effective) * still single-threaded * valkey-server test crc64 --help (requires `make distclean && make SERVER_TEST=yes`) --------- Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Improve the performance of crc64 for large batches by processing large number of bytes in parallel and combining the results.
Performance
make distclean && make SERVER_TEST=yes
)