-
-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new crc32 tests and optimized crc32 for POWER #637
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #637 +/- ##
===========================================
+ Coverage 77.72% 78.32% +0.59%
===========================================
Files 82 84 +2
Lines 8463 8723 +260
Branches 1381 1392 +11
===========================================
+ Hits 6578 6832 +254
- Misses 1354 1356 +2
- Partials 531 535 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
test/crc32_test.c
Outdated
|
||
typedef struct { | ||
int line; | ||
uLong crc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use uint32_t
instead of uLong
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
char* buf; | ||
int len; | ||
uLong expect; | ||
} crc32_test; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use standard fixed int types in this struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually ifdef'd this struct's definition to match the different crc32 declarations for the compat and zlib-ng APIs. When building the default API, it will use fixed int types.
This way we was use the proper argument types and avoid potential problems with "hidden" type casting.
functable.c
Outdated
@@ -59,6 +59,8 @@ ZLIB_INTERNAL uint32_t crc32_generic(uint32_t, const unsigned char *, uint64_t); | |||
|
|||
#ifdef __ARM_FEATURE_CRC32 | |||
extern uint32_t crc32_acle(uint32_t, const unsigned char *, uint64_t); | |||
#elif defined(POWER8) && (defined(__powerpc64__) || defined(__PPC64__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use #elif defined(POWER8_CRC32)
instead.
@@ -223,6 +225,7 @@ ZLIB_INTERNAL uint32_t crc32_stub(uint32_t crc, const unsigned char *buf, uint64 | |||
"crc32_z takes size_t but internally we have a uint64_t len"); | |||
/* return a function pointer for optimized arches here after a capability test */ | |||
|
|||
functable.crc32 = &crc32_generic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change.
This code is GPL or Apache? Can you get it relicensed under zlib license? |
arch/power/crc32_power8.c
Outdated
* This program is free software; you can redistribute it and/or | ||
* modify it under the terms of either: | ||
* | ||
* a) the GNU General Public License as published by the Free Software |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPL license..
@nmoinvaz today the code is dual licensed (GPL / Apache). I'm checking to see what we can do in that regard. |
All code should be licensed using zlib license, this avoids any viral effects of licenses like GPL. |
Incorporated suggestions from @nmoinvaz and rebased against latest |
@mscastanho were you able to resolve the licensing issues? |
@nmoinvaz No updates on this so far =/ I believe this won't make to the first release. |
This needs a rebase. |
@Dead2 Agreed. I'll split the tests into a separate PR and submit it. |
I believe everything is now ready for review. |
test/crc32_test.c
Outdated
typedef unsigned long crc32_type; | ||
typedef unsigned char buf_type; | ||
typedef unsigned int len_type; | ||
# define CRC_FMT "%08lX" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All optimized crc32 functions should use only uint32_t
as CRC32 variable type, the conversion for ZLIB_COMPAT API is done in crc32_z()
and crc32()
inside crc32.c. All length parameters should always be size_t
. Conversion from unsigned int
is also done in crc32.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_crc_table may have produced unsigned long return type in zlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the test to just call crc32_z
as @mtl1979 suggested, these typedefs become irrelevant, as most types are the same between compat and zlib-ng. But the crc
still has different types on each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the test is statically linked to zlib-ng, it can access functable directly, and such there will be no differences... For dynamic linking, simple cast when calling crc32_z
is enough. I pointed out to how it is done in crc32.c, because essentially what is needed is same backwards. It would be a lot easier, if there would be "undocumented API" to access functable when building against shared version of zlib-ng.
arch/power/clang_workaround.h
Outdated
@@ -0,0 +1,98 @@ | |||
/* Helper functions to work around issues with clang builtins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a fallback_builtins.h
in the main directory. I think it should be moved to arch/x86/fallback_builtins.h. And then this file should be renamed arch/power/fallback_builtins.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed clang_workarounds.h
to fallback_builtins.h
. I haven't moved the x86 one though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only have style changes as I don't have the hardware. It is good that there are crc32 tests to prove it out.
arch/power/crc32_power8.c
Outdated
|
||
unsigned long len = (unsigned long) _len; | ||
|
||
if (p == (const unsigned char *) 0x0) return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate line for return 0
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
arch/power/crc32_power8.c
Outdated
offset += 16; | ||
|
||
v0 = vec_xor(v0, va0); | ||
va0 = __builtin_crypto_vpmsumd ((__vector unsigned long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random functions with space after function name above and below this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I fixed all of those now.
arch/power/fallback_builtins.h
Outdated
} | ||
|
||
#if BYTE_ORDER == BIG_ENDIAN | ||
#define __builtin_unpack_vector_0(a) __builtin_unpack_vector ((a), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space after function name here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
arch/power/crc32_power8.c
Outdated
offset += 128; | ||
|
||
v0 = (__vector unsigned long long)__builtin_crypto_vpmsumw ( | ||
(__vector unsigned int)vdata0,(__vector unsigned int)v0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No space between function arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
* left 32 bits so it occupies the least significant bits in the | ||
* bit reflected domain. | ||
*/ | ||
v0 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mscastanho what do you think about using typedef?
_vector unsigned long long = vector_uint64_t?
_vector unsigned char = vector_uint8_t?
Or just using fixed types if possible?
Anyways, just an idea because I noticed there is a lot of casting with long types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nmoinvaz I don't think there are vector types with "fixed type names" available for Power (something like vector uint64_t
). But the types available do have fixed sizes (e.g. vector unsigned long long
is a vector with 2 doublewords).
I think I'm neutral regarding using typedefs. It would certainly save some chars and make the code neater, but would add an extra indirection for the "reader" to find out which underlying type is actually being used.
If you think using typedefs would be better, I can make the change.
@nmoinvaz I'm so sorry. I completely missed your last comments 🤦🏼 I think I addressed all of them, and also rebased against current develop to fix some merge conflicts. |
LGTM. |
Gentle ping =) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs some cleanup.
arch/power/crc32_power8.c
Outdated
#if defined (__clang__) | ||
#include "fallback_builtins.h" | ||
#else | ||
#define __builtin_pack_vector(a, b) __builtin_pack_vector_int128 ((a), (b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary... parameters are passed in same order without any additional cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was needed because an alternative __builtin_pack_vector
was defined in fallback_builtins
for clang, so we needed a definition for the GCC case. But clang now has __builtin_pack_vector_int128
, so we can use that directly instead, so this was removed.
da528e5
to
4ac05e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just small style fixes ;)
4ac05e1
to
dc5e9c7
Compare
This needs a minor rebase now, sorry 😉 |
Reorganize statements inside crc32_stub() to match more closely the format used for other function stubs in functable.c.
dc5e9c7
to
7b15e92
Compare
@Dead2 Done =) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still quite a lot of casts that make the code hard to read, but I don't think keeping them blocks merging as trying to avoid casts by using inline utility functions makes debug build larger.
arch/power/crc32_power8.c
Outdated
p = (char *)p + 128; | ||
|
||
/* | ||
* main loop. We modulo schedule it such that it takes three |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is hard to read. Possibly missing punctuation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was inherited from previous iterations of this code. I replaced it with simpler high-level comment.
This commit adds an optimized version of the crc32 function based on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/ . The code has been relicensed to the zlib license. This is the C implementation created by Rogerio Alves <rogealve@br.ibm.com> It makes use of vector instructions to speed up CRC32 algorithm. Decompression times were improved by +30% on tests. Based on Daniel Black's work for the original zlib (madler/zlib#478).
7b15e92
to
d082606
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
Hi,
This PR adds an optimized version of the crc32 function for POWER processors. This is a port of madler/zlib#478 to zlib-ng. The algorithm is mostly intact, as originally developed by @racardoso and added to the original zlib by @grooverdan.
While preparing this PR, I also found some issues in the way
$ARCH
was detected on the CI using CMake, so I added a fix.This PR also adds new crc32 tests that can be used by all targets.
I'm marking this as a Draft PR for a few reasons:
crc32_power8
onfunctable.c
as it's not quite clear to me whatif (sizeof(void *) == sizeof(ptrdiff_t))
is checking.To measure the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.
The results below show compression and decompression throughput in MB/s using gzip wrapper, for all compression levels: