SSE2/3/4/AVX2 optimisations cause crc32 error in pigz #979
Comments
Probably a good idea to provide configure and build logs for both zlib-ng and pigz, as well as the versions and the complete output containing the error.
pigz.build.log output:
[crispin@yossarian pigz]$ wget http://www.zlib.net/pigz/pigz-2.6.tar.gz
pigz-2.6.tar.gz 100%[==========>] 104.34K 170KB/s in 0.6s
2021-05-29 17:57:26 (170 KB/s) - ‘pigz-2.6.tar.gz’ saved [106840/106840]
[crispin@yossarian pigz]$ sha1sum pigz-2.6.tar.gz
cpu: vendor_id: GenuineIntel
@tpgxyz FYI
This may be the root cause. Can you please build pigz against zlib-ng to see if it reproduces? It may also be worth reporting which version of regular libz pigz is built against.
@gdevenyi pigz and others are compiled against zlib-ng. As @cris-b mentioned, OpenMandriva dropped zlib in favour of zlib-ng. The same situation occurs on aarch64.
Bisected to 4bc5bd6
Looks like my bisect skills are somewhat lacking. I tracked it further back to 9f6bacb, so the inflate chunk copy code is causing the problem. That would make sense if it is happening on multiple architectures.
There is some problem with
I can't really see why it would fail on multiple architectures. The only reasons it should fail are if it copies too few bytes or overwrites something it isn't supposed to. There are so many asserts that one of them should get triggered in a debug build.
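For reference, an LZ77 match copy has to reproduce the result of a plain forward byte-by-byte loop, including when the match distance is shorter than the match length and the source overlaps bytes written by the same copy. The sketch below is a hypothetical reference helper (`match_copy_ref` is not a zlib-ng function); a chunked or vectorised copy should produce output identical to it, so comparing the two is one way to catch the "too few bytes" or "overwrote something" failures described in the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference match copy (not zlib-ng code): append `len`
 * bytes taken from `dist` bytes behind the current output position.
 * When dist < len, the source overlaps bytes written by this same call,
 * so the copy must run forward one byte at a time; memcpy() would be
 * undefined here, and memmove() would copy the stale bytes instead of
 * repeating the pattern. Returns the new end of the output. */
static unsigned char *match_copy_ref(unsigned char *out, size_t dist, size_t len) {
    assert(dist > 0);                    /* distance 0 is never valid */
    const unsigned char *from = out - dist;
    while (len--)
        *out++ = *from++;                /* may read bytes written above */
    return out;
}
```

For example, with "ab" already in the window, a match of distance 2 and length 5 extends the output to "abababa".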
It may have to do with the fact that
Those two cases seem to be quite easy to test, I presume? I thought
Line 40 in 59306ef
I'm starting to think this line has an off-by-one error. It should be
I think the issue is that … I think we haven't run into this before because matches at the end of the window are not common, but also because … This can be corrected in Lines 268 to 271 in 025b27e
The good news is this should only affect
@nmoinvaz I can amend my patch with the change to use the _safe() versions. It should already make sure no more than
I'm not sure that is desired, because the safe versions are slower.
@nmoinvaz Crashing or data corruption is always a worse choice than being slow. On most hardware, small moves using vector registers are slower than the equivalent moves using general-purpose registers. Despite that, I made sure the vector instructions are used for lengths equal to or longer than the size of chunk_t.
This should now be resolved in 2.0.4.
Thank you guys for fixing this issue in version 2.0.4.
Thanks for the fixes in 2.0.4, but they don't seem to be 100% complete just yet.
To diagnose the problem, we need a binary diff of the corrupted output and the expected output. That way we will know whether the root cause is the same or different.
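A minimal sketch of such a comparison, assuming both the corrupted and the expected output are available as files. The helper `first_difference` is hypothetical; it reports the offset of the first mismatching byte and how many byte positions differ in total, similar in spirit to running `cmp` on the two files. The first offset and the size of the damaged region are the kind of detail that helps tell whether two reports share a root cause.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: compare two streams byte by byte. Returns the
 * offset of the first differing byte (or -1 if the streams are identical)
 * and stores the total number of differing positions in *ndiff. If one
 * stream is shorter, every byte past its end counts as a difference. */
static long first_difference(FILE *a, FILE *b, long *ndiff) {
    assert(a != NULL && b != NULL && ndiff != NULL);
    long offset = 0, first = -1;
    *ndiff = 0;
    for (;;) {
        int ca = fgetc(a), cb = fgetc(b);   /* EOF (-1) never equals a byte */
        if (ca == EOF && cb == EOF)
            break;
        if (ca != cb) {
            if (first < 0)
                first = offset;             /* remember the first mismatch */
            ++*ndiff;
        }
        ++offset;
    }
    return first;
}
```

Open both files in binary mode (`"rb"`) before passing them in, so no newline translation hides or introduces differences.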
@berolinux can you please provide it?
Do you have a smaller file that this occurs on? I would like to include it in the repository for CI, but that one is too big.
@nmoinvaz I did check with pigz.
With gzip it works.
I'm pretty sure I found a design flaw in CHUNKCOPY_SAFE() that might be relevant.
I think I mentioned something like that here: #982 (comment). My example used
@nmoinvaz I rewrote the whole chunkcopy_safe() so it uses memcpy() with constant lengths instead (32, 16, 8, 4, 2, 1). At least on AArch64 the compiler can optimize the code better that way: when I used loadchunk()/storechunk() it generated four reads/writes, but with memcpy() it generated a single read/write (LDP/STP). I assume other platforms also use the widest possible register type when the length parameter is constant.
Ah, OK, I see you are using a while loop instead.
@nmoinvaz Using a while loop was logical, as it simplifies the code a lot. According to Compiler Explorer, gcc at least can optimize the while loops well.
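The approach described in these comments can be sketched roughly as follows. This is a hypothetical illustration, not zlib-ng's actual chunkcopy_safe(): the name `chunkcopy_safe_sketch`, the `COPY_WHILE` macro, and the convention that `safe` points at the last writable byte are all assumptions. The length is clamped first so the copy can never run past `safe`, and each memcpy() has a compile-time constant length, which is what lets a compiler lower it to a single wide load/store. It also assumes the source and destination do not overlap, because memcpy() on overlapping regions is undefined behaviour; overlapping matches still need byte-by-byte semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro: copy n bytes at a time with a
 * constant-length memcpy() while at least n bytes remain. */
#define COPY_WHILE(n)               \
    while (len >= (n)) {            \
        memcpy(out, from, (n));     \
        out += (n);                 \
        from += (n);                \
        len -= (n);                 \
    }

/* Hypothetical sketch, not zlib-ng code: copy up to `len` bytes from
 * `from` to `out` without writing past `safe`, which is assumed to point
 * at the LAST writable byte. The regions must not overlap. */
static uint8_t *chunkcopy_safe_sketch(uint8_t *out, const uint8_t *from,
                                      size_t len, const uint8_t *safe) {
    assert(out <= safe);
    size_t safelen = (size_t)(safe - out) + 1;  /* bytes in [out, safe] */
    if (len > safelen)
        len = safelen;                          /* clamp before copying */
    /* Widest copies first, then progressively narrower ones for the tail. */
    COPY_WHILE(32)
    COPY_WHILE(16)
    COPY_WHILE(8)
    COPY_WHILE(4)
    COPY_WHILE(2)
    COPY_WHILE(1)
    return out;
}

#undef COPY_WHILE
```

For example, a 63-byte copy with room to spare is done as 32 + 16 + 8 + 4 + 2 + 1, while a 50-byte request with only 10 writable bytes up to `safe` is clamped to 10 bytes (8 + 2).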
@berolinux Please retest with the latest develop tree; we're trying to get all the fixes included in 2.0.5.
@mtl1979 I'll do some tests too.
Please test
@Dead2 Like I said elsewhere, testing against
@mtl1979
Today I'm going to start verifying that
@Dead2 @mtl1979 I just ran a test on aarch64 from
@tpgxyz That's good if it works now.
This is on an OpenMandriva 4.3 system, building libz-ng and pigz from source (the system defaults to clang, but I have also tried gcc with no change).
Recently we switched to zlib-ng as the system zlib, but for me this caused a problem with pigz (which we use as the default gzip).
I have to build zlib-ng with the following options (or with WITH_OPTIM=OFF, but these are the narrowed-down set):
-DWITH_SSSE3=OFF -DWITH_SSE4=OFF -DWITH_SSE2=OFF -DWITH_AVX2=OFF
otherwise I get
pigz: skipping: pigz-2.5.tar.gz: corrupted -- crc32 mismatch
when decompressing certain files (the official pigz source tarball is one).
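For context, the error means the CRC-32 computed over the decompressed data did not match the CRC-32 stored in the gzip trailer. Per RFC 1952, a gzip member ends with CRC32 followed by ISIZE (uncompressed size modulo 2^32), each stored as 4 bytes little-endian, so a mismatch indicates that either the decompressed bytes or the checksum computation are wrong. The sketch below is a slow, bit-at-a-time CRC-32 plus a trailer reader that can serve as an independent reference; both function names are hypothetical, and the trailer helper assumes a single-member .gz file held in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip
 * stores in its trailer. Slow but easy to audit; an optimised crc32()
 * (table or SIMD based) must produce the same values. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)   /* XOR the polynomial if low bit set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read the stored CRC32 from the last 8 bytes (CRC32, then ISIZE) of a
 * single-member gzip buffer; both fields are little-endian. */
static uint32_t gzip_trailer_crc(const unsigned char *gz, size_t len) {
    assert(len >= 8);
    const unsigned char *t = gz + len - 8;
    return (uint32_t)t[0] | (uint32_t)t[1] << 8 |
           (uint32_t)t[2] << 16 | (uint32_t)t[3] << 24;
}
```

The standard check value for this CRC is 0xCBF43926 over the ASCII string "123456789"; if the reference CRC of the decompressed output also disagrees with the trailer, the decompressed bytes themselves are wrong.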
I've tried rebuilding zlib-ng with all compiler optimisations off (PGO/LTO etc.), and likewise pigz. The only way I can get it to work is with the above options turned off.
This is on a Haswell-era Intel Core i7.
It should also be noted that decompressing the archive with libarchive (via the libarchive tar implementation), which also uses zlib-ng, does not give an error.
I have tried pigz 2.5/2.6 and zlib-ng 2.0.2/2.0.3.