Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSE2/3/4/AVX2 optimisations cause crc32 error in pigz #979

Closed
cris-b opened this issue May 29, 2021 · 39 comments
Closed

SSE2/3/4/AVX2 optimisations cause crc32 error in pigz #979

cris-b opened this issue May 29, 2021 · 39 comments
Labels

Comments

@cris-b
Copy link

cris-b commented May 29, 2021

This is on an Openmandriva 4.3 system building libz-ng and pigz from source (the system defaults to clang but have tried with gcc without change).

Recently we switched to zlib-ng as the system zlib, but this for me caused a problem with pigz (which we use as the default gzip).

I have to build zlib-ng with the following options (or WITH_OPTIM=OFF, but these are narrowed down):

-DWITH_SSSE3=OFF -DWITH_SSE4=OFF -DWITH_SSE2=OFF -DWITH_AVX2=OFF

or I get

pigz: skipping: pigz-2.5.tar.gz: corrupted -- crc32 mismatch

when uncompressing certain binaries. (the pigz official source tar is one).

I've tried rebuilding zlib-ng with all compiler optimisations off (pgo/lto etc) and similar with pigz. The only thing I can do to get it to work is with the above options off.

This on a haswell era Intel I7.

It should also be noted that uncompressing the archive using libarchive (via the libarchive tar implementation) which uses zlib-ng does not give an error.

Have tried pigz 2.5/2.6, zlib-ng 2.0.2/2.0.3

@gdevenyi
Copy link
Contributor

Probably a good idea to provide configure and build logs for both zlib-ng and pigz as well as the versions and the actual complete output which contains the error.

@cris-b
Copy link
Author

cris-b commented May 29, 2021

pigz.build.log
zlib-ng.build.log

output:

[crispin@yossarian pigz]$ wget http://www.zlib.net/pigz/pigz-2.6.tar.gz
--2021-05-29 17:57:25-- http://www.zlib.net/pigz/pigz-2.6.tar.gz
Resolving www.zlib.net... 85.187.148.2
Connecting to www.zlib.net|85.187.148.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106840 (104K) [application/x-gzip]
Saving to: ‘pigz-2.6.tar.gz’

pigz-2.6.tar.gz 100%[=======================================================================================================================>] 104.34K 170KB/s in 0.6s

2021-05-29 17:57:26 (170 KB/s) - ‘pigz-2.6.tar.gz’ saved [106840/106840]

[crispin@yossarian pigz]$ sha1sum pigz-2.6.tar.gz
da4683552b2e55ff133bb041bd3a76f218adc01e pigz-2.6.tar.gz
[crispin@yossarian pigz]$ LD_PRELOAD=/home/crispin/rpmbuild/BUILD/zlib-ng-2.0.3/build/libz.so /home/crispin/rpmbuild/BUILD/pigz-2.6/pigz -dc pigz-2.6.tar.gz > o
pigz: skipping: pigz-2.6.tar.gz: corrupted -- crc32 mismatch

@cris-b
Copy link
Author

cris-b commented May 29, 2021

cpu:

vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
stepping : 3
microcode : 0x27
cpu MHz : 2882.443
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple shadow_vmcs
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 6186.27
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual

@cris-b
Copy link
Author

cris-b commented May 29, 2021

@tpgxyz FYI

@gdevenyi
Copy link
Contributor

[crispin@yossarian pigz]$ LD_PRELOAD=/home/crispin/rpmbuild/BUILD/zlib-ng-2.0.3/build/libz.so /home/crispin/rpmbuild/BUILD/pigz-2.6/pigz -dc pigz-2.6.tar.gz > o

This may be the root cause. Can you please build pigz against zlib-ng to see if it reproduces.

It may also be worthwhile reporting what version of regular libz the pigz is built against.

@tpgxyz
Copy link

tpgxyz commented May 29, 2021

@gdevenyi pigz and others are compiled against zlib-ng. As @cris-b mentioned OpenMandriva ditched zlib in favour of zlib-ng.

Same situation on aarch64

[tpg@omv-rockpro64 OpenMandriva]$ pigz -dc pigz-2.6.tar.gz > o
pigz: skipping: pigz-2.6.tar.gz: corrupted -- crc32 mismatch
[tpg@omv-rockpro64 OpenMandriva]$ uname -a
Linux omv-rockpro64 5.12.7-server-clang-1omv4050 #1 SMP Sat May 29 09:33:49 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

@nmoinvaz
Copy link
Member

Bisected to 4bc5bd6

@nmoinvaz
Copy link
Member

Looks like my bisect skills are some what lacking. I tracked it further back to 9f6bacb. So the inflate chunk stuff is causing the problem. That would make since if it was happening on multiple architectures.

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 1, 2021

There is some problem with chunkcopy and chunkcopy_safe. In functable, if I hack it to use chunkcopy_c and chunkcopy_safe_c and leave all the rest of the chunk functions to use intrinsic versions then it works.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 1, 2021

I can't really see why it can fail on multiple architectures... Only reasons it should fail is if it copies too few bytes or overwrites something that it isn't supposed to... There is so many asserts that one of them should get triggered when using debug build.

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 1, 2021

It may have to do with the fact that copychunk_safe is not really safe. There are two instances where it will write beyond the safe limit. One is when (safe - out) >= sizeof(chunk_t) && len > (safe - out) where len should really be len + (sizeof(chunk_t) - (len % sizeof(chunk_t)). The other is when (safe - out) < sizeof(chunk_t) && len > (safe - out).

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 1, 2021

Those two cases seem to be quite easy to test, I presume?

I thought safe = out + len - 1, which is same as safe - out = len - 1, which means len can never be more than safe - out + 1.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 1, 2021

if ((safe - out) < (ptrdiff_t)sizeof(chunk_t)) {

I'm starting to think this line has off-by-one error... Should be

    if ((safe - out + 1) < (ptrdiff_t)sizeof(chunk_t)) {

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 2, 2021

I noticed that the first difference is with a code that has a dist of 32765 which is awfully close to wsize of 32768.

image

image

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 3, 2021

I think the issue is that inflateBack uses a single memory allocation (32k) for sliding window. If we are using the sliding window both as a look backward buffer and as a output buffer (similar to deflate?), we can only write the exact number of bytes we are expecting to. Otherwise, when looking back, it will point to bad data. When using the intrinsics we are writing more than allowed and so offsets back into the window that are near the window boundary are returning junk data when writing codes.

I think we haven't run into this before because matches at the end of the window are not common. But also because
inflateBack() uses window param as an output buffer while inflate() appears to have a separate window memory allocation (see put = state->window in infback.c and updatewindow function in inflate.c).

This can be corrected in inflate_fast around here:

zlib-ng/inffast.c

Lines 268 to 271 in 025b27e

if (dist >= len || dist >= state->chunksize)
out = functable.chunkcopy(out, out - dist, len);
else
out = functable.chunkmemset(out, dist, len);

  • We could replace that functable.chunkcopy call with functable.chunkcopy_safe along with a modified chunkcopy_safe that only writes the number of bytes specified in len. And probably also use functable.chunkmemset_safe instead of functable.chunkmemset.
  • Replacing all that logic with functable.chunkmemset_safe also seems to work (assuming it is truly safe).
    • inflate.c also seems to use these safe functions:

      zlib-ng/inflate.c

      Lines 956 to 962 in 025b27e

      put = functable.chunkcopy_safe(put, from, copy, put + left);
      } else { /* copy from output */
      copy = state->length;
      if (copy > left)
      copy = left;
      put = functable.chunkmemset_safe(put, state->offset, copy, left);
  • Commenting out the inflate fast code in inflate.c also corrects the problem:
    • zlib-ng/inflate.c

      Lines 800 to 807 in 025b27e

      if (have >= INFLATE_FAST_MIN_HAVE && left >= INFLATE_FAST_MIN_LEFT) {
      RESTORE();
      zng_inflate_fast(strm, out);
      LOAD();
      if (state->mode == TYPE)
      state->back = -1;
      break;
      }
  • The original block of code in madler/zlib looks is just a memory copy:
  • It may be possible to also detect when same buffer is being used. (strm->next_out >= state->window && strm->next_out + INFLATE_FAST_MIN_LEFT <= state->window + wsize)

Good news is this should only affect inflateBack.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 3, 2021

@nmoinvaz I can amend my patch with the change to use _safe() versions... It should already make sure no more than len bytes are written...

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 3, 2021

I'm not sure if that is desired because the safe versions are slower.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 3, 2021

@nmoinvaz Crashing or data corruption is always worse choice than being slow... In most hardware implementations small moves using vector registers are slower than equivalent move using general-purpose registers.

Despite that, I made sure the vector instructions are used for lengths that are equal or longer than chunk_t.

@nmoinvaz
Copy link
Member

This should be resolved in 2.0.4 now.

@tpgxyz
Copy link

tpgxyz commented Jun 14, 2021

Thank you guys for fixing this issue in 2.0.4 version.

@berolinux
Copy link
Contributor

Thanks for the fixes in 2.0.4, but they don't seem to be 100% complete just yet.

$ wget https://storage.googleapis.com/golang/go1.16.5.src.tar.gz
$ pigz -d go1.16.5.src.tar.gz 
pigz: skipping: go1.16.5.src.tar.gz: corrupted -- crc32 mismatch
$ rpm -qf /usr/lib64/libz.so.1
lib64z1-2.0.4-1.znver1

@tpgxyz
Copy link

tpgxyz commented Jun 15, 2021

@nmoinvaz @mtl1979 Guys please take a look again. Looks like this issue is partially resolved.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 15, 2021

To diagnose the problem, we need binary diff of the corrupted output and the expected output... That way we know if the root cause is same or different.

@Dead2 Dead2 reopened this Jun 15, 2021
@Dead2 Dead2 added the bug label Jun 15, 2021
@tpgxyz
Copy link

tpgxyz commented Jun 15, 2021

@berolinux can you please provide it ?

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 15, 2021

Do you have any smaller file that this occurs on? I would like to include it in repository for CI but that one is too big.

@tpgxyz
Copy link

tpgxyz commented Jun 16, 2021

@nmoinvaz i did check on pigz

[tpg@omv-rockpro64 OpenMandriva]$ LC_ALL=C wget http://www.zlib.net/pigz/pigz-2.6.tar.gz
--2021-06-16 16:17:08--  http://www.zlib.net/pigz/pigz-2.6.tar.gz
Resolving www.zlib.net... 85.187.148.2
Connecting to www.zlib.net|85.187.148.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106840 (104K) [application/x-gzip]
Saving to: 'pigz-2.6.tar.gz'

pigz-2.6.tar.gz                                      100%[====================================================================================================================>] 104.34K   275KB/s    in 0.4s    

2021-06-16 16:17:08 (275 KB/s) - 'pigz-2.6.tar.gz' saved [106840/106840]

[tpg@omv-rockpro64 OpenMandriva]$ LC_ALL=C pigz -d pigz-2.6.tar.gz 
pigz: skipping: pigz-2.6.tar.gz: corrupted -- crc32 mismatch
[tpg@omv-rockpro64 OpenMandriva]$ rpm -qf /usr/lib64/libz.so.1
lib64z1-2.0.4-1.aarch64

with gzip it works

[tpg@omv-rockpro64 OpenMandriva]$ gzip-st -d pigz-2.6.tar.gz
[tpg@omv-rockpro64 OpenMandriva]$ rpm -qf $(command -v gzip-st)
gzip-1.10-4.aarch64

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 18, 2021

I'm pretty sure I found a design flaw in CHUNKCOPY_SAFE() that might be relevant...

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 19, 2021

I think I had mentioned something like that here: #982 (comment). My example used len % sizeof(chunk_t) to do the partial copy and when that is done if len == 0 then it returns out. Otherwise it will do the rest of chunkcopy since len is multiple of chunk_t.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 19, 2021

@nmoinvaz I rewrote whole chunkcopy_safe() so it uses memcpy() with constant lengths instead (32,16,8,4,2,1)... At least on AArch64 it can optimize the code better that way... If I tried to use loadchunk()/storechunk() it used four reads/writes, but using memcpy it used single read/write (LDP/STP). I assume on other platforms it also uses widest possible register type when length parameter is constant.

@nmoinvaz
Copy link
Member

nmoinvaz commented Jun 19, 2021

Ah ok I see you are using while loop instead.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 19, 2021

@nmoinvaz Using while loop was logical as it simplifies the code a lot... At least gcc can optimize while loops according to Compiler Explorer...

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 22, 2021

@berolinux Retest with latest develop tree, we're trying to get all the fixes included in 2.0.5

@tpgxyz
Copy link

tpgxyz commented Jun 22, 2021

@mtl1979 i'll do some tests too.

@Dead2
Copy link
Member

Dead2 commented Jun 22, 2021

Please test Stable branch instead, that is the one containing only the fixes for the planned 2.0.5 release.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 22, 2021

@Dead2 Like I said elsewhere, testing against stable branch works few moments after you have updated it... However this close to the release, it might get force-pushed quite many times.

@Dead2
Copy link
Member

Dead2 commented Jun 22, 2021

@mtl1979 Stable is where 2.0.5 is being staged, and has been for days. Testing develop might encounter different bugs or fixes since they are diverging, invalidating the testing when it regards to Stable.
I have not force-pushed the Stable branch so far, and I don't really intend to do so, so I don't think we need to worry too much about that problem.

@tpgxyz
Copy link

tpgxyz commented Jun 23, 2021

Today i'm going to start to verify that Stable branch, as we have some issues after updating openssl-to 3.0.0-beta1 version.

tpgxyz added a commit to OpenMandrivaAssociation/zlib-ng that referenced this issue Jun 23, 2021
@tpgxyz
Copy link

tpgxyz commented Jun 23, 2021

@Dead2 @mtl1979 I just run test on aarch64 from stable tip on 3a66bc7

[tpg@omv-rockpro64 OpenMandriva]$ LC_ALL=C wget http://www.zlib.net/pigz/pigz-2.6.tar.gz
--2021-06-23 23:25:09--  http://www.zlib.net/pigz/pigz-2.6.tar.gz
Resolving www.zlib.net... 85.187.148.2
Connecting to www.zlib.net|85.187.148.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106840 (104K) [application/x-gzip]
Saving to: 'pigz-2.6.tar.gz'

pigz-2.6.tar.gz                                      100%[====================================================================================================================>] 104.34K   220KB/s    in 0.5s    

2021-06-23 23:25:11 (220 KB/s) - 'pigz-2.6.tar.gz' saved [106840/106840]

[tpg@omv-rockpro64 OpenMandriva]$ LC_ALL=C pigz -d pigz-2.6.tar.gz 
[tpg@omv-rockpro64 OpenMandriva]$ file pigz-2.6.tar 
pigz-2.6.tar: POSIX tar archive

@mtl1979
Copy link
Collaborator

mtl1979 commented Jun 23, 2021

@tpgxyz That's good if it works now...

@nmoinvaz nmoinvaz closed this as completed Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants