pigz compiled with zlib-ng generates errors with level=1 #536

Closed
neurolabusc opened this issue Feb 19, 2020 · 32 comments

@neurolabusc

@reddydp discovered that pigz compiled with zlib-ng generates errors with compression level 1. This is specific to zlib-ng: versions of pigz compiled against the default, Intel, or CloudFlare zlib do not show this error. Since zlib-ng and the Intel zlib use a similar strategy for level-1 compression, I wonder if this is related to a fixed Intel zlib issue.

We have replicated the issue on Linux as well as MacOS. To see the issue run the following commands:

 git clone https://github.com/neurolabusc/pigz-bench
 cd pigz-bench
 ./1compile.sh 
 ./4verify.sh

The issue is only seen with pigzNG at compression level 1:

Segmentation fault: 11  $cmd
pigzNG	1	1	33
pigzNG: skipping: /Users/chris/src/pigz-bench/corpus/asl16.nii.gz: corrupted -- invalid deflate data (invalid stored block lengths)
@nmoinvaz
Member

nmoinvaz commented Feb 20, 2020

It happens due to the deflateParams call, which changes the level mid-stream. It starts out at level 6 and then switches to level 1.

I believe the max window size for deflate level 1 is 8192 (the size of quick_dist_codes). This is set as (1 << 13) here:

#ifdef X86_QUICK_STRATEGY

When the call to deflateParams happens, w_size is never updated. This can allow dist in deflate_quick to exceed the size of the quick_dist_codes array. See the check here:

if (dist > 0 && (dist-1) < (s->w_size - 1)) {

I think that since w_size depends on the compression level, we should set it during the deflateParams call. Also, perhaps configuration_level should have a max window size setting for each algorithm, and we should use that.

Removing the check for s->last_flush != 2 in deflateParams fixes the problem. It seems the change was made in 11413c0#diff-8940271ef2146523af486ca4408361da and appears not to be part of Intel zlib.

Perhaps when changing the level, if the window size changes, it should force the block to flush.
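
A minimal sketch of the first idea, assuming the clamp happens when deflateParams changes the level (the struct, the helper name, and its placement are illustrative, not the actual zlib-ng code):

/* Illustrative stand-in for zlib-ng's deflate state; only the field
   relevant here is shown. */
typedef struct {
    unsigned int w_size;   /* LZ77 window size, derived from windowBits */
} deflate_state;

#define QUICK_MAX_WINDOW (1 << 13)  /* 8192 == size of quick_dist_codes */

/* Hypothetical helper, called when deflateParams changes the level, so
   deflate_quick can never compute a dist that indexes past the end of
   quick_dist_codes[]. */
static void clamp_window_for_quick(deflate_state *s, int new_level) {
    if (new_level == 1 && s->w_size > QUICK_MAX_WINDOW)
        s->w_size = QUICK_MAX_WINDOW;
}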

@mtl1979
Collaborator

mtl1979 commented Feb 20, 2020

If deflate_quick can't safely handle a window size larger than 8192, shouldn't we limit dist - 1 to the smaller of s->w_size - 1 and 8191 in

if (dist > 0 && (dist-1) < (s->w_size - 1)) {

...? That would allow changing the level back and forth without changing the code too much.
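
In code, that bounded check might look like this (a sketch of the suggestion; the MIN macro and the helper wrapper are illustrative, not a committed change):

#include <stdbool.h>

#define QUICK_DIST_LIMIT 8191  /* quick_dist_codes has 8192 entries */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* A match distance is only usable by deflate_quick if it fits both the
   current window and the quick_dist_codes table. */
static bool quick_dist_ok(unsigned int dist, unsigned int w_size) {
    return dist > 0 && (dist - 1) < MIN(w_size - 1, QUICK_DIST_LIMIT);
}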

@nmoinvaz
Member

Yes, that is possible; however, there is a problem in pigz if we don't flush the block before switching to level 1.

pigz: ../pigz.c:2449: void single_compress(int): Assertion `strm->avail_in == 0' failed.

@nmoinvaz
Member

nmoinvaz commented Feb 20, 2020

That problem goes away if we change deflate_quick like so, adding an s->strm->avail_in == 0 check:

[screenshot of the proposed change to deflate_quick]
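
The screenshot has not survived in this copy; from the surrounding discussion, the change amounts to roughly the following (a reconstruction of deflate_quick's early exit, not the exact diff):

/* Only bail out for more input when the caller genuinely has none left
   (the added s->strm->avail_in == 0 condition); previously the early
   exit could fire while input was still pending. */
if (s->lookahead < MIN_LOOKAHEAD && s->strm->avail_in == 0) {
    if (flush != Z_FINISH)
        return need_more;
    /* otherwise fall through and finish the stream */
}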

The issue is a combination of madler's changes and deflate_quick, which I don't think Intel or Cloudflare have, at least.

@mtl1979
Collaborator

mtl1979 commented Feb 20, 2020

I'm not completely sure that adding s->strm->avail_in == 0 has no negative side effects...

@nmoinvaz
Member

I did a burn test against switchlevels using various inputs, and without s->strm->avail_in == 0 I get this:

Debug\switchlevels.exe 6 88933 1 195840 2 45761
deflate() did not consume 79973 bytes of input
Debug\switchlevels.exe 6 19196 1 123183 2 72888
deflate() did not consume 3115 bytes of input
Debug\switchlevels.exe 6 61854 1 222485 2 18297
deflate() did not consume 104115 bytes of input
Debug\switchlevels.exe 6 74302 1 94616 2 57517
Debug\switchlevels.exe 6 105968 1 227353 2 13465
deflate() did not consume 103945 bytes of input
Debug\switchlevels.exe 6 23044 1 113788 2 48375

That is because deflate_quick exits early even though there is more available input. If I add that check, those warnings go away. It could also be the fault of switchlevels doing this kind of check.

@mtl1979
Collaborator

mtl1979 commented Feb 21, 2020

Obviously when the check is there, it will consume more input than without the check, but deflate* is never supposed to consume more input than necessary to finish one block of output. When Z_FINISH is used, it doesn't wait for enough data to create a full block.

@nmoinvaz
Member

nmoinvaz commented Feb 21, 2020

In my test case, it needs to flush the pending buffer before it emits an end block. Is that expected? That would mean the block is larger than s->pending_buf_size, and that a single call to deflate() would not emit a complete block because it returns need_more.

Perhaps it should be:

if (s->strm->avail_in == 0 && flush != Z_FINISH) {
    /* No more input and the caller is not finishing: close the current
       static block and ask for more input. */
    static_emit_end_block(s, 0);
    return need_more;
}

@mtl1979
Collaborator

mtl1979 commented Feb 21, 2020

Because you can call deflate() with just 1 byte of input, one call doesn't have to emit at least one full block, but it shouldn't emit more than one full block, because that could overflow internal structures.
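
To make that contract concrete, here is a minimal, self-contained use of the public zlib API that feeds deflate() one byte per call; most calls emit no output because deflate() just buffers until it can form a block (input and buffer sizes are arbitrary):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    unsigned char in[] = "hello, hello, hello!";
    unsigned char out[256];
    z_stream strm;
    memset(&strm, 0, sizeof(strm));  /* zalloc/zfree/opaque = Z_NULL */
    if (deflateInit(&strm, 6) != Z_OK)
        return 1;

    strm.next_out = out;
    strm.avail_out = sizeof(out);
    for (size_t i = 0; i < sizeof(in) - 1; i++) {
        strm.next_in = in + i;
        strm.avail_in = 1;
        deflate(&strm, Z_NO_FLUSH);  /* typically consumes the byte, emits nothing */
    }
    deflate(&strm, Z_FINISH);        /* forces the final (partial) block out */
    printf("%lu bytes in -> %lu bytes out\n", strm.total_in, strm.total_out);
    deflateEnd(&strm);
    return 0;
}

Compile with -lz; with this little input, the Z_NO_FLUSH calls typically write nothing and only the Z_FINISH call produces output.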

@nmoinvaz
Member

nmoinvaz commented Feb 21, 2020

I see. In that case I don't see a problem with s->strm->avail_in == 0 allowing it to continue processing input after a flush, so long as it hasn't output a complete block, which doesn't appear to be the case in that particular loop.

@mtl1979
Collaborator

mtl1979 commented Feb 21, 2020

Maybe you should make a pull request with all the modifications so it's easier for everyone to review...

@nmoinvaz
Member

nmoinvaz commented Feb 21, 2020

The reason I suggest calling static_emit_end_block during the s->strm->avail_in == 0 loop is that without it I ran across deflateParams returning Z_BUF_ERROR here:

return Z_BUF_ERROR;

This happens when deflateParams is called without a prior call to deflateReset. The pigz test app referenced in this issue seems to call it, while switchlevels does not. deflateReset sets last_flush == -2, which prevents deflateParams from flushing the last block.

But when deflateParams does flush the last block with Z_BLOCK, deflate_quick() never completely finishes the block because of the early exit with need_more. At the least, the end block should be emitted on early exit when flush is Z_SYNC_FLUSH || Z_FULL_FLUSH || Z_BLOCK.

Adding the s->strm->avail_in == 0 check only allows it to continue processing, but there is still an edge case where avail_in reaches 0 and a flush is requested at the same time, so we need to emit the end block even with the check (see the sketch below).
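
For reference, the two call patterns being compared look like this against the public API (a schematic of the pigz and switchlevels behavior described above, not their actual source):

#include <zlib.h>

/* pigz-style: reset the stream before retuning it. deflateReset puts
   last_flush back in its initial state, so deflateParams does not try
   to flush a block that was never started. */
void retune_with_reset(z_stream *strm, int level) {
    deflateReset(strm);
    deflateParams(strm, level, Z_DEFAULT_STRATEGY);
}

/* switchlevels-style: change the level mid-stream. Here deflateParams
   must flush the pending block itself, which is the path where
   deflate_quick's early exit left the block unfinished. */
void retune_midstream(z_stream *strm, int level) {
    deflateParams(strm, level, Z_DEFAULT_STRATEGY);
}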

I will try to submit a PR in the coming days.

@nmoinvaz
Member

Here are my test cases for now: nmoinvaz@fb0324b

@neurolabusc
Author

Not sure if this is related to issue 284 (which was also with level 1, but not with a parallel compressor). I note a couple of the pull requests remain open.

Here is a minimal adaptation of my Python scripts to illustrate the issue. These scripts compile pigz using zlib-ng and validate the compression/decompression.

a_compile.py
c_decompress.py

The crucial line is for lvl in range(1, 3): if compression level 1 is tested, pigz-ng will generate an error (tested just now on x86-64 MacOS).

python_script.zip

@nmoinvaz
Member

The changes have been merged.

@neurolabusc
Author

@nmoinvaz the error persists in the develop (default) branch of zlib-ng. You can see this by running pigz-bench-python. Use the script a_compile.py to compile pigz using zlib-ng. Next, set up the scripts to test level 1 (which was disabled due to this bug):

For c_decompress.py, edit line 172
for lvl in range(2, 10):
to read
for lvl in range(1, 10):

For f_speed_size_decompress.py, edit line 81
for lvl in range(2, max_level + 1):
to read
for lvl in range(1, max_level + 1):

When you run the modified scripts, they will generate errors:

pigzng: skipping: ./temp/pigzng1_asl16.nii.gz: corrupted -- invalid deflate data (invalid distance code)
...
pigzng: skipping: ./temp/pigzng1_asl32.nii.gz: corrupted -- crc32 mismatch

It seems the issue persists with files created by pigz at level 1 when pigz is compiled with zlib-ng.

@nmoinvaz nmoinvaz reopened this May 3, 2020
@nmoinvaz
Member

nmoinvaz commented May 3, 2020

I think this may still need #549 but I haven’t tested against it myself.

@nmoinvaz
Member

@neurolabusc can you check again against the latest develop branch? Thanks.

@nmoinvaz
Member

nmoinvaz commented May 14, 2020

It looks like it may still be occurring.

@nmoinvaz
Member

@neurolabusc this should be resolved now.

@neurolabusc
Author

neurolabusc commented May 26, 2020

@nmoinvaz not yet fixed on the default (develop) branch of zlib-ng. I tested on an Intel i5 using MacOS and an AMD Ryzen 3900x using Ubuntu 20.04. To replicate my test:

  1. Download pigz-bench-python
  2. Run a_compile.py to compile various versions of pigz.
  3. Use a text editor to change a line in c_decompress.py to read for lvl in range(1, 10): instead of for lvl in range(2, 10):
  4. Run c_decompress.py
  5. Notice that gzip and pigz (regardless of zlib variant) are unable to decompress the level 1 compressed files created by the zlib-ng variant of pigz:
gzip: ./temp/pigzng1_asl16.nii.gz: uncompress failed
gzip	9279	221.97
pigzSystem: skipping: ./temp/pigzng1_asl32.nii.gz: corrupted -- invalid deflate data (invalid distance code)
pigzSystem: skipping: ./temp/pigzng1_asl16.nii.gz: corrupted -- invalid deflate data (invalid distance code)
pigzSystem	7981	258.09
pigzCloudflare: skipping: ./temp/pigzng1_asl32.nii.gz: corrupted -- invalid deflate data (invalid distance code)
pigzCloudflare: skipping: ./temp/pigzng1_asl16.nii.gz: corrupted -- invalid deflate data (invalid distance code)
pigzCloudflare	7807	263.84
pigzng: skipping: ./temp/pigzng1_asl32.nii.gz: corrupted -- invalid deflate data (invalid distance code)

On the other hand, the latest zlib-ng does appear to have caught up with Cloudflare compression on both systems (Ryzen 3900x shown):

[chart: Ryzen 3900x compression benchmark]

If one runs the unedited version of c_decompress.py (which only tests levels 2..9), decompression is fine, and zlib-ng is ~20% faster than the other pigz variants and twice as fast as gzip.

@nmoinvaz
Member

nmoinvaz commented Jun 1, 2020

@neurolabusc can you confirm whether or not level 1 deflate_quick works when the -p 1 command-line argument is used? It sets pigz to single-threaded. For me there is no issue when one thread is used. It has been a battle to get an environment set up to debug pigz on Windows, but that is where I am now.

@neurolabusc
Author

neurolabusc commented Jun 1, 2020

@nmoinvaz at least some errors are still present in the default (develop) branch of zlib-ng, even if pigz is only run in single-threaded mode. Here is how I tested this on my Linux system:

  1. Download the benchmark:
    git clone https://github.com/neurolabusc/pigz-bench-python
  2. Open f_speed_size_decompress.py with a text editor:
  • Insert ' -p 1 ' into line 376 and optionally remove the quiet-mode decompression flag (-q) so it reads:
    exes.append({'exe': exe, 'uncompress': ' -f -k -d ', 'compress': ' -p 1 -q -f -k -', 'max_level': 9, 'ext': '.gz' })
  • Edit line 271 to test from level 1 instead of 2, so it reads:
    for lvl in range(1, max_level+1):
  • Edit line 81 to test from level 1 instead of 2, so it reads:
    for lvl in range(1, max_level + 1):
  3. Compile the pigz variants:
    a_compile.py
  4. Test the pigz variants:
    f_speed_size_decompress.py
  5. Notice gzip is unhappy with the level 1 compressed file created by the zlib-ng variant of pigz:
    gzip: ./temp/pigzng1_asl32.nii.gz: invalid compressed data--format violated

Likewise, the terminal reveals the zlib-ng variant of pigz is unable to extract the file it created:

../exe/pigzng -k -d pigzng1_asl32.nii.gz
pigzng: skipping: pigzng1_asl32.nii.gz: corrupted -- invalid deflate data (invalid distance code)
pigzng: abort: internal threads error

@nmoinvaz
Member

nmoinvaz commented Jun 2, 2020

Sometimes setting ZLIB_VERNUM to 0x1240 seems to make the problem disappear.

@nmoinvaz
Member

nmoinvaz commented Jun 2, 2020

I also noticed that single_compress passes Z_NO_FLUSH at the end of the ZLIB_VERNUM >= 0x1260 block, while parallel_compress passes Z_BLOCK. Perhaps a bug?

@nmoinvaz
Member

nmoinvaz commented Jun 2, 2020

It appears that with Z_BLOCK it shouldn't write a block unless there is s->strm->avail_in. This is the reason for the s->sym_next check at the bottom of deflate_fast (sketched below). @neurolabusc, can you test against my tests/def-v1 branch and see if the error happens?
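
A sketch of that guard (sym_next and FLUSH_BLOCK follow zlib's deflate.c conventions; this shows the shape of the fix, not the committed code):

/* At the bottom of a deflate_fast-style loop: when stopping at a block
   boundary (e.g. Z_BLOCK), only emit a block if there are buffered
   symbols to put in it. Flushing with s->sym_next == 0 is what wrote
   the empty blocks that corrupted the level-1 streams above. */
if (s->sym_next)
    FLUSH_BLOCK(s, 0);  /* 0 = not the final block */
return block_done;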

@neurolabusc
Author

@nmoinvaz this passes my tests for both neuroimaging data and the Silesia corpus (shown below; pigznm is your tests/def-v1 build).

DecompressMethod	ms	MB/s
gzip	25305	226.13
pigznm	19474	293.84
pigzCloudflare	24176	236.69

[chart: silesia_speed_size]

@neurolabusc
Author

neurolabusc commented Jun 2, 2020

@nmoinvaz while your branch worked on my MacOS 4-core, 8-thread MacBook, I was not so lucky with my Ubuntu 20.04 Ryzen 3900X. It handles my neuroimaging corpus without issue, but reliably generates a segmentation fault when asked to compress the x-ray file from the Silesia corpus at level 1. This can be seen from the command line, outside the script:

./pigznm -1 -k -f ./x-ray
Segmentation fault (core dumped)
./pigznm -1 -k -f -p 1 ./x-ray
munmap_chunk(): invalid pointer
Aborted (core dumped)
./pigznm -1 -k -f -p 2 ./x-ray
Segmentation fault (core dumped)

For my MacOS system, all values of -p work (including default) except -p 1:

./pigznm -1 -k -f ./x-ray
./pigznm -1 -k -f -p 1 ./x-ray
Segmentation fault: 11
./pigznm -1 -k -f -p 2 ./x-ray

CloudFlare and System zlib work in all these situations.

@nmoinvaz
Member

nmoinvaz commented Jun 3, 2020

@neurolabusc, can you please try again with tests/def-v1? I have made some additional changes. I don't really have the ability to track down some of these hardware-specific segfaults.

@nmoinvaz
Member

nmoinvaz commented Jun 3, 2020

I don't experience a crash with the Silesia x-ray file with the latest tests/def-v1, even with -p 1.

@neurolabusc
Author

Your recent commit works on all my computers with both the Silesia corpus and my neuroimaging data. If this can be merged as a pull request, I think this will close the issue. Thanks!

Below are the results for the 3900X

DecompressMethod	ms	MB/s
gzip	11045	186.49
pigzCloudflare	6378	322.92
pigzSystem	6388	322.41
pigzNM	5276	390.39

Silesia Compression:
[chart: silesia_speed_size]
Neuroimaging Compression:
[chart: corpus_speed_size]

@nmoinvaz nmoinvaz added the bug label Jun 3, 2020
@nmoinvaz
Member

Should be fixed now in latest develop.

neurolabusc added a commit to neurolabusc/pigz-bench-python that referenced this issue Jun 19, 2020
nmoinvaz added a commit to nmoinvaz/zlib-ng that referenced this issue Sep 19, 2020
…ginal because we are using switchlevels and only compressing certain parts of the original.
nmoinvaz added a commit to nmoinvaz/zlib-ng that referenced this issue Sep 19, 2020
nmoinvaz added a commit to nmoinvaz/zlib-ng that referenced this issue Sep 20, 2020
nmoinvaz added a commit to nmoinvaz/zlib-ng that referenced this issue Sep 23, 2020
nmoinvaz added a commit to nmoinvaz/zlib-ng that referenced this issue Sep 23, 2020