x86_64 deflate_quick fails on chromium-benchmark fireworks.jpeg #284

sebpop · 2019-01-17T19:21:37Z

In the past I have seen some errors on x86_64 with the snappy chromium-benchmark at compression level 1. Check that these errors are fixed.

sebpop · 2019-01-17T19:41:18Z

I still see it failing at level 1 with

fireworks.jpeg                           : compress failed (0)

sebpop · 2019-01-17T19:42:51Z

It passes on aarch64 and fails on x86_64 which points to some bug in intel's patches for deflate_quick.

sebpop · 2019-01-17T19:53:35Z

When replacing deflate_quick with deflate_fast it passes.

mtl1979 · 2019-01-17T19:58:54Z

It would be good to know why deflate_quick fails... That pretty much requires enabling debug output and possibly adding more sanity checks.

sebpop · 2019-01-17T20:28:27Z

The benchmark uses an estimate of the compress size to resize a buffer before decompressing.
In the case of fireworks.jpeg with deflate_quick we grow the output by 5.2%:

test/bench/snappy/fireworks.jpeg         : GZIP -1: [b 1M] bytes 123093 -> 129548 105.2% comp  34.9 ( 34.9) MB/s uncomp 139.7 (139.7) MB/s

The benchmark passes if we set the estimated compression size to double of the input:

 z_size_t ZEXPORT PREFIX(compressBound)(z_size_t sourceLen) {
-    return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13;
+    return 2 * sourceLen;

When printing the original estimated size, we see far less than 5%:

 zng_compressBound (sourceLen=123093)
(gdb) finish
Value returned is $7 = 123143

The fix is to special case intel's deflate_quick to have a different function to estimate its output buffer size.

sebpop · 2019-01-17T21:56:42Z

There is at least one more bug that we see with lcet10.txt and others that returns -3 which is different than the jpeg which was returning 0 instead of 1.

mtl1979 · 2019-01-18T09:22:36Z

Maybe we should adjust one of the shifts to make it return slightly larger value... That way we don't completely butcher the logic inside compressBound...

    return sourceLen + (sourceLen >> 4) + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13;

This should allow 6.25~6.28% increase.

bmrzycki · 2019-01-18T21:31:16Z

I did some further investigation and it's not entirely clear to me who's at fault. Here's my patch to alter the method that the chromium zlib benchmark compresses data that passes all tests and requires no changes to zlib-ng.

--- a/zlib/ng_snappy/src/test/bench/chromium_zlib_bench.cc
+++ b/zlib/ng_snappy/src/test/bench/chromium_zlib_bench.cc
@@ -81,7 +81,10 @@ Data read_file_data_or_exit(const char* name) {
 }
 
 size_t zlib_estimate_compressed_size(size_t input_size) {
-  return PREFIX(compressBound)(input_size);
+  // Be conservative. Some algorithms like Intel's deflate_quick
+  // (x86 level 1) may make a file larger than the original if
+  // already compressed (eg. jpeg).
+  return PREFIX(compressBound)(input_size * 3);
 }
 
 enum zlib_wrapper {
@@ -113,6 +116,14 @@ const char* zlib_wrapper_name(zlib_wrapper type) {
   return 0;
 }
 
+// Mimic what minigzip.c does, uses 8 for memory level to deflateInit2.
+#ifndef DEF_MEM_LEVEL
+# define DEF_MEM_LEVEL 8
+#endif
+
+// Mimic minigzip.c, deflate in chunk sizes of 16384 * 3.
+#define BUFLENW 49152
+
 void zlib_compress(
     const zlib_wrapper type,
     const char* input,
@@ -129,24 +140,56 @@ void zlib_compress(
   memset(&stream, 0, sizeof(stream));
 
   int result = PREFIX(deflateInit2)(&stream, compression_level, Z_DEFLATED,
-      zlib_stream_wrapper_type(type), MAX_MEM_LEVEL, Z_DEFAULT_STRATEGY);
+      zlib_stream_wrapper_type(type), DEF_MEM_LEVEL, Z_DEFAULT_STRATEGY);
   if (result != Z_OK)
     error_exit("deflateInit2 failed", result);
 
-  stream.next_out = (unsigned char *)string_data(output);
-  stream.avail_out = (uint32_t)output_size;
-  stream.next_in = (const unsigned char *)input;
-  stream.avail_in = (uint32_t)input_size;
+  int in_off = 0;
+  unsigned char *in = (unsigned char *)input;
+  size_t remain = input_size;
+
+  int out_off = 0;
+  unsigned char *out = (unsigned char *)string_data(output);
+  size_t out_end = (size_t)out + output_size;
+
+  for (;;) {
+    if (out_end < ((size_t)out + out_off + BUFLENW))
+      error_exit("not enough space in output buffer!", 1);
+
+    stream.next_out = (out + out_off);
+    stream.avail_out = BUFLENW;
+    stream.next_in = (in + in_off);
+    stream.avail_in = remain;
+
+    result = PREFIX(deflate)(&stream, Z_NO_FLUSH);
+    if (result != Z_OK)
+      error_exit("deflate did not return Z_OK!", result);
+
+    out_off += (BUFLENW - stream.avail_out);
+    in_off += (remain - stream.avail_in);
+    remain = stream.avail_in;
+    if (remain == 0)
+      break;
+  }
+
+  if (out_end < ((size_t)out + out_off + BUFLENW))
+    error_exit("not enough space in output buffer!", 1);
+
+  stream.next_out = (out + out_off);
+  stream.avail_out = BUFLENW;
+  stream.next_in = NULL;
+  stream.avail_in = 0;
 
   result = PREFIX(deflate)(&stream, Z_FINISH);
-  if (result == Z_STREAM_END)
-    output_size = stream.total_out;
-  result |= PREFIX(deflateEnd)(&stream);
   if (result != Z_STREAM_END)
-    error_exit("compress failed", result);
+    error_exit("compress failed Z_FINISH", result);
+
+  result = PREFIX(deflateEnd)(&stream);
+  if (result != Z_OK)
+    error_exit("compress failed end", result);
 
   if (resize_output)
-    output->resize(output_size);
+    output->resize(stream.total_out);
 }
 
 void zlib_uncompress(

This patch was inspired by minigzip.c's method of decompressing in chunks of 3 * 16KB instead of attempting to do it all in one call. This patch passes for all the input files at level 1 and level 9 with all 3 wrapper types.

In the case of zlib_estimate_compressed_size() I had to multiply by 3 for a baddata input file, 2 wasn't enough. At the very least compressBound is not sufficient for Intel's deflate_quick() algorithm and there are some cases where the compressed data is larger than the input file.

It's also a bit unexpected that all other algorithm variant has no trouble with PREFIX(deflate)(&stream, Z_NO_FLUSH) except deflate_quick(). If this is a supported method of using the zlib library then I have concerns there's a bug in the library code when called on this path.

@sebpop and I can try to push for changes to the benchmark but I'm worried that may just be hiding a deeper set of bugs.

bmrzycki · 2019-01-25T20:33:08Z

I've made a small change to my previous patch and everything continues to pass:

--- a/zlib/ng_snappy/src/test/bench/chromium_zlib_bench.cc
+++ b/zlib/ng_snappy/src/test/bench/chromium_zlib_bench.cc
@@ -80,8 +80,16 @@ Data read_file_data_or_exit(const char* name) {
   return data;
 }

+// Mimic minigzip.c, deflate in chunk sizes of 16384 * 3.
+#define BUFLENW 49152
+
 size_t zlib_estimate_compressed_size(size_t input_size) {
-  return PREFIX(compressBound)(input_size);
+  // Be conservative. Some algorithms like Intel's deflate_quick
+  // (x86 level 1) may make a file larger than the original if
+  // already compressed (eg. jpeg).
+  // We work in BUFLENW blocks, make sure we have at least 2
+  // full blocks worth of room for padding.
+  return PREFIX(compressBound)(input_size) + (2 * BUFLENW);
 }

The big difference is I am now using zlib-ng's compressBound() value, just padding it with 2 additional write buffers. I've tried this on large files >200MB and do not see errors with the Google framework. Minigzip doesn't have to do this because it doesn't keep a copy in memory, it just streams the same chunk over and over to a filehandle.

sebpop · 2019-01-25T20:42:45Z

Some algorithms like Intel's deflate_quick
(x86 level 1) may make a file larger than the original if
already compressed (eg. jpeg).

I think the change in behavior needs to be documented for the compression algorithm on x86 at level 1.
Where should we put this documentation?
In the past we spoke about documenting the changes made in zlib-ng compared to madler/zlib.
What about adding a new file in doc/ or a top level ChangeLog.zlib-ng ?

bmrzycki · 2019-01-25T21:12:08Z

For those interested, here is my latest patch to the Google zlib benchmark. It is less prone to failure, especially if you alter the MEM_LEVEL passed to deflateInit2(). I am now using MAX_MEM_LEVEL like the original code did and do not alter zlib_estimate_compressed_size() other than adding two BUFLENW buffers of padding.

--- a/zlib/ng_snappy/src/test/bench/chromium_zlib_bench.cc
+++ b/zlib/ng_snappy/src/test/bench/chromium_zlib_bench.cc
@@ -80,8 +80,15 @@ Data read_file_data_or_exit(const char* name) {
   return data;
 }

+// Mimic minigzip.c, deflate in chunk sizes of 16384 * 3.
+#define BUFLENW 49152
+
 size_t zlib_estimate_compressed_size(size_t input_size) {
-  return PREFIX(compressBound)(input_size);
+  // Some algorithms like Intel's deflate_quick (x86 level 1) may make a file
+  // larger than the original if already compressed (eg. jpeg). Add two
+  // write buffers on the end for padding and to ensure there is space
+  // for Z_FINISH.
+  return PREFIX(compressBound)(input_size) + (BUFLENW * 2);
 }

 enum zlib_wrapper {
@@ -133,20 +140,60 @@ void zlib_compress(
   if (result != Z_OK)
     error_exit("deflateInit2 failed", result);

-  stream.next_out = (unsigned char *)string_data(output);
-  stream.avail_out = (uint32_t)output_size;
-  stream.next_in = (const unsigned char *)input;
-  stream.avail_in = (uint32_t)input_size;
+  int in_off = 0;
+  unsigned char *in = (unsigned char *)input;
+  size_t remain = input_size;

-  result = PREFIX(deflate)(&stream, Z_FINISH);
-  if (result == Z_STREAM_END)
-    output_size = stream.total_out;
-  result |= PREFIX(deflateEnd)(&stream);
-  if (result != Z_STREAM_END)
-    error_exit("compress failed", result);
+  int out_off = 0;
+  unsigned char *out = (unsigned char *)string_data(output);
+  size_t out_end = (size_t)out + output_size;
+
+  for (;;) {
+    if (out_end < ((size_t)out + out_off + BUFLENW))
+      error_exit("not enough space in output buffer!", 1);
+
+    stream.next_out = (out + out_off);
+    stream.avail_out = BUFLENW;
+    stream.next_in = (in + in_off);
+    stream.avail_in = remain;
+
+    result = PREFIX(deflate)(&stream, Z_NO_FLUSH);
+    if (result != Z_OK)
+      error_exit("deflate did not return Z_OK!", result);
+
+    out_off += (BUFLENW - stream.avail_out);
+    in_off += (remain - stream.avail_in);
+    remain = stream.avail_in;
+    if (remain == 0)
+      break;
+  }
+
+  // No more input data to process, iterate until we see Z_FINISH.
+  stream.next_in = NULL;
+  stream.avail_in = 0;
+  for(;;) {
+      if (out_end < ((size_t)out + out_off + BUFLENW))
+         error_exit("not enough space in output buffer!", 1);
+
+      stream.next_out = (out + out_off);
+      stream.avail_out = BUFLENW;
+
+      result = PREFIX(deflate)(&stream, Z_FINISH);
+      if (result == Z_STREAM_END) {
+         break;
+      } else if (result == Z_OK) {
+         out_off += (BUFLENW - stream.avail_out);
+      } else {
+         error_exit("compress failed Z_FINISH", result);
+      }
+  }
+
+  result = PREFIX(deflateEnd)(&stream);
+  if (result != Z_OK)
+    error_exit("compress failed end", result);

   if (resize_output)
-    output->resize(output_size);
+    output->resize(stream.total_out);
 }

 void zlib_uncompress(

Myriachan · 2019-04-20T00:10:53Z

Why can't this be fixed instead by changing deflateBound and compressBound to always include this extra size #if x86?

bmrzycki · 2019-04-22T15:25:44Z

Hello @Myriachan , I think that's one option. My concern with such a patch is if it masks a deeper bug or issue. I cant say with confidence if x86 at level 1 is operating correctly and don't have the deeper knowledge of Intel's deflate_quick overall algorithm.

Dead2 · 2019-08-21T07:29:31Z

Does PR #382 affect this problem at all? It seems like a real bug in deflate_quick, but I am not clear on whether it fixes this problem or not.

bmrzycki · 2019-08-22T16:32:54Z

Hello @Dead2 . I rebased my internal test framework to c0f0acf after seeing PR #382 was merged to tip. I then re-ran bench/chromium_zlib_bench.cc without any of my above patches to prevent failures. When running I still see the same failure message:

compress failed (0)
fireworks.jpeg

Dead2 · 2019-08-23T08:13:16Z

@bmrzycki Thank you for checking, I really appreciate it :)

I am hoping that this bug is related to #390, since that seems to erroneously emit an extra end block each time it runs out of data in the in-buffer.
At least that offers a logical explanation to this bug. I have otherwise been unable to understand how deflate_quick could deliver worse worst-case compression than the algorithm itself explains (without delivering corrupt files), but I had not thought about the possibility of it emitting extra metadata blocks other than at the start and end.

bmrzycki · 2019-08-23T14:45:00Z

@Dead2 you're welcome. :)

I manually patched c0f0acf with the code from PR #390 :

$ git diff
diff --git a/zlib/ng_snappy/src/arch/x86/deflate_quick.c b/zlib/ng_snappy/src/arch/x86/deflate_quick.c
index ce1f52f5..ac13a714 100644
--- a/zlib/ng_snappy/src/arch/x86/deflate_quick.c
+++ b/zlib/ng_snappy/src/arch/x86/deflate_quick.c
@@ -238,7 +238,7 @@ ZLIB_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
         if (s->lookahead < MIN_LOOKAHEAD) {
             fill_window_sse(s);
             if (s->lookahead < MIN_LOOKAHEAD && flush == Z_NO_FLUSH) {
-                static_emit_end_block(s, 0);
+                /* static_emit_end_block(s, 0); -- BMR test PR 390 */
                 return need_more;
             }
             if (s->lookahead == 0)

and I still see the same compress failed (0) message for fireworks.jpeg. It's unclear to me if this is a false path or merely an incomplete patch.

nmoinvaz · 2019-09-07T18:22:43Z

With fireworks.jpeg I am getting the following error using deflate level 1 and mem level 1:

chunkmemset should be called on larger chunks

When I comment out the following line, it works:

#define INFFAST_CHUNKSIZE sizeof(inffast_chunk_t)

nmoinvaz · 2019-09-09T16:59:28Z

@bmrzycki can you test my PR #396 and see if it solves the problem for you?

bmrzycki · 2019-09-11T16:28:21Z

Hello @nmoinvaz , I applied the code in #396 on top of 5d4d630 :

$ git diff
diff --git a/zlib/ng_snappy/src/inffast.c b/zlib/ng_snappy/src/inffast.c
index 62cb9513..fa8992f4 100644
--- a/zlib/ng_snappy/src/inffast.c
+++ b/zlib/ng_snappy/src/inffast.c
@@ -307,7 +307,9 @@ void ZLIB_INTERNAL zng_inflate_fast(PREFIX3(stream) *strm, unsigned long start)
                        operations can write beyond `out+len` so long as they
                        stay within 258 bytes of `out`.
                     */
-                    if (dist >= len || dist >= INFFAST_CHUNKSIZE)
+                    if (len < sizeof(uint64_t))
+                        out = chunkcopysafe(out, out - dist, len, safe);
+                    else if (dist >= len || dist >= INFFAST_CHUNKSIZE)
                         out = chunkcopy(out, out - dist, len);
                     else
                         out = chunkmemset(out, dist, len);
diff --git a/zlib/ng_snappy/src/inflate.c b/zlib/ng_snappy/src/inflate.c
index fae8847d..2c1dfbd2 100644
--- a/zlib/ng_snappy/src/inflate.c
+++ b/zlib/ng_snappy/src/inflate.c
@@ -978,7 +978,10 @@ int ZEXPORT PREFIX(inflate)(PREFIX3(stream) *strm, int flush) {
                 if (copy > left)
                     copy = left;
 #if defined(INFFAST_CHUNKSIZE)
-                put = chunkcopysafe(put, from, copy, put + left);
+               if (copy >= sizeof(uint64_t))
+                    put = chunkmemsetsafe(put, state->offset, copy, left);
+                else
+                    put = chunkcopysafe(put, put - state->offset, copy, put);
 #else
                 if (copy >= sizeof(uint64_t))
                     put = chunk_memcpy(put, from, copy);

And the test still failed on x86_64 (zlib wrapper):

compress failed (0)
fireworks.jpeg                           :   GO_ALL_SCORES_FILE ==> /work/b.rzycki/single-test/zlib/ng_snappy/b.rzycki_zlibng_zlib_bench.5682-1848-8150/score.out
  TIME_END ==> Wed Sep 11 11:14:57 CDT 2019

nmoinvaz · 2020-03-04T03:18:10Z

Worst case for deflate_quick is no matches found in entire data and max of 9 bits for each literal.

max_output = ((input_size + 17) >> 3) + input_size + 18

That is, 1 bit extra for each byte + 3 bits for block start + 10 bits for block end + 7 bits for round up, converted to bytes + the input size + 18 bytes for gzip header and footer.

Here is a text file that produces this worse case result:
nomatch.txt
nomatch.bin.gz

Buffer size must be about 1.1367 * input_size. I don't think there is anything wrong with the deflate_quick algorithm. It is just emitting the distance and literal codes as it sees them without creating a huffman tree.

nmoinvaz · 2020-03-14T03:32:17Z

@sebpop, @bmrzycki can you please retest your use cases against develop branch?

gdevenyi · 2020-03-15T17:28:31Z

@neurolabusc is still indicating this triggers a segfault in pigz at least: afni/afni#138 (comment)

nmoinvaz · 2020-03-15T21:29:46Z

I think that might be a separate issue fixed in #541. This change made in develop was to fix compressBound calculation.

bmrzycki · 2020-03-24T16:11:28Z

Hello @nmoinvaz. I am unable to perform this testing now. I'm at a new company and do not have formal clearance from management to contribute to this project yet. In the meantime you should be able to test it yourself on any x86_64 Linux host. The code diff to fix the errors is from #284 (comment) on top of the Chromium zlib benchmark C++ code: https://github.com/chromium/chromium/blob/master/third_party/zlib/contrib/bench/zlib_bench.cc . Without the patch we see the failure.

Linking to zlib.a is a simple compile. The zlib_bench is a standalone C++ program:

$CC -O3 -Wall -std=c++11 -c zlib_bench.cc
$CC zlib_bench.o /path/to/zlib-ng/develop/zlib.a -o zlib_bench

The source files for compression/decompression come from Google's snappy testdata directory: https://github.com/google/snappy/tree/master/testdata .

You can execute the test with the following:

zlib_bench gzip --compression 1 fireworks.jpeg

nmoinvaz · 2020-03-24T16:55:10Z

Thanks @sebpop. I have tested it on x86 myself but I was looking for independent verification since this issue has been open a long time. Congrats on your new position! Hopefully they will let you contribute.

nmoinvaz · 2020-04-17T01:33:41Z

This should be resolved now that #541 has been merged.

sebpop changed the title ~~make sure chromium-benchmark works at compression level 1~~ x86_64 deflate_quick fails on chromium-benchmark fireworks.jpeg Jan 17, 2019

Dead2 added the bug label Jan 21, 2019

nmoinvaz mentioned this issue Sep 7, 2019

Fixed chunkmemset should be called on larger chunks assert #396

Closed

Dead2 mentioned this issue Oct 7, 2019

Is zlib-ng "production ready"? #90

Closed

nmoinvaz mentioned this issue Mar 6, 2020

Fixed compressBound calculation for quick deflate strategy. #548

Merged

gdevenyi mentioned this issue Mar 13, 2020

check that pigz is working afni/afni#138

Open

neurolabusc mentioned this issue Mar 24, 2020

pigz compiled with zlib-ng generates errors with level=1 #536

Closed

nmoinvaz closed this as completed May 3, 2020

dio mentioned this issue Aug 5, 2020

compressor: Allow to optionally use zlib-ng envoyproxy/envoy#12408

Merged

nmoinvaz mentioned this issue Dec 14, 2021

Fix deflateBound and compressBound returning very small size estimates. #1071

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x86_64 deflate_quick fails on chromium-benchmark fireworks.jpeg #284

x86_64 deflate_quick fails on chromium-benchmark fireworks.jpeg #284

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

mtl1979 commented Jan 17, 2019

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

mtl1979 commented Jan 18, 2019 •

edited

Loading

bmrzycki commented Jan 18, 2019

bmrzycki commented Jan 25, 2019

sebpop commented Jan 25, 2019

bmrzycki commented Jan 25, 2019

Myriachan commented Apr 20, 2019

bmrzycki commented Apr 22, 2019

Dead2 commented Aug 21, 2019

bmrzycki commented Aug 22, 2019

Dead2 commented Aug 23, 2019

bmrzycki commented Aug 23, 2019

nmoinvaz commented Sep 7, 2019 •

edited

Loading

nmoinvaz commented Sep 9, 2019

bmrzycki commented Sep 11, 2019

nmoinvaz commented Mar 4, 2020 •

edited

Loading

nmoinvaz commented Mar 14, 2020

gdevenyi commented Mar 15, 2020

nmoinvaz commented Mar 15, 2020

bmrzycki commented Mar 24, 2020 •

edited

Loading

nmoinvaz commented Mar 24, 2020

nmoinvaz commented Apr 17, 2020 •

edited

Loading

x86_64 deflate_quick fails on chromium-benchmark fireworks.jpeg #284

x86_64 deflate_quick fails on chromium-benchmark fireworks.jpeg #284

Comments

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

mtl1979 commented Jan 17, 2019

sebpop commented Jan 17, 2019

sebpop commented Jan 17, 2019

mtl1979 commented Jan 18, 2019 • edited Loading

bmrzycki commented Jan 18, 2019

bmrzycki commented Jan 25, 2019

sebpop commented Jan 25, 2019

bmrzycki commented Jan 25, 2019

Myriachan commented Apr 20, 2019

bmrzycki commented Apr 22, 2019

Dead2 commented Aug 21, 2019

bmrzycki commented Aug 22, 2019

Dead2 commented Aug 23, 2019

bmrzycki commented Aug 23, 2019

nmoinvaz commented Sep 7, 2019 • edited Loading

nmoinvaz commented Sep 9, 2019

bmrzycki commented Sep 11, 2019

nmoinvaz commented Mar 4, 2020 • edited Loading

nmoinvaz commented Mar 14, 2020

gdevenyi commented Mar 15, 2020

nmoinvaz commented Mar 15, 2020

bmrzycki commented Mar 24, 2020 • edited Loading

nmoinvaz commented Mar 24, 2020

nmoinvaz commented Apr 17, 2020 • edited Loading

mtl1979 commented Jan 18, 2019 •

edited

Loading

nmoinvaz commented Sep 7, 2019 •

edited

Loading

nmoinvaz commented Mar 4, 2020 •

edited

Loading

bmrzycki commented Mar 24, 2020 •

edited

Loading

nmoinvaz commented Apr 17, 2020 •

edited

Loading