Fix deflateBound and compressBound returning very small size estimates. #1071

Dead2 · 2021-12-13T23:24:27Z

Fixes deflateBound and compressBound returning very small size estimates.
Remove workaround in switchlevels.c, so we do actual testing of this.

I also made a few of the magic numbers more understandable by using defines,
I would have liked to do so with the rest too.

I reasoned that the + 13 comes from compressBound, and thus includes zlib header size (6 bytes).
That seems to be the reason why it was +13 - 6 in deflateBound, since we do not always want to use the zlib header type there.
The remaining 7 bytes I am unsure about, raw deflate block headers? I was unable to find any documentation saying more than 2-3 bytes for that though, so I left that as a 7.

Also not sure why the ((sourceLen + 13 + 7) >> 3) is used for deflate_quick, it kind of looks like those additions should have been outside of the bitshifting, but without any comments those are just magic numbers. They also fail to add enough when the buffer size is 1 byte for example, thus the reason for adding the +1.

With these changes, the buffer size ends up being a lot bigger, but several of the changes we have made have affected the worst case compressed size. Deflate_quick is a big one for example, possibly we could investigate why level 1 becomes bigger than level 0 for random data, but I suspect it has to do with skipping extra checks for raw speed.
Also since hash_bits is now a define and not set depending on wsize, the check on hash_bits no longer checks wsize, and thus we need to be more pessimistic. (unless we add window_bits to the state struct so we can verify that is actually 15)

I also considered using the DFLTCC calculation, but that thing is calculating around 2x the size of the input buffer. Is that a mistake, or does the hardware compression worst case actually become that big? @iii-i

Resolves the problem reported in #1039

codecov · 2021-12-13T23:25:12Z

Codecov Report

Merging #1071 (65d8e43) into develop (3b2d34a) will decrease coverage by 0.00%.
The diff coverage is 87.50%.

❗ Current head 65d8e43 differs from pull request most recent head 7547522. Consider uploading reports for the commit 7547522 to get more accurate results

@@             Coverage Diff             @@
##           develop    #1071      +/-   ##
===========================================
- Coverage    81.48%   81.47%   -0.01%     
===========================================
  Files           87       87              
  Lines         8838     8839       +1     
  Branches      1425     1426       +1     
===========================================
  Hits          7202     7202              
  Misses        1079     1079              
- Partials       557      558       +1

Flag	Coverage Δ
macos_clang	`69.47% <100.00%> (ø)`
macos_gcc	`68.01% <100.00%> (ø)`
ubuntu_clang	`74.24% <100.00%> (ø)`
ubuntu_clang_debug	`74.39% <83.33%> (ø)`
ubuntu_clang_inflate_allow_invalid_dist	`74.10% <100.00%> (ø)`
ubuntu_clang_inflate_strict	`74.15% <100.00%> (ø)`
ubuntu_clang_mmap	`74.21% <100.00%> (ø)`
ubuntu_clang_msan	`74.24% <100.00%> (ø)`
ubuntu_clang_pigz	`35.12% <0.00%> (ø)`
ubuntu_clang_pigz_no_optim	`39.62% <0.00%> (ø)`
ubuntu_clang_pigz_no_threads	`34.63% <0.00%> (ø)`
ubuntu_clang_reduced_mem	`74.38% <100.00%> (ø)`
ubuntu_gcc	`73.66% <100.00%> (ø)`
ubuntu_gcc_aarch64	`75.53% <100.00%> (ø)`
ubuntu_gcc_aarch64_compat_no_opt	`73.23% <100.00%> (ø)`
ubuntu_gcc_aarch64_no_acle	`74.98% <100.00%> (ø)`
ubuntu_gcc_aarch64_no_neon	`75.09% <100.00%> (ø)`
ubuntu_gcc_armhf	`75.51% <100.00%> (ø)`
ubuntu_gcc_armhf_compat_no_opt	`73.19% <100.00%> (ø)`
ubuntu_gcc_armhf_no_acle	`75.78% <100.00%> (ø)`
ubuntu_gcc_armhf_no_neon	`75.86% <100.00%> (ø)`
ubuntu_gcc_armsf	`75.52% <100.00%> (ø)`
ubuntu_gcc_armsf_compat_no_opt	`73.19% <100.00%> (ø)`
ubuntu_gcc_compat_no_opt	`74.18% <100.00%> (ø)`
ubuntu_gcc_compat_sprefix	`73.55% <100.00%> (ø)`
ubuntu_gcc_mingw_i686	`0.00% <0.00%> (ø)`
ubuntu_gcc_mingw_x86_64	`0.00% <0.00%> (ø)`
ubuntu_gcc_no_avx2	`75.70% <100.00%> (ø)`
ubuntu_gcc_no_ctz	`76.22% <100.00%> (ø)`
ubuntu_gcc_no_ctzll	`75.90% <100.00%> (ø)`
ubuntu_gcc_no_pclmulqdq	`72.33% <100.00%> (ø)`
ubuntu_gcc_no_sse2	`74.74% <100.00%> (ø)`
ubuntu_gcc_no_sse4	`72.66% <100.00%> (ø)`
ubuntu_gcc_o3	`72.87% <100.00%> (ø)`
ubuntu_gcc_osb	`75.91% <83.33%> (ø)`
ubuntu_gcc_pigz	`35.27% <0.00%> (-0.03%)`	⬇️
ubuntu_gcc_pigz_aarch64	`37.12% <0.00%> (+0.06%)`	⬆️
ubuntu_gcc_ppc	`73.25% <100.00%> (ø)`
ubuntu_gcc_ppc64	`76.65% <100.00%> (ø)`
ubuntu_gcc_ppc64le	`75.13% <100.00%> (ø)`
ubuntu_gcc_ppc_no_power8	`75.94% <100.00%> (ø)`
ubuntu_gcc_s390x	`78.06% <100.00%> (ø)`
ubuntu_gcc_sparc64	`77.20% <100.00%> (ø)`
ubuntu_gcc_sprefix	`73.35% <100.00%> (ø)`
win64_gcc	`∅ <ø> (∅)`
win64_gcc_compat_no_opt	`∅ <ø> (∅)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
deflate.c	`74.89% <80.00%> (+0.03%)`	⬆️
compress.c	`91.89% <100.00%> (ø)`
test/switchlevels.c	`65.00% <100.00%> (ø)`
trees.c	`96.17% <0.00%> (-0.28%)`	⬇️
trees_emit.h	`100.00% <0.00%> (ø)`
test/example.c	`81.84% <0.00%> (ø)`
test/infcover.c	`90.58% <0.00%> (ø)`
test/minideflate.c	`59.34% <0.00%> (ø)`
test/fuzz/checksum_fuzzer.c	`94.23% <0.00%> (ø)`
test/fuzz/compress_fuzzer.c	`94.87% <0.00%> (ø)`
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b2d34a...7547522. Read the comment docs.

nmoinvaz · 2021-12-14T00:29:17Z

I think these changes look okay. I don't remember the original reason behind the 13 + 7. All I could find was this comment of mine: #284 (comment).

Dead2 · 2021-12-14T00:44:41Z

@nmoinvaz Interesting. Do you agree with my second commit?
I was not quite sure about the 7 bits for rounding however. As that rounds it up to 20 bits, and I would have expected 16 or 24 bits..

nmoinvaz

Looks like you were able to make sense of it. The >> 3 does the bit to byte conversion so I had trouble reading it. +7 is to round to nearest byte I assume.

nmoinvaz · 2021-12-14T00:46:52Z

You may be correct about the number for rounding up. Maybe I had use the number for rounding up to nearest byte. Perhaps if it was 16 or 24 then we wouldn't need that extra 1 byte added at the end.

Dead2 · 2021-12-14T00:57:59Z

I tested the theory, and indeed, changing the 7 bits to 11 bits (totaling 24bytes) removes the need for that extra byte.
I amended the second commit with that change.

nmoinvaz

Looks great!

nmoinvaz · 2021-12-14T01:01:22Z

compress.c

-       plus the size of block & gzip headers and footers */
-    return sourceLen + ((sourceLen + 13 + 7) >> 3) + 18;
+    /* Quick deflate strategy worst case is 9 bits per literal, rounded to nearest byte,
+       plus the size of block headers, padding and wrappers */


The comment is a bit confusing.

Come to think of it, perhaps the 7 bits was not for the deflate blocks but rather to make sure the bits conversion of sourceLen is rounded up to the next byte. I think that is the case actually, and that should stay as a 7, but that the deflate block header bits also needs to be rounded up to the next byte. I'll amend the commit with that change.

And that still passes the test 😄

I split up the calculation a little, so it is easier to read, this way it also matches the comment a little better.

iii-i · 2021-12-14T11:06:58Z

@Dead2 according to our hardware folks, in degenerate cases this can happen. Fortunately on e.g. squash-benchmark we haven't seen anything that atrocious.

EDIT: Just from the description, shouldn't DEFLATE_PADBITS be 7? In case the last bit of End-of-Block Symbol ends up in a new byte.

Dead2 · 2021-12-14T13:06:31Z

@iii-i Thanks for confirming the hardware bounds.

I am not sure it should not be 7, but I don't quite understand what you are saying. Could you explain your thinking?

My thinking is that 3 + 10 = 13, and thus needs another 3 bits to fill up two bytes.
The 3 bits for block header is well understood, but I am a little unsure still where the 10 bits come from. RFC1951 does not seem to explain any such static overhead.

I also think the current non-deflate_quick calculation is a bit pessimistic, but better to err on the safe side.

iii-i · 2021-12-14T13:24:45Z

At the end of each deflate block there is an end-of-block symbol. I guess that 10 comes from the assumption that the software implementation can never generate an EOBS longer than 10 bits? The format allows it to be as long as 15, if I'm reading "3.2.7. Compression with dynamic Huffman codes (BTYPE=10)" correctly. 15 is also used in DFLTCC calculations (DFLTCC_MAX_EOBS_BITS).

So, when we add padding to the byte boundary at the end, the header size is actually irrelevant. The way I see the worst case then is: EOBS is 15 bits long and starts at byte X bit 2. Then, the last EOBS bit is at byte X + 2 bit 0. In order to get to the byte boundary, we need to skip the remaining 7 bits of byte X + 2.

Dead2 · 2021-12-14T13:46:10Z

Current code for reference:

    /* Quick deflate strategy worst case is 9 bits per literal, rounded to nearest byte (+7 bits),
       plus the size of deflate block headers and zlib/gzip wrapper */
    return sourceLen + ((sourceLen + 7) >> 3) + ((DEFLATE_WRAPBITS + DEFLATE_PADBITS) >> 3) + wraplen;

@iii-i Hmm. Is that worst case valid, considering we already added 7 bits to the compressed data just in case it ends before a byte boundary?

The way I am reading this is we need 3 bits block header, compressed data, 7 bits padding for compressed data, 15bits of EOBS.
So, 3-bit block header is unpadded, and 15-bit EOBS is unpadded. Would that not mean it needs another 6 bits of padding?

If so, that would look something like:

DEFLATE_HDR_BITS = 3
DEFLATE_EOBS_BITS = 15
DEFLATE_PAD_BITS = 6  // These can be added together in the header file, making DEFLATE_BLOCK_OVERHEAD or something

    return sourceLen            // The source size itself
    + ((sourceLen + 7) >> 3)    // Source encoding overhad, padded to next full byte
    + ((DEFLATE_HDR_BITS + DEFLATE_EOBS_BITS + DEFLATE_PAD_BITS) >> 3)    // Deflate block overhead
    + wraplen;                  // zlib or gzip wrapper

I might still be wrong, what do you think?
I hate magic numbers, so if we can get this somewhat accurate that would be great. (accurate in documentation and concept, not necessarily as in the total calculation being accurate)

Dead2 · 2021-12-14T15:20:18Z

Looks like we might have a too small size in some cases after all, I hadn't noticed the failed CI fuzzer.

==25==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000e69 at pc 0x00000052472a bp 0x7ffc583a8fa0 sp 0x7ffc583a8768
WRITE of size 1 at 0x603000000e69 thread T0
SCARINESS: 31 (1-byte-write-heap-buffer-overflow)
    #0 0x524729 in __asan_memcpy /src/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3
    #1 0x569613 in flush_pending /src/zlib-ng/deflate.c:677:5
    #2 0x564a90 in zng_deflate /src/zlib-ng/deflate.c:718:9
    #3 0x55ea74 in test_deflate /src/zlib-ng/test/fuzz/example_small_fuzzer.c:54:15

Maybe that is related to the 10 vs 15 bits and padding.

The above suggestion would increase the estimate by 8 bits, and I think that would fix this problem at least.

iii-i · 2021-12-14T15:37:03Z

Ah, you are right, we have already added 7. I just tried to come up with a formula on my own, and that's what I ended up with:

#define DEFLATE_HDR_BITS 3
#define DEFLATE_QUICK_LIT_MAX_BITS 9
#define DEFLATE_EOBS_MAX_BITS 15

sourceLen + ((DEFLATE_HDR_BITS + sourceLen * (DEFLATE_QUICK_LIT_MAX_BITS - 8) + DEFLATE_EOBS_MAX_BITS + 7) >> 3) + ZLIB_WRAPLEN

The initial sourceLen appears to be an alternative for having to multiply it by 9 in parentheses - presumably to reduce the chance of an overflow (I think it can still happen with the right inputs, especially on a 32-bit system).

However, the only difference with the original formula now is that the result is 5 bits larger due to DEFLATE_EOBS_MAX_BITS.

The difference with your approach is that you shift and pad compressed literals and deflate block overhead separately. Why does that help?

Dead2 · 2021-12-14T16:04:55Z

The reason I split it up was to make it easier to read.
It also makes sure the 7 bits added are dedicated to rounding up the sourcelen and is not affected by the deflate-block overhead bits missing any of its required padding bits, by making it clear that the block part needs to have its own padding handled separately.

Therefore HDRBITS (3) + EOBS (15) = 18 bits + padding (6) = 24 bits = 3bytes for the block overhead

Updated pseudo-code to be much more readable:

#define DEFLATE_HEADER_BITS 3
#define DEFLATE_EOBS_BITS 15
#define DEFLATE_PAD_BITS 6
#define DEFLATE_BLOCK_OVERHEAD = (DEFLATE_HEADER_BITS + DEFLATE_EOBS_BITS + DEFLATE_PAD_BITS) >> 3)

#define DEFLATE_QUICK_LIT_MAX_BITS = 9
#define DEFLATE_QUICK_OVERHEAD(x) = ((x * (DEFLATE_QUICK_LIT_MAX_BITS - 8) + 7) >> 3)

    return sourceLen                       // The source size itself
      + DEFLATE_QUICK_OVERHEAD(sourceLen)  // Source encoding overhad, padded to next full byte
      + DEFLATE_BLOCK_OVERHEAD             // Deflate block overhead bytes
      + wraplen;                           // zlib or gzip wrapper

Would a 32bit overflow really be feasible though? Wouldn't that require allocating a ~3641MiB or thereabouts input buffer? And then you would need an output buffer as well. Hmm, but the calculation is perhaps done on signed ints.

nmoinvaz · 2021-12-14T16:32:20Z

The format allows it to be as long as 15, if I'm reading "3.2.7. Compression with dynamic Huffman codes (BTYPE=10)" correctly. 15 is also used in DFLTCC calculations (DFLTCC_MAX_EOBS_BITS).

This calculation is mainly used for deflate quick which doesn't use BTYPE=10.

The value 7 comes from zng_tr_emit_end_block which is used by deflate_quick.

nmoinvaz · 2021-12-14T16:42:47Z

We may not be able to accurately calculate the exact number of bits/bytes. Back in the day there was only one deflate block that deflate quick used. But now due to other issues we had to have more than one deflate block. See QUICK_START_BLOCK and QUICK_END_BLOCK.

Dead2 · 2021-12-14T18:10:27Z

@nmoinvaz Right, there is that too.
The only thing we need to do is make sure we have a value that is good enough to catch the worst-case, preferably without a huge amount of extra bloat that never gets used.

Suggestions for changes to the last pseudo-code I posted?

nmoinvaz · 2021-12-14T19:06:53Z

I think that looks great. Otherwise overhead is misspelled

Dead2 · 2021-12-19T15:24:38Z

I made a little test that tests a bunch of different buffer sizes.
For the deflate_quick mthod; Up to about 32 bytes, the new method calculates a tighter buffer than the old method, at 64 bytes the new one is bigger than the old one, in my tests the old one fails at 128 bytes and above. The new one succeeds all tests up to 4MiB and shows no sign of failing higher than that.

zutil.h

Remove workaround in switchlevels.c, so we do actual testing of this. Use named defines instead of magic numbers where we can.

- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075

- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075

- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075

Dead2 added the bug label Dec 13, 2021

Dead2 requested review from iii-i and nmoinvaz December 13, 2021 23:27

nmoinvaz approved these changes Dec 14, 2021

View reviewed changes

Dead2 force-pushed the deflatebound branch from 911e640 to 6b1dcce Compare December 14, 2021 00:56

nmoinvaz approved these changes Dec 14, 2021

View reviewed changes

nmoinvaz reviewed Dec 14, 2021

View reviewed changes

Dead2 force-pushed the deflatebound branch 2 times, most recently from 855c712 to 65d8e43 Compare December 14, 2021 01:14

Dead2 mentioned this pull request Dec 14, 2021

Tag version 2.0.6 #1068

Merged

Dead2 force-pushed the deflatebound branch 3 times, most recently from 59afb8f to 904f665 Compare December 19, 2021 15:14

mtl1979 reviewed Dec 19, 2021

View reviewed changes

zutil.h Outdated Show resolved Hide resolved

Fix deflateBound and compressBound returning very small size estimates.

7547522

Remove workaround in switchlevels.c, so we do actual testing of this. Use named defines instead of magic numbers where we can.

Dead2 force-pushed the deflatebound branch from 904f665 to 7547522 Compare December 20, 2021 12:57

Dead2 merged commit 70f608c into develop Dec 20, 2021

Dead2 mentioned this pull request Dec 21, 2021

Fix deflateBound for HASH_BITS > 15 #1039

Closed

Dead2 deleted the deflatebound branch December 24, 2021 12:04

Dead2 mentioned this pull request Apr 5, 2022

deflateBound() returns a value that is too small #1235

Closed

al13n321 mentioned this pull request May 27, 2024

Update zlib-ng from 2.0.2 to 2.1.7 ClickHouse/ClickHouse#64489

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix deflateBound and compressBound returning very small size estimates. #1071

Fix deflateBound and compressBound returning very small size estimates. #1071

Dead2 commented Dec 13, 2021 •

edited

codecov bot commented Dec 13, 2021 •

edited

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 14, 2021

nmoinvaz left a comment •

edited

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 14, 2021

nmoinvaz left a comment

nmoinvaz Dec 14, 2021

Dead2 Dec 14, 2021

Dead2 Dec 14, 2021

Dead2 Dec 14, 2021

iii-i commented Dec 14, 2021 •

edited

Dead2 commented Dec 14, 2021

iii-i commented Dec 14, 2021 •

edited

Dead2 commented Dec 14, 2021

Dead2 commented Dec 14, 2021

iii-i commented Dec 14, 2021 •

edited

Dead2 commented Dec 14, 2021

nmoinvaz commented Dec 14, 2021

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 14, 2021

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 19, 2021

Fix deflateBound and compressBound returning very small size estimates. #1071

Fix deflateBound and compressBound returning very small size estimates. #1071

Conversation

Dead2 commented Dec 13, 2021 • edited

codecov bot commented Dec 13, 2021 • edited

Codecov Report

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 14, 2021

nmoinvaz left a comment • edited

Choose a reason for hiding this comment

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 14, 2021

nmoinvaz left a comment

Choose a reason for hiding this comment

nmoinvaz Dec 14, 2021

Choose a reason for hiding this comment

Dead2 Dec 14, 2021

Choose a reason for hiding this comment

Dead2 Dec 14, 2021

Choose a reason for hiding this comment

Dead2 Dec 14, 2021

Choose a reason for hiding this comment

iii-i commented Dec 14, 2021 • edited

Dead2 commented Dec 14, 2021

iii-i commented Dec 14, 2021 • edited

Dead2 commented Dec 14, 2021

Dead2 commented Dec 14, 2021

iii-i commented Dec 14, 2021 • edited

Dead2 commented Dec 14, 2021

nmoinvaz commented Dec 14, 2021

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 14, 2021

nmoinvaz commented Dec 14, 2021

Dead2 commented Dec 19, 2021

Dead2 commented Dec 13, 2021 •

edited

codecov bot commented Dec 13, 2021 •

edited

nmoinvaz left a comment •

edited

iii-i commented Dec 14, 2021 •

edited

iii-i commented Dec 14, 2021 •

edited

iii-i commented Dec 14, 2021 •

edited