Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deflateBound and compressBound returning very small size estimates. #1071

Merged
merged 1 commit into from
Dec 20, 2021

Conversation

Dead2
Copy link
Member

@Dead2 Dead2 commented Dec 13, 2021

Fixes deflateBound and compressBound returning very small size estimates.
Remove workaround in switchlevels.c, so we do actual testing of this.

I also made a few of the magic numbers more understandable by using defines,
I would have liked to do so with the rest too.

I reasoned that the + 13 comes from compressBound, and thus includes zlib header size (6 bytes).
That seems to be the reason why it was +13 - 6 in deflateBound, since we do not always want to use the zlib header type there.
The remaining 7 bytes I am unsure about, raw deflate block headers? I was unable to find any documentation saying more than 2-3 bytes for that though, so I left that as a 7.

Also not sure why the ((sourceLen + 13 + 7) >> 3) is used for deflate_quick, it kind of looks like those additions should have been outside of the bitshifting, but without any comments those are just magic numbers. They also fail to add enough when the buffer size is 1 byte for example, thus the reason for adding the +1.

With these changes, the buffer size ends up being a lot bigger, but several of the changes we have made have affected the worst case compressed size. Deflate_quick is a big one for example, possibly we could investigate why level 1 becomes bigger than level 0 for random data, but I suspect it has to do with skipping extra checks for raw speed.
Also since hash_bits is now a define and not set depending on wsize, the check on hash_bits no longer checks wsize, and thus we need to be more pessimistic. (unless we add window_bits to the state struct so we can verify that is actually 15)

I also considered using the DFLTCC calculation, but that thing is calculating around 2x the size of the input buffer. Is that a mistake, or does the hardware compression worst case actually become that big? @iii-i

Resolves the problem reported in #1039

@codecov
Copy link

codecov bot commented Dec 13, 2021

Codecov Report

Merging #1071 (65d8e43) into develop (3b2d34a) will decrease coverage by 0.00%.
The diff coverage is 87.50%.

❗ Current head 65d8e43 differs from pull request most recent head 7547522. Consider uploading reports for the commit 7547522 to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1071      +/-   ##
===========================================
- Coverage    81.48%   81.47%   -0.01%     
===========================================
  Files           87       87              
  Lines         8838     8839       +1     
  Branches      1425     1426       +1     
===========================================
  Hits          7202     7202              
  Misses        1079     1079              
- Partials       557      558       +1     
Flag Coverage Δ
macos_clang 69.47% <100.00%> (ø)
macos_gcc 68.01% <100.00%> (ø)
ubuntu_clang 74.24% <100.00%> (ø)
ubuntu_clang_debug 74.39% <83.33%> (ø)
ubuntu_clang_inflate_allow_invalid_dist 74.10% <100.00%> (ø)
ubuntu_clang_inflate_strict 74.15% <100.00%> (ø)
ubuntu_clang_mmap 74.21% <100.00%> (ø)
ubuntu_clang_msan 74.24% <100.00%> (ø)
ubuntu_clang_pigz 35.12% <0.00%> (ø)
ubuntu_clang_pigz_no_optim 39.62% <0.00%> (ø)
ubuntu_clang_pigz_no_threads 34.63% <0.00%> (ø)
ubuntu_clang_reduced_mem 74.38% <100.00%> (ø)
ubuntu_gcc 73.66% <100.00%> (ø)
ubuntu_gcc_aarch64 75.53% <100.00%> (ø)
ubuntu_gcc_aarch64_compat_no_opt 73.23% <100.00%> (ø)
ubuntu_gcc_aarch64_no_acle 74.98% <100.00%> (ø)
ubuntu_gcc_aarch64_no_neon 75.09% <100.00%> (ø)
ubuntu_gcc_armhf 75.51% <100.00%> (ø)
ubuntu_gcc_armhf_compat_no_opt 73.19% <100.00%> (ø)
ubuntu_gcc_armhf_no_acle 75.78% <100.00%> (ø)
ubuntu_gcc_armhf_no_neon 75.86% <100.00%> (ø)
ubuntu_gcc_armsf 75.52% <100.00%> (ø)
ubuntu_gcc_armsf_compat_no_opt 73.19% <100.00%> (ø)
ubuntu_gcc_compat_no_opt 74.18% <100.00%> (ø)
ubuntu_gcc_compat_sprefix 73.55% <100.00%> (ø)
ubuntu_gcc_mingw_i686 0.00% <0.00%> (ø)
ubuntu_gcc_mingw_x86_64 0.00% <0.00%> (ø)
ubuntu_gcc_no_avx2 75.70% <100.00%> (ø)
ubuntu_gcc_no_ctz 76.22% <100.00%> (ø)
ubuntu_gcc_no_ctzll 75.90% <100.00%> (ø)
ubuntu_gcc_no_pclmulqdq 72.33% <100.00%> (ø)
ubuntu_gcc_no_sse2 74.74% <100.00%> (ø)
ubuntu_gcc_no_sse4 72.66% <100.00%> (ø)
ubuntu_gcc_o3 72.87% <100.00%> (ø)
ubuntu_gcc_osb 75.91% <83.33%> (ø)
ubuntu_gcc_pigz 35.27% <0.00%> (-0.03%) ⬇️
ubuntu_gcc_pigz_aarch64 37.12% <0.00%> (+0.06%) ⬆️
ubuntu_gcc_ppc 73.25% <100.00%> (ø)
ubuntu_gcc_ppc64 76.65% <100.00%> (ø)
ubuntu_gcc_ppc64le 75.13% <100.00%> (ø)
ubuntu_gcc_ppc_no_power8 75.94% <100.00%> (ø)
ubuntu_gcc_s390x 78.06% <100.00%> (ø)
ubuntu_gcc_sparc64 77.20% <100.00%> (ø)
ubuntu_gcc_sprefix 73.35% <100.00%> (ø)
win64_gcc ∅ <ø> (∅)
win64_gcc_compat_no_opt ∅ <ø> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
deflate.c 74.89% <80.00%> (+0.03%) ⬆️
compress.c 91.89% <100.00%> (ø)
test/switchlevels.c 65.00% <100.00%> (ø)
trees.c 96.17% <0.00%> (-0.28%) ⬇️
trees_emit.h 100.00% <0.00%> (ø)
test/example.c 81.84% <0.00%> (ø)
test/infcover.c 90.58% <0.00%> (ø)
test/minideflate.c 59.34% <0.00%> (ø)
test/fuzz/checksum_fuzzer.c 94.23% <0.00%> (ø)
test/fuzz/compress_fuzzer.c 94.87% <0.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b2d34a...7547522. Read the comment docs.

@Dead2 Dead2 added the bug label Dec 13, 2021
@Dead2 Dead2 requested review from iii-i and nmoinvaz December 13, 2021 23:27
@nmoinvaz
Copy link
Member

I think these changes look okay. I don't remember the original reason behind the 13 + 7. All I could find was this comment of mine: #284 (comment).

@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

@nmoinvaz Interesting. Do you agree with my second commit?
I was not quite sure about the 7 bits for rounding however. As that rounds it up to 20 bits, and I would have expected 16 or 24 bits..

Copy link
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you were able to make sense of it. The >> 3 does the bit to byte conversion so I had trouble reading it. +7 is to round to nearest byte I assume.

@nmoinvaz
Copy link
Member

You may be correct about the number for rounding up. Maybe I had use the number for rounding up to nearest byte. Perhaps if it was 16 or 24 then we wouldn't need that extra 1 byte added at the end.

@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

I tested the theory, and indeed, changing the 7 bits to 11 bits (totaling 24bytes) removes the need for that extra byte.
I amended the second commit with that change.

Copy link
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

compress.c Outdated
plus the size of block & gzip headers and footers */
return sourceLen + ((sourceLen + 13 + 7) >> 3) + 18;
/* Quick deflate strategy worst case is 9 bits per literal, rounded to nearest byte,
plus the size of block headers, padding and wrappers */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is a bit confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, perhaps the 7 bits was not for the deflate blocks but rather to make sure the bits conversion of sourceLen is rounded up to the next byte. I think that is the case actually, and that should stay as a 7, but that the deflate block header bits also needs to be rounded up to the next byte. I'll amend the commit with that change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that still passes the test 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split up the calculation a little, so it is easier to read, this way it also matches the comment a little better.

@Dead2 Dead2 force-pushed the deflatebound branch 2 times, most recently from 855c712 to 65d8e43 Compare December 14, 2021 01:14
@iii-i
Copy link
Member

iii-i commented Dec 14, 2021

@Dead2 according to our hardware folks, in degenerate cases this can happen. Fortunately on e.g. squash-benchmark we haven't seen anything that atrocious.

EDIT: Just from the description, shouldn't DEFLATE_PADBITS be 7? In case the last bit of End-of-Block Symbol ends up in a new byte.

@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

@iii-i Thanks for confirming the hardware bounds.

I am not sure it should not be 7, but I don't quite understand what you are saying. Could you explain your thinking?

My thinking is that 3 + 10 = 13, and thus needs another 3 bits to fill up two bytes.
The 3 bits for block header is well understood, but I am a little unsure still where the 10 bits come from. RFC1951 does not seem to explain any such static overhead.

I also think the current non-deflate_quick calculation is a bit pessimistic, but better to err on the safe side.

@iii-i
Copy link
Member

iii-i commented Dec 14, 2021

At the end of each deflate block there is an end-of-block symbol. I guess that 10 comes from the assumption that the software implementation can never generate an EOBS longer than 10 bits? The format allows it to be as long as 15, if I'm reading "3.2.7. Compression with dynamic Huffman codes (BTYPE=10)" correctly. 15 is also used in DFLTCC calculations (DFLTCC_MAX_EOBS_BITS).

So, when we add padding to the byte boundary at the end, the header size is actually irrelevant. The way I see the worst case then is: EOBS is 15 bits long and starts at byte X bit 2. Then, the last EOBS bit is at byte X + 2 bit 0. In order to get to the byte boundary, we need to skip the remaining 7 bits of byte X + 2.

@Dead2 Dead2 mentioned this pull request Dec 14, 2021
@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

Current code for reference:

    /* Quick deflate strategy worst case is 9 bits per literal, rounded to nearest byte (+7 bits),
       plus the size of deflate block headers and zlib/gzip wrapper */
    return sourceLen + ((sourceLen + 7) >> 3) + ((DEFLATE_WRAPBITS + DEFLATE_PADBITS) >> 3) + wraplen;

@iii-i Hmm. Is that worst case valid, considering we already added 7 bits to the compressed data just in case it ends before a byte boundary?

The way I am reading this is we need 3 bits block header, compressed data, 7 bits padding for compressed data, 15bits of EOBS.
So, 3-bit block header is unpadded, and 15-bit EOBS is unpadded. Would that not mean it needs another 6 bits of padding?

If so, that would look something like:

DEFLATE_HDR_BITS = 3
DEFLATE_EOBS_BITS = 15
DEFLATE_PAD_BITS = 6  // These can be added together in the header file, making DEFLATE_BLOCK_OVERHEAD or something

    return sourceLen            // The source size itself
    + ((sourceLen + 7) >> 3)    // Source encoding overhad, padded to next full byte
    + ((DEFLATE_HDR_BITS + DEFLATE_EOBS_BITS + DEFLATE_PAD_BITS) >> 3)    // Deflate block overhead
    + wraplen;                  // zlib or gzip wrapper

I might still be wrong, what do you think?
I hate magic numbers, so if we can get this somewhat accurate that would be great. (accurate in documentation and concept, not necessarily as in the total calculation being accurate)

@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

Looks like we might have a too small size in some cases after all, I hadn't noticed the failed CI fuzzer.

==25==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000e69 at pc 0x00000052472a bp 0x7ffc583a8fa0 sp 0x7ffc583a8768
WRITE of size 1 at 0x603000000e69 thread T0
SCARINESS: 31 (1-byte-write-heap-buffer-overflow)
    #0 0x524729 in __asan_memcpy /src/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3
    #1 0x569613 in flush_pending /src/zlib-ng/deflate.c:677:5
    #2 0x564a90 in zng_deflate /src/zlib-ng/deflate.c:718:9
    #3 0x55ea74 in test_deflate /src/zlib-ng/test/fuzz/example_small_fuzzer.c:54:15

Maybe that is related to the 10 vs 15 bits and padding.

The above suggestion would increase the estimate by 8 bits, and I think that would fix this problem at least.

@iii-i
Copy link
Member

iii-i commented Dec 14, 2021

Ah, you are right, we have already added 7. I just tried to come up with a formula on my own, and that's what I ended up with:

#define DEFLATE_HDR_BITS 3
#define DEFLATE_QUICK_LIT_MAX_BITS 9
#define DEFLATE_EOBS_MAX_BITS 15

sourceLen + ((DEFLATE_HDR_BITS + sourceLen * (DEFLATE_QUICK_LIT_MAX_BITS - 8) + DEFLATE_EOBS_MAX_BITS + 7) >> 3) + ZLIB_WRAPLEN

The initial sourceLen appears to be an alternative for having to multiply it by 9 in parentheses - presumably to reduce the chance of an overflow (I think it can still happen with the right inputs, especially on a 32-bit system).

However, the only difference with the original formula now is that the result is 5 bits larger due to DEFLATE_EOBS_MAX_BITS.

The difference with your approach is that you shift and pad compressed literals and deflate block overhead separately. Why does that help?

@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

The reason I split it up was to make it easier to read.
It also makes sure the 7 bits added are dedicated to rounding up the sourcelen and is not affected by the deflate-block overhead bits missing any of its required padding bits, by making it clear that the block part needs to have its own padding handled separately.

Therefore HDRBITS (3) + EOBS (15) = 18 bits + padding (6) = 24 bits = 3bytes for the block overhead

Updated pseudo-code to be much more readable:

#define DEFLATE_HEADER_BITS 3
#define DEFLATE_EOBS_BITS 15
#define DEFLATE_PAD_BITS 6
#define DEFLATE_BLOCK_OVERHEAD = (DEFLATE_HEADER_BITS + DEFLATE_EOBS_BITS + DEFLATE_PAD_BITS) >> 3)

#define DEFLATE_QUICK_LIT_MAX_BITS = 9
#define DEFLATE_QUICK_OVERHEAD(x) = ((x * (DEFLATE_QUICK_LIT_MAX_BITS - 8) + 7) >> 3)

    return sourceLen                       // The source size itself
      + DEFLATE_QUICK_OVERHEAD(sourceLen)  // Source encoding overhad, padded to next full byte
      + DEFLATE_BLOCK_OVERHEAD             // Deflate block overhead bytes
      + wraplen;                           // zlib or gzip wrapper

Would a 32bit overflow really be feasible though? Wouldn't that require allocating a ~3641MiB or thereabouts input buffer? And then you would need an output buffer as well. Hmm, but the calculation is perhaps done on signed ints.

@nmoinvaz
Copy link
Member

The format allows it to be as long as 15, if I'm reading "3.2.7. Compression with dynamic Huffman codes (BTYPE=10)" correctly. 15 is also used in DFLTCC calculations (DFLTCC_MAX_EOBS_BITS).

This calculation is mainly used for deflate quick which doesn't use BTYPE=10.

The value 7 comes from zng_tr_emit_end_block which is used by deflate_quick.

@nmoinvaz
Copy link
Member

We may not be able to accurately calculate the exact number of bits/bytes. Back in the day there was only one deflate block that deflate quick used. But now due to other issues we had to have more than one deflate block. See QUICK_START_BLOCK and QUICK_END_BLOCK.

@Dead2
Copy link
Member Author

Dead2 commented Dec 14, 2021

@nmoinvaz Right, there is that too.
The only thing we need to do is make sure we have a value that is good enough to catch the worst-case, preferably without a huge amount of extra bloat that never gets used.

Suggestions for changes to the last pseudo-code I posted?

@nmoinvaz
Copy link
Member

I think that looks great. Otherwise overhead is misspelled

@Dead2 Dead2 force-pushed the deflatebound branch 3 times, most recently from 59afb8f to 904f665 Compare December 19, 2021 15:14
@Dead2
Copy link
Member Author

Dead2 commented Dec 19, 2021

I made a little test that tests a bunch of different buffer sizes.
For the deflate_quick mthod; Up to about 32 bytes, the new method calculates a tighter buffer than the old method, at 64 bytes the new one is bigger than the old one, in my tests the old one fails at 128 bytes and above. The new one succeeds all tests up to 4MiB and shows no sign of failing higher than that.

zutil.h Outdated Show resolved Hide resolved
Remove workaround in switchlevels.c, so we do actual testing of this.
Use named defines instead of magic numbers where we can.
Dead2 added a commit that referenced this pull request Dec 20, 2021
 - Fix hangs on macOS #1031
 - Fix minideflate write buffers being overwritten #1060
 - Fix deflateBound and compressBound returning too small size estimates #1071
 - Fix build problems when building outside of source dir #1049
 - Fix build problems on arm2-7 #1030
 - Fixed some compile warnings #1020 #1036 #1037 #1048
 - Improved posix memalign support #888
 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067
 - Improvements for integration into other projects #1022 #1042
 - Code style fixes #637 #1040 #1050 #1075
Dead2 added a commit that referenced this pull request Dec 20, 2021
 - Fix hangs on macOS #1031
 - Fix minideflate write buffers being overwritten #1060
 - Fix deflateBound and compressBound returning too small size estimates #1071
 - Fix build problems when building outside of source dir #1049
 - Fix build problems on arm2-7 #1030
 - Fixed some compile warnings #1020 #1036 #1037 #1048
 - Improved posix memalign support #888
 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067
 - Improvements for integration into other projects #1022 #1042
 - Code style fixes #637 #1040 #1050 #1075
@Dead2 Dead2 merged commit 70f608c into develop Dec 20, 2021
Dead2 added a commit that referenced this pull request Dec 20, 2021
 - Fix hangs on macOS #1031
 - Fix minideflate write buffers being overwritten #1060
 - Fix deflateBound and compressBound returning too small size estimates #1071
 - Fix build problems when building outside of source dir #1049
 - Fix build problems on arm2-7 #1030
 - Fixed some compile warnings #1020 #1036 #1037 #1048
 - Improved posix memalign support #888
 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067
 - Improvements for integration into other projects #1022 #1042
 - Code style fixes #637 #1040 #1050 #1075
Dead2 added a commit that referenced this pull request Dec 20, 2021
 - Fix hangs on macOS #1031
 - Fix minideflate write buffers being overwritten #1060
 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071
 - Fix incorrect function declaration warning #1080
 - Fix build problems when building outside of source dir #1049
 - Fix build problems on arm2-7 #1030
 - Fixed some compile warnings #1020 #1036 #1037 #1048
 - Improved posix memalign support #888
 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067
 - Improvements for integration into other projects #1022 #1042
 - Code style fixes #637 #1040 #1050 #1075
Dead2 added a commit that referenced this pull request Dec 20, 2021
 - Fix hangs on macOS #1031
 - Fix minideflate write buffers being overwritten #1060
 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071
 - Fix incorrect function declaration warning #1080
 - Fix build problems when building outside of source dir #1049
 - Fix build problems on arm2-7 #1030
 - Fixed some compile warnings #1020 #1036 #1037 #1048
 - Improved posix memalign support #888
 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079
 - Improvements for integration into other projects #1022 #1042
 - Code style fixes #637 #1040 #1050 #1075
Dead2 added a commit that referenced this pull request Dec 24, 2021
 - Fix hangs on macOS #1031
 - Fix minideflate write buffers being overwritten #1060
 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071
 - Fix incorrect function declaration warning #1080
 - Fix build problems when building outside of source dir #1049
 - Fix build problems on arm2-7 #1030
 - Fixed some compile warnings #1020 #1036 #1037 #1048
 - Improved posix memalign support #888
 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079
 - Improvements for integration into other projects #1022 #1042
 - Code style fixes #637 #1040 #1050 #1075
@Dead2 Dead2 deleted the deflatebound branch December 24, 2021 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants