Fix deflateBound and compressBound returning very small size estimates. #1071
Codecov Report
@@ Coverage Diff @@
## develop #1071 +/- ##
===========================================
- Coverage 81.48% 81.47% -0.01%
===========================================
Files 87 87
Lines 8838 8839 +1
Branches 1425 1426 +1
===========================================
Hits 7202 7202
Misses 1079 1079
- Partials 557 558 +1
I think these changes look okay. I don't remember the original reason behind the |
@nmoinvaz Interesting. Do you agree with my second commit? |
Looks like you were able to make sense of it. The `>> 3` does the bit-to-byte conversion, so I had trouble reading it. `+7` is to round up to the next byte, I assume.
You may be correct about the number for rounding up. Maybe I had used the number for rounding up to the nearest byte. Perhaps if the total was 16 or 24 bits, then we wouldn't need that extra 1 byte added at the end.
I tested the theory, and indeed, changing the 7 bits to 11 bits (totaling 24 bits) removes the need for that extra byte.
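To make the arithmetic in this exchange concrete, here is a minimal sketch of the bits-to-bytes rounding being discussed. The helper names are hypothetical and not part of zlib-ng; the sketch only looks at the constant part of the expression, ignoring `sourceLen`.

```c
#include <stddef.h>

/* Round a bit count up to whole bytes: add 7, then divide by 8. */
static size_t bits_to_bytes(size_t bits) {
    return (bits + 7) >> 3;
}

/* Byte contribution of the constants inside the shift when sourceLen is 0.
 * constant_bits is the 13 from the old formula, pad_bits is the 7 or 11
 * discussed in the comments above. */
static size_t fixed_part_bytes(size_t constant_bits, size_t pad_bits) {
    return (constant_bits + pad_bits) >> 3;
}
```

With these helpers, `fixed_part_bytes(13, 7)` gives 2 bytes (20 bits truncated), while `fixed_part_bytes(13, 11)` gives 3 bytes (exactly 24 bits), which is consistent with the extra `+1` byte becoming unnecessary.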
Looks great!
compress.c (outdated)
    plus the size of block & gzip headers and footers */
    return sourceLen + ((sourceLen + 13 + 7) >> 3) + 18;
    /* Quick deflate strategy worst case is 9 bits per literal, rounded to nearest byte,
    plus the size of block headers, padding and wrappers */
The comment is a bit confusing.
Come to think of it, perhaps the 7 bits was not for the deflate blocks but rather to make sure the bits conversion of sourceLen is rounded up to the next byte. I think that is the case actually, and that should stay as a 7, but that the deflate block header bits also needs to be rounded up to the next byte. I'll amend the commit with that change.
And that still passes the test 😄
I split up the calculation a little so it is easier to read. This way it also matches the comment a little better.
855c712
to
65d8e43
Compare
@Dead2 according to our hardware folks, in degenerate cases this can happen. Fortunately on e.g. squash-benchmark we haven't seen anything that atrocious. EDIT: Just from the description, shouldn't |
@iii-i Thanks for confirming the hardware bounds. I am not sure it should not be 7, but I don't quite understand what you are saying. Could you explain your thinking? My thinking is that 3 + 10 = 13, and thus needs another 3 bits to fill up two bytes. I also think the current non-deflate_quick calculation is a bit pessimistic, but better to err on the safe side. |
At the end of each deflate block there is an end-of-block symbol. I guess that 10 comes from the assumption that the software implementation can never generate an EOBS longer than 10 bits? The format allows it to be as long as 15, if I'm reading "3.2.7. Compression with dynamic Huffman codes (BTYPE=10)" correctly. 15 is also used in DFLTCC calculations ( So, when we add padding to the byte boundary at the end, the header size is actually irrelevant. The way I see the worst case then is: EOBS is 15 bits long and starts at byte X bit 2. Then, the last EOBS bit is at byte X + 2 bit 0. In order to get to the byte boundary, we need to skip the remaining 7 bits of byte X + 2. |
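The worst case described above can be checked with a small helper that computes how many padding bits are needed after a bit field. This is only a sketch of the arithmetic; `pad_to_byte` is a hypothetical name and not a zlib-ng function.

```c
/* Number of padding bits needed to reach the next byte boundary after a
 * field of len_bits bits that starts at bit offset start_bit (0..7)
 * within its first byte. */
static unsigned pad_to_byte(unsigned start_bit, unsigned len_bits) {
    unsigned end = (start_bit + len_bits) & 7; /* bit offset just past the field */
    return end == 0 ? 0 : 8 - end;
}
```

For a 15-bit end-of-block symbol starting at bit 2, `pad_to_byte(2, 15)` returns 7: the symbol ends at bit 0 of the third byte, leaving 7 bits to skip.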
Current code for reference:
@iii-i Hmm. Is that worst case valid, considering we already added 7 bits to the compressed data just in case it ends before a byte boundary? The way I am reading this, we need 3 bits of block header, the compressed data, 7 bits of padding for the compressed data, and 15 bits of EOBS. If so, that would look something like:
I might still be wrong; what do you think?
Looks like the size might be too small in some cases after all; I hadn't noticed the failed CI fuzzer.
Maybe that is related to the 10 vs 15 bits and padding. The above suggestion would increase the estimate by 8 bits, and I think that would fix this problem at least.
Ah, you are right, we have already added 7. I just tried to come up with a formula on my own, and that's what I ended up with:
The initial However, the only difference with the original formula now is that the result is 5 bits larger due to The difference with your approach is that you shift and pad compressed literals and deflate block overhead separately. Why does that help? |
The reason I split it up was to make it easier to read. Therefore: HDRBITS (3) + EOBS (15) = 18 bits, + padding (6) = 24 bits = 3 bytes for the block overhead. Updated pseudo-code to be much more readable:
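The pseudo-code posted with this comment is not preserved here. As a stand-in, the following is a minimal sketch of the arithmetic the comment describes, not the code that was posted or merged. All macro and function names are hypothetical, the wrapper (zlib/gzip header and trailer) is left out, and a single deflate block is assumed.

```c
#include <stddef.h>

#define HDR_BITS   3   /* deflate block header */
#define EOBS_BITS  15  /* longest end-of-block symbol the format allows */
#define PAD_BITS   6   /* padding so the overhead fills whole bytes */
#define BLOCK_OVERHEAD_BYTES ((HDR_BITS + EOBS_BITS + PAD_BITS) >> 3)

#define QUICK_LIT_MAX_BITS 9 /* worst case bits per literal for deflate_quick */

/* Assumed worst case for one raw deflate block: every input byte costs
 * 9 bits (8 bits plus 1 extra), the extra bits are rounded up to bytes,
 * and the block overhead is added on top. */
static size_t quick_bound_sketch(size_t source_len) {
    size_t extra = (source_len * (QUICK_LIT_MAX_BITS - 8) + 7) >> 3;
    return source_len + extra + BLOCK_OVERHEAD_BYTES;
}
```

Keeping the literal cost and the block overhead as separate terms means each constant can be read off against the comment it belongs to. If more than one block is emitted, the overhead term would have to be counted per block.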
Would a 32-bit overflow really be feasible, though? Wouldn't that require allocating an input buffer of ~3641 MiB or thereabouts? And then you would need an output buffer as well. Hmm, but the calculation is perhaps done on signed ints.
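The ~3641 MiB figure is consistent with a bound of roughly 9/8 of the input size wrapping an unsigned 32-bit value. A small sketch under that assumption (the helper name is hypothetical, and the fixed overhead bytes are ignored):

```c
#include <stdint.h>

/* Returns 1 if len + ceil(len / 8) no longer fits in an unsigned 32-bit
 * value, 0 otherwise. The computation is done in 64 bits so the check
 * itself cannot overflow. */
static int bound_overflows_u32(uint64_t len) {
    uint64_t bound = len + ((len + 7) >> 3);
    return bound > UINT32_MAX;
}
```

Under this assumption, an input of 3640 MiB still fits while 3641 MiB does not. With a signed 32-bit type the limit would be reached at roughly half that size.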
This calculation is mainly used for deflate quick which doesn't use The value 7 comes from |
We may not be able to accurately calculate the exact number of bits/bytes. Back in the day there was only one deflate block that deflate quick used. But now due to other issues we had to have more than one deflate block. See |
@nmoinvaz Right, there is that too. Suggestions for changes to the last pseudo-code I posted? |
I think that looks great. Otherwise overhead is misspelled |
59afb8f
to
904f665
Compare
I made a little test that tests a bunch of different buffer sizes. |
Remove workaround in switchlevels.c, so we do actual testing of this. Use named defines instead of magic numbers where we can.
- Fix hangs on macOS #1031
- Fix minideflate write buffers being overwritten #1060
- Fix deflateBound and compressBound returning too small size estimates #1071
- Fix build problems when building outside of source dir #1049
- Fix build problems on arm2-7 #1030
- Fixed some compile warnings #1020 #1036 #1037 #1048
- Improved posix memalign support #888
- Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067
- Improvements for integration into other projects #1022 #1042
- Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031
- Fix minideflate write buffers being overwritten #1060
- Fix deflateBound and compressBound returning too small size estimates #1049 #1071
- Fix incorrect function declaration warning #1080
- Fix build problems when building outside of source dir #1049
- Fix build problems on arm2-7 #1030
- Fixed some compile warnings #1020 #1036 #1037 #1048
- Improved posix memalign support #888
- Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067
- Improvements for integration into other projects #1022 #1042
- Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031
- Fix minideflate write buffers being overwritten #1060
- Fix deflateBound and compressBound returning too small size estimates #1049 #1071
- Fix incorrect function declaration warning #1080
- Fix build problems when building outside of source dir #1049
- Fix build problems on arm2-7 #1030
- Fixed some compile warnings #1020 #1036 #1037 #1048
- Improved posix memalign support #888
- Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079
- Improvements for integration into other projects #1022 #1042
- Code style fixes #637 #1040 #1050 #1075
Fixes deflateBound and compressBound returning very small size estimates.
Remove workaround in switchlevels.c, so we do actual testing of this.
I also made a few of the magic numbers more understandable by using defines.
I would have liked to do so with the rest too.
I reasoned that the `+ 13` comes from compressBound, and thus includes the zlib header size (6 bytes). That seems to be the reason why it was `+13 - 6` in deflateBound, since we do not always want to use the zlib header type there.
The remaining 7 bytes I am unsure about; raw deflate block headers? I was unable to find any documentation saying more than 2-3 bytes for that though, so I left that as a 7.
Also not sure why `((sourceLen + 13 + 7) >> 3)` is used for deflate_quick. It kind of looks like those additions should have been outside of the bit shift, but without any comments they are just magic numbers. They also fail to add enough when the buffer size is 1 byte, for example, thus the reason for adding the `+1`.
With these changes, the buffer size ends up being a lot bigger, but several of the changes we have made have affected the worst-case compressed size. Deflate_quick is a big one, for example. Possibly we could investigate why level 1 becomes bigger than level 0 for random data, but I suspect it has to do with skipping extra checks for raw speed.
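The point about `((sourceLen + 13 + 7) >> 3)` can be illustrated by comparing the constants placed inside and outside the shift. This is only a sketch with hypothetical helper names; it shows the difference in arithmetic and makes no claim about which form was originally intended.

```c
#include <stddef.h>

/* Constants inside the shift are divided by 8 along with sourceLen,
 * so they behave like bits. */
static size_t extra_inside_shift(size_t source_len) {
    return (source_len + 13 + 7) >> 3;
}

/* Constants outside the shift are added in full, so they behave like bytes. */
static size_t extra_outside_shift(size_t source_len) {
    return (source_len >> 3) + 13 + 7;
}
```

For a 1-byte buffer, the first form adds only 2 bytes while the second adds 20, which shows how little headroom the original expression leaves for very small inputs.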
Also since hash_bits is now a define and not set depending on wsize, the check on hash_bits no longer checks wsize, and thus we need to be more pessimistic. (unless we add window_bits to the state struct so we can verify that is actually 15)
I also considered using the DFLTCC calculation, but that thing is calculating around 2x the size of the input buffer. Is that a mistake, or does the hardware compression worst case actually become that big? @iii-i
Resolves the problem reported in #1039