-
-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed unsigned integer overflow ASAN error when hash_head > s->strstart. #772
Conversation
nmoinvaz
commented
Sep 23, 2020
Codecov Report
@@ Coverage Diff @@
## develop #772 +/- ##
===========================================
+ Coverage 73.40% 75.42% +2.02%
===========================================
Files 127 70 -57
Lines 13801 7688 -6113
Branches 2525 1335 -1190
===========================================
- Hits 10131 5799 -4332
+ Misses 2629 1355 -1274
+ Partials 1041 534 -507
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
43c02b5
to
cafee06
Compare
I have added a second commit which is an alternative way to fix it. It also includes some changes to |
cafee06
to
5d50538
Compare
Rebased |
5d50538
to
84455c6
Compare
Rebased again. |
84455c6
to
322f7d7
Compare
Rebased |
Baseline
PR #772 322f7d7
deflate_quick is ~7.7% slower, but compresses slightly better. I do wish some in-line comments were added to better describe what is going on and why the change. Could you either add in-line comments if relevant (helpful for future readers too), or add a comment describing the changes in the PR/commit text? |
Just to compare the effects, it could be wise to change the check also on Line 85 in 322f7d7
|
@mtl1979 With the check in insert_string changed to match, the compressed sizes remained the same as with this PR. |
322f7d7 + insert_string check fix
I don't really see any significant changes. |
@Dead2 I didn't expect big change as quick_insert_string is used anyways for each match... It was more about keeping the two functions as similar as possible without causing any regressions. Basically when sliding the window, small amount of hashes in the table need to be detached because first match after the slide is inserted logically backwards, before the byte sequence it is duplicate of. |
Regression in deflate_medium might be because it doesn't update the hash table if the order of inserts is incorrect (which is likely)... It might help to break it to two separate tests, one for inequality test and another for comparing which one of
|
I think in that last statement it should be |
Compresses slightly better might be due to the |
322f7d7
to
dd603a0
Compare
@nmoinvaz I wasn't sure if comparing inside the inequality check is a good thing as when updating the hash chain, the old values already in the chain can be garbage after sliding the window, and the actual hash chain updating order in that case is B, A, C instead of A, B, C... |
dd603a0
to
7e86eae
Compare
DEVELOP HEAD
7e86eae PR
|
The slow down in deflate_quick was due to equality operator change |
There does appear to be a small increase in compression. In my other tests I was able to squeeze a bit more out if I returned
|
@nmoinvaz It all comes to the question if the time spent is worth the extra compression rate... Like I said earlier, in some cases the hash indices can be out of order and in those cases returning 0 will split the hash chain... I was thinking if the insert is out of order, we should just fix the prev table by updating entries for both indices (head and str)... that way C, A, B order would change so that prev of C is B and prev of B is A... essentially just needs one extra temporary variable.
|
Baseline
PR #772 7e86eae
Test@@ -85,8 +85,9 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
if (LIKELY(s->lookahead >= MIN_MATCH)) {
hash_head = functable.quick_insert_string(s, s->strstart);
+ dist = s->strstart - hash_head;
- if (hash_head != 0 && (dist = s->strstart - hash_head) < MAX_DIST(s)) {
+ if (hash_head != 0 && dist < MAX_DIST(s)) {
match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);
if (match_len >= MIN_MATCH) {
So with this PR, deflate_quick is approx 10% slower, and on this dataset it also compresses 1 byte worse. I included a test of how performance is affected when not doing the assignment inside the if statement, it helps a bit but not a whole lot. |
It's possible
|
@mtl1979 I do not have the final numbers yet, but the first few test runs unfortunately do not look faster. |
@Dead2 It's possible compiler is already reordering the statements as it knows the |
Test2--- a/deflate_quick.c
+++ b/deflate_quick.c
@@ -86,7 +86,9 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
if (LIKELY(s->lookahead >= MIN_MATCH)) {
hash_head = functable.quick_insert_string(s, s->strstart);
+ if (hash_head != 0) {
- if (hash_head != 0 && (dist = s->strstart - hash_head) < MAX_DIST(s)) {
+ dist = s->strstart - hash_head;
+ if (dist < MAX_DIST(s)) {
match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);
if (match_len >= MIN_MATCH) {
@@ -102,6 +104,7 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
}
}
}
+ }
zng_tr_emit_lit(s, static_ltree, s->window[s->strstart]);
s->strstart++;
(whitespace changes suppressed)
|
It looks like me that checking hash_head right after it was set makes a HOL-blockage, and neither compiler nor speculative execution in the cpu is able to make any positive use of that deadtime. |
@Dead2 There isn't much we can do about any blockage as big chunk of the code is run only if both tests pass... so basically we can only make sure there is no unnecessary register spills due to too many variables... That's why I previously suggested narrowing the variable scopes as much as possible... |
So far, this looks like the fastest alternative, for whatever reason.. @@ -85,8 +85,9 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
if (LIKELY(s->lookahead >= MIN_MATCH)) {
hash_head = functable.quick_insert_string(s, s->strstart);
+ dist = s->strstart - hash_head;
- if (hash_head != 0 && (dist = s->strstart - hash_head) < MAX_DIST(s)) {
+ if (dist < MAX_DIST(s) && hash_head != 0) {
match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);
if (match_len >= MIN_MATCH) {
|
Has anybody tried only the first commit? |
@Dead2 Like I said earlier, it all becomes down to which one of the choices is most likely and if the compiler and CPU chooses the same one... Sometimes it defies logic and that's why I often ask people to test different choices to see which one is best... |
Here is the test of the first commit. I would pull this no problem.
|
7e86eae
to
e4d560b
Compare
I made a few tweaks. Instead of:
We can do:
This should also remove the asan warning because it casts to Here are the results (2 runs of each build): HEAD
e4d560b
|
I also found that performance improvement yesterday evening, great to see you incorporated it. This looks great, but now I think you should actually set dist outside of the if. |
Yes, I know I used your findings for the performance enhancement. 😀 I will run some tests and move the statement to the other line, like you said. It now is better like that. |
zlib-ng/deflate_medium.c:244:47: runtime error: unsigned integer overflow: 58442 - 58452 cannot be represented in type 'unsigned int' Co-authored-by: Mika Lindqvist <postmaster@raasu.org> Co-authored-by: Hans Kristian Rosbach <hk-git@circlestorm.org>
e4d560b
to
78770a4
Compare
HEAD
78770a4
|
Commit bc5915e ("Fixed unsigned integer overflow ASAN error when hash_head > s->strstart.") removed hash_head != 0 checks in fast, medium and slow deflate, because it improved performance [1]. Unfortunately, the attached test started failing after that. Apparently, as the comments suggest, the code implicitly relies on matches with the beginning of the window being skipped. So restore the check. [1] zlib-ng#772 (comment)
Commit bc5915e ("Fixed unsigned integer overflow ASAN error when hash_head > s->strstart.") removed hash_head != 0 checks in fast, medium and slow deflate, because it improved performance [1]. Unfortunately, the attached test started failing after that. Apparently, as the comments suggest, the code implicitly relies on matches with the beginning of the window being skipped. So restore the check. [1] zlib-ng#772 (comment)
Commit bc5915e ("Fixed unsigned integer overflow ASAN error when hash_head > s->strstart.") removed hash_head != 0 checks in fast, medium and slow deflate, because it improved performance [1]. Unfortunately, the attached test started failing after that. Apparently, as the comments suggest, the code implicitly relies on matches with the beginning of the window being skipped. So restore the check. [1] #772 (comment)