
Fixed unsigned integer overflow ASAN error when hash_head > s->strstart. #772

Merged: 1 commit merged into develop on Oct 18, 2020


nmoinvaz
Member

  zlib-ng/deflate_medium.c:244:47: runtime error: unsigned integer overflow: 58442 - 58452 cannot be represented in type 'unsigned int'
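The values in the report show the problem directly: `s->strstart` (58442) is smaller than `hash_head` (58452), so `s->strstart - hash_head` wraps around in `unsigned int`. Unsigned wraparound is well-defined in C, but Clang's `-fsanitize=unsigned-integer-overflow` check (part of UBSan's `integer` group), which produces this style of `runtime error:` message, reports it anyway. Below is a minimal sketch of an overflow-free guard. It assumes a hypothetical helper `match_dist()` and a caller-supplied `max_dist` standing in for `MAX_DIST(s)`. It is not the code merged in this PR:

```c
#include <limits.h>

/* Hypothetical helper (not zlib-ng API): returns the distance back to a
 * candidate match, or 0 if the candidate is unusable.
 *   hash_head == 0        -> no previous occurrence in this hash bucket
 *   hash_head >= strstart -> subtracting would wrap (or give a zero distance)
 *   dist >= max_dist      -> too far back (max_dist stands in for MAX_DIST(s)) */
static unsigned match_dist(unsigned strstart, unsigned hash_head, unsigned max_dist) {
    if (hash_head == 0 || hash_head >= strstart)
        return 0;                           /* reject before subtracting */
    unsigned dist = strstart - hash_head;   /* cannot wrap: hash_head < strstart */
    return dist < max_dist ? dist : 0;
}

/* The unguarded expression from the report: well-defined in C, but it wraps. */
static unsigned raw_dist(unsigned strstart, unsigned hash_head) {
    return strstart - hash_head;            /* 58442 - 58452 == UINT_MAX - 9 */
}
```

With the values from the report, `raw_dist(58442, 58452)` yields `UINT_MAX - 9`, while `match_dist(58442, 58452, max_dist)` returns 0 without performing the subtraction.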

@codecov

codecov bot commented Sep 23, 2020

Codecov Report

Merging #772 into develop will increase coverage by 2.02%.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop     #772      +/-   ##
===========================================
+ Coverage    73.40%   75.42%   +2.02%     
===========================================
  Files          127       70      -57     
  Lines        13801     7688    -6113     
  Branches      2525     1335    -1190     
===========================================
- Hits         10131     5799    -4332     
+ Misses        2629     1355    -1274     
+ Partials      1041      534     -507     
Flag Coverage Δ
#macos_clang 63.74% <54.54%> (-0.03%) ⬇️
#macos_gcc 63.74% <54.54%> (-0.03%) ⬇️
#ubuntu_clang 68.85% <100.00%> (-1.13%) ⬇️
#ubuntu_clang_debug 68.32% <100.00%> (-1.11%) ⬇️
#ubuntu_clang_inflate_allow_invalid_dist 68.50% <100.00%> (-1.22%) ⬇️
#ubuntu_clang_inflate_strict 68.77% <100.00%> (-1.23%) ⬇️
#ubuntu_clang_mmap 68.54% <100.00%> (-1.19%) ⬇️
#ubuntu_clang_msan 68.49% <100.00%> (-1.23%) ⬇️
#ubuntu_gcc 72.63% <100.00%> (+0.95%) ⬆️
#ubuntu_gcc_aarch64 ?
#ubuntu_gcc_aarch64_compat_no_opt ?
#ubuntu_gcc_aarch64_no_acle ?
#ubuntu_gcc_aarch64_no_neon ?
#ubuntu_gcc_armhf 72.00% <100.00%> (+0.29%) ⬆️
#ubuntu_gcc_armhf_compat_no_opt 70.58% <57.14%> (+0.26%) ⬆️
#ubuntu_gcc_armhf_no_acle 71.01% <100.00%> (+0.30%) ⬆️
#ubuntu_gcc_armhf_no_neon 71.32% <100.00%> (+0.30%) ⬆️
#ubuntu_gcc_armsf ?
#ubuntu_gcc_armsf_compat_no_opt ?
#ubuntu_gcc_compat_no_opt 70.56% <57.14%> (?)
#ubuntu_gcc_mingw_i686 69.21% <100.00%> (-0.99%) ⬇️
#ubuntu_gcc_mingw_x86_64 69.27% <100.00%> (-1.03%) ⬇️
#ubuntu_gcc_no_avx2 71.90% <100.00%> (+0.26%) ⬆️
#ubuntu_gcc_no_pclmulqdq 68.85% <100.00%> (-1.05%) ⬇️
#ubuntu_gcc_no_sse2 71.20% <100.00%> (+0.26%) ⬆️
#ubuntu_gcc_no_sse4 ?
#ubuntu_gcc_o3 69.83% <100.00%> (-1.01%) ⬇️
#ubuntu_gcc_osb 70.14% <100.00%> (-1.06%) ⬇️
#ubuntu_gcc_ppc 71.92% <100.00%> (+0.30%) ⬆️
#ubuntu_gcc_ppc64 71.15% <100.00%> (+0.30%) ⬆️
#ubuntu_gcc_ppc64le 70.55% <100.00%> (+0.31%) ⬆️
#ubuntu_gcc_s390x 70.02% <100.00%> (+0.51%) ⬆️
#ubuntu_gcc_sparc64 67.91% <100.00%> (-3.36%) ⬇️
#win64_gcc 73.19% <100.00%> (-0.07%) ⬇️
#win64_gcc_compat_no_opt 74.72% <57.14%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
deflate_fast.c 92.68% <100.00%> (+0.18%) ⬆️
deflate_medium.c 85.06% <100.00%> (-1.12%) ⬇️
deflate_quick.c 95.23% <100.00%> (ø)
deflate_slow.c 98.30% <100.00%> (+0.02%) ⬆️
chunkset.c 93.33% <0.00%> (-6.67%) ⬇️
chunkset_tpl.h 96.07% <0.00%> (-2.95%) ⬇️
test/example.c 76.13% <0.00%> (-0.38%) ⬇️
gzread.c 54.60% <0.00%> (-0.32%) ⬇️
gzlib.c 54.68% <0.00%> (-0.21%) ⬇️
... and 66 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b55680b...78770a4.

@nmoinvaz
Member Author

I have added a second commit that is an alternative way to fix it. It also includes changes to deflate_quick so that it additionally checks for a return value of 0 from quick_insert_string; I don't think this case was covered previously.

@nmoinvaz
Member Author

Rebased

@nmoinvaz
Member Author

Rebased again.

@nmoinvaz
Member Author

Rebased

@Dead2
Member

Dead2 commented Sep 25, 2020

Baseline

   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.297/1.317/1.326/0.008        0.872/0.888/0.899/0.007      127,296,397
 2     43.874%      1.252/1.267/1.273/0.006        0.450/0.463/0.470/0.005       48,329,132
 3     42.490%      1.317/1.330/1.336/0.005        0.375/0.384/0.389/0.004       40,118,068
 4     41.472%      1.280/1.296/1.303/0.005        0.298/0.311/0.318/0.005       32,630,864
 5     41.212%      1.390/1.405/1.412/0.005        0.290/0.311/0.317/0.006       32,425,995
 6     41.037%      1.318/1.329/1.334/0.004        0.238/0.247/0.252/0.004       25,831,160
 7     40.784%      1.277/1.291/1.297/0.005        0.174/0.184/0.190/0.004       19,253,737
 8     40.706%      1.098/1.108/1.113/0.004        0.108/0.121/0.127/0.005       12,811,166
 9     40.695%      1.220/1.229/1.234/0.004        0.114/0.121/0.127/0.003       12,807,941

 avg1  43.339%                        1.286                          0.337
 tot                                347.147                         90.919      351,504,460

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.302/1.319/1.331/0.009        0.869/0.888/0.896/0.007      127,296,397
 2     43.874%      1.246/1.266/1.273/0.007        0.450/0.462/0.467/0.005       48,329,132
 3     42.490%      1.320/1.333/1.339/0.005        0.376/0.387/0.393/0.005       40,118,068
 4     41.472%      1.285/1.300/1.304/0.004        0.295/0.312/0.318/0.007       32,630,864
 5     41.212%      1.392/1.404/1.412/0.006        0.297/0.309/0.316/0.005       32,425,995
 6     41.037%      1.322/1.331/1.337/0.004        0.235/0.248/0.252/0.005       25,831,160
 7     40.784%      1.275/1.291/1.296/0.004        0.167/0.183/0.187/0.005       19,253,737
 8     40.706%      1.093/1.109/1.113/0.005        0.117/0.122/0.124/0.003       12,811,166
 9     40.695%      1.222/1.230/1.235/0.003        0.114/0.121/0.127/0.003       12,807,941

 avg1  43.339%                        1.287                          0.337
 tot                                347.522                         90.931      351,504,460

PR #772 322f7d7

   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.780%      1.394/1.421/1.430/0.008        0.864/0.883/0.894/0.008      127,294,824
 2     43.874%      1.247/1.263/1.270/0.006        0.444/0.463/0.470/0.005       48,329,132
 3     42.490%      1.317/1.328/1.334/0.005        0.373/0.383/0.389/0.004       40,118,068
 4     41.472%      1.286/1.298/1.303/0.004        0.292/0.314/0.321/0.006       32,630,870
 5     41.212%      1.386/1.404/1.410/0.006        0.303/0.310/0.316/0.004       32,426,061
 6     41.038%      1.312/1.328/1.335/0.007        0.229/0.248/0.252/0.005       25,831,174
 7     40.784%      1.285/1.295/1.299/0.004        0.174/0.183/0.190/0.004       19,253,737
 8     40.706%      1.101/1.110/1.114/0.003        0.114/0.122/0.124/0.003       12,811,166
 9     40.695%      1.215/1.231/1.235/0.005        0.100/0.120/0.124/0.005       12,807,941

 avg1  43.339%                        1.298                          0.336
 tot                                350.344                         90.771      351,502,973

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.780%      1.404/1.424/1.434/0.008        0.861/0.878/0.891/0.009      127,294,824
 2     43.874%      1.248/1.263/1.271/0.006        0.450/0.461/0.470/0.005       48,329,132
 3     42.490%      1.322/1.330/1.335/0.004        0.369/0.386/0.392/0.005       40,118,068
 4     41.472%      1.286/1.297/1.304/0.005        0.291/0.313/0.318/0.006       32,630,870
 5     41.212%      1.391/1.404/1.411/0.006        0.290/0.308/0.313/0.006       32,426,061
 6     41.038%      1.315/1.329/1.336/0.006        0.229/0.247/0.252/0.005       25,831,174
 7     40.784%      1.276/1.292/1.296/0.005        0.164/0.183/0.187/0.005       19,253,737
 8     40.706%      1.099/1.111/1.115/0.004        0.107/0.121/0.127/0.004       12,811,166
 9     40.695%      1.222/1.232/1.235/0.004        0.110/0.121/0.127/0.005       12,807,941

 avg1  43.339%                        1.298                          0.335
 tot                                350.449                         90.536      351,502,973

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.780%      1.408/1.426/1.434/0.007        0.870/0.886/0.893/0.006      127,294,824
 2     43.874%      1.255/1.264/1.269/0.004        0.441/0.464/0.470/0.006       48,329,132
 3     42.490%      1.314/1.327/1.333/0.005        0.369/0.387/0.392/0.006       40,118,068
 4     41.472%      1.278/1.292/1.298/0.005        0.302/0.313/0.318/0.005       32,630,870
 5     41.212%      1.392/1.403/1.410/0.005        0.293/0.311/0.316/0.005       32,426,061
 6     41.038%      1.317/1.327/1.333/0.004        0.232/0.248/0.255/0.007       25,831,174
 7     40.784%      1.271/1.289/1.294/0.005        0.174/0.185/0.187/0.004       19,253,737
 8     40.706%      1.099/1.109/1.113/0.003        0.110/0.121/0.124/0.003       12,811,166
 9     40.695%      1.217/1.229/1.234/0.005        0.114/0.121/0.124/0.003       12,807,941

 avg1  43.339%                        1.296                          0.337
 tot                                349.984                         91.045      351,502,973

deflate_quick is ~7.7% slower, but compresses slightly better.
deflate_medium compresses slightly worse.

I do wish some in-line comments were added to better describe what is going on and why the change was made.
As I see it, there are three logic changes here, but they are very difficult to review.

Could you either add in-line comments where relevant (helpful for future readers too), or add a comment describing the changes in the PR/commit text?

@mtl1979
Collaborator

mtl1979 commented Sep 25, 2020

Just to compare the effects, it could be wise to also change the check in insert_string on

if (LIKELY(head != idx)) {

as it still updates the hash tables even though nothing is returned.

@Dead2
Member

Dead2 commented Sep 25, 2020

@mtl1979 With the check in insert_string changed to match, the compressed sizes remained the same as with this PR.
Benchmarks are running, but I doubt it will show any change significant enough to measure reliably.

@Dead2
Member

Dead2 commented Sep 25, 2020

322f7d7 + insert_string check fix

   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.780%      1.407/1.426/1.435/0.007        0.870/0.887/0.894/0.007      127,294,824
 2     43.874%      1.252/1.263/1.269/0.004        0.444/0.461/0.470/0.007       48,329,132
 3     42.490%      1.300/1.327/1.334/0.008        0.376/0.385/0.389/0.004       40,118,068
 4     41.472%      1.281/1.294/1.298/0.004        0.305/0.313/0.318/0.005       32,630,870
 5     41.212%      1.391/1.401/1.407/0.005        0.296/0.309/0.316/0.005       32,426,061
 6     41.038%      1.315/1.329/1.335/0.006        0.235/0.247/0.252/0.005       25,831,174
 7     40.784%      1.286/1.292/1.297/0.003        0.174/0.183/0.187/0.004       19,253,737
 8     40.706%      1.105/1.111/1.115/0.003        0.113/0.121/0.124/0.003       12,811,166
 9     40.695%      1.215/1.229/1.234/0.004        0.113/0.121/0.127/0.004       12,807,941

 avg1  43.339%                        1.297                          0.336
 tot                                350.150                         90.824      351,502,973

I don't really see any significant changes.

@mtl1979
Collaborator

mtl1979 commented Sep 27, 2020

@Dead2 I didn't expect a big change, as quick_insert_string is used anyway for each match. It was more about keeping the two functions as similar as possible without causing any regressions. Basically, when sliding the window, a small number of hashes in the table need to be detached, because the first match after the slide is inserted logically backwards, before the byte sequence it is a duplicate of.
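For context on the window slide: in zlib (and zlib-ng), when the window moves down by `wsize` bytes, every position stored in `head[]` and `prev[]` is rebased by subtracting `wsize`, and entries that would fall below zero are cleared to 0 (NIL). The sketch below shows that rebasing on a plain array. It is modeled on zlib's `slide_hash()`, but `slide_table()` is a hypothetical name and the loop is simplified, with no SIMD. It only illustrates why chain entries can become 0 after a slide. It does not reproduce the ordering issue described above:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Pos;   /* 16-bit hash-table position, like zlib's Pos */
#define NIL 0           /* 0 marks "no previous occurrence" */

/* Rebase every stored position after the window slides down by wsize bytes.
 * Positions older than the slide are no longer in the window and become NIL. */
static void slide_table(Pos *table, size_t n, unsigned wsize) {
    for (size_t i = 0; i < n; i++) {
        unsigned m = table[i];
        table[i] = (Pos)(m >= wsize ? m - wsize : NIL);
    }
}
```

A position of exactly `wsize` becomes 0, which is the same value as NIL.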

@mtl1979
Copy link
Collaborator

mtl1979 commented Sep 27, 2020

The regression in deflate_medium might be because it doesn't update the hash table if the order of inserts is incorrect (which is likely). It might help to break it into two separate tests: one for the inequality test, and another comparing which of head and str is higher to decide the return value.

if (LIKELY(head != str)) {
    s->prev[str & s->w_mask] = head;
    s->head[hm] = str;
}
if (UNLIKELY(head < str)) {
    return 0;
}

@nmoinvaz
Member Author

nmoinvaz commented Oct 2, 2020

I think in that last statement it should be head > str.
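The split-test suggestion and this correction can be combined as shown below. This is a simplified, self-contained model with several assumptions. The hash value `hm` is passed in instead of being computed from the window, and the tables are tiny. `toy_insert()` is a hypothetical name, not zlib-ng's `quick_insert_string()`. The sketch uses `>=` rather than `>` so that re-inserting the same position (head == str) is not reported as a zero-distance match; that choice is an assumption of this sketch. The table update is kept separate from the return-value decision, so an out-of-order position is still linked, while the caller is told there is no usable match:

```c
#include <stdint.h>

typedef uint16_t Pos;
#define TOY_HASH_SIZE 8u    /* hypothetical tiny hash table */
#define TOY_WSIZE     64u   /* hypothetical tiny window, power of two */

typedef struct {
    Pos head[TOY_HASH_SIZE];   /* newest position seen for each hash value */
    Pos prev[TOY_WSIZE];       /* previous position with the same hash */
} toy_state;

/* Insert position str under hash value hm (hm < TOY_HASH_SIZE, str < TOY_WSIZE).
 * Returns an earlier position to try as a match, or 0 for "none". */
static Pos toy_insert(toy_state *s, unsigned hm, Pos str) {
    Pos head = s->head[hm];
    if (head != str) {                       /* test 1: link str into the chain */
        s->prev[str & (TOY_WSIZE - 1)] = head;
        s->head[hm] = str;
    }
    if (head >= str)                         /* test 2: only return older positions */
        return 0;
    return head;
}
```

Inserting 10, 20 and then 15 into the same bucket returns 0, 10 and 0. After the third insert the chain reads 15 -> 20 -> 10, so `prev` points forward. This is the out-of-order case discussed in the following comments.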

@nmoinvaz
Member Author

nmoinvaz commented Oct 2, 2020

> deflate_quick is ~7.7% slower, but compresses slightly better.

The slightly better compression might be due to changing the distance check from dist < MAX_DIST(s) to dist <= MAX_DIST(s).

@mtl1979
Collaborator

mtl1979 commented Oct 2, 2020

@nmoinvaz I wasn't sure whether comparing inside the inequality check is a good idea. When the hash chain is updated, the old values already in the chain can be garbage after sliding the window, and in that case the actual hash chain update order is B, A, C instead of A, B, C.

@nmoinvaz
Member Author

nmoinvaz commented Oct 4, 2020

DEVELOP HEAD

 Tool: minigzip-head.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     47.621%      1.006/1.015/1.022/0.005        0.641/0.652/0.658/0.006      100,931,528
 2     35.523%      1.637/1.662/1.672/0.008        0.660/0.675/0.685/0.007       75,289,388
 3     34.202%      1.962/1.995/2.007/0.009        0.645/0.658/0.667/0.006       72,490,913
 4     32.935%      2.358/2.386/2.398/0.011        0.631/0.640/0.645/0.004       69,804,530
 5     32.667%      2.536/2.564/2.576/0.010        0.620/0.635/0.642/0.006       69,236,504
 6     32.515%      2.950/2.980/2.994/0.011        0.623/0.632/0.639/0.005       68,915,028
 7     32.257%      3.955/3.978/3.990/0.009        0.617/0.629/0.636/0.005       68,367,404
 8     32.172%      6.127/6.167/6.180/0.013        0.620/0.628/0.633/0.003       68,186,811
 9     32.156%      8.540/8.580/8.606/0.018        0.618/0.628/0.636/0.006       68,154,042

 avg1  34.672%                        3.481                          0.642
 tot                                939.869                        173.338      661,376,148


 Tool: minigzip-head.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     47.621%      0.989/1.003/1.011/0.006        0.640/0.655/0.661/0.005      100,931,528
 2     35.523%      1.647/1.661/1.669/0.006        0.657/0.670/0.679/0.006       75,289,388
 3     34.202%      1.973/1.989/2.002/0.009        0.641/0.656/0.663/0.006       72,490,913
 4     32.935%      2.369/2.389/2.402/0.009        0.625/0.638/0.644/0.005       69,804,530
 5     32.667%      2.538/2.558/2.567/0.008        0.624/0.633/0.641/0.005       69,236,504
 6     32.515%      2.951/2.974/2.983/0.008        0.617/0.631/0.637/0.006       68,915,028
 7     32.257%      3.947/3.973/3.985/0.010        0.618/0.627/0.632/0.005       68,367,404
 8     32.172%      6.144/6.170/6.183/0.012        0.617/0.627/0.633/0.004       68,186,811
 9     32.156%      8.555/8.585/8.598/0.012        0.616/0.625/0.632/0.005       68,154,042

 avg1  34.672%                        3.478                          0.640
 tot                                939.098                        172.814      661,376,148

7e86eae PR

 Tool: minigzip-asan-v4.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     47.621%      1.055/1.068/1.075/0.005        0.638/0.653/0.661/0.006      100,931,528
 2     35.523%      1.648/1.667/1.682/0.009        0.662/0.674/0.682/0.006       75,289,388
 3     34.202%      1.990/2.002/2.012/0.006        0.646/0.659/0.668/0.006       72,490,913
 4     32.935%      2.369/2.390/2.402/0.008        0.625/0.639/0.648/0.006       69,804,394
 5     32.667%      2.539/2.564/2.573/0.008        0.625/0.637/0.643/0.005       69,235,922
 6     32.515%      2.962/2.982/2.993/0.008        0.621/0.633/0.639/0.004       68,914,867
 7     32.257%      3.938/3.977/3.993/0.011        0.622/0.630/0.637/0.004       68,367,404
 8     32.172%      6.124/6.172/6.188/0.015        0.616/0.627/0.633/0.004       68,186,811
 9     32.156%      8.521/8.567/8.600/0.024        0.615/0.628/0.636/0.007       68,154,042

 avg1  34.672%                        3.488                          0.642
 tot                                941.693                        173.409      661,375,269

@nmoinvaz
Member Author

nmoinvaz commented Oct 4, 2020

The slowdown in deflate_quick was due to changing the comparison from < MAX_DIST(s) to <= MAX_DIST(s), so I backed that out; I can save it for another PR. I think it only takes longer because of the nature of the content being compressed.

@nmoinvaz
Member Author

nmoinvaz commented Oct 4, 2020

There does appear to be a small improvement in compression. In my other tests I was able to squeeze out a bit more by returning prev from quick_insert_string in cases where it would otherwise have returned 0, but it consumed more time:

    head = s->head[hm];
    if (LIKELY(head != str)) {
        s->prev[str & s->w_mask] = head;
        s->head[hm] = str;
    } else {
        head = s->prev[str & s->w_mask];
    }
    if (UNLIKELY(head >= str)) {
        return 0;
    }
    return head;

@mtl1979
Collaborator

mtl1979 commented Oct 4, 2020

@nmoinvaz It all comes down to whether the time spent is worth the extra compression. Like I said earlier, in some cases the hash indices can be out of order, and in those cases returning 0 will split the hash chain.

I was thinking that if the insert is out of order, we should just fix the prev table by updating the entries for both indices (head and str). That way the C, A, B order would change so that the prev of C is B and the prev of B is A. It essentially needs just one extra temporary variable:

if (UNLIKELY(head > str)) {
    Pos tmp = s->prev[head & s->w_mask];
    s->prev[head & s->w_mask] = str;
    s->prev[str & s->w_mask] = tmp;
    // s->head[hm] is not updated
}
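Here is a runnable model of this reordering, using toy tables of the same kind. The names `toy_chain` and `toy_insert_sorted()` and the table sizes are hypothetical, not zlib-ng code. When `str` arrives after a newer position `head` in the same bucket, it is spliced in behind `head` instead of becoming the new head, so in the C, A, B example the chain ends up in newest-to-oldest order:

```c
#include <stdint.h>

typedef uint16_t Pos;
#define TOY_HASH_SIZE 8u
#define TOY_WSIZE     64u    /* power of two, so TOY_WSIZE - 1 acts as w_mask */

typedef struct {
    Pos head[TOY_HASH_SIZE];
    Pos prev[TOY_WSIZE];
} toy_chain;

/* Insert str under hash hm; an out-of-order str is spliced in behind head. */
static void toy_insert_sorted(toy_chain *s, unsigned hm, Pos str) {
    const unsigned mask = TOY_WSIZE - 1;
    Pos head = s->head[hm];
    if (head < str) {                 /* normal case: str becomes the new head */
        s->prev[str & mask] = head;
        s->head[hm] = str;
    } else if (head > str) {          /* out of order: splice str in behind head */
        Pos tmp = s->prev[head & mask];
        s->prev[head & mask] = str;
        s->prev[str & mask] = tmp;    /* s->head[hm] is not updated */
    }                                 /* head == str: already linked, nothing to do */
}
```

Inserting A = 10, C = 30 and then B = 20 leaves `head` at 30, with `prev[30] == 20` and `prev[20] == 10`. This one-step splice restores order only when `str` falls between `head` and the entry behind it. If several newer entries precede `str`, the sketch would have to walk the chain, at additional cost.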

@Dead2
Member

Dead2 commented Oct 14, 2020

Baseline

   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Tool: minigzip   Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.286/1.308/1.319/0.009        0.867/0.883/0.897/0.009      127,296,397
 2     43.874%      1.247/1.264/1.270/0.005        0.440/0.458/0.467/0.006       48,329,132
 3     42.490%      1.302/1.324/1.332/0.007        0.379/0.386/0.392/0.005       40,118,068
 4     41.472%      1.278/1.298/1.304/0.007        0.298/0.313/0.318/0.005       32,630,864
 5     41.212%      1.389/1.407/1.412/0.005        0.290/0.310/0.316/0.005       32,425,995
 6     41.037%      1.311/1.324/1.331/0.005        0.238/0.247/0.252/0.003       25,831,160
 7     40.784%      1.278/1.293/1.297/0.005        0.174/0.184/0.187/0.004       19,253,737
 8     40.706%      1.106/1.111/1.115/0.003        0.117/0.123/0.127/0.004       12,811,166
 9     40.695%      1.225/1.230/1.232/0.002        0.113/0.121/0.127/0.003       12,807,941

 avg1  43.339%                        1.284                          0.336
 tot                                346.743                         90.737      351,504,460

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.284/1.308/1.320/0.010        0.857/0.881/0.890/0.008      127,296,397
 2     43.874%      1.247/1.262/1.271/0.006        0.450/0.462/0.467/0.004       48,329,132
 3     42.490%      1.306/1.324/1.330/0.006        0.372/0.387/0.392/0.004       40,118,068
 4     41.472%      1.284/1.296/1.301/0.004        0.294/0.313/0.318/0.006       32,630,864
 5     41.212%      1.389/1.408/1.413/0.005        0.303/0.311/0.316/0.004       32,425,995
 6     41.037%      1.314/1.324/1.328/0.004        0.235/0.248/0.252/0.004       25,831,160
 7     40.784%      1.283/1.292/1.296/0.003        0.167/0.184/0.190/0.005       19,253,737
 8     40.706%      1.099/1.111/1.115/0.004        0.107/0.121/0.123/0.004       12,811,166
 9     40.695%      1.219/1.230/1.235/0.004        0.103/0.119/0.123/0.005       12,807,941

 avg1  43.339%                        1.284                          0.336
 tot                                346.617                         90.795      351,504,460

PR #772 7e86eae

   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.430/1.452/1.463/0.009        0.874/0.888/0.897/0.006      127,296,398
 2     43.874%      1.256/1.277/1.283/0.006        0.444/0.463/0.467/0.005       48,329,132
 3     42.490%      1.322/1.336/1.342/0.005        0.373/0.385/0.392/0.006       40,118,068
 4     41.472%      1.286/1.301/1.308/0.006        0.298/0.313/0.318/0.004       32,630,864
 5     41.212%      1.397/1.413/1.417/0.004        0.303/0.312/0.316/0.004       32,425,995
 6     41.037%      1.317/1.327/1.332/0.005        0.235/0.248/0.252/0.004       25,831,160
 7     40.784%      1.280/1.294/1.300/0.005        0.174/0.183/0.187/0.004       19,253,737
 8     40.706%      1.107/1.114/1.117/0.003        0.114/0.122/0.127/0.003       12,811,166
 9     40.695%      1.220/1.232/1.237/0.004        0.113/0.121/0.123/0.003       12,807,941

 avg1  43.339%                        1.305                          0.337
 tot                                352.328                         91.023      351,504,461

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.427/1.449/1.460/0.009        0.864/0.888/0.897/0.010      127,296,398
 2     43.874%      1.270/1.279/1.286/0.005        0.450/0.463/0.470/0.005       48,329,132
 3     42.490%      1.326/1.336/1.342/0.004        0.373/0.387/0.392/0.005       40,118,068
 4     41.472%      1.295/1.301/1.306/0.003        0.305/0.314/0.318/0.005       32,630,864
 5     41.212%      1.399/1.411/1.417/0.005        0.303/0.313/0.319/0.004       32,425,995
 6     41.037%      1.322/1.329/1.335/0.004        0.235/0.246/0.252/0.004       25,831,160
 7     40.784%      1.277/1.294/1.300/0.006        0.170/0.183/0.187/0.005       19,253,737
 8     40.706%      1.097/1.112/1.117/0.004        0.113/0.120/0.123/0.003       12,811,166
 9     40.695%      1.220/1.233/1.237/0.004        0.110/0.121/0.127/0.004       12,807,941

 avg1  43.339%                        1.305                          0.337
 tot                                352.352                         91.079      351,504,461

Test

@@ -85,8 +85,9 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {

         if (LIKELY(s->lookahead >= MIN_MATCH)) {
             hash_head = functable.quick_insert_string(s, s->strstart);
+            dist = s->strstart - hash_head;

-            if (hash_head != 0 && (dist = s->strstart - hash_head) < MAX_DIST(s)) {
+            if (hash_head != 0 && dist < MAX_DIST(s)) {
                 match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);

                 if (match_len >= MIN_MATCH) {

   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.420/1.438/1.449/0.008        0.864/0.888/0.897/0.007      127,296,398
 2     43.874%      1.261/1.276/1.283/0.005        0.447/0.460/0.467/0.005       48,329,132
 3     42.490%      1.321/1.333/1.341/0.005        0.359/0.387/0.392/0.008       40,118,068
 4     41.472%      1.286/1.298/1.303/0.005        0.291/0.311/0.318/0.006       32,630,864
 5     41.212%      1.397/1.411/1.415/0.005        0.296/0.309/0.316/0.006       32,425,995
 6     41.037%      1.311/1.326/1.333/0.006        0.238/0.248/0.255/0.005       25,831,160
 7     40.784%      1.288/1.297/1.301/0.004        0.174/0.184/0.187/0.003       19,253,737
 8     40.706%      1.098/1.114/1.118/0.005        0.113/0.121/0.124/0.003       12,811,166
 9     40.695%      1.224/1.233/1.237/0.003        0.114/0.121/0.123/0.003       12,807,941

 avg1  43.339%                        1.303                          0.336
 tot                                351.778                         90.848      351,504,461

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.427/1.440/1.447/0.006        0.860/0.882/0.890/0.009      127,296,398
 2     43.874%      1.255/1.275/1.282/0.006        0.450/0.460/0.467/0.006       48,329,132
 3     42.490%      1.322/1.337/1.342/0.006        0.372/0.383/0.389/0.005       40,118,068
 4     41.472%      1.283/1.297/1.303/0.005        0.298/0.313/0.318/0.005       32,630,864
 5     41.212%      1.399/1.411/1.417/0.005        0.300/0.312/0.316/0.004       32,425,995
 6     41.037%      1.311/1.325/1.330/0.005        0.238/0.248/0.252/0.004       25,831,160
 7     40.784%      1.288/1.296/1.301/0.004        0.170/0.183/0.187/0.004       19,253,737
 8     40.706%      1.107/1.113/1.118/0.003        0.113/0.121/0.123/0.003       12,811,166
 9     40.695%      1.224/1.233/1.238/0.003        0.113/0.122/0.127/0.004       12,807,941

 avg1  43.339%                        1.303                          0.336
 tot                                351.787                         90.764      351,504,461

So with this PR, deflate_quick is approx 10% slower, and on this dataset it also compresses 1 byte worse.
Is there any other way to fix this that we could investigate?

I included a test of how performance is affected when the assignment is not done inside the if statement; it helps a bit, but not a whole lot.

@mtl1979
Collaborator

mtl1979 commented Oct 14, 2020

It's possible hash_head is 0 quite often, so it might be faster to write the tests as:

            if (hash_head != 0) {
                dist = s->strstart - hash_head;
                if (dist < MAX_DIST(s)) {
                    match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);

                    if (match_len >= MIN_MATCH) {
...

@Dead2
Member

Dead2 commented Oct 14, 2020

@mtl1979 I do not have the final numbers yet, but the first few test runs unfortunately do not look faster.

@mtl1979
Collaborator

mtl1979 commented Oct 14, 2020

@Dead2 It's possible the compiler is already reordering the statements, as it knows the value of dist is not used outside the second test.
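The guard shapes tried in this thread accept exactly the same inputs, because computing `dist` has no side effects and unsigned wraparound is well-defined. A compiler may therefore reorder or merge the tests freely, and any timing difference should come from code generation and branch prediction rather than from logic. The small self-contained check below makes several assumptions: unsigned arithmetic, as in the UBSan report, and a hypothetical `max_dist` parameter standing in for `MAX_DIST(s)`. The function names are illustrative only:

```c
#include <stdbool.h>

/* Variant 1 (original): assignment inside the combined condition. */
static bool guard_combined(unsigned strstart, unsigned hash_head, unsigned max_dist) {
    unsigned dist;
    return hash_head != 0 && (dist = strstart - hash_head) < max_dist;
}

/* Variant 2 (nested form suggested above): dist computed only when hash_head != 0. */
static bool guard_nested(unsigned strstart, unsigned hash_head, unsigned max_dist) {
    if (hash_head != 0) {
        unsigned dist = strstart - hash_head;
        if (dist < max_dist)
            return true;
    }
    return false;
}

/* Variant 3: dist computed up front and the two tests swapped. */
static bool guard_dist_first(unsigned strstart, unsigned hash_head, unsigned max_dist) {
    unsigned dist = strstart - hash_head;
    return dist < max_dist && hash_head != 0;
}

/* Exhaustively compare the three variants for all strstart, hash_head < limit. */
static bool all_guards_agree(unsigned limit, unsigned max_dist) {
    for (unsigned s = 0; s < limit; s++) {
        for (unsigned h = 0; h < limit; h++) {
            bool a = guard_combined(s, h, max_dist);
            if (a != guard_nested(s, h, max_dist) || a != guard_dist_first(s, h, max_dist))
                return false;
        }
    }
    return true;
}
```

Note that all three variants still perform the subtraction when `hash_head > strstart` and `hash_head != 0`, so none of them, on its own, avoids the UBSan report this PR addresses.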

@Dead2
Member

Dead2 commented Oct 14, 2020

Test2

--- a/deflate_quick.c
+++ b/deflate_quick.c
@@ -86,7 +86,9 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
         if (LIKELY(s->lookahead >= MIN_MATCH)) {
             hash_head = functable.quick_insert_string(s, s->strstart);

+            if (hash_head != 0) {
-            if (hash_head != 0 && (dist = s->strstart - hash_head) < MAX_DIST(s)) {
+                dist = s->strstart - hash_head;
+                if (dist < MAX_DIST(s)) {
                     match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);

                     if (match_len >= MIN_MATCH) {
@@ -102,6 +104,7 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {
                     }
                 }
             }
+        }

         zng_tr_emit_lit(s, static_ltree, s->window[s->strstart]);
         s->strstart++;

(whitespace changes suppressed)

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.432/1.453/1.463/0.008        0.858/0.886/0.894/0.009      127,296,398
 2     43.874%      1.263/1.281/1.288/0.007        0.447/0.463/0.467/0.007       48,329,132
 3     42.490%      1.317/1.341/1.347/0.006        0.369/0.386/0.393/0.006       40,118,068
 4     41.472%      1.289/1.303/1.309/0.006        0.305/0.313/0.318/0.005       32,630,864
 5     41.212%      1.399/1.409/1.416/0.006        0.300/0.311/0.317/0.005       32,425,995
 6     41.037%      1.319/1.333/1.339/0.005        0.239/0.247/0.252/0.003       25,831,160
 7     40.784%      1.283/1.298/1.303/0.004        0.171/0.183/0.187/0.004       19,253,737
 8     40.706%      1.101/1.113/1.118/0.005        0.114/0.120/0.124/0.003       12,811,166
 9     40.695%      1.224/1.232/1.237/0.003        0.114/0.122/0.127/0.004       12,807,941

 avg1  43.339%                        1.307                          0.337
 tot                                352.877                         90.936      351,504,461

@Dead2
Member

Dead2 commented Oct 14, 2020

It looks to me like checking hash_head right after it is set creates a head-of-line (HOL) blockage, and neither the compiler nor speculative execution in the CPU can make any positive use of that dead time.

@mtl1979
Collaborator

mtl1979 commented Oct 14, 2020

@Dead2 There isn't much we can do about any blockage, as a big chunk of the code runs only if both tests pass. So basically we can only make sure there are no unnecessary register spills due to too many variables. That's why I previously suggested narrowing the variable scopes as much as possible.

@Dead2
Member

Dead2 commented Oct 14, 2020

So far, this looks like the fastest alternative, for whatever reason.

@@ -85,8 +85,9 @@ Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) {

         if (LIKELY(s->lookahead >= MIN_MATCH)) {
             hash_head = functable.quick_insert_string(s, s->strstart);
+            dist = s->strstart - hash_head;

-            if (hash_head != 0 && (dist = s->strstart - hash_head) < MAX_DIST(s)) {
+            if (dist < MAX_DIST(s) && hash_head != 0) {
                 match_len = functable.compare258(s->window + s->strstart, s->window + hash_head);

                 if (match_len >= MIN_MATCH) {
   text    data     bss     dec     hex filename
 112457    1376      32  113865   1bcc9 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.333/1.356/1.366/0.009        0.867/0.882/0.894/0.008      127,296,398
 2     43.874%      1.260/1.278/1.284/0.007        0.444/0.460/0.467/0.006       48,329,132
 3     42.490%      1.331/1.340/1.345/0.004        0.372/0.386/0.392/0.005       40,118,068
 4     41.472%      1.277/1.297/1.304/0.006        0.298/0.311/0.318/0.004       32,630,864
 5     41.212%      1.400/1.409/1.415/0.004        0.303/0.310/0.313/0.003       32,425,995
 6     41.037%      1.327/1.334/1.339/0.003        0.242/0.248/0.252/0.004       25,831,160
 7     40.784%      1.284/1.297/1.302/0.004        0.167/0.182/0.187/0.005       19,253,737
 8     40.706%      1.103/1.114/1.118/0.004        0.114/0.121/0.127/0.003       12,811,166
 9     40.695%      1.222/1.234/1.239/0.004        0.111/0.121/0.127/0.004       12,807,941

 avg1  43.339%                        1.295                          0.336
 tot                                349.785                         90.667      351,504,461

@nmoinvaz
Member Author

Has anybody tried only the first commit?

@mtl1979
Collaborator

mtl1979 commented Oct 15, 2020

@Dead2 Like I said earlier, it all comes down to which of the choices is most likely, and whether the compiler and CPU pick the same one... Sometimes it defies logic, and that's why I often ask people to test different choices to see which one is best.
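One way to tell GCC or clang which outcome of a branch is expected is `__builtin_expect`, which is the kind of hint the `LIKELY(...)` wrapper in the diff above provides. The sketch below defines its own hypothetical `HINT_LIKELY`/`HINT_UNLIKELY` macros (zlib-ng's actual macro definitions are not reproduced here), uses simplified `uint32_t` parameters with `max_dist` standing in for `MAX_DIST(s)`, and picks the hint direction arbitrarily; as noted above, whether a given hint helps still has to be measured on the target compiler and CPU.

```c
#include <stdint.h>

/* Hypothetical branch-hint macros. GCC and clang provide __builtin_expect;
 * other compilers simply get the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#  define HINT_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define HINT_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define HINT_LIKELY(x)   (x)
#  define HINT_UNLIKELY(x) (x)
#endif

/* Distance check with a hint that the candidate is usually rejected.
 * The hint direction is an assumption chosen for illustration only. */
int hinted_match_ok(uint32_t strstart, uint32_t hash_head, uint32_t max_dist) {
    int64_t dist = (int64_t)strstart - hash_head;  /* widened, no wraparound */
    if (HINT_UNLIKELY(dist <= max_dist && dist > 0 && hash_head != 0))
        return 1;
    return 0;
}
```

The hint only changes code layout (which path falls through); it never changes the result, so swapping `HINT_UNLIKELY` for `HINT_LIKELY` is a cheap experiment when benchmarking.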

@Dead2
Member

Dead2 commented Oct 16, 2020

Here is the test of the first commit. I would merge this without any problem.

## PR #772 First commit only ba785ef5dcac2395dd1b262c29ce43ec7f7ca728
   text    data     bss     dec     hex filename
 112521    1376      32  113929   1bd09 libz-ng.so.1

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     57.781%      1.299/1.314/1.324/0.006        0.856/0.881/0.889/0.008      127,296,397
 2     43.874%      1.245/1.260/1.266/0.004        0.444/0.459/0.467/0.006       48,329,132
 3     42.490%      1.313/1.323/1.328/0.004        0.373/0.384/0.392/0.006       40,118,068
 4     41.472%      1.288/1.299/1.306/0.005        0.295/0.312/0.318/0.006       32,630,864
 5     41.212%      1.385/1.403/1.412/0.007        0.286/0.310/0.316/0.007       32,425,995
 6     41.037%      1.314/1.332/1.338/0.007        0.239/0.248/0.252/0.004       25,831,160
 7     40.784%      1.282/1.292/1.295/0.003        0.177/0.183/0.187/0.003       19,253,737
 8     40.706%      1.090/1.108/1.112/0.006        0.111/0.122/0.127/0.004       12,811,166
 9     40.695%      1.220/1.230/1.234/0.004        0.110/0.122/0.127/0.004       12,807,941

 avg1  43.339%                        1.285                          0.336
 tot                                346.818                         90.609      351,504,460

@nmoinvaz
Member Author

nmoinvaz commented Oct 17, 2020

I made a few tweaks. Instead of:

hash_head != 0 && hash_head < s->strstart && s->strstart - hash_head <= MAX_DIST(s)

We can do:

int64_t dist;
(dist = (int64_t)s->strstart - hash_head) <= MAX_DIST(s) && dist > 0

This should also remove the asan warning because it casts to int64_t before doing the subtraction. I think it is also a bit faster because it does the MAX_DIST check first, which is where most candidates will fail.
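A minimal sketch of the signed-distance check proposed above, assuming simplified `uint32_t` positions and a hypothetical `max_dist` parameter in place of `MAX_DIST(s)`. Because `int64_t` can represent every `uint32_t` value, the usual arithmetic conversions widen `hash_head` too, so the subtraction can go negative instead of wrapping:

```c
#include <stdint.h>

/* Sketch of the int64_t variant. Parameter types and max_dist are
 * simplifications, not the real deflate_state layout. */
int signed_match_ok(uint32_t strstart, uint32_t hash_head, uint32_t max_dist) {
    /* Widen before subtracting: hash_head > strstart now yields a negative
     * dist, so no unsigned overflow is reported by the sanitizer. */
    int64_t dist = (int64_t)strstart - hash_head;

    /* Upper bound first (argued above to be where most candidates fail),
     * then reject zero and negative distances. */
    return dist <= max_dist && dist > 0;
}
```

Note that this form has no explicit `hash_head != 0` test: with `hash_head == 0` and `1 <= strstart <= max_dist`, `dist` is positive and the candidate is accepted.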

Here are the results (2 runs of each build):

HEAD

 Tool: minigzip-head.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      0.975/1.001/1.022/0.018       100,931,528
 2     35.523%      1.672/1.703/1.714/0.011        75,289,388
 3     34.202%      1.999/2.029/2.043/0.012        72,490,913
 4     32.935%      2.384/2.416/2.430/0.012        69,804,530
 5     32.667%      2.557/2.593/2.621/0.016        69,236,504
 6     32.515%      2.995/3.024/3.039/0.011        68,915,028
 7     32.257%      4.014/4.039/4.054/0.010        68,367,404
 8     32.172%      6.254/6.278/6.294/0.011        68,186,811
 9     32.156%      8.715/8.739/8.760/0.013        68,154,042

 avg1  34.672%                        3.536
 tot                                954.654                                     661,376,148

 Tool: minigzip-head.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      1.014/1.020/1.024/0.003       100,931,528
 2     35.523%      1.687/1.707/1.716/0.008        75,289,388
 3     34.202%      2.023/2.042/2.057/0.008        72,490,913
 4     32.935%      2.415/2.434/2.450/0.009        69,804,530
 5     32.667%      2.590/2.609/2.625/0.010        69,236,504
 6     32.515%      3.022/3.041/3.054/0.008        68,915,028
 7     32.257%      4.028/4.053/4.068/0.010        68,367,404
 8     32.172%      6.288/6.302/6.316/0.008        68,186,811
 9     32.156%      8.739/8.763/8.778/0.011        68,154,042

 avg1  34.672%                        3.552
 tot                                959.128                                     661,376,148

e4d560b

 Tool: minigzip-asan-103.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      1.014/1.027/1.032/0.004       100,930,921
 2     35.523%      1.628/1.650/1.659/0.008        75,289,402
 3     34.202%      1.969/1.991/2.005/0.009        72,490,888
 4     32.935%      2.387/2.402/2.419/0.009        69,804,533
 5     32.667%      2.562/2.579/2.592/0.009        69,236,516
 6     32.515%      2.993/3.012/3.026/0.009        68,915,050
 7     32.257%      3.979/4.004/4.019/0.010        68,367,409
 8     32.172%      6.216/6.248/6.261/0.011        68,186,790
 9     32.156%      8.649/8.703/8.721/0.018        68,154,025

 avg1  34.672%                        3.513
 tot                                948.493                                     661,375,534

 Tool: minigzip-asan-103.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      1.027/1.033/1.036/0.002       100,930,921
 2     35.523%      1.640/1.654/1.662/0.006        75,289,402
 3     34.202%      1.982/2.001/2.018/0.011        72,490,888
 4     32.935%      2.396/2.412/2.428/0.009        69,804,533
 5     32.667%      2.565/2.591/2.605/0.010        69,236,516
 6     32.515%      3.000/3.019/3.040/0.010        68,915,050
 7     32.257%      3.996/4.021/4.033/0.010        68,367,409
 8     32.172%      6.248/6.269/6.280/0.009        68,186,790
 9     32.156%      8.661/8.723/8.746/0.021        68,154,025

 avg1  34.672%                        3.525
 tot                                951.668                                     661,375,534

@Dead2
Member

Dead2 commented Oct 17, 2020

I also found that performance improvement yesterday evening; great to see you incorporated it.

This looks great, but now I think you should actually set dist outside of the if condition.
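A minimal sketch of that suggestion: `dist` is computed as its own statement before the `if` instead of being assigned inside the condition. It assumes simplified `uint32_t` positions, a hypothetical `max_dist` in place of `MAX_DIST(s)`, and a counter in place of the real match-handling code, which is not reproduced here.

```c
#include <stdint.h>

/* Counts how many hypothetical hash-chain heads would pass the check for
 * one strstart. The counter stands in for the real match search. */
unsigned count_candidates(const uint32_t *heads, unsigned n,
                          uint32_t strstart, uint32_t max_dist) {
    unsigned accepted = 0;
    for (unsigned i = 0; i < n; i++) {
        /* Assigned outside the if, so the condition only compares values. */
        int64_t dist = (int64_t)strstart - heads[i];

        if (dist <= max_dist && dist > 0)
            accepted++;  /* a real implementation would search for a match here */
    }
    return accepted;
}
```

The behavior is identical to assigning inside the condition; the change is about readability and giving the compiler a plain comparison to order.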

@nmoinvaz
Member Author

Yes, I used your findings for the performance enhancement. 😀 I will run some tests and move the statement to its own line, like you said. It is better that way.

  zlib-ng/deflate_medium.c:244:47: runtime error: unsigned integer overflow: 58442 - 58452 cannot be represented in type 'unsigned int'

Co-authored-by: Mika Lindqvist <postmaster@raasu.org>
Co-authored-by: Hans Kristian Rosbach <hk-git@circlestorm.org>
@nmoinvaz
Member Author

HEAD

 Tool: minigzip-head.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      0.983/0.997/1.003/0.004       100,931,528
 2     35.523%      1.680/1.693/1.701/0.006        75,289,388
 3     34.202%      1.999/2.024/2.037/0.010        72,490,913
 4     32.935%      2.392/2.409/2.419/0.007        69,804,530
 5     32.667%      2.578/2.590/2.601/0.007        69,236,504
 6     32.515%      2.996/3.018/3.030/0.008        68,915,028
 7     32.257%      4.016/4.032/4.047/0.009        68,367,404
 8     32.172%      6.235/6.268/6.280/0.010        68,186,811
 9     32.156%      8.680/8.719/8.738/0.014        68,154,042

 avg1  34.672%                        3.528
 tot                                952.483                                     661,376,148

 Tool: minigzip-head.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      0.991/0.996/1.003/0.004       100,931,528
 2     35.523%      1.675/1.691/1.702/0.007        75,289,388
 3     34.202%      2.005/2.028/2.045/0.011        72,490,913
 4     32.935%      2.383/2.411/2.432/0.014        69,804,530
 5     32.667%      2.554/2.586/2.603/0.012        69,236,504
 6     32.515%      2.995/3.020/3.035/0.011        68,915,028
 7     32.257%      4.007/4.026/4.041/0.009        68,367,404
 8     32.172%      6.244/6.272/6.287/0.012        68,186,811
 9     32.156%      8.659/8.713/8.739/0.021        68,154,042

 avg1  34.672%                        3.527
 tot                                952.291                                     661,376,148

78770a4

 Tool: minigzip-asan-104.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      0.977/0.989/0.995/0.004       100,930,921
 2     35.523%      1.628/1.643/1.653/0.007        75,289,402
 3     34.202%      1.965/1.987/2.000/0.010        72,490,888
 4     32.935%      2.376/2.396/2.410/0.010        69,804,533
 5     32.667%      2.555/2.576/2.592/0.011        69,236,516
 6     32.515%      2.981/3.007/3.020/0.010        68,915,050
 7     32.257%      3.966/3.993/4.008/0.010        68,367,409
 8     32.172%      6.195/6.232/6.248/0.014        68,186,790
 9     32.156%      8.636/8.683/8.705/0.017        68,154,025

 avg1  34.672%                        3.501
 tot                                945.167                                     661,375,534

 Tool: minigzip-asan-104.exe
 Runs: 70
 Levels: 1-9
 Trimworst: 40

 Level   Comp   Comptime min/avg/max/stddev                          Compressed size
 1     47.621%      0.981/0.991/0.996/0.004       100,930,921
 2     35.523%      1.622/1.640/1.651/0.008        75,289,402
 3     34.202%      1.960/1.984/1.996/0.009        72,490,888
 4     32.935%      2.378/2.400/2.415/0.012        69,804,533
 5     32.667%      2.553/2.576/2.591/0.010        69,236,516
 6     32.515%      2.974/2.998/3.016/0.012        68,915,050
 7     32.257%      3.955/3.991/4.011/0.015        68,367,409
 8     32.172%      6.205/6.239/6.256/0.014        68,186,790
 9     32.156%      8.638/8.687/8.705/0.017        68,154,025

 avg1  34.672%                        3.501
 tot                                945.197                                     661,375,534

Dead2 merged commit bc5915e into zlib-ng:develop Oct 18, 2020
nmoinvaz deleted the fixes/asan-negative-dist branch October 18, 2020 15:58
iii-i mentioned this pull request Mar 10, 2021
iii-i added a commit to iii-i/zlib-ng that referenced this pull request Mar 20, 2021
Commit bc5915e ("Fixed unsigned integer overflow ASAN error when
hash_head > s->strstart.") removed hash_head != 0 checks in fast,
medium and slow deflate, because it improved performance [1].

Unfortunately, the attached test started failing after that.
Apparently, as the comments suggest, the code implicitly relies on
matches with the beginning of the window being skipped. So restore the
check.

[1] zlib-ng#772 (comment)
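The commit message above restores the `hash_head != 0` test because the code implicitly relies on matches with the beginning of the window being skipped. A minimal sketch of a condition that keeps the signed 64-bit distance and re-adds that test; the exact ordering used in zlib-ng after that commit is not shown here, so treat the layout (and the simplified `uint32_t` parameters, with `max_dist` standing in for `MAX_DIST(s)`) as assumptions:

```c
#include <stdint.h>

/* Signed distance check plus the restored hash_head != 0 test.
 * hash_head == 0 is treated as "no usable candidate". */
int restored_match_ok(uint32_t strstart, uint32_t hash_head, uint32_t max_dist) {
    int64_t dist = (int64_t)strstart - hash_head;  /* cannot wrap */

    /* Cheap range checks first, then skip matches against position 0. */
    return dist <= max_dist && dist > 0 && hash_head != 0;
}
```

With these inputs, `restored_match_ok(100, 0, 32506)` now rejects the position-0 candidate that the check without `hash_head != 0` would have accepted.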
Dead2 pushed a commit that referenced this pull request Mar 20, 2021