Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM Fixes... #1008

Merged
merged 3 commits into from
Jun 21, 2021
Merged

ARM Fixes... #1008

merged 3 commits into from
Jun 21, 2021

Conversation

mtl1979
Copy link
Collaborator

@mtl1979 mtl1979 commented Jun 18, 2021

  • Some optimizations for NEON to reduce code size
  • Don't call chunkcopy() from inside chunkcopy_safe() as it might write past safe length if len is not multiple of chunk size.

@codecov
Copy link

codecov bot commented Jun 18, 2021

Codecov Report

Merging #1008 (858ec3e) into develop (101653c) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1008      +/-   ##
===========================================
+ Coverage    77.93%   77.96%   +0.02%     
===========================================
  Files           77       77              
  Lines         8241     8249       +8     
  Branches      1336     1340       +4     
===========================================
+ Hits          6423     6431       +8     
  Misses        1290     1290              
  Partials       528      528              
Flag Coverage Δ
macos_clang 68.41% <100.00%> (+0.02%) ⬆️
macos_gcc 67.24% <100.00%> (+0.03%) ⬆️
ubuntu_clang 74.88% <100.00%> (+0.02%) ⬆️
ubuntu_clang_debug 75.20% <100.00%> (+0.02%) ⬆️
ubuntu_clang_inflate_allow_invalid_dist 74.62% <100.00%> (+0.02%) ⬆️
ubuntu_clang_inflate_strict 74.86% <100.00%> (+0.02%) ⬆️
ubuntu_clang_mmap 74.85% <100.00%> (+0.02%) ⬆️
ubuntu_clang_msan 74.88% <100.00%> (+0.02%) ⬆️
ubuntu_clang_pigz 34.95% <100.00%> (-0.25%) ⬇️
ubuntu_clang_pigz_no_optim 37.41% <100.00%> (-0.32%) ⬇️
ubuntu_clang_pigz_no_threads 34.60% <100.00%> (-0.25%) ⬇️
ubuntu_gcc 74.43% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_aarch64 74.72% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_compat_no_opt 73.49% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_no_acle 73.89% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_no_neon 74.25% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_armhf 74.71% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_armhf_compat_no_opt 73.46% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_armhf_no_acle 74.90% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_armhf_no_neon 75.21% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_armsf 74.72% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_armsf_compat_no_opt 73.46% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_compat_no_opt 74.62% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_mingw_i686 0.00% <0.00%> (ø)
ubuntu_gcc_mingw_x86_64 0.00% <0.00%> (ø)
ubuntu_gcc_no_avx2 74.68% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_no_pclmulqdq 72.99% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_no_sse2 74.01% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_no_sse4 73.20% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_o3 74.35% <100.00%> (+0.03%) ⬆️
ubuntu_gcc_osb 77.40% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_pigz 35.25% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_pigz_aarch64 35.97% <100.00%> (?)
ubuntu_gcc_ppc 74.98% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_ppc64 75.83% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_ppc64le 74.60% <100.00%> (+0.02%) ⬆️
ubuntu_gcc_s390x 73.52% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_sparc64 76.19% <100.00%> (+0.01%) ⬆️
win64_gcc 70.06% <0.00%> (-0.10%) ⬇️
win64_gcc_compat_no_opt 73.60% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
arch/arm/chunkset_neon.c 100.00% <100.00%> (ø)
chunkset_tpl.h 99.10% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 101653c...858ec3e. Read the comment docs.

@nmoinvaz
Copy link
Member

It looks like it passes the pigz CI test I added for aarch64 which is great! My only comment is that the commit messages could be a bit better and explain the changes. Other than that it LGTM.

@nmoinvaz
Copy link
Member

Actually we may need benchmarks for x86 and aarch64.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jun 19, 2021

@nmoinvaz It's 3 am here and I've been awake since Wednesday ;)

@nmoinvaz
Copy link
Member

I can run some x86 benchmarks tonight.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jun 19, 2021

I noticed my compiler can also do 32 byte chunks on AArch64... Dunno how much faster that is than using 16 byte chunks...

I know 48 byte chunks caused some issues because that is odd register count, not even.

* There is no need to convert between unsigned and signed vector types. All relevant intrinsics have versions for all unsigned vector types.
* Using vdupq_n_u64 duplicates the unsigned 64-bit integer to two consecutive aligned memory locations in stack so compiler can use wider load instructions.
  All different-sized general-purpose registers overlay on ARM/AArch64, so any vector cast is no-op in assembly.
* chunkcopy() can read or write more than the safe length if the length is not multiple of chunk size.
@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jun 19, 2021

I made the individual commit messages a little more descriptive... I guess @Dead2 wants to be the second one to give feedback as I know he likes to play with ARM devices ;)

@nmoinvaz
Copy link
Member

Corpora.tar

ZLIB-NG 834e7d8

 Tool: minigzip-ng-develop.exe Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.809%      1.054/1.071/1.081/0.007        0.668/0.685/0.693/0.006      104,067,822
 2     35.391%      1.731/1.754/1.763/0.008        0.685/0.695/0.699/0.004       82,195,416
 3     33.698%      2.376/2.395/2.404/0.007        0.658/0.673/0.678/0.004       78,262,436
 4     32.989%      2.689/2.703/2.712/0.007        0.644/0.659/0.664/0.004       76,616,183
 5     32.483%      2.818/2.858/2.869/0.012        0.643/0.654/0.658/0.003       75,441,984
 6     32.318%      3.465/3.486/3.496/0.008        0.642/0.650/0.656/0.004       75,058,801
 7     32.061%      4.731/4.760/4.773/0.011        0.639/0.651/0.656/0.005       74,461,300
 8     31.978%      7.218/7.309/7.327/0.025        0.644/0.653/0.657/0.003       74,269,032
 9     31.961%      9.862/9.932/9.953/0.019        0.647/0.659/0.666/0.005       74,230,342

 avg1  34.188%                        4.030                          0.664
 tot                               1088.034                        179.321      714,603,316

ZLIB-NG PR 1008 0025f87

 Tool: minigzip-ng-chunkset-neon.exe Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.809%      1.057/1.067/1.075/0.006        0.674/0.681/0.686/0.003      104,067,822
 2     35.391%      1.727/1.747/1.757/0.008        0.681/0.691/0.695/0.003       82,195,416
 3     33.698%      2.360/2.395/2.409/0.011        0.656/0.665/0.672/0.004       78,262,436
 4     32.989%      2.667/2.699/2.710/0.009        0.642/0.654/0.659/0.004       76,616,183
 5     32.483%      2.839/2.861/2.871/0.009        0.637/0.650/0.654/0.004       75,441,984
 6     32.318%      3.454/3.486/3.496/0.009        0.636/0.645/0.650/0.003       75,058,801
 7     32.061%      4.723/4.767/4.782/0.012        0.639/0.650/0.654/0.003       74,461,300
 8     31.978%      7.233/7.305/7.329/0.025        0.636/0.648/0.652/0.004       74,269,032
 9     31.961%      9.808/9.915/9.945/0.036        0.642/0.653/0.660/0.005       74,230,342

 avg1  34.188%                        4.027                          0.660
 tot                               1087.268                        178.137      714,603,316

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jun 19, 2021

That's like 0.007% and 0.66% difference... lol...

@nmoinvaz
Copy link
Member

Well it's not bad. I just wanted to make sure there wasn't a degradation of any kind.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jun 19, 2021

@nmoinvaz Exactly... I was expecting some degradation... But I guess the optimizer works as I wanted, as in modifying len in every step doesn't slow it down.

@nmoinvaz nmoinvaz added Architecture Architecture specific bug labels Jun 19, 2021
@Dead2
Copy link
Member

Dead2 commented Jun 19, 2021

Baseline 834e7d8 aarch64

   text    data     bss     dec     hex filename
 105194    1576      16  106786   1a122 libz-ng.so.2

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.166%      2.746/2.764/2.780/0.011        0.965/0.993/1.015/0.013       42,618,858
 2     43.879%      2.819/2.851/2.870/0.017        0.563/0.601/0.631/0.022       20,714,938
 3     42.383%      3.713/3.758/3.780/0.020        0.565/0.580/0.592/0.008       20,008,755
 4     41.660%      2.768/2.802/2.818/0.015        0.353/0.372/0.381/0.008       13,111,391
 5     41.216%      3.014/3.031/3.051/0.012        0.365/0.375/0.383/0.005       12,971,746
 6     41.039%      3.505/3.539/3.555/0.014        0.358/0.372/0.384/0.009       12,916,025
 7     40.778%      2.239/2.270/2.284/0.011        0.168/0.175/0.181/0.004        6,416,924
 8     40.704%      2.769/2.800/2.809/0.010        0.168/0.179/0.188/0.007        6,405,244
 9     40.696%      2.983/3.016/3.029/0.013        0.160/0.172/0.180/0.007        6,404,084

 avg1  42.947%                        2.981                          0.424
 tot                                402.450                         57.285      141,567,965

PR #1008 858ec3e aarch64

   text    data     bss     dec     hex filename
 105194    1576      16  106786   1a122 libz-ng.so.2

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.166%      2.739/2.769/2.803/0.017        0.979/0.997/1.010/0.009       42,618,858
 2     43.879%      2.838/2.857/2.871/0.011        0.584/0.612/0.625/0.013       20,714,938
 3     42.383%      3.733/3.754/3.780/0.016        0.551/0.582/0.597/0.015       20,008,755
 4     41.660%      2.765/2.800/2.815/0.014        0.353/0.375/0.387/0.011       13,111,391
 5     41.216%      3.003/3.026/3.042/0.012        0.352/0.372/0.381/0.008       12,971,746
 6     41.039%      3.513/3.533/3.540/0.007        0.351/0.364/0.378/0.008       12,916,025
 7     40.778%      2.247/2.271/2.283/0.009        0.156/0.179/0.188/0.009        6,416,924
 8     40.704%      2.772/2.804/2.817/0.011        0.161/0.177/0.184/0.007        6,405,244
 9     40.696%      2.988/3.024/3.039/0.013        0.156/0.177/0.188/0.008        6,404,084

 avg1  42.947%                        2.982                          0.426
 tot                                402.583                         57.530      141,567,965

@Dead2 Dead2 merged commit 4521023 into zlib-ng:develop Jun 21, 2021
Dead2 added a commit that referenced this pull request Jun 21, 2021
- Fix inflate corruption on aarch64 #1008
- Fix MSVC warnings #1002
- Minor chunkset improvements #1000 #994
- Minor cleanup #997
- Add CI test for pigz on aarch64 #1004
- Cmake improvements #996
@Dead2 Dead2 mentioned this pull request Jun 21, 2021
Dead2 added a commit that referenced this pull request Jun 21, 2021
- Fix inflate corruption on aarch64 #1008
- Fix MSVC warnings #1002 #1013
- Minor chunkset improvements #1000 #994
- Minor cleanup #997
- Add CI test for pigz on aarch64 #1004
- Cmake improvements #996
Dead2 added a commit that referenced this pull request Jun 22, 2021
- Fix inflate corruption on aarch64 #1008
- Fix MSVC warnings #1002 #1013
- Minor chunkset improvements #1000 #994 #1015
- Minor cleanup #997
- Add CI test for pigz on aarch64 #1004
- Cmake improvements #996
Dead2 added a commit that referenced this pull request Jun 25, 2021
- Fix inflate corruption on aarch64 #1008
- Fix MSVC warnings #1002 #1013
- Minor chunkset improvements #1000 #994 #1015
- Minor cleanup #997
- Add CI test for pigz on aarch64 #1004
- Cmake improvements #996
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Architecture Architecture specific bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants