Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable dynamic function dispatching for native or arch-specific builds #1659

Merged
merged 5 commits into from
Mar 6, 2024

Conversation

phprus
Copy link
Contributor

@phprus phprus commented Feb 3, 2024

Another implementation of PR #1645.

Main differences:

  • Separate CPU-specific function prototypes and CPU-arch features checker.
  • Reducing dependencies between header files.

Copy link

codecov bot commented Feb 3, 2024

Codecov Report

Attention: Patch coverage is 97.22222% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 83.03%. Comparing base (af494fc) to head (a25d5a7).

❗ Current head a25d5a7 differs from pull request most recent head 660b36b. Consider uploading reports for the commit 660b36b to get more accurate results

Files Patch % Lines
deflate.c 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1659       +/-   ##
============================================
+ Coverage         0   83.03%   +83.03%     
============================================
  Files            0      134      +134     
  Lines            0    10336    +10336     
  Branches         0     2813     +2813     
============================================
+ Hits             0     8583     +8583     
- Misses           0     1055     +1055     
- Partials         0      698      +698     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

arch/power/power_features.h Outdated Show resolved Hide resolved
arch/riscv/riscv_features.h Outdated Show resolved Hide resolved
arch/arm/arm_functions.h Outdated Show resolved Hide resolved
@phprus
Copy link
Contributor Author

phprus commented Feb 4, 2024

All compile time CPU features checks is implemented.

@Dead2 Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 5, 2024
@phprus
Copy link
Contributor Author

phprus commented Feb 5, 2024

Rebased.

@nmoinvaz nmoinvaz removed the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 5, 2024
@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 6, 2024

I like how it is split into cpu_features and cpu_functions.

arch/arm/arm_functions.h Outdated Show resolved Hide resolved
arch/arm/arm_functions.h Outdated Show resolved Hide resolved
arch/generic/generic_functions.h Outdated Show resolved Hide resolved
arch/generic/generic_functions.h Show resolved Hide resolved
functable.c Outdated

#include "zbuild.h"
#include "zendian.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency can be removed from win32/ makefiles. And the one below should be added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency is not direct now, but indirect, through cpu_functions.h.

As I see the dependencies in win32/Makefile.* were broken before this PR.
For example, win32/Makefile.arm has a dependency on x86 file $(SRCDIR)/arch/x86/x86_features.h.

If we need actual nmake support (without cmake), I suggest fix the nmake-dependencies with separate PRs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to fix the nmake dependencies when all the "more important" changes have been merged... It would help a lot if people would create issues when there is something wrong in the nmake Makefiles as I can't track all the changes if pull requests are merged in batches. I try to check most of the new pull requests at least once a week (unless I'm tagged in the pull request), but my time is limited due to other projects that are more important.

@Dead2
Copy link
Member

Dead2 commented Feb 6, 2024

I don't see any of the new header files added to the win32/ makefiles, those should probably go in so the dependency handling works properly.
Actually we should probably make a script that generates those deps, and have a CI job that fails when there is a diff. Those are kind of a mess already, and steadily becoming worse. @mtl1979 what do you think, is it worth it or not?

I think perhaps the cpu_functions.h file should be named arch_functions.h, since it is not so much related to the running cpu (like cpu_features.h does) but rather to the arch-specific functions located in the arch/* directories.

I have only done a quick review so far, but what I have looked at so far looks good. I agree that the separation looks good.

@mtl1979
Copy link
Collaborator

mtl1979 commented Feb 6, 2024

I don't see any of the new header files added to the win32/ makefiles, those should probably go in so the dependency handling works properly. Actually we should probably make a script that generates those deps, and have a CI job that fails when there is a diff. Those are kind of a mess already, and steadily becoming worse. @mtl1979 what do you think, is it worth it or not?

I think perhaps the cpu_functions.h file should be named arch_functions.h, since it is not so much related to the running cpu (like cpu_features.h does) but rather to the arch-specific functions located in the arch/* directories.

I have only done a quick review so far, but what I have looked at so far looks good. I agree that the separation looks good.

I have never looked up if cl.exe can generate dependency information for Windows. Even if it can, there's always issue that there is no sed-like tool to reformat the output so it matches the format nmake expects.

@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 6, 2024

Time to ditch nmake like I've been saying for years. :)

@mtl1979
Copy link
Collaborator

mtl1979 commented Feb 6, 2024

Time to ditch nmake like I've been saying for years. :)

nmake is still getting new features... It's just that some people use older versions, for example version that is bundled with Visual C++ 2017.9 ...

@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 6, 2024

nmake is still getting new features...

I can't locate a nmake change log anywhere..

@phprus
Copy link
Contributor Author

phprus commented Feb 7, 2024

Rebased.

@Dead2, Renamed cpu_functions.h to arch_functions.h.

@phprus phprus force-pushed the functable-native-3 branch 2 times, most recently from 4b939cc to a8897ef Compare February 7, 2024 19:57
@Dead2
Copy link
Member

Dead2 commented Feb 21, 2024

Develop ca0e463 Feb 21

 Tool: minigzip   Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.166%      0.150/0.154/0.156/0.001        0.034/0.036/0.038/0.001        8,523,732
 2     43.871%      0.244/0.249/0.250/0.001        0.032/0.036/0.038/0.002        6,903,609
 3     42.387%      0.295/0.297/0.299/0.001        0.031/0.034/0.036/0.001        6,670,099
 4     41.647%      0.317/0.324/0.326/0.002        0.030/0.034/0.035/0.001        6,553,723
 5     41.216%      0.344/0.352/0.354/0.002        0.032/0.034/0.035/0.001        6,485,938
 6     41.037%      0.390/0.397/0.400/0.003        0.030/0.033/0.035/0.001        6,457,776
 7     40.778%      0.493/0.498/0.503/0.003        0.030/0.034/0.035/0.001        6,416,919
 8     40.704%      0.605/0.613/0.618/0.004        0.029/0.033/0.034/0.001        6,405,244
 9     40.409%      0.912/0.919/0.925/0.004        0.029/0.033/0.035/0.002        6,358,951

 avg1  42.913%                        0.423                          0.034
 tot                                114.095                          9.195       60,775,991

   text    data     bss     dec     hex filename
 138190    1352       8  139550   2211e libz-ng.so.2

Develop Native=on

Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
1     54.166%      0.147/0.151/0.153/0.002        0.028/0.035/0.037/0.002        8,523,732
2     43.871%      0.239/0.242/0.244/0.001        0.031/0.035/0.037/0.002        6,903,609
3     42.387%      0.285/0.289/0.291/0.002        0.031/0.034/0.035/0.001        6,670,099
4     41.647%      0.316/0.318/0.320/0.001        0.030/0.033/0.035/0.001        6,553,723
5     41.216%      0.345/0.351/0.354/0.002        0.030/0.033/0.034/0.001        6,485,938
6     41.037%      0.389/0.397/0.401/0.003        0.029/0.033/0.034/0.002        6,457,776
7     40.778%      0.485/0.495/0.498/0.003        0.030/0.033/0.035/0.001        6,416,919
8     40.704%      0.600/0.607/0.611/0.003        0.027/0.032/0.034/0.002        6,405,244
9     40.409%      0.903/0.909/0.912/0.003        0.028/0.033/0.034/0.002        6,358,951

avg1  42.913%                        0.418                          0.033
tot                                112.795                          9.000       60,775,991

  text    data     bss     dec     hex filename
137222    1352       8  138582   21d56 libz-ng.so.2

PR #1659 Native=off

Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
1     54.166%      0.151/0.154/0.155/0.001        0.031/0.035/0.037/0.001        8,523,732
2     43.871%      0.247/0.249/0.250/0.001        0.032/0.036/0.038/0.002        6,903,609
3     42.387%      0.291/0.295/0.297/0.002        0.032/0.035/0.036/0.001        6,670,099
4     41.647%      0.319/0.323/0.326/0.002        0.032/0.034/0.035/0.001        6,553,723
5     41.216%      0.345/0.351/0.354/0.002        0.029/0.034/0.035/0.001        6,485,938
6     41.037%      0.392/0.397/0.400/0.002        0.031/0.033/0.035/0.001        6,457,776
7     40.778%      0.491/0.499/0.503/0.003        0.029/0.033/0.035/0.001        6,416,919
8     40.704%      0.606/0.611/0.616/0.003        0.031/0.033/0.035/0.001        6,405,244
9     40.409%      0.911/0.918/0.924/0.004        0.031/0.033/0.035/0.001        6,358,951

avg1  42.913%                        0.422                          0.034
tot                                113.946                          9.209       60,775,991

  text    data     bss     dec     hex filename
138190    1352       8  139550   2211e libz-ng.so.2

PR #1659 Native=on

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.166%      0.093/0.097/0.099/0.001        0.027/0.034/0.037/0.003        8,523,732
 2     43.871%      0.172/0.176/0.178/0.002        0.032/0.035/0.037/0.001        6,903,609
 3     42.387%      0.224/0.227/0.229/0.001        0.031/0.034/0.035/0.001        6,670,099
 4     41.647%      0.253/0.256/0.259/0.002        0.029/0.033/0.035/0.002        6,553,723
 5     41.216%      0.285/0.290/0.293/0.002        0.029/0.033/0.034/0.002        6,485,938
 6     41.037%      0.328/0.338/0.342/0.003        0.028/0.032/0.034/0.002        6,457,776
 7     40.778%      0.489/0.495/0.499/0.003        0.027/0.033/0.035/0.002        6,416,919
 8     40.704%      0.601/0.608/0.612/0.003        0.028/0.033/0.034/0.002        6,405,244
 9     40.409%      0.903/0.912/0.918/0.004        0.025/0.031/0.033/0.002        6,358,951

 avg1  42.913%                        0.378                          0.033
 tot                                102.026                          8.896       60,775,991

   text    data     bss     dec     hex filename
 133814    1208       8  135030   20f76 libz-ng.so.2

Develop w/native=ON vs PR w/native=ON compression speedup:

  • Level 1: 35%
  • Level 3: 21%
  • Level 6: 15%
  • Level 9: 0.3%

PR w/native=ON vs PR #1681 w/native=OFF:

  • Level 1: 11%
  • Level 3: 8%
  • Level 6: 5%
  • Level 9: 0.5% slower

Looks like a decent speedup to me.
Ideally this would have been based on top of #1681, as it removes a good portion of the complexity in this PR.

@phprus
Copy link
Contributor Author

phprus commented Feb 21, 2024

My suggestion:

  1. Extract commits:
  • Split CPU features checks and CPU-specific function prototypes and reduce include-dependencies.
  • Move select for generic functions into generic_functions.h.
  • Rename cpu_functions.h to arch_functions.h.

into separate PR.

  1. Rebase Remove update_hash and insert_string implementations from functable #1681 onto separated files implementation.

  2. Rebase this PR onto new develop branch after previous PRs is merged.

What do your think about this plan?

@Dead2
Copy link
Member

Dead2 commented Feb 21, 2024

What do your think about this plan?

@phprus I think that is a good plan and would probably ensure we get it all merged the fastest. 👍

@phprus
Copy link
Contributor Author

phprus commented Feb 21, 2024

@Dead2

  1. PR Remove duplicated codecov parameter #1684 - small fix for Allow overwrite NATIVEFLAG value. #1662.
  2. PR Split cpu features and arch functions #1685 - split headers.

@phprus phprus marked this pull request as draft February 21, 2024 14:32
@phprus
Copy link
Contributor Author

phprus commented Feb 23, 2024

Rebase required.
Waiting PR #1673 (NMake)
Waiting #1681, #1682 (deflate optimizations).

@phprus phprus force-pushed the functable-native-3 branch 3 times, most recently from 0a66293 to c88339f Compare February 28, 2024 05:33
@phprus phprus marked this pull request as ready for review February 28, 2024 06:47
@phprus
Copy link
Contributor Author

phprus commented Feb 28, 2024

Ready for review.

@phprus phprus force-pushed the functable-native-3 branch 2 times, most recently from 8dbfe71 to a25d5a7 Compare March 1, 2024 09:01
.github/workflows/cmake.yml Outdated Show resolved Hide resolved
deflate.c Outdated Show resolved Hide resolved
inflate.c Outdated Show resolved Hide resolved
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
…ros. Fix it if AVX is enabled.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
…_INIT

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Copy link
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@phprus
Copy link
Contributor Author

phprus commented Mar 5, 2024

Rebased onto develop

@@ -546,6 +561,12 @@ jobs:
gcov-exec: gcov-10
codecov: macos_gcc

- name: macOS Clang Native Instructions (ARM64)
os: macos-14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use macos-latest here instead? If there is a reason, perhaps it should be commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners

The macos-latest workflow label currently uses the macOS 12 runner image.

Therefore we need macos-14 (first ARM64 runner).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Got it.

@Dead2 Dead2 merged commit af8169a into zlib-ng:develop Mar 6, 2024
142 checks passed
@Dead2 Dead2 changed the title Proposal (version 2): Disable dynamic function dispatching for native or arch-specific builds Disable dynamic function dispatching for native or arch-specific builds May 30, 2024
@Dead2 Dead2 mentioned this pull request May 30, 2024
@Dead2 Dead2 mentioned this pull request Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants