Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve x86 intrinsics dependencies #1643

Merged
merged 5 commits into from
Jan 25, 2024
Merged

Improve x86 intrinsics dependencies #1643

merged 5 commits into from
Jan 25, 2024

Conversation

phprus
Copy link
Contributor

@phprus phprus commented Jan 19, 2024

AVX512 includes AVX2, ..., SSE(N+1) includes SSE(N).
I have not seen any compilers that violate this chain of inclusions.

We can remove unnecessary checks.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
@phprus phprus marked this pull request as draft January 19, 2024 19:15
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Copy link

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (86250d4) 82.97% compared to head (5dc8aa8) 83.13%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1643      +/-   ##
===========================================
+ Coverage    82.97%   83.13%   +0.16%     
===========================================
  Files          135      135              
  Lines        10885    10894       +9     
  Branches      2732     2816      +84     
===========================================
+ Hits          9032     9057      +25     
+ Misses        1143     1127      -16     
  Partials       710      710              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@phprus phprus marked this pull request as ready for review January 19, 2024 20:08
@Dead2
Copy link
Member

Dead2 commented Jan 20, 2024

Hmm, I think this looks good.

FYI, there is a plan for making it possible to compile for a minimum arch of for example x86-64-v2 or x86-64-v3, where only the best of the optimized functions needed for that target is compiled in. (So no C fallback functions, and no SSE2 if SSSE3 exists, etc. But AVX512 is still there for better than minimum cpus.) That is why I moved all the C fallback functions to arch/generic, as a first step towards making those completely optional.
The intention is for distros that now target certain arches to be able to drop the code that will never be run in any case.
I think this PR helps clean up some of the requirements for that.

@phprus
Copy link
Contributor Author

phprus commented Jan 20, 2024

@Dead2

Maybe in this PR we should also remove checks for support for pre-AVX-512 intrinsics, since all supported compilers have these options?
GCC-8.3: https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/x86-Options.html
Clang-9: https://releases.llvm.org/9.0.0/tools/clang/docs/ClangCommandLineReference.html
MSVC:
https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86
https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64

And drop support 32bit MacOS X?

The detect-intrinsics.cmake file has a comment:

the pclmul code currently crashes on Mac in 32bit mode. Avoid for now.

that was added in 2017 (d265bf7)

Latest version supporting 32 bit applications - macOS Mojave (10.14).
macOS Mojave unsupported as of October 2021.
I can't even test for this issue, because i haven't mac with os prior to Catalina.

@phprus
Copy link
Contributor Author

phprus commented Jan 20, 2024

@Dead2
Last 4 commits in branch https://github.com/phprus/zlib-ng/tree/arch-cond-cleanup

@phprus phprus marked this pull request as draft January 20, 2024 18:35
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
@phprus phprus changed the title Remove always true arch conditions Improve x86 intrinsics dependencies Jan 20, 2024
@phprus phprus marked this pull request as ready for review January 20, 2024 19:39
@KungFuJesus
Copy link
Contributor

So from what I could tell, the "WITH_*" macros for those intrinsics weren't so much as detecting if they were supported but were for compiling support for those functions. E.g. you could turn one off if you wanted to. In general you probably didn't want to, though, as some of them called the lesser variants for shorter widths.

@phprus
Copy link
Contributor Author

phprus commented Jan 20, 2024

@KungFuJesus
For example, to make build with SSE 4.2, but without SSE 2?
I don’t understand why that an build might be needed?

@KungFuJesus
Copy link
Contributor

I'm not saying it's a good idea to do so, but a possible motivating factor might be a smaller binary size. Honestly though that macro is guarding things which only need SSE2 but have no greater SSE4 variants so it's already a bad idea to turn off SSE2 in that case.

@KungFuJesus
Copy link
Contributor

I doubt this change has any negative side effects but I'll test it tonight. I'm otherwise good with it at face value

Copy link
Contributor

@KungFuJesus KungFuJesus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No discernible changes with compilation with this. I don't have a vpclmulqdq capable CPU to test that, and there's the always problematic "use native" being incompatible with build options, but I don't see anything here as an issue.

Maybe with WITH_NATIVE_INSTRUCTIONS could make cmake only use native capabilities for the WITH_ options, but maybe that warrants a different PR.

@phprus
Copy link
Contributor Author

phprus commented Jan 21, 2024

@KungFuJesus
I have one idea about implementing static function dispatching when using WITH_NATIVE_INSTRUCTIONS:
https://github.com/phprus/zlib-ng/tree/functable-native
V2: https://github.com/phprus/zlib-ng/tree/functable-native-2

The last commit implements static dispatch using predefined compiler macros.

I plan to propose this changes a bit later after minor tweaks.

@Dead2
Copy link
Member

Dead2 commented Jan 22, 2024

@Dead2
Maybe in this PR we should also remove checks for support for pre-AVX-512 intrinsics, since all supported compilers have these options? GCC-8.3: https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/x86-Options.html Clang-9: https://releases.llvm.org/9.0.0/tools/clang/docs/ClangCommandLineReference.html MSVC: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86 https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64

I don't think it is a good idea to completely remove support for older compilers just because they are not actively supported anymore. There are for example binary distributors that do static compiles on older distros so that their binary can be run on pretty much all distros, both old and new. We still try to retain support for older compilers if we can.

But, compilers older than GCC 4.5 is probably few and far between, so I would be open to assuming the compiler supports up to sse4.2. Unless there are arguments against this of course.
But this should be a separate PR in any case.

And drop support 32bit MacOS X?

The detect-intrinsics.cmake file has a comment:

the pclmul code currently crashes on Mac in 32bit mode. Avoid for now.

that was added in 2017 (d265bf7)

Latest version supporting 32 bit applications - macOS Mojave (10.14). macOS Mojave unsupported as of October 2021. I can't even test for this issue, because i haven't mac with os prior to Catalina.

Same as above, I don't think removing support is needed. It was released in 2018, so it is not really that old. I don't think anyone should still use that, but that is really none of my business, there will be users and corner-cases. And we don't really win anything by removing that support.

CMakeLists.txt Show resolved Hide resolved
@Dead2 Dead2 merged commit 1aa53f4 into zlib-ng:develop Jan 25, 2024
139 checks passed
@Dead2 Dead2 mentioned this pull request May 30, 2024
@Dead2 Dead2 mentioned this pull request Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants