Allow SCAMP binaries to be optionally redistributable #99
Merged
This will allow SCAMP to avoid relying on -march=native for performance, allowing SCAMP binaries to be distributed.
…led. Also adds some flags to increase the chance a compiler will use FMA.
… different build types.
…from the environment.
…ations to be applied to more CPUS.
zpzim added a commit that referenced this pull request Jun 18, 2022
* Added runtime dispatch of AVX/AVX2-based CPU kernels. These are conditionally compiled only if they are needed to produce a redistributable binary.
* Add option to disable -march=native configurations and make the SCAMP binary redistributable. This is specified via the environment variable SCAMP_ENABLE_BINARY_DISTRIBUTION=ON
* Adds some flags to increase the chance a compiler will use FMA instructions when they are available.
* Add testing coverage for redistributable binary builds. Including emulation tests with Intel SDE to verify SIMD dispatch runs on various CPU configurations.
* Update main CMakeLists.txt to better specify global compile flags for different build types.
* Update docker container to build in a redistributable way.
* Update CUDA build tests to use updated action to build on windows-latest.
* Minor performance tuning of CPU kernel unroll widths.
* Prevent unnecessary files from being packaged with pyscamp
Since the beginning of the project, SCAMP has used -march=native and the like to maximize CPU performance on the host system. Unfortunately, a binary built this way may use instructions that other CPUs lack, so we can't redistribute SCAMP binaries, pyscamp wheels, etc.
This PR adds an optional flag, SCAMP_ENABLE_BINARY_DISTRIBUTION, which removes the compiler flags that make the binary non-redistributable.
To retain some semblance of performance in this mode, I have added code paths in the CPU kernel library which are selected at runtime based on checks of the host CPU architecture.
In particular, there is a code path for AVX and AVX2. These paths will only be triggered if the FMA instruction is also available (this should be true the vast majority of the time).
I did not add an AVX512 path as I don't have a way to test that locally. It might be useful to add one in the future.