-
Notifications
You must be signed in to change notification settings - Fork 221
Fix vectorization pragmas for icx compiler #3246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -44,59 +44,49 @@ DAAL_EXPORT bool daal_check_is_intel_cpu(); | |||
|
|||
#define DAAL_CHECK_CPU_ENVIRONMENT (daal_check_is_intel_cpu()) | |||
|
|||
#if defined(__INTEL_COMPILER) | |||
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER) | |||
#define PRAGMA_FORCE_SIMD _Pragma("ivdep") | |||
#define PRAGMA_NOVECTOR _Pragma("novector") | |||
#define PRAGMA_VECTOR_ALIGNED _Pragma("vector aligned") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this play along with omp simd
? How about passing the alignment to the OMP pragma? It also supports #pragma omp simd aligned(pointer_name:64)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to pass any arguments to #pragma omp simd
with PRAGMA_OMP_SIMD() macro.
For example:
https://github.com/uxlfoundation/oneDAL/pull/3246/files#diff-61b267e32558b19a2dba159ff0128be35c0c8eafe9b140791a7073e262af17f7R1201
Alignment options can be passed similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But does the compiler actually use the hint if put before omp simd
?
#define DAAL_TYPENAME typename | ||
#elif defined(_MSC_VER) | ||
#define PRAGMA_FORCE_SIMD | ||
#define PRAGMA_NOVECTOR | ||
#define PRAGMA_FORCE_SIMD _Pragma("loop(ivdep)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSVC supports omp simd
if enabling experimental mode:
https://learn.microsoft.com/en-us/cpp/parallel/openmp/openmp-simd?view=msvc-170
#define PRAGMA_FORCE_SIMD _Pragma("omp simd") | ||
#define PRAGMA_FORCE_SIMD _Pragma("omp simd") | ||
#define PRAGMA_TO_STR(ARGS) _Pragma(#ARGS) | ||
#define PRAGMA_OMP_SIMD(ARGS) PRAGMA_TO_STR(omp simd ARGS) | ||
#else | ||
#define PRAGMA_FORCE_SIMD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omp simd
is supported by GCC on all platforms as far as I am aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// \param[in] x Pointer to the input matrix x of size nRows * nColumns | ||
/// \param[out] sum Pointer to the output array of size nRows, where the sum of each row of x will be stored | ||
template <typename algorithmFPType, CpuType cpu> | ||
void sumByRows(const size_t nRows, const size_t nColumns, const algorithmFPType * x, algorithmFPType * sum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is out of the scope of this PR.
Here I am just trying to enable the pragmas for ICX. The refactoring was made just to reduce code duplication and not to do the same modifications in 3 files.
{ | ||
if (block[i * blockSize + i] > (algorithmFPType)0.0) | ||
{ | ||
block[i * blockSize + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(block[i * blockSize + i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a PR adding a function for this from MKL: #3227
Perhaps that other one could be merged first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am Ok to have that one merged first and reuse that functionality; but I think the performance have to be measured, as that PR clearly might have performance impact on the algorithm.
/intelci: run |
Description
#pragma omp simd
for vectorization where possible:cpp/daal/src/services/service_defines.h
Note: The use of
#pargma omp simd
was not implemented for MSVC because it is required to link with OpenMP to support the feature leading to additional dependency in Windows build.PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance