[cuBLAS] Gemm tests using half can fail #599

# Summary
The cuBLAS backend tests that run Gemm with half precision can fail with incorrect results on an NVIDIA A100.

# Version
Using the tip of develop at the time of filing (https://github.com/oneapi-src/oneMKL/commit/6923d402d5bccba9ae1966062bc5a277fc74776c).

# Environment
Using an NVIDIA A100 with the DPC++ release 2024.2.0 and the associated [Codeplay Nvidia plugin](https://developer.codeplay.com/products/oneapi/nvidia/home/). The CUDA version is 12.6.2 and the OS is Ubuntu 22.04.

# Steps to reproduce

```
cmake -Bbuild-a100 -GNinja -DCMAKE_CXX_COMPILER=`which icpx` -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=ON -DENABLE_CURAND_BACKEND=ON -DENABLE_CUSOLVER_BACKEND=ON -DENABLE_CUFFT_BACKEND=ON -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-a100
ninja
ctest -R ".*GemmUsmTests.*Half.*" --output-on-failure
```

# Observed behavior
Full log: [log_a100.txt](https://github.com/user-attachments/files/17462689/log_a100.txt)
Short extract:
```
[ RUN      ] GemmUsmTestSuite/GemmUsmTests.HalfHalfFloatPrecision/Column_Major_NVIDIA_A100_PCIE_40GB
relative error = 0.496206 absolute error = 0.382722 limit = 0.00010848
Difference in entry (0,0): DPC++ 0.388574 vs. Reference 0.771296
relative error = 1.36303 absolute error = 1.67412 limit = 0.00010848
Difference in entry (1,0): DPC++ 0.445891 vs. Reference -1.22823
relative error = 1.05343 absolute error = 0.664006 limit = 0.00010848
Difference in entry (2,0): DPC++ 0.0336805 vs. Reference -0.630325
relative error = 1.0674 absolute error = 0.514821 limit = 0.00010848
Difference in entry (3,0): DPC++ 0.0325077 vs. Reference -0.482313
relative error = 0.789876 absolute error = 0.992507 limit = 0.00010848
Difference in entry (4,0): DPC++ -0.264029 vs. Reference -1.25654
relative error = 0.925093 absolute error = 1.07784 limit = 0.00010848
```
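The differences between the output and the reference are of the same magnitude as the values themselves (relative errors of roughly 0.5 to 1.4 against a limit of about 1e-4), which seems too large to be due to a precision issue.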
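As a quick sanity check, the numbers in the extract can be recomputed by hand. The sketch below is not taken from the oneMKL test sources: it assumes the relative error is `|out - ref| / |ref|`, which is consistent with the printed values, and the names `entries`, `LIMIT` and `errors` are only illustrative.

```python
# (DPC++ result, reference) pairs copied from the log extract, entries (0,0)..(4,0).
entries = [
    (0.388574, 0.771296),
    (0.445891, -1.22823),
    (0.0336805, -0.630325),
    (0.0325077, -0.482313),
    (-0.264029, -1.25654),
]
LIMIT = 0.00010848  # "limit" value printed in the log


def errors(out, ref):
    """Return (absolute error, relative error) for one entry.

    Assumption: relative error is |out - ref| / |ref|; this matches the log
    but has not been checked against the test implementation.
    """
    abs_err = abs(out - ref)
    return abs_err, abs_err / abs(ref)


results = [errors(out, ref) for out, ref in entries]

for row, (abs_err, rel_err) in enumerate(results):
    # Ratio of each error to the printed limit shows how far off the result is.
    print(
        f"entry ({row},0): abs={abs_err:.6f} rel={rel_err:.6f} "
        f"abs/limit={abs_err / LIMIT:.0f} rel/limit={rel_err / LIMIT:.0f}"
    )
```

Under that assumption, both the absolute and the relative error of every listed entry exceed the limit by more than three orders of magnitude, and several entries have the wrong sign.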

# Expected behavior
The tests should pass.

