[Build] Build fails: 'error : no operator "+=" matches these operands' with nv_bfloat16 #25162

### Describe the issue

The build fails on the latest version (commit SHA-1: 3a47bd2821370df1315c0eead9189bda4e0ea186).

The compiler reports that no operator "+=" matches the following operand combinations:

- `nv_bfloat16 += nv_bfloat16`
- `nv_bfloat16 += const nv_bfloat16`
- `nv_bfloat16 += float`

### Urgency

Urgent: the build with CUDA 11.8 fails.

### Target platform

Windows

### Build script

`.\build.bat --config Debug --build_shared_lib --parallel --use_cuda --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\cuDNN\cudnn-windows-x86_64-8.9.0.131_cuda11-archive" --cuda_version 11.8 --use_tensorrt --tensorrt_home "C:\Program Files\NVIDIA GPU Computing Toolkit\TensorRT\TensorRT-10.9.0.34.Windows.win10.cuda-11.8\TensorRT-10.9.0.34" --use_tensorrt_oss_parser --cmake_generator "Visual Studio 16 2019" --compile_no_warning_as_error --cmake_path E:\3rdParties\cmake-4.0.3\build\bin\Release\cmake.exe --skip_tests --enable_cuda_line_info --use_mimalloc`

The workaround of building without contrib ops (`--disable_contrib_ops`) does not work in my case, because the TensorRT execution provider requires contrib ops to be enabled:
```
CMake Error at onnxruntime_providers_tensorrt.cmake:4 (message):
  To compile TensorRT execution provider contrib ops have to be enabled to
  dump an engine using com.microsoft:EPContext node.
Call Stack (most recent call first):
  onnxruntime_providers.cmake:132 (include)
  CMakeLists.txt:1890 (include)
```

### Error / output

E:\3rdParties\onnxruntime_v1.22.0\onnxruntime\contrib_ops\cuda\bert\skip_layer_norm_impl.cu(167): error : no operator "+=" matches these operands [E:\3rdParties\onnxruntime_v1.22.0\build\Windows\Debug\onnxruntime_providers_cuda.vcxproj]
              operand types are: nv_bfloat16 += nv_bfloat16
            detected during:
              instantiation of "void onnxruntime::contrib::cuda::SkipLayerNormKernelSmall<T,TPB,ILP,Simplified>(T *, T *, const T *, const T *, const T *, const T *, const T *, T, int, int) [with T=nv_bfloat16, TPB=32U, ILP=4, Simplified=true]"
  (240): here
              instantiation of "void onnxruntime::contrib::cuda::LaunchSkipLayerNormKernel<T,Simplified>(cudaStream_t, T *, T *, const T *, const T *, const T *, const T *, const T *, float, int, int, int) [with T=nv_bfloat16, Simplified=true]"
  (272): here
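
A possible local workaround, offered as an unverified sketch and not as the project's fix, is to avoid `+=` on the 16-bit type and accumulate through `float` instead. The host-only C++ below illustrates the pattern with a hypothetical stand-in type `Bf16` and hypothetical helpers `FloatToBf16`, `Bf16ToFloat`, `AddAssign` and `DemoSum`; none of these names come from ONNX Runtime or CUDA. In a CUDA kernel the conversions would presumably be `__bfloat162float` and `__float2bfloat16` from `cuda_bf16.h`. The stand-in truncates when narrowing, whereas the CUDA intrinsic rounds, so results can differ in the last bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for nv_bfloat16: 16 bits of storage and,
// deliberately, no arithmetic operators (mirrors the failing build).
struct Bf16 {
  std::uint16_t bits;
};

// Narrow float -> Bf16 by keeping the upper 16 bits of the IEEE-754 pattern.
// This truncates; it is a simplification chosen for brevity.
inline Bf16 FloatToBf16(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return Bf16{static_cast<std::uint16_t>(u >> 16)};
}

// Widen Bf16 -> float by placing the 16 bits in the upper half of a float.
inline float Bf16ToFloat(Bf16 h) {
  std::uint32_t u = static_cast<std::uint32_t>(h.bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Replaces `acc += rhs` for Bf16 and const Bf16 right-hand sides.
inline void AddAssign(Bf16& acc, const Bf16& rhs) {
  acc = FloatToBf16(Bf16ToFloat(acc) + Bf16ToFloat(rhs));
}

// Replaces `acc += rhs` for a float right-hand side.
inline void AddAssign(Bf16& acc, float rhs) {
  acc = FloatToBf16(Bf16ToFloat(acc) + rhs);
}

// Small usage example covering the three operand combinations in the report.
inline float DemoSum() {
  Bf16 acc = FloatToBf16(1.5f);
  Bf16 a = FloatToBf16(2.0f);
  const Bf16 b = FloatToBf16(0.25f);
  AddAssign(acc, a);      // Bf16 += Bf16
  AddAssign(acc, b);      // Bf16 += const Bf16
  AddAssign(acc, 0.25f);  // Bf16 += float
  return Bf16ToFloat(acc);
}
```

All values used in `DemoSum` are exactly representable in bfloat16, so it returns `4.0f`. Whether the same substitution is appropriate inside `SkipLayerNormKernelSmall` depends on the surrounding kernel code, which is not shown in this report.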

### Visual Studio Version

VS 16 2019

### GCC / Compiler Version

_No response_
