Description
Describe the issue
The build fails on the latest version (SHA-1: 3a47bd2).
The compiler reports: no operator "+=" matches these operands. The failing operand combinations are:
nv_bfloat16 += nv_bfloat16
nv_bfloat16 += const nv_bfloat16
nv_bfloat16 += float
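One possible reading of these errors, which the log alone does not confirm, is that the CUDA 11.8 headers do not expose the nv_bfloat16 compound-assignment operators for every target GPU architectures being compiled. Under that assumption, a common pattern is to route the accumulation through float. The following is a minimal host-side C++ sketch of that pattern, not the onnxruntime fix: `Bf16`, `ToFloat`, `FromFloat` and `AddInPlace` are hypothetical names, and `Bf16` is a stand-in type that deliberately has no `+=`. In the actual .cu file the conversions would presumably be `__bfloat162float` and `__float2bfloat16` from cuda_bf16.h.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for nv_bfloat16: keeps the upper 16 bits of an
// IEEE-754 float and intentionally provides no operator+=.
struct Bf16 {
  std::uint16_t bits;
};

// Widen to float by placing the stored bits in the upper half of a 32-bit word.
inline float ToFloat(Bf16 v) {
  std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Narrow from float with round-to-nearest-even (NaN handling omitted for brevity).
inline Bf16 FromFloat(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  u += 0x7FFFu + ((u >> 16) & 1u);
  return Bf16{static_cast<std::uint16_t>(u >> 16)};
}

// Generic path: types that already support += (float, double, ...) use it directly.
template <typename T>
inline void AddInPlace(T& acc, const T& v) {
  acc += v;
}

// Bf16 += Bf16: accumulate in float, then narrow back.
inline void AddInPlace(Bf16& acc, const Bf16& v) {
  acc = FromFloat(ToFloat(acc) + ToFloat(v));
}

// Bf16 += float: same idea for the mixed-type case.
inline void AddInPlace(Bf16& acc, float v) {
  acc = FromFloat(ToFloat(acc) + v);
}
```

With helpers like these, a kernel would call `AddInPlace(sum, value)` wherever it currently writes `sum += value`. The non-template overloads are preferred for the bfloat16 type, so the generic `+=` body is never instantiated for it.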
Urgency
Urgent: the build with CUDA 11.8 fails.
Target platform
Windows
Build script
.\build.bat --config Debug --build_shared_lib --parallel --use_cuda --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\cuDNN\cudnn-windows-x86_64-8.9.0.131_cuda11-archive" --cuda_version 11.8 --use_tensorrt --tensorrt_home "C:\Program Files\NVIDIA GPU Computing Toolkit\TensorRT\TensorRT-10.9.0.34.Windows.win10.cuda-11.8\TensorRT-10.9.0.34" --use_tensorrt_oss_parser --cmake_generator "Visual Studio 16 2019" --compile_no_warning_as_error --cmake_path E:\3rdParties\cmake-4.0.3\build\bin\Release\cmake.exe --skip_tests --enable_cuda_line_info --use_mimalloc
The workaround of building without contrib ops (--disable_contrib_ops) does not work in my case, because the TensorRT execution provider requires them:
CMake Error at onnxruntime_providers_tensorrt.cmake:4 (message):
To compile TensorRT execution provider contrib ops have to be enabled to
dump an engine using com.microsoft:EPContext node.
Call Stack (most recent call first):
onnxruntime_providers.cmake:132 (include)
CMakeLists.txt:1890 (include)
Error / output
E:\3rdParties\onnxruntime_v1.22.0\onnxruntime\contrib_ops\cuda\bert\skip_layer_norm_impl.cu(167): error : no operator "+=" matches these operands [E:\3rdParties\onnxruntime_v1.22.0\build\Windows\Debug\onnxruntime_providers_cuda.vcxproj]
operand types are: nv_bfloat16 += nv_bfloat16
detected during:
instantiation of "void onnxruntime::contrib::cuda::SkipLayerNormKernelSmall<T,TPB,ILP,Simplified>(T *, T *, const T *, const T *, const T *, const T *, const T *, T, int, int) [with T=nv_bfloat16, TPB=32U, ILP=4, Simplified=true]"
(240): here
instantiation of "void onnxruntime::contrib::cuda::LaunchSkipLayerNormKernel<T,Simplified>(cudaStream_t, T *, T *, const T *, const T *, const T *, const T *, const T *, float, int, int, int) [with T=nv_bfloat16, Simplified=true]"
(272): here
Visual Studio Version
VS 16 2019
GCC / Compiler Version
No response