Tensorflow 2.16.1 fails to compile with -march=native on Xeon 4410Y CPU #64221

QuesarVII · 2024-03-21T20:42:20Z

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.16.1

Custom code

No

OS platform and distribution

x86_64 Ubuntu 22.04

Mobile device

No response

Python version

3.10

Bazel version

6.5.0

GCC/compiler version

clang 17.0.6 (ubuntu package)

CUDA/cuDNN version

12.4 / 8.9.7

GPU model and memory

(2) RTX A4500 20GB

Current behavior?

TensorFlow 2.16.1 fails to compile with -march=native when building with CUDA support on Xeon 4410Y CPUs with clang 17.0.6. I run configure, answer yes to CUDA support, then edit .tf_configure.bazelrc and add these lines:
build:cuda --copt=-march=native
build:cuda --host_copt=-march=native
build:cuda --copt=-Wno-error=unused-command-line-argument
(see issue 62459 regarding "unused"). With TF_PYTHON_VERSION=3.10 set, the bazel build errors out on tensorflow/core/kernels/matmul_op_real.cc, inside external/eigen_archive/Eigen/src/Core/MathFunctions.h. The error output is in the log field.

This appears to be the same problem as issue 62047.

Standalone code to reproduce the issue

N/A

Relevant log output

...
In file included from tensorflow/core/kernels/matmul_op_real.cc:16:
In file included from ./tensorflow/core/kernels/matmul_op_impl.h:30:
In file included from external/eigen_archive/Eigen/Core:182:
external/eigen_archive/Eigen/src/Core/MathFunctions.h:430:12: error: no matching conversion for static_cast from 'const Eigen::internal::eigen_packet_wrapper<__attribute__((__vector_size__(4 * sizeof(long long)))) long long, 1>' to '__attribute__((__vector_size__(16 * sizeof(float)))) float' (vector of 16 'float' values)
  430 |     return static_cast<NewType>(x);
      |            ^~~~~~~~~~~~~~~~~~~~~~~
external/eigen_archive/Eigen/src/Core/GenericPacketMath.h:269:45: note: in instantiation of member function 'Eigen::internal::cast_impl<Eigen::internal::eigen_packet_wrapper<__attribute__((__vector_size__(4 * sizeof(long long)))) long long, 1>, __attribute__((__vector_size__(16 * sizeof(float)))) float>::run' requested here
  269 |     return cast_impl<SrcPacket, TgtPacket>::run(a);
      |                                             ^
external/eigen_archive/Eigen/src/Core/GenericPacketMath.h:290:47: note: in instantiation of member function 'Eigen::internal::pcast_generic<Eigen::internal::eigen_packet_wrapper<__attribute__((__vector_size__(4 * sizeof(long long)))) long long, 1>, __attribute__((__vector_size__(16 * sizeof(float)))) float>::run' requested here
  290 |   return pcast_generic<SrcPacket, TgtPacket>::run(a);
      |                                               ^
external/eigen_archive/Eigen/src/Core/CoreEvaluators.h:790:12: note: in instantiation of function template specialization 'Eigen::internal::pcast<Eigen::internal::eigen_packet_wrapper<__attribute__((__vector_size__(4 * sizeof(long long)))) long long, 1>, __attribute__((__vector_size__(16 * sizeof(float)))) float>' requested here
  790 |     return pcast<SizedSrcPacketType, DstPacketType>(
      |            ^
external/eigen_archive/Eigen/src/Core/AssignEvaluator.h:707:87: note: in instantiation of function template specialization 'Eigen::internal::unary_evaluator<Eigen::CwiseUnaryOp<Eigen::internal::core_cast_op<Eigen::half, float>, const Eigen::Map<const Eigen::Array<Eigen::half, -1, 1>>>>::packet<0, __attribute__((__vector_size__(16 * sizeof(float)))) float, true>' requested here
  707 |     m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(index), m_src.template packet<LoadMode,Packet>(index));
      |                                                                                       ^
external/eigen_archive/Eigen/src/Core/AssignEvaluator.h:463:23: note: in instantiation of function template specialization 'Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Map<Eigen::Array<float, -1, 1>>>, Eigen::internal::evaluator<Eigen::CwiseUnaryOp<Eigen::internal::core_cast_op<Eigen::half, float>, const Eigen::Map<const Eigen::Array<Eigen::half, -1, 1>>>>, Eigen::internal::assign_op<float, float>>::assignPacket<64, 0, __attribute__((__vector_size__(16 * sizeof(float)))) float>' requested here
  463 |       kernel.template assignPacket<dstAlignment, srcAlignment, PacketType>(index);
      |                       ^
external/eigen_archive/Eigen/src/Core/AssignEvaluator.h:916:46: note: (skipping 3 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
  916 |   Assignment<ActualDstTypeCleaned,Src,Func>::run(actualDst, src, func);
      |                                              ^
external/eigen_archive/Eigen/src/Core/Map.h:162:5: note: in instantiation of function template specialization 'Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, 1>>>::operator=<Eigen::CwiseUnaryOp<Eigen::internal::core_cast_op<Eigen::half, float>, const Eigen::Map<const Eigen::Array<Eigen::half, -1, 1>>>>' requested here
  162 |     EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Map)
      |     ^
external/eigen_archive/Eigen/src/Core/util/Macros.h:1124:5: note: expanded from macro 'EIGEN_INHERIT_ASSIGNMENT_OPERATORS'
 1124 |     EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \
      |     ^
external/eigen_archive/Eigen/src/Core/util/Macros.h:1097:108: note: expanded from macro 'EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR'
 1097 |     EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const DenseBase<OtherDerived>& other) { Base::operator=(other.derived()); return *this; }
      |                                                                                                            ^
./tensorflow/core/kernels/matmul_op_impl.h:867:13: note: in instantiation of function template specialization 'Eigen::Map<Eigen::Array<float, -1, 1>>::operator=<Eigen::CwiseUnaryOp<Eigen::internal::core_cast_op<Eigen::half, float>, const Eigen::Map<const Eigen::Array<Eigen::half, -1, 1>>>>' requested here
  867 |   dst_eigen = src_eigen.template cast<float>();
      |             ^
./tensorflow/core/kernels/matmul_op_impl.h:994:7: note: in instantiation of function template specialization 'tensorflow::FastConvertToFloat<Eigen::half>' requested here
  994 |       FastConvertToFloat(in0_reshaped.flat<Ta>().data(),
      |       ^
./tensorflow/core/kernels/matmul_op_impl.h:1049:12: note: in instantiation of member function 'tensorflow::BaseBatchMatMulOp<Eigen::ThreadPoolDevice, Eigen::half, Eigen::half, Eigen::half>::Compute' requested here
 1049 |   explicit BatchMatMulOp(OpKernelConstruction* context)
      |            ^
tensorflow/core/kernels/matmul_op_real.cc:24:21: note: in instantiation of member function 'tensorflow::BatchMatMulOp<Eigen::ThreadPoolDevice, Eigen::half, Eigen::half, Eigen::half>::BatchMatMulOp' requested here
   24 | TF_CALL_FLOAT_TYPES(REGISTER_BATCH_MATMUL_CPU);
      |                     ^
external/eigen_archive/Eigen/src/Core/GenericPacketMath.h:227:23: note: candidate function not viable: 'this' argument has type 'const Eigen::internal::eigen_packet_wrapper<__attribute__((__vector_size__(4 * sizeof(long long)))) long long, 1>', but method is not marked const
  227 |   EIGEN_ALWAYS_INLINE operator T&() { return m_val; }
      |                       ^
external/eigen_archive/Eigen/src/Core/GenericPacketMath.h:228:23: note: candidate function
  228 |   EIGEN_ALWAYS_INLINE operator const T&() const { return m_val; }
      |                       ^
In file included from tensorflow/core/kernels/matmul_op_real.cc:16:
In file included from ./tensorflow/core/kernels/matmul_op_impl.h:30:
In file included from external/eigen_archive/Eigen/Core:351:
external/eigen_archive/Eigen/src/Core/products/GeneralBlockPanelKernel.h:2501:123: warning: remainder by zero is undefined [-Wdivision-by-zero]
 2501 |           constexpr bool kCanLoadSRhsQuad = (unpacket_traits<SLhsPacket>::size < 4) || (unpacket_traits<SRhsPacket>::size % (unpacket_traits<SLhsPacket>::size / 4)) == 0;
      |                                                                                                                           ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:261:5: note: in instantiation of member function 'Eigen::internal::gebp_kernel<short, short, long, Eigen::internal::blas_data_mapper<short, long, 0>, 2, 4>::operator()' requested here
  261 |     GebpKernel()(output_mapper, lhsBlock, rhsBlock, rows, depth, cols, alpha,
      |     ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:900:18: note: in instantiation of member function 'Eigen::internal::TensorContractionKernel<short, short, short, long, Eigen::internal::blas_data_mapper<short, long, 0>, Eigen::internal::TensorContractionInputMapper<short, long, 1, Eigen::TensorEvaluator<const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, Eigen::ThreadPoolDevice>, Eigen::array<long, 1>, Eigen::array<long, 1>, 1, true, false, 0>, Eigen::internal::TensorContractionInputMapper<short, long, 0, Eigen::TensorEvaluator<const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, Eigen::ThreadPoolDevice>, Eigen::array<long, 1>, Eigen::array<long, 1>, 1, true, true, 0>>::invoke' requested here
  900 |           kernel.invoke(output_mapper, blockA, blockB, actual_mc, actual_kc,
      |                  ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:794:5: note: in instantiation of function template specialization 'Eigen::TensorContractionEvaluatorBase<Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>>, Eigen::ThreadPoolDevice>>::evalGemmPartial<true, true, true, 0, false>' requested here
  794 |     evalGemmPartial<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous,
      |     ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h:1240:31: note: in instantiation of function template specialization 'Eigen::TensorContractionEvaluatorBase<Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>>, Eigen::ThreadPoolDevice>>::evalGemmPartialWithoutOutputKernel<true, true, true, 0>' requested here
 1240 |           evaluator->template evalGemmPartialWithoutOutputKernel, Alignment,
      |                               ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h:1384:7: note: in instantiation of function template specialization 'Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>>, Eigen::ThreadPoolDevice>::EvalShardedByInnerDimContext<Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>>, Eigen::ThreadPoolDevice>::NoCallback>::processBlock<0>' requested here
 1384 |       processBlock<Alignment>(block_idx, block_start, block_end);
      |       ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h:1163:7: note: (skipping 8 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
 1163 |       eval<Alignment>(barrier, 0, num_blocks);
      |       ^
./tensorflow/core/kernels/matmul_op_impl.h:157:20: note: in instantiation of function template specialization 'Eigen::TensorDevice<Eigen::TensorMap<Eigen::Tensor<short, 2, 1>, 16>, Eigen::ThreadPoolDevice>::operator=<Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const short, 2, 1>, 16>>>' requested here
  157 |       Tz.device(d) = Tx.contract(Ty, contract_pairs);
      |                    ^
./tensorflow/core/kernels/matmul_op_impl.h:440:29: note: in instantiation of member function 'tensorflow::(anonymous namespace)::ParallelMatMulKernel<short, false>::Run' requested here
  440 |       ParallelMatMulKernel::Run(context, in_x, in_y, adj_x, adj_y, trans_x,
      |                             ^
./tensorflow/core/kernels/matmul_op_impl.h:1016:40: note: in instantiation of member function 'tensorflow::LaunchBatchMatMul<Eigen::ThreadPoolDevice, short>::Launch' requested here
 1016 |       LaunchBatchMatMul<Device, Tout>::Launch(
      |                                        ^
./tensorflow/core/kernels/matmul_op_impl.h:1049:12: note: in instantiation of member function 'tensorflow::BaseBatchMatMulOp<Eigen::ThreadPoolDevice, short, short, short>::Compute' requested here
 1049 |   explicit BatchMatMulOp(OpKernelConstruction* context)
      |            ^
tensorflow/core/kernels/matmul_op_real.cc:25:15: note: in instantiation of member function 'tensorflow::BatchMatMulOp<Eigen::ThreadPoolDevice, short, short, short>::BatchMatMulOp' requested here
   25 | TF_CALL_int16(REGISTER_BATCH_MATMUL_CPU);
      |               ^
In file included from tensorflow/core/kernels/matmul_op_real.cc:16:
In file included from ./tensorflow/core/kernels/matmul_op_impl.h:30:
In file included from external/eigen_archive/Eigen/Core:351:
external/eigen_archive/Eigen/src/Core/products/GeneralBlockPanelKernel.h:2501:123: warning: remainder by zero is undefined [-Wdivision-by-zero]
 2501 |           constexpr bool kCanLoadSRhsQuad = (unpacket_traits<SLhsPacket>::size < 4) || (unpacket_traits<SRhsPacket>::size % (unpacket_traits<SLhsPacket>::size / 4)) == 0;
      |                                                                                                                           ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:261:5: note: in instantiation of member function 'Eigen::internal::gebp_kernel<long, long, long, Eigen::internal::blas_data_mapper<long, long, 0>, 2, 4>::operator()' requested here
  261 |     GebpKernel()(output_mapper, lhsBlock, rhsBlock, rows, depth, cols, alpha,
      |     ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:900:18: note: in instantiation of member function 'Eigen::internal::TensorContractionKernel<long, long, long, long, Eigen::internal::blas_data_mapper<long, long, 0>, Eigen::internal::TensorContractionInputMapper<long, long, 1, Eigen::TensorEvaluator<const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, Eigen::ThreadPoolDevice>, Eigen::array<long, 1>, Eigen::array<long, 1>, 1, true, false, 0>, Eigen::internal::TensorContractionInputMapper<long, long, 0, Eigen::TensorEvaluator<const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, Eigen::ThreadPoolDevice>, Eigen::array<long, 1>, Eigen::array<long, 1>, 1, true, true, 0>>::invoke' requested here
  900 |           kernel.invoke(output_mapper, blockA, blockB, actual_mc, actual_kc,
      |                  ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:794:5: note: in instantiation of function template specialization 'Eigen::TensorContractionEvaluatorBase<Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>>, Eigen::ThreadPoolDevice>>::evalGemmPartial<true, true, true, 0, false>' requested here
  794 |     evalGemmPartial<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous,
      |     ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h:1240:31: note: in instantiation of function template specialization 'Eigen::TensorContractionEvaluatorBase<Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>>, Eigen::ThreadPoolDevice>>::evalGemmPartialWithoutOutputKernel<true, true, true, 0>' requested here
 1240 |           evaluator->template evalGemmPartialWithoutOutputKernel, Alignment,
      |                               ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h:1384:7: note: in instantiation of function template specialization 'Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>>, Eigen::ThreadPoolDevice>::EvalShardedByInnerDimContext<Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>>, Eigen::ThreadPoolDevice>::NoCallback>::processBlock<0>' requested here
 1384 |       processBlock<Alignment>(block_idx, block_start, block_end);
      |       ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h:1163:7: note: (skipping 8 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
 1163 |       eval<Alignment>(barrier, 0, num_blocks);
      |       ^
./tensorflow/core/kernels/matmul_op_impl.h:157:20: note: in instantiation of function template specialization 'Eigen::TensorDevice<Eigen::TensorMap<Eigen::Tensor<long, 2, 1>, 16>, Eigen::ThreadPoolDevice>::operator=<Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<long>, 1>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>, const Eigen::TensorMap<Eigen::Tensor<const long, 2, 1>, 16>>>' requested here
  157 |       Tz.device(d) = Tx.contract(Ty, contract_pairs);
      |                    ^
./tensorflow/core/kernels/matmul_op_impl.h:440:29: note: in instantiation of member function 'tensorflow::(anonymous namespace)::ParallelMatMulKernel<long, false>::Run' requested here
  440 |       ParallelMatMulKernel::Run(context, in_x, in_y, adj_x, adj_y, trans_x,
      |                             ^
./tensorflow/core/kernels/matmul_op_impl.h:1016:40: note: in instantiation of member function 'tensorflow::LaunchBatchMatMul<Eigen::ThreadPoolDevice, long>::Launch' requested here
 1016 |       LaunchBatchMatMul<Device, Tout>::Launch(
      |                                        ^
./tensorflow/core/kernels/matmul_op_impl.h:1049:12: note: in instantiation of member function 'tensorflow::BaseBatchMatMulOp<Eigen::ThreadPoolDevice, long, long, long>::Compute' requested here
 1049 |   explicit BatchMatMulOp(OpKernelConstruction* context)
      |            ^
tensorflow/core/kernels/matmul_op_real.cc:27:15: note: in instantiation of member function 'tensorflow::BatchMatMulOp<Eigen::ThreadPoolDevice, long, long, long>::BatchMatMulOp' requested here
   27 | TF_CALL_int64(REGISTER_BATCH_MATMUL_CPU);
      |               ^
23 warnings and 1 error generated.
Target //tensorflow/tools/pip_package/v2:wheel failed to build
INFO: Elapsed time: 5321.938s, Critical Path: 1114.63s
INFO: 34048 processes: 9211 internal, 24837 local.
FAILED: Build did NOT complete successfully

QuesarVII · 2024-03-21T22:19:06Z

I just tested building without cuda. I ran configure again, provided "-march=native" as the "opt" compiler flag, and then built with --config=opt, without modifying anything other than setting TF_PYTHON_VERSION=3.10. That failed with the same static_cast error in MathFunctions.h. I changed the issue subject to remove the mention of CUDA since it happens without it too.

olegsolovey · 2024-03-21T23:21:33Z

Can you try disabling avx512_fp16? For gcc use --copt=-mno-avx512fp16.

tilakrayal · 2024-03-22T08:43:13Z

@QuesarVII,
Vectorized casting is incorrectly marked as available with EIGEN_VECTORIZE_AVX512FP16. Since that cast doesn't exist, it triggers all the other errors. Could you please try disabling avx512_fp16 and compiling again? Thank you!

QuesarVII · 2024-03-22T13:03:27Z

Yes, avx512_fp16 triggers the issue. Using -mno-avx512fp16 results in a successful build. Thanks.

tilakrayal · 2024-03-22T15:41:02Z

@QuesarVII
Glad the issue is resolved for you. Could you please move this issue to closed status? Thank you!

QuesarVII · 2024-03-22T15:56:45Z

Shouldn't the code base be fixed to resolve this issue before closing it? Disabling compiler flags is more of a workaround than a real fix. Thanks!

tilakrayal · 2024-03-23T09:48:50Z

@QuesarVII,
This issue has been brought to the developers' attention, and apparently it may be disabled in upcoming releases. It is also being tracked internally, so it should be resolved in the next versions. Thank you!

github-actions · 2024-03-31T01:48:30Z

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions · 2024-04-08T01:47:36Z

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

google-ml-butler bot added the type:build/install Build and install issues label Mar 21, 2024

google-ml-butler bot assigned tilakrayal Mar 21, 2024

QuesarVII changed the title ~~Tensorflow 2.16.1 fails to compile with -march=native on Xeon 4410Y CPU if including CUDA support~~ Tensorflow 2.16.1 fails to compile with -march=native on Xeon 4410Y CPU Mar 21, 2024

tilakrayal added TF 2.16 subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues labels Mar 22, 2024

tilakrayal added the stat:awaiting response Status - Awaiting response from author label Mar 22, 2024

google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 22, 2024

tilakrayal added the stat:awaiting response Status - Awaiting response from author label Mar 22, 2024

google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 22, 2024

tilakrayal added the stat:awaiting response Status - Awaiting response from author label Mar 23, 2024

github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Mar 31, 2024

github-actions bot closed this as completed Apr 8, 2024
