[AARCH64] Building TF 2.15.0 from sources failed with undefined __Int8x8_t #62490

smuzaffar · 2023-11-28T10:51:12Z

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15.0

Custom code

Yes

OS platform and distribution

RHEL 8

Mobile device

No response

Python version

3.9

Bazel version

6.1.0

GCC/compiler version

GCC 12.3

CUDA/cuDNN version

CUDA 12.2, cuDNN 8.8.0

GPU model and memory

No response

Current behavior?

Building TF 2.15.0 from source for aarch64 (with CUDA enabled) fails with the error shown in [a]. Building TF 2.15.0 from source for x86_64 works fine.

[a]

ERROR: <path>/tensorflow-2.15.0/tensorflow/core/kernels/BUILD:5131:18: Compiling tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc failed: (Exit 4): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //tensorflow/core/kernels:sparse_tensor_dense_matmul_op_gpu)

    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=compute_60,compute_70,compute_75,compute_80,compute_89 \
    TF_CUDA_PATHS=<path>/cudnn/8.8.0.121-7bc7095db72117b743b32c95e6e3687e \
    TF_CUDA_VERSION=12.2 \
    TF_SYSTEM_LIBS=absl_py,boringssl,com_github_grpc_grpc,curl,cython,eigen_archive,flatbuffers,gif,libjpeg_turbo,org_sqlite,pasta,png,pybind11,zlib \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/aarch64-opt/bin/tensorflow/core/kernels/_objs/sparse_tensor_dense_matmul_op_gpu/sparse_tensor_dense_matmul_op_gpu.cu.pic.d '-frandom-seed=bazel-out/aarch64-opt/bin/tensorflow/core/kernels/_objs/sparse_tensor_dense_matmul_op_gpu/sparse_tensor_dense_matmul_op_gpu.cu.pic.o' -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -DHAVE_SYS_UIO_H -DTF_USE_SNAPPY '-DBAZEL_CURRENT_REPOSITORY=""' -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_absl -iquote bazel-out/aarch64-opt/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/aarch64-opt/bin/external/nsync -iquote external/com_google_protobuf -iquote bazel-out/aarch64-opt/bin/external/com_google_protobuf -iquote external/local_tsl -iquote bazel-out/aarch64-opt/bin/external/local_tsl -iquote external/com_googlesource_code_re2 -iquote bazel-out/aarch64-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/aarch64-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/aarch64-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/aarch64-opt/bin/external/highwayhash -iquote external/gif -iquote bazel-out/aarch64-opt/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/aarch64-opt/bin/external/libjpeg_turbo -iquote external/zlib -iquote bazel-out/aarch64-opt/bin/external/zlib -iquote external/eigen_archive -iquote bazel-out/aarch64-opt/bin/external/eigen_archive -iquote external/ml_dtypes -iquote bazel-out/aarch64-opt/bin/external/ml_dtypes -iquote external/local_config_cuda -iquote bazel-out/aarch64-opt/bin/external/local_config_cuda -iquote external/snappy -iquote bazel-out/aarch64-opt/bin/external/snappy -iquote external/double_conversion -iquote bazel-out/aarch64-opt/bin/external/double_conversion -iquote external/nccl_archive -iquote bazel-out/aarch64-opt/bin/external/nccl_archive -iquote 
external/local_config_rocm -iquote bazel-out/aarch64-opt/bin/external/local_config_rocm -iquote external/local_config_tensorrt -iquote bazel-out/aarch64-opt/bin/external/local_config_tensorrt -iquote external/local_xla -iquote bazel-out/aarch64-opt/bin/external/local_xla -Ibazel-out/aarch64-opt/bin/external/ml_dtypes/_virtual_includes/float8 -Ibazel-out/aarch64-opt/bin/external/ml_dtypes/_virtual_includes/int4 -Ibazel-out/aarch64-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/aarch64-opt/bin/external/nccl_archive/_virtual_includes/nccl_config -Ibazel-out/aarch64-opt/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/nsync/public -isystem bazel-out/aarch64-opt/bin/external/nsync/public -isystem external/com_google_protobuf/src -isystem bazel-out/aarch64-opt/bin/external/com_google_protobuf/src -isystem external/farmhash_archive/src -isystem bazel-out/aarch64-opt/bin/external/farmhash_archive/src -isystem external/gif/include -isystem bazel-out/aarch64-opt/bin/external/gif/include -isystem external/libjpeg_turbo/include -isystem bazel-out/aarch64-opt/bin/external/libjpeg_turbo/include -isystem external/zlib -isystem bazel-out/aarch64-opt/bin/external/zlib -isystem external/eigen_archive/include/eigen3 -isystem bazel-out/aarch64-opt/bin/external/eigen_archive/include/eigen3 -isystem external/ml_dtypes -isystem bazel-out/aarch64-opt/bin/external/ml_dtypes -isystem external/ml_dtypes/ml_dtypes -isystem bazel-out/aarch64-opt/bin/external/ml_dtypes/ml_dtypes -isystem external/local_config_cuda/cuda -isystem bazel-out/aarch64-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/aarch64-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_rocm/rocm -isystem bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem 
bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm/rocm/include/roctracer -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS '-march=armv8-a' -mno-outline-atomics -Wno-sign-compare '-std=c++17' '-std=c++17' -x cuda '-DGOOGLE_CUDA=1' '--cuda-include-ptx=sm_60' '--cuda-gpu-arch=sm_60' '--cuda-include-ptx=sm_70' '--cuda-gpu-arch=sm_70' '--cuda-include-ptx=sm_75' '--cuda-gpu-arch=sm_75' '--cuda-include-ptx=sm_80' '--cuda-gpu-arch=sm_80' '--cuda-include-ptx=sm_89' '--cuda-gpu-arch=sm_89' '-Xcuda-fatbinary=--compress-all' '-nvcc_options=expt-relaxed-constexpr' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DGOOGLE_CUDA=1' '-DTENSORFLOW_USE_NVCC=1' -pthread '-nvcc_options=relaxed-constexpr' '-nvcc_options=ftz=true' -c tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc -o bazel-out/aarch64-opt/bin/tensorflow/core/kernels/_objs/sparse_tensor_dense_matmul_op_gpu/sparse_tensor_dense_matmul_op_gpu.cu.pic.o)
# Configuration: b904123c2caf5d17d0cded6e4d2ae3e922ea3a81ef7f2bc493d95e1ad6f05410
# Execution platform: @local_execution_config_platform//:platform
In file included from external/local_xla/xla/stream_executor/device_options.h:27,
                 from external/local_xla/xla/stream_executor/platform.h:27,
                 from external/local_xla/xla/stream_executor/cuda/cuda_platform_id.h:19,
                 from ./tensorflow/core/platform/stream_executor.h:19,
                 from ./tensorflow/core/util/gpu_launch_config.h:27,
                 from ./tensorflow/core/util/gpu_kernel_helper.h:28,
                 from tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc:24:
external/com_google_absl/absl/log/check.h:57: warning: "CHECK" redefined
   57 | #define CHECK(condition) ABSL_CHECK_IMPL((condition), #condition)
      |
In file included from external/local_tsl/tsl/platform/logging.h:26,
                 from external/local_tsl/tsl/platform/refcount.h:23,
                 from ./tensorflow/core/platform/refcount.h:20,
                 from ./tensorflow/core/lib/core/refcount.h:19,
                 from ./tensorflow/core/framework/resource_base.h:23,
                 from ./tensorflow/core/framework/resource_handle.h:21,
                 from ./tensorflow/core/framework/register_types.h:21,
                 from tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc:21:
external/local_tsl/tsl/platform/default/logging.h:308: note: this is the location of the previous definition
  308 | #define CHECK(condition)              \
      |
external/com_google_absl/absl/log/check.h:65: warning: "QCHECK" redefined
   65 | #define QCHECK(condition) ABSL_QCHECK_IMPL((condition), #condition)
      |
external/local_tsl/tsl/platform/default/logging.h:542: note: this is the location of the previous definition
  542 | #define QCHECK(condition) CHECK(condition)
      |
external/com_google_absl/absl/log/check.h:88: warning: "DCHECK" redefined
   88 | #define DCHECK(condition) ABSL_DCHECK_IMPL((condition), #condition)
....
....
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(40): error: identifier "__Int8x8_t" is undefined
  typedef __Int8x8_t int8x8_t;
          ^
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(41): error: identifier "__Int16x4_t" is undefined
  typedef __Int16x4_t int16x4_t;
          ^
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(42): error: identifier "__Int32x2_t" is undefined
  typedef __Int32x2_t int32x2_t;
          ^
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(43): error: identifier "__Int64x1_t" is undefined
  typedef __Int64x1_t int64x1_t;

Standalone code to reproduce the issue

It is a build/configure issue.

Relevant log output

No response


richgrove · 2024-03-12T18:44:09Z

I have the same build problem on Arm64 with this CPU:
    processor       : 0
    model name      : ARMv8 Processor rev 1 (v8l)
    BogoMIPS        : 62.50
    Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm
    CPU implementer : 0x41
    CPU architecture: 8
    CPU variant     : 0x0
    CPU part        : 0xd42
    CPU revision    : 1

The CPU doesn't seem to have the NEON feature at all, so Arm NEON support needs to be turned off on this platform. Any suggestion on how to turn it off?
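
A note on the NEON question: on AArch64 Linux, NEON (Advanced SIMD) is reported as asimd in the Features line of /proc/cpuinfo, and asimd does appear in the list above. The errors in the original report are raised while compiling GCC's arm_neon.h inside a CUDA translation unit (sparse_tensor_dense_matmul_op_gpu.cu.cc), so they look like a compiler/header problem rather than missing hardware support. Below is a minimal Python sketch for checking the flag. It assumes the usual /proc/cpuinfo layout, and the helper names are hypothetical:

```python
# Minimal sketch, assuming /proc/cpuinfo text in the usual Linux AArch64 format.
# The helper names (cpu_features, has_neon) are hypothetical illustrations.
from __future__ import annotations

from pathlib import Path


def cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def has_neon(cpuinfo_text: str) -> bool:
    """True if the kernel reports Advanced SIMD ("asimd"), i.e. NEON."""
    return "asimd" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("NEON (asimd) reported:", has_neon(cpuinfo.read_text()))
```

When given the Features line quoted above, has_neon returns True because asimd is in the list. Flags such as asimdhp, asimdrdm and asimddp are separate entries and are not counted as asimd on their own.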

adamjstewart · 2024-04-17T20:27:57Z

Still broken in 2.16.1.

aweits · 2024-04-25T16:02:45Z

This seems to work around the issue for me (TF 2.16.1 built with Spack on Grace Hopper).

Change to TF:

diff --git a/third_party/absl/workspace.bzl b/third_party/absl/workspace.bzl
index 06f75166ce4b..56d146d65abe 100644
--- a/third_party/absl/workspace.bzl
+++ b/third_party/absl/workspace.bzl
@@ -42,6 +42,7 @@ def repo():
         build_file = "//third_party/absl:com_google_absl.BUILD",
         system_build_file = "//third_party/absl:system.BUILD",
         system_link_files = SYS_LINKS,
+        patch_file = ["//third_party/absl:absl_neon.patch"],
         strip_prefix = "abseil-cpp-{commit}".format(commit = ABSL_COMMIT),
         urls = tf_mirror_urls("https://github.com/abseil/abseil-cpp/archive/{commit}.tar.gz".format(commit = ABSL_COMMIT)),
     )

Patch for absl (third_party/absl/absl_neon.patch):

diff --git a/absl/base/config.h b/absl/base/config.h
index 5fa9f0efe5a4..741e320fe40c 100644
--- a/absl/base/config.h
+++ b/absl/base/config.h
@@ -962,7 +962,7 @@ static_assert(ABSL_INTERNAL_INLINE_NAMESPACE_STR[0] != 'h' ||
 // https://llvm.org/docs/CompileCudaWithLLVM.html#detecting-clang-vs-nvcc-from-code
 #ifdef ABSL_INTERNAL_HAVE_ARM_NEON
 #error ABSL_INTERNAL_HAVE_ARM_NEON cannot be directly set
-#elif defined(__ARM_NEON) && !defined(__CUDA_ARCH__)
+#elif defined(__ARM_NEON) && !defined(__CUDACC__)
 #define ABSL_INTERNAL_HAVE_ARM_NEON 1
 #endif
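
The patch hinges on which macros are visible in each compilation pass. __CUDA_ARCH__ is defined only while compiling device code, whereas __CUDACC__ is defined for the whole CUDA compilation, including the host pass. With the original guard, a .cu.cc file compiled on aarch64 still sets ABSL_INTERNAL_HAVE_ARM_NEON in the host pass. That pass is presumably where GCC's arm_neon.h gets pulled in; compare the linked abseil issue about arm_neon.h inclusion. The Python sketch below models the two conditions as a small truth table. The pass descriptions are simplified assumptions, not a description of nvcc internals:

```python
# Minimal model (not real preprocessor code) of the two guards in
# absl/base/config.h from the patch above. Each compilation pass is described
# by which predefined macros are set; these macro sets are simplified assumptions.
from typing import NamedTuple


class Pass(NamedTuple):
    name: str
    arm_neon: bool   # __ARM_NEON defined (GCC targeting AArch64)
    cudacc: bool     # __CUDACC__ defined (any CUDA compilation)
    cuda_arch: bool  # __CUDA_ARCH__ defined (device-code pass only)


def old_guard(p: Pass) -> bool:
    # defined(__ARM_NEON) && !defined(__CUDA_ARCH__)
    return p.arm_neon and not p.cuda_arch


def new_guard(p: Pass) -> bool:
    # defined(__ARM_NEON) && !defined(__CUDACC__)
    return p.arm_neon and not p.cudacc


PASSES = [
    Pass("plain C++ (gcc, aarch64)", arm_neon=True, cudacc=False, cuda_arch=False),
    Pass("CUDA host pass (aarch64)", arm_neon=True, cudacc=True, cuda_arch=False),
    # __ARM_NEON value assumed here; both guards are false either way.
    Pass("CUDA device pass", arm_neon=True, cudacc=True, cuda_arch=True),
]

if __name__ == "__main__":
    for p in PASSES:
        print(f"{p.name:28} old={old_guard(p)!s:5} new={new_guard(p)!s:5}")
```

Only the CUDA host pass differs. The original guard enables the NEON path there, and the patched guard does not. Plain (non-CUDA) C++ builds on aarch64 keep the NEON path under both guards.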

adamjstewart · 2024-08-04T11:10:11Z

This was fixed in the latest version of absl: abseil/abseil-cpp#1732. Can we bump the vendored copy of absl before the next release? Then we can finally close this issue.

google-ml-butler bot added the type:build/install label Nov 28, 2023

google-ml-butler bot assigned SuryanarayanaY Nov 28, 2023

SuryanarayanaY added the TF 2.15 and subtype: ubuntu/linux labels Nov 28, 2023

SuryanarayanaY assigned sachinprasadhs and unassigned SuryanarayanaY Nov 28, 2023

sachinprasadhs added the stat:awaiting tensorflower label Nov 29, 2023

adamjstewart mentioned this issue Apr 17, 2024

ML CI: Linux aarch64 spack/spack#39666

Open

aweits mentioned this issue May 1, 2024

[Bug]: arm_neon.h inclusion causing issues abseil/abseil-cpp#1665

Closed

tilakrayal mentioned this issue May 10, 2024

Failed to compile on aarch64 #67251

Open

jonatanklosko mentioned this issue May 21, 2024

Fix ARM CUDA build elixir-nx/xla#85

Merged

kylestach mentioned this issue Jun 28, 2024

Error Building Jaxlib v0.4.30 on Jetson Orin jax-ml/jax#22155

Open

smuzaffar mentioned this issue Aug 4, 2024

TF2.15: Apply abseil aarch64 patch cms-sw/cmsdist#9341

Merged
