[AARCH64] Building TF 2.15.0 from sources failed with undefined __Int8x8_t #62490

Open
smuzaffar opened this issue Nov 28, 2023 · 4 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues TF 2.15 For issues related to 2.15.x type:build/install Build and install issues

Comments

@smuzaffar
Contributor

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15.0

Custom code

Yes

OS platform and distribution

RHEL 8

Mobile device

No response

Python version

3.9

Bazel version

6.1.0

GCC/compiler version

GCC 12.3

CUDA/cuDNN version

Cuda 12.2 , cuDNN 8.8.0

GPU model and memory

No response

Current behavior?

Building TF 2.15.0 from source for aarch64 fails with an error like [a]. Building the same version from source for x86_64 worked fine.

[a]

ERROR: <path>/tensorflow-2.15.0/tensorflow/core/kernels/BUILD:5131:18: Compiling tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc failed: (Exit 4): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //tensorflow/core/kernels:sparse_tensor_dense_matmul_op_gpu)

    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=compute_60,compute_70,compute_75,compute_80,compute_89 \
    TF_CUDA_PATHS=<path>/cudnn/8.8.0.121-7bc7095db72117b743b32c95e6e3687e \
    TF_CUDA_VERSION=12.2 \
    TF_SYSTEM_LIBS=absl_py,boringssl,com_github_grpc_grpc,curl,cython,eigen_archive,flatbuffers,gif,libjpeg_turbo,org_sqlite,pasta,png,pybind11,zlib \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/aarch64-opt/bin/tensorflow/core/kernels/_objs/sparse_tensor_dense_matmul_op_gpu/sparse_tensor_dense_matmul_op_gpu.cu.pic.d '-frandom-seed=bazel-out/aarch64-opt/bin/tensorflow/core/kernels/_objs/sparse_tensor_dense_matmul_op_gpu/sparse_tensor_dense_matmul_op_gpu.cu.pic.o' -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -DHAVE_SYS_UIO_H -DTF_USE_SNAPPY '-DBAZEL_CURRENT_REPOSITORY=""' -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_absl -iquote bazel-out/aarch64-opt/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/aarch64-opt/bin/external/nsync -iquote external/com_google_protobuf -iquote bazel-out/aarch64-opt/bin/external/com_google_protobuf -iquote external/local_tsl -iquote bazel-out/aarch64-opt/bin/external/local_tsl -iquote external/com_googlesource_code_re2 -iquote bazel-out/aarch64-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/aarch64-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/aarch64-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/aarch64-opt/bin/external/highwayhash -iquote external/gif -iquote bazel-out/aarch64-opt/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/aarch64-opt/bin/external/libjpeg_turbo -iquote external/zlib -iquote bazel-out/aarch64-opt/bin/external/zlib -iquote external/eigen_archive -iquote bazel-out/aarch64-opt/bin/external/eigen_archive -iquote external/ml_dtypes -iquote bazel-out/aarch64-opt/bin/external/ml_dtypes -iquote external/local_config_cuda -iquote bazel-out/aarch64-opt/bin/external/local_config_cuda -iquote external/snappy -iquote bazel-out/aarch64-opt/bin/external/snappy -iquote external/double_conversion -iquote bazel-out/aarch64-opt/bin/external/double_conversion -iquote external/nccl_archive -iquote bazel-out/aarch64-opt/bin/external/nccl_archive -iquote 
external/local_config_rocm -iquote bazel-out/aarch64-opt/bin/external/local_config_rocm -iquote external/local_config_tensorrt -iquote bazel-out/aarch64-opt/bin/external/local_config_tensorrt -iquote external/local_xla -iquote bazel-out/aarch64-opt/bin/external/local_xla -Ibazel-out/aarch64-opt/bin/external/ml_dtypes/_virtual_includes/float8 -Ibazel-out/aarch64-opt/bin/external/ml_dtypes/_virtual_includes/int4 -Ibazel-out/aarch64-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/aarch64-opt/bin/external/nccl_archive/_virtual_includes/nccl_config -Ibazel-out/aarch64-opt/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/nsync/public -isystem bazel-out/aarch64-opt/bin/external/nsync/public -isystem external/com_google_protobuf/src -isystem bazel-out/aarch64-opt/bin/external/com_google_protobuf/src -isystem external/farmhash_archive/src -isystem bazel-out/aarch64-opt/bin/external/farmhash_archive/src -isystem external/gif/include -isystem bazel-out/aarch64-opt/bin/external/gif/include -isystem external/libjpeg_turbo/include -isystem bazel-out/aarch64-opt/bin/external/libjpeg_turbo/include -isystem external/zlib -isystem bazel-out/aarch64-opt/bin/external/zlib -isystem external/eigen_archive/include/eigen3 -isystem bazel-out/aarch64-opt/bin/external/eigen_archive/include/eigen3 -isystem external/ml_dtypes -isystem bazel-out/aarch64-opt/bin/external/ml_dtypes -isystem external/ml_dtypes/ml_dtypes -isystem bazel-out/aarch64-opt/bin/external/ml_dtypes/ml_dtypes -isystem external/local_config_cuda/cuda -isystem bazel-out/aarch64-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/aarch64-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_rocm/rocm -isystem bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem 
bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/aarch64-opt/bin/external/local_config_rocm/rocm/rocm/include/roctracer -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS '-march=armv8-a' -mno-outline-atomics -Wno-sign-compare '-std=c++17' '-std=c++17' -x cuda '-DGOOGLE_CUDA=1' '--cuda-include-ptx=sm_60' '--cuda-gpu-arch=sm_60' '--cuda-include-ptx=sm_70' '--cuda-gpu-arch=sm_70' '--cuda-include-ptx=sm_75' '--cuda-gpu-arch=sm_75' '--cuda-include-ptx=sm_80' '--cuda-gpu-arch=sm_80' '--cuda-include-ptx=sm_89' '--cuda-gpu-arch=sm_89' '-Xcuda-fatbinary=--compress-all' '-nvcc_options=expt-relaxed-constexpr' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DGOOGLE_CUDA=1' '-DTENSORFLOW_USE_NVCC=1' -pthread '-nvcc_options=relaxed-constexpr' '-nvcc_options=ftz=true' -c tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc -o bazel-out/aarch64-opt/bin/tensorflow/core/kernels/_objs/sparse_tensor_dense_matmul_op_gpu/sparse_tensor_dense_matmul_op_gpu.cu.pic.o)
# Configuration: b904123c2caf5d17d0cded6e4d2ae3e922ea3a81ef7f2bc493d95e1ad6f05410
# Execution platform: @local_execution_config_platform//:platform
In file included from external/local_xla/xla/stream_executor/device_options.h:27,
                 from external/local_xla/xla/stream_executor/platform.h:27,
                 from external/local_xla/xla/stream_executor/cuda/cuda_platform_id.h:19,
                 from ./tensorflow/core/platform/stream_executor.h:19,
                 from ./tensorflow/core/util/gpu_launch_config.h:27,
                 from ./tensorflow/core/util/gpu_kernel_helper.h:28,
                 from tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc:24:
external/com_google_absl/absl/log/check.h:57: warning: "CHECK" redefined
   57 | #define CHECK(condition) ABSL_CHECK_IMPL((condition), #condition)
      |
In file included from external/local_tsl/tsl/platform/logging.h:26,
                 from external/local_tsl/tsl/platform/refcount.h:23,
                 from ./tensorflow/core/platform/refcount.h:20,
                 from ./tensorflow/core/lib/core/refcount.h:19,
                 from ./tensorflow/core/framework/resource_base.h:23,
                 from ./tensorflow/core/framework/resource_handle.h:21,
                 from ./tensorflow/core/framework/register_types.h:21,
                 from tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc:21:
external/local_tsl/tsl/platform/default/logging.h:308: note: this is the location of the previous definition
  308 | #define CHECK(condition)              \
      |
external/com_google_absl/absl/log/check.h:65: warning: "QCHECK" redefined
   65 | #define QCHECK(condition) ABSL_QCHECK_IMPL((condition), #condition)
      |
external/local_tsl/tsl/platform/default/logging.h:542: note: this is the location of the previous definition
  542 | #define QCHECK(condition) CHECK(condition)
      |
external/com_google_absl/absl/log/check.h:88: warning: "DCHECK" redefined
   88 | #define DCHECK(condition) ABSL_DCHECK_IMPL((condition), #condition)
....
....
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(40): error: identifier "__Int8x8_t" is undefined
  typedef __Int8x8_t int8x8_t;
          ^
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(41): error: identifier "__Int16x4_t" is undefined
  typedef __Int16x4_t int16x4_t;
          ^
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(42): error: identifier "__Int32x2_t" is undefined
  typedef __Int32x2_t int32x2_t;
          ^
<gcc-12.3>/bin/../lib/gcc/aarch64-redhat-linux-gnu/12.3.1/include/arm_neon.h(43): error: identifier "__Int64x1_t" is undefined
  typedef __Int64x1_t int64x1_t;

Standalone code to reproduce the issue

It is a build/configure issue.

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:build/install Build and install issues label Nov 28, 2023
@SuryanarayanaY SuryanarayanaY added TF 2.15 For issues related to 2.15.x subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues labels Nov 28, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 29, 2023
@richgrove

I have the same build problem on Arm64 with this CPU:

    processor : 0
    model name : ARMv8 Processor rev 1 (v8l)
    BogoMIPS : 62.50
    Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm
    CPU implementer : 0x41
    CPU architecture: 8
    CPU variant : 0x0
    CPU part : 0xd42
    CPU revision : 1

The CPU doesn't have the NEON feature at all. Arm NEON support needs to be turned off on this platform. Any suggestions on how to turn it off?
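
For reference when reading a Features line like the one above: on AArch64 Linux the kernel reports Advanced SIMD (the NEON instruction set) under the name asimd, while 32-bit Arm kernels use neon. The following is a minimal sketch of a whole-token check; `HasCpuFeature`, `HasAdvancedSimd`, and `CheckReportedFeatures` are hypothetical helper names, not part of TensorFlow or any library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if the space-separated feature list contains `wanted`
// as a whole token, so "asimd" does not match "asimdhp".
inline bool HasCpuFeature(const std::string& features,
                          const std::string& wanted) {
  std::istringstream tokens(features);
  std::string token;
  while (tokens >> token) {
    if (token == wanted) return true;
  }
  return false;
}

// AArch64 kernels list Advanced SIMD as "asimd"; 32-bit Arm kernels
// list it as "neon". Either token counts here.
inline bool HasAdvancedSimd(const std::string& features) {
  return HasCpuFeature(features, "asimd") || HasCpuFeature(features, "neon");
}

// Usage check against the Features line quoted in this comment.
inline void CheckReportedFeatures() {
  const std::string reported =
      "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
      "cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm";
  assert(HasAdvancedSimd(reported));          // "asimd" is present
  assert(!HasCpuFeature(reported, "neon"));   // no literal "neon" token
}
```

If that reading is right, the build failure here is less about missing hardware support and more about which compiler front end ends up parsing arm_neon.h.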

@adamjstewart

Still broken on 2.16.1

@aweits

aweits commented Apr 25, 2024

This seems to work around the issue for me. Built on Grace Hopper with TF 2.16.1, using Spack.

Change to TF:

diff --git a/third_party/absl/workspace.bzl b/third_party/absl/workspace.bzl
index 06f75166ce4b..56d146d65abe 100644
--- a/third_party/absl/workspace.bzl
+++ b/third_party/absl/workspace.bzl
@@ -42,6 +42,7 @@ def repo():
         build_file = "//third_party/absl:com_google_absl.BUILD",
         system_build_file = "//third_party/absl:system.BUILD",
         system_link_files = SYS_LINKS,
+       patch_file = ["//third_party/absl:absl_neon.patch"],
         strip_prefix = "abseil-cpp-{commit}".format(commit = ABSL_COMMIT),
         urls = tf_mirror_urls("https://github.com/abseil/abseil-cpp/archive/{commit}.tar.gz".format(commit = ABSL_COMMIT)),
     )

Patch for absl:

diff --git a/absl/base/config.h b/absl/base/config.h
index 5fa9f0efe5a4..741e320fe40c 100644
--- a/absl/base/config.h
+++ b/absl/base/config.h
@@ -962,7 +962,7 @@ static_assert(ABSL_INTERNAL_INLINE_NAMESPACE_STR[0] != 'h' ||
 // https://llvm.org/docs/CompileCudaWithLLVM.html#detecting-clang-vs-nvcc-from-code
 #ifdef ABSL_INTERNAL_HAVE_ARM_NEON
 #error ABSL_INTERNAL_HAVE_ARM_NEON cannot be directly set
-#elif defined(__ARM_NEON) && !defined(__CUDA_ARCH__)
+#elif defined(__ARM_NEON) && !defined(__CUDACC__)
 #define ABSL_INTERNAL_HAVE_ARM_NEON 1
 #endif
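
To illustrate what the one-line change does, here is a minimal sketch that is not taken from Abseil or TensorFlow. `EXAMPLE_HAVE_ARM_NEON`, `CompiledByCudaCompiler`, `NeonPathEnabled`, and `DescribeNeonGuard` are hypothetical names. The sketch relies on `__CUDA_ARCH__` being defined only during the device compilation pass, while `__CUDACC__` is defined whenever a CUDA compiler is processing the source. That difference is presumably why the original guard still let the host pass of nvcc reach arm_neon.h, whose GCC builtin types such as `__Int8x8_t` it then reported as undefined.

```cpp
#include <string>

// Same condition as the patched guard, applied to a stand-in macro:
// skip the NEON path whenever a CUDA compiler is driving the build,
// not only during the device pass.
#if defined(__ARM_NEON) && !defined(__CUDACC__)
#define EXAMPLE_HAVE_ARM_NEON 1
#include <arm_neon.h>  // only reached by a plain host compiler on Arm
#else
#define EXAMPLE_HAVE_ARM_NEON 0
#endif

// True when this translation unit is being compiled as CUDA source.
constexpr bool CompiledByCudaCompiler() {
#if defined(__CUDACC__)
  return true;
#else
  return false;
#endif
}

// True when the guard above selected the NEON path.
constexpr bool NeonPathEnabled() { return EXAMPLE_HAVE_ARM_NEON == 1; }

// The property the patch is after: never both at once.
static_assert(!(CompiledByCudaCompiler() && NeonPathEnabled()),
              "NEON path must stay off under a CUDA compiler");

// Reports which of the relevant macros this translation unit sees.
inline std::string DescribeNeonGuard() {
  std::string out;
#if defined(__ARM_NEON)
  out += "__ARM_NEON=1 ";
#else
  out += "__ARM_NEON=0 ";
#endif
  out += CompiledByCudaCompiler() ? "__CUDACC__=1 " : "__CUDACC__=0 ";
#if defined(__CUDA_ARCH__)
  out += "__CUDA_ARCH__=1 ";
#else
  out += "__CUDA_ARCH__=0 ";
#endif
  out += NeonPathEnabled() ? "neon_path=on" : "neon_path=off";
  return out;
}
```

Calling `DescribeNeonGuard()` from a .cu file and from a plain .cc file on the same Arm host is one way to check which path each compiler takes. With the patched condition the expectation is that the NEON path is off for the .cu file in every pass and on for the .cc file.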

@adamjstewart

This was fixed in the latest version of absl: abseil/abseil-cpp#1732. Can we bump the vendored copy of absl before the next release? Then we can finally close this issue.
