
C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed #40688

Closed
mmartial opened this issue Jun 22, 2020 · 23 comments

Labels: stat:awaiting tensorflower · subtype:ubuntu/linux · TF 2.2 · type:build/install

Comments
@mmartial

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04 -- building inside Dockerfile with FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
  • TensorFlow installed from: source
  • TensorFlow version: 2.2.0
  • Python version: 3.6.9
  • Installed using virtualenv? pip? conda?: No
  • Bazel version: 2.0.0 (extracted from _TF_MAX_BAZEL)
  • GCC/Compiler version: 7.4.0
  • CUDA/cuDNN version: 10.1 / 7
  • GPU model and memory: tested on Titan XP and RTX 2070 8GB

Describe the problem

Build fails with

tensorflow/python/lib/core/bfloat16.cc: In function 'bool tensorflow::{anonymous}::Initialize()':
tensorflow/python/lib/core/bfloat16.cc:636:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [6], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:640:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [10], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:643:77: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [5], <unresolved overloaded function type>, const std::array<int, 3>&)'
   if (!register_ufunc("less", CompareUFunc<Bfloat16LtFunctor>, compare_types)) {
                                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:647:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [8], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:651:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [11], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:655:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [14], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:610:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /usr/local/src/tensorflow/tensorflow/tools/pip_package/BUILD:62:1 C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
INFO: Elapsed time: 1828.057s, Critical Path: 881.14s
INFO: 13824 processes: 13824 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
Command exited with non-zero status 1

Provide the exact sequence of commands / steps that you executed before running into the problem

Reproducible with the following Dockerfile

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

# Install system packages
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
  && apt-get install -y --no-install-recommends apt-utils \
  && apt-get install -y \
    build-essential \
    checkinstall \
    cmake \
    curl \
    g++ \
    gcc \
    git \
    locales \
    perl \
    pkg-config \
    protobuf-compiler \
    python3-dev \
    rsync \
    software-properties-common \
    unzip \
    wget \
    zip \
    zlib1g-dev \
  && apt-get clean

# UTF-8
RUN localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG en_US.utf8

# Setup pip
RUN wget -q -O /tmp/get-pip.py --no-check-certificate https://bootstrap.pypa.io/get-pip.py \
  && python3 /tmp/get-pip.py \
  && pip3 install -U pip \
  && rm /tmp/get-pip.py
# Some TF tools expect a "python" binary
RUN ln -s $(which python3) /usr/local/bin/python

# /etc/ld.so.conf.d/nvidia.conf points to /usr/local/nvidia, which seems to be missing; point it at the CUDA install directory for libraries
RUN cd /usr/local && ln -s cuda nvidia
ARG CTO_CUDA_VERSION="10.1"
ARG CTO_CUDA_PRIMEVERSION="10.0"
ARG CTO_CUDA_APT="cuda-npp-${CTO_CUDA_VERSION} cuda-cublas-${CTO_CUDA_PRIMEVERSION} cuda-cufft-${CTO_CUDA_VERSION} cuda-libraries-${CTO_CUDA_VERSION} cuda-npp-dev-${CTO_CUDA_VERSION} cuda-cublas-dev-${CTO_CUDA_PRIMEVERSION} cuda-cufft-dev-${CTO_CUDA_VERSION} cuda-libraries-dev-${CTO_CUDA_VERSION}"
RUN apt-get install -y --no-install-recommends \
  time ${CTO_CUDA_APT} \
  && apt-get clean

ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64"

# Install Python tools 
RUN pip3 install -U \
  mock \
  numpy \
  setuptools \
  six \
  wheel \
  && pip3 install 'future>=0.17.1' \
  && pip3 install -U keras_applications --no-deps \
  && pip3 install -U keras_preprocessing --no-deps \
  && rm -rf /root/.cache/pip

## Download & Building TensorFlow from source
ARG LATEST_BAZELISK=1.5.0
ARG CTO_TENSORFLOW_VERSION="2.2.0"
RUN curl -s -Lo /usr/local/bin/bazel https://github.com/bazelbuild/bazelisk/releases/download/v${LATEST_BAZELISK}/bazelisk-linux-amd64 \
  && chmod +x /usr/local/bin/bazel \
  && mkdir -p /usr/local/src \
  && cd /usr/local/src \
  && wget -q --no-check-certificate https://github.com/tensorflow/tensorflow/archive/v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && tar xfz v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && mv tensorflow-${CTO_TENSORFLOW_VERSION} tensorflow \
  && rm v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && cd /usr/local/src/tensorflow \
  && fgrep _TF_MAX_BAZEL configure.py | grep '=' | perl -ne 'print $1 if (m%\=\s+.([\d\.]+).$+%)' > .bazelversion
RUN cd /usr/local/src/tensorflow \
  && TF_CUDA_CLANG=0 TF_CUDA_VERSION=${CTO_CUDA_VERSION} TF_CUDNN_VERSION=7 TF_DOWNLOAD_CLANG=0 TF_DOWNLOAD_MKL=0 TF_ENABLE_XLA=0 TF_NEED_AWS=0 TF_NEED_COMPUTECPP=0 TF_NEED_CUDA=1 TF_NEED_GCP=0 TF_NEED_GDR=0 TF_NEED_HDFS=0 TF_NEED_JEMALLOC=1 TF_NEED_KAFKA=0 TF_NEED_MKL=0 TF_NEED_MPI=0 TF_NEED_OPENCL=0 TF_NEED_OPENCL_SYCL=0 TF_NEED_ROCM=0 TF_NEED_S3=0 TF_NEED_TENSORRT=0 TF_NEED_VERBS=0 TF_SET_ANDROID_WORKSPACE=0 TF_CUDA_COMPUTE_CAPABILITIES="5.3,6.0,6.1,6.2,7.0,7.2,7.5" GCC_HOST_COMPILER_PATH=$(which gcc) CC_OPT_FLAGS="-march=native" PYTHON_BIN_PATH=$(which python) PYTHON_LIB_PATH="$(python -c 'import site; print(site.getsitepackages()[0])')" ./configure
RUN cd /usr/local/src/tensorflow \
  && time bazel build --verbose_failures --config=opt --config=v2 --config=cuda //tensorflow/tools/pip_package:build_pip_package \
  && time ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg \
  && time pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl

CMD bash

Built using docker build --tag cto:test .

Note: tested with CUDA 10.1, 10.0, and 10.2.
Also occurs with TF 1.15.3.

Any other info / logs
I can provide the full build log if requested (91MB)

 ---> Running in 9690386205a5
2020/06/22 14:11:17 Downloading https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64...
Extracting Bazel installation...
You have bazel 2.0.0 installed.
Found CUDA 10.1 in:
    /usr/local/cuda-10.1/lib64
    /usr/local/cuda-10.1/include
Found cuDNN 7 in:
    /usr/lib/x86_64-linux-gnu
    /usr/include


Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=ngraph         # Build with Intel nGraph support.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
Removing intermediate container 9690386205a5
 ---> 8910acc4d9c5
Step 19/20 : RUN cd /usr/local/src/tensorflow   && time bazel build --verbose_failures --config=opt --config=v2 --config=cuda //tensorflow/tools/pip_package:build_pip_package   && time ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg   && time pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl
 ---> Running in 3b0267b1209d
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [v2, cuda, using_cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /usr/local/src/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /usr/local/src/tensorflow/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=v2
INFO: Reading rc options for 'build' from /usr/local/src/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python --action_env PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages --python_path=/usr/local/bin/python --action_env TF_CUDA_VERSION=10.1 --action_env TF_CUDNN_VERSION=7 --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-10.1 --action_env TF_CUDA_COMPUTE_CAPABILITIES=5.3,6.0,6.1,6.2,7.0,7.2,7.5 --action_env LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/extras/CUPTI/lib64 --action_env GCC_HOST_COMPILER_PATH=/usr/bin/x86_64-linux-gnu-gcc-7 --config=cuda --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:v2 in file /usr/local/src/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /usr/local/src/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /usr/local/src/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain
INFO: Found applicable config definition build:opt in file /usr/local/src/tensorflow/.tf_configure.bazelrc: --copt=-march=native --host_copt=-march=native --define with_default_optimizations=true
INFO: Found applicable config definition build:v2 in file /usr/local/src/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /usr/local/src/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /usr/local/src/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain
INFO: Found applicable config definition build:linux in file /usr/local/src/tensorflow/.bazelrc: --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /usr/local/src/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
DEBUG: Call stack for the definition of repository 'io_bazel_rules_docker' which is a git_repository (rule definition at /root/.cache/bazel/_bazel_root/bbcc73fcc5c2b01ab08b6bcf7c29e42e/external/bazel_tools/tools/build_defs/repo/git.bzl:195:18):
 - /root/.cache/bazel/_bazel_root/bbcc73fcc5c2b01ab08b6bcf7c29e42e/external/bazel_toolchains/repositories/repositories.bzl:37:9
 - /usr/local/src/tensorflow/WORKSPACE:37:1
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: tensorflow/tools/pip_package
DEBUG: /root/.cache/bazel/_bazel_root/bbcc73fcc5c2b01ab08b6bcf7c29e42e/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:5: 
[...]


@mmartial added the type:build/install label Jun 22, 2020
@adk9

adk9 commented Jun 22, 2020

Possibly duplicate of #40654? I'm also seeing the same issue with v2.2.0 and GCC 7.5.0.

@mmartial
Author

mmartial commented Jun 22, 2020

Possibly duplicate of #40654? I'm also seeing the same issue with v2.2.0 and GCC 7.5.0.

Thank you :)
I looked at the PR, and am integrating this change into the Dockerfile:
&& perl -pi.bak -e 's%, CompareUFunc%, (PyUFuncGenericFunction) CompareUFunc%g' tensorflow/python/lib/core/bfloat16.cc \
right before the ./configure step.

Will report if this fixes the build
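
For context, the substitution only adds an explicit function-pointer cast at each registration call site in bfloat16.cc. Below is a minimal, self-contained sketch of why the original call fails and why the cast form compiles. It is illustrative only: Register and the typedef stand in for the real register_ufunc lambda and the numpy header, and npy_intp is spelled as long to match the "const long int*" in the compiler output above.

#include <cstdio>

// The post-1.19 numpy loop signature that the registration lambda expects.
using PyUFuncGenericFunction = void (*)(char**, const long*, const long*, void*);

// TF 2.2's template apparently uses the pre-1.19 signature (non-const pointers),
// so CompareUFunc<T> no longer converts implicitly to PyUFuncGenericFunction.
template <typename Functor>
void CompareUFunc(char** args, long* dimensions, long* steps, void* data) {
  std::printf("compare ufunc invoked\n");
}

struct Bfloat16LtFunctor {};

bool Register(const char* name, PyUFuncGenericFunction fn) { return fn != nullptr; }

int main() {
  // Register("less", CompareUFunc<Bfloat16LtFunctor>);  // fails: no known conversion
  // The Perl substitution turns the call into the cast form, which compiles:
  Register("less", (PyUFuncGenericFunction)CompareUFunc<Bfloat16LtFunctor>);
  return 0;
}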

@mmartial
Author

mmartial commented Jun 22, 2020

Confirming that this solves the build issue (for 2.2.0 and 10.1):

Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package

Going to check 2.2.0 with 10.2, then 1.15.3 with 10.2.

@adk9

adk9 commented Jun 23, 2020

In our testing, we found that this issue breaks building from source for TF 1.15.x and 2.x.

The issue comes from the source build being incompatible with numpy 1.19.0, which has a breaking ABI change (numpy/numpy#15355) and was released two days ago.

Pinning numpy to a version below 1.19.0 fixes the issue:

pip install 'numpy<1.19.0'
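
For reference, the mismatch the compiler complains about comes down to a one-line signature change. Below is a sketch of the two typedefs; this is an assumption based on numpy/numpy#15355 and the "const long int*" in the error output, and the _pre119/_119 suffixes are hypothetical names added only to show both forms side by side (the real typedef is just PyUFuncGenericFunction).

// numpy < 1.19.0: dimensions and steps are non-const, which is what TF 2.2's
// CompareUFunc template appears to have been written against.
typedef void (*PyUFuncGenericFunction_pre119)(char**, long*, long*, void*);

// numpy >= 1.19.0: the pointers became const, so the old template no longer
// converts implicitly to the typedef the registration lambda expects.
typedef void (*PyUFuncGenericFunction_119)(char**, const long*, const long*, void*);

Building against numpy < 1.19.0 restores the old typedef, which is why pinning the version avoids the error without touching the TF sources.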

adk9 added a commit to adk9/docs that referenced this issue Jun 23, 2020
Fix numpy to pre-1.19.0 because of breaking ABI change in numpy 1.19.0 (numpy/numpy#15355)

See tensorflow/tensorflow#40688.
@mmartial
Author

mmartial commented Jun 23, 2020

Thank you, will force numpy<1.19.0 for the time being.

Also confirming that 2.2.0 and 10.2 compile with the PyUFuncGenericFunction fix.

@mmartial
Author

Confirming successful compilation of 2.2.0 with 10.2 and numpy<1.19.0.

Okay to close the issue.

I have different problems with 1.15.3 and nvlink (with 10.0 and 10.1), but if I cannot resolve them, I will open a different ticket.

@cbalint13
Contributor

@mmartial,

On behalf of #40654, thank you for investigating!

@amahendrakar
Contributor

Okay to close the issue.

Marking the issue as closed, as it is resolved. Please feel free to re-open the issue if required. Thanks!

@amahendrakar added the subtype:ubuntu/linux and TF 2.2 labels Jun 23, 2020
@xlnwel

xlnwel commented Jun 23, 2020

Hi, @mmartial. I also ran into the same issue. I've downgraded numpy to 1.18.5, but it did not fix the problem. Here's the error message I received:

tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
const std::array<int, 3>& types) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:640:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [10], <unresolved overloaded function type>, const std::array<int, 3>&)'
compare_types)) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
const std::array<int, 3>& types) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:643:77: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [5], <unresolved overloaded function type>, const std::array<int, 3>&)'
if (!register_ufunc("less", CompareUFunc<Bfloat16LtFunctor>, compare_types)) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
const std::array<int, 3>& types) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:647:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [8], <unresolved overloaded function type>, const std::array<int, 3>&)'
compare_types)) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
const std::array<int, 3>& types) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:651:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [11], <unresolved overloaded function type>, const std::array<int, 3>&)'
compare_types)) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
const std::array<int, 3>& types) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:655:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [14], <unresolved overloaded function type>, const std::array<int, 3>&)'
compare_types)) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
const std::array<int, 3>& types) {
^
tensorflow/python/lib/core/bfloat16.cc:610:60: note: no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/aptx4869/github/tensorflow/tensorflow/tools/pip_package/BUILD:62:1 C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
INFO: Elapsed time: 24.977s, Critical Path: 13.97s
INFO: 2 processes: 2 local.
FAILED: Build did NOT complete successfully

It seems related to the PyUFuncGenericFunction fix you mentioned. How should I apply it?

Here's my environment information:

Ubuntu: 18.04
TF: r2.2 (trying to build from source but failed)
CUDA: 10.2
CuDNN: 7.6.5
python: 3.7.7
Bazel: 2.0.0

And here's the output of pip list in case you need it:

certifi 2020.6.20
decorator 4.4.0
future 0.18.2
h5py 2.10.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
mock 4.0.2
numpy 1.18.5
pip 20.1.1
setuptools 47.3.1.post20200622
six 1.15.0
wheel 0.34.2

@mmartial
Author

mmartial commented Jun 23, 2020

tensorflow/python/lib/core/bfloat16.cc:655:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [14], <unresolved overloaded function type>, const std::array<int, 3>&)'
compare_types)) {

@xlnwel Looking at the above, I wonder: did you use both the PR change (or the Perl command) and the numpy<1.19.0 pin?

To make it work, I had to use one or the other of those.

I am putting the updated Dockerfile below, hoping it works for you:

ARG CTO_CUDA_VERSION="10.2"
FROM nvidia/cuda:${CTO_CUDA_VERSION}-cudnn7-devel-ubuntu18.04
ARG CTO_CUDA_VERSION="10.2"

# Install system packages
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
  && apt-get install -y --no-install-recommends apt-utils \
  && apt-get install -y \
    build-essential \
    checkinstall \
    cmake \
    curl \
    g++ \
    gcc \
    git \
    locales \
    perl \
    pkg-config \
    protobuf-compiler \
    python3-dev \
    rsync \
    software-properties-common \
    unzip \
    wget \
    zip \
    zlib1g-dev \
  && apt-get clean

# UTF-8
RUN localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG en_US.utf8

# Setup pip
RUN wget -q -O /tmp/get-pip.py --no-check-certificate https://bootstrap.pypa.io/get-pip.py \
  && python3 /tmp/get-pip.py \
  && pip3 install -U pip \
  && rm /tmp/get-pip.py
# Some TF tools expect a "python" binary
RUN ln -s $(which python3) /usr/local/bin/python

# /etc/ld.so.conf.d/nvidia.conf points to /usr/local/nvidia, which seems to be missing; point it at the CUDA install directory for libraries
RUN cd /usr/local && ln -s cuda nvidia
ARG CTO_CUDA_PRIMEVERSION="10.0"
ARG CTO_CUDA_APT="cuda-npp-${CTO_CUDA_VERSION} cuda-cublas-${CTO_CUDA_PRIMEVERSION} cuda-cufft-${CTO_CUDA_VERSION} cuda-libraries-${CTO_CUDA_VERSION} cuda-npp-dev-${CTO_CUDA_VERSION} cuda-cublas-dev-${CTO_CUDA_PRIMEVERSION} cuda-cufft-dev-${CTO_CUDA_VERSION} cuda-libraries-dev-${CTO_CUDA_VERSION}"
RUN echo ${CTO_CUDA_APT}
RUN apt-get install -y --no-install-recommends \
  time ${CTO_CUDA_APT} \
  && apt-get clean

# Install TensorRT. Requires that libcudnn7 is installed
#RUN apt-get install -y --no-install-recommends \
#  libnvinfer6=6.0.1-1+cuda${CTO_CUDA_VERSION} \
#  libnvinfer-dev=6.0.1-1+cuda${CTO_CUDA_VERSION} \
#  libnvinfer-plugin6=6.0.1-1+cuda${CTO_CUDA_VERSION} \
#  && apt-get clean

ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64"

# Install Python tools 
RUN pip3 install -U \
  mock \
  'numpy<1.19.0' \
  setuptools \
  six \
  wheel \
  && pip3 install 'future>=0.17.1' \
  && pip3 install -U keras_applications --no-deps \
  && pip3 install -U keras_preprocessing --no-deps \
  && rm -rf /root/.cache/pip

## Download & Building TensorFlow from source
ARG LATEST_BAZELISK=1.5.0
ARG CTO_TENSORFLOW_VERSION="2.2.0"
RUN curl -s -Lo /usr/local/bin/bazel https://github.com/bazelbuild/bazelisk/releases/download/v${LATEST_BAZELISK}/bazelisk-linux-amd64 \
  && chmod +x /usr/local/bin/bazel \
  && mkdir -p /usr/local/src \
  && cd /usr/local/src \
  && wget -q --no-check-certificate https://github.com/tensorflow/tensorflow/archive/v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && tar xfz v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && mv tensorflow-${CTO_TENSORFLOW_VERSION} tensorflow \
  && rm v${CTO_TENSORFLOW_VERSION}.tar.gz \
  && cd /usr/local/src/tensorflow \
  && fgrep _TF_MAX_BAZEL configure.py | grep '=' | perl -ne 'print $1 if (m%\=\s+.([\d\.]+).$+%)' > .bazelversion
RUN cd /usr/local/src/tensorflow \
  && TF_CUDA_CLANG=0 TF_CUDA_VERSION=${CTO_CUDA_VERSION} TF_CUDNN_VERSION=7 TF_DOWNLOAD_CLANG=0 TF_DOWNLOAD_MKL=0 TF_ENABLE_XLA=0 TF_NEED_AWS=0 TF_NEED_COMPUTECPP=0 TF_NEED_CUDA=1 TF_NEED_GCP=0 TF_NEED_GDR=0 TF_NEED_HDFS=0 TF_NEED_JEMALLOC=1 TF_NEED_KAFKA=0 TF_NEED_MKL=0 TF_NEED_MPI=0 TF_NEED_OPENCL=0 TF_NEED_OPENCL_SYCL=0 TF_NEED_ROCM=0 TF_NEED_S3=0 TF_NEED_TENSORRT=0 TF_NEED_VERBS=0 TF_SET_ANDROID_WORKSPACE=0 TF_CUDA_COMPUTE_CAPABILITIES="5.3,6.0,6.1,6.2,7.0,7.2,7.5" GCC_HOST_COMPILER_PATH=$(which gcc) CC_OPT_FLAGS="-march=native" PYTHON_BIN_PATH=$(which python) PYTHON_LIB_PATH="$(python -c 'import site; print(site.getsitepackages()[0])')" ./configure
RUN cd /usr/local/src/tensorflow \
  && time bazel build --verbose_failures --config=opt --config=v2 --config=cuda //tensorflow/tools/pip_package:build_pip_package \
  && time ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg \
  && time pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl

CMD bash

@xlnwel

xlnwel commented Jun 23, 2020

Hi @mmartial. I built TF 2.2 following the official guide, without a Dockerfile. Do you mean I should execute fgrep _TF_MAX_BAZEL configure.py | grep '=' | perl -ne 'print $1 if (m%\=\s+.([\d\.]+).$+%)' > .bazelversion before bazel build? I tried it and then found that bazel build //tensorflow/tools/pip_package:build_pip_package seemed to have no effect at all.

@mmartial
Author

Hi @mmartial. I built TF 2.2 following the official guide, without a Dockerfile. Do you mean I should execute fgrep _TF_MAX_BAZEL configure.py | grep '=' | perl -ne 'print $1 if (m%\=\s+.([\d\.]+).$+%)' > .bazelversion before bazel build? I tried it and then found that bazel build //tensorflow/tools/pip_package:build_pip_package seemed to have no effect at all.

No, I was referring to #40688 (comment).
When I saw your error, I noticed the PyUFuncGenericFunction mismatch, which that command fixes.

Note that simply using 'numpy<1.19.0' in my pip install was sufficient to solve this issue.

@xlnwel

xlnwel commented Jun 24, 2020

Unfortunately it does not work for me. Maybe I have to open another issue.

tensorflow-copybara pushed a commit that referenced this issue Jun 24, 2020
See #40688.

PiperOrigin-RevId: 318122157
Change-Id: Ief46c5610f3aaf0cdd7d43ce1a10d6d87e8e8e01
geetachavan1 pushed a commit to geetachavan1/tensorflow that referenced this issue Jun 25, 2020
See tensorflow#40688.

PiperOrigin-RevId: 318122157
Change-Id: Ief46c5610f3aaf0cdd7d43ce1a10d6d87e8e8e01
meteorcloudy added a commit to bazelbuild/continuous-integration that referenced this issue Jun 26, 2020
The new version of numpy seems to have some API change that doesn't work with TF anymore.
See tensorflow/tensorflow#40688 (comment)
pjattke pushed a commit to pjattke/docker-he-transformer that referenced this issue Jul 31, 2020
@ebrevdo
Contributor

ebrevdo commented Aug 4, 2020

@amahendrakar this is still an issue on r2.3; I just tried to build the TF r2.3 branch on my Ubuntu system and ran into the same issue. The Perl rewrite works, but we should just fix the code to do a proper static cast. @penpornk, who's closest to this code?

@ebrevdo ebrevdo reopened this Aug 4, 2020
@penpornk
Member

penpornk commented Aug 4, 2020

@ebrevdo This is Python glue code so it probably belongs to TF Core folks. But the fixes are simple enough. I can do it.

@penpornk
Member

penpornk commented Aug 5, 2020

It seems @chsigg already fixed this in 75ea0b3 (Jun 26, 2020) by adding an overloaded function. I tried compiling with the latest code from master and didn't get the error anymore.

(It's too late to patch this into releases 2.2.0 and 2.3.0 now, so this issue will be fixed in release 2.4.0.)
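
For anyone curious what an overload-based fix looks like, here is a minimal, self-contained sketch of the general idea; it is an illustration only, not the actual contents of 75ea0b3. Register stands in for the registration lambda and npy_intp is spelled as long. Providing the wrapper in both signatures lets overload resolution pick whichever one matches the numpy headers being compiled against, so no cast is needed.

#include <cstdio>

// Whichever typedef the installed numpy headers provide (const since 1.19.0).
using PyUFuncGenericFunction = void (*)(char**, const long*, const long*, void*);

// Pre-1.19 signature.
template <typename Functor>
void CompareUFunc(char** args, long* dimensions, long* steps, void* data) {
  std::printf("pre-1.19 signature selected\n");
}

// 1.19+ signature: same body, const pointers.
template <typename Functor>
void CompareUFunc(char** args, const long* dimensions, const long* steps, void* data) {
  std::printf("1.19+ signature selected\n");
}

struct Bfloat16LtFunctor {};

bool Register(const char* name, PyUFuncGenericFunction fn) { return fn != nullptr; }

int main() {
  // The overloaded name resolves against the target function-pointer type,
  // so this compiles against either numpy version without an explicit cast.
  return Register("less", CompareUFunc<Bfloat16LtFunctor>) ? 0 : 1;
}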

@ebrevdo
Contributor

ebrevdo commented Aug 5, 2020 via email

@mmartial
Author

mmartial commented Aug 5, 2020

I will take this opportunity to update another part of the Perl glue (for the Bazel version): since in 2.3.0 the max Bazel version appears to be set to 3.99 (while the latest Bazel release is 3.4.1), I added the following version-capping logic to my 2.3.0 build (still in testing):

ARG LATEST_BAZEL=3.4.1
[...]
  && fgrep _TF_MAX_BAZEL configure.py | grep '=' | perl -ne '$lb="'${LATEST_BAZEL}'";$brv=$1 if (m%\=\s+.([\d\.]+).$+%); sub numit{@g=split(m%\.%,$_[0]);return(1000000*$g[0]+1000*$g[1]+$g[2]);}; if (&numit($brv) > &numit($lb)) { print "$lb" } else {print "$brv"};' > .bazelversion \
  && bazel clean \
[...]

@mightyroy

mightyroy commented Aug 29, 2020

Remember to run bazel clean after downgrading numpy. I downgraded to numpy 1.18 and it worked.

avdv added a commit to avdv/nixpkgs that referenced this issue Sep 16, 2020
Numpy introduced a breaking API change in version 1.19.x, see [1].

There is a simple fix [2] available in the master branch.

[1]: tensorflow/tensorflow#40688
[2]: tensorflow/tensorflow@75ea0b3
danieldk pushed a commit to danieldk/nixpkgs that referenced this issue Sep 21, 2020
Numpy introduced a breaking API change in version 1.19.x, see [1].

There is a simple fix [2] available in the master branch.

[1]: tensorflow/tensorflow#40688
[2]: tensorflow/tensorflow@75ea0b3

(cherry picked from commit 8f5bfd6)
avdv added a commit to avdv/nixpkgs that referenced this issue Sep 21, 2020
Numpy introduced a breaking API change in version 1.19.x, see [1].

There is a simple fix [2] available in the master branch.

[1]: tensorflow/tensorflow#40688
[2]: tensorflow/tensorflow@75ea0b3

(cherry picked from commit 8f5bfd6)
jonringer pushed a commit to NixOS/nixpkgs that referenced this issue Sep 22, 2020
Numpy introduced a breaking API change in version 1.19.x, see [1].

There is a simple fix [2] available in the master branch.

[1]: tensorflow/tensorflow#40688
[2]: tensorflow/tensorflow@75ea0b3

(cherry picked from commit 8f5bfd6)
PatriceVignola pushed a commit to microsoft/tensorflow-directml that referenced this issue Sep 26, 2020
…ericFunction.

See tensorflow/tensorflow#40688, tensorflow/tensorflow#40654.

PiperOrigin-RevId: 318452381
Change-Id: Icc5152f2b020ef19882a49e3c86ac80bbe048d64
@mihaimaruseac
Collaborator

Fixed by aafe25d


@vinhphat89

Confirmed that the r2.2 source is incompatible with numpy versions 1.18.5 and 1.19.0. Downgrading to numpy < 1.18.5 resolves the issue:

pip install 'numpy<1.18.5'

tensorflow locked as resolved and limited conversation to collaborators Oct 22, 2020