fatal error: tensorflow/core/common_runtime/gpu/gpu_process_state.h: No such file or directory #35576

Closed
abcdabcd987 opened this issue Jan 4, 2020 · 4 comments · Fixed by #44370
Labels: subtype:bazel (Bazel related Build_Installation issues), TF 2.1 (for tracking issues in 2.1 release), type:build/install (Build and install issues)

Comments

@abcdabcd987
Contributor

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux (Kernel 5.3.18, glibc 2.30-3)
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version: bd56e040ba4a8272163357a2f8786a128deb4aaf from r2.1 branch.
  • Python version: Not Applicable
  • Installed using virtualenv? pip? conda?: Not Applicable
  • Bazel version (if compiling from source): 0.29.1
  • GCC/Compiler version (if compiling from source): Host GCC 9.2.0; CUDA GCC 6.5.0
  • CUDA/cuDNN version: CUDA 10.0; cuDNN 7
  • GPU model and memory: RTX 2070 8GB

Describe the problem

I'm trying to upgrade the TensorFlow dependency in our project from 42c4f4ab6b53bce8639c203d7839d27eac11bd2f (r1.13 branch) to bd56e040ba4a8272163357a2f8786a128deb4aaf (r2.1 branch). After the upgrade, gpu_process_state.h is missing from the installed headers, and our project fails to compile at the #include shown in the error below.

/home/abcdabcd987/work/nexus/src/nexus/backend/tensorflow_model.cpp:6:10: fatal error: tensorflow/core/common_runtime/gpu/gpu_process_state.h: No such file or directory
    6 | #include "tensorflow/core/common_runtime/gpu/gpu_process_state.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I checked the cloned TensorFlow source code and gpu_process_state.h exists there. However, when I checked the build output that Bazel produces, the header was not there:

$ pwd
/home/abcdabcd987/work/nexus/build/tensorflow/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/include

$ ls -la tensorflow/core/common_runtime/gpu/
total 28
drwxr-xr-x 2 abcdabcd987 abcdabcd987 4096 Jan  3 20:37 ./
drwxr-xr-x 3 abcdabcd987 abcdabcd987 4096 Jan  3 20:37 ../
-rw-r--r-- 1 abcdabcd987 abcdabcd987 7639 Jan  3 20:37 gpu_event_mgr.h
-rw-r--r-- 1 abcdabcd987 abcdabcd987 4091 Jan  3 20:37 gpu_id.h
-rw-r--r-- 1 abcdabcd987 abcdabcd987 1617 Jan  3 20:37 gpu_id_manager.h
-rw-r--r-- 1 abcdabcd987 abcdabcd987 1665 Jan  3 20:37 gpu_init.h
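
The manual comparison above can be scripted. The following is a minimal sketch, not part of the original report: `missing_headers` is a hypothetical helper name, it assumes a POSIX shell, and it only compares the `.h` files directly inside the two directories it is given (no recursion).

```shell
# Print the name of every .h file that exists in a source directory but is
# absent from the corresponding directory of the installed include tree.
# Usage: missing_headers <source_dir> <installed_dir>
missing_headers() {
  src_dir=$1
  inc_dir=$2
  for header in "$src_dir"/*.h; do
    # If the glob matched nothing, the literal pattern is left; skip it.
    [ -e "$header" ] || continue
    name=$(basename "$header")
    # Report headers that have no counterpart in the installed tree.
    [ -e "$inc_dir/$name" ] || printf '%s\n' "$name"
  done
}

# Example call from the TensorFlow checkout (paths as they appear in this
# thread; adjust the second one to your own output base):
#   missing_headers tensorflow/core/common_runtime/gpu \
#     bazel-out/k8-opt/bin/tensorflow/include/tensorflow/core/common_runtime/gpu
```

The source directory may also contain headers that were never meant to be installed, so the output is a list of candidates to check rather than a definitive list of missing files.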

I tried to work out what changed in the BUILD files between the old and the new version, but the //tensorflow:install_headers target is quite complicated and I could not pin down the difference. Any help is much appreciated. Thanks!

Provide the exact sequence of commands / steps that you executed before running into the problem

TensorFlow in our project is built with the following command:

bazel --output_base=../build/tensorflow build \
  --config=opt \
  --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-10.0 \
  --action_env CUDNN_INSTALL_PATH=/usr/local/cuda-10.0 \
  --action_env GCC_HOST_COMPILER_PATH=/usr/bin/gcc-6 \
  //tensorflow:libtensorflow_cc.so \
  //tensorflow:libtensorflow_framework.so \
  //tensorflow:install_headers

The .tf_configure.bazelrc is the following:

build:xla --define with_xla_support=false
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_ROCM="0"
build --action_env TF_NEED_CUDA="1"
build --action_env TF_CUDA_VERSION="10.0"
build --action_env TF_CUDNN_VERSION="7"
build --action_env TF_NCCL_VERSION=""
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="6.0,7.5"
build --action_env TF_CUDA_CLANG="0"
build --config=cuda
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_tag_filters=-benchmark-test,-no_oss,-oss_serial
test --build_tag_filters=-benchmark-test,-no_oss
test --test_tag_filters=-gpu
test --build_tag_filters=-gpu
build --action_env TF_CONFIGURE_IOS="0"

@oanush oanush self-assigned this Jan 6, 2020
@oanush oanush added subtype:bazel Bazel related Build_Installation issues TF 2.1 for tracking issues in 2.1 release type:build/install Build and install issues labels Jan 6, 2020
@oanush oanush assigned angerson and unassigned oanush Jan 6, 2020
@oanush oanush added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 8, 2020
@abcdabcd987
Contributor Author

abcdabcd987 commented Jan 15, 2020

I found that I can get the missing headers back with the following change: in the //tensorflow/core:headers target, add :gpu_runtime to the deps list.
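
One way to apply that change without hand-editing tensorflow/core/BUILD is buildozer, the BUILD-file editing tool from bazelbuild/buildtools. This is only a sketch under assumptions: it presumes buildozer is installed and on the PATH, that it runs from the TensorFlow workspace root, and `add_gpu_runtime_dep` is a hypothetical wrapper name introduced here for illustration.

```shell
# Sketch only: append ":gpu_runtime" to the deps of //tensorflow/core:headers.
# Assumes buildozer (bazelbuild/buildtools) is installed and that the current
# directory is the root of the TensorFlow workspace.
add_gpu_runtime_dep() {
  # Equivalent to opening tensorflow/core/BUILD and adding ":gpu_runtime"
  # to the deps list of the target named "headers".
  buildozer 'add deps :gpu_runtime' //tensorflow/core:headers
}
```

After calling `add_gpu_runtime_dep`, rebuilding //tensorflow:install_headers with the same bazel command as in the original report should regenerate the include tree, which can then be checked for gpu_process_state.h.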

I hope this information helps you debug. Also cc @perfinion, who I believe is the author of the //tensorflow:install_headers target.

@yifeif
Contributor

yifeif commented Apr 10, 2020

@abcdabcd987, I believe this has been fixed in 2.2.0rc2. Let me know if you are still seeing this issue with the latest version.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 12, 2020
@abcdabcd987
Contributor Author

Hi @yifeif, as of v2.2.1 the problem still exists. The built target doesn't include GPU-related headers such as gpu_process_state.h.

root@355caa9c629e:/tensorflow# git show --oneline -s
25fba035f3 (HEAD, tag: v2.2.1, origin/r2.2) Merge pull request #43445 from tensorflow-jenkins/version-numbers-2.2.1-25147
root@355caa9c629e:/tensorflow# ls bazel-out/k8-opt/bin/tensorflow/include/tensorflow/core/common_runtime/gpu/
gpu_event_mgr.h  gpu_id.h  gpu_id_manager.h  gpu_init.h

I believe PR #36013 is still relevant.
