Docker with GPU 2.3rc0 CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid #41132

Closed
jcrousse opened this issue Jul 6, 2020 · 58 comments
Labels: subtype:ubuntu/linux, TF 2.3, type:build/install

Comments


jcrousse commented Jul 6, 2020

It seems that the Docker image tensorflow/tensorflow:2.3.0rc0-gpu won't work with my GPU, while the image tensorflow/tensorflow:2.2.0rc0-gpu works fine.

In other words, the workaround for the present issue was to "downgrade" to tensorflow/tensorflow:2.2.0rc0-gpu.
tensorflow/tensorflow:2.3.0rc0-gpu also works fine with CPU only.

System information

  • Ubuntu 20.04
  • TensorFlow through Docker
  • TensorFlow version: 2.3.0rc0 (image tensorflow/tensorflow:2.3.0rc0-gpu)
  • GPU model and memory: Geforce GTX 960M, coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
  • GPU drivers: 440.100

How to reproduce

> docker run -it --rm --gpus all  --entrypoint bash tensorflow/tensorflow:2.3.0rc0-gpu
> python
>>> import tensorflow as tf
>>> inputs = tf.keras.layers.Input(shape=(None,), name="input")
>>> embedded = tf.keras.layers.Embedding(100, 16)(inputs)

full stack trace:

2020-07-06 18:46:55.604377: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-06 18:46:55.608404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.608911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-07-06 18:46:55.608943: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 18:46:55.610544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-06 18:46:55.611696: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-06 18:46:55.611988: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-06 18:46:55.613589: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-06 18:46:55.614478: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-06 18:46:55.618025: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-07-06 18:46:55.618159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.618734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.619206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-07-06 18:46:55.619480: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-06 18:46:55.643133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2693910000 Hz
2020-07-06 18:46:55.643781: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44161a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-06 18:46:55.643809: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-06 18:46:55.725002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.725324: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44aa610 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-06 18:46:55.725349: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
2020-07-06 18:46:55.725532: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.725767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-07-06 18:46:55.725796: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 18:46:55.725828: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-06 18:46:55.725854: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-06 18:46:55.725882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-06 18:46:55.725908: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-06 18:46:55.725938: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-06 18:46:55.725988: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-07-06 18:46:55.726091: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.726485: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-06 18:46:55.726724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-07-06 18:46:55.726756: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 926, in __call__
    input_list)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1098, in _functional_construction_call
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2643, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_utils.py", line 323, in wrapper
    output_shape = fn(instance, input_shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/embeddings.py", line 135, in build
    if (context.executing_eagerly() and context.context().num_gpus() and
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1082, in num_gpus
    self.ensure_initialized()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 539, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

jcrousse added the type:bug label on Jul 6, 2020
angerson self-assigned this Jul 6, 2020

angerson commented Jul 6, 2020

Hmm... it doesn't have trouble on my machine in the same container. Thanks a bunch for the exact reproduction commands.

root@7b03d7e48af6:/# python
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-07-06 21:27:04.584594: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> inputs = tf.keras.layers.Input(shape=(None,), name="input")
>>> embedded = tf.keras.layers.Embedding(100, 16)(inputs)
2020-07-06 21:27:18.540476: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-06 21:27:18.569953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:18:00.0 name: Quadro P1000 computeCapability: 6.1
coreClock: 1.4805GHz coreCount: 5 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 74.65GiB/s
2020-07-06 21:27:18.570001: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 21:27:18.985818: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-06 21:27:19.579217: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-06 21:27:19.836486: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-06 21:27:20.597542: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-06 21:27:21.103251: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-06 21:27:22.960095: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-07-06 21:27:22.960782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-07-06 21:27:22.961151: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-06 21:27:23.000149: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3000000000 Hz
2020-07-06 21:27:23.012170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45be680 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-06 21:27:23.012215: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-06 21:27:23.113028: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x462a940 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-06 21:27:23.113113: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro P1000, Compute Capability 6.1
2020-07-06 21:27:23.114100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:18:00.0 name: Quadro P1000 computeCapability: 6.1
coreClock: 1.4805GHz coreCount: 5 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 74.65GiB/s
2020-07-06 21:27:23.114167: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 21:27:23.114213: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-06 21:27:23.114240: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-06 21:27:23.114267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-06 21:27:23.114293: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-06 21:27:23.114317: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-06 21:27:23.114344: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-07-06 21:27:23.115591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-07-06 21:27:23.115649: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-06 21:27:23.790561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-06 21:27:23.790610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-07-06 21:27:23.790617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-07-06 21:27:23.791319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3353 MB memory) -> physical GPU (device: 0, name: Quadro P1000, pci bus id: 0000:18:00.0, compute capability: 6.1)
>>> 
root@7b03d7e48af6:/# nvidia-smi
Mon Jul  6 21:28:02 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | 00000000:18:00.0  On |                  N/A |
| 34%   36C    P0    N/A /  N/A |    223MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@7b03d7e48af6:/# 

Can you include the output of nvidia-smi? This could be a case of old drivers on your host machine (440... could it be too new?), an issue with compute capabilities, or something else.


jcrousse commented Jul 7, 2020

Sure, here it is.
440.100 and CUDA 10.2 seem to be the default on Ubuntu 20.04 (fresh install).

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Since the TF docs mention that the NVIDIA® CUDA® Toolkit does not need to be installed, I thought that the CUDA version on the host machine would not matter.
I also get the same behaviour as you with tensorflow/tensorflow:2.2.0rc0-gpu on the same machine.
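(For context, the docs are right that only the driver matters inside the container. A quick way to confirm that the NVIDIA container runtime itself can see the GPU, independently of the host CUDA toolkit, is the check below; the nvidia/cuda:10.1-base tag is just an example image.)

> docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi

If that prints the same table as the host's nvidia-smi, the container/GPU plumbing is fine and the failure is specific to the TF image.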

ravikyram added the TF 2.3, subtype:ubuntu/linux, and type:build/install labels and removed the type:bug label on Jul 7, 2020
ravikyram removed their assignment Jul 7, 2020

angerson commented Jul 7, 2020

CUDA Compute Capability is inherent to your graphics card. There were some size-reduction changes to our binaries in 2.3 that adjusted handling of old capabilities (such as 5.0), but I believe TensorFlow 2.3 should still support capabilities as old as 3.5.

Can you try to replicate this outside of a Docker container to see if it's related to your graphics card vs. the container environment?

FYI @chsigg
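
(For reference, the same reproduction outside Docker would look roughly like the following, assuming a local CUDA 10.1 / cuDNN 7 install; the 2.3.0rc0 pin simply mirrors the Docker tag above.)

> pip install tensorflow==2.3.0rc0
> python
>>> import tensorflow as tf
>>> inputs = tf.keras.layers.Input(shape=(None,), name="input")
>>> embedded = tf.keras.layers.Embedding(100, 16)(inputs)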


chsigg commented Jul 8, 2020

We removed PTX for all but sm_70 from TF builds in cf1b6b3. We never shipped with kernels for sm_50, only sm_52. Apparently the driver was able to compile PTX for sm_52 to sm_50, even though it's not officially supported.

If you want to run on a sm_50 card, it would be best to build TF from source.
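
(A minimal sketch of such a source build, assuming bazel and the CUDA/cuDNN prerequisites are already installed; TF_CUDA_COMPUTE_CAPABILITIES controls which SASS/PTX targets end up in the binary, and 5.0 is the capability of the GTX 960M above.)

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow && git checkout v2.3.0
export TF_NEED_CUDA=1
export TF_CUDA_COMPUTE_CAPABILITIES=5.0
./configure   # answer the remaining prompts or accept the defaults
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl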


brentspell commented Jul 9, 2020

Forgive my ignorance here, but with the change on cf1b6b3, I don't see sm_70, sm_75, or compute_75 listed in the compute capabilities. Would a GPU (the Tesla T4 in our case) at compute_75 have a similar problem here?


gunan commented Jul 9, 2020

No, the driver will be able to JIT compute_70 and use it for any compute capability 7.x.
The startup may be slow, but it will work.


sanjoy commented Jul 9, 2020

I believe compute_70 implies sm_70 so we should not need to JIT PTX for T4, V100. @chsigg Is this correct?

(It would indeed be a problem if we have to JIT for V100 since that would add startup latency for a popular GPU.)

Edit: see https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/cuda_configure.bzl.

@xingyu-long
Contributor

Actually, I get the same error message after running the bazel tests with the GPU benchmarks.
System info:

System: Ubuntu 18.04
Tensorflow version: 2.4.0-dev20200710
GPU: 2 x Tesla V100
Driver Version: 450.51.05
CUDA Version: 10.1.243, https://www.tensorflow.org/install/gpu#install_cuda_with_apt (Ubuntu 18.04, CUDA 10.1)
Bazel: 3.1.0 and 3.3.1

Input:
bazel run --config=cuda -c opt --copt="-mavx" eager_microbenchmarks_test -- --benchmarks=.
Output:

INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'run' from /home/xingyulong/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'run' from /home/xingyulong/tensorflow/.bazelrc:
  Inherited 'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=v2
INFO: Found applicable config definition build:v2 in file /home/xingyulong/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /home/xingyulong/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /home/xingyulong/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain
INFO: Found applicable config definition build:linux in file /home/xingyulong/tensorflow/.bazelrc: --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /home/xingyulong/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading:
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
DEBUG: /home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:5:
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (1 packages loaded, 0 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (12 packages loaded, 14 targets configured)
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule git_repository defined at:
  /home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/external/bazel_tools/tools/build_defs/repo/git.bzl:195:18: in <toplevel>
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (21 packages loaded, 35 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (85 packages loaded, 902 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (118 packages loaded, 1030 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (222 packages loaded, 5895 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (287 packages loaded, 8881 targets configured)
WARNING: /home/xingyulong/tensorflow/tensorflow/core/BUILD:1750:1: in linkstatic attribute of cc_library rule //tensorflow/core:lib_internal: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'cc_library', the error might have been caused by the macro implementation
WARNING: /home/xingyulong/tensorflow/tensorflow/core/BUILD:1775:1: in linkstatic attribute of cc_library rule //tensorflow/core:lib_headers_for_pybind: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'cc_library', the error might have been caused by the macro implementation
WARNING: /home/xingyulong/tensorflow/tensorflow/core/BUILD:2162:1: in linkstatic attribute of cc_library rule //tensorflow/core:framework_internal: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'tf_cuda_library', the error might have been caused by the macro implementation
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (306 packages loaded, 14342 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (348 packages loaded, 19532 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (351 packages loaded, 19917 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (356 packages loaded, 23062 targets configured)
Analyzing: target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (357 packages loaded, 23062 targets configured)
WARNING: /home/xingyulong/tensorflow/tensorflow/python/BUILD:4667:1: in py_library rule //tensorflow/python:standard_ops: target '//tensorflow/python:standard_ops' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of `tf.distributions` to `tfp.distributions`.
WARNING: /home/xingyulong/tensorflow/tensorflow/python/BUILD:115:1: in py_library rule //tensorflow/python:no_contrib: target '//tensorflow/python:no_contrib' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of `tf.distributions` to `tfp.distributions`.
INFO: Analyzed target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test (361 packages loaded, 30598 targets configured).
INFO: Found 1 target...
INFO: Deleting stale sandbox base /home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/sandbox
[0 / 226] [Prepa] BazelWorkspaceStatusAction stable-status.txt ... (2 actions, 0 running)
7,264 / 9,082] [Prepa] Writing file tensorflow/core/kernels/data/experimental/libignore_errors_dataset_op.pic.lo-2.params [for host]
[15,392 / 17,385] Executing genrule //tensorflow/python:framework/fast_tensor_util.pyx_cython_translation [for host]; 1s local
[22,092 / 23,859] Executing genrule //tensorflow/python:framework/fast_tensor_util.pyx_cython_translation; 1s local ... (2 actions running)
[30,382 / 30,594] [Prepa] Writing file tensorflow/python/gen_user_ops_py_wrappers_cc-2.params
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen:
2020-07-10 18:11:35.652291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2:
2020-07-10 18:11:35.652292: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1:
2020-07-10 18:11:35.862417: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
INFO: From Executing genrule //tensorflow:tf_python_api_gen_v2:
2020-07-10 18:11:35.859993: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Target //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test up-to-date:
  bazel-bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test
INFO: Elapsed time: 58.937s, Critical Path: 14.14s
INFO: 17 processes: 17 local.
INFO: Build completed successfully, 4672 total actions
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh tensorflow/python/keras/benchmarks/eager_microbenchmarks_test '--benchmarks=.'
INFO: Build completed successfully, 4672 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //tensorflow/python/keras/benchmarks:eager_microbenchmarks_test
-----------------------------------------------------------------------------
2020-07-10 18:11:41.266346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-10 18:11:42.762645: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-10 18:11:42.840961: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:42.841808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2020-07-10 18:11:42.841953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:42.842708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:00:05.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2020-07-10 18:11:42.842750: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-10 18:11:42.844793: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-10 18:11:42.846855: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-10 18:11:42.847239: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-10 18:11:42.849488: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-10 18:11:42.850725: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-10 18:11:42.850894: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-07-10 18:11:42.851032: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:42.851964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:42.852842: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:42.853678: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:42.854423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1
2020-07-10 18:11:42.854812: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-10 18:11:43.309452: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:43.310269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2020-07-10 18:11:43.310402: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:43.311151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:00:05.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2020-07-10 18:11:43.311184: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-10 18:11:43.311210: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-07-10 18:11:43.311224: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-07-10 18:11:43.311234: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-07-10 18:11:43.311244: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-07-10 18:11:43.311254: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-07-10 18:11:43.311262: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-07-10 18:11:43.311347: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:43.312197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:43.313078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:43.313907: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-10 18:11:43.314671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1
2020-07-10 18:11:43.314714: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.py", line 326, in <module>
    test.main()
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/test.py", line 58, in main
    return _googletest.main(argv)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/googletest.py", line 66, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/benchmark.py", line 476, in benchmarks_main
    app.run(lambda _: _run_benchmarks(regex), argv=argv)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/googletest.py", line 66, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/benchmark.py", line 476, in benchmarks_main
    app.run(lambda _: _run_benchmarks(regex), argv=argv)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/benchmark.py", line 476, in <lambda>
    app.run(lambda _: _run_benchmarks(regex), argv=argv)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/platform/benchmark.py", line 456, in _run_benchmarks
    instance_benchmark_fn()
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.py", line 156, in benchmark_layers_advanced_activations_elu_overhead
    x = tf.ones((1, 1))
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/ops/array_ops.py", line 3068, in ones
    tensor_shape.TensorShape(shape))
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/framework/constant_op.py", line 373, in _tensor_shape_tensor_conversion_function
    return constant(s_list, dtype=dtype, name=name)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/framework/constant_op.py", line 264, in constant
    allow_broadcast=True)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/framework/constant_op.py", line 97, in convert_to_eager_tensor
    ctx.ensure_initialized()
  File "/home/xingyulong/.cache/bazel/_bazel_root/3240dade0c8746999ed99d7076f401e9/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.runfiles/org_tensorflow/tensorflow/python/eager/context.py", line 549, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

But when I run models instead of running the bazel test, it works fine.


sanjoy commented Jul 11, 2020

when I run models

Are you running these models also using a TF built from source, or are you running them using a pip installed binary?


xingyu-long commented Jul 11, 2020

when I run models

Are you running these models also using a TF built from source, or are you running them using a pip installed binary?

I think it was the pip-installed binary.

import tensorflow as tf

inputs = tf.keras.xxxxx
xxxxx


sanjoy commented Jul 11, 2020

I think it should be pip installed binary.

Ok, so that could explain the discrepancy -- the TF binary you built for tests probably does not include sm_70 or compute_70.

Can you grep for TF_CUDA_COMPUTE_CAPABILITIES in .tf_configure.bazelrc and share what you find there?
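
(For illustration, the file sits at the repo root once ./configure has been run, and the relevant line looks roughly like this; the actual value is whatever was chosen during configuration.)

$ grep TF_CUDA_COMPUTE_CAPABILITIES .tf_configure.bazelrc
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="7.0"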

@xingyu-long
Contributor

Can you grep for TF_CUDA_COMPUTE_CAPABILITIES in .tf_configure.bazelrc and share what you find there?

I searched for .tf_configure.bazelrc in my tensorflow folder; it doesn't exist.

Let me clarify my environment: I used pip to install TensorFlow, and separately downloaded the current tensorflow repo to run the bazel tests on tensorflow/tensorflow/python/keras/benchmarks/eager_microbenchmarks_test.py. That is where I get the error message I posted; training models works fine.

@xingyu-long
Contributor

It works for me after running ./configure in the TF folder.


navganti commented Aug 2, 2020

Hi all, I'm running into the same issue here. Both the Docker installation of TensorFlow and the local pip installation give me the same error:

tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

System information

  • Ubuntu 16.04
  • TensorFlow through Docker, local TF installation with pip.
  • TensorFlow version: 2.3.0
  • GPU model and memory: Quadro M1000M coreClock: 1.0715 GHz coreCount: 4, deviceMemorySize: 3.95 GiB deviceMemoryBandwidth: 74.65GiB/s
  • computeCapability: 5.0
  • GPU drivers: 418.87.00
  • CUDA version: 10.1
  • cuDNN version: 7.6.5

I was able to reproduce this error in the Docker container using the steps listed above, but just following the steps listed here also leads to the same error. The GPU is detected fine using tf.config.experimental.list_physical_devices('GPU'), but as soon as I try to define a tf.constant it fails immediately.

2.2.0 works fine without any issues - both locally and through Docker.
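
(The failure mode above boils down to two lines: device enumeration succeeds, but the first eager op, which forces the CUDA context to be created, raises the error.)

>>> import tensorflow as tf
>>> tf.config.experimental.list_physical_devices('GPU')  # GPU is listed fine
>>> tf.constant([1.0, 2.0])  # raises InternalError: ... device kernel image is invalid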


sanjoy commented Aug 2, 2020

@navganti PTAL here #41892 (comment).


navganti commented Aug 3, 2020

@sanjoy I see! Thank you for the update.

stu1130 added a commit to deepjavalibrary/djl that referenced this issue Aug 8, 2020
This reverts commit 43e9ccd.

Reason for revert: TensorFlow 2.3 don't work on Linux GPU
tensorflow/tensorflow#41976
tensorflow/tensorflow#41132

Change-Id: I819d10e8129aeaf57bf5f202600d0b5e1086000e

motrek commented Aug 28, 2020

I'm seeing this error when I try to call TF_LoadSessionFromSavedModel (C API). It works correctly via the non-GPU docker image, but fails with the GPU docker image with the following error:

CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

I'm using the latest docker image, i.e., tensorflow/tensorflow:latest-gpu built on 7/28/20.
My copy of the C API was downloaded a few days ago: libtensorflow-gpu-linux-x86_64-2.3.0
My GPU is a mobile Nvidia Geforce GTX 1660 Ti.
Nvidia driver version: 440.100
CUDA version 10.2
According to Wikipedia, my graphics chip has CUDA compute capability version 7.5, which seems to be the latest version of any chip that has been released, so I'm pretty sure the problem isn't my chip's CUDA capability.
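
(For anyone who wants to reproduce this with the C library alone, a minimal sketch of the failing call path is below; "saved_model_dir" and the "serve" tag are placeholders for whatever SavedModel is being loaded.)

#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main(void) {
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_SessionOptions* opts = TF_NewSessionOptions();
  const char* tags[] = {"serve"};

  // On the affected libtensorflow 2.3.0 GPU build, this is where the
  // "device kernel image is invalid" error comes back through `status`.
  TF_Session* session = TF_LoadSessionFromSavedModel(
      opts, NULL, "saved_model_dir", tags, 1, graph, NULL, status);

  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "%s\n", TF_Message(status));
  } else {
    TF_CloseSession(session, status);
    TF_DeleteSession(session, status);
  }

  TF_DeleteSessionOptions(opts);
  TF_DeleteGraph(graph);
  TF_DeleteStatus(status);
  return 0;
}

Built with something like gcc repro.c -I/usr/local/include -L/usr/local/lib -ltensorflow -o repro against the extracted libtensorflow tarball.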


NeighborhoodCoding commented Sep 3, 2020

CUDA 10.1 with cuDNN 7.4 and 7.5 both fail (TF 2.3).
Should I try CUDA version 10.2?


sanjoy commented Sep 3, 2020

@motrek TF 2.3 should indeed work on compute capability 7.5. However, this suggests that it lacks tensor cores; it is possible that's why you get the error.


motrek commented Sep 3, 2020

@sanjoy You're right, the chip (1660 Ti) absolutely does lack Tensor Cores, but that shouldn't be a problem at all. No GTX chip has tensor cores either and the C API works fine on those.

Also, training via Python is GPU-accelerated on my laptop with this chip.

It's just the C API that's giving me this "device kernel image is invalid" error, which is clearly a bug somewhere. :(


av8ramit commented Sep 8, 2020

@angerson I'll send a fix internally that matches the pip package config.


av8ramit commented Sep 8, 2020

I have a commit that should fix this. I'll keep monitoring to see if there are any other issues.


motrek commented Sep 8, 2020

@av8ramit Very sorry, I'm new to all of this. Will the fix eventually be available in the "latest" docker image and the C API libraries that are posted to the web page that I linked to (above)? Do you have any idea when that might happen? Thanks in advance.


av8ramit commented Sep 8, 2020

Hello @motrek, I'll trigger a new push to GCS. Hopefully with this next one, or by tomorrow's version, we'll have something working.

@av8ramit

Unfortunately it appears my change did not work; digging into why today.


parnham commented Sep 16, 2020

We're seeing the same issue as @motrek :
We can train a model using tf.keras without a problem.

Then on the same system using TF_LoadSessionFromSavedModel from the C API and libtensorflow installed from https://www.tensorflow.org/install/lang_c we get
CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

If I hack our C++ application to link against _pywrap_tensorflow_internal.so instead, then it seems to work, although that's not a long-term solution since the pywrap lib is considerably larger and we would also have to link our application against Python.

OS: Ubuntu 20.04
GPU: RTX 2070
Driver: 440.100
Tensorflow: 2.3.0

@av8ramit

Sorry I forgot to update this thread, but the latest GCS builds are built with the following computes:

sm_35,sm_50,sm_60,sm_70,sm_75,compute_80
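
(For anyone who wants to double-check what a given build actually ships, the CUDA toolkit's cuobjdump can list the cubin/PTX targets embedded in the shared libraries from the tarball; this is just a rough sanity check, not an official procedure.)

for f in lib/libtensorflow*.so*; do
  echo "== $f"
  cuobjdump --list-elf "$f" | grep -o 'sm_[0-9]*' | sort -u
  cuobjdump --list-ptx "$f" | grep -o 'compute_[0-9]*' | sort -u
done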


motrek commented Sep 16, 2020

If I hack our C++ application to link against _pywrap_tensorflow_internal.so instead then it seems to work. Although that's not a long term solution since the pywrap lib is considerably larger and we would also have to link our application against python.

Sorry for the [probably very basic] question but how do you "link against python"? I found "_pywrap_tensorflow_internal.so" and tried linking with it but get a bunch of unresolved python symbols. (As one would expect from reading your post.)


motrek commented Sep 16, 2020

Sorry I forgot to update this thread, but the latest GCS builds are built with the following computes:

sm_35,sm_50,sm_60,sm_70,sm_75,compute_80

I'm working off of this page to get the libraries:

https://www.tensorflow.org/install/lang_c

There's a link to a "GCS bucket" but it goes to an XML file in my browser (Chrome) which seems like it's supposed to be read by a different piece of software that I don't have. I see that "GCS" is short for "Google Cloud Storage" but when I try to access (?) "Google Cloud Storage," I'm asked to sign up for a service (including entering payment/credit card details).

If somebody could point me in the right direction for accessing the files in this "GCS bucket" I would appreciate it.

(Also, looking at that XML file in my browser, all of the files that were modified on 09-15 are 'libtensorflow-cpu-linux-x86_64.tar.gz' ... I don't see any files that seem like they would have GPU support?)

@av8ramit

So that link is a browser representation of the GCS bucket libtensorflow-nightly. I may be describing it badly, but regardless I think the file you're looking for is this which was built last night.


motrek commented Sep 16, 2020

So that link is a browser representation of the GCS bucket libtensorflow-nightly. I may be describing it badly, but regardless I think the file you're looking for is this which was built last night.

Thanks for building this and posting this link. I appreciate it. I installed it and tried it--I no longer got the error about "device kernel image is invalid" but I did get a bunch of errors about not being able to load libraries that seem related to CUDA 11.

I will have to consider whether or not I want to switch to CUDA 11, seems like a big change and I'm reluctant to touch anything since my training with GPU acceleration is already working well with my current setup.

@av8ramit

Glad we were able to solve the first issue, sorry about the new issues you are facing. Do you mind uploading some logs so I can see if that's something we can fix on our end? The package was built with our CUDA 11 toolchain.


parnham commented Sep 16, 2020

Sorry for the [probably very basic] question but how do you "link against python"? I found "_pywrap_tensorflow_internal.so" and tried linking with it but get a bunch of unresolved python symbols. (As one would expect from reading your post.)

I used -l_pywrap_tensorflow -lpython3.8

If the issue has been fixed in the nightly (which is TF 2.4) and not in the stable 2.3 release, then I'm going to have the same CUDA dependency problem that you have, since the default CUDA version in the Ubuntu 20.04 package repos is 10.1.


motrek commented Sep 18, 2020

Glad we were able to solve the first issue, sorry about the new issues you are facing. Do you mind uploading some logs so I can see if that's something we can fix on our end? The package was built with our CUDA 11 toolchain.

@av8ramit yeah, all the errors I'm seeing are to do with not being able to load CUDA 11 libraries, which I'm sure is expected if you built the libraries against CUDA 11 and I don't have CUDA 11 installed. I'm happy to post logs here but I can't imagine they would be useful to anybody. I will think about installing CUDA 11 later, but it might not be for a while, since I was able to get my code working by linking it against the Python libraries as parnham suggested.


motrek commented Sep 18, 2020

Sorry for the [probably very basic] question but how do you "link against python"? I found "_pywrap_tensorflow_internal.so" and tried linking with it but get a bunch of unresolved python symbols. (As one would expect from reading your post.)

I used -l_pywrap_tensorflow -lpython3.8

@parnham Thanks so much for this. I don't have a good understanding of how the linker works with shared libraries but I was able to hack something together:

  • I found the absolute paths to these .so files and added them to the end of my link command, which worked
  • When I ran the executable, though, it said it couldn't find the .so files
  • I added the paths to the .so files to /etc/ld.so.conf and also added the paths to $LD_LIBRARY_PATH, I'm not sure if it was necessary to do both but there you go
  • The executable ran this time but failed to do inference with a CUDA error because the "allow growth" GPU option was not set to true (I don't know what "allow growth" does, but everybody seems to run into this problem, why isn't it set to true by default?!?!)
  • I found some code posted here to set "allow growth" to true via the C API:
    how to limit GPU usage in this api? Neargye/hello_tf_c_api#21

So everything seems pretty fragile but IT'S WORKING. Thanks so much. My C API inference workload is now ~5x as fast running on my GPU vs. the CPU. Great speedup. This is for a private project so I don't mind that I'm linking against those python libraries.

If you can share what you did to allow gcc to link nicely against those libraries (not use absolute paths, etc.) that would be great, really appreciated.

Thanks again!


parnham commented Sep 18, 2020

Glad I could help a little @motrek
You should be able to link to python3.8 if you have the package libpython3.8-dev installed.

For linking to the _pywrap_tensorflow library I just created a symlink to it in /usr/local/lib

lrwxrwxrwx 1 root root    93 Sep 16 11:16  lib_pywrap_tensorflow.so -> /home/dan/.local/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

and then ran ldconfig. At which point it can be linked and also found at runtime.
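
(Putting those pieces together, the whole workaround is roughly the following; paths and the Python minor version are whatever your system uses.)

sudo ln -s ~/.local/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so /usr/local/lib/lib_pywrap_tensorflow.so
sudo ldconfig
g++ -std=c++17 -o app app.cpp -l_pywrap_tensorflow -lpython3.8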

There's a slightly cleaner solution to setting the "allow growth" option by including the experimental header

#include <tensorflow/c/c_api_experimental.h>

and then use the TF_CreateConfig helper.

auto options = TF_NewSessionOptions();
// TF_CreateConfig(enable_xla_compilation, gpu_memory_allow_growth, num_cpu_devices)
auto config  = TF_CreateConfig(true, true, 8);

TF_SetConfig(options, config->data, config->length, this->status);
TF_DeleteBuffer(config);

Use the session options as normal.


parnham commented Oct 1, 2020

Apologies for adding more activity to this issue, @av8ramit, but we wanted to find out whether there is going to be a point release of the TensorFlow C library v2.3 patched with the correct CUDA capabilities.
I only ask because v2.3 is the current stable version, it works with the standard CUDA version in Ubuntu 20.04, and installing TensorFlow through Python for training with Keras also uses the same version.


av8ramit commented Oct 1, 2020

Looping in the release manager. @geetachavan1, would we be able to patch the fix for libtensorflow and release new binaries with the correct CUDA capabilities? Happy to help get this done internally.

@frank-qcd-qk

On an NVIDIA 3090 with Ubuntu 20.04, CUDA 10.1, cuDNN 7.6, and NVIDIA GPU driver 455, I have the same issue.

@mihaimaruseac
Collaborator

Hi. We have uploaded the 2.3.1 libtensorflow binaries. Apologies for the delay; I missed them during the patch release.


parnham commented Oct 3, 2020

Hi @mihaimaruseac

Using the link from https://www.tensorflow.org/install/lang_c#download and assuming that the new version was in the same location, I downloaded https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-2.3.1.tar.gz

Unfortunately that build seems to have the same issue:

2020-10-03 18:26:42.084881: E tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

In fact running a diff between the old and new libtensorflow.so.2.3.0 files shows them to be identical.

EDIT: The 2.3.0 and 2.3.1 tar.gz files have the same md5sum

@mihaimaruseac
Collaborator

This is interesting. Even if our recent changes to the build script had no effects, I would have expected the binaries to differ based on the patch release changes.

@mihaimaruseac
Collaborator

Apologies. It seems our CI uploaded the wrong package under the new name after we refactored parts of the CI. I think it should be fixed now; can you give it a try, please?

mihaimaruseac@ankh:/tmp$ sha256sum libtensorflow-gpu-linux-x86_64-2.3.*
5e4d934fd7602b6d002a89b972371151aa478ea99bf1a70987833e11c34c8875  libtensorflow-gpu-linux-x86_64-2.3.0.tar.gz
bdfb52677cf9793dcd7b66254b647c885c417967629f58af4a19b386fa7e7e0f  libtensorflow-gpu-linux-x86_64-2.3.1.tar.gz


motrek commented Oct 5, 2020

Apolgies. It seems our CI uploaded the wrong package under the new name after we refactored parts of the CI. I think it should be fixed now, can you give it a try please?

This now works for me. Thanks for sorting this out. Might want to update the links on the C API page to point to the new versions.

GPU: mobile GeForce GTX 1660 Ti


parnham commented Oct 5, 2020

Thank you so much @av8ramit and @mihaimaruseac - v2.3.1 is now working for us also!


av8ramit commented Oct 6, 2020

No problem! Hats off to @geetachavan1 and @mihaimaruseac who did the heavy lifting!

av8ramit closed this as completed Oct 6, 2020
