
[tensorrt] Failed Execution #3835

Closed
Davidnet opened this issue Apr 2, 2018 · 43 comments

Davidnet commented Apr 2, 2018

System information

  • What is the top-level directory of the model you are using: models
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): xenial
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.7.0
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: 9.0/7.1
  • GPU model and memory: k80 11gb
  • Exact command to reproduce:
    python tensorrt.py --frozen_graph=resnetv2_imagenet_frozen_graph.pb --image_file=image.jpg --native --fp32 --fp16 --output_dir=output

Describe the problem

Fresh OS installation; trying to run the TensorRT example.

Source code / logs

totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-04-02 10:05:26.213549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-02 10:05:26.603186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-02 10:05:26.603241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-02 10:05:26.603258: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-02 10:05:26.603571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5719 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Running native graph
INFO:tensorflow:Starting execution
2018-04-02 10:05:27.325531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-02 10:05:27.325604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-02 10:05:27.325624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-02 10:05:27.325636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-02 10:05:27.325831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5719 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
INFO:tensorflow:Starting Warmup cycle
2018-04-02 10:05:28.814376: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-02 10:05:28.815023: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)
karmel added the stat:awaiting response label Apr 3, 2018
karmel (Contributor) commented Apr 3, 2018

I don't think K80s can run fp16. Does the fp32 loop work? If you have access to a P100 or V100, does running the command line above work?

Davidnet (Author) commented Apr 3, 2018

Nada. Running:

 python tensorrt.py --frozen_graph=resnetv2_imagenet_frozen_graph.pb   --image_file=image.jpg --native --output_dir=output

Got:

2018-04-03 21:51:13.065768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-04-03 21:51:13.065799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-03 21:51:13.462560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-03 21:51:13.462625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-03 21:51:13.462646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-03 21:51:13.462971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5719 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Running native graph
INFO:tensorflow:Starting execution
2018-04-03 21:51:14.202213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-03 21:51:14.202290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-03 21:51:14.202311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-03 21:51:14.202329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-03 21:51:14.202539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5719 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
INFO:tensorflow:Starting Warmup cycle
2018-04-03 21:51:17.530559: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-03 21:51:17.531253: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted (core dumped)

Davidnet (Author) commented Apr 3, 2018

Oh boy this is going to be fun to debug: munmap is failing:

root@ad7dc5bee83d:~/models/research/tensorrt# python tensorrt.py --frozen_graph=resnetv2_imagenet_frozen_graph.pb   --image_file=image.jpg --fp32 --output_dir=output
/root/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /root/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
2018-04-03 21:54:44.952446: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-03 21:54:45.040503: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-03 21:54:45.040842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-04-03 21:54:45.040875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-03 21:54:45.432751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-03 21:54:45.432811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-03 21:54:45.432832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-03 21:54:45.433129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5719 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Running FP32 graph
2018-04-03 21:54:45.890110: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-04-03 21:54:46.543507: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 2
2018-04-03 21:54:46.737174: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2624] Max batch size= 128 max workspace size= 2132940928
2018-04-03 21:54:46.737232: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] starting build engine
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffe63a424e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7faa6f3a17e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7faa6f3ae698]
/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeISsLb1EEE+0xfc)[0x7faa447fea0c]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE9_M_insertIRKSsNS1_10_AllocNodeISaINS1_10_Hash_nodeISsLb1EEEEEEEESt4pairINS1_14_Node_iteratorISsLb1ELb1EEEbEOT_RKT0_St17integral_constantIbLb1EE+0x96)[0x7fa9e79f6a26]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZNK8nvinfer17Network8validateERKNS_5cudnn15HardwareContextEbbi+0x1a6)[0x7fa9e79e4b36]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZN8nvinfer17builder11buildEngineERNS_21CudaEngineBuildConfigERKNS_5cudnn15HardwareContextERKNS_7NetworkE+0x46)[0x7fa9e79d1156]
/usr/lib/x86_64-linux-gnu/libnvinfer.so.4(_ZN8nvinfer17Builder15buildCudaEngineERNS_18INetworkDefinitionE+0x11)[0x7fa9e79bbe81]
/root/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(_ZN10tensorflow8tensorrt7convert32ConvertSubGraphToTensorRTNodeDefERNS1_14SubGraphParamsE+0x2020)[0x7fa9e7271d90]
/root/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(_ZN10tensorflow8tensorrt7convert25ConvertGraphDefToTensorRTERKNS_8GraphDefERKSt6vectorISsSaISsEEmmPS2_ii+0x200b)[0x7fa9e725188b]
/root/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(+0x4de8f)[0x7fa9e7248e8f]
/root/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(+0x4e51a)[0x7fa9e724951a]
python(_PyCFunction_FastCallDict+0x91)[0x5593f5425f11]
python(+0x19cbec)[0x5593f54b3bec]
python(_PyEval_EvalFrameDefault+0x30a)[0x5593f54d819a]
python(+0x1959a6)[0x5593f54ac9a6]
python(+0x196a11)[0x5593f54ada11]
python(+0x19ccc5)[0x5593f54b3cc5]
python(_PyEval_EvalFrameDefault+0x1021)[0x5593f54d8eb1]
python(+0x1959a6)[0x5593f54ac9a6]
python(+0x196a11)[0x5593f54ada11]
python(+0x19ccc5)[0x5593f54b3cc5]
python(_PyEval_EvalFrameDefault+0x30a)[0x5593f54d819a]
python(+0x1967db)[0x5593f54ad7db]
python(+0x19ccc5)[0x5593f54b3cc5]
python(_PyEval_EvalFrameDefault+0x30a)[0x5593f54d819a]
python(+0x1959a6)[0x5593f54ac9a6]
python(+0x196a11)[0x5593f54ada11]
python(+0x19ccc5)[0x5593f54b3cc5]
python(_PyEval_EvalFrameDefault+0x1021)[0x5593f54d8eb1]
python(PyEval_EvalCodeEx+0x329)[0x5593f54ae529]
python(PyEval_EvalCode+0x1c)[0x5593f54af2cc]
python(+0x214af4)[0x5593f552baf4]
python(PyRun_FileExFlags+0xa1)[0x5593f552bef1]
python(PyRun_SimpleFileExFlags+0x1c4)[0x5593f552c0f4]
python(Py_Main+0x648)[0x5593f552fc28]
python(main+0xee)[0x5593f53f771e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7faa6f34a830]
python(+0x1c7c98)[0x5593f54dec98]
======= Memory map: ========
10000000-10001000 rw-s 00000000 00:06 328                                /dev/nvidia0
10001000-10002000 rw-s 00000000 00:06 328                                /dev/nvidia0
10002000-10003000 rw-s 00000000 00:06 328                                /dev/nvidia0
10003000-10004000 rw-s 00000000 00:06 328                                /dev/nvidia0
10004000-10005000 rw-s 00000000 00:06 328                                /dev/nvidia0
10005000-10006000 rw-s 00000000 00:06 328                                /dev/nvidia0
10006000-10007000 rw-s 00000000 00:06 328                                /dev/nvidia0
10007000-10008000 rw-s 00000000 00:06 328                                /dev/nvidia0
10008000-10009000 rw-s 00000000 00:06 328                                /dev/nvidia0
10009000-1000a000 rw-s 00000000 00:06 328                                /dev/nvidia0
1000a000-1000b000 rw-s 00000000 00:06 328                                /dev/nvidia0
1000b000-1000c000 rw-s 00000000 00:06 328                                /dev/nvidia0
1000c000-1000d000 rw-s 00000000 00:06 328                                /dev/nvidia0
1000d000-1000e000 rw-s 00000000 00:06 328                                /dev/nvidia0
1000e000-1000f000 rw-s 00000000 00:06 328                                /dev/nvidia0
1000f000-10010000 rw-s 00000000 00:06 328                                /dev/nvidia0
10010000-20000000 ---p 00000000 00:00 0
200000000-200100000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200100000-200104000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200104000-200120000 ---p 00000000 00:00 0
200120000-200520000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200520000-200524000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200524000-200540000 ---p 00000000 00:00 0
200540000-200940000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200940000-200944000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200944000-200960000 ---p 00000000 00:00 0
200960000-200d60000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200d60000-200d64000 rw-s 00000000 00:06 327                              /dev/nvidiactl
200d64000-200d80000 ---p 00000000 00:00 0
200d80000-201180000 rw-s 00000000 00:06 327                              /dev/nvidiactl
201180000-201184000 rw-s 00000000 00:06 327                              /dev/nvidiactl
201184000-2011a0000 ---p 00000000 00:00 0
2011a0000-2015a0000 rw-s 00000000 00:06 327                              /dev/nvidiactl
2015a0000-2015a4000 rw-s 00000000 00:06 327                              /dev/nvidiactl
2015a4000-2015c0000 ---p 00000000 00:00 0
2015c0000-2019c0000 rw-s 00000000 00:06 327                              /dev/nvidiactl
2019c0000-2019c4000 rw-s 00000000 00:06 327                              /dev/nvidiactl
2019c4000-2019e0000 ---p 00000000 00:00 0
2019e0000-201de0000 rw-s 00000000 00:06 327                              /dev/nvidiactl
201de0000-201de4000 rw-s 00000000 00:06 327                              /dev/nvidiactl
201de4000-201e00000 ---p 00000000 00:00 0
201e00000-202200000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202200000-202204000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202204000-202220000 ---p 00000000 00:00 0
202220000-202620000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202620000-202624000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202624000-202640000 ---p 00000000 00:00 0
202640000-202a40000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202a40000-202a44000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202a44000-202a60000 ---p 00000000 00:00 0
202a60000-202e60000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202e60000-202e64000 rw-s 00000000 00:06 327                              /dev/nvidiactl
202e64000-202e80000 ---p 00000000 00:00 0
202e80000-203280000 rw-s 00000000 00:06 327                              /dev/nvidiactl
203280000-203284000 rw-s 00000000 00:06 327                              /dev/nvidiactl
203284000-2032a0000 ---p 00000000 00:00 0
2032a0000-2036a0000 rw-s 00000000 00:06 327                              /dev/nvidiactl
2036a0000-2036a4000 rw-s 00000000 00:06 327                              /dev/nvidiactl
2036a4000-2036c0000 ---p 00000000 00:00 0
2036c0000-203ac0000 rw-s 00000000 00:06 327                              /dev/nvidiactl
203ac0000-203ac4000 rw-s 00000000 00:06 327                              /dev/nvidiactl
203ac4000-203ae0000 ---p 00000000 00:00 0
203ae0000-203ee0000 rw-s 00000000 00:06 327                              /dev/nvidiactl

Davidnet (Author) commented Apr 3, 2018

Is there anything I can do to help solve this?

karmel (Contributor) commented Apr 3, 2018

It's possible this is a CUDA/CuDNN compatibility issue:

2018-04-02 10:05:28.814376: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
  1. What happens if you install CuDNN 7.0 and retry?
  2. Just to make sure-- are you able to run standard TF using the GPU? For example, can you run official/mnist/mnist.py successfully?
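
For reference, a minimal TF 1.x sanity check along those lines (independent of the models repo) might look like the sketch below; note it only exercises cuBLAS on the GPU, not the cuDNN convolution path that is crashing in this issue.

import tensorflow as tf

# Place a small matmul explicitly on the GPU and print where ops actually ran.
with tf.device("/gpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))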

Davidnet (Author) commented Apr 3, 2018

Yes, I can run the whole training and execution on the device. I already tried cuDNN 7005 with no success. Bear in mind that all of these logs (except the first one) were generated with CUDA 9 and cuDNN 7.0.0. So yes, I have been trying to set up the standard environment.

karmel (Contributor) commented Apr 3, 2018

I'm a little confused in that case; based on the error message, it looks like TF was built with 7.0, but you are currently running with 7.1. The initial env details above also indicate CuDNN 7.1. Can you provide the output of the following commands?

nvcc --version
echo $LD_LIBRARY_PATH
echo $CUDA_HOME
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR

(Note that the last of these may actually be a different path if your cuda path is non-standard.)
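
A minimal way to trigger the same cuDNN check outside the tensorrt script, assuming a TF 1.x GPU build is importable: a single convolution forces cuDNN to load, so any runtime/compile-time mismatch should surface immediately rather than deep inside graph conversion.

import tensorflow as tf

# One conv2d is enough to load cuDNN; if the loaded library is incompatible,
# the same cuda_dnn.cc error from the logs above appears here.
x = tf.random_normal([1, 8, 8, 3])
w = tf.random_normal([3, 3, 3, 4])
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

with tf.Session() as sess:
    print(sess.run(y).shape)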

Davidnet (Author) commented Apr 4, 2018

Oh, I should clarify: the first error was on a standard AWS instance, which comes with CUDA and cuDNN preinstalled. I then created another instance on which I installed CUDA 9.0 with cuDNN 7.0 specifically, so that I would have the most standard configuration to replicate the bug.

karmel (Contributor) commented Apr 4, 2018

I see. To debug, I would gather:

  • the output of nvidia-smi
  • the processor and available memory on the instance
  • whether --native mode runs without errors
  • what happens if you try different values of --workspace_size (1<<10, 1<<20)
  • can you get an instance with a P100? Does everything work in that scenario?
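
For the workspace-size experiment, here is a rough sketch of the conversion call the script makes, so the workspace value can be varied in isolation. Parameter names follow tf.contrib.tensorrt as of TF 1.7; the frozen-graph path is the one used in this issue, and the output node name is a placeholder.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load the frozen graph produced earlier.
graph_def = tf.GraphDef()
with tf.gfile.GFile("resnetv2_imagenet_frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Try a few workspace sizes; the crash above happens while building the engine.
for workspace in (1 << 10, 1 << 20, 1 << 30):
    trt_graph = trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=["softmax_tensor"],          # placeholder output node name
        max_batch_size=8,
        max_workspace_size_bytes=workspace,
        precision_mode="FP32")
    print("workspace", workspace, "-> nodes in converted graph:", len(trt_graph.node))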

Davidnet (Author) commented Apr 4, 2018

  • nvidia-smi: reports the python process until it hits the stack trace, around 20% usage.
  • Processor and available memory on the instance: standard AWS p2, Intel Xeon E5, 64 GB RAM.
  • --native flag: same output as --fp32.
  • Different values of --workspace_size (1<<10, 1<<20): same failure with every value tried.
  • P100: not possible.

@RegisGraptin

I have the same error, and I did a
sudo apt-get upgrade
to get the new version.

karmel (Contributor) commented Apr 4, 2018

It sounds like there is something about your local env/C/py/something that is perhaps causing problems.

  1. May as well try an apt-get upgrade as @rere-corporation suggests.
  2. If you just run the --native, it still fails? That is surprising; can you paste the error in the case of just running native mode?
  3. Did you build TF from source? In any case, try running in a fresh venv with pip install tf-nightly-gpu
  4. If that still fails, try running with Python 2; I wouldn't think that would matter, but worth a try.
  5. If it still fails... time to bisect and debug. Add tf.logging.info() statements to the tensorrt script to see where precisely the script fails; can you provide more detail about what section of the code is failing? The error trace implies during conversion of the graph, but that doesn't happen in --native mode, so if it's failing in native mode as well, must be something else.

@RegisGraptin

When I rebooted my PC, I had the same error as you:
Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
Finally, I changed the CUDA version to 8.0 because one of my colleagues said that CUDA 9.1 is not yet stable.
Hoping that this will help you.

tfboyd (Member) commented Apr 4, 2018

@rere-corporation The error you are getting is because we had an issue where we were enforcing that the minor cuDNN version TensorFlow is compiled with match the minor version installed locally. In this case 7.0.5 (what it was compiled with) does not match 7.1.2 (what is installed). After working with NVIDIA, they updated the documentation to indicate this should work fine, and we updated TensorFlow. I believe the change is in the nightly builds and will be in TF 1.8 forward.
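
For reference, the numbers in the error message are the cudnn.h encoding (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL), and the relaxed check described above amounts to: same major version, with a runtime minor version at least as new as the compile-time one. A small illustration (just arithmetic, not TensorFlow's actual code):

def cudnn_version(major, minor, patch):
    # cudnn.h: CUDNN_VERSION = CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL
    return major * 1000 + minor * 100 + patch

compiled = cudnn_version(7, 0, 5)  # 7005, reported as "compatibility version 7000"
loaded = cudnn_version(7, 1, 2)    # 7102, reported as "compatibility version 7100"

def compatible(compiled, loaded):
    # Relaxed rule: majors must match; the loaded minor may be >= the compiled minor.
    same_major = loaded // 1000 == compiled // 1000
    minor_ok = (loaded % 1000) // 100 >= (compiled % 1000) // 100
    return same_major and minor_ok

print(compiled, loaded, compatible(compiled, loaded))  # 7005 7102 True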

I did my testing on AWS p3 instances using Docker. If you want to try my pip package (which is custom built), it was built with 9.0 and cuDNN 7.0.5 and will complain if you have 7.1 installed. This is what I used for testing on March 18th; things have changed since, but it might be of interest, so I am sharing it.
https://s3-us-west-2.amazonaws.com/tf-benchmark/tf_binary/tensorflow-1.6.0rc1.6e20f3b_TRT_AVX-cp27-cp27mu-linux_x86_64.whl

tensorflowbutler removed the stat:awaiting response label Apr 6, 2018
hariag commented Apr 7, 2018

@Davidnet I see libnvinfer.so.4 in your log, but TensorFlow 1.7 requires TensorRT 3.0.4; see here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt

biery commented Apr 7, 2018

Had this same issue, and pip install tf-nightly-gpu is what solved it for me.

Was running tf 1.7, nvcc 9.0, and cudnn 7

karmel added the stat:awaiting response label Apr 8, 2018
karmel unassigned k-w-w Apr 8, 2018
neosinha commented Apr 9, 2018

Switching over to tf-nightly-gpu worked for me too.

@chasewinds

I suffer from the same problem with CUDA 9.1 and cuDNN 7.1; tf-nightly-gpu doesn't help me.
Can I uninstall CUDA and cuDNN and install a lower version without uninstalling TensorFlow?
My error information:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15037 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
INFO:tensorflow:Restoring parameters from log/inception/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Saving checkpoint to path log/inception/model.ckpt
2018-04-12 12:55:33.659970: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

@chasewinds

I solved my problem by reinstalling cuDNN 7.0 and reinstalling TensorFlow 1.7.

@fxblanco88

Had this same issue, and pip install tf-nightly-gpu is what solved it for me.

Was running tf 1.7, nvcc 9.0, and cudnn 7.1.2 over debian 8

abhisheksgumadi commented Apr 13, 2018

Yes, I spun up an Amazon V100 GPU instance with the Deep Learning AMI that comes with CUDA 9.0. It all runs fine if you install tf-nightly-gpu with Python 2.7 and run it without changing anything about cuDNN, CUDA, or the AMI. The tf-nightly-gpu got installed as version 1.8.

weizh888 commented Apr 13, 2018

A new error appeared for the test, after pip install tf-nightly-gpu:

virtualenv/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so: undefined symbol: _ZNK10tensorflow17StringPieceHasherclENS_11StringPieceE

Running on Ubuntu 16.04, Python 3.5, TensorFlow-GPU-1.7, CUDA 9.0.176, cudnn v7

When running ./configure, I set the TensorRT path to the directory extracted from the TensorRT-4.0.0.3 tarball (tar xzvf).

Can anyone help?

@Davidnet (Author)

I've demangled the symbol; it appears to be: tensorflow::StringPieceHasher::operator()(tensorflow::StringPiece). Are we using some hash in the test?
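
For anyone else hitting an undefined-symbol import error, one way to demangle the name is to pass it through c++filt (assumes binutils is installed); a quick sketch:

import subprocess

mangled = "_ZNK10tensorflow17StringPieceHasherclENS_11StringPieceE"
# Prints the human-readable C++ name of the missing symbol.
print(subprocess.check_output(["c++filt", mangled]).decode().strip())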

viksit commented Apr 14, 2018

Confirmed - on ubuntu 16.04/cuda9.0/cudnn7102 - a tf-nightly-gpu solved this problem for me.

tensorflowbutler removed the stat:awaiting response label Apr 14, 2018
@bloodraven66

tf-nightly-gpu solved it! Found it after 6 hours of reinstalling everything multiple times and researching. Thanks a lot!

@donuzium

tf-nightly-gpu solved this problem for me too.
CentOS 7.4.1708/cuda9.0/cudnn7102

csindic commented Apr 18, 2018

So I have a similar problem, in combination with other problems of similar obscurity. Nothing in this thread seems to work :( I am running on Windows 10 (unfortunately). I have tried all combinations of CUDA toolkit and cuDNN versions. I am using Anaconda and pip for most of my modules. Someone please help!

WARNING:tensorflow:From C:\Users\caleb\AppData\Local\Continuum\anaconda3\envs\gpu2\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
WARNING:tensorflow:From C:\Users\caleb\AppData\Local\Continuum\anaconda3\envs\gpu2\lib\site-packages\object_detection\trainer.py:228: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:root:Variable [FeatureExtractor/MobilenetV1/Conv2d_0/BatchNorm/beta/ExponentialMovingAverage] is not available in checkpoint


A LOT more of the same warning for different items (~100)


WARNING:root:Variable [FeatureExtractor/MobilenetV1/Conv2d_9_pointwise/weights/RMSProp_1] is not available in checkpoint
WARNING:tensorflow:From C:\Users\caleb\AppData\Local\Continuum\anaconda3\envs\gpu2\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:736: Supervisor.init (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
WARNING:tensorflow:From C:\Users\caleb\AppData\Local\Continuum\anaconda3\envs\gpu2\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:736: Supervisor.init (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-04-17 21:32:14.570705: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-04-17 21:32:14.978433: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: Quadro P5000 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:73:00.0
totalMemory: 16.00GiB freeMemory: 13.38GiB
2018-04-17 21:32:15.000099: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-17 21:33:14.097334: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-17 21:33:14.110286: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2018-04-17 21:33:14.115284: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2018-04-17 21:33:14.120510: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 12974 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:73:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from ssd_mobilenet_v1_coco_11_06_2017/model.ckpt
INFO:tensorflow:Restoring parameters from ssd_mobilenet_v1_coco_11_06_2017/model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
2018-04-17 21:36:12.158184: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:396] Loaded runtime CuDNN library: 7103 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-17 21:36:12.189298: F T:\src\github\tensorflow\tensorflow\core\kernels\conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)`


I should also note that this maxes out utilization of both of my processors and all of my (very abundant) RAM.

karmel (Contributor) commented Apr 18, 2018

@csindic This appears to be a different problem. I am going to close this issue, as the original problem seems to have been solved with fixing versioning across the libraries. @csindic , please resubmit and pay attention to the issue template (https://github.com/tensorflow/tensorflow/issues/new).

karmel closed this as completed Apr 18, 2018
@jerryhouuu

@weizh888 Hi, I had the same problem. Did you solve it, or can anyone help?

Traceback (most recent call last):
  File "tftrt_sample.py", line 24, in <module>
    import tensorflow.contrib.tensorrt as trt
  File "/home/jerry/.local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/__init__.py", line 25, in <module>
    from tensorflow.contrib.tensorrt.python import *
  File "/home/jerry/.local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/python/__init__.py", line 23, in <module>
    from tensorflow.contrib.tensorrt.python.trt_convert import calib_graph_to_infer_graph
  File "/home/jerry/.local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 23, in <module>
    from tensorflow.contrib.tensorrt.wrap_conversion import calib_convert
  File "/home/jerry/.local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/wrap_conversion.py", line 28, in <module>
    _wrap_conversion = swig_import_helper()
  File "/home/jerry/.local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/wrap_conversion.py", line 24, in swig_import_helper
    _mod = imp.load_module('_wrap_conversion', fp, pathname, description)
ImportError: /home/jerry/.local/lib/python2.7/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so: undefined symbol: _ZNK10tensorflow11StringPiece8containsES0_

@MacwinWin

@Davidnet I ran into the same problem with TensorRT. Have you solved it?

2018-05-08 16:11:39.979209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-08 16:11:39.979516: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-05-08 16:11:40.197749: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 100 max workspace size= 33554432
2018-05-08 16:11:40.197774: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
munmap_chunk(): invalid pointer
Aborted (core dumped)

System: Ubuntu 18.04
TensorRT version: 4.0.0.3
CUDA version: 9.0
cuDNN version: 7.1.3
TensorFlow version: 1.8.0
GPU: GTX 1080

karmel (Contributor) commented May 8, 2018

@MacwinWin , @jerryhouuu These appear to be different issues; please open new issues to keep the responses clear and distinct.

@weizh888

@jerryhouuu It was solved.
Install the TensorFlow GPU version, either from pip or source.

@bangxiangyong

I had this same problem, and tf-nightly-gpu solved it for me! 💯

@Hao-Zhao

@weizh888
Hello, did you solve the 'undefined symbol...' problem only by reinstalling tensorflow-gpu, or did you use tf-nightly-gpu as well?

estathop commented Oct 4, 2018

2018-10-04 15:06:23.577089: E tensorflow/stream_executor/cuda/cuda_dnn.cc:343] Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.2.1.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Segmentation fault

estathop commented Oct 4, 2018

pip install tf-nightly-gpu
This fixed the error in my virtual environment created by Anaconda.

zgbkdlm commented Oct 9, 2018

Downgrading cuDNN 7.1.3 to 7.1.2 solved this problem for me.

saskra commented Oct 17, 2018

2018-10-04 15:06:23.577089: E tensorflow/stream_executor/cuda/cuda_dnn.cc:343] Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.2.1.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Segmentation fault

pip install tf-nightly-gpu
this fixed the error in my virtual environment created by anaconda

I have the same problem but could not solve it with your solution. Strange thing: I have installed cuDNN 7.3.1, yet I still get the same version numbers as you did in the error message.

@Hongyun1993

Cool, tf-nightly-gpu solved the problem!

TulipDi commented Dec 20, 2018

@saskra I have the same problem. Did you solve it?

saskra commented Dec 20, 2018

@saskra I have the same problem. Did you solve it?

Yes, with this tutorial: https://github.com/pplcc/ubuntu-tensorflow-pytorch-setup

@XunOuyang

I have the same problem. It got resolved by converting cuDNN from 7.1.3-cuda8.0_0 to 7.0.5-cuda8.0_0 (bash: conda install cudnn=7.0).

@Le0000000000n

I have the same error message here, but I am using cuDNN 7005 instead of 7500. I don't even have cuDNN 7500 installed in my Docker container. May I ask what I should do?

2019-06-18 08:17:41.727989: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7501 (compatibility version 7500) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-06-18 08:17:41.731328: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)
