
Graph optimized using tf.contrib.tensorrt is not loadable with TF_GraphImportGraphDef #23853

Closed
yegord opened this issue Nov 19, 2018 · 19 comments

@yegord
Contributor

yegord commented Nov 19, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes.

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04

  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:

  • TensorFlow installed from (source or binary): Source.

  • TensorFlow version (use command below): v1.12

  • Python version: 2.7.12

  • Bazel version (if compiling from source): 0.19.0

  • GCC/Compiler version (if compiling from source): 5.4.0

  • CUDA/cuDNN version: 9.0/7.0.5

  • GPU model and memory: 1080 Ti

Describe the current behavior

I optimize a TensorFlow graph with:

    precision_mode = 'FP32'  # "FP32","FP16" or "INT8"
    graph_def = trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=output_node_names,
        max_batch_size=num_cameras,
        max_workspace_size_bytes=4*10**9,
        precision_mode=precision_mode,
        minimum_segment_size=10,  # minimum number of nodes in an engine,
    )

I then save the resulting graph and try to load it in a C++ program using the C API.

First, I call

TF_LoadLibrary("/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tensorrt/python/ops/_trt_engine_op.so", status)

and call TF_GraphImportGraphDef with the optimized graph.
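For reference, here is a minimal sketch of this load-then-import sequence (not my exact application code; the serialized GraphDef is assumed to be available as graph_def_data/graph_def_len):

    #include <tensorflow/c/c_api.h>
    #include <cstddef>
    #include <cstdio>

    // Load the TRT op library, then import the TRT-optimized GraphDef.
    void ImportOptimizedGraph(const void* graph_def_data, size_t graph_def_len) {
      TF_Status* status = TF_NewStatus();

      // Registers the TRTEngineOp op with the runtime.
      TF_Library* lib = TF_LoadLibrary(
          "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tensorrt/"
          "python/ops/_trt_engine_op.so",
          status);
      if (TF_GetCode(status) != TF_OK) {
        std::fprintf(stderr, "TF_LoadLibrary: %s\n", TF_Message(status));
      }

      // This is the call that fails with the shape-inference error below.
      TF_Graph* graph = TF_NewGraph();
      TF_Buffer* graph_def = TF_NewBufferFromString(graph_def_data, graph_def_len);
      TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
      TF_GraphImportGraphDef(graph, graph_def, opts, status);
      if (TF_GetCode(status) != TF_OK) {
        std::fprintf(stderr, "TF_GraphImportGraphDef: %s\n", TF_Message(status));
      }

      TF_DeleteImportGraphDefOptions(opts);
      TF_DeleteBuffer(graph_def);
      TF_DeleteGraph(graph);
      if (lib != nullptr) TF_DeleteLibraryHandle(lib);
      TF_DeleteStatus(status);
    }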

I get the following error:

TF_GraphImportGraphDef: No shape inference function exists for op 'TRTEngineOp', did you forget to define it?

Describe the expected behavior

The call to TF_GraphImportGraphDef should succeed.

Code to reproduce the issue

It seems that the issue, although not filed in this bug tracker, is already known to the authors: https://github.com/tensorflow/tensorflow/blob/v1.12.0/tensorflow/contrib/tensorrt/ops/trt_engine_op.cc#L46
However, I can provide a minimal example to reproduce the problem on demand.

Other info / logs

It is a pain that a TRT-optimized graph currently cannot be used outside of Python.
I would be happy to know about a workaround, in case one exists.

@samikama
Contributor

Hello @yegord,

Could you please link your application with trt_conversion.so and trt_engine_op_op_lib?

@sujitbiswas

@samikama

This is with respect to #23243.

Can you please tell me where the library “trt_conversion.so” is located? There is a library named _wrap_conversion.so:
 

TensorFlow.loadLibrary("/home/sujitb/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/python/ops/_trt_engine_op.so")
TensorFlow.loadLibrary("/home/sujitb/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so")
 
Exception in thread "main" java.lang.UnsatisfiedLinkError: /home/sujitb/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so: undefined symbol: _Py_NoneStruct
                at org.tensorflow.TensorFlow.loadLibrary(TensorFlow.java:47)
                at com.nvidia.tf.InspectModel2$.main(InspectModel2.scala:22)
                at com.nvidia.tf.InspectModel2.main(InspectModel2.scala)

@asimshankar
Contributor

So there seem to be two issues here:

  1. There is no shape inference function registered. (The Python API uses a backdoor to tolerate that, but ideally we want all operations to have a shape inference function, even if that function just reports "unknown shape", and we want to get rid of that backdoor; a minimal sketch of such a registration follows after this list.) I'll try a fix for that.

  2. For reasons I'm not quite clear on (@aaroey @samikama may know), the TRTEngine operation's kernel is included in a Python-specific target (//tensorflow/contrib/tensorrt:wrap_conversion) instead of in the shared library for the op. I suspect/hope this can be changed to make the kernel independent of Python. Will look into it.
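For illustration, here is roughly what registering a shape inference function looks like (a generic REGISTER_OP sketch with a hypothetical op name, not the actual TRTEngineOp registration); even a shape function that only reports unknown shapes is enough for the importer to accept the op:

    #include "tensorflow/core/framework/common_shape_fns.h"
    #include "tensorflow/core/framework/op.h"

    // Hypothetical op used only to show the pattern; the actual fix enables the
    // commented-out shape function on TRTEngineOp in trt_engine_op.cc.
    REGISTER_OP("MyEngineOp")
        .Input("in_tensor: float")
        .Output("out_tensor: float")
        // Declares every output shape as unknown, which still satisfies
        // TF_GraphImportGraphDef's requirement that a shape function exists.
        .SetShapeFn(tensorflow::shape_inference::UnknownShape);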

@yegord
Contributor Author

yegord commented Nov 27, 2018

@asimshankar Excellent summary, thanks!

@samikama So, I cherry-picked the patch enabling the shape function (4fbbeea).
I also applied the following changes:

diff --git a/tensorflow/BUILD b/tensorflow/BUILD
index 9b62a50..254ad51 100644
--- a/tensorflow/BUILD
+++ b/tensorflow/BUILD
@@ -443,6 +443,9 @@ tf_cc_shared_object(
         "//tensorflow/c:version_script.lds",
         "//tensorflow/c/eager:c_api",
         "//tensorflow/core:tensorflow",
+        "//tensorflow/contrib/tensorrt:trt_conversion",
+        "//tensorflow/contrib/tensorrt:trt_engine_op_op_lib",
+        "//tensorflow/contrib/tensorrt:trt_engine_op_kernel",
     ],
 )

(Somehow without trt_engine_op_kernel the kernel was not successfully registered.)

As a result, I get the expected performance boost of around 10% over the vanilla TensorFlow graph.
However, my C++ application starts crashing after running for a few dozen seconds, with the following message:

2018-11-27 19:19:10.135505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-27 19:19:11.164065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 19:19:11.164123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2018-11-27 19:19:11.164136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2018-11-27 19:19:11.164648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9426 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-11-27 19:19:52.094600: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2018-11-27 19:19:52.094660: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
ssd_detection2:terminate_handler.cpp:25: terminate_handler(): abort
0. /usr/lib/libassert.so(+0x32d0) [0x7fcafecdb2d0]
1. /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fcad54544b0]
2. /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7fcad5454428]
3. /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7fcad545602a]
4. /usr/lib/libtensorflow_framework.so(+0x6eeab7) [0x7fcaca33bab7]
5. /usr/lib/libtensorflow_framework.so(_ZN10tensorflow8EventMgr10PollEventsEbPN4absl13InlinedVectorINS0_5InUseELm4ESaIS3_EEE+0xf3) [0x7fcaca306633]
6. /usr/lib/libtensorflow_framework.so(_ZN10tensorflow8EventMgr8PollLoopEv+0xce) [0x7fcaca306dee]
7. /usr/lib/libtensorflow_framework.so(_ZN5Eigen26NonBlockingThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x241) [0x7fcaca30c441]
8. /usr/lib/libtensorflow_framework.so(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x37) [0x7fcaca30a007]
9. /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fcad5dc0c80]
10. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fcae428c6ba]
11. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fcad552641d]

The application runs TF_SessionRun on the same session from two threads in parallel.
If I disable one of the threads, the crash goes away.
So it is either a plain OOM (caught too late) or some data race.
Does this ring a bell for anybody?
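For context, a minimal sketch of the calling pattern (the session/graph setup and the MakeInputTensor() helper are assumed to exist elsewhere; this is not the actual application code):

    #include <tensorflow/c/c_api.h>
    #include <thread>

    TF_Tensor* MakeInputTensor();  // hypothetical helper building one input tensor

    // One inference loop; both threads below share the same TF_Session.
    void RunLoop(TF_Session* session, TF_Output input_op, TF_Output output_op) {
      for (int i = 0; i < 1000; ++i) {
        TF_Status* status = TF_NewStatus();
        TF_Tensor* input = MakeInputTensor();
        TF_Tensor* output = nullptr;
        TF_SessionRun(session, /*run_options=*/nullptr,
                      &input_op, &input, 1,
                      &output_op, &output, 1,
                      /*target_opers=*/nullptr, 0,
                      /*run_metadata=*/nullptr, status);
        // Error handling omitted for brevity.
        TF_DeleteTensor(input);
        if (output != nullptr) TF_DeleteTensor(output);
        TF_DeleteStatus(status);
      }
    }

    // Two parallel callers on one session; disabling one of them avoids the crash.
    void RunInParallel(TF_Session* session, TF_Output in, TF_Output out) {
      std::thread t1(RunLoop, session, in, out);
      std::thread t2(RunLoop, session, in, out);
      t1.join();
      t2.join();
    }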

@aaroey aaroey self-assigned this Nov 27, 2018
@aaroey
Member

aaroey commented Nov 27, 2018

@pooyadavoodi may have an idea for the crash problem.
Also, @yegord do you have a repro for that? Thanks.

@yegord
Contributor Author

yegord commented Nov 27, 2018

If you have a hypothesis, shoot; I will check it.
If the cause is not that clear, I will make a minimal example, but it will take another day or so.

@samikama
Contributor

@yegord I thought TF_SessionRun() was not thread-safe.

@pooyadavoodi

Could you reduce max_workspace_size_bytes and also use allow_growth in the session config, and see if the problem persists?

@asimshankar
Contributor

@samikama : TF_SessionRun is thread-safe (and op kernels are supposed to be too).

@pooyadavoodi


@yegord Could you provide a repro? We need to look into the kernel registration issue that you mentioned above.

@yegord
Contributor Author

yegord commented Nov 28, 2018

So it is either a plain OOM (caught too late) or some data race.

It does not look like an OOM, because reducing the input image size substantially (6-fold or so) does not fix it.

use allow_growth in the session config

Already there.

reduce max_workspace_size_bytes

Reducing from 4*10**9 to 2*10**9 does not make the crash go away.

I'll be back with a repro then.

@yegord
Contributor Author

yegord commented Nov 30, 2018

Please find the repro here: https://github.com/yegord/tf-trt-linking-and-data-race-example
make && ./main should reproduce the crash.

The error message that I personally observe is here: https://github.com/yegord/tf-trt-linking-and-data-race-example/blob/master/crash.txt

The TensorFlow version being used (v1.12 with two patches: uncommenting the shape function and linking the TensorRT operation into libtensorflow.so): https://github.com/yegord/tensorflow/tree/issue-23853

TensorFlow is installed with:

    bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package //tensorflow:libtensorflow.so &&
    bazel-bin/tensorflow/tools/pip_package/build_pip_package .. &&
    sudo pip uninstall -y tensorflow; sudo pip install ../tensorflow*.whl &&
    sudo cp bazel-bin/tensorflow/*.so /usr/lib &&
    sudo mkdir -p /usr/lib/tensorflow/c &&
    sudo cp tensorflow/c/c_api.h /usr/include/tensorflow/c

The repro demonstrates two points. First, there should be a way for an external user to link against something from TensorFlow (without patching TensorFlow like I did) and be able to start using the TensorRT operation. Second, parallel calls to TensorRT operations should not crash the process.

@yegord
Contributor Author

yegord commented Nov 30, 2018

As for the crash, it might happen because you seem to call enqueue on the same nvinfer1::IExecutionContext instance in parallel:

auto ret = trt_execution_context_ptr->enqueue(num_batch, &buffers[0], *stream,

And, as my coworker noticed, this is not thread-safe: https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#thread-safety

@samikama
Contributor

samikama commented Nov 30, 2018

@yegord,
That is correct. Adding a mutex around the call should solve it, since enqueue is pretty lightweight. There were plans to move away from a class-member execution context, but things got reprioritized. I will take a look at your example and get back to you.
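For illustration, a minimal sketch of that suggestion (the class and member names here are illustrative, not the actual TF-TRT code): serialize enqueue calls on the shared execution context with a mutex.

    #include <mutex>
    #include <NvInfer.h>
    #include <cuda_runtime_api.h>

    // Wraps a shared IExecutionContext so concurrent op invocations serialize
    // their enqueue() calls; the GPU work itself stays asynchronous on the stream.
    class GuardedExecutionContext {
     public:
      explicit GuardedExecutionContext(nvinfer1::IExecutionContext* ctx) : ctx_(ctx) {}

      bool Enqueue(int num_batch, void** bindings, cudaStream_t stream) {
        // enqueue() on a single IExecutionContext is not thread-safe, so hold
        // the lock only for the (cheap) enqueue call.
        std::lock_guard<std::mutex> lock(mu_);
        return ctx_->enqueue(num_batch, bindings, stream, /*inputConsumed=*/nullptr);
      }

     private:
      std::mutex mu_;
      nvinfer1::IExecutionContext* ctx_;  // owned elsewhere
    };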

@aaroey
Member

aaroey commented Jan 11, 2019

I can reproduce the error, and adding a mutex did solve the problem. I'll make a fix soon.

tensorflow-copybara pushed a commit that referenced this issue Jan 11, 2019
… of their

operations are not thread safe.

This fixed one of the issues mentioned in
#23853

PiperOrigin-RevId: 228947504
aaroey added a commit to aaroey/tensorflow that referenced this issue Jan 11, 2019
… of their

operations are not thread safe.

This fixed one of the issues mentioned in
tensorflow#23853

PiperOrigin-RevId: 228947504
tensorflow-copybara pushed a commit that referenced this issue Jan 17, 2019
1. the new calibration design. The current int8 calibration workflow depends on
   a global resource manager singleton TRTResourceManager (in
   resources/trt_resource_manager.h). This has been:
   - violating the resource manager design: the resource manager should be
     per-device
   - polluting the BUILD dependencies, which makes the kernel implementation
     unusable from other language bindings (Issue #23853)
2. the custom backend offline mode, where we'll do the conversion during
   execution and provide an offline tool to get the serialized engine

PiperOrigin-RevId: 229654702
@yegord
Contributor Author

yegord commented Jan 22, 2019

Apparently, there is also a data race leading to a crash during the parallel creation of multiple dynamic int8 engines. Could you have a look?

The repro is here: https://github.com/yegord/tf-trt-linking-and-data-race-example/tree/crash-with-int8-engines

The error message that I see: https://github.com/yegord/tf-trt-linking-and-data-race-example/blob/crash-with-int8-engines/crash.txt

The TensorFlow version used is 1.12 with a few patches from you: https://github.com/yegord/tensorflow/tree/issue-23853-2

Build instructions are as above: #23853 (comment)

(I hope you do not mind that I am piling remotely related issues into a single ticket.)

Thanks!

@aaroey
Member

aaroey commented Jan 30, 2019

Thanks for the repro @yegord, I'll try it and get back to you.

tensorflow-copybara pushed a commit that referenced this issue Mar 13, 2019
…he trt

grappler optimizer, op kernels, and ops. This library will be included in pip
build, so users can use TF-TRT without building TF from source in C++.

This solves an issue mentioned in #23853 (TF-TRT not loadable with
TF_GraphImportGraphDef).

PiperOrigin-RevId: 238140294
@aaroey
Member

aaroey commented Mar 13, 2019

The original problem in this issue is fixed: if we install the latest nightly pip package, we should see the TF-TRT shared library in site-packages/tensorflow/compiler/tf2tensorrt/python/ops/libtftrt.so.

@yegord, to solve your linking problem, I have an example in https://github.com/aaroey/tensorflow/blob/issue_repros/test/fixed-issue23853/Makefile#L14.
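For illustration, loading that pip-installed library from the C API follows the same TF_LoadLibrary pattern as before (the path prefix below is only an example; adjust it to your site-packages location):

    #include <tensorflow/c/c_api.h>

    // Registers the TF-TRT ops and kernels from the pip-installed shared library,
    // so no custom TensorFlow build is needed. The prefix is an example path.
    TF_Library* LoadTfTrt(TF_Status* status) {
      return TF_LoadLibrary(
          "/usr/local/lib/python3.6/dist-packages/tensorflow/compiler/"
          "tf2tensorrt/python/ops/libtftrt.so",
          status);
    }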

@aaroey
Member

aaroey commented Mar 13, 2019

For the INT8 calibration problem, I believe it's fixed at HEAD. Here is a (fixed) repro for it: https://github.com/aaroey/tensorflow/blob/issue_repros/test/fixed-issue23853-int8/Makefile

I'm closing this; feel free to let me know if there are any questions. Thanks.

@aaroey aaroey closed this as completed Mar 13, 2019