
Inference slowdown of 3x with TF-TensorRT integration with C API #22425

Closed
dhingratul opened this issue Sep 20, 2018 · 3 comments

dhingratul commented Sep 20, 2018



System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: r1.11
  • Python version: 2.7
  • Bazel version (if compiling from source): 0.16.1
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 9 / 7.1
  • GPU model and memory: GTX 1080 Ti / 11 GB
  • Exact command to reproduce:
  1. Clone the tensorflow repo and check out the r1.11 branch.
  2. Build from source as directed in the documentation, disabling everything except CUDA and TensorRT (4.0.1.6).
  3. Create a TensorRT-optimized .pb file using the following (the original frozen graph is assumed to already be loaded into the default graph, and output_node is the list of output node names):
    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt  # TF-TRT integration module in r1.11

    trt_graph = trt.create_inference_graph(
        input_graph_def=tf.get_default_graph().as_graph_def(),
        outputs=output_node,
        max_batch_size=1,
        max_workspace_size_bytes=1 << 25,
        precision_mode="FP32",  # TRT engine precision: "FP32", "FP16" or "INT8"
        minimum_segment_size=2)  # minimum number of nodes per TRT engine
    with open("trt.pb", 'wb') as f:  # the serialized graph is binary, so write in 'wb' mode
        f.write(trt_graph.SerializeToString())

  4. Use the TensorFlow C API to run inference on the protobuf file.

Issue: I see a 3x inference slowdown for the model when run through the C API, but the slowdown is not reproducible with the Python API on the same model.
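
For comparison, a minimal Python API timing check on the same trt.pb might look like the sketch below; the tensor names "input:0" / "output_node:0", the input shape, and the iteration count are placeholders, not the actual values from the model:

    import time
    import numpy as np
    import tensorflow as tf

    # Load the TensorRT-optimized graph written by create_inference_graph
    graph_def = tf.GraphDef()
    with open("trt.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")

    with tf.Session(graph=graph) as sess:
        # Placeholder input/output tensor names and shape
        feed = {"input:0": np.zeros([1, 224, 224, 3], dtype=np.float32)}
        sess.run("output_node:0", feed_dict=feed)  # warm-up run before timing
        start = time.time()
        for _ in range(100):
            sess.run("output_node:0", feed_dict=feed)
        print("avg inference time: %.4f s" % ((time.time() - start) / 100))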



Source code / logs


(Attached screenshots: trt_1, trt_2)

tensorflowbutler (Member) commented

Nagging Assignee @aaroey: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

aaroey (Member) commented Oct 7, 2018

Hi @dhingratul, is it possible for you to provide the .pb file from which you observe this slowdown? Also, could you provide the C API code that you used to load and run the file?

Thanks.

dhingratul (Author) commented

@aaroey The bug is resolved. The Python API and C API are now consistent in speed after increasing the max_batch_size parameter, based on the log information.
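
For anyone hitting the same symptom: the fix amounts to regenerating the TRT graph with a max_batch_size that covers the batch size actually used at inference time. A minimal sketch, assuming the original frozen graph is loaded into the default graph and that 32 stands in for the real inference batch size:

    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt

    output_node = ["output_node"]  # placeholder: the model's output node names

    # max_batch_size must be >= the batch size used at inference time; if the
    # runtime batch is larger, the pre-built TRT engine cannot serve it and
    # execution can fall back to a slower path.
    trt_graph = trt.create_inference_graph(
        input_graph_def=tf.get_default_graph().as_graph_def(),
        outputs=output_node,
        max_batch_size=32,  # example value -- match the real inference batch size
        max_workspace_size_bytes=1 << 25,
        precision_mode="FP32",
        minimum_segment_size=2)

    with open("trt.pb", "wb") as f:
        f.write(trt_graph.SerializeToString())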
