
TensorRT INT8 calibration doesn't work with TF r1.12 and TRT 5RC #22854

Closed
dhingratul opened this issue Oct 9, 2018 · 20 comments

@dhingratul

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): n/a
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): r1.12
  • Python version: 3.5
  • Bazel version (if compiling from source): 0.16.1
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 9 / 7.1
  • GPU model and memory: 1080ti
  • Exact command to reproduce:

Followed the workflow from https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/
The error occurs at the following call:
trt_graph = trt.calib_graph_to_infer_graph(calibGraph)
Log:
File "/home/dhingratul/.virtualenvs/tf_trt_source_trt5rc_tf1_12/local/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 349, in calib_graph_to_infer_graph
for n in calibration_graph_def.node:
AttributeError: 'Graph' object has no attribute 'node'


@dhingratul dhingratul changed the title TensorRT issue with TF r1.12 and TRT 5RC TensorRT INT8 calibration doesn't work with TF r1.12 and TRT 5RC Oct 9, 2018
@samikama
Contributor

@dhingratul, are you passing a Graph object or a GraphDef object? The method expects a GraphDef.

@dhingratul
Author

dhingratul commented Oct 12, 2018

@samikama This is everything I have tried:

  1. Create an INT8 GraphDef with trt.create_inference_graph and pass it to trt.calib_graph_to_infer_graph(trt_graph)
    --> FailedPrecondition error -- need to run the graph with calibration data
  2. Create the INT8 GraphDef as above, import it as a Graph, run it with calibration data, then call trt.calib_graph_to_infer_graph(trt_graph)
    --> 'Graph' object has no attribute 'node'
  3. Create the INT8 GraphDef as above, import it as a Graph, run it with calibration data, write it out as a .pb file with tf.train.write_graph(), re-import the GraphDef and pass it to trt.calib_graph_to_infer_graph(trt_graph)
    --> FailedPrecondition error -- need to run the graph with calibration data

To run the graph on the calibration data I need to import it as a Graph, and I don't know of a right way to get back to a GraphDef other than approach 3.
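
As an aside, a tf.Graph can be turned back into a GraphDef without round-tripping through a .pb file. The snippet below is a minimal TF 1.x sketch (the tensor names are illustrative, not from this model):

import tensorflow as tf

# Build (or import) a graph as usual.
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4], name="ip")
    y = tf.identity(x, name="op")

# tf.Graph.as_graph_def() returns the GraphDef protobuf for that graph,
# without writing a .pb file and re-importing it.
graph_def = graph.as_graph_def()
print(type(graph_def))  # tensorflow.core.framework.graph_pb2.GraphDef

Note, though, that for TF-TRT calibration the GraphDef to hand to calib_graph_to_infer_graph() is the one returned by create_inference_graph(), as the maintainer points out below.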

@samikama
Contributor

@dhingratul, are you running calib_graph_to_infer_graph in the same process, or do you exit the process between calibration and baking in the calibration data? Also, you need to pass the GraphDef returned to you by trt.create_inference_graph() in the first step to calib_graph_to_infer_graph() in the third step, not the graph from tf.train.write_graph(). You need to import it as a Graph to be able to run it.

  1. Create the inference graph.
  2. Run it with calibration data.
  3. Pass the graph_def returned in step 1 to calib_graph_to_infer_graph(). You can discard the graph you ran in step 2; it is only used for collecting calibration data (a minimal sketch follows below).
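
A minimal sketch of that workflow under TF 1.x with tensorflow.contrib.tensorrt (the .pb path, tensor names, shapes, and stand-in calibration data are placeholders, not taken from this issue):

import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load a frozen GraphDef (path and tensor names below are placeholders).
frozen_graph_def = tf.GraphDef()
with tf.gfile.GFile("model.pb", "rb") as f:
    frozen_graph_def.ParseFromString(f.read())

# Step 1: create the INT8 calibration GraphDef.
calib_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=["op1"],
    max_batch_size=8,
    max_workspace_size_bytes=1 << 25,
    precision_mode="INT8")

# Step 2: import that GraphDef as a Graph and run it on calibration batches.
with tf.Graph().as_default() as g:
    tf.import_graph_def(calib_graph_def, name="")
    inp = g.get_tensor_by_name("ip1:0")
    out = g.get_tensor_by_name("op1:0")
    with tf.Session(graph=g) as sess:
        for _ in range(10):
            batch = np.random.rand(8, 224, 224, 3).astype(np.float32)  # stand-in data
            sess.run(out, feed_dict={inp: batch})

# Step 3: in the same process, bake the calibration data into the final graph.
# Pass the GraphDef from step 1, not the Graph used in step 2.
int8_graph_def = trt.calib_graph_to_infer_graph(calib_graph_def)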

@dhingratul
Author

dhingratul commented Oct 12, 2018

@samikama I get this workflow, but in order to run calibration on the GraphDef generated in step 1, I need to import it as a Graph and then use sess.run to complete calibration. Now I have a Graph, not a GraphDef, so the only way I see to pass a GraphDef to calib_graph_to_infer_graph() is to export it as a .pb and re-import the GraphDef. I do not know how to convert a Graph to a GraphDef to go from step 2 to step 3.

@samikama
Contributor

@dhingratul, you are trying to pass the graph from step 2 into step 3. Please pass the GraphDef object that you created in step 1 and imported into the graph, not the serialized GraphDef of the graph from the second step.

@dhingratul
Author

dhingratul commented Oct 12, 2018

@samikama Based on your recommendation, this is the workflow I have, but I am facing a weird cuBLAS issue now:
2018-10-12 13:15:52.338802: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_INTERNAL_ERROR

fid = "model.pb"
output_nodenames = 'op1,op2,op3'
output_node = list(output_nodenames.split(","))
g = load_graph(fid)
with tf.Session(graph=g) as sess:
	# Step 1 -- Create Inference graph
    trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=20000,
    max_workspace_size_bytes=1 << 25,
    precision_mode="INT8",  # TRT Engine precision "FP32","FP16" or "INT8"
    minimum_segment_size=2  # minimum number of nodes in an engine
    )
    # Step 2 -- Calibration
    tf.import_graph_def(
    trt_graph,
    name='', #DEBUG
	)
    num_samples = 10
    np.random.seed(0)
    ip1_data = np.random.rand(num_samples,700,800,6).astype(np.float32)
    ip1 = tf.get_default_graph().get_tensor_by_name("ip1:0")

    ip2_data = np.random.rand(4).astype(np.float32)
    ip2 = tf.get_default_graph().get_tensor_by_name("ip2:0")

    ip3_data = np.random.rand(20000,6).astype(np.float32)
    ip3 = tf.get_default_graph().get_tensor_by_name("ip3:0")

    ip4_data = np.random.rand(20000,4).astype(np.float32)
    ip4 = tf.get_default_graph().get_tensor_by_name("ip4:0")
    out1 = tf.get_default_graph().get_tensor_by_name("op1:0")
    out2 = tf.get_default_graph().get_tensor_by_name("op2:0")
    out3 = tf.get_default_graph().get_tensor_by_name("op3:0")

    for i in range(num_samples):
        start_time = timeit.default_timer()
        _ = sess.run([out1, out2, out3], feed_dict={ip1:ip1_data[i], ip2:ip2_data, ip3:ip3_data, ip4:ip4_data})
    # Step 3 -- pass through calib_graph_to_infer_graph
    trt_graph = trt.calib_graph_to_infer_graph(trt_graph)
    with tf.gfile.GFile("trt.pb", "wb") as f:
        f.write(trt_graph.SerializeToString())

@wt-huang wt-huang added the comp:lite TF Lite related issues label Oct 19, 2018
@wt-huang

@dhingratul You may want to check whether there are other processes running on the GPU.
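
As a hedged aside (not confirmed as the cause in this thread): CUBLAS_STATUS_INTERNAL_ERROR from cublasSgemm is often a symptom of the GPU being out of memory when cuBLAS initializes, which TensorFlow's default grab-all-memory allocation can trigger. A minimal TF 1.x sketch for limiting the allocation:

import tensorflow as tf

# Limit TensorFlow's GPU memory allocation to rule out memory contention
# as the cause of the cuBLAS initialization failure.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap the fraction

with tf.Session(config=config) as sess:
    pass  # run the calibration loop here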

@dhingratul
Author

@wt-huang No other processes are running on the GPU.

@wt-huang wt-huang removed the comp:lite TF Lite related issues label Nov 8, 2018
@wt-huang

wt-huang commented Nov 9, 2018

@dhingratul Could you try installing TensorFlow from a binary instead? Also use cuDNN 7.3 and Python 3.6, and make sure that the cuBLAS library is correctly installed. You can also provide your environment by running the script in the issue template.

@dhingratul
Author

@wt-huang I have the TensorRT tarball, not the .deb, so I don't know how to install TF with pip and point it at my TRT installation. Can you expand on this?

@wt-huang wt-huang added the comp:model Model related issues label Nov 22, 2018
@benjamintanweihao
Contributor

@dhingratul One thing I found very useful is to first check whether a TRTEngineOp has been created:

    trt_engine_opts = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
    print('TRT Engine Op: {}'.format(trt_engine_opts))
    assert trt_engine_opts > 0, 'No TRT Engine Ops!'

@dhingratul
Author

@benjamintanweihao Had that been the case, I would have gotten an error like this: #21850 (comment)

@gsrujana89

Hi, any resolution to this? I am getting the same issue with the NVIDIA Docker image 19.01-py2.

@ymodak ymodak self-assigned this May 13, 2019
@ymodak ymodak added comp:gpu GPU related issues stat:awaiting response Status - Awaiting response from author and removed comp:model Model related issues labels May 13, 2019
@ymodak
Contributor

ymodak commented May 13, 2019

@dhingratul Is this still an issue?
@gsrujana89 Can you please post a new issue and provide all the information asked by the template? Thanks!

@dhingratul
Author

I will have to reproduce the issue with TRT 5 GA; it was still an issue with TRT 5 RC.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label May 14, 2019
@zhixuanli

Still having the CUBLAS_STATUS_INTERNAL_ERROR problem with cublasSgemm.

@sanjoy sanjoy added the comp:gpu:tensorrt Issues specific to TensorRT label Dec 26, 2019
@doublexxking

I have the same problem with TF 1.15 and TensorRT 6.
The calibration step consumes all of the GPU memory and errors as follows:
E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger ../rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)

@ymodak ymodak removed their assignment Jun 2, 2020
@mohantym mohantym added the TF 1.12 Issues related to TF 1.12 label Nov 25, 2021
@mohantym mohantym self-assigned this Nov 25, 2021
@mohantym
Contributor

Hi @dhingratul!
It seems you are using an older (1.x) version of TensorFlow. Have you checked these threads for the latest versions (TF 2.6/2.7) yet? link1, link2. Thanks!
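
For anyone landing here on TF 2.x: the TF-TRT entry point there is trt.TrtGraphConverterV2. The sketch below is a rough outline only; the SavedModel paths, input shape, and calibration_input_fn are placeholders, and the exact conversion parameters can differ between 2.x minor versions:

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Request INT8 precision for the TRT engines.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.INT8)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",  # placeholder path
    conversion_params=params)

def calibration_input_fn():
    # Yield representative batches for INT8 calibration (placeholder data).
    for _ in range(10):
        yield (np.random.rand(1, 224, 224, 3).astype(np.float32),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("trt_saved_model_dir")  # placeholder output path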

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label Nov 25, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Dec 2, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
