
TensorRT INT8 calibration doesn't work with TF r1.12 and TRT 5RC #22854

Closed
dhingratul opened this issue Oct 9, 2018 · 20 comments

@dhingratul

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): n/a
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): r1.12
  • Python version: 3.5
  • Bazel version (if compiling from source): 0.16.1
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 9 / 7.1
  • GPU model and memory: 1080ti
  • Exact command to reproduce:

Followed the workflow from https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/
The error occurs at the following call:
trt_graph = trt.calib_graph_to_infer_graph(calibGraph)
Log:
File "/home/dhingratul/.virtualenvs/tf_trt_source_trt5rc_tf1_12/local/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 349, in calib_graph_to_infer_graph
for n in calibration_graph_def.node:
AttributeError: 'Graph' object has no attribute 'node'


@dhingratul dhingratul changed the title TensorRT issue with TF r1.12 and TRT 5RC TensorRT INT8 calibration doesn't work with TF r1.12 and TRT 5RC Oct 9, 2018
@samikama
Contributor

@dhingratul, are you passing a Graph object or a GraphDef object? The method expects a GraphDef.

@dhingratul
Author

dhingratul commented Oct 12, 2018

@samikama This is everything I have tried:

  1. Create an INT8 GraphDef with trt.create_inference_graph and pass it to trt.calib_graph_to_infer_graph(trt_graph)
    --> FailedPrecondition error -- need to run the graph with calibration data
  2. Create the INT8 GraphDef as above, import it as a Graph, run it with calibration data, then call trt.calib_graph_to_infer_graph(trt_graph)
    --> 'Graph' object has no attribute 'node'
  3. Create the INT8 GraphDef as above, import it as a Graph, run it with calibration data, write it out as a .pb file with tf.train.write_graph(), re-import the GraphDef and pass it to trt.calib_graph_to_infer_graph(trt_graph)
    --> FailedPrecondition error -- need to run the graph with calibration data

To run the graph on the calibration data I need to import it as a Graph, and I don't know of a right way to get back to a GraphDef other than approach 3.
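
As an aside, a tf.Graph can be turned back into a GraphDef without round-tripping through a .pb file. The snippet below is a minimal TF 1.x sketch (the tensor names are illustrative, not from this model):

import tensorflow as tf

# Build (or import) a graph as usual.
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4], name="ip")
    y = tf.identity(x, name="op")

# tf.Graph.as_graph_def() returns the GraphDef protobuf for that graph,
# without writing a .pb file and re-importing it.
graph_def = graph.as_graph_def()
print(type(graph_def))  # tensorflow.core.framework.graph_pb2.GraphDef

Note, though, that for TF-TRT calibration the GraphDef to hand to calib_graph_to_infer_graph() is the one returned by create_inference_graph(), as the maintainer points out below.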

@samikama
Contributor

@dhingratul, are you running calib_graph_to_infer_graph in the same process, or do you exit the process between calibration and baking in the calibration data? Also, you need to pass the GraphDef returned to you by trt.create_inference_graph() in the first step to calib_graph_to_infer_graph() in the third step, not the graph from tf.train.write_graph(). You need to import it as a Graph to be able to run it.

  1. Create the inference graph.
  2. Run it with calibration data.
  3. Pass the graph_def returned in step 1 to calib_graph_to_infer_graph(). You can discard the graph you ran in step 2; it is only used for collecting calibration data (a minimal sketch follows below).
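
A minimal sketch of that workflow under TF 1.x with tensorflow.contrib.tensorrt (the .pb path, tensor names, shapes, and stand-in calibration data are placeholders, not taken from this issue):

import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load a frozen GraphDef (path and tensor names below are placeholders).
frozen_graph_def = tf.GraphDef()
with tf.gfile.GFile("model.pb", "rb") as f:
    frozen_graph_def.ParseFromString(f.read())

# Step 1: create the INT8 calibration GraphDef.
calib_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=["op1"],
    max_batch_size=8,
    max_workspace_size_bytes=1 << 25,
    precision_mode="INT8")

# Step 2: import that GraphDef as a Graph and run it on calibration batches.
with tf.Graph().as_default() as g:
    tf.import_graph_def(calib_graph_def, name="")
    inp = g.get_tensor_by_name("ip1:0")
    out = g.get_tensor_by_name("op1:0")
    with tf.Session(graph=g) as sess:
        for _ in range(10):
            batch = np.random.rand(8, 224, 224, 3).astype(np.float32)  # stand-in data
            sess.run(out, feed_dict={inp: batch})

# Step 3: in the same process, bake the calibration data into the final graph.
# Pass the GraphDef from step 1, not the Graph used in step 2.
int8_graph_def = trt.calib_graph_to_infer_graph(calib_graph_def)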

@dhingratul
Author

dhingratul commented Oct 12, 2018

@samikama I get this workflow, but in order to run calibration on the GraphDef generated in step 1, I need to import it as a Graph and then use sess.run to complete calibration. Now I have a Graph, not a GraphDef, so the only way I see to pass a GraphDef to calib_graph_to_infer_graph() is to export it as a .pb and re-import the GraphDef. I do not know how to convert a Graph to a GraphDef to go from step 2 to step 3.

@samikama
Contributor

@dhingratul, you are trying to pass the graph from step 2 into step 3. Please pass the GraphDef object that you created in step 1 and imported into the graph, not the serialized GraphDef of the graph from the second step.

@dhingratul
Author

dhingratul commented Oct 12, 2018

@samikama Based on your recommendation, this is the workflow I have, but I am facing a weird cuBLAS issue now:
2018-10-12 13:15:52.338802: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_INTERNAL_ERROR

fid = "model.pb"
output_nodenames = 'op1,op2,op3'
output_node = list(output_nodenames.split(","))
g = load_graph(fid)
with tf.Session(graph=g) as sess:
	# Step 1 -- Create Inference graph
    trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=20000,
    max_workspace_size_bytes=1 << 25,
    precision_mode="INT8",  # TRT Engine precision "FP32","FP16" or "INT8"
    minimum_segment_size=2  # minimum number of nodes in an engine
    )
    # Step 2 -- Calibration
    tf.import_graph_def(
    trt_graph,
    name='', #DEBUG
	)
    num_samples = 10
    np.random.seed(0)
    ip1_data = np.random.rand(num_samples,700,800,6).astype(np.float32)
    ip1 = tf.get_default_graph().get_tensor_by_name("ip1:0")

    ip2_data = np.random.rand(4).astype(np.float32)
    ip2 = tf.get_default_graph().get_tensor_by_name("ip2:0")

    ip3_data = np.random.rand(20000,6).astype(np.float32)
    ip3 = tf.get_default_graph().get_tensor_by_name("ip3:0")

    ip4_data = np.random.rand(20000,4).astype(np.float32)
    ip4 = tf.get_default_graph().get_tensor_by_name("ip4:0")
    out1 = tf.get_default_graph().get_tensor_by_name("op1:0")
    out2 = tf.get_default_graph().get_tensor_by_name("op2:0")
    out3 = tf.get_default_graph().get_tensor_by_name("op3:0")

    for i in range(num_samples):
        start_time = timeit.default_timer()
        _ = sess.run([out1, out2, out3], feed_dict={ip1:ip1_data[i], ip2:ip2_data, ip3:ip3_data, ip4:ip4_data})
    # Step 3 -- pass through calib_graph_to_infer_graph
    trt_graph = trt.calib_graph_to_infer_graph(trt_graph)
    with tf.gfile.GFile("trt.pb", "wb") as f:
        f.write(trt_graph.SerializeToString())

@wt-huang wt-huang added the comp:lite TF Lite related issues label Oct 19, 2018
@wt-huang

@dhingratul You may want to check whether there are other processes running on the GPU.
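
As a hedged aside (not confirmed as the cause in this thread): CUBLAS_STATUS_INTERNAL_ERROR from cublasSgemm is often a symptom of the GPU being out of memory when cuBLAS initializes, which TensorFlow's default grab-all-memory allocation can trigger. A minimal TF 1.x sketch for limiting the allocation:

import tensorflow as tf

# Limit TensorFlow's GPU memory allocation to rule out memory contention
# as the cause of the cuBLAS initialization failure.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap the fraction

with tf.Session(config=config) as sess:
    pass  # run the calibration loop here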

@dhingratul
Author

@wt-huang No other processes are running on the GPU.

@wt-huang wt-huang removed the comp:lite TF Lite related issues label Nov 8, 2018
@wt-huang

wt-huang commented Nov 9, 2018

@dhingratul Could you try installing TensorFlow from a binary instead? Also use cuDNN 7.3 and Python 3.6, and make sure that the cuBLAS library is correctly installed. You can also provide your environment by running the script in the issue template.

@dhingratul
Author

@wt-huang I have the TensorRT tarball, not the .deb, so I don't know how to install TF with pip and point it at my TRT installation. Can you expand on this?

@wt-huang wt-huang added the comp:model Model related issues label Nov 22, 2018
@benjamintanweihao
Contributor

@dhingratul One thing I found very useful is to first check whether a TRTEngineOp has been created:

    trt_engine_opts = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
    print('TRT Engine Op: {}'.format(trt_engine_opts))
    assert trt_engine_opts > 0, 'No TRT Engine Ops!'

@dhingratul
Author

@benjamintanweihao Had that been the case, I would have gotten an error like this: #21850 (comment)

@gsrujana89

Hi, any resolution to this? I am getting the same issue with the NVIDIA Docker image 19.01-py2.

@ymodak ymodak self-assigned this May 13, 2019
@ymodak ymodak added comp:gpu GPU related issues stat:awaiting response Status - Awaiting response from author and removed comp:model Model related issues labels May 13, 2019
@ymodak
Contributor

ymodak commented May 13, 2019

@dhingratul Is this still an issue?
@gsrujana89 Can you please post a new issue and provide all the information asked by the template? Thanks!

@dhingratul
Author

I will have to reproduce the issue with TRT 5 GA; it was still an issue with TRT 5 RC.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label May 14, 2019
@zhixuanli

Still having the CUBLAS_STATUS_INTERNAL_ERROR problem with cublasSgemm.

@sanjoy sanjoy added the comp:gpu:tensorrt Issues specific to TensorRT label Dec 26, 2019
@doublexxking

I have the same problem with TF 1.15 and TensorRT 6.
The calibration step consumes all of the GPU memory and errors as follows:
E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger ../rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)

@ymodak ymodak removed their assignment Jun 2, 2020
@mohantym mohantym added the TF 1.12 Issues related to TF 1.12 label Nov 25, 2021
@mohantym mohantym self-assigned this Nov 25, 2021
@mohantym
Contributor

Hi @dhingratul!
It seems you are using an older (1.x) version of TensorFlow. Have you checked these threads for the latest versions (TF 2.6/2.7) yet? link1, link2. Thanks!
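
For anyone landing here on TF 2.x: the TF-TRT entry point there is trt.TrtGraphConverterV2. The sketch below is a rough outline only; the SavedModel paths, input shape, and calibration_input_fn are placeholders, and the exact conversion parameters can differ between 2.x minor versions:

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Request INT8 precision for the TRT engines.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.INT8)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",  # placeholder path
    conversion_params=params)

def calibration_input_fn():
    # Yield representative batches for INT8 calibration (placeholder data).
    for _ in range(10):
        yield (np.random.rand(1, 224, 224, 3).astype(np.float32),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("trt_saved_model_dir")  # placeholder output path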

@mohantym mohantym added the stat:awaiting response Status - Awaiting response from author label Nov 25, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Dec 2, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
