
Error loading a TensorRT optimised graph #28854

Closed
satyajithj opened this issue May 20, 2019 · 16 comments
Labels: comp:gpu:tensorrt (Issues specific to TensorRT), comp:gpu (GPU related issues), stat:awaiting response (Status - Awaiting response from author), type:support (Support issues)

@satyajithj

I was able to convert a frozen model using the TensorRT API on an Nvidia Tesla P100 on Debian 9 with the following command:

trt_graph = trt.create_inference_graph(
    input_graph_def=saved_graph,
    outputs=output_names[0:1],
    max_batch_size=1,
    max_workspace_size_bytes=5000000000,
    precision_mode='FP16',
    is_dynamic_op=True
)

I am able to load the graph on the same system. However, when I try to load the graph on my local system, which has an Nvidia GeForce GTX 1050M, I get the following error:

  File "/home/fuzzybatman/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/fuzzybatman/.local/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 426, in import_graph_def
    graph._c_graph, serialized, options)  # pylint: disable=protected-access

tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'TRTEngineOp' in binary running on ceph. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

Is it because my GPU lacks support for TensorRT?
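
One way to check whether TRTEngineOp can be registered at all on a given install is the sketch below; it assumes a TF 1.x build and leans on the internal op_def_registry module, so treat it as a diagnostic rather than a supported API.

import tensorflow as tf
from tensorflow.python.framework import op_def_registry  # internal TF 1.x module

try:
    # Importing the contrib module is what lazily registers TRTEngineOp, and the
    # import only succeeds if this TensorFlow build can load the TensorRT libraries.
    import tensorflow.contrib.tensorrt  # registers TF-TRT ops as a side effect
except ImportError as err:
    print('TF-TRT is not usable with this TensorFlow install:', err)

print('TRTEngineOp registered:',
      'TRTEngineOp' in op_def_registry.get_registered_ops())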

@achandraa achandraa self-assigned this May 21, 2019
@achandraa

Just to verify, did you get a chance to have a look at #22360? Which TensorFlow version are you using?

@achandraa achandraa added the stat:awaiting response Status - Awaiting response from author label May 21, 2019
@satyajithj
Author

Thank you for the response. I am using TF r1.13.
I am loading the graph in Python, and I did try adding import tensorflow.contrib.tensorrt.

On a side note, I posted this question on the Nvidia DevTalk forum and they answered:

A generated TensorRT PLAN is valid for a specific GPU — more precisely, a specific CUDA Compute Capability. For example, if you generate a PLAN for an NVIDIA P4 (compute capability 6.1) you can’t use that PLAN on an NVIDIA Tesla V100 (compute capability 7.0).

This is quite confusing because there are articles online on optimising on one GPU and running on another.

@satyajithj
Author

Just noticed that there is a TF version mismatch between the one on my system (1.13) and the one on the GCP VM (1.12). Does this affect the result?

@satyajithj
Author

Tried again with a new model. Same error.

@achandraa

achandraa commented May 21, 2019

Which CUDA/cuDNN versions are you using?

@satyajithj
Author

On my local system it is CUDA 10.1 and cuDNN 7.4.2
On the VM it is CUDA 10.0 and cuDNN 7.4.1

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label May 22, 2019
@achandraa

Please help us with some more info: are you getting this error with the TensorFlow installed on your GCP VM or on your local system? Which operating system are you using, and did you install TensorFlow from source or from a binary? If you are unclear about the template, you can refer to this link. Also, kindly verify that you have followed the instructions from the TensorFlow website based on the information provided in the template. Thanks!

@achandraa achandraa added comp:gpu GPU related issues type:support Support issues stat:awaiting response Status - Awaiting response from author labels May 23, 2019
@satyajithj
Author

satyajithj commented May 23, 2019

Hi. I run the create_inference_graph method on the VM:

  • Debian 9
  • CUDA 10.0
  • cuDNN 7.4.1
  • TF 1.13
  • Nvidia Tesla P100 [16GB] (compute capability 6.0)

I try loading the graph for inference on the VM and it works fine.

I try loading the graph on my local system

  • Fedora 30
  • CUDA 10.1
  • cuDNN 7.4.2
  • TF 1.13
  • Nvidia GeForce GTX1050M [4GB] (compute capability 6.1)

and get the error

tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'TRTEngineOp' in binary running on ceph. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

I did not build TF from source. I installed it using pip3 in the terminal.

pip3 install tensorflow-gpu --user

According to a moderator on the Nvidia DevTalk forum:

A generated TensorRT PLAN is valid for a specific GPU — more precisely, a specific CUDA Compute Capability. For example, if you generate a PLAN for an NVIDIA P4 (compute capability 6.1) you can’t use that PLAN on an NVIDIA Tesla V100 (compute capability 7.0).
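
Since the PLAN/compute-capability point keeps coming up: the compute capability TensorFlow actually sees on each machine can be printed with the small sketch below. It uses device_lib, an internal TF 1.x helper, so this is a convenience check rather than a supported API.

from tensorflow.python.client import device_lib

# Print each visible GPU and its physical description; the description string
# includes the compute capability, e.g. "... compute capability: 6.0".
for dev in device_lib.list_local_devices():
    if dev.device_type == 'GPU':
        print(dev.name, '->', dev.physical_device_desc)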

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label May 23, 2019
@achandraa achandraa assigned ymodak and unassigned achandraa May 24, 2019
@ymodak ymodak assigned aaroey and unassigned ymodak May 24, 2019
@ymodak ymodak added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 24, 2019
@aaroey
Member

aaroey commented May 25, 2019

Hi @fuzzyBatman, could you try adding:

from tensorflow.contrib.tensorrt.python.ops import trt_engine_op

to your loading script to see if it works?
Thanks.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 25, 2019
@satyajithj
Author

@aaroey Same error.

Does the GPU choice not affect this?

@aaroey
Member

aaroey commented May 29, 2019

@fuzzyBatman, could you share your full script? I'll try it and let you know.

@satyajithj
Author

I have a TF frozen graph (.pb extension). I load it and run the create_inference_graph method on the GCP VM, which has an Nvidia Tesla P100 (16 GB) GPU:

import tensorflow as tf
from tensorflow.python.framework import graph_io
from tensorflow.contrib import tensorrt as trt

def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""

    with tf.gfile.GFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    tf.import_graph_def(graph_def, name='')
    return graph_def

sess = tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.6)))
sess.run(tf.global_variables_initializer())

saved_graph = get_frozen_graph(pbfile)  # pbfile and output_names are defined elsewhere in the full script

# Comment the following when loading the TensorRT graph.
print('Creating trt inference graph')

trt_graph = trt.create_inference_graph(
    input_graph_def=saved_graph,
    outputs=output_names[0:1],
    max_batch_size=1,
    max_workspace_size_bytes=4000000000,
    precision_mode='FP16',
    minimum_segment_size=2
)

graph_io.write_graph(trt_graph, "./train_log/faster_rcnn_fpn/", "frcnn_trt.pb", as_text=False)

The above script produces the file frcnn_trt.pb. Now I use the same get_frozen_graph procedure as above, using frcnn_trt.pb, with the rest of the code commented out. This works on the same VM but fails on my local system, which has an Nvidia GeForce GTX1050M (4 GB) GPU.
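
For reference, a minimal sketch of that loading step (with the trt_engine_op import suggested earlier; the path to frcnn_trt.pb is assumed from the write_graph call above) would be:

import tensorflow as tf
# Register the TF-TRT op before importing the graph; without this (and without a
# TensorFlow build that actually ships TensorRT support) import_graph_def fails
# with the NotFoundError shown above.
from tensorflow.contrib.tensorrt.python.ops import trt_engine_op

with tf.gfile.GFile('./train_log/faster_rcnn_fpn/frcnn_trt.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

sess = tf.Session(graph=graph)
# ... run inference with sess.run(...) on tensors from the imported graph ...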

@aaroey
Member

aaroey commented Sep 17, 2019

@fuzzyBatman sorry I was not able to get to this. Thanks for the scripts; they look legit to me. By "This works on the same VM" did you mean that on your VM you can run the TRT-converted graph frcnn_trt.pb? By "but fails on my local system" did you mean it failed with the TRTEngineOp-not-found error? I can imagine it will fail with some error, because TRT engines are not portable, meaning you'd better run the converted graph on a machine that has the same type of GPU as the one on which you ran the conversion.

Also, 1.15.0rc1 is out and 1.15.0 will be out soon; you may want to try with that. Feel free to provide the pbfile and I'll try your script with it. Thanks.
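
Since the engines baked into the .pb are tied to the GPU they were built on, one option worth noting (an aside, not something confirmed to fix this particular error) is to keep is_dynamic_op=True, as in the very first conversion command in this thread, so that TensorRT engines are built at runtime on whatever GPU actually runs inference. The target machine still needs a TensorRT-enabled TensorFlow build for TRTEngineOp to resolve.

# Sketch: same conversion call as in the script above, but with is_dynamic_op=True
# so engine building is deferred to inference time on the target GPU.
trt_graph = trt.create_inference_graph(
    input_graph_def=saved_graph,
    outputs=output_names[0:1],
    max_batch_size=1,
    max_workspace_size_bytes=4000000000,
    precision_mode='FP16',
    minimum_segment_size=2,
    is_dynamic_op=True,
)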

@sanjoy sanjoy added the comp:gpu:tensorrt Issues specific to TensorRT label Dec 26, 2019
@kumariko kumariko self-assigned this Sep 1, 2021
@kumariko

kumariko commented Sep 2, 2021

@fuzzyBatman We are checking to see if you still need help on this issue, as you are using an older version of TensorFlow (1.x), which is officially considered end of life. We recommend that you upgrade to 2.6, which is the latest stable version of TF, and let us know if the issue still persists in newer versions. We will get you the right help. Thanks!
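
For anyone retrying this on TF 2.x, the TF-TRT entry point is TrtGraphConverterV2, which operates on a SavedModel rather than a frozen .pb. A rough sketch (directory names are placeholders, and a TensorRT-enabled TensorFlow build is assumed) looks like this:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a SavedModel to a TF-TRT SavedModel with FP16 precision.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode='FP16',
    max_workspace_size_bytes=1 << 32,
)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='saved_model_dir',       # placeholder path
    conversion_params=params,
)
converter.convert()   # engines can still be built lazily at runtime on the target GPU
converter.save('trt_saved_model_dir')              # placeholder path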

@kumariko kumariko added the stat:awaiting response Status - Awaiting response from author label Sep 2, 2021
@satyajithj
Author

Hi! I stopped working on that project a year ago.
