
TensorRT Can't identify the cuda device #21487

Closed
qinyao-he opened this issue Aug 8, 2018 · 8 comments · Fixed by #21508
Comments

@qinyao-he

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
    source
  • TensorFlow version (use command below):
    ('v1.9.0-rc2-1924-g054b046', '1.10.0-rc1'). Current master branch
  • Python version: 2.7
  • Bazel version (if compiling from source): 0.15.2
  • GCC/Compiler version (if compiling from source): 5.4
  • CUDA/cuDNN version: CUDA 9.0 cuDNN 7.1.4
  • GPU model and memory: Titan V
  • Exact command to reproduce:
    First, train the small example model using mnist.py.
    Then freeze the graph with TensorFlow's built-in freeze_graph tool:
python -m tensorflow.python.tools.freeze_graph --input_graph log/graph.pbtxt --input_checkpoint log/model.ckpt-20000 --output_node_names softmax_tensor --output_graph log/freeze_graph.pb

Finally, use tensorrt.py to optimize the graph with the TensorRT engine.
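
For reference, a minimal sketch of that conversion step, assuming the TF 1.x contrib API (tf.contrib.tensorrt.create_inference_graph) and the file/node names mentioned in this issue; the parameter values and output file name are only examples, not the original tensorrt.py script:

```python
# Hedged sketch of the conversion step; paths, node names, and parameter
# values are assumptions taken from this issue.
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

# Load the frozen graph produced by freeze_graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile('log/freeze_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Ask the converter to replace supported subgraphs with TRT engine ops.
trt_graph_def = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['softmax_tensor'],
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP32')

# Write the optimized graph back to disk.
tf.train.write_graph(trt_graph_def, 'log', 'trt_graph.pb', as_text=False)
```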

Describe the problem

The log shows that TensorRT could not identify the CUDA device, and the graph remains unchanged after the conversion.

Source code / logs

2018-08-08 13:47:27.322236: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-08-08 13:47:27.322317: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2018-08-08 13:47:27.322928: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-08 13:47:27.327805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:03:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-08-08 13:47:27.327827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1485] Adding visible gpu devices: 0
2018-08-08 13:47:27.694717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:966] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-08 13:47:27.694749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] 0
2018-08-08 13:47:27.694754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 0: N
2018-08-08 13:47:27.694984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10938 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:03:00.0, compute capability: 7.0)
2018-08-08 13:47:27.935823: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2923] Segment @scope '', converted to graph
2018-08-08 13:47:27.935851: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:415] Can't find a device placement for the op!
2018-08-08 13:47:27.946155: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:799] Cluster is set but device '' is not found in the cluster
2018-08-08 13:47:27.946194: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:916] Can't identify the cuda device. Running on device 0
2018-08-08 13:47:28.915936: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.916287: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.916332: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:933] Engine my_trt_op_0 creation for segment 0, composed of 22 nodes failed: Internal: Failed to build TensorRT engine. Skipping...
2018-08-08 13:47:28.967359: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2923] Segment @scope '', converted to graph
2018-08-08 13:47:28.967390: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:415] Can't find a device placement for the op!
2018-08-08 13:47:28.968228: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:799] Cluster is set but device '' is not found in the cluster
2018-08-08 13:47:28.968242: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:916] Can't identify the cuda device. Running on device 0
2018-08-08 13:47:28.974512: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.974935: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.974968: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:933] Engine my_trt_op_0 creation for segment 0, composed of 22 nodes failed: Internal: Failed to build TensorRT engine. Skipping...
2018-08-08 13:47:28.979975: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:198] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-08-08 13:47:28.981037: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:198] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-08-08 13:47:28.982158: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:403] Optimization results for grappler item: tf_graph
2018-08-08 13:47:28.982172: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 52 nodes (-11), 51 edges (-13), time = 72.462ms.
2018-08-08 13:47:28.982176: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] layout: Graph size after: 52 nodes (0), 51 edges (0), time = 9.414ms.
2018-08-08 13:47:28.982179: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 52 nodes (0), 51 edges (0), time = 1003.37ms.
2018-08-08 13:47:28.982183: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 52 nodes (0), 51 edges (0), time = 34.038ms.
2018-08-08 13:47:28.982186: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 52 nodes (0), 51 edges (0), time = 20.332ms.
2018-08-08 13:47:28.982193: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:403] Optimization results for grappler item: my_trt_op_0_native_segment
2018-08-08 13:47:28.982198: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 23 nodes (0), 22 edges (0), time = 1.062ms.
2018-08-08 13:47:28.982204: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] layout: Graph size after: 23 nodes (0), 22 edges (0), time = 0.657ms.
2018-08-08 13:47:28.982240: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 23 nodes (0), 22 edges (0), time = 0.16ms.
2018-08-08 13:47:28.982246: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 23 nodes (0), 22 edges (0), time = 0.898ms.
2018-08-08 13:47:28.982253: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 23 nodes (0), 22 edges (0), time = 0.14ms.
2018-08-08 13:47:29.038047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1485] Adding visible gpu devices: 0
2018-08-08 13:47:29.038105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:966] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-08 13:47:29.038120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] 0
2018-08-08 13:47:29.038131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 0: N
2018-08-08 13:47:29.038327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10938 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:03:00.0, compute capability: 7.0)

@qinyao-he
Author

@aaroey

aaroey self-assigned this Aug 8, 2018
@aaroey
Member

aaroey commented Aug 9, 2018

@qinyao-he are you using TensorRT 4? The 1.9 RC and 1.10 releases were built against TRT 3.0, so running them with TRT 4.0 can cause problems like this.

@aaroey
Copy link
Member

aaroey commented Aug 9, 2018

It should print the loaded TRT version before the first log message (Number of eligible GPUs (core count >= 8): 1) shown above. If you want to use TRT 4.0, would you please build from master? We plan to make TRT 4.0 the default in r1.11, but it will take a while.

@qinyao-he
Author

@aaroey I am actually using TensorRT 4, and I did build from master myself.

@aaroey
Member

aaroey commented Aug 9, 2018

Well, I just managed to reproduce the problem using your script. It turns out the error was caused by another problem: the device is not set in the engine, so the converter falls back to the default cuda malloc for memory allocation during the conversion. I'll fix it.

As a workaround, it should work if you add with graph.device('gpu:0') when building your model for training. Alternatively, you can read ./log/freeze_graph.pb, import it inside a with graph.device('gpu:0') context, write it back out as a new ./log/freeze_graph.pb, and use the new file for the conversion; a sketch of this second approach is below.
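
A minimal sketch of that second workaround, assuming TF 1.x and the file names from this issue (the output file name is only an example):

```python
# Hedged sketch: re-import the frozen graph under an explicit GPU device
# placement and write it back out, then run the TensorRT conversion on the
# new file. File names are assumptions based on this issue.
import tensorflow as tf

# Read the original frozen graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile('./log/freeze_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Import every node inside a device context so the ops get a GPU placement.
graph = tf.Graph()
with graph.as_default(), graph.device('/gpu:0'):
    tf.import_graph_def(graph_def, name='')

# Write the re-placed graph out and use it for the conversion instead.
tf.train.write_graph(graph.as_graph_def(), './log',
                     'freeze_graph_gpu.pb', as_text=False)
```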

@jiarenyf

jiarenyf commented Aug 13, 2018

I am using TensorRT 3.0.4, and got this error:

Can't determine the device, constructing an allocator at device 0

I tried adding with graph.device('gpu:0') when building the model, but then this error came up:

Non-OK-status: GpuIdManager::TfToCudaGpuId(tf_gpu_id, &cuda_gpu_id) status: Not found: TensorFlow device GPU:0 was not registered

hopelessness ...

@aaroey

@aaroey
Member

aaroey commented Aug 13, 2018

I tried adding with graph.device('gpu:0') when building the model, but then this error came up:

Non-OK-status: GpuIdManager::TfToCudaGpuId(tf_gpu_id, &cuda_gpu_id) status: Not found: TensorFlow device GPU:0 was not registered

I think this is because the device is not initialized when you call trt.create_inference_graph(). Are you running with TF r1.10? Could you retry by initializing a session before calling trt.create_inference_graph()?
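
A minimal sketch of that suggestion, assuming TF 1.x and reusing the frozen GraphDef loaded as in the earlier sketch (graph_def and the output node name are assumptions from this issue):

```python
# Hedged sketch: create a session first so the GPU devices get registered
# with the runtime, then run the conversion. graph_def is the frozen
# GraphDef loaded earlier.
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

# Initializing a session forces device registration.
with tf.Session():
    pass

trt_graph_def = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['softmax_tensor'],
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP32')
```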

Actually, I think this should be fixed in master by #20318. Could you also try with master?

Thanks.

aaroey added a commit to aaroey/tensorflow that referenced this issue Aug 21, 2018
@MachineJeff

MachineJeff commented Nov 1, 2019

Maybe you should upgrade your CUDA driver. You need to ensure that your driver version matches or exceeds the minimum driver version required by your CUDA Toolkit.
