
TensorRT Can't identify the cuda device #21487

Closed
qinyao-he opened this issue Aug 8, 2018 · 8 comments · Fixed by #21508
Comments

@qinyao-he

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
    source
  • TensorFlow version (use command below):
    ('v1.9.0-rc2-1924-g054b046', '1.10.0-rc1'). Current master branch
  • Python version: 2.7
  • Bazel version (if compiling from source): 0.15.2
  • GCC/Compiler version (if compiling from source): 5.4
  • CUDA/cuDNN version: CUDA 9.0 cuDNN 7.1.4
  • GPU model and memory: Titan V
  • Exact command to reproduce:
    First, train the small example model using mnist.py.
    Then freeze the graph with TensorFlow's built-in freeze_graph tool:
python -m tensorflow.python.tools.freeze_graph --input_graph log/graph.pbtxt --input_checkpoint log/model.ckpt-20000 --output_node_names softmax_tensor --output_graph log/freeze_graph.pb

Finally, use tensorrt.py to optimize the graph with the TensorRT engine.
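
For reference, a minimal sketch of that conversion step, assuming the TF 1.x contrib API (tf.contrib.tensorrt.create_inference_graph) and the file/node names mentioned in this issue; the parameter values and output file name are only examples, not the original tensorrt.py script:

```python
# Hedged sketch of the conversion step; paths, node names, and parameter
# values are assumptions taken from this issue.
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

# Load the frozen graph produced by freeze_graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile('log/freeze_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Ask the converter to replace supported subgraphs with TRT engine ops.
trt_graph_def = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['softmax_tensor'],
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP32')

# Write the optimized graph back to disk.
tf.train.write_graph(trt_graph_def, 'log', 'trt_graph.pb', as_text=False)
```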

Describe the problem

The log shows that TensorRT could not identify the CUDA device, and the graph remains unchanged after the conversion.

Source code / logs

2018-08-08 13:47:27.322236: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-08-08 13:47:27.322317: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2018-08-08 13:47:27.322928: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-08 13:47:27.327805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:03:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-08-08 13:47:27.327827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1485] Adding visible gpu devices: 0
2018-08-08 13:47:27.694717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:966] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-08 13:47:27.694749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] 0
2018-08-08 13:47:27.694754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 0: N
2018-08-08 13:47:27.694984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10938 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:03:00.0, compute capability: 7.0)
2018-08-08 13:47:27.935823: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2923] Segment @scope '', converted to graph
2018-08-08 13:47:27.935851: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:415] Can't find a device placement for the op!
2018-08-08 13:47:27.946155: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:799] Cluster is set but device '' is not found in the cluster
2018-08-08 13:47:27.946194: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:916] Can't identify the cuda device. Running on device 0
2018-08-08 13:47:28.915936: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.916287: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.916332: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:933] Engine my_trt_op_0 creation for segment 0, composed of 22 nodes failed: Internal: Failed to build TensorRT engine. Skipping...
2018-08-08 13:47:28.967359: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2923] Segment @scope '', converted to graph
2018-08-08 13:47:28.967390: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:415] Can't find a device placement for the op!
2018-08-08 13:47:28.968228: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:799] Cluster is set but device '' is not found in the cluster
2018-08-08 13:47:28.968242: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:916] Can't identify the cuda device. Running on device 0
2018-08-08 13:47:28.974512: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.974935: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger runtime.cpp (16) - Cuda Error in allocate: 2
2018-08-08 13:47:28.974968: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:933] Engine my_trt_op_0 creation for segment 0, composed of 22 nodes failed: Internal: Failed to build TensorRT engine. Skipping...
2018-08-08 13:47:28.979975: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:198] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-08-08 13:47:28.981037: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:198] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-08-08 13:47:28.982158: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:403] Optimization results for grappler item: tf_graph
2018-08-08 13:47:28.982172: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 52 nodes (-11), 51 edges (-13), time = 72.462ms.
2018-08-08 13:47:28.982176: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] layout: Graph size after: 52 nodes (0), 51 edges (0), time = 9.414ms.
2018-08-08 13:47:28.982179: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 52 nodes (0), 51 edges (0), time = 1003.37ms.
2018-08-08 13:47:28.982183: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 52 nodes (0), 51 edges (0), time = 34.038ms.
2018-08-08 13:47:28.982186: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 52 nodes (0), 51 edges (0), time = 20.332ms.
2018-08-08 13:47:28.982193: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:403] Optimization results for grappler item: my_trt_op_0_native_segment
2018-08-08 13:47:28.982198: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 23 nodes (0), 22 edges (0), time = 1.062ms.
2018-08-08 13:47:28.982204: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] layout: Graph size after: 23 nodes (0), 22 edges (0), time = 0.657ms.
2018-08-08 13:47:28.982240: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 23 nodes (0), 22 edges (0), time = 0.16ms.
2018-08-08 13:47:28.982246: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] constant folding: Graph size after: 23 nodes (0), 22 edges (0), time = 0.898ms.
2018-08-08 13:47:28.982253: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:405] TensorRTOptimizer: Graph size after: 23 nodes (0), 22 edges (0), time = 0.14ms.
2018-08-08 13:47:29.038047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1485] Adding visible gpu devices: 0
2018-08-08 13:47:29.038105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:966] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-08 13:47:29.038120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] 0
2018-08-08 13:47:29.038131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 0: N
2018-08-08 13:47:29.038327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10938 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:03:00.0, compute capability: 7.0)

@qinyao-he
Author

@aaroey

aaroey self-assigned this Aug 8, 2018
@aaroey
Member

aaroey commented Aug 9, 2018

@qinyao-he are you using TensorRT 4? The 1.9 RC and 1.10 releases were built against TRT 3.0, so running them with TRT 4.0 can cause problems like this.

@aaroey
Copy link
Member

aaroey commented Aug 9, 2018

It should print the loaded TRT version before the first log message (Number of eligible GPUs (core count >= 8): 1) shown above. If you want to use TRT 4.0, would you please build from master? We plan to make TRT 4.0 the default in r1.11, but it will take a while.

@qinyao-he
Author

@aaroey I am actually using TensorRT 4, and I did build from master myself.

@aaroey
Member

aaroey commented Aug 9, 2018

Well, I just managed to reproduce the problem using your script. It turns out the error was caused by another problem: the device is not set in the engine, so the converter falls back to the default cuda malloc for memory allocation during the conversion. I'll fix it.

As a workaround, it should work if you add with graph.device('gpu:0') when building your model for training. Alternatively, you can read ./log/freeze_graph.pb, import it inside a with graph.device('gpu:0') context, write it back out as a new ./log/freeze_graph.pb, and use the new file for the conversion; a sketch of this second approach is below.
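
A minimal sketch of that second workaround, assuming TF 1.x and the file names from this issue (the output file name is only an example):

```python
# Hedged sketch: re-import the frozen graph under an explicit GPU device
# placement and write it back out, then run the TensorRT conversion on the
# new file. File names are assumptions based on this issue.
import tensorflow as tf

# Read the original frozen graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile('./log/freeze_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Import every node inside a device context so the ops get a GPU placement.
graph = tf.Graph()
with graph.as_default(), graph.device('/gpu:0'):
    tf.import_graph_def(graph_def, name='')

# Write the re-placed graph out and use it for the conversion instead.
tf.train.write_graph(graph.as_graph_def(), './log',
                     'freeze_graph_gpu.pb', as_text=False)
```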

@jiarenyf

jiarenyf commented Aug 13, 2018

I am using TensorRT 3.0.4, and got this error:

Can't determine the device, constructing an allocator at device 0

I tried adding with graph.device('gpu:0') when building the model, but then this error came up:

Non-OK-status: GpuIdManager::TfToCudaGpuId(tf_gpu_id, &cuda_gpu_id) status: Not found: TensorFlow device GPU:0 was not registered

hopelessness ...

@aaroey

@aaroey
Member

aaroey commented Aug 13, 2018

I tried adding with graph.device('gpu:0') when building the model, but then this error came up:

Non-OK-status: GpuIdManager::TfToCudaGpuId(tf_gpu_id, &cuda_gpu_id) status: Not found: TensorFlow device GPU:0 was not registered

I think this is because the device is not initialized when you call trt.create_inference_graph(). Are you running with TF r1.10? Could you retry by initializing a session before calling trt.create_inference_graph()?
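
A minimal sketch of that suggestion, assuming TF 1.x and reusing the frozen GraphDef loaded as in the earlier sketch (graph_def and the output node name are assumptions from this issue):

```python
# Hedged sketch: create a session first so the GPU devices get registered
# with the runtime, then run the conversion. graph_def is the frozen
# GraphDef loaded earlier.
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

# Initializing a session forces device registration.
with tf.Session():
    pass

trt_graph_def = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['softmax_tensor'],
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP32')
```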

Actually, I think this should be fixed in master by #20318. Could you also try with master?

Thanks.

aaroey added a commit to aaroey/tensorflow that referenced this issue Aug 21, 2018
@MachineJeff

MachineJeff commented Nov 1, 2019

Maybe you should upgrade your CUDA driver. You need to ensure that your driver version matches or exceeds the minimum driver version required by your CUDA Toolkit.
