
cuDNN launch failure : input shape ([306,1,16,29]) #11

Closed
qianyunw opened this issue Jul 23, 2019 · 5 comments


@qianyunw

Hi,

Thanks for sharing! Sorry to bother you again T_T.

I am currently trying to run your code on my machine (Python 2.7 & TensorFlow 1.12.0).
When I run "python run_voca.py", I get the errors below. Is there something wrong with my setup?

Thank you so much!!

python run_voca.py --tf_model_fname './model/gstep_52280.model' --ds_fname './ds_graph/output_graph.pb' --audio_fname './audio/test_sentence.wav' --template_fname './template/FLAME_sample.ply' --condition_idx 3 --out_path './animation_output'
2019-07-23 06:49:29.981799: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-07-23 06:49:34.971252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:18:00.0
totalMemory: 10.76GiB freeMemory: 1.87GiB
2019-07-23 06:49:35.165961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:3b:00.0
totalMemory: 10.76GiB freeMemory: 1.77GiB
2019-07-23 06:49:35.287032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:86:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2019-07-23 06:49:35.287328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2
2019-07-23 06:50:29.496448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-23 06:50:29.496516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2
2019-07-23 06:50:29.496526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N N
2019-07-23 06:50:29.496549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N N
2019-07-23 06:50:29.496557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N
2019-07-23 06:50:29.496791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1607 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5)
2019-07-23 06:50:31.759153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 1503 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2019-07-23 06:50:31.759597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10232 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5)
process subj - seq
2019-07-23 06:50:41.148658: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:41.449976: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:42.074583: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:42.386122: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:42.732171: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:51:52.237460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2
2019-07-23 06:51:52.237883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-23 06:51:52.237898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2
2019-07-23 06:51:52.237909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N N
2019-07-23 06:51:52.237916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N N
2019-07-23 06:51:52.237924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N
2019-07-23 06:51:52.238128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1607 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5)
2019-07-23 06:51:52.238453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 1503 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2019-07-23 06:51:52.238653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10232 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5)
2019-07-23 06:52:12.202266: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-07-23 06:52:12.202348: W ./tensorflow/stream_executor/stream.h:2093] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
File "run_voca.py", line 44, in <module>
inference(tf_model_fname, ds_fname, audio_fname, template_fname, condition_idx, out_path)
File "/home/wangqianyun/voca/utils/inference.py", line 83, in inference
predicted_vertices = np.squeeze(session.run(output_decoder, feed_dict))
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([306,1,16,29])
[[node VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1 (defined at /home/wangqianyun/voca/utils/inference.py:65) = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1-0-TransposeNHWCToNCHW-LayoutOptimizer, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_1, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_2, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg/sub/Switch, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg_1/sub/Switch)]]

Caused by op u'VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1', defined at:
File "run_voca.py", line 44, in <module>
inference(tf_model_fname, ds_fname, audio_fname, template_fname, condition_idx, out_path)
File "/home/wangqianyun/voca/utils/inference.py", line 65, in inference
saver = tf.train.import_meta_graph(tf_model_fname + '.meta')
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1674, in import_meta_graph
meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): cuDNN launch failure : input shape ([306,1,16,29])
[[node VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1 (defined at /home/wangqianyun/voca/utils/inference.py:65) = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1-0-TransposeNHWCToNCHW-LayoutOptimizer, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_1, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_2, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg/sub/Switch, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg_1/sub/Switch)]]
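For a sense of scale, the input tensor named in this error is small. A quick check, assuming 4-byte floats (the node reports `T=DT_FLOAT`), puts it well under 1 MiB. That suggests the tensor size itself is not the problem and points instead at the earlier `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR` line. `tensor_bytes` is a hypothetical helper written for this illustration:

```python
from functools import reduce
import operator


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (4 bytes per element for float32)."""
    return reduce(operator.mul, shape, 1) * bytes_per_element


# Shape reported by the failing FusedBatchNorm node, in NCHW layout
shape = [306, 1, 16, 29]
size = tensor_bytes(shape)
print("{} bytes = {:.2f} MiB".format(size, size / 2.0 ** 20))
# prints: 567936 bytes = 0.54 MiB
```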

@TimoBolkart
Owner

Hi,

Can you please tell me which CUDA and cuDNN versions you are using?
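To answer a version question like this one, the cuDNN version can be read from the `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` defines in the cuDNN header. The sketch below is hedged: `cudnn_version_from_header` is a hypothetical helper (not part of VOCA or TensorFlow), and the header path in the usage note is a common default, not a guaranteed location. For CUDA itself, `nvcc --version` reports the toolkit version.

```python
import re


def cudnn_version_from_header(text):
    """Extract (major, minor, patch) from the #define lines of cudnn.h.

    cuDNN 7 keeps these defines in cudnn.h; cuDNN 8 and later moved them
    to cudnn_version.h, so pass the contents of whichever file has them.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # \s+ after the name prevents matching longer macro names
        match = re.search(r"#define\s+%s\s+(\d+)" % name, text)
        if match is None:
            raise ValueError("%s not found in header" % name)
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Usage, assuming the header lives at `/usr/local/cuda/include/cudnn.h`: `print(cudnn_version_from_header(open("/usr/local/cuda/include/cudnn.h").read()))`.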

@qianyunw
Author

Hi,

Sorry for the late reply. I am using CUDA 9.0 and cuDNN 7. ^_^

@TimoBolkart
Owner

The code was tested with CUDA 9.0 and cuDNN 7.1.
Sorry, but I don't know what could be causing your error.

@qianyunw
Author

qianyunw commented Aug 8, 2019

Hi,

Thank you so much for your reply ^_^
The problem was caused by GPU memory: GPUs 0 and 1 had less than 2 GiB free. I added the following code to inference.py, and now it works perfectly!
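```python
import os
# Must be set before TensorFlow initializes CUDA: expose only GPU 2
os.environ['CUDA_VISIBLE_DEVICES'] = '2'
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```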
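The fix above hard-codes GPU 2, which in the log was the only card with plenty of free memory. If the free card changes between runs, a minimal sketch like the following could pick it automatically. It assumes `nvidia-smi` is on the `PATH`, and the helper names are hypothetical, not part of VOCA or TensorFlow:

```python
import os
import subprocess


def parse_free_memory(csv_text):
    """Parse output of
    nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
    into a dict {gpu_index: free_MiB}."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, mib = [field.strip() for field in line.split(",")]
        free[int(index)] = int(mib)
    return free


def pick_freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory."""
    free = parse_free_memory(csv_text)
    return max(free, key=free.get)


def query_nvidia_smi():
    """Run nvidia-smi and return its raw CSV output (needs the NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"])
    return output.decode("utf-8")


def select_freest_gpu():
    """Expose only the freest GPU to this process.

    Call this before TensorFlow creates its first session. CUDA_DEVICE_ORDER
    is set so CUDA numbers GPUs by PCI bus ID, matching nvidia-smi's indices.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(pick_freest_gpu(query_nvidia_smi()))
```

In inference.py, a call to `select_freest_gpu()` placed before TensorFlow is imported would take the place of the hard-coded `CUDA_VISIBLE_DEVICES` line; the `allow_growth` setting can stay as it is. Setting `CUDA_DEVICE_ORDER` matters because CUDA's default ordering is not guaranteed to match the indices that `nvidia-smi` prints.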

@TimoBolkart
Owner

Hi, great that it works now, and thanks a lot for the feedback!
