
Simple graph invoking tf.complex() doesn't work on GPU, but works on CPU #38443

Closed
isaacgerg opened this issue Apr 10, 2020 · 12 comments
Labels: comp:gpu, stale, stat:awaiting response, TF 2.1, type:bug

Comments

@isaacgerg

isaacgerg commented Apr 10, 2020

Environment: Windows 10, Python 3.6, TensorFlow 2.1.0-rc2

The code below is a minimal working example that demonstrates the bug. It results in CUDA_ERROR_LAUNCH_FAILED when run on the GPU, but runs without issues on the CPU. I suspect the problem lies in the tensor coming out of tf.complex(): if I do not use that function, the issue seems to go away.

The log below shows the error I get, followed by the code that reproduces it on Windows 10.

2020-04-10 16:19:43.846387: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-10 16:19:44.860247: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-10 16:19:44.879431: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-10 16:19:45.231402: E tensorflow/stream_executor/cuda/cuda_driver.cc:948] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.231880: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.232665: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.233121: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.233532: E tensorflow/stream_executor/stream.cc:5452] Internal: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.233951: E tensorflow/stream_executor/cuda/cuda_driver.cc:613] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.234331: E tensorflow/stream_executor/cuda/cuda_driver.cc:618] error log buffer (1024 bytes): 
2020-04-10 16:19:45.234634: W tensorflow/core/kernels/gpu_utils.cc:68] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'Failed to load PTX text as a module: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2020-04-10 16:19:45.235499: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.235957: I tensorflow/stream_executor/stream.cc:4963] [stream=000001EA81FD2D60,impl=000001EA92405DC0] did not memzero GPU location; source: 0000008AE2D3C858
2020-04-10 16:19:45.236342: E tensorflow/stream_executor/cuda/cuda_driver.cc:613] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-04-10 16:19:45.236834: E tensorflow/stream_executor/cuda/cuda_driver.cc:618] error log buffer (1024 bytes): 
2020-04-10 16:19:45.237205: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: cuDNN launch failure : input shape([5,16,256,256]) filter shape([1,1,16,1])
	 [[{{node model/conv2d_1/Conv2D}}]]
import numpy as np
import tensorflow as tf
print(tf.__version__)

input_size = (256,256,1)
input_real = tf.keras.layers.Input(input_size)
input_imag = tf.keras.layers.Input(input_size)

# Get input into mag and phase 
cpx_input = tf.keras.layers.Lambda(lambda x: tf.complex(x[0], x[1]))([input_real, input_imag])    
abs_of_input = tf.math.abs(cpx_input)
phase_of_input =  tf.math.angle(cpx_input) 

# Add some trainable weights
conv1 = tf.keras.layers.Conv2D(16, 5, padding = 'same')(abs_of_input)
mask = tf.keras.layers.Conv2D(1, 1)(conv1) 
filtered_freq = mask * abs_of_input
reconstructedFreq_dc_centered = tf.complex(mask, 0.0) * tf.math.exp(tf.complex(0.0,1.0)*tf.complex(phase_of_input, 0.0))  # I believe this is the offending line
tmp = tf.math.abs(reconstructedFreq_dc_centered)

model = tf.keras.models.Model([input_real, input_imag], tmp)

model.summary()

model.compile(optimizer='SGD', loss = 'mse')

x_real = np.random.randn(5, 256, 256, 1)
x_imag = np.random.randn(5, 256, 256, 1)

model.train_on_batch(x = [x_real, x_imag], y = x_real)

EDIT 1: Simplified the code further.
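
As a workaround sketch while this is being debugged (assuming the goal is simply to keep tf.complex out of the graph), the complex-valued lines above could be replaced with real-valued equivalents like the following. This is only a sketch, not tested on the reporting setup:

# Sketch: real-valued equivalents of the tf.complex()-based steps above.
abs_of_input = tf.math.sqrt(tf.math.square(input_real) + tf.math.square(input_imag))
phase_of_input = tf.math.atan2(input_imag, input_real)
# mask * exp(i*phase) has real part mask*cos(phase) and imaginary part mask*sin(phase),
# so the magnitude of the reconstruction needs no complex tensors:
recon_real = mask * tf.math.cos(phase_of_input)
recon_imag = mask * tf.math.sin(phase_of_input)
tmp = tf.math.sqrt(tf.math.square(recon_real) + tf.math.square(recon_imag))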

isaacgerg added the type:bug label on Apr 10, 2020
Saduf2019 added the TF 2.1 label on Apr 12, 2020
@Saduf2019
Contributor

@isaacgerg
I ran the code you shared on tf-nightly and do not face any errors; please find the gist here on CPU, same on GPU.

Saduf2019 added the comp:gpu and stat:awaiting response labels on Apr 12, 2020
@isaacgerg
Author

@Saduf2019
I updated to tf-nightly and the bug still exists. Can you rerun in a Windows 10 environment (the environment where the error occurs, as noted in the first post)?

Saduf2019 assigned gowthamkpr and unassigned Saduf2019 on Apr 14, 2020
@gowthamkpr

As mentioned in the error message

Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. This message will be only logged once.

This is why you are running into the error on Windows, @isaacgerg.

@isaacgerg
Author

isaacgerg commented Apr 14, 2020

@gowthamkpr Why doesn't the driver perform the PTX compilation then? The operation is simple: a FOIL multiply of complex numbers.
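
For reference, that product amounts to four real multiplies and two real adds; a hypothetical helper (just a sketch, not a TensorFlow API) that performs the FOIL expansion on real-valued tensors:

import tensorflow as tf

def complex_multiply(a_re, a_im, b_re, b_im):
    # (a_re + i*a_im) * (b_re + i*b_im)
    #   = (a_re*b_re - a_im*b_im) + i*(a_re*b_im + a_im*b_re)
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag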

tensorflowbutler removed the stat:awaiting response label on Apr 16, 2020
gowthamkpr assigned sanjoy and unassigned gowthamkpr on May 28, 2020
gowthamkpr added the stat:awaiting tensorflower label on May 28, 2020
@sanjoy
Contributor

sanjoy commented May 28, 2020

I think the message from redzone_allocator.cc is a red herring and that the CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure errors have some other root cause. Can you please attach the full log?

@isaacgerg
Author

Hi @sanjoy, the full log is in the first post. Let me know if you need anything else.

tensorflowbutler removed the stat:awaiting tensorflower label on May 31, 2020
@isaacgerg
Author

@sanjoy Any update on this? How can I help?

@sanjoy
Contributor

sanjoy commented Jul 15, 2020

Hi @isaacgerg,

It is quite difficult to say much from

2020-04-10 16:19:45.231402: E tensorflow/stream_executor/cuda/cuda_driver.cc:948] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure

if that's all the logs say. Can you try running with CUDA_LAUNCH_BLOCKING=1 set? Maybe that will help narrow this down.
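
One way to do that is to set the variable before TensorFlow initializes the GPU, either in the shell or at the very top of the script (a sketch; CUDA reads the variable at initialization, so it must be set before the first GPU op):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make CUDA kernel launches synchronous

import tensorflow as tf  # import TensorFlow only after setting the variable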

@Saduf2019
Contributor

@isaacgerg
Could you please update with respect to the above comment, or verify with a later TF version [2.4.1] whether you still face the issue.

Saduf2019 added the stat:awaiting response label on May 3, 2021
Saduf2019 self-assigned this on May 3, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler bot added the stale label on May 10, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
