conv2d_transpose crashes on GPU with zero size batch #13643

peastman · 2017-10-11T19:54:10Z

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): binary (installed with conda install tensorflow-gpu)
TensorFlow version (use command below): b'unknown' 1.3.0
Python version: 3.5.2
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: 8.0/6.0
GPU model and memory: GTX 980
Exact command to reproduce: See below

Describe the problem

Execute the script below. It works correctly when running on a CPU, but on a GPU it crashes with this error:

tensorflow/stream_executor/cuda/cuda_dnn.cc:430] could not convert BatchDescriptor {count: 0 feature_map_count: 1 spatial: 7 7  value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM

The error happens when the first dimension of the input array is 0.

Source code / logs

import tensorflow as tf
import numpy as np

a = tf.placeholder(dtype=tf.float32, shape=(None, 7, 7, 1))
b = tf.contrib.layers.conv2d_transpose(a, num_outputs=16, kernel_size=5, stride=2)
session = tf.Session()
session.run(tf.global_variables_initializer())
print(session.run(b, feed_dict={a: np.zeros((0, 7, 7, 1))}))
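
Until a build with the fix is available, one possible workaround is to skip the GPU op whenever the batch is empty and return an empty result of the expected shape instead. The sketch below is only an illustration: `conv2d_transpose_output_shape` and `run_unless_empty` are hypothetical helper names, not TensorFlow APIs, and it assumes NHWC input, a square kernel, equal strides in both dimensions, and the 'SAME' padding that tf.contrib.layers.conv2d_transpose uses by default.

```python
def conv2d_transpose_output_shape(input_shape, num_outputs, kernel_size,
                                  stride, padding="SAME"):
    """Expected NHWC output shape of a 2-D transposed convolution.

    Hypothetical helper: assumes a square kernel and equal strides.
    """
    batch, height, width, _ = input_shape
    if padding == "SAME":
        extra = 0
    elif padding == "VALID":
        extra = max(kernel_size - stride, 0)
    else:
        raise ValueError("unsupported padding: %r" % (padding,))
    return (batch, height * stride + extra, width * stride + extra, num_outputs)


def run_unless_empty(batch_shape, run_fn, empty_fn, **layer_args):
    """Call run_fn() for non-empty batches; otherwise build an empty result.

    run_fn   -- zero-argument callable that executes the real op
    empty_fn -- callable that turns a shape tuple into an empty result
    """
    if batch_shape[0] == 0:
        # Never reaches cuDNN, so the zero-count descriptor is not created.
        return empty_fn(conv2d_transpose_output_shape(batch_shape, **layer_args))
    return run_fn()
```

With the script above, this could be called as `run_unless_empty(x.shape, lambda: session.run(b, feed_dict={a: x}), np.zeros, num_outputs=16, kernel_size=5, stride=2)`, where `x` is the array being fed. For the zero-size batch this would yield an array of shape (0, 14, 14, 16) without touching the GPU kernel.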


xiaoyongzhu · 2017-11-28T17:48:23Z

I encountered this in TensorFlow 1.4.0, so it looks like the fix is not included in the 1.4.0 release. I installed the TensorFlow nightly build (tf_nightly_gpu-1.5.0.dev20171127-cp35-cp35m-manylinux1_x86_64.whl), and the nightly build solves the issue.

hpnhxxwn · 2017-11-29T05:11:56Z

I can confirm the issue still exists in my case for the nightly build tf_nightly_gpu-1.5.0.dev20171127-cp35-cp35m-manylinux1_x86_64.whl... :(

Can anyone point us to a good build that fixes the issue?

xiaoyongzhu · 2017-11-29T05:15:51Z

@hpnhxxwn OK, you are right: it exists in my app too. The above example runs successfully, but my TensorFlow app just crashes :(

reedwm · 2017-11-29T18:44:17Z

The fix did not make it into 1.4, but should be in nightly.

@hpnhxxwn, does the above example not work or is it a different example that does not work? Can you post your CUDA/cudnn/Python/Ubuntu versions and your GPU? I'm not seeing the issue on Ubuntu 14.04, Python 2.7, Cuda 8 and cuDNN 6 with pip install tf-nightly-gpu.

@xiaoyaozhuzi what is the error message when your TensorFlow app crashes? If possible, can you post a small example that reproduces the problem?

hpnhxxwn · 2017-11-30T04:38:43Z

@reedwm

It is my own code. It works with the CPU version, but on GPU it fails with this error. My setup is Ubuntu 16.04, Python 3.5, CUDA 8 and cuDNN 5. I actually tried both TensorFlow 1.4.0 with cuDNN 6 and a local build from the latest TensorFlow source code with cuDNN 5; both have the issue. I have also tried the nightly build xiaoyongzhu pointed to yesterday, and it has the same issue with cuDNN 6. The error message is really cryptic and I do not know where to look. However, the code works on CPU TensorFlow.

My code is below. Could you suggest a workaround?

def resnet_model(bin_multiple):

    #input and reshape
    inputs = Input(shape=input_shape)
    reshape = Reshape(input_shape_channels)(inputs)

    #normal convnet layer (have to do one initially to get 64 channels)
    conv = Conv2D(64,(1,bin_multiple*note_range),padding="same",activation='relu')(reshape)
    pool = MaxPooling2D(pool_size=(1,2))(conv)

    for i in range(int(np.log2(bin_multiple))-1):
        print(i)
        #residual block
        bn = BatchNormalization()(pool)
        re = Activation('relu')(bn)
        # integer division keeps the kernel width an int under Python 3
        freq_range = (bin_multiple//(2**(i+1)))*note_range
        print(freq_range)
        conv = Conv2D(64,(1,freq_range),padding="same",activation='relu')(re)

        #add and downsample
        ad = add([pool,conv])
        pool = MaxPooling2D(pool_size=(1,2))(ad)

    flattened = Flatten()(pool)
    fc = Dense(1024, activation='relu')(flattened)
    do = Dropout(0.5)(fc)
    fc = Dense(512, activation='relu')(do)
    do = Dropout(0.5)(fc)
    outputs = Dense(note_range, activation='sigmoid')(do)

    model = Model(inputs=inputs, outputs=outputs)

    return model

... other code

model = resnet_model(bin_multiple)
init_lr = float(args['init_lr'])
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=init_lr,momentum=0.9), metrics=['accuracy', 'mae', 'categorical_accuracy'])
model.summary()

history = model.fit_generator(trainGen.next(),trainGen.steps(), epochs=epochs, verbose=1,validation_data=valGen.next(),validation_steps=valGen.steps(),callbacks=callbacks, workers=8, use_multiprocessing=True)

reedwm · 2017-11-30T17:11:45Z

@hpnhxxwn I'm not sure what the issue is. Can you post the full error message that occurs? Also if possible can you post the full example? Are you calling conv2d_transpose with zero batch size?
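
One way to check whether a zero-size batch is reaching the model is to scan the first few batches the generator produces before training. This is a rough sketch only: `find_empty_batches` is a hypothetical helper, and it assumes each batch is a tuple whose first element is the input array (the `(inputs, targets)` form Keras generators normally yield).

```python
import itertools


def find_empty_batches(batches, max_batches=100):
    """Return the indices of batches whose inputs contain zero samples.

    Hypothetical debugging helper: assumes each batch is a tuple or list
    whose first element is the inputs and supports len().
    """
    empty = []
    # islice stops after max_batches, so endless generators are safe to scan.
    for index, batch in enumerate(itertools.islice(batches, max_batches)):
        inputs = batch[0]
        if len(inputs) == 0:
            empty.append(index)
    return empty
```

If this returns a non-empty list for the training or validation generator, the crash is probably the same zero-batch problem; if it returns an empty list, the cause is likely something else.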

peastman mentioned this issue Oct 11, 2017

Make it easy to create GANs deepchem/deepchem#866

Merged

reedwm self-assigned this Oct 12, 2017

gunan closed this as completed in 7679a2e Oct 16, 2017
