conv2d_transpose crashes on GPU with zero size batch #13643

Closed
peastman opened this issue Oct 11, 2017 · 6 comments

@peastman

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary (installed with conda install tensorflow-gpu)
  • TensorFlow version (use command below): b'unknown' 1.3.0
  • Python version: 3.5.2
  • Bazel version (if compiling from source): N/A
  • CUDA/cuDNN version: 8.0/6.0
  • GPU model and memory: GTX 980
  • Exact command to reproduce: See below

Describe the problem

Execute the script below. It works correctly when running on a CPU, but on a GPU it crashes with this error:

tensorflow/stream_executor/cuda/cuda_dnn.cc:430] could not convert BatchDescriptor {count: 0 feature_map_count: 1 spatial: 7 7  value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM

The error happens when the first dimension of the input array is 0.

Source code / logs

import tensorflow as tf
import numpy as np

a = tf.placeholder(dtype=tf.float32, shape=(None, 7, 7, 1))
b = tf.contrib.layers.conv2d_transpose(a, num_outputs=16, kernel_size=5, stride=2)
session = tf.Session()
session.run(tf.global_variables_initializer())
print(session.run(b, feed_dict={a: np.zeros((0, 7, 7, 1))}))
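
One possible workaround, until a build with the fix is available, is to skip the GPU call entirely when the batch is empty. The sketch below is not from the issue author. It assumes the default `padding='SAME'` of `tf.contrib.layers.conv2d_transpose`, under which each spatial dimension is multiplied by the stride. The names `run_with_empty_guard` and `run_fn` are hypothetical and not part of TensorFlow.

```python
import numpy as np


def transpose_conv_output_shape(input_shape, num_outputs, stride):
    """Expected NHWC output shape, assuming SAME padding."""
    batch, height, width, _ = input_shape
    return (batch, height * stride, width * stride, num_outputs)


def run_with_empty_guard(run_fn, batch, num_outputs=16, stride=2):
    """Call run_fn(batch) unless the batch is empty.

    run_fn is a hypothetical callable, for example
    lambda x: session.run(b, feed_dict={a: x}).
    """
    if batch.shape[0] == 0:
        # Skip the GPU kernel and return an empty array of the expected shape.
        shape = transpose_conv_output_shape(batch.shape, num_outputs, stride)
        return np.zeros(shape, dtype=np.float32)
    return run_fn(batch)
```

With the script above, this would be called as `run_with_empty_guard(lambda x: session.run(b, feed_dict={a: x}), np.zeros((0, 7, 7, 1)))`, which returns an empty array without ever reaching cuDNN.
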
@xiaoyongzhu

I encountered this in TensorFlow 1.4.0, so it looks like the fix is not included in the 1.4.0 release. I installed the TensorFlow nightly build (tf_nightly_gpu-1.5.0.dev20171127-cp35-cp35m-manylinux1_x86_64.whl), and the nightly build solves the issue.

@hpnhxxwn

I can confirm the issue still exists in my case for the nightly build tf_nightly_gpu-1.5.0.dev20171127-cp35-cp35m-manylinux1_x86_64.whl... :(

Can anyone point us to a build that fixes the issue?

@xiaoyongzhu

@hpnhxxwn OK, you are right: the issue exists in my app too. The above example runs successfully, but my TensorFlow app still crashes :(

@reedwm
Member

reedwm commented Nov 29, 2017

The fix did not make it into 1.4, but should be in nightly.

@hpnhxxwn, does the above example not work or is it a different example that does not work? Can you post your CUDA/cudnn/Python/Ubuntu versions and your GPU? I'm not seeing the issue on Ubuntu 14.04, Python 2.7, Cuda 8 and cuDNN 6 with pip install tf-nightly-gpu.
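
A minimal sketch for collecting part of the version information requested above. It uses only the standard library plus an optional TensorFlow import. It does not report the CUDA or cuDNN versions or the GPU model, which have to be checked separately. `environment_report` is a hypothetical helper name.

```python
import platform


def environment_report():
    """Collect basic version information for a bug report."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
    except ImportError:
        # TensorFlow is not installed in this environment.
        info["tensorflow"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```
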

@xiaoyaozhuzi what is the error message when your TensorFlow app crashes? If possible, can you post a small example that reproduces the problem?

@hpnhxxwn

hpnhxxwn commented Nov 30, 2017

@reedwm

It is my own code. It works with the CPU version, but on GPU it fails with this error. My setup is Ubuntu 16.04, Python 3.5, CUDA 8 and cuDNN 5. I actually tried both TensorFlow 1.4.0 with cuDNN 6 and a local build from the latest TensorFlow source with cuDNN 5, and both have the issue. I also tried the nightly build xiaoyongzhu pointed to yesterday, and it has the same issue with cuDNN 6. The error message is really mysterious and I do not know where to look. However, the code works on CPU TensorFlow.

My code is below. Could you suggest a workaround?

import numpy as np
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense, Dropout,
                          Flatten, Input, MaxPooling2D, Reshape, add)
from keras.models import Model
from keras.optimizers import SGD

# input_shape, input_shape_channels and note_range are defined in the omitted code.

def resnet_model(bin_multiple):

    # input and reshape
    inputs = Input(shape=input_shape)
    reshape = Reshape(input_shape_channels)(inputs)

    # normal convnet layer (have to do one initially to get 64 channels)
    conv = Conv2D(64, (1, bin_multiple * note_range), padding="same", activation='relu')(reshape)
    pool = MaxPooling2D(pool_size=(1, 2))(conv)

    for i in range(int(np.log2(bin_multiple)) - 1):
        print(i)
        # residual block
        bn = BatchNormalization()(pool)
        re = Activation('relu')(bn)
        # integer division so the kernel width stays an int on Python 3
        freq_range = (bin_multiple // (2 ** (i + 1))) * note_range
        print(freq_range)
        conv = Conv2D(64, (1, freq_range), padding="same", activation='relu')(re)

        # add and downsample
        ad = add([pool, conv])
        pool = MaxPooling2D(pool_size=(1, 2))(ad)

    flattened = Flatten()(pool)
    fc = Dense(1024, activation='relu')(flattened)
    do = Dropout(0.5)(fc)
    fc = Dense(512, activation='relu')(do)
    do = Dropout(0.5)(fc)
    outputs = Dense(note_range, activation='sigmoid')(do)

    model = Model(inputs=inputs, outputs=outputs)

    return model

... other code

model = resnet_model(bin_multiple)
init_lr = float(args['init_lr'])
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=init_lr, momentum=0.9),
              metrics=['accuracy', 'mae', 'categorical_accuracy'])
model.summary()

history = model.fit_generator(trainGen.next(), trainGen.steps(), epochs=epochs, verbose=1,
                              validation_data=valGen.next(), validation_steps=valGen.steps(),
                              callbacks=callbacks, workers=8, use_multiprocessing=True)

@reedwm
Member

reedwm commented Nov 30, 2017

@hpnhxxwn I'm not sure what the issue is. Can you post the full error message that occurs? Also if possible can you post the full example? Are you calling conv2d_transpose with zero batch size?
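
One way to check whether a data generator ever feeds a zero-size batch is to wrap it in a validating generator. This is a sketch only, assuming the generator yields `(inputs, targets)` pairs as Keras generators commonly do. `check_batches` is a hypothetical helper, not part of Keras or TensorFlow.

```python
def check_batches(generator, name="generator"):
    """Yield (inputs, targets) batches unchanged, raising on any empty batch."""
    for step, (inputs, targets) in enumerate(generator):
        # len() works for NumPy arrays and lists alike and gives the batch size.
        if len(inputs) == 0:
            raise ValueError("%s produced an empty batch at step %d" % (name, step))
        yield inputs, targets
```

In the code above, the training generator could be wrapped as `check_batches(trainGen.next(), "trainGen")` before it is passed to `fit_generator`. Running a few steps single-process first may make any resulting error easier to read.
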


4 participants