in debug mode, got Assertion `cudaGetLastError() == cudaSuccess' failed #6285
Comments
Just adding some information (we don't currently have a solution or cycles to look at this). We have seen issues like this in the past where the cause was either an ODR violation or a bug in nvcc.
@zheng-xq, do you have any other thoughts? If we can't reproduce this, I will need to close it as unreproducible. Sorry :(.
@kalkaneus Did you resolve the problem? I have a similar issue here.
@endernewton unfortunately not :(
Oh, I actually got it working here. The reason is that I am compiling on my CentOS server without sudo access, so I need to use a customized gcc to compile with blaze. I hadn't set up blaze correctly. Once it was compiled with the right gcc, the problem was gone.
@kalkaneus A few suggestions: Perhaps you can try TensorFlow 1.0? Or take @endernewton's advice and try to ensure you have your build environment set up to use the right gcc? Note that the build command you listed has both If you just want symbols in your binary, you should drop the Let us know whether that helps!
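To make the flag suggestion concrete, here is a minimal sketch of two ways the reported build command could be disentangled. It assumes the flags in question are the `-c opt` and `-c dbg` pair from the build command in this issue. For a single-valued Bazel option such as `-c`, the last occurrence on the command line takes effect, so the reported command is effectively a `dbg` build. The `--copt=-g` flag in the second variant is an addition that does not appear in the thread, and whether it also reaches nvcc-compiled code is not verified here. The script only prints the candidate commands; it does not run Bazel.

```shell
# Sketch only: compose and print two candidate commands rather than running them.
TARGET=//tensorflow/tools/pip_package:build_pip_package

# Variant A: a plain debug build. With -c dbg the toolchain typically omits
# -DNDEBUG, so assert() checks such as the failing one stay enabled.
DEBUG_BUILD="bazel build -c dbg --config=cuda --strip=never $TARGET"

# Variant B (assumption, not from the thread): an optimized build that keeps
# symbols. --copt=-g passes -g to the compiler; --strip=never keeps the
# symbols in the linked binaries.
SYMBOLS_BUILD="bazel build -c opt --copt=-g --config=cuda --strip=never $TARGET"

echo "$DEBUG_BUILD"
echo "$SYMBOLS_BUILD"
```

Variant A matches the stated goal of debugging thoroughly, while Variant B is closer to the non-debug configuration that was reported as working.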
@tatatodd @endernewton thank you for your suggestions. For now I am working on something else, so I'll try this on my machine later when I have extra time and let you know the result. I do want the dbg option, because I want to debug this thoroughly. Without the dbg flag, everything works well so far.
In that case I'll close this issue out. @kalkaneus feel free to file a new issue if you encounter new problems. Thanks!
Environment: Ubuntu 14.04, CUDA 8, cuDNN 5.1, TensorFlow r0.12
Build command
bazel build -c opt --config cuda -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package
Previously, after installing TensorFlow (not in debug mode), my code worked well. But after I rebuilt with the above command, it shows this error in the middle of running the session.
The gdb backtrace at the point of the error (it seems to come from the ReLU operator):
I haven't tried anything yet, as I don't have any idea what causes this error. I tried another architecture, and it worked. The difference is that the architecture with the error has a depthwise layer and the one without the error does not. That doesn't seem connected to an error coming from the ReLU layer, so I'm at a loss. Both architectures worked in non-debug mode.