in debug mode, got Assertion `cudaGetLastError() == cudaSuccess' failed #6285

Closed
aseuteurideu opened this issue Dec 13, 2016 · 8 comments

@aseuteurideu

Environment: Ubuntu 14.04, CUDA 8, cuDNN 5.1, TensorFlow r0.12
Build command:
bazel build -c opt --config cuda -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package

Previously, with TensorFlow installed normally (not in debug mode), my code worked fine. But after rebuilding with the command above, this error appears in the middle of running the session:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcupti.so.8.0 locally
python: external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:262: static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<const float, const float>, const Eigen::TensorMap<Eigen::Tensor<const float, 1, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, const Eigen::TensorMap<Eigen::Tensor<const float, 1, 1, long int>, 16, Eigen::MakePointer> > > >; bool Vectorizable = true]: Assertion `cudaGetLastError() == cudaSuccess' failed.

The gdb backtrace when the error occurs (it appears to come from the ReLU operator):

(gdb) bt
#0 0x00007ff450df6c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ff450dfa028 in __GI_abort () at abort.c:89
#2 0x00007ff450defbf6 in __assert_fail_base (
fmt=0x7ff450f403b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7ff42dc57260 "cudaGetLastError() == cudaSuccess",
file=file@entry=0x7ff42dc57210 "external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h", line=line@entry=262,
function=function@entry=0x7ff42dc58f80 <Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const&, Eigen::GpuDevice const&)::PRETTY_FUNCTION> "static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap"...) at assert.c:92
#3 0x00007ff450defca2 in __GI___assert_fail (
assertion=0x7ff42dc57260 "cudaGetLastError() == cudaSuccess",
file=0x7ff42dc57210 "external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h", line=262,
function=0x7ff42dc58f80 <Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const&, Eigen::GpuDevice const&)::PRETTY_FUNCTION> "static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap"...) at assert.c:101
#4 0x00007ff42ad39672 in Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run (expr=..., device=...)
at external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:260
#5 0x00007ff42ad37a6b in Eigen::TensorDevice<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::GpuDevice>::operator=<Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> > (
this=0x7ff35b7fcfe0, other=...)
at external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h:35
#6 0x00007ff42ad36cb0 in tensorflow::functor::Relu<Eigen::GpuDevice, float>::operator() (
this=0x7ff35b7fd04f, d=..., features=..., activations=...)
at ./tensorflow/core/kernels/relu_op_functor.h:35
#7 0x00007ff42acf48c1 in tensorflow::ReluOp<Eigen::GpuDevice, float>::Operate (this=0x59fde00,
context=0x7ff35b7fd8f0, input=..., output=0x7ff23ce68390)
at ./tensorflow/core/kernels/relu_op.h:40
#8 0x00007ff42acee479 in tensorflow::UnaryElementWiseOp<float, tensorflow::ReluOp<Eigen::GpuDevice, float> >::Compute (this=0x59fde00, context=0x7ff35b7fd8f0)
at ./tensorflow/core/framework/numeric_op.h:62
#9 0x00007ff42c18523c in tensorflow::BaseGPUDevice::Compute (this=0x5964560, op_kernel=0x59fde00,
context=0x7ff35b7fd8f0) at tensorflow/core/common_runtime/gpu/gpu_device.cc:385
#10 0x00007ff42c402c39 in tensorflow::(anonymous namespace)::ExecutorState::Process (
this=0x3d553c80, tagged_node=..., scheduled_usec=1481634999542636)
at tensorflow/core/common_runtime/executor.cc:1364
#11 0x00007ff42c404881 in tensorflow::(anonymous namespace)::ExecutorState::__lambda4::operator() (
__closure=0x3d557a70) at tensorflow/core/common_runtime/executor.cc:1746
#12 0x00007ff42c40ada4 in std::_Function_handler<void(), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(const TaggedNodeSeq&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::__lambda4>::_M_invoke(const std::_Any_data &) (__functor=...)
at /usr/include/c++/4.8/functional:2071
#13 0x00007ff428bcc7a8 in std::function<void ()>::operator()() const (this=0x3d557ab0)
at /usr/include/c++/4.8/functional:2471
#14 0x00007ff42c6f0b80 in tensorflow::thread::EigenEnvironment::ExecuteTask (this=0x59921a8, t=...)
at tensorflow/core/lib/core/threadpool.cc:82
#15 0x00007ff42c6f29fb in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop (this=0x59921a0, thread_id=7)
at external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:167
#16 0x00007ff42c6f1508 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int, tensorflow::thread::EigenEnvironment)::{lambda()#1}::operator()() const
() at external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:58
#17 0x00007ff42c6f3927 in std::_Function_handler<void (), Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int, tensorflow::thread::EigenEnvironment)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/4.8/functional:2071
#18 0x00007ff428bcc7a8 in std::function<void ()>::operator()() const (this=0x59c3bf0)

I haven't tried anything yet, since I have no idea what causes this error. I tried another architecture and it worked; the difference is that the failing architecture has a depthwise layer while the working one does not. That doesn't seem related to an error in the ReLU layer, so I'm at a loss. Both architectures work in non-debug mode.
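For anyone trying to reproduce this, a minimal sketch of the kind of graph involved (a depthwise convolution followed by the ReLU that shows up in the backtrace) could look like the following; the shapes, initializer, and names are made up for illustration and are not from my actual model:

import numpy as np
import tensorflow as tf

# Arbitrary NHWC input; the real model is larger.
x = tf.placeholder(tf.float32, [1, 32, 32, 8])
w = tf.Variable(tf.truncated_normal([3, 3, 8, 1], stddev=0.1))
y = tf.nn.depthwise_conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
y = tf.nn.relu(y)  # the op that appears in the backtrace

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y, feed_dict={x: np.random.rand(1, 32, 32, 8).astype(np.float32)})

Running something like this against the debug build should exercise the same ReLU code path seen in the stack trace above.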

@andydavis1
Contributor

Just adding some information (we don't currently have a solution or cycles to look at this): we have seen issues like this in the past where it was either an ODR violation or a bug in nvcc.

@aselle
Contributor

aselle commented Dec 29, 2016

@zheng-xq, do you have any other thoughts? If we can't reproduce this, I will need to close it as unreproducible. Sorry :(.

@aselle aselle added type:bug Bug stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Dec 29, 2016
@endernewton

@kalkaneus Did you resolve the problem? I have a similar issue here.

@aseuteurideu
Author

@endernewton unfortunately not :(

@endernewton

Oh, I actually got it working here. The reason is that I am compiling on my CentOS server without sudo access, so I have to use a custom gcc, and I hadn't set up Bazel to use it correctly. Once it was compiled with the right gcc, the problem was gone.
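For reference, one way to point the TensorFlow build at a specific gcc (the path below is only a placeholder for whatever toolchain your server provides; adjust as needed) is to export it before running ./configure, for example:

export CC=/path/to/custom/gcc
export GCC_HOST_COMPILER_PATH=/path/to/custom/gcc
./configure
bazel build -c opt --config cuda //tensorflow/tools/pip_package:build_pip_package

GCC_HOST_COMPILER_PATH is the variable ./configure uses to tell nvcc which host compiler to invoke, so it is worth double-checking when the system gcc is not the one you intend to use.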

@aselle aselle removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Feb 27, 2017
@tatatodd
Contributor

tatatodd commented Mar 1, 2017

@kalkaneus A few suggestions:

Perhaps you can try TensorFlow 1.0? Or take @endernewton's advice and make sure your build environment is set up to use the right gcc.

Note that the build command you listed has both -c opt and -c dbg. That is an uncommon usage:
bazel build -c opt --config cuda -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package

If you just want symbols in your binary, you should drop the -c dbg:
bazel build -c opt --config cuda --strip=never //tensorflow/tools/pip_package:build_pip_package

Let us know whether that helps!

@tatatodd tatatodd added the stat:awaiting response Status - Awaiting response from author label Mar 1, 2017
@aseuteurideu
Author

@tatatodd @endernewton thank you for your suggestions. I'm busy with something else right now, so I'll try this on my machine when I have extra time and let you know the result afterwards.

And I do want the dbg option, since I want to debug this thoroughly. Without the dbg flag, everything has worked well so far.
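If mixing the two modes is the issue, one thing I can try is a plain debug build that drops -c opt entirely (same target as before):

bazel build -c dbg --config cuda --strip=never //tensorflow/tools/pip_package:build_pip_package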

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Mar 2, 2017
@tatatodd
Contributor

tatatodd commented Mar 3, 2017

In that case I'll close this issue out. @kalkaneus feel free to file a new issue if you encounter new problems. Thanks!

@tatatodd tatatodd closed this as completed Mar 3, 2017