GPU sync failed #1450

Closed
Duum opened this issue Mar 10, 2016 · 29 comments
Labels: stat:awaiting response (Status - Awaiting response from author)

Comments

@Duum commented Mar 10, 2016

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 3.51GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x503dc0000 extends to 0x5e45e3000
E tensorflow/stream_executor/cuda/cuda_driver.cc:1099] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed

Hi!
What's wrong here, and how can I solve it? I'm using CUDA 7.5 and cuDNN 7.0, and everything runs fine on CPU, but running on GPU produces the error above.
I can also locate the operation that won't run on GPU:

with tf.device("/cpu:0"):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

When I remove the tf.device("/cpu:0") scope, the bug reported above occurs.

@aymericdamien (Contributor)

It might be because the GTX 970 has memory issues when allocating more than 3.5 GB (see http://wccftech.com/nvidia-geforce-gtx-970-memory-issue-fully-explained/). You can try allocating less than 3.5 GB and check whether it corrects the issue; with per_process_gpu_memory_fraction=0.7 on a 4 GiB card, TensorFlow grabs roughly 2.8 GiB, safely below that limit:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

@vrv commented Mar 10, 2016

Yikes. Good to know @aymericdamien, thanks!

@Duum (Author) commented Mar 11, 2016

It makes no difference...
When I change it to this:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    step = 1

I get the same error:

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.1775
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.60GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 2.52GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x503dc0000 extends to 0x5a54564cc
E tensorflow/stream_executor/cuda/cuda_driver.cc:1099] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed

@vrv commented Mar 11, 2016

Are you building from source, or did you install the pip package? What's your environment? E.g., all the information the template had but you removed :). If you built from source, what command line did you use?

@Duum (Author) commented Mar 11, 2016

I built from source, built a wheel, and installed it with pip. I have verified that

https://github.com/Duum/TensorFlow-Examples/blob/master/examples/3%20-%20Neural%20Networks/convolutional_network.py

runs fine on GPU, but my own code does not.
My machine runs Ubuntu 15.10 with GCC 5.2.1 and CUDA 7.5, so I commented out the GCC version check error in the CUDA code.
My cuDNN version is 7.0, installed with:

 sudo cp lib64/* /usr/local/cuda/lib64/
 sudo cp include/cudnn.h /usr/local/cuda/include/

My configure settings were:

Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 7.0
Please specify the location where cuDNN 7.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished

I live in mainland China, so I changed some code in the WORKSPACE file:


git_repository(
    name = "grpc",
    commit = "403cd6c",
    init_submodules = True,
    remote = "https://github.com/melody-rain/grpc.git",
)

and I also changed the .gitmodules file:

[submodule "google/protobuf"]
    path = google/protobuf
    url = https://github.com/google/protobuf.git
[submodule "third_party/boringssl"]
    path = third_party/boringssl
    url = https://github.com/doubler/boringssl.git

My build commands were:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.7.1-py2-none-linux_x86_64.whl


@mrry mrry added the cuda label Mar 14, 2016
@girving (Contributor) commented Jun 8, 2016

@zffchen78: Can you take a look? Is there any relationship between this and #2471?

@girving (Contributor) commented Jun 9, 2016

@vrv: Reassigning to you per @zffchen78's request.

@girving girving assigned vrv and unassigned zffchen78 Jun 9, 2016
@vrv commented Jun 9, 2016

Pretty sure this is going to be hard for us to debug without being able to reproduce this.

I would suggest:

1. Upgrading your NVIDIA drivers, then either
2a. updating CUDA to 7.5 and cuDNN v4 and installing TensorFlow r0.9, or
2b. updating CUDA to 8.0 and cuDNN v5 and installing TensorFlow from sources,

and then trying again.

@vrv vrv added the stat:awaiting response Status - Awaiting response from author label Jun 9, 2016
@aselle (Contributor) commented Jun 28, 2016

Automatically closing because there was no response. Please reopen if it is still an issue.

@aselle aselle closed this as completed Jun 28, 2016
@kbrems (Contributor) commented Sep 15, 2016

I am getting the same error when I create a simple custom operator that operates on a list of input tensors of type int32. My input tensor has 5 elements, so this is clearly not a memory-limitation issue.

Specifics:
ubuntu 14.04
GeForce GTX TITAN driver version 367.44
cuda 7.5, cudnn v4
binary pip install, TensorFlow GPU version 0.10.0rc0
python 2.7

Build and run the attached source code:
$ python cuda_op_unittest.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN
major: 3 minor: 5 memoryClockRate (GHz) 0.928
pciBusID 0000:05:00.0
Total memory: 5.94GiB
Free memory: 5.45GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN, pci bus id: 0000:05:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN, pci bus id: 0000:05:00.0
I tensorflow/core/common_runtime/direct_session.cc:175] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN, pci bus id: 0000:05:00.0

int32: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] int32: /job:localhost/replica:0/task:0/gpu:0
int32/input_0: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] int32/input_0: /job:localhost/replica:0/task:0/gpu:0
*** running on GPU ***
E tensorflow/stream_executor/cuda/cuda_driver.cc:1140] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Aborted (core dumped)

Key info:
If I run this with tf.device('/cpu:0'), it works.
If I make my inputs and outputs a single tensor instead of a list of one tensor, it also works (this took a while to figure out!). I.e., instead of .Input("input: in_types") in the REGISTER_OP, use .Input("input: int32"); see the sketch below.
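For illustration, here is a minimal sketch of the two registrations (hypothetical op names; the actual op is in the attached zip):

#include "tensorflow/core/framework/op.h"

// The list-of-tensors form, which triggers the GPU sync failure:
REGISTER_OP("MyOpList")
    .Attr("in_types: list(type)")
    .Input("input: in_types")      // a list of tensors (here, a list of one)
    .Output("output: in_types");

// The single-tensor form, which runs fine:
REGISTER_OP("MyOpSingle")
    .Input("input: int32")         // one plain int32 tensor
    .Output("output: int32");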

Notes:
Based on the response to #4387, some research led me here: http://stackoverflow.com/questions/37439299/no-gpu-kernel-for-an-int32-variable-op. It seems that TensorFlow does not really support GPU operators on integer tensors, and adding that support is difficult. In the interim, though, better documentation of integer tensor support and a meaningful error message would be preferable to a core dump :). A sketch of the usual workaround follows.
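Per the Stack Overflow thread above, many builtin kernels sidestep this by registering a GPU kernel whose int32 tensors stay in host memory. A minimal sketch, assuming the AddOneOp kernel from the attached example (whether this is appropriate depends on the op):

// Registers the op for GPU placement, but pins the int32 input/output to host
// memory, so the CPU functor does the actual work.
REGISTER_KERNEL_BUILDER(Name("AddOne")
                            .Device(DEVICE_GPU)
                            .HostMemory("input")
                            .HostMemory("output"),
                        AddOneOp<CPUDevice>);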

issue1450.zip

@vrv commented Sep 15, 2016

Does it work if you define the input type as int64?

@kbrems (Contributor) commented Sep 15, 2016

I can't build a custom operator with type int64:

karenbre@karenZ820:~/workspace/issue1450$ ./build.sh
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
In file included from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/type_traits.h:22:0,
                 from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/allocator.h:25,
                 from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:22,
                 from cuda_op_kernel.cc:17:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/types.h: In instantiation of ‘struct tensorflow::DataTypeToEnum<long int>’:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h:587:46:   required from ‘typename tensorflow::TTypes<T, NDIMS>::ConstTensor tensorflow::Tensor::shaped(tensorflow::gtl::ArraySlice<long long int>) const [with T = long int; long unsigned int NDIMS = 1ul; typename tensorflow::TTypes<T, NDIMS>::ConstTensor = Eigen::TensorMap<Eigen::Tensor<const long int, 1, 1, long int>, 16>]’
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h:354:40:   required from ‘typename tensorflow::TTypes<T>::ConstFlat tensorflow::Tensor::flat() const [with T = long int; typename tensorflow::TTypes<T>::ConstFlat = Eigen::TensorMap<Eigen::Tensor<const long int, 1, 1, long int>, 16>]’
cuda_op_kernel.cc:64:45:   required from ‘void AddOneOp<Device>::Compute(tensorflow::OpKernelContext*) [with Device = Eigen::ThreadPoolDevice]’
cuda_op_kernel.cc:80:80:   required from here
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/types.h:136:3: error: static assertion failed: Specified Data Type not supported
 static_assert(IsValidDataType<T>::value, "Specified Data Type not supported");
 ^
In file included from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/device_base.h:23:0,
                 from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:25,
                 from cuda_op_kernel.cc:17:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h: In instantiation of ‘typename tensorflow::TTypes<T, NDIMS>::ConstTensor tensorflow::Tensor::shaped(tensorflow::gtl::ArraySlice<long long int>) const [with T = long int; long unsigned int NDIMS = 1ul; typename tensorflow::TTypes<T, NDIMS>::ConstTensor = Eigen::TensorMap<Eigen::Tensor<const long int, 1, 1, long int>, 16>]’:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h:354:40:   required from ‘typename tensorflow::TTypes<T>::ConstFlat tensorflow::Tensor::flat() const [with T = long int; typename tensorflow::TTypes<T>::ConstFlat = Eigen::TensorMap<Eigen::Tensor<const long int, 1, 1, long int>, 16>]’
cuda_op_kernel.cc:64:45:   required from ‘void AddOneOp<Device>::Compute(tensorflow::OpKernelContext*) [with Device = Eigen::ThreadPoolDevice]’
cuda_op_kernel.cc:80:80:   required from here
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h:587:46: error: ‘v’ is not a member of ‘tensorflow::DataTypeToEnum<long int>’
 CheckTypeAndIsAligned(DataTypeToEnum<T>::v());
 ^
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h: In instantiation of ‘typename tensorflow::TTypes<T, NDIMS>::Tensor tensorflow::Tensor::shaped(tensorflow::gtl::ArraySlice<long long int>) [with T = long int; long unsigned int NDIMS = 1ul; typename tensorflow::TTypes<T, NDIMS>::Tensor = Eigen::TensorMap<Eigen::Tensor<long int, 1, 1, long int>, 16>]’:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h:284:40:   required from ‘typename tensorflow::TTypes<T>::Flat tensorflow::Tensor::flat() [with T = long int; typename tensorflow::TTypes<T>::Flat = Eigen::TensorMap<Eigen::Tensor<long int, 1, 1, long int>, 16>]’
cuda_op_kernel.cc:70:57:   required from ‘void AddOneOp<Device>::Compute(tensorflow::OpKernelContext*) [with Device = Eigen::ThreadPoolDevice]’
cuda_op_kernel.cc:80:80:   required from here
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/framework/tensor.h:546:46: error: ‘v’ is not a member of ‘tensorflow::DataTypeToEnum<long int>’
 CheckTypeAndIsAligned(DataTypeToEnum<T>::v());
 ^

@kbrems (Contributor) commented Sep 15, 2016

Note, it does work with int16.

@lhao0301

I just met the same problem, whether I installed TensorFlow from source or from the official binary (the installation itself completed without problems).
GPU: GTX TITAN X (12 GB)
CUDA: 7.5 + cuDNN v5.1
E tensorflow/stream_executor/cuda/cuda_driver.cc:1140] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Aborted (core dumped)

@matpalm commented Sep 22, 2016

Also saw this last night, after two hours running at 80% GPU utilization:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1140] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed

TITAN X (Pascal)

$ ls -l /usr/local/cuda/lib64/libcud*
-rw-r--r-- 1 root root   560184 Sep 15 19:53 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root       16 Sep 15 19:53 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.8.0
lrwxrwxrwx 1 root root       19 Sep 15 19:53 /usr/local/cuda/lib64/libcudart.so.8.0 -> libcudart.so.8.0.27
-rwxr-xr-x 1 root root   394472 Sep 15 19:53 /usr/local/cuda/lib64/libcudart.so.8.0.27
-rw-r--r-- 1 root root   737516 Sep 15 19:53 /usr/local/cuda/lib64/libcudart_static.a
-rwxr-xr-x 1 root root 79337624 Sep 15 20:08 /usr/local/cuda/lib64/libcudnn.so
-rwxr-xr-x 1 root root 79337624 Sep 15 20:08 /usr/local/cuda/lib64/libcudnn.so.5
-rwxr-xr-x 1 root root 79337624 Sep 15 20:08 /usr/local/cuda/lib64/libcudnn.so.5.1.5

$ cd ~/dev/tensorflow/
$ git rev-parse HEAD
503a202761877250f1b268041a5bab14dad2b2ca

$ bazel version
.
Build label: 0.3.1
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jul 29 09:09:52 2016 (1469783392)
Build timestamp: 1469783392
Build timestamp as int: 1469783392

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Sep 22, 2016
@aselle aselle reopened this Sep 22, 2016
@aselle aselle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Sep 22, 2016
@yash0307 commented Sep 26, 2016

I am getting something similar during back-propagation. Bottleneck generation works fine.

E tensorflow/stream_executor/cuda/cuda_driver.cc:1140] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Aborted (core dumped)

@vrv commented Oct 13, 2016

@matpalm: Has it happened consistently since? These kinds of one-off failures can happen if there are GPU hardware issues. @yash0307, same question: does it happen immediately, or only after a while?

@kbrems can you include the int64 code? int64 should definitely compile, and I can't figure out the error from the compiler output alone.

@vrv vrv added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Oct 13, 2016
@matpalm commented Oct 13, 2016

I've not seen it again, and I have been running similar jobs (in terms of GPU utilization and memory load) almost every night since.

@kbrems (Contributor) commented Oct 13, 2016

Here is the example with int64. I just pulled the latest source from TensorFlow master this morning, tried again, and it still does not compile.
issue1450int64.zip

@vrv commented Oct 13, 2016

Your in_types looks to be int16, not int64; I'm not sure that's the only problem, though. Other than that, this seems like something we do all the time in other kernels, so I'm not sure why it's not compiling.

@kbrems (Contributor) commented Oct 13, 2016

My search/replace failed to catch that. I changed the in_types to int64, but it still does not compile.

@vrv commented Oct 14, 2016

Even though we typedef int64 to int64_t, I think you need to use int64, not int64_t. The following simpler code (which doesn't add one, but for illustration) compiled for me:

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"

using namespace tensorflow;

REGISTER_OP("AddOne")
    .Input("input: int64")
    .Output("output: int64")
    .Doc(R"doc(
Adds 1 to all elements of the tensor.

output: A Tensor.
  output = input + 1
)doc");

typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;

template <typename Device>
class AddOneOp : public OpKernel {
 public:
  explicit AddOneOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
    // Grab the input tensor
    const Tensor& input_tensor = context->input(0);
    auto input = input_tensor.flat<int64>();

    // Create an output tensor
    Tensor* output_tensor = NULL;
    OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
                                                     &output_tensor));
    auto output = output_tensor->template flat<int64>();
    output = input;
  }
};

REGISTER_KERNEL_BUILDER(Name("AddOne").Device(DEVICE_CPU), AddOneOp<CPUDevice>);

@vrv vrv closed this as completed Oct 14, 2016
@kbrems (Contributor) commented Oct 14, 2016

It seems that somewhere deep within Eigen, int64 is defined as long long int, but on 64-bit Ubuntu, int64_t is defined as long int in stdint.h, so the two are not compatible (see the sketch below). I can work around that in this simple example, but it means that all our custom CUDA kernels would then have to depend on Eigen types instead of the standard Linux types.
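To make the mismatch concrete, a minimal standalone sketch (assuming a typical 64-bit Linux toolchain; tf_like_int64 is a hypothetical stand-in for tensorflow::int64):

#include <cstdint>
#include <type_traits>

// Same width, different types: type-keyed templates such as
// tensorflow::DataTypeToEnum<T> match one and not the other.
using tf_like_int64 = long long int;   // what tensorflow::int64 boils down to
static_assert(sizeof(tf_like_int64) == sizeof(std::int64_t),
              "both are 8 bytes wide");
static_assert(!std::is_same<tf_like_int64, std::int64_t>::value,
              "but they are distinct types on 64-bit Linux");

int main() { return 0; }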

On the plus side, my original issue with int32 generating the GPU sync error and core dump seems to have gone away with release 0.11.0rc0 (built from latest source), though I have also upgraded to CUDA 8.0 since the original problem, so perhaps that is what fixed it.

@angup143 commented Jun 19, 2017

I have a similar problem:
CUDA 8, cuDNN v5.1 on a Titan X
using Keras with tensorflow-gpu==1.1.0

E tensorflow/stream_executor/cuda/cuda_driver.cc:1067] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
2017-06-19 17:12:19.722285: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Aborted (core dumped)

It occurs intermittently during training (usually after a few epochs).

@felixthewhale commented Dec 4, 2017

Same problem
2017-12-04 03:27:19.316336: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_driver.cc:1110] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS :: No stack trace available
Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
Code:


original = tf.Variable(img, dtype=tf.float32)
x = tf.Variable(img, dtype=tf.float32)
clique = x * x  # tf.multiply(x, x) / tf.square(x)

optimizer = tf.train.GradientDescentOptimizer(1e-2)
train = optimizer.minimize(clique)
init = tf.global_variables_initializer()  # tf.initialize_all_variables()
optimize()

It works perfectly on CPU (when CUDA_VISIBLE_DEVICES=-1). It is strange: when I use tf.add() it works on GPU, but tf.multiply() and tf.square() (I have not tested other math functions) give the error.

CUDA and cuDNN 8, Windows 10, GTX 1050 Ti, TensorFlow 1.4 pip install.

@fmbao commented Apr 4, 2018

I met the same problem, and I finally solved it by decreasing the batch size. It's strange, because I could run this program with a bigger batch size before.

@AbhinavBijalwan

with tf.device("/cpu:0"):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Q: What should the learning_rate be?

@wzhings commented Dec 9, 2019

I met the same issue when running code with Keras on GPU. I solved it after releasing memory. It is highly probable that you do not have enough memory available; that is also why others have said reducing the batch size works. Good luck.

@wuwu-0502

I had the same problem. My batch size was 64, and I changed it to 32. Then it ran.
