Basic Element-wise Complex Number Calculations Not Available On GPU #3624
Comments
Note: implementations using built-in TensorFlow functions as shown above don't solve the gradient issues caused by the handling of complex numbers:
This code will fail with the following error: E tensorflow/core/client/tensor_c_api.cc:485] Cannot assign a device to node 'gradients/Shape': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
It seems that support for complex64 types is piecemeal, by op-type and device-type. Bringing in @martinwicke for a comment on the policy.
Yes, this is a good feature request bordering on a bug. Please check the Op registrations of the affected ops, and you'll probably find that the templates of many of them are not specialized for complex data types. It is a relatively simple thing to fix, and I'd love PRs that do it.
@sbrodeur: Are you currently working on this?
@ibab: I did not yet attempt a fix. I've looked a little at Eigen. Thus, for the simple calculations here, should I expect Eigen to provide compatible functors, e.g.:
This code is in the file cwise_ops.h. Does this mean the fix is similar to #2263, i.e. just adding the complex64 type when we register the kernels?
Yes, you won't have to implement the operations themselves, you just need to enable them.
You would need to add complex64 and complex128 to the macro (and change it into REGISTER6).
You should make sure that the GPU tests are enabled for
Thanks for the information @ibab! I will attempt a fix myself and send a PR soon!
So far, I can make it work with some operations (add, sub) by simply adding the complex data types when registering the kernels, e.g.:
Compilation errors, however, occur for multiplication (and division), as seen below. It seems that, to solve this problem, people have been using reimplementations of the std::complex type (e.g. from thrust, cuda_complex or cusp) so that it can be used in device code. Would Eigen implementing something similar to what thrust uses solve the issue in TensorFlow?
Compilation output:
INFO: From Compiling tensorflow/core/kernels/cwise_op_gpu_mul.cu.cc:
In file included from /usr/local/cuda-7.5/include/host_config.h:161:0,
warning _FORTIFY_SOURCE requires compiling with optimization (-O)
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(136): error: a value of type "int" cannot be assigned to an entity of type "_ZNSt7complexIfE9_ComplexTE"
(the same error is repeated 12 times)
12 errors detected in the compilation of "/tmp/tmpxft_0000430f_00000000-12_cwise_op_gpu_mul.cu.compute_35.cpp2.i".
Strange, your errors seem to be caused by the fact that Eigen is trying to assign values from an int. I've tried enabling complex
Here is my configuration: GPU: Tesla K40c. I will try with a more recent gcc (e.g. 4.9.2) to see if the compilation problem disappears.
I'm also using
Sadly, I obtain the same errors if I clone and compile the fork ibab/tensorflow@8c3baae without any modifications.
I'm running Scientific Linux 6, which should be virtually identical to Red Hat 6. |
On my side, I'll try a build on my laptop, which runs the latest Debian 8 (Jessie). I don't have an Nvidia GPU, but I should nevertheless be able to compile with CUDA.
Okay, I've rebuilt tensorflow. Edit: Btw, do you also get compiler warnings about calling host functions from device code?
I do not get compiler warnings about calling host functions from device code. I just tried to build on my laptop (Debian 8, up-to-date) with the same configuration; I obtained the same errors, so it does not seem related to gcc or the distribution. I also tried to build with the latest Eigen (3782cd1de9c4) on the Centos 7 machine, and that did not help either. I will try building with CUDA 8, after which I will be clueless about these compilation issues. Edit: same errors with CUDA 8.
I've tried compiling with different compute capabilities, but it still compiled without errors.
@iportillo - I will give it another try today. It would also significantly accelerate my experiments, since everything could run on the GPU. I'll try to see if it would be easy to use CUDABlas directly (rather than Eigen) for the basic math functions on complex numbers. tf.complex_abs is easy to implement on GPU right now:
By tf.exp(), do you mean converting from the Cartesian form to the complex exponential form (angle and norm)? To calculate the angle, this means implementing the atan2 function (for a complex number x + iy):
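As a CPU-side sanity check of the math being discussed (using numpy, not the GPU kernel itself): the angle of x + iy is atan2(y, x) and the norm is sqrt(x^2 + y^2), which is exactly what an element-wise GPU implementation would compute per value.

```python
# CPU sanity check of the angle/norm math, using numpy (illustrative only).
import numpy as np

# Sample points in all four quadrants.
z = np.array([1 + 1j, -1 + 1j, -2 - 2j, 3 - 4j], dtype=np.complex64)

# angle(x + iy) = atan2(y, x); atan2 handles all four quadrants correctly,
# unlike a naive atan(y / x).
angle = np.arctan2(z.imag, z.real)

# |x + iy| = sqrt(x^2 + y^2); hypot computes this while avoiding
# intermediate overflow.
norm = np.hypot(z.real, z.imag)

assert np.allclose(angle, np.angle(z))
assert np.allclose(norm, np.abs(z))
```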
It's not optimized, but works well on GPU.
@benoitsteiner: We're having some problems with implementing the product and div ops for complex numbers. Maybe we would need to switch to something like
I made some progress! I can make the multiplication and division ops work for complex numbers if I specialize the templates in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/cwise_ops.h#L432
It seems more like a hack, but it doesn't involve changes to Eigen for now. Not sure what is wrong with nvcc using scalar_product_op in Eigen for complex numbers; however, it seems tightly related to using the built-in * and / operators for std::complex types.
I can confirm that with the above trick, I can make a lot of very useful functions for complex numbers work on GPU (e.g. square, neg, div, mul, abs). This brings support for complex gradient computation on GPU:
Should I make a PR, or should we investigate the handling of std::complex by nvcc further?
I'm not a maintainer, but I think a PR would definitely be a good idea 👍
I've been privately writing GPU-based complex-valued ops for TF and decided to make my repository public. I think that more general support for computation of complex numbers on the GPU will be valuable to the community. However, since my repository is in its early stages and isn't well tested, I'd like to develop it as a separate project and then port it as a TF pull request when it's more mature. Feel free to make contributions and/or suggestions.
In C++14, std::complex methods are marked as constexpr. This will ensure that they can be used inside CUDA kernels even though they're not marked as __device__. Unfortunately, nvcc doesn't yet support C++14, but we can ask Nvidia to start adding partial support for it, starting with complex numbers.
@iportillo ComplexAbs (and a few others) added here: f216420
After adding a workaround to Eigen, we were able to enable addition, subtraction, division, and multiplication kernels for complex types on GPU: 93f15d4
@sbrodeur Does TensorFlow now support all the operations you need on complex types, or are there additional improvements we need to make?
@benoitsteiner TensorFlow now supports everything I need for handling complex numbers.
Thanks, closing the issue.
Is it possible to divide a complex number by a float without a type cast?
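For comparison, numpy handles this question by dtype promotion: dividing a complex64 array by a float32 scalar just divides the real and imaginary parts and stays complex64, with no explicit cast. A small illustration of the underlying math:

```python
# Illustration with numpy: complex / float needs no explicit cast there,
# because (a + bi) / f = (a / f) + (b / f)i and the result dtype is promoted
# automatically (complex64 / float32 -> complex64).
import numpy as np

z = np.array([2 + 4j, -6 + 8j], dtype=np.complex64)
f = np.float32(2.0)

q = z / f
assert q.dtype == np.complex64
assert np.allclose(q, np.array([1 + 2j, -3 + 4j]))
```

Whether TensorFlow inserts this promotion for you depends on the version; in the TF of this era an explicit tf.cast (or building the complex from real parts) was typically required.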
Basic element-wise addition, subtraction, multiplication or division for any Tensor of type tf.complex64 is not implemented on GPU.
Environment info
Operating System: Centos 7, 3.10.0-327.22.2.el7.x86_64
Installed version of CUDA and cuDNN: CUDA 7.5 and cuDNN 7.0-v4
-rw-r--r--. 1 root root 189170 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudadevrt.a
lrwxrwxrwx. 1 root root 16 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so -> libcudart.so.7.5
lrwxrwxrwx. 1 root root 19 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so.7.5 -> libcudart.so.7.5.18
-rwxr-xr-x. 1 root root 311596 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so.7.5.18
-rw-r--r--. 1 root root 558020 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart_static.a
Tensorflow installed from source:
Build label: 0.3.0-2016-07-22 (@ca36b06)
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jul 22 19:23:10 2016 (1469215390)
Build timestamp: 1469215390
Build timestamp as int: 1469215390
Steps to reproduce
The code returns the following output if run on GPU (works well on CPU):
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4.0.7 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x5168890
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: GeForce GT 610
major: 2 minor: 1 memoryClockRate (GHz) 1.62
pciBusID 0000:01:00.0
Total memory: 1023.19MiB
Free memory: 396.98MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:814] Ignoring gpu device (device: 1, name: GeForce GT 610, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.5.
E tensorflow/core/client/tensor_c_api.cc:485] Cannot assign a device to node 'add': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: add = Add[T=DT_COMPLEX64, _device="/device:GPU:0"](Complex, Complex_1)]]
Traceback (most recent call last):
File "test_div_gpu_prob.py", line 12, in
c = sess.run(c)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 382, in run
run_metadata_ptr)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 655, in _run
feed_dict_string, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 723, in _do_run
target_list, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 743, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'add': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: add = Add[T=DT_COMPLEX64, _device="/device:GPU:0"](Complex, Complex_1)]]
Caused by op u'add', defined at:
File "test_div_gpu_prob.py", line 9, in
c = a + b
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 755, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 70, in add
result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2310, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1232, in init
self._traceback = _extract_stack()
What have you tried?
It would be nice to have such functions transparent with the built-in CPU implementations.