CPU kernels for FFT (WIP) #9029
Conversation
Can one of the admins verify this patch?
Thanks! Looks really good.
This PR does not yet support RFFTs. These can in principle be implemented by slicing such that the negative frequency components of the FFT are removed. Maybe @benoitsteiner has some better ideas though.
Can I get you to tackle that in this PR? ;) Or were you thinking of leaving it as C2C only?
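To illustrate why slicing works, here is a toy pure-Python sketch (a naive O(n²) DFT standing in for the Eigen tensor FFT; the `dft` helper and the sample signal are made up for illustration). For real input the spectrum is conjugate-symmetric, so an RFFT is just the first n//2 + 1 bins of the full C2C transform:

```python
import cmath

def dft(x):
    """Naive C2C DFT, standing in for the Eigen tensor FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 0.5, -3.0, 4.0, 1.5]
spectrum = dft(signal)
n = len(signal)

# For real input, bin n-k is the conjugate of bin k, so the upper half of
# the spectrum is redundant: an RFFT can be emulated by slicing the first
# n//2 + 1 bins off a full C2C FFT ("slicing away" the rest).
rfft = spectrum[:n // 2 + 1]
for k in range(1, n):
    assert abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9
```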
Does it make sense to split up the code into multiple files to avoid having `#if GOOGLE_CUDA` halfway through?
I'll let a TF person chime in on code organization. It'd be nice for the FFT functor to live in its own file (as is common in core/kernels) and for the GPU stream-executor code to also be pushed into an FFTFunctor specialization for DEVICE_GPU. That would allow more code sharing between the CPU and GPU kernels. I'm not sure that needs to happen in this PR, though, and I don't feel it's my call.
@rryan, because the tensor FFT in Eigen is templated, I couldn't avoid bloating the binary a bit as discussed via e-mail.
It looks fine to me as is.
tensorflow/core/kernels/fft_ops.cc (outdated)

```cpp
}
};

REGISTER_KERNEL_BUILDER(Name("FFT").Device(DEVICE_CPU), FFTCPU<true, false, 1>);
```
We have some internal registrations of these kernels for DEVICE_CPU. I'm not sure what the best way to avoid conflict is. I think we can use PLATFORM_GOOGLE, which should only be defined internally. @benoitsteiner / @rmlarsen? I don't see any other uses within core/kernels, so it'd be a first.
Not sure if this would be the right solution, but one way might be to just name these kernels FFT_V2 etc. and have internal Python wrappers call the old versions, while the third_party/tensorflow code would call the new ones?
Hm, yea that's an option, though it'd be nice for internal graphs with FFTs to be compatible with open source TF.
tensorflow/core/kernels/fft_ops.cc (outdated)

```cpp
// TODO(tillahoffmann): Support RFFTs by slicing away the negative
// frequency components.
OP_REQUIRES(ctx, !IsReal(), errors::Internal("Real FFT not supported."));
auto input = in.flat_inner_dims<complex64, FFTRank + 1>();
```
I'd slightly prefer to avoid the above reinterpret_cast since I'm not sure converting a `TensorMap<const T, ...>*` to a `TensorMap<T, ...>*` will always be undefined-behavior free. Maybe someone more expert than me at interpreting the standard or Eigen can chime in :). Since Tensor types are lightweight wrappers around reference-counted buffers, you can copy away the const-ness cheaply:

```cpp
Tensor nonconst_in = in;
auto input = nonconst_in.flat_inner_dims<complex64, FFTRank + 1>();
```
Yes, much better.
```diff
 }  // end namespace

-class FFTGPUBase : public OpKernel {
+class FFTBase : public OpKernel {
```
Sorry for nitpicking, can you go through the comments here and add periods? Our style guide prefers punctuation:
https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar
The (I)RFFT needs some amount of tensor manipulation. After playing around with the problem a bit, it seems like the following works for the RFFT:

```cpp
auto axes = Eigen::ArrayXi::LinSpaced(FFTRank, 1, FFTRank);
auto device = ctx->eigen_device<CPUDevice>();
auto input = ((Tensor)in).flat_inner_dims<float, FFTRank + 1>();
auto output = out->flat_inner_dims<complex64, FFTRank + 1>();
Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> startIndices;
auto op = (input.template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes))
              .slice(startIndices, output.dimensions());
output.device(device) = op;
```

and for the IRFFT:

```cpp
auto axes = Eigen::ArrayXi::LinSpaced(FFTRank, 1, FFTRank);
auto device = ctx->eigen_device<CPUDevice>();
auto input = ((Tensor)in).flat_inner_dims<complex64, FFTRank + 1>();
auto output = out->flat_inner_dims<float, FFTRank + 1>();
// The first element contains the zero-frequency component which we
// do not want to duplicate. So we reconstruct the complex signal by
// (1) slicing from the second element, (2) reversing the order,
// (3) taking the complex conjugate, (4) concatenating with the original
// input. Note that for an even input length, the last element is the
// Nyquist frequency which we also do not want to duplicate.
Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> startIndices;
startIndices[FFTRank] = 1;
auto sizes = input.dimensions();
if (sizes[FFTRank] % 2 == 0) {
  sizes[FFTRank] -= 1;
}
// Compute the complex conjugate and reverse.
auto cc = input.slice(startIndices, sizes).conjugate().reverse(FFTRank);
// Reconstruct the full FFT.
auto full_fft = input.concatenate(cc, FFTRank);
// Take the inverse.
output.device(device) =
    full_fft.template fft<Eigen::RealPart, Eigen::FFT_REVERSE>(axes);
```

Unfortunately, I can't get it to compile because of some const mismatch.
Here is a minimal example reproducing the same error:

```cpp
#include "unsupported/Eigen/CXX11/Tensor"

int main() {
  // Create input tensor.
  Eigen::Tensor<float, 2> input(5, 5);
  input.setRandom();
  // Create arrays for axes and slicing.
  Eigen::ArrayXi axes(2);
  axes << 0, 1;
  Eigen::DSizes<Eigen::DenseIndex, 2> startIndices;
  Eigen::DSizes<Eigen::DenseIndex, 2> sizes = input.dimensions();
  // Construct the graph and evaluate.
  auto op = input.template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes)
                .slice(startIndices, sizes);
  Eigen::Tensor<std::complex<float>, 2> output = op;
}
```
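For reference, the reconstruction in the IRFFT sketch above (slice from the second bin, drop the Nyquist bin for even lengths, reverse, conjugate, concatenate) can be checked in a few lines of pure Python. The naive `dft`/`idft` helpers and sample values are illustrative stand-ins for the Eigen calls:

```python
import cmath

def dft(x):
    """Naive forward C2C DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse C2C DFT with 1/n scaling."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

signal = [1.0, 2.0, 0.5, -3.0, 4.0, 1.5]   # even length, so a Nyquist bin exists
n = len(signal)
half = dft(signal)[:n // 2 + 1]            # what an RFFT would hand the IRFFT

# (1) slice from the second bin, (2) drop the Nyquist bin (even length only),
# (3) reverse and conjugate, (4) concatenate with the original half spectrum.
tail = half[1:-1] if n % 2 == 0 else half[1:]
mirror = [c.conjugate() for c in reversed(tail)]
full = half + mirror

# Inverting the reconstructed full spectrum recovers the real signal.
recovered = [z.real for z in idft(full)]
assert all(abs(a - b) < 1e-9 for a, b in zip(recovered, signal))
```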
@rryan @benoitsteiner any guidance here?
tensorflow/core/kernels/fft_ops.cc (outdated)

```cpp
                        FFTCPU<false, false, 3>);

#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"
```
Shouldn't we move this to the top of the file to avoid including stream_executor.h inside the tensorflow namespace?
```python
# GPU/Forward
self.assertAllClose(x_np, x_tf, rtol=1e-4, atol=1e-4)
```
Please update this comment; the code isn't GPU-only anymore.
```python
# GPU/Backward
self.assertAllClose(x_np, x_tf, rtol=1e-4, atol=1e-4)
```
Same as above, this isn't limited to GPUs anymore.
Jenkins, test this please.
tensorflow/core/kernels/fft_ops.cc (outdated)

```cpp
auto output = out->flat_inner_dims<float, FFTRank + 1>();
FFTFunctor<CPUDevice, complex64, float, Eigen::RealPart,
           Eigen::FFT_REVERSE, FFTRank> functor;
functor(ctx->eigen_device<CPUDevice>(), output, full_fft);
```
Based on the compiler errors I'm seeing here:
https://ci.tensorflow.org/job/tensorflow-pull-requests-cpu/4430/console
they all seem to be from the IRFFT here (line 172):

```
tensorflow/core/kernels/fft_ops.cc:172:65: error: no match for call to
'(tensorflow::FFTFunctor<Eigen::ThreadPoolDevice, std::complex<float>, float, 0, 1, 3>) (const Eigen::ThreadPoolDevice&, Eigen::TensorMap<Eigen::Tensor<float, 4, 1, long int>, 16, Eigen::MakePointer>&, Eigen::TensorConcatenationOp<const int, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>, Eigen::TensorReverseOp<const int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_conjugate_op<std::complex<float> >, const Eigen::TensorSlicingOp<const Eigen::DSizes<long int, 4>, const Eigen::DSizes<long int, 4>, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer> > > > >&)'
   functor(ctx->eigen_device<CPUDevice>(), output, full_fft);
no known conversion for argument 3 from
'Eigen::TensorConcatenationOp<const int, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>, Eigen::TensorReverseOp<const int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_conjugate_op<std::complex<float> >, const Eigen::TensorSlicingOp<const Eigen::DSizes<long int, 4>, const Eigen::DSizes<long int, 4>, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer> > > > >'
to
'tensorflow::TTypes<std::complex<float>, 4, long int>::Tensor {aka Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>}'
```

I think one problem is that the `Eigen::TensorConcatenationOp<...>` expression type above is not convertible to `typename TTypes<TInput, FFTRank + 1>::Tensor`. Maybe FFTFunctor needs to be changed to take expressions as arguments instead of TensorMaps? I think it would also work to push the Eigen expressions here (which aren't convertible to TensorMap without an `.eval()`?) into a template specialization of FFTFunctor for TInput=float, TOutput=complex64 and TInput=complex64, TOutput=float?
@rryan, thank you for the comments. Yes, all the compiler issues arise from the IRFFT implementation. I tried forcing an evaluation using `.eval()` but got similar messages. I was wondering whether it makes sense to stick to the functor paradigm in this case, because the functors are just a templated wrapper around Eigen and are not used anywhere else. What are your thoughts on the suggested implementation in #9029 (comment)? And do you have any thoughts on the resulting compiler error?
> I was wondering whether it makes sense to stick to the functor paradigm in this case because the functors are just a templated wrapper around Eigen which are not used anywhere else.

I was thinking it would cut down on instantiations of the Eigen expression in the .o, but I just noticed that forward/backward is part of the TensorFFT template arguments, so we can't reuse the same C2C code path for forward and backward (unless we did the constant scaling for the backward outside of FFTFunctor and always did C2C forward inside of FFTFunctor). Dropping the functor is fine w/ me, but I'll defer to @benoitsteiner.

> What are your thoughts on the suggested implementation in #9029 (comment). And do you have any thoughts on the resulting compiler error?

Argh, sorry -- not sure how I missed your comment, you were already a step ahead of me. Off hand, without trying to compile your example, that smells like TensorFFT doesn't like having a float tensor as its input. Though from skimming TensorFFT, it kind of looks like it's meant to convert `T` to `std::complex<T>` on the fly. Have you already tried casting?

```cpp
auto op = input.cast<std::complex<float>>()
              .template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes)
              .slice(startIndices, sizes);
```
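On the point earlier in this thread about reusing a single forward C2C path with the constant scaling done outside the functor: the inverse transform can indeed be expressed through the forward one, since IDFT(X) = conj(DFT(conj(X))) / n. A toy pure-Python sketch (the naive `dft` helper and sample values are illustrative, not from the PR):

```python
import cmath

def dft(x):
    """Naive forward C2C DFT, the only transform kernel we need."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_via_forward(spec):
    """Inverse transform via the forward kernel: IDFT(X) = conj(DFT(conj(X))) / n,
    so the 1/n scaling can live outside a shared forward-only functor."""
    n = len(spec)
    y = dft([z.conjugate() for z in spec])
    return [z.conjugate() / n for z in y]

x = [1 + 2j, -0.5 + 0j, 3 - 1j, 0 + 4j]
roundtrip = idft_via_forward(dft(x))
assert all(abs(a - b) < 1e-9 for a, b in zip(roundtrip, x))
```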
Hm, after reading TensorFFT a bit, I'm not sure if its TensorTraits are defined properly. It does not define `Scalar`, only `RealScalar`, `ComplexScalar`, `InputScalar` and `OutputScalar`. So the inherited traits `traits<XprType>` defines `Scalar`. `XprType` is `Tensor<float>` in this case, so `Scalar` is `float`. Using TensorConversion as an example (since I'm not familiar enough with Eigen to know what this is supposed to be), it looks like it defines `Scalar` as the type casted to.
Great, thanks for the insight. I think I tried casting without any luck before but might give it another go.
Another option could be to allocate a temporary tensor, do all the preprocessing, assign to the temporary tensor to force evaluation, and then compute the IFFT of the temporary tensor. It would lose some of the elegance of Eigen but might be an acceptable workaround.
Scalar should denote the output type. Does it help if you rename OutputScalar to Scalar? (Or add another typedef for it?)
@tillahoffmann any updates on this PR?
@vrv, there seems to be an issue with the types declared in the FFT operator of Eigen, which means that compiling the (I)RFFT fails (see @rryan's comment above). Unfortunately, I'm not sufficiently familiar with Eigen to be able to fix it or find a workaround. We could either see whether @benoitsteiner or @rmlarsen have some ideas, or we could change this PR to only provide an implementation of the complex FFT.
Okay, @benoitsteiner or @rmlarsen, let us know if you have any ideas on next steps here.
@b11094 are you suggesting a fix for the build failures?
@rmlarsen @benoitsteiner can you provide some guidance here re: @rryan's findings? Would love to get this unblocked and into TF.
We could also consider having the kernels be complex-valued, and insert casting operations on the input and output as appropriate.
Can one of the admins verify this patch?
Reconstructing the full FFT from the RFFT coefficients in Eigen is proving to be a bit tricky. The current PR implements the complex FFT, the IFFT, and the real FFT. The real IFFT is not implemented. I imagine someone with better Eigen knowledge than me would be better suited to implement it.
@tillahoffmann Thanks a lot for this contribution! I think this is a great step forward. If @rryan approves, I think we can merge this and add the IRFFT in a followup.
@tensorflow-jenkins test this please |
@rmlarsen, do you know why the GPU builds are failing? It's complaining that some symbols are not declared:
```cpp
REGISTER_KERNEL_BUILDER(Name("RFFT3D").Device(DEVICE_CPU),
                        FFTCPU<true, true, 3>);

#if GOOGLE_CUDA
```
This conditional include should go at the top of the file.
(This fails because you are including it inside namespace tensorflow.)
Ok, given that there is both CPU and GPU code, would the following work?

```cpp
namespace tensorflow {
// CPU stuff
}  // namespace tensorflow

#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"
namespace tensorflow {
// GPU stuff
}  // namespace tensorflow
#endif  // GOOGLE_CUDA
```
It would, but you still need to put the include at the top of the file:

```cpp
// regular includes ...
#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"
#endif  // GOOGLE_CUDA

namespace tensorflow {
// CPU stuff

#if GOOGLE_CUDA
// GPU stuff
#endif  // GOOGLE_CUDA

}  // namespace tensorflow
```
@rmlarsen, let's hope this compiles. Btw, is there a way to get testing privileges on PRs?
LGTM, thanks! Though I'm still curious about how to handle the internal/external issue. I think merging this will break internal code unless we ifdef the kernel to be open-source-only somehow.
Jenkins, test this please.
@rryan, do you think it would be possible to make the internal code available?
Unfortunately we can't, and I'm not qualified to explain why since I'm not a lawyer. :(
@tillahoffmann thanks! Our test resources are still somewhat limited, so we restrict the privilege to trigger them to people on the team.
@tillahoffmann thanks again for the contribution!
Thanks @tillahoffmann!
Could this get merged into r1.2? Seems like a feature that people are really excited about.
Maybe a stupid question, but along which axis is the FFT performed? Most other libraries have something like an axis parameter.
@beniroquai It's common to have the batch dimensions first and not last.
Ahh! Nice! That's my error. So it transforms along the first dimension?
@beniroquai, yes, the batch dimension (45 in your case) should be the first dimension. The n-dimensional FFT will transform the last n dimensions. I don't expect to see axis support in TensorFlow any time soon because the GPU kernel implementation seems to rely on the transform being along the last dimensions, but I'm happy to stand corrected.
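A toy sketch of the layout described above (batch dimension first, transform over the trailing dimension), using a naive pure-Python DFT in place of the TF kernels; the sample batch is made up for illustration:

```python
import cmath

def dft_last_axis(batch):
    """Transform each row independently; the leading (batch) dimension is
    untouched, mirroring how the TF kernels FFT the trailing dimensions."""
    out = []
    for row in batch:
        n = len(row)
        out.append([sum(row[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n)) for k in range(n)])
    return out

# Shape (3, 4): a batch of 3 signals, each of length 4.
batch = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [2.0, 2.0, 2.0, 2.0]]
spectra = dft_last_axis(batch)
assert len(spectra) == 3 and len(spectra[0]) == 4
# DFT of an impulse at t=0 is all ones; a constant signal has energy only in bin 0.
assert all(abs(z - 1.0) < 1e-9 for z in spectra[0])
assert abs(spectra[2][0] - 8.0) < 1e-9
assert all(abs(z) < 1e-9 for z in spectra[2][1:])
```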
I'd love to add an axis parameter. If someone from the community would like to add it, I think it'd be welcomed by the team. I doubt it's high on anybody's TODO list at the moment, though. I think the code is already generic enough to support it for GPU, since we already specify the "advanced data layout" parameters to configure it to do the last N dimensions. In the worst case, we could perform multiple serial 1D FFTs.
This PR adds CPU kernels for FFTs (cf. #386). A few points to note:

- Does it make sense to split up the code into multiple files to avoid having `#if GOOGLE_CUDA` halfway through?