
CPU kernels for FFT (WIP) #9029

Merged: 5 commits merged into tensorflow:master on May 17, 2017

Conversation

@tillahoffmann (Contributor) commented Apr 6, 2017

This PR adds CPU kernels for FFTs (cf #386). A few points to note:

  • This PR does not yet support RFFTs. These can in principle be implemented by slicing such that the negative frequency components of the FFT are removed. Maybe @benoitsteiner has some better ideas though.
  • Does it make sense to split the code into multiple files to avoid having #if GOOGLE_CUDA halfway through the file?
  • @rryan, because the tensor FFT in Eigen is templated, I couldn't avoid bloating the binary a bit as discussed via e-mail.

@tensorflow-jenkins (Collaborator)

Can one of the admins verify this patch?

@rryan (Member) left a comment

Thanks! Looks really good.

This PR does not yet support RFFTs. These can in principle be implemented by slicing such that the negative frequency components of the FFT are removed. Maybe @benoitsteiner has some better ideas though.

Can I get you to tackle that in this PR? ;) Or were you thinking of leaving it as C2C only?

Does it make sense to split up the code into multiple files to avoid having #if GOOGLE_CUDA half way?

I'll let a TF person chime in on code organization. It'd be nice for the FFT functor to live in its own file (as is common in core/kernels) and for the GPU stream-executor code to also be pushed into an FFTFunctor specialization for DEVICE_GPU. That would allow more code sharing between the CPU and GPU kernels. I'm not sure that needs to happen in this PR, though, and I don't feel it's my call.
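Roughly, I'm imagining something like this (an untested sketch; the file and template names are just illustrative):

// fft_functor.h (hypothetical file): shared, device-templated declaration.
#include "tensorflow/core/framework/tensor_types.h"

namespace tensorflow {
namespace functor {

template <typename Device, typename TIn, typename TOut, int FFTRank,
          bool Forward>
struct FFTFunctor {
  // DEVICE_CPU would get an Eigen-based specialization next to the CPU
  // kernel; DEVICE_GPU a stream-executor-based one next to the GPU kernel.
  void operator()(const Device& d,
                  typename TTypes<TOut, FFTRank + 1>::Tensor out,
                  typename TTypes<TIn, FFTRank + 1>::ConstTensor in);
};

}  // namespace functor
}  // namespace tensorflow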

@rryan, because the tensor FFT in Eigen is templated, I couldn't avoid bloating the binary a bit as discussed via e-mail.

It looks fine to me as is.

}
};

REGISTER_KERNEL_BUILDER(Name("FFT").Device(DEVICE_CPU), FFTCPU<true, false, 1>);
Member:

We have some internal registrations of these kernels for DEVICE_CPU. I'm not sure what the best way to avoid conflict is. I think we can use PLATFORM_GOOGLE, which should only be defined internally. @benoitsteiner / @rmlarsen? I don't see any other uses within core/kernels, so it'd be a first.
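Something like this, assuming PLATFORM_GOOGLE is only defined in the internal build (an untested sketch):

// Skip these registrations internally, where equivalent kernels already
// exist, to avoid duplicate-registration conflicts.
#if !defined(PLATFORM_GOOGLE)
REGISTER_KERNEL_BUILDER(Name("FFT").Device(DEVICE_CPU),
                        FFTCPU<true, false, 1>);
#endif  // !defined(PLATFORM_GOOGLE)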

Member:

Not sure if this would be the right solution, but one way might be to just name these kernels FFT_V2 etc. and have internal Python wrappers calling the old versions while the third_party/tensorflow code would call the new ones?

Member:

Hm, yeah, that's an option, though it'd be nice for internal graphs with FFTs to be compatible with open-source TF.

// TODO(tillahoffmann): Support RFFTs by slicing away the negative
// frequency components
OP_REQUIRES(ctx, !IsReal(), errors::Internal("Real FFT not supported."));
auto input = in.flat_inner_dims<complex64, FFTRank + 1>();
Member:

I'd slightly prefer to avoid the above reinterpret_cast since I'm not sure converting a TensorMap<const T, ...>* to a TensorMap<T, ...>* will always be undefined-behavior free. Maybe someone more expert than me at interpreting the standard or Eigen can chime in :).

Since Tensor types are lightweight wrappers around reference-counted buffers, you can copy away the const-ness cheaply.

Tensor nonconst_in = in;
auto input = nonconst_in.flat_inner_dims<complex64, FFTRank + 1>();

@tillahoffmann (Author):

Yes, much better.

} // end namespace

-class FFTGPUBase : public OpKernel {
+class FFTBase : public OpKernel {
Member:

Sorry for nitpicking, but can you go through the comments here and add periods? Our style guide prefers punctuation:
https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar

@tillahoffmann (Author) commented Apr 7, 2017

The (I)RFFT needs some amount of tensor manipulation. After playing around with the problem a bit, it seems like using the FFTFunctor is actually quite restrictive. Instead, it would make sense to build an operation tree as described here. So, for the RFFT I was thinking of

auto axes = Eigen::ArrayXi::LinSpaced(FFTRank, 1, FFTRank);
auto device = ctx->eigen_device<CPUDevice>();
auto input = ((Tensor) in).flat_inner_dims<float, FFTRank + 1>();
auto output = out->flat_inner_dims<complex64, FFTRank + 1>();
Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> startIndices;
auto op = (input.template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes))
  .slice(startIndices, output.dimensions());
output.device(device) = op;

and for the IRFFT

auto axes = Eigen::ArrayXi::LinSpaced(FFTRank, 1, FFTRank);
auto device = ctx->eigen_device<CPUDevice>();
auto input = ((Tensor) in).flat_inner_dims<complex64, FFTRank + 1>();
auto output = out->flat_inner_dims<float, FFTRank + 1>();
// The first entry along the transformed axis contains the zero-frequency
// component, which we do not want to duplicate. So we reconstruct the
// complex signal by (1) slicing from the second element, (2) reversing
// the order, (3) taking the complex conjugate, and (4) concatenating
// with the original input. Note that for an even input length, the last
// element is the Nyquist frequency, which we also do not want to
// duplicate.
Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> startIndices;
startIndices[FFTRank] = 1;
auto sizes = input.dimensions();
if (sizes[FFTRank] % 2 == 0) {
  sizes[FFTRank] -= 1;
}
// Compute the complex conjugate and reverse along the innermost axis.
Eigen::array<bool, FFTRank + 1> reverseAxes;
for (int i = 0; i <= FFTRank; ++i) reverseAxes[i] = (i == FFTRank);
auto cc = input.slice(startIndices, sizes).conjugate().reverse(reverseAxes);
// Reconstruct the full FFT.
auto full_fft = input.concatenate(cc, FFTRank);
// Take the inverse.
output.device(device) = full_fft.template fft<Eigen::RealPart, Eigen::FFT_REVERSE>(axes);

Unfortunately, I can't get it to compile because of some const mismatch:

In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:125:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h:203:51: error: cannot initialize return object of type 'Scalar *' (aka 'float *') with an lvalue of type 'CoeffReturnType *const' (aka 'std::__1::complex<float> *const')
  EIGEN_DEVICE_FUNC Scalar* data() const { return m_data; }

Here is a minimal example reproducing the same error.

#include "unsupported/Eigen/CXX11/Tensor"

int main() {
    // Create input tensor.
    Eigen::Tensor<float, 2> input(5, 5);
    input.setRandom();
    // Create arrays for axes and slicing.
    Eigen::ArrayXi axes(2);
    axes << 0, 1;
    Eigen::DSizes<Eigen::DenseIndex, 2> startIndices;
    Eigen::DSizes<Eigen::DenseIndex, 2> sizes = input.dimensions();
    // Construct the graph and evaluate.
    auto op = input.template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes).slice(startIndices, sizes);
    Eigen::Tensor<std::complex<float>, 2> output = op;
}

Alright, the RFFT is added, but I can't quite get the IRFFT to compile. In particular, the compiler isn't happy about passing the reconstructed full FFT (with negative frequencies) to the functor. Help from someone with more Eigen knowledge would be great. Here's the compiler message.

tensorflow/core/kernels/fft_ops.cc:172:9: error: no matching function for call to object of type 'FFTFunctor<CPUDevice, complex64, float, Eigen::RealPart, Eigen::FFT_REVERSE, 3>' (aka 'FFTFunctor<Eigen::ThreadPoolDevice, complex<float>, float, Eigen::RealPart, Eigen::FFT_REVERSE, 3>')
        functor(ctx->eigen_device<CPUDevice>(), output, full_fft);
        ^~~~~~~
tensorflow/core/kernels/fft_ops.cc:198:25: note: in instantiation of member function 'tensorflow::FFTCPU<true, true, 3>::DoFFT' requested here
                        FFTCPU<true, true, 3>);
                        ^
tensorflow/core/kernels/fft_ops.cc:97:8: note: candidate function not viable: no known conversion from 'Eigen::TensorConcatenationOp<const int, Eigen::TensorMap<Eigen::Tensor<std::__1::complex<float>, 4, 1, long>, 16, MakePointer>, Eigen::TensorReverseOp<const int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_conjugate_op<std::__1::complex<float> >, const Eigen::TensorSlicingOp<const Eigen::DSizes<long, 4>, const Eigen::DSizes<long, 4>, Eigen::TensorMap<Eigen::Tensor<std::__1::complex<float>, 4, 1, long>, 16, MakePointer> > > > >' to 'typename TTypes<complex<float>, 3 + 1>::Tensor' (aka 'TensorMap<Eigen::Tensor<complex<float>, 4, Eigen::RowMajor, long>, Eigen::Aligned>') for 3rd argument
  void operator()(const Device& d,

I've tried casting to typename TTypes<complex64, FFTRank + 1> to force an evaluation of the intermediate state but it wasn't happy with that either.

@drpngx (Contributor) commented Apr 10, 2017

@rryan @benoitsteiner any guidance here?

FFTCPU<false, false, 3>);

#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"
Contributor:

Shouldn't we move this to the top of the file to avoid including stream_executor.h inside the tensorflow namespace?


# GPU/Forward
self.assertAllClose(x_np, x_tf, rtol=1e-4, atol=1e-4)
# GPU/Forward
Contributor:

Please update this comment; the code isn't GPU-only anymore.


# GPU/Backward
self.assertAllClose(x_np, x_tf, rtol=1e-4, atol=1e-4)
# GPU/Backward
Contributor:

Same as above, this isn't limited to GPUs anymore.

@benoitsteiner (Contributor)

Jenkins, test this please.

auto output = out->flat_inner_dims<float, FFTRank + 1>();
FFTFunctor<CPUDevice, complex64, float, Eigen::RealPart,
Eigen::FFT_REVERSE, FFTRank> functor;
functor(ctx->eigen_device<CPUDevice>(), output, full_fft);
Member:

Based on the compiler errors I'm seeing here:
https://ci.tensorflow.org/job/tensorflow-pull-requests-cpu/4430/console
they all seem to be from the IRFFT here (line 172):

tensorflow/core/kernels/fft_ops.cc:172:65: error: no match for call to

'(tensorflow::FFTFunctor<Eigen::ThreadPoolDevice, std::complex<float>, float, 0, 1, 3>) (const Eigen::ThreadPoolDevice&, Eigen::TensorMap<Eigen::Tensor<float, 4, 1, long int>, 16, Eigen::MakePointer>&, Eigen::TensorConcatenationOp<const int, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>, Eigen::TensorReverseOp<const int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_conjugate_op<std::complex<float> >, const Eigen::TensorSlicingOp<const Eigen::DSizes<long int, 4>, const Eigen::DSizes<long int, 4>, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer> > > > >&)'
         functor(ctx->eigen_device<CPUDevice>(), output, full_fft);

no known conversion for argument 3 from 

'Eigen::TensorConcatenationOp<const int, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>, Eigen::TensorReverseOp<const int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_conjugate_op<std::complex<float> >, const Eigen::TensorSlicingOp<const Eigen::DSizes<long int, 4>, const Eigen::DSizes<long int, 4>, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer> > > > >'

to 

'tensorflow::TTypes<std::complex<float>, 4, long int>::Tensor {aka Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>}'

I think one problem is that Eigen::TensorConcatenationOp<const int, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer>, Eigen::TensorReverseOp<const int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_conjugate_op<std::complex<float> >, const Eigen::TensorSlicingOp<const Eigen::DSizes<long int, 4>, const Eigen::DSizes<long int, 4>, Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 4, 1, long int>, 16, Eigen::MakePointer> > > > > is not convertible to typename TTypes<TInput, FFTRank + 1>::Tensor.

Maybe FFTFunctor needs to be changed to take expressions as arguments instead of TensorMaps?

I think it would also work to push the Eigen expressions here (which aren't convertible to TensorMap without an .eval()?) into a template specialization of FFTFunctor for TInput=float, TOutput=complex64 and TInput=complex64, TOutput=float?
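Untested, but something along these lines (FFTFunctorExpr is just an illustrative name):

// Accept any Eigen tensor expression for the input so lazily-built trees
// like full_fft can be passed in without materializing them first.
template <typename Device, typename TOut, int FFTRank, int FFTResultType,
          int FFTDirection>
struct FFTFunctorExpr {
  template <typename InputExpr>
  void operator()(const Device& d,
                  typename TTypes<TOut, FFTRank + 1>::Tensor out,
                  const InputExpr& in) {
    auto axes = Eigen::ArrayXi::LinSpaced(FFTRank, 1, FFTRank);
    out.device(d) = in.template fft<FFTResultType, FFTDirection>(axes);
  }
};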

@tillahoffmann (Author):

@rryan, thank you for the comments. Yes, all the compiler issues arise from the IRFFT implementation. I tried forcing an evaluation using .eval() but got similar messages. I was wondering whether it makes sense to stick to the functor paradigm in this case because the functors are just a templated wrapper around Eigen which are not used anywhere else. What are your thoughts on the suggested implementation in #9029 (comment)? And do you have any thoughts on the resulting compiler error?

Member:

I was wondering whether it makes sense to stick to the functor paradigm in this case because the functors are just a templated wrapper around Eigen which are not used anywhere else.

I was thinking it would cut down on instantiations of the Eigen expression in the .o, but I just noticed that forward/backward is part of the TensorFFT template arguments, so we can't reuse the same C2C code path for forward and backward (unless we did the constant scaling for the backward outside of FFTFunctor and always did C2C forward inside of FFTFunctor). Dropping the functor is fine w/ me, but I'll defer to @benoitsteiner.
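For the record, the constant-scaling variant could use the identity ifft(x) = conj(fft(conj(x))) / N; untested, reusing input, output, axes, and device from the snippets above:

// Reuse the forward C2C path for the inverse transform:
// ifft(x) = conj(fft(conj(x))) / N, with N the number of transformed points.
float n = 1.0f;
for (int i = 1; i <= FFTRank; ++i) n *= static_cast<float>(input.dimension(i));
output.device(device) =
    input.conjugate()
        .template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes)
        .conjugate() * complex64(1.0f / n, 0.0f);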

What are your thoughts on the suggested implementation in #9029 (comment). And do you have any thoughts on the resulting compiler error?

Argh, sorry -- not sure how I missed your comment, you were already a step ahead of me.

Off hand without trying to compile your example, that smells like TensorFFT doesn't like having a float tensor as its input.

Though from skimming TensorFFT, it kind of looks like it's meant to convert T to std::complex<T> on the fly.

Have you already tried casting?

auto op = input
    .template cast<std::complex<float>>()
    .template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes)
    .slice(startIndices, sizes);

Member:

Hm, after reading TensorFFT a bit, I'm not sure its TensorTraits are defined properly. It does not define Scalar, only RealScalar, ComplexScalar, InputScalar and OutputScalar. So the inherited traits<XprType> defines Scalar. XprType is Tensor<float> in this case, so Scalar is float. Using TensorConversion as an example (since I'm not familiar enough with Eigen to know what this is supposed to be), it looks like it defines Scalar as the type being cast to.

@benoitsteiner / @rmlarsen?

@tillahoffmann (Author):

Great, thanks for the insight. I think I tried casting without any luck before but might give it another go.

Another option could be to allocate a temporary tensor, do all the preprocessing, assign to the temporary tensor to force evaluation, and then compute the IFFT of the temporary tensor. It would lose some of Eigen's elegance but might be an acceptable workaround.
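Something like this, untested (full_fft_shape stands in for the shape of the reconstructed FFT):

// Materialize the reconstructed FFT into scratch memory to force
// evaluation, then run the inverse transform on the concrete buffer.
Tensor temp;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_COMPLEX64, full_fft_shape, &temp));
auto buffer = temp.flat_inner_dims<complex64, FFTRank + 1>();
buffer.device(device) = full_fft;
output.device(device) =
    buffer.template fft<Eigen::RealPart, Eigen::FFT_REVERSE>(axes);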

Member:

Scalar should denote the output type. Does it help if you rename OutputScalar to Scalar? (Or add another typedef for it?)
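That is, in the traits of TensorFFTOp, roughly (an untested sketch against the traits discussed above):

// traits<TensorFFTOp<...>> currently inherits Scalar (float here) from
// traits<XprType>; making Scalar the output type could look like:
typedef typename internal::conditional<
    FFTResultType == RealPart || FFTResultType == ImagPart,
    RealScalar, ComplexScalar>::type OutputScalar;
typedef OutputScalar Scalar;  // added so Scalar matches what data() returns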

@drpngx added the stat:awaiting response label Apr 15, 2017
@vrv commented May 1, 2017

@tillahoffmann any updates on this PR?

@tillahoffmann (Author):

@vrv, there seems to be an issue with the types declared in the FFT operator of Eigen, which means that compiling the (I)RFFT fails (see @rryan's comment above). Unfortunately, I'm not sufficiently familiar with Eigen to fix it or find a workaround.

We could either see whether @benoitsteiner or @rmlarsen have some ideas or we could change this PR to only provide an implementation of the complex FFT.

@vrv commented May 6, 2017

Okay, @benoitsteiner or @rmlarsen let us know if you have any ideas on next steps here.

@vrv added the stat:awaiting tensorflower label and removed the stat:awaiting response label May 6, 2017
@drpngx (Contributor) commented May 8, 2017

@b11094 are you suggesting a fix for the build failures?

@ebrevdo (Contributor) commented May 9, 2017

@rmlarsen @benoitsteiner can you provide some guidance here re: @rryan's findings? Would love to get this unblocked and into TF.

@ebrevdo (Contributor) commented May 9, 2017

We could also consider having the kernels be complex-valued, and insert casting operations on the input and output as appropriate.
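In kernel terms, roughly (an untested sketch for the forward RFFT, reusing axes, device, startIndices, and output from the snippets above):

// Keep the transform itself complex-to-complex: cast the real input up to
// complex64, FFT, then slice off the negative-frequency components.
auto real_in = ((Tensor) in).flat_inner_dims<float, FFTRank + 1>();
output.device(device) =
    real_in.template cast<complex64>()
        .template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes)
        .slice(startIndices, output.dimensions());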

@tensorflow-jenkins (Collaborator)

Can one of the admins verify this patch?

@rmlarsen added the stat:awaiting response label and removed the stat:awaiting tensorflower label May 15, 2017
@tillahoffmann (Author):

Reconstructing the full FFT from the RFFT coefficients in Eigen is proving to be a bit tricky. The current PR implements the complex FFT, IFFT and the real FFT. The real IFFT is not implemented. I imagine someone with better Eigen knowledge than me would be better suited to implement it.

@rmlarsen (Member):

@tillahoffmann Thanks a lot for this contribution! I think this is a great step forward. If @rryan approves, I think we can merge this and add the IRFFT in a followup.

@rmlarsen (Member):

@tensorflow-jenkins test this please

@tillahoffmann (Author):

@rmlarsen, do you know why the GPU builds are failing? It's complaining that some symbols are not declared:

In file included from ./tensorflow/stream_executor/kernel.h:80:0,
                 from ./tensorflow/stream_executor/trace_listener.h:22,
                 from ./tensorflow/stream_executor/platform.h:30,
                 from ./tensorflow/stream_executor/cuda/cuda_platform_id.h:19,
                 from ./tensorflow/core/platform/default/stream_executor.h:23,
                 from ./tensorflow/core/platform/stream_executor.h:24,
                 from tensorflow/core/kernels/fft_ops.cc:176:
./tensorflow/stream_executor/lib/array_slice.h:25:19: error: 'tensorflow::tensorflow::gtl' has not been declared
 using tensorflow::gtl::ArraySlice;

REGISTER_KERNEL_BUILDER(Name("RFFT3D").Device(DEVICE_CPU),
FFTCPU<true, true, 3>);

#if GOOGLE_CUDA
@rmlarsen (Member) commented May 16, 2017:

This conditional include should go at the top of the file.

Member:

(This fails because you are including it inside namespace tensorflow.)

@tillahoffmann (Author):

Ok, given that there is both CPU and GPU code, would the following work?

namespace tensorflow {
// CPU stuff
}

#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"

namespace tensorflow {
// GPU stuff
}
#endif

@rmlarsen (Member) commented May 16, 2017:

It would, but you still need to put the include at the top of the file.

// regular includes ...
#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"
#endif  // GOOGLE_CUDA

namespace tensorflow {
// CPU stuff

#if GOOGLE_CUDA
// GPU stuff
#endif  // GOOGLE_CUDA

}  // namespace tensorflow

@tillahoffmann (Author):

@rmlarsen, let's hope this compiles. Btw, is there a way to get testing privileges on PRs?

@rryan (Member) commented May 17, 2017

LGTM, thanks! Though I'm still curious about how to handle the internal/external issue. I think merging this will break internal code unless we ifdef the kernel to be open-source-only somehow.

@drpngx added the awaiting testing (then merge) label and removed the stat:awaiting response label May 17, 2017
@drpngx (Contributor) commented May 17, 2017

Jenkins, test this please.

@tillahoffmann (Author):

@rryan, do you think it would be possible to make the internal code available?

@rryan (Member) commented May 17, 2017

@rryan, do you think it would be possible to make the internal code available?

Unfortunately we can't, and I'm not qualified to explain why since I'm not a lawyer. :(

@rmlarsen (Member) commented May 17, 2017

@tillahoffmann thanks! Our test resources are still somewhat limited, so we keep the privilege of triggering tests to people on the team.

@rmlarsen (Member):

@tillahoffmann thanks again for the contribution!

@drpngx merged commit 6192777 into tensorflow:master May 17, 2017
@tillahoffmann deleted the cpufft2 branch May 17, 2017 17:18
@rryan (Member) commented May 17, 2017

Thanks @tillahoffmann!

@Dref360 commented May 23, 2017

Could this get merged into r1.2? Seems like a feature that people are really excited about.

@beniroquai:

Maybe a stupid question, but along which axes is the FFT performed?
Assume a 3D tensor with 2D images stacked along the 3rd dimension, and I'm doing an fft2d. How would I do it for each 2D slice separately?

Most other libraries have something like np.fft2d(image, s=None, axes=2) or so.
Would be nice to have this feature in TensorFlow too :)

@Androbin (Contributor) commented Jun 15, 2017

@beniroquai It's common to have the batch dimensions first, not last.

@beniroquai:

Ahh, nice! That's my error. So it transforms along the 1st dimension?
Imagine you have an image stack of 128x128x45, where z=45; I should permute it to 45x128x128?
Thank you! :)

@tillahoffmann (Author):

@beniroquai, yes, the batch dimension (45 in your case) should be the first dimension. The n-dimensional FFT will transform the last n dimensions. I don't expect to see axis support in TensorFlow any time soon because the GPU kernel implementation seems to rely on the transform being along the last dimensions, but I'm happy to stand corrected.

@rryan (Member) commented Jun 15, 2017

I don't expect to see axis support in tensorflow any time soon because the GPU kernel implementation seems to rely on the transform being along the last dimensions--but I'm happy to stand corrected.

I'd love to add an axis parameter. If someone from the community would like to add it I think it'd be welcomed by the team. I doubt it's high on anybody's TODO list at the moment though.

I think the code is already generic enough to support it for GPU since we already specify the "advanced data layout" parameters to configure it to do the last N dimensions.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/fft_ops.cc#L308
http://docs.nvidia.com/cuda/cufft/index.html#advanced-data-layout

In the worst case, we could perform multiple serial 1D FFTs.
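For illustration, a standalone cuFFT sketch (not TF code): a batched 1-D transform along the first axis of a row-major rows x cols array only needs the stride parameters reshuffled.

#include <cufft.h>

// In-place batched 1-D C2C FFT down axis 0 of a row-major (rows x cols)
// array: samples along the transform axis are `cols` apart (istride), and
// the `cols` independent columns start at consecutive offsets (idist = 1).
void FFTAlongAxis0(cufftComplex* data, int rows, int cols) {
  cufftHandle plan;
  int n[1] = {rows};
  cufftPlanMany(&plan, /*rank=*/1, n,
                /*inembed=*/n, /*istride=*/cols, /*idist=*/1,
                /*onembed=*/n, /*ostride=*/cols, /*odist=*/1,
                CUFFT_C2C, /*batch=*/cols);
  cufftExecC2C(plan, data, data, CUFFT_FORWARD);
  cufftDestroy(plan);
}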
