
Enabling a debug dll build under Windows (without CUDA, at least) #42307

Merged
5 commits merged into tensorflow:master on Aug 15, 2020

Conversation

MikhailStartsev
Contributor

Modified the SpaceToDepthOp and DepthToSpaceOp templated classes not to use the SpaceToDepthOpFunctor/DepthToSpaceOpFunctor structs with template parameter Device=GPUDevice in case the class itself is instantiated with Device=CPUDevice. Added a partial template specialization for Device=GPUDevice to preserve the existing behaviour in all cases.

This at least partially (i.e. when bazel is configured not to use CUDA) fixes issue #41118.

…se a SpaceToDepthOpFunctor/DepthToSpaceOpFunctor struct with a template parameter Device=GPUDevice in case the class itself is instantiated with Device=CPUDevice. Added a partial template specialization for Device=GPUDevice to preserve the behaviour in all cases.
@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Aug 13, 2020
@@ -34,12 +33,14 @@ limitations under the License.
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/util/tensor_format.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
Contributor Author


The order of includes was changed according to the clang-format tool with -style=google.

@MikhailStartsev
Contributor Author

@rthadur Re-opening PR #42268 with master branch this time.

@gbaned gbaned self-assigned this Aug 13, 2020
@gbaned gbaned added comp:core issues related to core part of tensorflow type:build/install Build and install issues labels Aug 13, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Aug 13, 2020
@gbaned gbaned requested a review from tatianashp August 13, 2020 11:13
@tatianashp tatianashp requested a review from sanjoy August 14, 2020 02:25
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Aug 14, 2020
@@ -143,6 +127,92 @@ class DepthToSpaceOp : public OpKernel {
TensorFormat data_format_;
};

// Template specialization for GPUDevice, explicit referncing GPUDevice in code
Contributor


End the comment with a period; also there is a typo in "referncing".

Contributor Author


Done

return;
}
}

// NOTE: Assumes data_format_ == FORMAT_NHWC here, since we have rejected
// (CPU && data_format_ != FORMAT_NHWC) in the constructor.

Contributor


Can we specialize just this part that's different between GPU and CPU? Same comment for SpaceToDepthOp.

Contributor Author


That's what I tried to do initially - specializing only the Compute() function - but as far as I understand there is no way to specialize just part of a class definition: the template signature of the function has to match that of the class, so to partially specialize the function you need to partially specialize the class.

Moving the GPU-specialized code into a separate function would result in the same situation - the class would still need to be specialized.

One could probably get around this with some inheritance trickery, but IMO that makes this messier than it should be. Not that I like the code duplication much more...

Contributor


I mean something like:

template <class Device>
struct DepthToSpaceFunctorWrapper {
  void operator()(params) {
    // .. generic implementation
  }
};

template <>
struct DepthToSpaceFunctorWrapper<GPUDevice> {
  void operator()(params) {
    // .. GPU implementation
  }
};

void Compute(...) {
  // Common stuff ..
  auto Toutput = outputs_tensor->tensor<T, kDims>();
  DepthToSpaceFunctorWrapper<Device>{}(params as needed);
}

Contributor Author


Wait, isn't there a much simpler solution? Inside this if clause, it is guaranteed that Device is GPUDevice. So why not replace functor::DepthToSpaceOpFunctor<GPUDevice, T, FORMAT_NCHW> functor; with functor::DepthToSpaceOpFunctor<Device, T, FORMAT_NCHW> functor; inside the clause? This should be identical, no? If the condition is false, we never reach the code with the "wrong" device; if it is true, the code is literally the same.


…to not use a SpaceToDepthOpFunctor/DepthToSpaceOpFunctor struct with a template parameter Device=GPUDevice in case the class itself is instantiated with Device=CPUDevice. Added a partial template specialization for Device=GPUDevice to preserve the behaviour in all cases."

This reverts commit 132e8af.
…unctor::SpaceToDepthOpFunctor<GPUDevice, ...> in case functor::SpaceToDepthOp<CPUDevice, ...> is compiled
@MikhailStartsev
Copy link
Contributor Author

MikhailStartsev commented Aug 14, 2020

Please check out the much more concise change version :)

UPD: Hm. It's more concise alright, but the debug build does not work again. It used to complain about missing symbols for the () operator of SpaceToDepthOpFunctor<GPUDevice, ...>; now I get the same linker errors for SpaceToDepthOpFunctor<CPUDevice, ..., FORMAT_NCHW>, because the CPUDevice functor is only instantiated with FORMAT_NHWC as the last template argument...

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Aug 14, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 14, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 14, 2020
@tensorflow-copybara tensorflow-copybara merged commit c33dc04 into tensorflow:master Aug 15, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Aug 15, 2020