Enabling a debug dll build under Windows (without CUDA, at least) #42307
Conversation
…se a SpaceToDepthOpFunctor/DepthToSpaceOpFunctor struct with a template parameter Device=GPUDevice in case the class itself is instantiated with Device=CPUDevice. Added a partial template specialization for Device=GPUDevice to preserve the behaviour in all cases.
@@ -34,12 +33,14 @@ limitations under the License.
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/util/tensor_format.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
The order of includes changed according to the clang-format tool with -style=google.
@@ -143,6 +127,92 @@ class DepthToSpaceOp : public OpKernel {
  TensorFormat data_format_;
};

// Template specialization for GPUDevice, explicit referncing GPUDevice in code
End the comment with a period; also there's a typo in "referncing".
Done
    return;
  }
}

// NOTE: Assumes data_format_ == FORMAT_NHWC here, since we have rejected
// (CPU && data_format_ != FORMAT_NHWC) in the constructor.
Can we specialize just this part that's different between GPU and CPU? Same comment for SpaceToDepthOp.
That's what I tried to do initially - specializing only the Compute() function - but AFAIU there is no way to specialize part of a class definition only: the template signature of the function has to match that of the class, so to partially specialize the function you need to partially specialize the whole class.
Moving the GPU-specialized code into a separate function would result in the same situation - the class still needs to be specialized.
One could probably get around this with some inheritance trickery, but IMO that would make this messier than it should be. Not that I like the code duplication much more...
I mean something like:
template <class Device>
struct DepthToSpaceFunctorWrapper {
  void operator()(params) {
    // .. generic implementation
  }
};
template <>
struct DepthToSpaceFunctorWrapper<GPUDevice> {
  void operator()(params) {
    // .. GPU implementation
  }
};
void Compute(...) {
  // Common stuff ..
  auto Toutput = outputs_tensor->tensor<T, kDims>();
  DepthToSpaceFunctorWrapper<Device>{}(params as needed);
}
Wait, isn't there a much simpler solution? Inside this if clause, Device is guaranteed to be GPUDevice. So why not replace functor::DepthToSpaceOpFunctor<GPUDevice, T, FORMAT_NCHW> functor; with functor::DepthToSpaceOpFunctor<Device, T, FORMAT_NCHW> functor; inside the clause? This should be identical, no? If the condition is false, we never reach the code with the "wrong" device; if it is true, the code is literally the same.
This reverts commit 7bf4aa5.
…to not use a SpaceToDepthOpFunctor/DepthToSpaceOpFunctor struct with a template parameter Device=GPUDevice in case the class itself is instantiated with Device=CPUDevice. Added a partial template specialization for Device=GPUDevice to preserve the behaviour in all cases." This reverts commit 132e8af.
…unctor::SpaceToDepthOpFunctor<GPUDevice, ...> in case functor::SpaceToDepthOp<CPUDevice, ...> is compiled
Please check out the much more concise version of the change :) UPD: Hm. It's more concise all right, but the debug build does not work again. It used to complain about missing symbols for the () operator of
Modified the SpaceToDepthOp and DepthToSpaceOp templated classes not to use a SpaceToDepthOpFunctor/DepthToSpaceOpFunctor struct with a template parameter Device=GPUDevice in case the class itself is instantiated with Device=CPUDevice. Added a partial template specialization for Device=GPUDevice to preserve the existing behaviour in all cases.
This at least partially (i.e. when bazel is configured not to use CUDA) fixes issue #41118.