
[XLA GPU] int8 convolution on CUDA #30771

Merged

Conversation

yongfeng-nv (Contributor)

This is a breakdown of the previous PR (#29158), per @timshen91's suggestion. It doesn't depend on #30761 or #30762.

  1. Allow convolutions with integer input/kernel and float output/bias/scaling/side input, and disallow int8-to-int8 convolution nodes in XLA.
  2. Add a new traversal to cudnn_fused_convolution_rewriter that fuses the clamping and data-type conversion nodes into the convolution custom-call node for int8 convolution (see the sketch below).
  3. Set convolution layout constraints to NHWC for integer convolutions.
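For context, here is a minimal, self-contained sketch (plain C++, not the actual XLA code) of the arithmetic that step 2's fused pattern implements when the output is int8: the float convolution result is clamped to the int8 range and then converted, which is exactly the clamp + convert pair the rewriter folds into the cuDNN custom call.

```cpp
#include <algorithm>
#include <cstdint>

// Illustration only: what the fused clamp + convert computes for an int8
// output. This mirrors the HLO pattern
//   convert<s8>(clamp(broadcast(-128), conv_result, broadcast(127)))
// that the rewriter fuses into the convolution custom call.
int8_t ClampAndConvert(float conv_result) {
  const float clamped = std::min(std::max(conv_result, -128.0f), 127.0f);
  return static_cast<int8_t>(clamped);
}
```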

@tensorflow-bot tensorflow-bot bot added the size:M CL Change Size: Medium label Jul 16, 2019
@gbaned gbaned self-assigned this Jul 17, 2019
@gbaned gbaned added the comp:xla XLA label Jul 17, 2019
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 17, 2019
@gbaned gbaned requested a review from timshen91 July 17, 2019 04:08
timshen91 previously approved these changes Jul 17, 2019
tensorflow/compiler/xla/service/gpu/cudnn_conv_rewriter.cc (review thread, outdated)
@tensorflow-bot tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Jul 17, 2019
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Jul 17, 2019
@tensorflow-bot tensorflow-bot bot added the ready to pull PR ready for merge process label Jul 17, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 17, 2019
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Jul 18, 2019
yongfeng-nv (Contributor, Author)

@timshen91 I have updated the PR to address your comments. Please review and approve it again.

```cpp
// Helper function to create a custom_call instruction to replace the given
// conv instruction.
static StatusOr<HloInstruction*> CreateCustomCall(HloInstruction* conv) {
  CHECK_EQ(conv->opcode(), HloOpcode::kConvolution);
```
yongfeng-nv (Contributor, Author):
Pulled the lambda out into a regular helper function so that it can be used with TF_ASSIGN_OR_RETURN and an error status returns to the caller.
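A minimal sketch of that refactoring, assuming XLA's usual Status macros (the real helper body builds the cuDNN custom call):

```cpp
// TF_ASSIGN_OR_RETURN only works inside a function that itself returns
// Status or StatusOr, which is why the lambda became a regular helper.
static StatusOr<HloInstruction*> CreateCustomCall(HloInstruction* conv) {
  CHECK_EQ(conv->opcode(), HloOpcode::kConvolution);
  // ... build and return the custom-call instruction, or an error Status ...
}

// At the call site, a non-OK Status now propagates to the caller:
TF_ASSIGN_OR_RETURN(HloInstruction* custom_call, CreateCustomCall(conv));
```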

gbaned (Contributor) commented Jul 24, 2019

@yongfeng-nv Could you please resolve the conflicts? Thanks!

gbaned (Contributor) commented Jul 31, 2019

@yongfeng-nv gentle ping to resolve the conflicts. Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Jul 31, 2019
@rthadur rthadur requested a review from timshen91 August 12, 2019 19:12
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Aug 13, 2019
```cpp
           instr->operand(0)->shape().element_type() == X);
  };
  HloInstruction* convert = match.convert_or_clamp->users()[0];
  if (match.conv->operand_count() < 4 &&
```
yongfeng-nv (Contributor, Author):
The hard-coded 4 is not ideal. I am open to defining it as a named constant in a proper file.
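One possible cleanup, sketched here with a hypothetical constant name (not in the actual code):

```cpp
// Hypothetical replacement for the magic number: a fused convolution
// custom call carries at most four operands -- input, filter, bias,
// and the optional side input.
constexpr int64 kMaxFusedConvOperands = 4;

if (match.conv->operand_count() < kMaxFusedConvOperands && /* ... */) {
  // Room remains to attach another fused operand.
}
```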

```cpp
@@ -260,9 +260,7 @@ StatusOr<std::unique_ptr<HloInstruction>> TryRewriteToCudnnForwardRelu(

// Fuse bias/scaling/ReLU with convolution custom call with floating point
// output
StatusOr<bool> RunFuseBiasSideActivation(
    HloModule* module,
    std::unordered_set<const HloInstruction*>& tracked_custom_calls) {
```
yongfeng-nv (Contributor, Author):
We don't need tracked_custom_calls. In the previous commit, I used it to track partially matched convolutions, especially those with float output.

```diff
@@ -319,22 +312,23 @@ absl::optional<ConvWithConvertOrClamp> FindConvWithClamp(
   using match::Op;

   // The pattern we want to match:
-  // clamp(broadcast(-128), get_tuple_element(custom_call(int8_x,
-  //       int8_w, ...)), broadcast(127));
+  // convert<int8>(clamp(broadcast(-128), get_tuple_element(custom_call(int8_x,
+  //               int8_w, ...)), broadcast(127)));
```
yongfeng-nv (Contributor, Author):
Since only int8 output needs the clamp, and the clamp always comes with a convert, the pattern includes both of them.
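A hedged sketch of what the combined pattern could look like with XLA's pattern-matching DSL (simplified; the real matcher in the rewriter carries more structure):

```cpp
namespace m = match;

// Match convert<int8>(clamp(broadcast(-128), gte(custom-call), broadcast(127)))
// as one unit, since the clamp never appears without the convert.
HloInstruction* gte = nullptr;
auto int8_output_pattern =
    m::Convert(m::Clamp(m::Broadcast(m::ConstantScalar(-128)),
                        m::GetTupleElement(&gte, m::Op()),
                        m::Broadcast(m::ConstantScalar(127))));
if (Match(convert, int8_output_pattern)) {
  // convert is the conversion node; gte captured the custom call's result.
}
```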

yongfeng-nv (Contributor, Author)

The previous implementation required a clamp to [-128, 127] on the output of int8-to-float convolutions. This matches the current behavior of CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM, but the clamp is not supposed to be there for float output. This commit removes the clamp from the corresponding patterns.
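In other words, after this change the two output paths look like this (a simplified sketch of the matched HLO, not the literal rewriter code):

```cpp
// int8 output: clamp and convert are required and get fused away:
//   convert<s8>(clamp(broadcast(-128),
//                     get_tuple_element(custom_call(x, w, ...)),
//                     broadcast(127)))
//
// float output: the custom call's result is used directly; the pattern
// no longer expects a clamp:
//   get_tuple_element(custom_call(x, w, ...))
```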

cheshire previously approved these changes Sep 10, 2019
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Sep 10, 2019
@tensorflow-bot tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Sep 10, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 10, 2019
gbaned (Contributor) commented Sep 11, 2019

@yongfeng-nv Can you please check build failures? Thanks!

@gbaned gbaned added stat:awaiting response Status - Awaiting response from author and removed ready to pull PR ready for merge process labels Sep 11, 2019
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Sep 11, 2019
yongfeng-nv (Contributor, Author)

> @yongfeng-nv Can you please check build failures? Thanks!

@gbaned I have submitted a fix for the failure under "Ubuntu Sanity — Internal CI build failed". Please let me know if there is anything else to fix.

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Sep 11, 2019
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Sep 11, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 11, 2019
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Sep 12, 2019
@gbaned gbaned added the kokoro:force-run Tests on submitted change label Sep 12, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 12, 2019
tensorflow-copybara pushed a commit that referenced this pull request Sep 12, 2019
@tensorflow-copybara tensorflow-copybara merged commit f04619b into tensorflow:master Sep 12, 2019
PR Queue automation moved this from Approved by Reviewer to Merged Sep 12, 2019
cheshire (Member)

@yongfeng-nv Merged, but this required a few changes: please try to take care of compiler warnings before submitting. I'll try to upgrade those to errors on the presubmission testing bots.

yongfeng-nv (Contributor, Author)

> @yongfeng-nv Merged, but this required a few changes: please try to take care of compiler warnings before submitting. I'll try to upgrade those to errors on the presubmission testing bots.

Can you show me the log with warnings? I will fix them.

DEKHTIARJonathan pushed a commit to DEKHTIARJonathan/tensorflow that referenced this pull request Mar 17, 2020
DEKHTIARJonathan pushed a commit to DEKHTIARJonathan/tensorflow that referenced this pull request Jul 9, 2020
DEKHTIARJonathan pushed a commit to DEKHTIARJonathan/tensorflow that referenced this pull request Oct 2, 2020
nouiz pushed a commit to nouiz/tensorflow that referenced this pull request Dec 14, 2020
```cpp
  for (int64 i = 0; i < conv->operand_count(); ++i) {
    check_size_increase(conv->operand(i)->shape(), new_input_shapes[i]);
  }
  check_size_increase(result_shape, new_result_shape);
}
```
Contributor:
This is kind of a bad bug: the lambda doesn't return anything, so the lines below don't actually check anything. And apparently there are no unit tests that cover this case.

It went undetected for ~9 months.
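For illustration, a self-contained reconstruction of that failure mode with hypothetical names (the real code lives in XLA's convolution padding pass): a lambda that was meant to gate the rewrite returns nothing, so calling it "checks" nothing.

```cpp
#include <cstdint>
#include <iostream>

int main() {
  const int64_t old_bytes = 100;
  const int64_t new_bytes = 100000;

  // Buggy version: the lambda only logs and returns void, so the call
  // below can never abort the rewrite; the check is a no-op.
  auto check_size_increase = [](int64_t old_b, int64_t new_b) {
    if (new_b > 4 * old_b) {
      std::cerr << "padding would grow the shape too much\n";
    }
  };
  check_size_increase(old_bytes, new_bytes);  // result unused: nothing happens

  // Fixed version: return the verdict and make the caller act on it.
  auto size_increase_ok = [](int64_t old_b, int64_t new_b) {
    return new_b <= 4 * old_b;
  };
  if (!size_increase_ok(old_bytes, new_bytes)) {
    std::cout << "skipping rewrite\n";
    return 0;
  }
  std::cout << "applying rewrite\n";
  return 0;
}
```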

Labels: cla: yes, comp:xla, ready to pull, size:M

Projects: PR Queue (Merged)