
[XLA GPU] int8 convolution on CUDA #30771

Merged

Conversation

yongfeng-nv (Contributor)

This is a breakdown of the previous PR (#29158), per @timshen91's suggestion. It doesn't depend on #30761 or #30762.

  1. Allow convolutions with integer input/kernel and float output/bias/scaling/side input, and disallow int8-to-int8 convolution nodes in XLA.
  2. Add a new traversal to cudnn_fused_convolution_rewriter that fuses the clamping and data-type conversion nodes into the convolution custom-call node for int8 convolution (see the sketch below).
  3. Set convolution layout constraints to NHWC for integer convolutions.
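For context, here is a minimal, self-contained sketch (plain C++, not the actual XLA code) of the arithmetic that step 2's fused pattern implements when the output is int8: the float convolution result is clamped to the int8 range and then converted, which is exactly the clamp + convert pair the rewriter folds into the cuDNN custom call.

```cpp
#include <algorithm>
#include <cstdint>

// Illustration only: what the fused clamp + convert computes for an int8
// output. This mirrors the HLO pattern
//   convert<s8>(clamp(broadcast(-128), conv_result, broadcast(127)))
// that the rewriter fuses into the convolution custom call.
int8_t ClampAndConvert(float conv_result) {
  const float clamped = std::min(std::max(conv_result, -128.0f), 127.0f);
  return static_cast<int8_t>(clamped);
}
```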

@tensorflow-bot tensorflow-bot bot added the size:M CL Change Size: Medium label Jul 16, 2019
@gbaned gbaned self-assigned this Jul 17, 2019
@gbaned gbaned added the comp:xla XLA label Jul 17, 2019
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 17, 2019
@gbaned gbaned requested a review from timshen91 July 17, 2019 04:08
timshen91 previously approved these changes Jul 17, 2019
tensorflow/compiler/xla/service/gpu/cudnn_conv_rewriter.cc (review thread, outdated)
@tensorflow-bot tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Jul 17, 2019
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Jul 17, 2019
@tensorflow-bot tensorflow-bot bot added the ready to pull PR ready for merge process label Jul 17, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 17, 2019
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Jul 18, 2019
yongfeng-nv (Contributor, Author)

@timshen91 I have updated the PR to address your comments. Please review and approve it again.

```cpp
// Helper function to create a custom_call instruction to replace the given
// conv instruction.
static StatusOr<HloInstruction*> CreateCustomCall(HloInstruction* conv) {
  CHECK_EQ(conv->opcode(), HloOpcode::kConvolution);
```
yongfeng-nv (Contributor, Author):
Pulled the lambda out into a regular helper function so that it can be used with TF_ASSIGN_OR_RETURN and an error status returns to the caller.
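A minimal sketch of that refactoring, assuming XLA's usual Status macros (the real helper body builds the cuDNN custom call):

```cpp
// TF_ASSIGN_OR_RETURN only works inside a function that itself returns
// Status or StatusOr, which is why the lambda became a regular helper.
static StatusOr<HloInstruction*> CreateCustomCall(HloInstruction* conv) {
  CHECK_EQ(conv->opcode(), HloOpcode::kConvolution);
  // ... build and return the custom-call instruction, or an error Status ...
}

// At the call site, a non-OK Status now propagates to the caller:
TF_ASSIGN_OR_RETURN(HloInstruction* custom_call, CreateCustomCall(conv));
```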

gbaned (Contributor) commented Jul 24, 2019

@yongfeng-nv Could you please resolve the conflicts? Thanks!

gbaned (Contributor) commented Jul 31, 2019

@yongfeng-nv gentle ping to resolve the conflicts. Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Jul 31, 2019
@rthadur rthadur requested a review from timshen91 August 12, 2019 19:12
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Aug 13, 2019
```cpp
           instr->operand(0)->shape().element_type() == X);
  };
  HloInstruction* convert = match.convert_or_clamp->users()[0];
  if (match.conv->operand_count() < 4 &&
```
yongfeng-nv (Contributor, Author):
The hard-coded 4 is not ideal. I am open to defining it as a named constant in a proper file.
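One possible cleanup, sketched here with a hypothetical constant name (not in the actual code):

```cpp
// Hypothetical replacement for the magic number: a fused convolution
// custom call carries at most four operands -- input, filter, bias,
// and the optional side input.
constexpr int64 kMaxFusedConvOperands = 4;

if (match.conv->operand_count() < kMaxFusedConvOperands && /* ... */) {
  // Room remains to attach another fused operand.
}
```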

```cpp
@@ -260,9 +260,7 @@ StatusOr<std::unique_ptr<HloInstruction>> TryRewriteToCudnnForwardRelu(

// Fuse bias/scaling/ReLU with convolution custom call with floating point
// output
StatusOr<bool> RunFuseBiasSideActivation(
    HloModule* module,
    std::unordered_set<const HloInstruction*>& tracked_custom_calls) {
```
yongfeng-nv (Contributor, Author):
We don't need tracked_custom_calls. In the previous commit, I used it to track partially matched convolutions, especially those with float output.

```diff
@@ -319,22 +312,23 @@ absl::optional<ConvWithConvertOrClamp> FindConvWithClamp(
   using match::Op;

   // The pattern we want to match:
-  // clamp(broadcast(-128), get_tuple_element(custom_call(int8_x,
-  //       int8_w, ...)), broadcast(127));
+  // convert<int8>(clamp(broadcast(-128), get_tuple_element(custom_call(int8_x,
+  //               int8_w, ...)), broadcast(127)));
```
yongfeng-nv (Contributor, Author):
Since only int8 output needs the clamp, and the clamp always comes with a convert, the pattern includes both of them.
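A hedged sketch of what the combined pattern could look like with XLA's pattern-matching DSL (simplified; the real matcher in the rewriter carries more structure):

```cpp
namespace m = match;

// Match convert<int8>(clamp(broadcast(-128), gte(custom-call), broadcast(127)))
// as one unit, since the clamp never appears without the convert.
HloInstruction* gte = nullptr;
auto int8_output_pattern =
    m::Convert(m::Clamp(m::Broadcast(m::ConstantScalar(-128)),
                        m::GetTupleElement(&gte, m::Op()),
                        m::Broadcast(m::ConstantScalar(127))));
if (Match(convert, int8_output_pattern)) {
  // convert is the conversion node; gte captured the custom call's result.
}
```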

yongfeng-nv (Contributor, Author)

The previous implementation required a clamp to [-128, 127] on the output of int8-to-float convolutions. This matches the current behavior of CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM, but the clamp is not supposed to be there for float output. This commit removes the clamp from the corresponding patterns.
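In other words, after this change the two output paths look like this (a simplified sketch of the matched HLO, not the literal rewriter code):

```cpp
// int8 output: clamp and convert are required and get fused away:
//   convert<s8>(clamp(broadcast(-128),
//                     get_tuple_element(custom_call(x, w, ...)),
//                     broadcast(127)))
//
// float output: the custom call's result is used directly; the pattern
// no longer expects a clamp:
//   get_tuple_element(custom_call(x, w, ...))
```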

cheshire previously approved these changes Sep 10, 2019
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Sep 10, 2019
@tensorflow-bot tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Sep 10, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 10, 2019
gbaned (Contributor) commented Sep 11, 2019

@yongfeng-nv Can you please check build failures? Thanks!

@gbaned gbaned added stat:awaiting response Status - Awaiting response from author and removed ready to pull PR ready for merge process labels Sep 11, 2019
PR Queue automation moved this from Approved by Reviewer to Reviewer Requested Changes Sep 11, 2019
yongfeng-nv (Contributor, Author)

> @yongfeng-nv Can you please check build failures? Thanks!

@gbaned I have submitted a fix for the failure under "Ubuntu Sanity — Internal CI build failed". Please let me know if there is anything else to fix.

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Sep 11, 2019
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Sep 11, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 11, 2019
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Sep 12, 2019
@gbaned gbaned added the kokoro:force-run Tests on submitted change label Sep 12, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 12, 2019
tensorflow-copybara pushed a commit that referenced this pull request Sep 12, 2019
@tensorflow-copybara tensorflow-copybara merged commit f04619b into tensorflow:master Sep 12, 2019
PR Queue automation moved this from Approved by Reviewer to Merged Sep 12, 2019
cheshire (Member)

@yongfeng-nv Merged, but this required a few changes: please try to take care of compiler warnings before submitting. I'll try to upgrade those to errors on the presubmission testing bots.

yongfeng-nv (Contributor, Author)

> @yongfeng-nv Merged, but this required a few changes: please try to take care of compiler warnings before submitting. I'll try to upgrade those to errors on the presubmission testing bots.

Can you show me the log with warnings? I will fix them.

DEKHTIARJonathan pushed a commit to DEKHTIARJonathan/tensorflow that referenced this pull request Mar 17, 2020
DEKHTIARJonathan pushed a commit to DEKHTIARJonathan/tensorflow that referenced this pull request Jul 9, 2020
DEKHTIARJonathan pushed a commit to DEKHTIARJonathan/tensorflow that referenced this pull request Oct 2, 2020
nouiz pushed a commit to nouiz/tensorflow that referenced this pull request Dec 14, 2020
```cpp
  for (int64 i = 0; i < conv->operand_count(); ++i) {
    check_size_increase(conv->operand(i)->shape(), new_input_shapes[i]);
  }
  check_size_increase(result_shape, new_result_shape);
}
```
Contributor:
This is kind of a bad bug: the lambda doesn't return anything, so the lines below don't actually check anything. And apparently there are no unit tests that cover this case.

It went undetected for ~9 months.
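For illustration, a self-contained reconstruction of that failure mode with hypothetical names (the real code lives in XLA's convolution padding pass): a lambda that was meant to gate the rewrite returns nothing, so calling it "checks" nothing.

```cpp
#include <cstdint>
#include <iostream>

int main() {
  const int64_t old_bytes = 100;
  const int64_t new_bytes = 100000;

  // Buggy version: the lambda only logs and returns void, so the call
  // below can never abort the rewrite; the check is a no-op.
  auto check_size_increase = [](int64_t old_b, int64_t new_b) {
    if (new_b > 4 * old_b) {
      std::cerr << "padding would grow the shape too much\n";
    }
  };
  check_size_increase(old_bytes, new_bytes);  // result unused: nothing happens

  // Fixed version: return the verdict and make the caller act on it.
  auto size_increase_ok = [](int64_t old_b, int64_t new_b) {
    return new_b <= 4 * old_b;
  };
  if (!size_increase_ok(old_bytes, new_bytes)) {
    std::cout << "skipping rewrite\n";
    return 0;
  }
  std::cout << "applying rewrite\n";
  return 0;
}
```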

Labels: cla: yes, comp:xla, ready to pull, size:M

Projects: PR Queue (Merged)