
[TF:TRT] Enable TensorRT explicit precision (QDQ/QAT) support #52248

Merged

Conversation

@christopherbate (Contributor) commented Oct 4, 2021

Adds TensorRT QDQ support ("explicit precision mode"), sometimes called "QAT support" after the quantization-aware training algorithm that inserts QDQ nodes. From here on, we refer to the existing non-explicit-precision pathway in the code base as the "dynamic range INT8" (DR INT8) mode and to the new mode as the QDQ INT8 mode.

In the new mode, TF QuantizeAndDequantize operations are converted to TensorRT quantization scaling layers. Both the new QDQ-mode logic and the existing DR-mode logic (ConvertQuantize) are moved into convert/ops/quantization_ops.cc.
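For intuition, the arithmetic a QDQ node represents can be sketched in a few lines. This is purely illustrative (it is not the TF-TRT converter code): symmetric per-tensor INT8 quantize-dequantize, the operation TensorRT realizes as quantization scale layers.

```python
# Illustrative sketch (not TF-TRT code): the symmetric per-tensor INT8
# quantize-dequantize ("fake quantization") arithmetic that a
# QuantizeAndDequantize node represents.

def quantize_dequantize(x, max_abs, num_bits=8):
    """Snap x to the nearest representable INT8 value, then scale back."""
    qmax = 2 ** (num_bits - 1) - 1                # 127 for INT8
    scale = max_abs / qmax                        # single per-tensor scale
    q = max(-qmax, min(qmax, round(x / scale)))   # quantize with clamping
    return q * scale                              # dequantize

# In-range values are snapped to the INT8 grid; out-of-range values are
# clamped to +/- max_abs.
inside = quantize_dequantize(0.5, max_abs=1.0)
clamped = quantize_dequantize(3.0, max_abs=1.0)
```

At inference time TensorRT consumes the scale from such nodes to run the surrounding ops in INT8.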

In addition, in QDQ mode it is necessary to prevent the Grappler optimizations that trt_convert.py invokes on the loaded SavedModel from folding frozen QuantizeAndDequantizeV2 operations between weighted ops (Conv, MatMul, etc.) and their weight constants. We therefore depend on the experimental Grappler rewriter config option experimental_disable_folding_quantization_emulation and will be affected if it is removed. The alternative is to allow Grappler to fold the QDQ ops into the constant weights and to insert identity QDQ scale factors manually during TensorRT network construction, but that logic becomes extremely verbose.
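The folding concern can be pictured with a toy graph model (a hypothetical dict-based representation, not Grappler or the GraphDef API): once a frozen QDQ op is folded into its weight constant, the node carrying the quantization range disappears, so the converter can no longer emit the matching TensorRT scale layers.

```python
# Toy model of the folding problem (illustrative only; this is not
# Grappler or the TF GraphDef API). Folding Const -> QDQ into a single
# constant erases the node that carries the quantization range.

graph = [
    {"op": "Const", "name": "weights"},
    {"op": "QuantizeAndDequantizeV2", "name": "weights_qdq",
     "input": "weights", "input_max": 0.8},
    {"op": "Conv2D", "name": "conv", "input": "weights_qdq"},
]

def fold_qdq_into_weights(nodes):
    """Simulate constant folding of Const -> QDQ into one Const."""
    folded = []
    for node in nodes:
        if node["op"] == "QuantizeAndDequantizeV2":
            continue  # the QDQ node, and its range, disappear
        if node.get("input") == "weights_qdq":
            node = dict(node, input="weights")  # rewire past the folded node
        folded.append(node)
    return folded

folded_graph = fold_qdq_into_weights(graph)
qdq_survives = any(n["op"] == "QuantizeAndDequantizeV2" for n in folded_graph)
```

Disabling this folding keeps the QDQ node (and its range attributes) visible to the TF-TRT segmenter and converter.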

A test suite is added in convert/ops/quantization_ops_test.cc. It builds a variety of sub-graph patterns and tests them for conversion success. Because TRT QDQ mode has evolved significantly in robustness and features between TRT 7 and TRT 8, a set of test waiver/skip policies is added indicating which usage patterns are appropriate for TRT 7 vs. TRT 8.
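A version-gated waiver policy can be sketched as below (the names are illustrative, not the actual quantization_ops_test.cc helpers): each pattern declares the minimum TensorRT version on which it is expected to convert, and the test is waived on older linked versions.

```python
# Hedged sketch of a version-gated test waiver policy (illustrative
# names, not the actual test-suite API). A pattern declares the minimum
# TensorRT version it converts on; tests are waived on older versions.

LINKED_TRT_VERSION = (8, 0, 1)  # example linked TensorRT version

def should_waive(pattern_min_trt, linked=LINKED_TRT_VERSION):
    """Waive (skip) the test when linked TRT is older than required."""
    return linked < pattern_min_trt

# A QDQ-on-weights pattern that only converts robustly on TRT 8:
runs_on_trt8 = not should_waive((8, 0, 0))
waived_on_trt7 = should_waive((8, 0, 0), linked=(7, 2, 3))
```

Tuple comparison gives the usual lexicographic major/minor/patch ordering, which keeps the skip predicate a one-liner.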

@google-ml-butler bot added the size:XL (CL Change Size: Extra Large) label Oct 4, 2021
@google-ml-butler bot added the awaiting review (Pull request awaiting review) label Oct 4, 2021
@google-cla bot added the cla: yes label Oct 4, 2021
@gbaned self-assigned this Oct 5, 2021
@gbaned added the comp:gpu:tensorrt (Issues specific to TensorRT) label Oct 5, 2021
@christopherbate force-pushed the tf-trt-ep-support branch 2 times, most recently from 50cfcc3 to d076898 on October 5, 2021 21:39
@christopherbate force-pushed the tf-trt-ep-support branch 3 times, most recently from ea649e0 to dad56b9 on October 6, 2021 03:04
@sanjoy requested review from bixia1 and removed the review request for sanjoy October 6, 2021 05:12
@christopherbate force-pushed the tf-trt-ep-support branch 4 times, most recently from b71e338 to 7e88f0b on October 7, 2021 21:55
@gbaned (Contributor) commented Oct 28, 2021

@christopherbate Can you please resolve conflicts? Thanks!

@gbaned added the stat:awaiting response (Status - Awaiting response from author) label and removed the awaiting review (Pull request awaiting review) label Oct 28, 2021
@christopherbate force-pushed the tf-trt-ep-support branch 2 times, most recently from 25ba28e to 4083baf on November 1, 2021 22:21
@christopherbate (Author):

Conflicts resolved

@gbaned removed the stat:awaiting response (Status - Awaiting response from author) label Nov 2, 2021
@gbaned removed the review request for bixia1 November 2, 2021 13:28
@christopherbate (Author):

All done. Let me know when you want me to squash.

@christopherbate force-pushed the tf-trt-ep-support branch 2 times, most recently from e6482aa to dac3bc4 on November 30, 2021 22:50
Review thread on this diff hunk:

      use_calibration_ = false;
    }

    const bool use_explicit_precision = GraphDefHasQDQNodes(item.graph);
Reviewer (Contributor):
I have two questions regarding this:
(1) We don't consider the TensorRT version when enabling use_explicit_precision, which looks wrong to me. For example, a user has a model that uses QuantizeAndDequantizeV2 with TensorRT 7. It was working, but with this PR they would see an error produced by TensorRT, similar to that in quantization_ops_test.cc lines 435 and 440. Am I right here?
(2) A user has a model that uses all the ops in kQuantizationOpNames with TensorRT 8. It was working fine, but with this change we only convert one of the four ops in kQuantizationOpNames. Will that be a problem?

@christopherbate (Author):
OK, yes, let me disable this for TRT < 8.0. TRT 7 can be used for very limited cases, but EP mode likely won't be useful in the context of TF-TRT with TRT 7 at this time.

@christopherbate (Author):
fixed

@christopherbate (Author):

Completely disabled the explicit QDQ tests and the Grappler option (experimental_disable_folding_quantization_emulation) for TRT < 8.0.

@christopherbate (Author):

@bixia1 Good to squash?

@gbaned (Contributor) commented Dec 3, 2021

@christopherbate Can you please resolve conflicts? Thanks!

@gbaned added the stat:awaiting response (Status - Awaiting response from author) label and removed the awaiting review (Pull request awaiting review) label Dec 3, 2021
@christopherbate (Author):

> @christopherbate Can you please resolve conflicts? Thanks!

rebased, thanks

@bixia1 (Contributor) commented Dec 3, 2021

@christopherbate please squash and I will approve.

@christopherbate (Author):

squashed

@tensorflowbutler removed the stat:awaiting response (Status - Awaiting response from author) label Dec 6, 2021
@gbaned requested review from bixia1 and removed the review request for bixia1 December 6, 2021 08:34
@google-ml-butler bot added the awaiting review (Pull request awaiting review) label Dec 6, 2021
@google-ml-butler bot added the kokoro:force-run (Tests on submitted change) and ready to pull (PR ready for merge process) labels Dec 6, 2021
@kokoro-team removed the kokoro:force-run (Tests on submitted change) label Dec 6, 2021
@copybara-service bot merged commit 7103c2c into tensorflow:master Dec 6, 2021
@google-ml-butler bot removed the awaiting review (Pull request awaiting review) and ready to pull (PR ready for merge process) labels Dec 6, 2021