[TF:TRT] Enable TensorRT explicit precision (QDQ/QAT) support #52248
Conversation
force-pushed from 50cfcc3 to d076898
force-pushed from ea649e0 to dad56b9
force-pushed from b71e338 to 7e88f0b
@christopherbate Can you please resolve conflicts? Thanks!
force-pushed from 25ba28e to 4083baf
Conflicts resolved
All done. Let me know when you want me to squash.
force-pushed from e6482aa to dac3bc4
tensorflow/compiler/tf2tensorrt/convert/ops/quantization_ops_test.cc (review thread, resolved)
use_calibration_ = false;
}

const bool use_explicit_precision = GraphDefHasQDQNodes(item.graph);
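For context, GraphDefHasQDQNodes presumably scans the frozen GraphDef for quantize/dequantize ops. A minimal sketch of such a check (hypothetical; not necessarily the PR's actual implementation):

#include "tensorflow/core/framework/graph.pb.h"

// Returns true if any node in the graph is a TF quantize-and-dequantize op.
// The op names are standard TF ops; the helper name mirrors the call above.
bool GraphDefHasQDQNodes(const tensorflow::GraphDef& graph) {
  for (const auto& node : graph.node()) {
    if (node.op() == "QuantizeAndDequantizeV2" ||
        node.op() == "QuantizeAndDequantizeV3") {
      return true;
    }
  }
  return false;
}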
I have two questions regarding this:
(1) We don't consider the TensorRT version when enabling use_explicit_precision, which looks wrong to me. For example, a user has a model that uses QuantizeAndDequantizeV2 with TensorRT 7. It was working, but with this PR they will see an error produced by TensorRT, similar to the ones in quantization_ops_test.cc lines 435 and 440. Am I right here?
(2) A user has a model that uses all of the ops in kQuantizationOpNames with TensorRT 8. It was working fine, but with this change we only convert one of the four ops in kQuantizationOpNames. Will that be a problem?
OK yes, let me disable this for TRT < 8.0. We can use TRT 7 for very limited use cases, but EP mode likely won't be useful in the context of TF-TRT with TRT 7 at this time.
Fixed. Completely disabled the explicit QDQ tests and the Grappler option (experimental_disable_folding_quantization_emulation) for TRT < 8.0.
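The fix described above amounts to gating the check on the TensorRT version, along these lines (illustrative sketch; assumes the IS_TRT_VERSION_GE macro available in tf2tensorrt):

// Explicit precision is only attempted when building against TRT >= 8.0.
const bool use_explicit_precision =
    IS_TRT_VERSION_GE(8, 0, 0, 0) && GraphDefHasQDQNodes(item.graph);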
@bixia1 Good to squash?
@christopherbate Can you please resolve conflicts? Thanks!
force-pushed from 6400408 to d8f9bac
rebased, thanks
@christopherbate please squash and I will approve. |
squashed
force-pushed from d8f9bac to a96b152
Adds TensorRT QDQ support ("explicit precision mode"), sometimes referred to as "QAT support" after the quantization-aware training algorithm that inserts QDQ nodes. From here on, we refer to the existing non-explicit-precision pathway in the code base as the "dynamic range INT8" (DR INT8) mode and to the new mode as the QDQ INT8 mode.
In the new mode, TF QuantizeAndDequantize operations are converted to TensorRT quantization scaling layers, as sketched below. Both the new QDQ mode logic and the existing DR mode logic (ConvertQuantize) are moved into the file convert/ops/quantization_ops.cc.
.In addition, in QDQ mode it is necessary to prevent the existing Grappler optimizations invoked in
trt_convert.py
on the loaded SavedModel from folding frozenQuantizeAndDequantizeV2
operations between weighted ops (Conv, Matmul ,etc) and the weight constants. Thus, we depend on the experimental Grappler rewriter config optionexperimental_disable_folding_quantization_emulation
and will be affected if it is removed. The alternative is to allow Grappler folding of the QDQ and constant weights and inserting identity QDQ scale factors manually during TensorRT network construction, but the logic becomes extremely verbose .A test suite is added in
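For illustration, the option can be set through the standard RewriterConfig proto; a C++ sketch (the converter itself configures this from trt_convert.py):

#include "tensorflow/core/protobuf/config.pb.h"

// Keeps Grappler's constant folding from collapsing frozen QDQ nodes into
// the adjacent weight constants.
tensorflow::ConfigProto config;
config.mutable_graph_options()
    ->mutable_rewrite_options()
    ->set_experimental_disable_folding_quantization_emulation(true);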
A test suite is added in convert/ops/quantization_ops_test.cc. It builds a variety of subgraph patterns and tests them for conversion success. Because TRT QDQ mode has evolved significantly in robustness and features between TRT 7 and TRT 8, a set of test waiver/skip policies is added indicating which usage patterns are appropriate for TRT 7 versus TRT 8.
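A version-gated skip in such a suite might look like the following (hypothetical test and fixture names; assumes the IS_TRT_VERSION_GE macro and GoogleTest):

#include <gtest/gtest.h>

// Hypothetical: skip QDQ conversion tests when building against TRT < 8.0,
// where explicit precision support is too limited to be useful.
TEST(QuantizationOpsTest, QDQConvPattern) {
#if !IS_TRT_VERSION_GE(8, 0, 0, 0)
  GTEST_SKIP() << "Explicit precision (QDQ) requires TensorRT >= 8.0";
#endif
  // ... build a QuantizeAndDequantizeV2 -> Conv2D subgraph and verify that
  // conversion to a TensorRT network succeeds ...
}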