Support per-channel quantized INT8 weights unpacking in XNNPACK delegate #50875
Conversation
@multiverse-tf Could you review this please?
@dev0x13 Can you please resolve conflicts? Thanks!
@gbaned Done
@multiverse-tf Could you review this please?
It's been more than a month since I created this MR. Should I close it now due to the lack of review?
Sorry for not catching this review request earlier. In the past month, I think we've also added native per-channel quantized INT8 op support in XNNPACK. @Maratyszcza, could you shed more light on the support? Thx
@multiverse-tf XNNPACK does not support per-channel dynamic range quantization. This MR adds that support.
Closing due to the lack of response from maintainers. |
…ate (MR review fixes)
1. Minor cosmetic fixes suggested by the MR reviewer are applied to the XNNPACK delegate test set.
2. Tests for per-channel quantized INT8 weights unpacking in the XNNPACK delegate are refactored to compute proper per-channel quantization parameters at runtime instead of using randomly initialized quantized tensors.
3. Added per-tensor and per-channel quantized weights unpacking tests for the TRANSPOSE_CONV op in the XNNPACK delegate.
@Maratyszcza Thank you for the review! All the requested changes have been made.
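For context, a plausible sketch of what a per-tensor helper like the GetInt8QuantizationScale used in the tests below might compute, assuming symmetric INT8 quantization over the narrow range [-127, 127]; the actual helper in the tester may differ:

#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical sketch: per-tensor scale for symmetric INT8 quantization;
// the zero point is fixed at 0 and the representable range is [-127, 127].
float GetInt8QuantizationScale(const std::vector<float>& data) {
  float max_abs = 0.0f;
  for (const float v : data) {
    max_abs = std::max(max_abs, std::abs(v));
  }
  return max_abs / 127.0f;
}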
@@ -66,7 +66,7 @@ void BinaryElementwiseTester::Test(tflite::BuiltinOperator binary_op,
   if (Input1Static()) {
     ASSERT_FALSE(Input2Static());
   }
-  if (FP16Weights() || INT8Weights()) {
+  if (FP16Weights() || INT8Weights() || INT8ChannelWiseWeights()) {
     ASSERT_TRUE(Input1Static() || Input2Static());
   }
Check that if channelwise weights are used, the static input has at least one dimension (i.e. isn't a scalar)
Done
Should be if (INT8ChannelWiseWeights() && Input1Static()), plus the same for input2.
Sorry, my bad. Fixed
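For reference, a minimal sketch of the resulting check, assuming the tester exposes Input1Shape() and Input2Shape() accessors (hypothetical names):

// A channel-wise quantized static input must have at least one
// dimension, i.e. it must not be a scalar.
if (INT8ChannelWiseWeights() && Input1Static()) {
  ASSERT_GE(Input1Shape().size(), 1);
}
if (INT8ChannelWiseWeights() && Input2Static()) {
  ASSERT_GE(Input2Shape().size(), 1);
}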
          std::bind(QuantizeInt8, std::placeholders::_1, 0, input1_scale));
      input1_scales.resize(1);
      input1_zero_points.resize(1, 0);
      input1_scales[0] = GetInt8QuantizationScale(input1_data);
Combine with resize
Done
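For illustration, one way the resize and the assignment could be combined, assuming input1_scales is empty at this point so the fill value applies to the newly created element:

// std::vector::resize(count, value) initializes newly created elements
// with value, so one call replaces the resize-then-assign pair.
input1_scales.resize(1, GetInt8QuantizationScale(input1_data));
input1_zero_points.resize(1, 0);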
  std::vector<int> current_dim(num_dims, 0);

  do {
    size_t offset =
const size_t
Fixed
        current_dim.data(), 0, nullptr);
    const int channel_idx = current_dim[quantized_dimension];
    const float val = data[offset];
    if (has_min_max_value[channel_idx]) {
Initialize min to +std::numeric_limits<float>::infinity() and max to -std::numeric_limits<float>::infinity(), and remove these checks.
Done
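A minimal sketch of the suggested simplification, assuming min and max are per-channel vectors sized to the number of channels (num_channels is a hypothetical name):

#include <limits>
#include <vector>

// Start min at +inf and max at -inf so the first value seen always
// replaces the sentinel; the has_min_max_value bookkeeping goes away.
std::vector<float> min(num_channels, +std::numeric_limits<float>::infinity());
std::vector<float> max(num_channels, -std::numeric_limits<float>::infinity());
// ... inside the element loop:
if (min[channel_idx] > val) min[channel_idx] = val;
if (max[channel_idx] < val) max[channel_idx] = val;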
LG overall. Please revert the change in README and simplify the code per comments, and we're good to go.
…ate (MR review fixes)
1. Reverted changes in the XNNPACK delegate README.
2. Made adjustments to the XNNPACK tests code suggested by the MR reviewer.
@Maratyszcza Thank you! All the suggested changes have been made.
        current_dim.data(), 0, nullptr);
    const int channel_idx = current_dim[quantized_dimension];
    const float val = data[offset];
    if (min[channel_idx] > val) {
These lines can be further simplified using std::min / std::max.
Done
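With that change, the comparisons collapse to single assignments (requires <algorithm>):

min[channel_idx] = std::min(min[channel_idx], val);
max[channel_idx] = std::max(max[channel_idx], val);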
…ate (MR review fixes)
LGTM. @gbaned could you merge?
PiperOrigin-RevId: 411934765
Change-Id: Ida9cd3723742a3b92139345fccc29b60d5383be0
I noticed that the PR was rolled back right after merge. Is there something wrong with it?
@Maratyszcza Fixed this in my fork, but it seems that this PR cannot be reopened. What do I need to do to submit this fix and get my changes merged again? Thank you in advance!
@Maratyszcza I am really sorry for bothering you, but could you clarify what I need to do to submit this fix and get my changes merged again?
Re-landed in a6d352f
Thank you!
This MR extends the INT8 weights unpacking for FP32 inference in the XNNPACK delegate, added in a previous MR, to support per-channel dynamic range quantized models. The previous change supported only the per-tensor quantization mode, which is obsolete in recent TensorFlow releases.
I have not added proper testing yet, because I'd like to get suggestions from a maintainer on the best way to organize such testing while keeping the codebase clean (specifically, how to change the testers). Thank you in advance!
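To illustrate what per-channel unpacking means here, a simplified sketch of dequantizing INT8 weights to FP32 with one scale and zero point per output channel (illustrative only, not the actual delegate code; assumes the quantized dimension is the outermost one):

#include <cstdint>

// Each output channel c carries its own scale[c] and zero_point[c]
// instead of a single per-tensor pair.
void DequantizePerChannel(const int8_t* int8_weights, const float* scale,
                          const int32_t* zero_point, int num_channels,
                          int elements_per_channel, float* fp32_weights) {
  for (int c = 0; c < num_channels; ++c) {
    for (int i = 0; i < elements_per_channel; ++i) {
      const int idx = c * elements_per_channel + i;
      fp32_weights[idx] =
          scale[c] * (static_cast<float>(int8_weights[idx]) - zero_point[c]);
    }
  }
}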