
[INTEL oneDNN][Bug Fix] Add index-validity check for min-max tensors for quantized oneDNN ops and related tests #59581

Merged

Conversation

@gzmkl (Contributor) commented Feb 6, 2023

This is a follow-up to the merged PR

#59437

covering all oneDNN quantization ops:

The PR adds a boundary (rank) check for the min-max tensors of all oneDNN quantized ops, which prevents null-pointer dereferences. With the added check, an element is indexed only after the index passes the validity check.

@gzmkl gzmkl requested a review from penpornk as a code owner February 6, 2023 21:02
@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Feb 6, 2023
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Feb 6, 2023
@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Feb 6, 2023
@kokoro-team kokoro-team removed kokoro:force-run Tests on submitted change labels Feb 6, 2023
@penpornk (Member) left a comment

Thank you for the fixes! I think there might be two minor mistakes:

@@ -1903,18 +1902,25 @@ class MklQuantizedConvOp
   if (std::is_same<Toutput, quint8>::value ||
       std::is_same<Toutput, qint8>::value) {
     const float min_input =
-        context->input(min_input_idx_).template flat<float>()(0);
+        context->input(min_input_idx_).template scalar<float>()(0);
@penpornk (Member):

Suggested change
-        context->input(min_input_idx_).template scalar<float>()(0);
+        context->input(min_input_idx_).template scalar<float>()();

@gzmkl (Contributor, Author):

Addressed

tensorflow/core/kernels/mkl/mkl_conv_ops.cc (outdated; resolved)
@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Feb 8, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 8, 2023
@penpornk (Member) left a comment

Thank you again!

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Feb 9, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 9, 2023
@gbaned gbaned added the comp:mkl MKL related issues label Feb 9, 2023
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Feb 9, 2023
@gzmkl (Contributor, Author) commented Feb 13, 2023

@penpornk Hi, I noticed that there is a failure on the "feedback/copybara" check, but I cannot access the "Details".

Please let me know if there is anything I need to do. Thanks!

@penpornk (Member) commented

Hi @gzmkl,

It was failing the internal Windows CPU presubmit (not sure why the GitHub one didn't fail). The failing test is //tensorflow/core/kernels/mkl:mkl_quantized_conv_ops_test:

[  FAILED  ] 11 tests, listed below:
[  FAILED  ] QuantizedConvTest.BiasAddFusion
[  FAILED  ] QuantizedConvTest.BiasAddRequantizeFusion
[  FAILED  ] QuantizedConvTest.BiasAddReluRequantizeFusion
[  FAILED  ] QuantizedConvTest.UnsignedInputBiasAddReluRequantizeFusion
[  FAILED  ] QuantizedConvTest.DWBiasAddFusion
[  FAILED  ] QuantizedConvTest.DWBiasAddRequantizeFusion
[  FAILED  ] QuantizedConvTest.DWBiasAddReluRequantizeFusion
[  FAILED  ] QuantizedConvTest.DWUnsignedInputBiasAddReluRequantizeFusion
[  FAILED  ] QuantizedConvTest.BiasAddSumReluRequantizeFusion
[  FAILED  ] QuantizedConvTest.BiasAddSumReluRequantizeFusionSignedSummand
[  FAILED  ] QuantizedConvTest.BiasAddSumReluFusionFloatSummand

Test command:

<path_to_bazel.exe> test //tensorflow/core/kernels/mkl:mkl_quantized_conv_ops_test --test_output=all

Example test failure log:

[ RUN      ] QuantizedConvTest.BiasAddFusion
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (-25 not close to -39.473743438720703)
Expected: true
i = 0 Tx[i] = -25 Ty[i] = -39.473743438720703
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (-5 not close to 5.9996280670166016)
Expected: true
i = 1 Tx[i] = -5 Ty[i] = 5.9996280670166016
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (19 not close to 17.189409255981445)
Expected: true
i = 2 Tx[i] = 19 Ty[i] = 17.189409255981445
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (14 not close to 30.283836364746094)
Expected: true
i = 3 Tx[i] = 14 Ty[i] = 30.283836364746094
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (10 not close to -7.0947980880737305)
Expected: true
i = 4 Tx[i] = 10 Ty[i] = -7.0947980880737305
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (25 not close to 22.093868255615234)
Expected: true
i = 5 Tx[i] = 25 Ty[i] = 22.093868255615234
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (-8 not close to -10.99931812286377)
Expected: true
i = 6 Tx[i] = -8 Ty[i] = -10.99931812286377
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (0 not close to 1.9046437740325928)
Expected: true
i = 7 Tx[i] = 0 Ty[i] = 1.9046437740325928
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (12 not close to 34.1883544921875)
Expected: true
i = 9 Tx[i] = 12 Ty[i] = 34.1883544921875
tensorflow/core/framework/tensor_testutil.cc(184): error: Value of: IsClose(Tx[i], Ty[i], typed_atol, typed_rtol)
  Actual: false (-15 not close to -23.284269332885742)
Expected: true
i = 10 Tx[i] = -15 Ty[i] = -23.284269332885742
tensorflow/core/framework/tensor_testutil.cc(187): error: Expected: (num_failures) < (max_failures), actual: 10 vs 10

I'm trying to get to this by tomorrow, but it would be great if you could help debug on your side too (if you could reproduce it). Thank you!

@gbaned gbaned removed the ready to pull PR ready for merge process label Feb 15, 2023
@penpornk (Member) commented

Hi @gzmkl,

I realized that GitHub's Windows Bazel presubmit on this PR didn't run any mkl_ tests at all, so it's not surprising that it passed.

Were you able to reproduce the test failure on your system? If it helps, the CPU we used supports up to AVX2.

Since this doesn't seem like a simple merge issue, and could be an actual issue with the PR, I'll leave the debugging to you. If we can merge this PR by this Wednesday, I can try to cherry-pick it into TF 2.12. Otherwise, we can wait for the patch release (v2.12.1) once this PR is merged. Thank you!

@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Feb 21, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 21, 2023
@gzmkl (Contributor, Author) commented Feb 21, 2023

@penpornk Thank you for the input. Yes, some INT8 tests (mkl_quanttest*) do not run on Windows; the TF Windows team (Mayank) and Ramesh gave me the same feedback.

I have made a code change adding back the "&& defined(ENABLE_MKL)" guard in the related tests and pushed the commit. The change has passed the internal Windows unit tests.

@penpornk (Member) left a comment

Thank you! This should work.
We should probably remove ENABLE_MKL and use a Windows-specific guard instead once we are done with the cherry-picks. All these quantized ops should have proper tests running regularly.

PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Feb 21, 2023
@google-ml-butler google-ml-butler bot added the kokoro:force-run Tests on submitted change label Feb 21, 2023
@google-ml-butler google-ml-butler bot added the ready to pull PR ready for merge process label Feb 21, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 21, 2023
@copybara-service copybara-service bot merged commit 149a989 into tensorflow:master Feb 21, 2023
13 of 14 checks passed
PR Queue automation moved this from Approved by Reviewer to Merged Feb 21, 2023
@gzmkl (Contributor, Author) commented Feb 22, 2023

Exactly. This is on our TODO list, as we want to enable some of these tests on Windows in stock TF as well.

learning-to-play added a commit that referenced this pull request Mar 1, 2023
Merge pull request #59581 from Intel-tensorflow:security_fix_quantiz
nitins17 added a commit that referenced this pull request Mar 2, 2023
r2.12 cherry-pick: [INTEL oneDNN][Bug Fix] Add index-validity check for min-max tensors for quantized oneDNN ops and related tests #59581
Labels
awaiting review Pull request awaiting review comp:mkl MKL related issues ready to pull PR ready for merge process size:L CL Change Size: Large
Projects
PR Queue (Merged)
Development

Successfully merging this pull request may close these issues.

None yet

4 participants