
[MLIR] Fix verifier for TF::Conv2DOp and TF::Conv3DOp. #44022

Merged
5 commits merged into tensorflow:master on Dec 15, 2020
Conversation

pr4tgpt
Contributor

@pr4tgpt pr4tgpt commented Oct 14, 2020

This commit addresses a bug where, in tf.conv2D and tf.conv3D, incorrect output shapes were not verified. It adds changes to verify the output shape and emit an error for incorrect output shapes.

Signed-off-by: Prateek Gupta <prateek@polymagelabs.com>
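For context, the output size the verifier must reproduce follows TensorFlow's SAME/VALID convolution semantics. A minimal sketch, with an illustrative helper name that is not the actual TensorFlow code:

```c++
#include <cassert>
#include <cstdint>

// Expected output size along one spatial dimension of tf.Conv2D/tf.Conv3D:
//   SAME:  out = ceil(in / stride)
//   VALID: out = ceil((in - (filter - 1) * dilation) / stride)
int64_t ExpectedOutputDim(int64_t in, int64_t filter, int64_t stride,
                          int64_t dilation, bool same_padding) {
  assert(stride > 0 && dilation > 0 && filter > 0);
  if (same_padding) return (in + stride - 1) / stride;
  int64_t effective_filter = (filter - 1) * dilation + 1;
  assert(in >= effective_filter);
  return (in - effective_filter) / stride + 1;
}
```

A verifier can compare each static result dimension against this value and emit an error on mismatch.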

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Oct 14, 2020
@google-cla google-cla bot added the cla: yes label Oct 14, 2020
@pr4tgpt pr4tgpt changed the title [MLIR]Add changes in verifier for TF::Conv2DOp and TF::Conv3DOp. [MLIR]Fix verifier for TF::Conv2DOp and TF::Conv3DOp. Oct 14, 2020
@bondhugula
Contributor

@smit-hinsu for visibility - as the author of the surrounding code.

@pr4tgpt pr4tgpt changed the title [MLIR]Fix verifier for TF::Conv2DOp and TF::Conv3DOp. [MLIR] Fix verifier for TF::Conv2DOp and TF::Conv3DOp. Oct 14, 2020
@smit-hinsu
Contributor

Thanks for the improvements!

Could you make use of the new inferReturnTypes trait for shape inference?

See this commit for an example: edeb469

@smit-hinsu smit-hinsu self-requested a review October 14, 2020 18:50
@gbaned gbaned self-assigned this Oct 15, 2020
@pr4tgpt
Contributor Author

pr4tgpt commented Oct 15, 2020

Thanks for the improvements!

Could you make use of the new inferReturnTypes trait for shape inference?

See this commit for an example: edeb469

It isn't immediately clear from the other revision how inferReturnTypes should be used or incorporated. Could you please expand a bit more on what changes/additions should be made here? Thanks!

@smit-hinsu
Contributor

Once we declare the InferTypeOpInterface interface and implement the inferReturnTypes method for these ops, we don't need to call shape verification explicitly. You should see that shapes are verified automatically without an explicit call to inferReturnTypes. The benefit of this generic approach is that the interface can be shared between the builder and the verifier, and it allows other uses, since all the ops follow the same mechanism.

Check out the interfaces documentation at https://mlir.llvm.org/docs/Interfaces/.

Let me know if you need further explanation or specific details.
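To make the suggestion concrete, here is a minimal sketch, assuming a hypothetical op MyOp (declared elsewhere via ODS with DeclareOpInterfaceMethods<InferTypeOpInterface>) whose single result simply takes its first operand's type; the real Conv2D/Conv3D inference is more involved:

```c++
#include "mlir/IR/OperationSupport.h"
#include "mlir/Interfaces/InferTypeOpInterface.h"

using namespace mlir;

// Declaring InferTypeOpInterface in ODS generates this hook; the generic
// op verifier then checks the op's declared result types against the
// inferred ones, with no explicit shape-check call needed.
LogicalResult MyOp::inferReturnTypes(
    MLIRContext *context, Optional<Location> location, ValueRange operands,
    DictionaryAttr attributes, RegionRange regions,
    SmallVectorImpl<Type> &inferredReturnTypes) {
  // Toy rule: the single result has the same type as the first operand.
  inferredReturnTypes.push_back(operands.front().getType());
  return success();
}
```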

@pr4tgpt
Contributor Author

pr4tgpt commented Oct 16, 2020

Once we declare the InferTypeOpInterface interface and implement the inferReturnTypes method for these ops, we don't need to call shape verification explicitly. You should see that shapes are verified automatically without an explicit call to inferReturnTypes. The benefit of this generic approach is that the interface can be shared between the builder and the verifier, and it allows other uses, since all the ops follow the same mechanism.

Check out the interfaces documentation at https://mlir.llvm.org/docs/Interfaces/.

Let me know if you need further explanation or specific details.

@smit-hinsu, there is an issue in using the InferTypeOpInterface. Previously, when I was using the verifier to check the output type, all the other properties, such as non-negative strides and a correct number of input dimensions/strides, were checked first (as I kept my check at the end of the verifier).
But if I use the inferReturnTypes method, the output type is inferred before those other sanity checks run, resulting in a segmentation fault. The output type of tf.conv2D and tf.conv3D can only be inferred if the attributes are correct.
When I run my code using inferReturnTypes with correct attributes, it runs fine, but incorrect attributes have to be checked first.
If I emit errors in those instances, then there will be no use for the verifier. Could you suggest something here?

@smit-hinsu
Contributor

We need to emit an error in that case and fail the type inference. See the following op inference in the HLO dialect for an example:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/hlo_ops.cc#L1687
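The pattern, as a minimal sketch with hypothetical names (not the actual TensorFlow code): validate the attributes first, report through the optional location, and return failure() so the generic verifier surfaces a normal error instead of crashing.

```c++
#include "mlir/IR/Attributes.h"
#include "mlir/IR/Diagnostics.h"

using namespace mlir;

// Hypothetical helper called from an op's inferReturnTypes. Attributes
// are sanity-checked before being used to compute shapes; on a bad
// attribute we emit through the optional location and fail inference.
static LogicalResult checkStridesSketch(ArrayAttr strides,
                                        Optional<Location> location) {
  for (Attribute stride : strides) {
    if (stride.cast<IntegerAttr>().getInt() < 1)
      return emitOptionalError(location, "requires positive strides");
  }
  // Safe to compute and push the inferred type after this point.
  return success();
}
```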

@gbaned
Contributor

gbaned commented Oct 19, 2020

@pr4tgpt Can you please check @smit-hinsu's comments and keep us posted? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Oct 19, 2020
@pr4tgpt
Contributor Author

pr4tgpt commented Oct 19, 2020

@pr4tgpt Can you please check @smit-hinsu's comments and keep us posted? Thanks!

Yes, I am working on Smit's suggestion. Thanks.

@pr4tgpt
Contributor Author

pr4tgpt commented Oct 19, 2020

We need to emit an error in that case and fail the type inference. See the following op inference in the HLO dialect for an example:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/hlo_ops.cc#L1687

Hi @smit-hinsu, I have made the changes you requested. Please review them. Thanks.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 21, 2020
Contributor

@smit-hinsu smit-hinsu left a comment


Sorry for the delay in the response. I somehow missed the update email. Please ping on the review if you don't get an update within 24 hours.

This looks quite good but I have a few suggestions.

Seven review threads on tensorflow/compiler/mlir/tensorflow/ir/tf_ops_a_m.cc (outdated; resolved).
@pr4tgpt
Contributor Author

pr4tgpt commented Oct 24, 2020

Thank you @smit-hinsu for the suggestions. I will update the code with the relevant changes.

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Oct 27, 2020
@gbaned
Contributor

gbaned commented Oct 29, 2020

@pr4tgpt Can you please check @smit-hinsu's comments and keep us posted? Thanks!

@pr4tgpt
Contributor Author

pr4tgpt commented Oct 30, 2020

@pr4tgpt Can you please check @smit-hinsu's comments and keep us posted? Thanks!

I have checked and resolved the comments. Thanks.

@pr4tgpt
Contributor Author

pr4tgpt commented Oct 30, 2020

Hi @smit-hinsu, I have made the changes you requested. Please review them and feel free to post suggestions/improvements. Thanks!

Contributor

@smit-hinsu smit-hinsu left a comment


Thanks for all the hard work here. We are almost there!

Let me know if any of the comments are not clear and you need further pointers/clarification.

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 30, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Nov 18, 2020
@smit-hinsu
Contributor

Could you run all the tests in these directories and fix the failing ones?

tensorflow/compiler/mlir
tensorflow/compiler/tests
tensorflow/lite/python
tensorflow/lite/testing/model_coverage

@gbaned gbaned removed the ready to pull PR ready for merge process label Nov 25, 2020
@gbaned
Contributor

gbaned commented Nov 25, 2020

@pr4tgpt Can you please check @smit-hinsu's comments and keep us posted? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Nov 27, 2020
@pr4tgpt
Contributor Author

pr4tgpt commented Dec 3, 2020

Could you run all the tests in these directories and fix the failing ones?

tensorflow/compiler/mlir
tensorflow/compiler/tests
tensorflow/lite/python
tensorflow/lite/testing/model_coverage

Yes, I am looking into this. Thanks!

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Dec 5, 2020
@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Dec 7, 2020
@pr4tgpt
Contributor Author

pr4tgpt commented Dec 8, 2020

Could you run all the tests in these directories and fix the failing ones?

tensorflow/compiler/mlir
tensorflow/compiler/tests
tensorflow/lite/python
tensorflow/lite/testing/model_coverage

Hi @smit-hinsu, I checked the tests you asked about.

The tests in tensorflow/compiler/mlir specific to tf_to_gpu and tf_to_kernel fail. These tests have no dependency on my patch.
The Python tests in tensorflow/compiler/tests are failing for reasons I can't determine (mostly ModuleNameError). I find no dependency on my patch here either.
The Python tests in tensorflow/lite/python are failing for reasons I can't determine (mostly ImportError). I find no dependency on my patch here either.
The tests in tensorflow/lite/testing/model_coverage pass successfully.

Could you take a look at the failing tests and tell me what the possible reason for the failures could be? Thanks!

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Dec 10, 2020
@pr4tgpt
Contributor Author

pr4tgpt commented Dec 10, 2020

Could you run all the tests in these directories and fix the failing ones?
tensorflow/compiler/mlir
tensorflow/compiler/tests
tensorflow/lite/python
tensorflow/lite/testing/model_coverage

Hi @smit-hinsu, I checked the tests you asked about.

The tests in tensorflow/compiler/mlir specific to tf_to_gpu and tf_to_kernel fail. These tests have no dependency on my patch.
The Python tests in tensorflow/compiler/tests are failing for reasons I can't determine (mostly ModuleNameError). I find no dependency on my patch here either.
The Python tests in tensorflow/lite/python are failing for reasons I can't determine (mostly ImportError). I find no dependency on my patch here either.
The tests in tensorflow/lite/testing/model_coverage pass successfully.

Could you take a look at the failing tests and tell me what the possible reason for the failures could be? Thanks!

Hi @smit-hinsu, could you help with this? Thanks!

@smit-hinsu
Contributor

All these tests depend on your change. tensorflow/compiler/mlir/tensorflow/tests uses the tf-opt tool, which links everything, and tensorflow/lite/python links the TF dialect, so there are dependencies.

There are many tests failing. The following are some of them:

tensorflow/compiler/mlir/tensorflow/tests:tf-ops.mlir.test
tensorflow/compiler/mlir/lite/tests:prepare-tf.mlir.test
tensorflow/compiler/tests:conv3d_mlir_bridge_test_cpu
tensorflow/lite/python:lite_test
tensorflow/lite/testing/model_coverage:model_coverage_test_user_vgg_tflite_builtins_select_tf_ops

Please take a look. You can use 'tensorflow/compiler/mlir/tensorflow/tests:all' to run all the tests in the directory.

@bondhugula
Contributor

bondhugula commented Dec 10, 2020

These failures are all clearly related to this PR. An example from tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test:

2020-12-10 21:49:06.560035: F ./tensorflow/core/util/tensor_format.h:266] Check failed: format == FORMAT_OIHW_VECT_I (0 vs. 3)
TensorFlow crashed, please file a bug on https://github.com/tensorflow/tensorflow/issues with the trace below.
Stack dump:
0.      Program arguments: /home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt /home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir -tf-optimize 
1.      Program arguments: /home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt /home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir -tf-optimize 
 #0 0x00000000122c096f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0x122c096f)
 #1 0x00000000122be5ed llvm::sys::RunSignalHandlers() (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0x122be5ed)
 #2 0x00000000122bee6d SignalHandler(int) (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0x122bee6d)
 #3 0x00007fe1f5214dd0 __restore_rt (/lib64/libpthread.so.0+0x12dd0)
 #4 0x00007fe1f4c5f70f raise (/lib64/libc.so.6+0x3770f)
 #5 0x00007fe1f4c49b25 abort (/lib64/libc.so.6+0x21b25)
 #6 0x00000000111d2b10 virtual thunk to tensorflow::internal::LogMessageFatal::~LogMessageFatal() (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-o
 #7 0x000000000fe767ec mlir::LogicalResult mlir::TF::inferConvReturnTypes<mlir::TF::Conv2DOpAdaptor, (void*)0>(mlir::TF::Conv2DOpAdaptor, llvm::SmallVectorImpl<mlir::Type>&, llvm::Optional<mlir::Location>, llvm::ArrayRef<mlir::Attribute>) (.cold.3446) (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0xfe767ec)
 #8 0x000000000fe77224 mlir::TF::Conv2DOp::inferReturnTypes(mlir::MLIRContext*, llvm::Optional<mlir::Location>, mlir::ValueRange, mlir::DictionaryAttr, mlir::RegionRange, llvm::SmallVectorImpl<mlir::Type>&) (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0xfe77224)
 #9 0x0000000011eeb238 mlir::detail::verifyInferredResultTypes(mlir::Operation*) (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0x11eeb238)
#10 0x000000000fc884d9 mlir::Op<mlir::TF::Conv2DOp, mlir::OpTrait::ZeroRegion, mlir::OpTrait::OneResult, mlir::OpTrait::ZeroSuccessor, mlir::OpTrait::NOperands<2u>::Impl, mlir::InferTypeOpInterface::Trait, mlir::MemoryEffectOpInterface::Trait, mlir::TF::LayoutSensitiveInterface::Trait, mlir::DerivedAttributeOpInterface::Trait>::verifyInvariants(mlir::Operation*) (/home/uday/.cache/bazel/_bazel_uday/97da8b8d2f64c0a54677e92074ce1d42/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/tests/tf_optimize.mlir.test.runfiles/org_tensorflow/tensorflow/compiler/mlir/tf-opt+0xfc884d9)


This commit addresses a bug where, in tf.conv2D and tf.conv3D, incorrect output shapes were not verified. It adds changes to verify the output shape and emit an error for incorrect output shapes.

Signed-off-by: Prateek Gupta <prateek@polymagelabs.com>
In this commit, some dead code is removed. A duplicate function for verifying convolution attributes is also removed.

Signed-off-by: Prateek Gupta <prateek@polymagelabs.com>
Some of the test cases were using wrong output sizes for tf.conv2D. This commit corrects those test cases.

Signed-off-by: Prateek Gupta <prateek@polymagelabs.com>
This commit addresses a bug where incorrect filter dimensions were being read and used. The bug was corrected, and the relevant test cases were also checked.

Signed-off-by: Prateek Gupta <prateek@polymagelabs.com>
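To illustrate the filter-dimension fix mentioned in the last commit message: tf.Conv2D filters are laid out as [filter_height, filter_width, in_channels, out_channels] (HWIO) regardless of the input's data_format, so spatial sizes must be read from the filter's own layout. A minimal sketch with an illustrative helper, not the actual implementation:

```c++
#include <array>
#include <cstdint>

// HWIO filter layout for tf.Conv2D: indices 0 and 1 are spatial,
// 2 is input depth, 3 is output depth. Reading these through the
// input's data_format helpers is the kind of mix-up that can trip
// checks like the FORMAT_OIHW_VECT_I one in the trace above.
struct Conv2DFilterDims {
  int64_t height, width, in_channels, out_channels;
};

Conv2DFilterDims GetConv2DFilterDims(const std::array<int64_t, 4> &filter) {
  return {filter[0], filter[1], filter[2], filter[3]};
}
```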
@pr4tgpt
Contributor Author

pr4tgpt commented Dec 14, 2020

All these tests depend on your change. tensorflow/compiler/mlir/tensorflow/tests uses the tf-opt tool, which links everything, and tensorflow/lite/python links the TF dialect, so there are dependencies.

There are many tests failing. The following are some of them:

tensorflow/compiler/mlir/tensorflow/tests:tf-ops.mlir.test
tensorflow/compiler/mlir/lite/tests:prepare-tf.mlir.test
tensorflow/compiler/tests:conv3d_mlir_bridge_test_cpu
tensorflow/lite/python:lite_test
tensorflow/lite/testing/model_coverage:model_coverage_test_user_vgg_tflite_builtins_select_tf_ops

Please take a look. You can use 'tensorflow/compiler/mlir/tensorflow/tests:all' to run all the tests in the directory.

Hi @smit-hinsu, @bondhugula, I have fixed the code. The filter dimensions were being fetched incorrectly. The tests in tensorflow/compiler/mlir/tensorflow/tests are now passing.
I am looking at the other failing tests as well. Thanks! Please review.

@pr4tgpt
Contributor Author

pr4tgpt commented Dec 14, 2020

All these tests depend on your change. tensorflow/compiler/mlir/tensorflow/tests uses the tf-opt tool, which links everything, and tensorflow/lite/python links the TF dialect, so there are dependencies.
There are many tests failing. The following are some of them:
tensorflow/compiler/mlir/tensorflow/tests:tf-ops.mlir.test
tensorflow/compiler/mlir/lite/tests:prepare-tf.mlir.test
tensorflow/compiler/tests:conv3d_mlir_bridge_test_cpu
tensorflow/lite/python:lite_test
tensorflow/lite/testing/model_coverage:model_coverage_test_user_vgg_tflite_builtins_select_tf_ops
Please take a look. You can use 'tensorflow/compiler/mlir/tensorflow/tests:all' to run all the tests in the directory.

Hi @smit-hinsu, @bondhugula, I have fixed the code. The filter dimensions were being fetched incorrectly. The tests in tensorflow/compiler/mlir/tensorflow/tests are now passing.
I am looking at the other failing tests as well. Thanks! Please review.

Tests in mlir/lite/tests are also passing.

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Dec 14, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Dec 14, 2020
@copybara-service copybara-service bot merged commit af9ad9d into tensorflow:master Dec 15, 2020
@smit-hinsu
Contributor

It finally got submitted! Thanks Prateek for the contribution and working through all the issues.

@pr4tgpt
Contributor Author

pr4tgpt commented Dec 15, 2020

It finally got submitted! Thanks Prateek for the contribution and working through all the issues.

Thank you Smit for your valuable help and suggestions.
