Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TF-TRT Improve matrix multiplication conversion and enable dynamic shape mode #47215

Merged

Conversation

tfeher
Copy link
Contributor

@tfeher tfeher commented Feb 17, 2021

This PR improves the MatMul and BatchMatMul converters.

  • Unnecessary transpose of weights are removed. Transposing weights for IMatrixMultiplyLayer is not necessary, because IMatrixMultiplyLayer can directly pass the transpose flags to the underlaying GEMM call, which can use it to access elements with the correct stride without any actual transposition.
  • Restrictions caused by previous weight transpose ops eliminated.
  • Enabled explicit batch and dynamic shape input.
  • IFullyConnectedLayer (FC) usage fixed:
    • FC layer is preferred over IMatrixMultiply because it is expected to give better performance. Moreover, currently only FC layer supprorts INT8 precision.
    • Fixed and relaxed FC layer conversion condition.
    • Fixed input tensor_a shaped handling. In dynamic shape mode care has to be taken to retain static dim where available and not to confuse unknown dims with -1 wildcard.
    • Enabled rank > 2 for weights.
    • Fixed conversion of BatchMatMul to FC: broadcast now preserves the information whether the input is tensor or weight, so that we can correctly check FC compatibility condition.

BatchMatMul involves a potential broadcast step. TRT requires that the input tensors have the same rank, with 1 values filled in the dimensions which need to be broadcasted. A helper function BroadcastTensors was added to make the tensors match in rank. In dynamic shape mode we need shape inference for this step. The DynamicReshape function was modified to allow insertion of multiple singleton dimensions.

Tagging @bixia1 for review and @DEKHTIARJonathan for visibility.
Tracker: #45481

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Feb 17, 2021
@google-cla google-cla bot added the cla: yes label Feb 17, 2021
@gbaned gbaned self-assigned this Feb 17, 2021
@gbaned gbaned added the comp:gpu:tensorrt Issues specific to TensorRT label Feb 17, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Feb 17, 2021
@gbaned gbaned requested a review from cheshire February 17, 2021 15:10
Copy link
Contributor

@bixia1 bixia1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work!
I finished reviewing the code, but not the test yet. Sending out the comments I have now.

tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
Copy link
Contributor Author

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bixia1 for the review, I have addressed the issues!

tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@bixia1 bixia1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remember to rebase and squash.

tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.h Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.h Outdated Show resolved Hide resolved
Copy link
Contributor Author

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bixia1 for the additional comments, I have addressed the issues!

tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@bixia1 bixia1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work! Just one remaining thing: there is one place where you said you put a more detail description but I couldn't find it by searching the string in the code.

tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc Outdated Show resolved Hide resolved
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Feb 24, 2021
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Feb 24, 2021
@tfeher
Copy link
Contributor Author

tfeher commented Feb 24, 2021

I have fixed the missing comment.

@kokoro-team kokoro-team removed kokoro:force-run Tests on submitted change labels Feb 24, 2021
@copybara-service copybara-service bot merged commit 22601eb into tensorflow:master Feb 26, 2021
PR Queue automation moved this from Approved by Reviewer to Merged Feb 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:gpu:tensorrt Issues specific to TensorRT ready to pull PR ready for merge process size:L CL Change Size: Large
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

4 participants