TF-TRT Improve matrix multiplication conversion and enable dynamic shape mode #47215
Conversation
Thanks for your work!
I have finished reviewing the code, but not the tests yet. Sending out the comments I have so far.
Force-pushed: f9a52b0 → f7fb167
Thanks @bixia1 for the review, I have addressed the issues!
Please remember to rebase and squash.
Force-pushed: f7fb167 → 2fb6a21
Thanks @bixia1 for the additional comments, I have addressed the issues!
Thanks for your work! Just one remaining thing: in one place you said you added a more detailed description, but I couldn't find it by searching for that string in the code.
Force-pushed: 2fb6a21 → 5136ce1
I have fixed the missing comment.
This PR improves the MatMul and BatchMatMul converters.

- An explicit transpose of the input to `IMatrixMultiplyLayer` is not necessary, because `IMatrixMultiplyLayer` can directly pass the transpose flags to the underlying GEMM call, which can use them to access elements with the correct stride without any actual transposition.
- `IFullyConnectedLayer` (FC) usage fixed: FC is preferred over `IMatrixMultiply` because it is expected to give better performance. Moreover, currently only the FC layer supports INT8 precision.
- Fixed conversion of `BatchMatMul` to FC: broadcast now preserves the information about whether each input is a tensor or a weight, so that we can correctly check the FC compatibility condition.
- `BatchMatMul` involves a potential broadcast step. TRT requires that the input tensors have the same rank, with 1s filled in for the dimensions that need to be broadcast. A helper function `BroadcastTensors` was added to make the tensors match in rank. In dynamic shape mode we need shape inference for this step; the `DynamicReshape` function was modified to allow the insertion of multiple singleton dimensions.

Tagging @bixia1 for review and @DEKHTIARJonathan for visibility.
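For intuition, the rank-matching step described above can be sketched in plain Python. Here `pad_to_common_rank` is a hypothetical stand-in for the PR's `BroadcastTensors` helper, which operates on TensorRT tensors and shapes rather than Python lists:

```python
def pad_to_common_rank(shape_a, shape_b):
    """Prepend size-1 dimensions to the lower-rank shape so both shapes
    end up with the same rank, matching TRT's broadcast requirement.
    Purely illustrative; the real helper works on TRT tensor shapes."""
    rank = max(len(shape_a), len(shape_b))
    pad = lambda s: [1] * (rank - len(s)) + list(s)
    return pad(shape_a), pad(shape_b)

print(pad_to_common_rank([3, 4], [2, 3, 4]))  # → ([1, 3, 4], [2, 3, 4])
```

With dynamic shapes the concrete dimension values are unknown at build time, which is why shape inference is needed to construct the padded shape at runtime.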
Tracker: #45481
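To illustrate the first bullet: a GEMM can honor transpose flags simply by changing how it indexes each operand, so no transposed copy is ever materialized. A minimal pure-Python sketch (not the actual TRT/cuBLAS code; `gemm` is a hypothetical name):

```python
def gemm(a, b, transpose_a=False, transpose_b=False):
    """Naive GEMM on lists-of-lists that honors transpose flags by
    swapping the index order used to read each operand, instead of
    building a transposed copy. This mirrors how BLAS-style GEMMs use
    the flags to pick the element-access stride."""
    read_a = (lambda i, k: a[k][i]) if transpose_a else (lambda i, k: a[i][k])
    read_b = (lambda k, j: b[j][k]) if transpose_b else (lambda k, j: b[k][j])
    m = len(a[0]) if transpose_a else len(a)       # rows of op(A)
    kdim = len(a) if transpose_a else len(a[0])    # shared inner dim
    n = len(b) if transpose_b else len(b[0])       # cols of op(B)
    return [[sum(read_a(i, k) * read_b(k, j) for k in range(kdim))
             for j in range(n)] for i in range(m)]

identity = [[1, 0], [0, 1]]
print(gemm([[1, 2], [3, 4]], identity, transpose_a=True))  # → [[1, 3], [2, 4]]
```

This is why the converter can drop an explicit transpose layer before the matrix multiply: the flags achieve the same result with no extra data movement.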