@qjia7 qjia7 commented Feb 15, 2023

Fixed #7372

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.



@qjia7 qjia7 requested review from Linchenn and pyu10055 February 15, 2023 08:23
qjia7 commented Feb 15, 2023

Based on the profiling result in the table below, much of the time is spent in PackProgram and UnpackProgram. The reason is that we didn't provide a packed Sin op, so the sequence is UnpackProgram -> Sin -> PackProgram -> MatMulPackedProgram, which adds a lot of extra overhead. With this change, the run time of the user-provided model is reduced by more than a third (712 ms -> 449 ms) on my machine.
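To illustrate why a missing packed op costs two extra GPU passes, here is a toy model of the pass sequence described above. It is only a sketch: the `Op` type, the `planPasses` function, and the `PackedSin` name are hypothetical and do not mirror the actual classes in the tfjs WebGL backend. The only assumption is that a conversion pass is inserted whenever the texture layout an op reads differs from the layout the previous op wrote.

```typescript
// Toy model only: these types and names are hypothetical illustrations,
// not the tfjs WebGL backend's real API.
type Layout = 'packed' | 'unpacked';

interface Op {
  name: string;
  accepts: Layout;   // texture layout the op's shader reads
  produces: Layout;  // texture layout the op's shader writes
}

// Walk the ops in order and insert a conversion pass whenever the current
// layout does not match what the next op accepts.
function planPasses(ops: Op[], inputLayout: Layout): string[] {
  const passes: string[] = [];
  let layout = inputLayout;
  for (const op of ops) {
    if (layout !== op.accepts) {
      passes.push(op.accepts === 'packed' ? 'PackProgram' : 'UnpackProgram');
    }
    passes.push(op.name);
    layout = op.produces;
  }
  return passes;
}

const matMul: Op = {name: 'MatMulPackedProgram', accepts: 'packed', produces: 'packed'};
// Sin without a packed variant: reads and writes unpacked textures.
const sinUnpacked: Op = {name: 'Sin', accepts: 'unpacked', produces: 'unpacked'};
// Hypothetical packed variant of Sin: stays in the packed layout.
const sinPacked: Op = {name: 'PackedSin', accepts: 'packed', produces: 'packed'};

const before = planPasses([matMul, sinUnpacked, matMul], 'packed');
const after = planPasses([matMul, sinPacked, matMul], 'packed');

console.log(before.join(' -> '));
console.log(after.join(' -> '));
```

In this model, the unpacked Sin between two packed matmuls yields five passes (including one UnpackProgram and one PackProgram), while the packed variant yields three.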

| Kernel | Time (ms) | Inputs | Output | GPUPrograms |
|---|---|---|---|---|
| _FusedMatMul | 115.86 | input0: 3D[1,262144,256], input1: 3D[1,256,256], input2: 1D[256], input3: null | 1,262144,256 | PackProgram: 34.802812, MatMulPackedProgram: 81.055208 |
| _FusedMatMul | 112.14 | input0: 3D[1,262144,256], input1: 3D[1,256,256], input2: 1D[256], input3: null | 1,262144,256 | PackProgram: 33.896666, MatMulPackedProgram: 78.238645 |
| _FusedMatMul | 111.30 | input0: 3D[1,262144,256], input1: 3D[1,256,256], input2: 1D[256], input3: null | 1,262144,256 | PackProgram: 34.840781, MatMulPackedProgram: 76.462864 |
| _FusedMatMul | 65.40 | input0: 3D[1,262144,256], input1: 3D[1,256,3], input2: 1D[3], input3: null | 1,262144,3 | PackProgram: 34.896979, MatMulPackedProgram: 30.505208 |
| Sin | 57.77 | input0: 2D[262144,256] | 262144,256 | UnpackProgram: 29.14401, UnaryOpProgram: 28.624062 |
| Sin | 56.38 | input0: 2D[262144,256] | 262144,256 | UnpackProgram: 28.064687, UnaryOpProgram: 28.311875 |
| Sin | 55.10 | input0: 2D[262144,256] | 262144,256 | UnpackProgram: 27.104322, UnaryOpProgram: 27.99526 |
| Sin | 54.84 | input0: 2D[262144,256] | 262144,256 | UnpackProgram: 28.026562, UnaryOpProgram: 26.813541 |
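As a rough check on the claim above, the snippet below adds up the PackProgram and UnpackProgram times from the profiling rows and compares them with the total time of those same rows. The numbers are copied from the table; the variable names are only for illustration. Note that the table lists eight kernels rather than the whole model, so the result is not directly comparable with the 712 ms end-to-end figure.

```typescript
// Per-program GPU times (ms) copied from the profiling rows above.
const packMs: number[] = [34.802812, 33.896666, 34.840781, 34.896979];
const unpackMs: number[] = [29.14401, 28.064687, 27.104322, 28.026562];
// Kernel totals (ms) for the eight listed rows (4 x _FusedMatMul, 4 x Sin).
const kernelMs: number[] = [115.86, 112.14, 111.30, 65.40, 57.77, 56.38, 55.10, 54.84];

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Time spent purely on layout conversion in the listed kernels.
const conversionMs = sum(packMs) + sum(unpackMs);
// Total time of the listed kernels.
const listedMs = sum(kernelMs);
// Fraction of the listed kernel time that is conversion overhead.
const conversionShare = conversionMs / listedMs;
// End-to-end reduction reported in the comment (712 ms -> 449 ms).
const reportedReduction = (712 - 449) / 712;

console.log(`conversion: ${conversionMs.toFixed(2)} ms of ${listedMs.toFixed(2)} ms`);
console.log(`share: ${(conversionShare * 100).toFixed(1)}%`);
console.log(`reported reduction: ${(reportedReduction * 100).toFixed(1)}%`);
```

For the listed rows, the conversion programs account for about 250.8 ms of 628.8 ms, which is in the same range as the reported 263 ms end-to-end saving.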

@qjia7 qjia7 marked this pull request as ready for review February 15, 2023 08:27
@pyu10055 pyu10055 left a comment


Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @Linchenn)

@Linchenn Linchenn left a comment


Thank you for fixing it! LGTM

@mattsoulanille mattsoulanille merged commit 717d081 into tensorflow:master Feb 16, 2023
@qjia7 qjia7 deleted the sin_cos_pack branch February 16, 2023 08:00
Development

Successfully merging this pull request may close these issues.

Setting ENGINE_COMPILE_ONLY to true is resulting in output tensor of 0s

4 participants