
depthwise packed perf: reduce texel read for dilation of 2 #4954

Merged
merged 11 commits on Apr 23, 2021

pyu10055 (Collaborator) commented Apr 19, 2021

Use a flag to record whether a texel has already been read and is ready to be reused. This minimizes unnecessary texel reads in all cases.
Align the texel variable names with their indices.
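
A rough way to see why a "ready" flag helps at dilation 2 is a small CPU-side model of the read pattern. This is a hypothetical sketch, not the shader code from this PR: it assumes stride 1, a single filter row, one output texel covering two adjacent output columns, and each input texel packing two adjacent input columns. `countTexelReads` and its parameters are illustrative names.

```typescript
// Hypothetical model of texel reads for one packed output texel (two output
// columns). Assumes stride 1 and a single filter row; not the tfjs shader code.
function countTexelReads(
  startCol: number, // input column feeding output column 0 (odd or negative with padding)
  dilation: number,
  filterWidth: number,
  reuseTexels: boolean, // true: consult a per-texel "ready" flag before reading
): number {
  // ready[t] === true means input texel t was already read and can be reused.
  const ready: { [texel: number]: boolean } = {};
  let reads = 0;
  for (let c = 0; c < filterWidth; c++) {
    // Texels holding the input columns that filter tap c needs for
    // output columns 0 and 1 (each texel packs two columns).
    const first = Math.floor((startCol + c * dilation) / 2);
    const second = Math.floor((startCol + c * dilation + 1) / 2);
    const needed = first === second ? [first] : [first, second];
    for (const texel of needed) {
      if (reuseTexels && ready[texel]) {
        continue; // flag is set: reuse the texel instead of reading it again
      }
      reads++;
      ready[texel] = true;
    }
  }
  return reads;
}

console.log(countTexelReads(1, 2, 3, false)); // 6
console.log(countTexelReads(1, 2, 3, true)); // 4
```

Under these assumptions, an odd starting column with dilation 2 makes each tap straddle two texels, and the second texel of tap c is the first texel of tap c + 1, so the flag turns 2 × filterWidth reads into filterWidth + 1. With an even starting column, each tap's two columns share one texel and the model shows no saving.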

Reuse the same vertex shader for all GPGPU programs.
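
The vertex shader reuse can be pictured as compile-once caching. In this sketch `GLLike` is a hypothetical stand-in for the few WebGL calls involved (`createShader`, `shaderSource`, `compileShader`) so the example runs without a browser, and `SharedVertexShader` is an illustrative name, not the tfjs API.

```typescript
// Hypothetical minimal subset of the WebGL calls needed to compile a shader.
interface GLLike {
  readonly VERTEX_SHADER: number;
  createShader(type: number): object | null;
  shaderSource(shader: object, source: string): void;
  compileShader(shader: object): void;
}

// Compiles the shared vertex shader on first use and hands the same shader
// object to every GPGPU program afterwards.
class SharedVertexShader {
  private shader: object | null = null;
  compileCount = 0;

  constructor(private readonly gl: GLLike, private readonly source: string) {}

  get(): object {
    if (this.shader === null) {
      const shader = this.gl.createShader(this.gl.VERTEX_SHADER);
      if (shader === null) {
        throw new Error("createShader failed");
      }
      this.gl.shaderSource(shader, this.source);
      this.gl.compileShader(shader);
      this.compileCount++;
      this.shader = shader;
    }
    return this.shader;
  }
}

// Usage with a fake context standing in for WebGL.
const fakeGl: GLLike = {
  VERTEX_SHADER: 0x8b31,
  createShader: () => ({}),
  shaderSource: () => {},
  compileShader: () => {},
};
const cache = new SharedVertexShader(fakeGl, "void main() {}");
const a = cache.get();
const b = cache.get();
console.log(a === b, cache.compileCount); // true 1
```

A compiled WebGL shader object can be attached to more than one program, so with this pattern only the first program pays the vertex shader compile cost.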

Verified the results of this change with the SSD Colab.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.



google-cla bot added the cla: yes label Apr 19, 2021
pyu10055 requested a review from lina128 April 22, 2021 05:01
pyu10055 changed the title from "[DRAFT] depthwise perf: reduce texel read for dilation of 2" to "depthwise packed perf: reduce texel read for dilation of 2" Apr 22, 2021
lina128 (Collaborator) left a comment

This looks great. Curious, how much overhead is saved by reusing the same vertex shader?

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @pyu10055)


tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 78 at r1 (raw file):

      `;

      for (let texelC = 0; texelC < (texelsAcross + 1) / 2; texelC++) {

Just want to confirm that this change alters the number of loop iterations: for example, if texelsAcross = 4, the condition used to be texelC < 3 and is now texelC < 2.

pyu10055 (Collaborator, Author) left a comment

Unfortunately, the saving seems minimal on the first inference.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @lina128)


tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 78 at r1 (raw file):

Previously, lina128 (Na Li) wrote…

Just want to confirm that this change alters the number of loop iterations: for example, if texelsAcross = 4, the condition used to be texelC < 3 and is now texelC < 2.

Yes. Since each iteration generates two values, the old bound produced extra iterations that did nothing.
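
Assuming the quoted line is the pre-change bound, the iteration counts in this exchange can be checked directly. `oldBound` mirrors the quoted expression; `newBound` (one iteration per pair of columns, `Math.ceil(texelsAcross / 2)`) is a hypothetical stand-in, since the new expression is not shown here.

```typescript
// Counts how many times `for (let i = 0; i < bound; i++)` executes its body.
function iterations(bound: number): number {
  let n = 0;
  for (let i = 0; i < bound; i++) {
    n++;
  }
  return n;
}

// Quoted bound: `/` is floating-point division in TypeScript, so for
// texelsAcross = 4 the bound is 2.5 and the loop runs for i = 0, 1, 2.
const oldBound = (texelsAcross: number): number => (texelsAcross + 1) / 2;
// Hypothetical tighter bound: one iteration per two columns.
const newBound = (texelsAcross: number): number => Math.ceil(texelsAcross / 2);

for (const texelsAcross of [3, 4, 5, 6]) {
  console.log(texelsAcross, iterations(oldBound(texelsAcross)), iterations(newBound(texelsAcross)));
}
```

Under these assumptions, the old bound adds one empty iteration whenever texelsAcross is even (4 gives 3 iterations versus 2, matching the review) and agrees with the tighter bound when texelsAcross is odd.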

pyu10055 merged commit 30a3e3b into master Apr 23, 2021
pyu10055 deleted the depthwise_packed branch April 23, 2021 20:25

2 participants