New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[webgl] use shader loop instead of expanding with js to reduce shader size for packed Depthwise #5714
Conversation
This is amazing! Didn't expect the loop unrolling will bring so such overhead for compiling time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, the line for (let c = 0; c < filterWidth; c++)
is just to initiate the variables for the loop, it cannot be unrolled.
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ping, will this affect inference performance? I see that in the unpacked version, Daniel has a TODO to flatten the for loop, I don't know whether it's because of performance. https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/conv_gpu_depthwise.ts#L96
If inference performance doesn't change significantly, then the change looks good. Thank you!
Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance is the same for mobilenet and efficientdet, maybe a slightly better for some reason. My understanding of the comment in unpacked shader is aiming to use vec4 to group summation, which is already there for packed version.
Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, great. Thank you! LGTM.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @jinjingforever)
This is a simple change to move the outermost loop of filter height from js code expansion to shader loop, which will greatly reduce the shader size for depthwise (factor of 3-5x depends on the filter height).
Tested on mobilenet and effientdet models, it will improve the model initialization time close to the same as unpack version.
fix #5343
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.
This change is