[WebGL] Support packed Conv2DBackpropInput #7339
qjia7
left a comment
Clever algorithm and great work!
pyu10055
left a comment
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @Linchenn)
tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 55 at r1 (raw file):
// Initializing dotProd with a small epsilon seems to reduce GPU accuracy loss.
vec4 dotProd = vec4(0.000000000000001);
Use a JS constant and interpolate it into the shader for this value.
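One way to read this suggestion, as a minimal sketch: keep the epsilon in a named TypeScript constant and interpolate it into the GLSL source string. The names `DOT_PROD_EPSILON` and `buildDotProdInit` are hypothetical and are not taken from the PR.

```typescript
// Hypothetical constant name; the value is the literal quoted from the shader.
const DOT_PROD_EPSILON = 1e-15;

// Builds the GLSL initializer line. toFixed(15) writes the value in plain
// decimal notation, matching the literal used in the quoted shader line.
function buildDotProdInit(epsilon: number): string {
  return `vec4 dotProd = vec4(${epsilon.toFixed(15)});`;
}

const initLine = buildDotProdInit(DOT_PROD_EPSILON);
console.log(initLine);
```

With a named constant, the value only has to be changed (or removed) in one place.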
tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 90 at r1 (raw file):
dotProd.xy += vec2(dot(dyValue, wValue.xy), dot(dyValue, wValue.zw)) * idyCVal;
dySample = getDy(batch, idyR, idyC2, d2);
It might be good to group all memory accesses together.
tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 90 at r1 (raw file):
dotProd.xy += vec2(dot(dyValue, wValue.xy), dot(dyValue, wValue.zw)) * idyCVal;
dySample = getDy(batch, idyR, idyC2, d2);
Can idyC2 be the same as idyC? If so, only read again when idyC2 != idyC.
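The idea behind this comment can be sketched in plain TypeScript rather than GLSL. This is an illustration only, not the shader from the PR: `fetchDy` and `readPair` are hypothetical stand-ins for a texture read such as `getDy(...)`, and the counter exists just to make the saved read visible.

```typescript
// Stand-in for a texture read; counts how often it is called.
let fetchCount = 0;
function fetchDy(data: number[], index: number): number {
  fetchCount++;
  return data[index];
}

// Reads the values at idyC and idyC2, issuing the second read only when
// the two indices differ; otherwise the first value is reused.
function readPair(data: number[], idyC: number, idyC2: number): [number, number] {
  const first = fetchDy(data, idyC);
  const second = idyC2 !== idyC ? fetchDy(data, idyC2) : first;
  return [first, second];
}

const dy = [10, 20, 30];
const same = readPair(dy, 1, 1);      // indices match: one fetch
const different = readPair(dy, 1, 2); // indices differ: two fetches
console.log(same, different, fetchCount);
```

Whether the branch pays off on a GPU depends on the hardware and driver, so any gain has to be confirmed by measurement.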
Linchenn
left a comment
Thank you, Ping. After tuning the shader, the inference time for the following op improved from ~2.8928ms to ~2.6809903999999984ms on a Linux workstation:
const x = tf.ones([1,8,6,256]);
const w = tf.ones([4,4,256,256]);
const s = (await tf.profile(() => tf.conv2dTranspose(x, w, [1,16,12,256], 1, 'valid'))).kernels[0].kernelTimeMs;
After benchmarking, the improvement comes mainly from adding a condition for reading dySample2: from ~2.8928ms to ~2.711304960000001ms.
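To put the reported timings in relative terms, here is a small sketch of the arithmetic. The helper `percentFaster` is hypothetical; the three timings are copied from the comment above.

```typescript
// Percentage reduction in kernel time, relative to the baseline.
function percentFaster(beforeMs: number, afterMs: number): number {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

const baselineMs = 2.8928;                   // before tuning
const tunedMs = 2.6809903999999984;          // after all shader changes
const conditionalReadMs = 2.711304960000001; // conditional dySample2 read only

const totalGain = percentFaster(baselineMs, tunedMs);
const conditionalGain = percentFaster(baselineMs, conditionalReadMs);
console.log(totalGain.toFixed(1), conditionalGain.toFixed(1));
```

On these numbers the overall reduction is roughly 7.3%, of which about 6.3% comes from the conditional read alone.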
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @mattsoulanille and @pyu10055)
tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 55 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Use a JS constant and interpolate it into the shader for this value.
Removed it as we discussed, because this initial value is currently unnecessary; we can keep it in mind if correctness issues come up later.
tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 90 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Can idyC2 be the same as idyC? If so, only read again when idyC2 != idyC.
Done.
tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 90 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
It might be good to group all memory accesses together.
Done.
pyu10055
left a comment
Reviewed 1 of 2 files at r1, 1 of 1 files at r4, all commit messages.
Reviewable status:
complete! 2 of 1 approvals obtained (waiting on @mattsoulanille)
With this PR:
The Conv2DBackpropInput op is accelerated ~2x, which partially fixes #5197.
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.