webgpu: support BroadcastArgs kernel #7247
Conversation
gyagp left a comment:
LGTM with one nit
    if (index < uniforms.size) {
      var s0 = 1.0;
      var s1 = 1.0;
      let indexS0 = index - uniforms.size + uniforms.s0Length;
nit, maybe:

    - let indexS0 = index - uniforms.size + uniforms.s0Length;
    + let indexS0 = index - uniforms.size + uniforms.s0Size;
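For context, the quoted shader line implements right-aligned shape broadcasting: output position `index` maps to `index - uniforms.size + s0Size` in s0, and dimensions missing on the left act as 1. A minimal TypeScript sketch of that rule (illustrative only, not the PR's actual shader or helper):

    // Reference implementation of the rule the WGSL snippet evaluates.
    function broadcastArgsRef(s0: number[], s1: number[]): number[] {
      const size = Math.max(s0.length, s1.length);  // uniforms.size
      const out: number[] = new Array(size);
      for (let index = 0; index < size; index++) {
        const indexS0 = index - size + s0.length;  // right-align s0
        const indexS1 = index - size + s1.length;  // right-align s1
        const d0 = indexS0 < 0 ? 1 : s0[indexS0];  // missing dim acts as 1
        const d1 = indexS1 < 0 ? 1 : s1[indexS1];
        if (d0 !== 1 && d1 !== 1 && d0 !== d1) {
          throw new Error(`Incompatible broadcast dims ${d0} and ${d1}`);
        }
        out[index] = Math.max(d0, d1);
      }
      return out;
    }

    // e.g. broadcastArgsRef([2, 1, 3], [4, 3]) -> [2, 4, 3]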
    const program = new BroadcastArgsProgram(outputSize);
    const uniformData =
        [{type: 'int32', data: [s0Size]}, {type: 'int32', data: [s1Size]}];
    return backend.runWebGPUProgram(program, [s0, s1], 'int32', uniformData);
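For readers following along, a kernel func ending in this `runWebGPUProgram` call would then be registered with tfjs-core's kernel registry. A hedged sketch of that wiring, following the usual tfjs kernel pattern (the import path is an assumption, not copied from the PR):

    import {BroadcastArgs, KernelConfig, KernelFunc} from '@tensorflow/tfjs-core';
    // The kernel func containing the quoted lines above; path assumed.
    import {broadcastArgs} from './BroadcastArgs';

    export const broadcastArgsConfig: KernelConfig = {
      kernelName: BroadcastArgs,
      backendName: 'webgpu',
      kernelFunc: broadcastArgs as unknown as KernelFunc,
    };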
Can you also add a CPU path for when s0 and s1 are on the CPU, since the inputs are really small?
qjia7 left a comment:
LGTM with one nit.
    const {s0, s1} = inputs;

    if (backend.shouldExecuteOnCPU([s0, s1])) {
      const s0BufferInfo = backend.tensorMap.get(s0.dataId);
nit: s0BufferInfo -> s0TensorInfo
@gyagp Could you review it again? I added the CPU path after you approved.
    const {s0, s1} = inputs;

    if (backend.shouldExecuteOnCPU([s0, s1])) {
      const s0TensorInfo = backend.tensorMap.get(s0.dataId);
Why don't we reuse the CPU impl?
- The CPU backend does not export its BroadcastArgs impl for other backends to use.
- The CPU backend itself implements the feature by calling the common tfjs-core function, so it is better for other backends to call that tfjs-core function directly.
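To illustrate the second point, a hedged sketch of the CPU path inside the kernel func, built directly on tfjs-core's `backend_util.assertAndGetBroadcastShape` (the `tensorMap`/`makeTensorInfo` names are taken from the quoted diff and analogous backends, not verified against this PR):

    import {backend_util, TypedArray} from '@tensorflow/tfjs-core';

    // Inside the kernel func, before dispatching the WebGPU program:
    if (backend.shouldExecuteOnCPU([s0, s1])) {
      // Shape tensors are tiny, so read their values and compute on the CPU.
      const s0Vals = backend.tensorMap.get(s0.dataId).values as TypedArray;
      const s1Vals = backend.tensorMap.get(s1.dataId).values as TypedArray;
      const broadcastShape = backend_util.assertAndGetBroadcastShape(
          Array.from(s0Vals), Array.from(s1Vals));
      return backend.makeTensorInfo(
          [broadcastShape.length], 'int32', Int32Array.from(broadcastShape));
    }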
Per discussion, a shared interface (xxxImplCPU) used as the CPU fallback is a good approach. However, that may need some changes in the CPU backend, so let's refine it in a future PR. LGTM with this one.
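For reference, the xxxImplCPU pattern mentioned here is the one the WebGL backend already uses for several kernels: the CPU backend exports the raw implementation from a shared module, and other backends import it as a fallback. A rough sketch of what that could look like for this kernel (entirely hypothetical naming and paths):

    import {backend_util} from '@tensorflow/tfjs-core';

    // In tfjs-backend-cpu's shared exports (hypothetical):
    export function broadcastArgsImplCPU(s0: number[], s1: number[]): number[] {
      return backend_util.assertAndGetBroadcastShape(s0, s1);
    }

    // In the WebGPU kernel, imported as the CPU fallback (path assumed):
    // import {broadcastArgsImplCPU} from '../kernel_utils/shared';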
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.