Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add maxPool kernel for WebGPU. #1717

Merged
merged 38 commits into from May 1, 2019

Conversation

3 participants
@annxingyuan
Copy link
Collaborator

commented Apr 30, 2019

Changes

  • Add maxPool kernel.
  • Add helper for computing dispatch dimensions.
  • Add helper for generating getOutputCoords based on dispatch dimensions (to reduce arithmetic).

To see the logs from the Cloud Build CI, please join either
our discussion
or announcement mailing list.


This change is Reviewable

annxingyuan added some commits Apr 28, 2019

@annxingyuan annxingyuan self-assigned this Apr 30, 2019

@annxingyuan annxingyuan requested review from dsmilkov and nsthorat Apr 30, 2019

@nsthorat
Copy link
Collaborator

left a comment

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @nsthorat)


src/backends/webgpu/src/backend_webgpu.ts, line 23 at r1 (raw file):

import {DataMover, DataType, ENV, KernelBackend, Rank, ShapeMap, Tensor, Tensor3D, Tensor4D, util} from '@tensorflow/tfjs-core';
// How should this be imported?

welp, unfortunately this should probably be part of the public API... for now this is fine but can you send a PR to core to export conv_util?


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 30 at r1 (raw file):

  dispatch: [number, number, number];
  variableNames = ['x'];
  uniforms = 'ivec4 xShape, outShape; ' +

in the future it'd be nice to move these to something non-stringy


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 37 at r1 (raw file):

    this.outputShape = convInfo.outShape;

    const dispatchLayout = [[1], [2], [0, 3]] as [number[], number[], number[]];

I wonder if this should be an object with named keys for readability, e.g.

const dispatchLayout = { tileSize: ... threadGroups: ... };

etc (actually have no idea what they actually are but that might be nice)


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 52 at r1 (raw file):

      ${generateGetOutputCoords(dispatchLayout, this.outputShape.length)}

      void main() {

in the future I think you can optimize this by threads computing their own maxes and then merging the max at the end.


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 72 at r1 (raw file):

            }
  
            for(int wC=0; wC<filterDims.x; wC += dilation.x) {

spacing here:

for (int wC = 0; wC < filterDims.x; wC += dilation.x) {

same above


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 80 at r1 (raw file):

            }
          }
          

can you remove the blanks here and above?


src/backends/webgpu/src/kernels/webgpu_program.ts, line 116 at r1 (raw file):

export function makeShaderKey(program: WebGPUProgram): string {
  const key =
      (program.tileSize ? program.tileSize.join(',') : '') + program.userCode;

just curious why we need tileSize as a shader key?

@kainino0x
Copy link
Collaborator

left a comment

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @nsthorat)


src/backends/webgpu/src/shader_util.ts, line 29 at r1 (raw file):

  const numCoords = indicesArr.length;
  const dims = ['x', 'y', 'z', 'w'];
  const shape = indicesArr.map(d => `${variableName}.${dims[d]}`);

Not necessary to use .x, .y, .z, .w; you can use [0], [1], [2], [3].


src/backends/webgpu/src/shader_util.ts, line 54 at r1 (raw file):

    if (arr.length === 1) {
      gatherDimensionsStr +=
          `uint d${arr[0]} = gl_GlobalInvocationID.${globalDims[i]};`;

Same here


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 30 at r1 (raw file):

  dispatch: [number, number, number];
  variableNames = ['x'];
  uniforms = 'ivec4 xShape, outShape; ' +

The input is uint32, so these should be uvec


src/backends/webgpu/src/kernels/webgpu_program.ts, line 116 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

just curious why we need tileSize as a shader key?

The tilesize is used in the generated shader code. But several other things are also used. Could we just use the entire generated shader source as shader key? Should be very nearly as fast.

@kainino0x
Copy link
Collaborator

left a comment

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @dsmilkov, and @nsthorat)


src/backends/webgpu/src/kernels/matmul_packed_webgpu.ts, line 29 at r1 (raw file):

  variableNames = ['A', 'B'];
  uniforms = 'uint dimAOuter, dimInner, dimBOuter, batch;';
  tileSize: [number, number, number] = [32, 32, 1];

Based on my debugging today, we can't have tile sizes > 256, even though MatMulPacked is more stable than MatMul with 32x32. Can you turn this down to 16x16? (Can be in this PR or another one, I just didn't open one because it would conflict with this one)

annxingyuan added some commits May 1, 2019

@annxingyuan
Copy link
Collaborator Author

left a comment

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @dsmilkov, @kainino0x, and @nsthorat)


src/backends/webgpu/src/backend_webgpu.ts, line 23 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

welp, unfortunately this should probably be part of the public API... for now this is fine but can you send a PR to core to export conv_util?

Done #1724


src/backends/webgpu/src/shader_util.ts, line 29 at r1 (raw file):

Previously, kainino0x (Kai Ninomiya) wrote…

Not necessary to use .x, .y, .z, .w; you can use [0], [1], [2], [3].

Done


src/backends/webgpu/src/shader_util.ts, line 54 at r1 (raw file):

Previously, kainino0x (Kai Ninomiya) wrote…

Same here

Done


src/backends/webgpu/src/kernels/matmul_packed_webgpu.ts, line 29 at r1 (raw file):

Previously, kainino0x (Kai Ninomiya) wrote…

Based on my debugging today, we can't have tile sizes > 256, even though MatMulPacked is more stable than MatMul with 32x32. Can you turn this down to 16x16? (Can be in this PR or another one, I just didn't open one because it would conflict with this one)

Done


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 30 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

in the future it'd be nice to move these to something non-stringy

Agreed :)


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 30 at r1 (raw file):

Previously, kainino0x (Kai Ninomiya) wrote…

The input is uint32, so these should be uvec

Done

Added an Int32Array type to makeUniforms - I think padding can be negative.


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 37 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

I wonder if this should be an object with named keys for readability, e.g.

const dispatchLayout = { tileSize: ... threadGroups: ... };

etc (actually have no idea what they actually are but that might be nice)

Done


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 52 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

in the future I think you can optimize this by threads computing their own maxes and then merging the max at the end.

Yes - I've left a comment in the file.


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 72 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

spacing here:

for (int wC = 0; wC < filterDims.x; wC += dilation.x) {

same above

Done


src/backends/webgpu/src/kernels/maxpool_webgpu.ts, line 80 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you remove the blanks here and above?

Done


src/backends/webgpu/src/kernels/webgpu_program.ts, line 116 at r1 (raw file):

Previously, kainino0x (Kai Ninomiya) wrote…

The tilesize is used in the generated shader code. But several other things are also used. Could we just use the entire generated shader source as shader key? Should be very nearly as fast.

Yannick is currently profiling the cost of keying on the entire shader source on the WebGL side - maybe we can hold off on making a call here until he has results.

@annxingyuan

This comment has been minimized.

Copy link
Collaborator Author

commented May 1, 2019

@kainino0x Just curious - is there any way to avoid splicing in the tile size? Could we upload it somehow? Trying to figure out ways to minimize shader recompilation.

annxingyuan added some commits May 1, 2019

@annxingyuan annxingyuan merged commit 91aa260 into master May 1, 2019

2 checks passed

Trigger: ed916b23-8e01-4fda-85af-b7c8627d15a1 Summary
Details
cla/google All necessary CLAs are signed

@annxingyuan annxingyuan added this to Done in TensorFlow.js via automation May 1, 2019

@annxingyuan annxingyuan deleted the webgpu_maxpool branch May 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.