webgpu: add a naive implementation of conv3d #7016

xhcao · 2022-11-07T07:16:07Z

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

xhcao · 2023-01-09T03:48:42Z

Hi @qjia7 @gyagp, some other kernels also depend on the conv3d kernel, so I first added a naive conv3d implementation here in order to implement those kernels quickly. I will implement a shared conv3d version in the future.

gyagp

LGTM with some nits. Overall, we should use more WGSL syntactic sugar, such as ++ and +=.
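To illustrate the nit, here is a minimal sketch of a TypeScript helper that emits a WGSL loop using the increment statement and compound assignment. The helper name `makeSumLoop` and the variable names `i` and `acc` are hypothetical and do not come from the PR; only the `++` and `+=` forms are what the review asks for.

```typescript
// Hypothetical helper, not part of the PR: builds a WGSL accumulation loop
// that uses the increment statement (i++) and compound assignment (+=)
// in place of `i = i + 1` and `acc = acc + ...`.
function makeSumLoop(bound: string, term: string): string {
  return `
    var acc = 0.0;
    for (var i = 0; i < ${bound}; i++) {
      acc += ${term};
    }`;
}

// Example: loop over the first filter dimension used in the shader above.
const snippet: string = makeSumLoop('uniforms.filterDims[0]', 'f32(i)');
```

The generated string is only WGSL source text; whether it compiles still depends on the surrounding shader declaring `uniforms.filterDims`.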

gyagp · 2023-01-09T09:31:13Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+        let inputDepthVec4Remainder = uniforms.xShape.u % 4;
+
+        var dotProd = 0.0;
+        for (var wF = 0; wF < uniforms.filterDims[0]; wF = wF + 1) {


Suggested change

-        for (var wF = 0; wF < uniforms.filterDims[0]; wF = wF + 1) {
+        for (var wF = 0; wF < uniforms.filterDims[0]; wF++) {

gyagp · 2023-01-09T09:31:24Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+            continue;
+          }
+
+          for (var wR = 0; wR < uniforms.filterDims[1]; wR = wR + 1) {


gyagp · 2023-01-09T09:31:32Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+              continue;
+            }
+
+            for (var wC = 0; wC < uniforms.filterDims[2]; wC = wC + 1) {


gyagp · 2023-01-09T09:31:46Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+                continue;
+              }
+
+              for (var d1 = 0; d1 < inputDepthNearestVec4; d1 = d1 + 4) {


gyagp · 2023-01-09T09:34:15Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+                  getW(wF, wR, wC, d1 + 3, d2)
+                );
+
+                dotProd = dotProd + dot(xValues, wValues);


gyagp · 2023-01-09T09:34:47Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+                  getW(wF, wR, wC, inputDepthNearestVec4, d2),
+                  getW(wF, wR, wC, inputDepthNearestVec4 + 1, d2)
+                );
+                dotProd = dotProd + dot(xValues, wValues);


gyagp · 2023-01-09T09:35:50Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+                  getW(wF, wR, wC, inputDepthNearestVec4 + 1, d2),
+                  getW(wF, wR, wC, inputDepthNearestVec4 + 2, d2)
+                );
+                dotProd = dotProd + dot(xValues, wValues);
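The hunks above accumulate the input-depth dot product four channels at a time and then handle the leftover channels separately. As a CPU-side reference, a minimal TypeScript sketch of that chunking pattern could look like the following. The function name `chunkedDot` is hypothetical, and it assumes `inputDepthNearestVec4` is the depth rounded down to a multiple of 4, which the quoted lines imply but do not show.

```typescript
// Hypothetical CPU reference for the shader's depth loop, not code from the PR.
// Assumption: nearestVec4 is the depth rounded down to a multiple of 4.
function chunkedDot(x: number[], w: number[]): number {
  if (x.length !== w.length) {
    throw new Error('x and w must have the same length');
  }
  const remainder = x.length % 4;            // mirrors inputDepthVec4Remainder
  const nearestVec4 = x.length - remainder;  // mirrors inputDepthNearestVec4
  let dotProd = 0;
  // Full groups of four, analogous to dot(vec4, vec4) in the shader.
  for (let d1 = 0; d1 < nearestVec4; d1 += 4) {
    dotProd += x[d1] * w[d1] + x[d1 + 1] * w[d1 + 1] +
        x[d1 + 2] * w[d1 + 2] + x[d1 + 3] * w[d1 + 3];
  }
  // Remaining one to three channels, handled by separate branches in the shader.
  for (let d = nearestVec4; d < x.length; d++) {
    dotProd += x[d] * w[d];
  }
  return dotProd;
}
```

A reference like this can be handy for checking shader output against small inputs whose depth is not a multiple of 4.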


gyagp · 2023-01-09T09:47:46Z

tfjs-backend-webgpu/src/shader_util.ts


  const numCoords = indicesArr.length;
-  const shape = indicesArr.map(d => `${variableName}[${d}]`);
+  const indicesStr = ['.x', '.y', '.z', '.w', '.u', '.v'];


Why is the sequence wuv instead of uvw?
You may just define indicesStr as 'xyzuvw', then use it as ${variableName}.${indicesStr[d]}.

Hi @gyagp, WGSL uses xyzw for single-component selection: https://gpuweb.github.io/gpuweb/wgsl/#:~:text=7.7.1.1.%20Vector%20Single%20Component%20Selection
The TFJS WebGPU backend extends vec4 to vec5 and vec6 (tfjs/tfjs-backend-webgpu/src/webgpu_program.ts, line 313 in ea8d7db):

struct vec5 {x: i32, y: i32, z: i32, w: i32, u: i32};

and uses u and v to select the fifth and sixth components.

All other comments are addressed, thank you.

Thanks for the info!
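Putting the two replies together, the string-indexing idea can be sketched with the component order x, y, z, w, u, v. This is an illustration only: the function name `componentAccessors` and the constant `COMPONENTS` are hypothetical, and the merged code in shader_util.ts may differ.

```typescript
// x, y, z, w are WGSL's built-in vector component names; u and v are the
// extra fields of the backend's vec5/vec6 structs described in the thread.
const COMPONENTS = 'xyzwuv';

// Hypothetical helper: maps dimension indices to member accessors on a
// shader variable, e.g. index 4 on `uniforms.xShape` -> `uniforms.xShape.u`.
function componentAccessors(
    variableName: string, indicesArr: number[]): string[] {
  return indicesArr.map(d => {
    if (!Number.isInteger(d) || d < 0 || d >= COMPONENTS.length) {
      throw new Error(`index ${d} has no component name`);
    }
    return `${variableName}.${COMPONENTS.charAt(d)}`;
  });
}

const accessors: string[] = componentAccessors('uniforms.xShape', [0, 3, 4]);
// accessors is ['uniforms.xShape.x', 'uniforms.xShape.w', 'uniforms.xShape.u']
```

Using a single string keeps the order in one place, which makes the w-before-u ordering easy to see and to review.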

qjia7 · 2023-01-11T07:04:19Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

@@ -0,0 +1,129 @@
+/**
+ * @license
+ * Copyright 2022 Google LLC.


nit: 2022 -> 2023 and similar for other places

qjia7 · 2023-01-11T07:16:42Z

tfjs-backend-webgpu/src/conv3d_naive_webgpu.ts

+  variableNames = ['x', 'W'];
+  uniforms =
+      'filterDims: vec3<i32>, pad: vec3<i32>, strides: vec3<i32>, dilations: vec3<i32>,';
+  workgroupSize: [number, number, number] = [4, 4, 8];


Can we use the flat dispatch layout here? Previous experience shows that the flat dispatch layout always performs well when the algorithm does not depend on the dispatch layout. You can write a micro-benchmark to verify it.
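For readers unfamiliar with the idea, a flat dispatch treats the whole output as a one-dimensional range and assigns one invocation per output element. A rough sketch of the workgroup-count arithmetic follows; the function name `flatDispatch` and the default workgroup size of 64 are assumptions for illustration, not the backend's actual helpers or values.

```typescript
// Hypothetical sketch, not the tfjs-backend-webgpu API: computes workgroup
// counts for a 1-D ("flat") dispatch over every element of the output.
// It ignores the device's per-dimension workgroup limit, which a real
// implementation has to respect.
function flatDispatch(
    outputShape: number[], workgroupSizeX = 64): [number, number, number] {
  if (!Number.isInteger(workgroupSizeX) || workgroupSizeX <= 0) {
    throw new Error('workgroupSizeX must be a positive integer');
  }
  // Total number of output elements (1 for a scalar / empty shape).
  const outputSize = outputShape.reduce((a, b) => a * b, 1);
  // Round up so a partially filled last workgroup is still launched.
  return [Math.ceil(outputSize / workgroupSizeX), 1, 1];
}

// A 1 x 4 x 4 x 4 x 8 output has 512 elements, so 8 workgroups of 64.
const dispatch = flatDispatch([1, 4, 4, 4, 8]);
```

With this layout the shader has to bounds-check the global invocation index against the output size, because the last workgroup may extend past the end.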

xhcao force-pushed the conv3d branch 2 times, most recently from 3f7772d to d69431f Compare November 8, 2022 01:04

xhcao force-pushed the conv3d branch from d69431f to 072ee37 Compare January 9, 2023 02:50

xhcao requested review from gyagp and qjia7 January 9, 2023 03:48

gyagp approved these changes Jan 10, 2023

xhcao force-pushed the conv3d branch from 072ee37 to 18e2e02 Compare January 11, 2023 05:13

qjia7 reviewed Jan 11, 2023

qjia7 approved these changes Jan 11, 2023

xhcao added 3 commits January 12, 2023 10:44

webgpu: add a naive implementation of conv3d

2cba4db

Address Yang's comments

7ca1623

Address Jiajia' comments

604fac1

xhcao force-pushed the conv3d branch from 2d48f0d to 604fac1 Compare January 12, 2023 02:51

xhcao merged commit 94c0c2b into tensorflow:master Jan 12, 2023
