Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995

jinjingforever · 2020-09-30T22:09:12Z

This change is

jinjingforever · 2020-09-30T22:47:43Z

Hi Ann,

Here is a draft PR where I attempt to fix the problem for one simple op "floor" as an example. This op has the exact problem you described because "floor" has been modularized in cpu (so this.cpuBackend.floor call would fail).

Please take a look and see if this PR is heading towards the right direction:) The basic idea is: for each cpu forwarding call from webgl (this.cpuBackend.xxx()), we need to check if the corresponding op has been modularized or not. If so (an error would be thrown), we need to call the shared cpu implementation instead.

Once you think this fix is good, I will continue working on this PR to apply the same logic to all the other places.

Thank you very much!

annxingyuan · 2020-10-01T11:13:03Z

Thank you @jinjingforever ! This is pretty much exactly what I was thinking :) The only feedback I have is - I think elsewhere in our codebase we have used typeof func === 'function' to check whether functions are defined, rather than using try... catch. Also, since we're manually modularizing each kernel, maybe we could just call the modularized CPU impl function where necessary, rather than going through a generic check for each kernel.

Would also be awesome if you could add a few e2e tests that explicitly test CPU forwarding!

jinjingforever

Thank you Ann for the feedback! Really appreciate it. I will continue working on this and will let you know when the fix is ready.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

tafsiri · 2020-10-07T14:59:02Z

+1 to just calling the modularized cpu impl where available. However in general we should wrap the other (non modularized) cpu forwarding calls in a try-catch block before falling back to webgl implementations. Using typeof func === 'function' may not work because there would usually be a function definition inherited from the base class (that throws an exception)

jinjingforever · 2020-10-07T19:24:17Z

Some notes about this PR:

I had to pull some changes (mainly the changes in kernel_utils.ts and its callers) from Ann's PR ([webgl] Modularize cast, complex, concat, add, sub, multiply, notEqual, slice, fft, ifft. #3956) to make things work with minimum conflicts.
This PR also touches some webgl kernels that will be modularized (and fixed) in Ann's PR (e.g. multiply, concat). Depending which PR is merged first, one of us could just safely delete the code in backend_webgl.ts to resolve conflicts.
Last week, I added a bunch of e2e tests covering all kernels that directly do cpu forwarding. As you guys said, it is probably not necessary, but I will just leave them there for now:)
I tested locally with cpu forwarding turned-on (thanks to Yannick's draft PR) and things seem to be passing.

annxingyuan

Thank you Jing! Do we want to also make the changes that we discussed from Yannick's draft PR? (#4031)

I think the changes are:

Change existing environments in backend_webgl_test_registry.ts to not explicitly set WEBGL_CPU_FORWARD
Add a third test environment to backend_webgl_test_registry.tsthat explicitly sets WEBGL_CPU_FORWARD to false
Add another run-browserstack invocation in tfjs-backend-webgl/scripts/test-ci.sh nightly to explicitly enable WEBGL_CPU_FORWARD
Disable IS_TEST check in the backend_webgl.ts shouldExecuteOnCPU function

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, @pyu10055, and @tafsiri)

tafsiri

Thanks for making this PR. Just to confirm, this doesn't cover all the uses of CPU forwarding in the webgl backend right? (for example I was expecting to see return this.cpuBackend.greater(a, b); wrapped in a try catch).

Also could you merge the branch from my PR into this branch so we can see the output on CI and iterate on the config we want for testing locally and CI right in the context of this PR?

With regard to try-catch blocks I think we have a number of options:

The commits on my branch should catch unwrapped calls into the cpu backend (e.g. return this.cpuBackend.greater(a, b);) so in theory we do not need to wrap them. In fact leaving them unwrapped forces us to do the new style CPU forwarding in the same PR as modularizing a CPU kernel.
Alternatively we can wrap them so that even if they do fail webgl falls back onto a webgl implementation, this would cause a performance regression in those cases but will always allow the webgl backend to keep executing.
We can combine the two ideas above by re-throwing the exception in the catch block in tests (but in prod allowing fallback to webgl). The advantage of this is that even if something slips through our tests an op wouldn't fail on webgl. Here is a code snippet illustrating that.

backend_webgl greater()

greater(a: Tensor, b: Tensor): Tensor {
    if (this.shouldExecuteOnCPU([a, b])) {
      try {
        return this.cpuBackend.greater(a, b);
      } catch() {
         if env().getBool('IS_TEST') {
            throw new Exception('CPU forwarding failed')
         }
         // else we just fall back to webgl implementation.
      }
    }

    if (env().getBool('WEBGL_PACK_BINARY_OPERATIONS')) {
      return this.packedBinaryOp(a, b, binaryop_packed_gpu.GREATER, 'bool');
    }

    const program = new BinaryOpProgram(binaryop_gpu.GREATER, a.shape, b.shape);
    return this.compileAndRun(program, [a, b], 'bool');
  }

Thoughts? I lean towards the third option.

Reviewed 4 of 8 files at r1, 18 of 18 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @jinjingforever, @lina128, @pyu10055, and @tafsiri)

tfjs-backend-cpu/src/kernels/Ceil.ts, line 24 at r2 (raw file):

export const ceilImpl = createSimpleUnaryImpl((xi) => Math.ceil(xi));
export const ceilKernelFunc = unaryKernelFuncFromImpl(Ceil, ceilImpl);

we should stick to one convention for naming these exported kernel funcs. In existing code they do not have the kernelFunc suffix. I think we should stick to that so that on importing these in other cpu kernels we do have the kernelFunc suffix present (or alternatively we could have the suffix on all of them).

tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):

    if (this.shouldExecuteOnCPU([x])) {
      try {
        return this.cpuBackend.slice(x, begin, size);

We shouldn't need to have this and the code in the catch block below. We can just use the code in the catch block.

For kernels that aren't modularized in cpu we should try catch.

tfjs-backend-webgl/src/backend_webgl.ts, line 820 at r2 (raw file):

    // PR #3956 will modularize concat without this.
    // if (this.shouldExecuteOnCPU(tensors)) {
    //   return this.cpuBackend.concat(tensors, axis);

I think there should be a try catch block here. If it fails we skip the early return and let webgl proceed as if there was no cpu forwarding.

tfjs-backend-webgl/src/backend_webgl.ts, line 950 at r2 (raw file):

    if (this.shouldExecuteOnCPU([a, b])) {
      try {
        return this.cpuBackend.multiply(a, b);

same as above, if we can import multipleImplCPU we should not even try calling cpuBackend.multiply.

tfjs-backend-webgl/src/backend_webgl.ts, line 1732 at r2 (raw file):

        return this.cpuBackend.rsqrt(x);
      } catch (e) {
        const outValues = rsqrtImplCPU(

should a utility function be made for these? there seem to be a lot of similar calls.

pyu10055

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @pyu10055)

tfjs-backend-webgl/src/backend_webgl.ts, line 1658 at r2 (raw file):

  expm1<T extends Tensor>(x: T): T {
    if (this.shouldExecuteOnCPU([x])) {

can we create a method something like
return cpuExecuteOrFallback(executeFunc, fallbackFunc)?
the executeFunc will be
() => this.cpuBackend.expm1(x)
the fallbackFunc will be
() => {
const outValues = expm1ImplCPU(
this.texData.get(x.dataId).values as TypedArray, x.dtype);
const outInfo = this.makeTensorInfo(x.shape, x.dtype);
this.texData.get(outInfo.dataId).values = outValues;
return engine().makeTensorFromDataId(
outInfo.dataId, outInfo.shape, outInfo.dtype) as T;
}

lina128

Regarding test env in test registry, why can't we keep it. And then just add one line to test-ci.sh, yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": true}'

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @tafsiri)

tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

We shouldn't need to have this and the code in the catch block below. We can just use the code in the catch block.

For kernels that aren't modularized in cpu we should try catch.

+1

tafsiri · 2020-10-07T21:14:55Z

Chatted with @jinjingforever offline. I think we want to pick between the registerTestEnv approach and the explicit settings of flags approach. If we do the following (as Ann suggests)

registerTestEnv({
  name: 'webgl1',
  backendName: 'webgl',
  flags: {
    'WEBGL_VERSION': 1,
    'WEBGL_CPU_FORWARD': false,
    'WEBGL_SIZE_UPLOAD_UNIFORM': 0
  },
  isDataSync: true
});

registerTestEnv({
  name: 'webgl2',
  backendName: 'webgl',
  flags: {
    'WEBGL_VERSION': 2,
    'WEBGL_CPU_FORWARD': false,
    'WEBGL_SIZE_UPLOAD_UNIFORM': 0
  },
  isDataSync: true
});

registerTestEnv({
  name: 'webgl2',
  backendName: 'webgl',
  flags: {
    'WEBGL_VERSION': 2,
    'WEBGL_CPU_FORWARD': true,
    'WEBGL_SIZE_UPLOAD_UNIFORM': 0
  },
  isDataSync: true
});

Then we do not need to add a line with an explicit setting of 'WEBGL_CPU_FORWARD': true, this is because when no flags are passed the tests will run in all the test_envs listed above. I'm fine with that as its super safe and give immediate feedback (at the cost of longer test runs—but the functionality being tested seems pretty key).

However I do think there may be places where we have been running the same config twice, looking at our current test-ci for webgl

if [ "$NIGHTLY" = true ]; then
  yarn run-browserstack --browsers=bs_safari_mac,bs_ios_11 --testEnv webgl1 --flags '{"WEBGL_CPU_FORWARD": false, "WEBGL_SIZE_UPLOAD_UNIFORM": 0}'
  yarn run-browserstack --browsers=bs_firefox_mac,bs_chrome_mac
  yarn run-browserstack --browsers=bs_chrome_mac,win_10_chrome,bs_android_9 --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": false, "WEBGL_SIZE_UPLOAD_UNIFORM": 0}'
  
  yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_PACK": false}'
else
  yarn run-browserstack --browsers=bs_chrome_mac
fi

To my eyes, in the nightly build the bs_chrome_mac on the second run and bs_chrome_mac on the third line run under the same conditions. I'm not 100% sure of that so we don't have to resolve it in this PR. But overall being able to easily trace where to set things to get the conditions needed is useful, which is why i was initially in favor of primarily using one of these mechanisms. But I've come around to the approach described above.

pyu10055

registerTestEnv({
name: 'webgl2',
backendName: 'webgl',
flags: {
'WEBGL_VERSION': 2,
'WEBGL_CPU_FORWARD': false,
'WEBGL_SIZE_UPLOAD_UNIFORM': 0
},
isDataSync: true
});

registerTestEnv({
name: 'webgl2',
backendName: 'webgl',
flags: {
'WEBGL_VERSION': 2,
'WEBGL_CPU_FORWARD': true,
'WEBGL_SIZE_UPLOAD_UNIFORM': 0
},
isDataSync: true
});

There two webgl2 registered with different WEBGL_CPU_FORWARD flag values?

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever and @tafsiri)

annxingyuan

Actually let's do as Na suggested and leave the backend_webgl test registry alone, but add another call to run-browserstack in the nightly test-ci for WebGL that explicitly turns CPU forwarding on.

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever and @tafsiri)

jinjingforever

Thanks for all the comments! PTAL

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)

tfjs-backend-cpu/src/kernels/Ceil.ts, line 24 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

we should stick to one convention for naming these exported kernel funcs. In existing code they do not have the kernelFunc suffix. I think we should stick to that so that on importing these in other cpu kernels we do have the kernelFunc suffix present (or alternatively we could have the suffix on all of them).

Good point. I will go through them in another PR to make them consistent.

tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):

Previously, lina128 (Na Li) wrote…

+1

Ah, of course. Good point. Updated (here and other places)

tfjs-backend-webgl/src/backend_webgl.ts, line 820 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

I think there should be a try catch block here. If it fails we skip the early return and let webgl proceed as if there was no cpu forwarding.

For now, I just removed the cpu-forwarding logic, since Ann's PR doesn't have this after concat is modularized. Will sync with her about this offline.

tfjs-backend-webgl/src/backend_webgl.ts, line 950 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

same as above, if we can import multipleImplCPU we should not even try calling cpuBackend.multiply.

Done.

tfjs-backend-webgl/src/backend_webgl.ts, line 1658 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can we create a method something like
return cpuExecuteOrFallback(executeFunc, fallbackFunc)?
the executeFunc will be
() => this.cpuBackend.expm1(x)
the fallbackFunc will be
() => {
const outValues = expm1ImplCPU(
this.texData.get(x.dataId).values as TypedArray, x.dtype);
const outInfo = this.makeTensorInfo(x.shape, x.dtype);
this.texData.get(outInfo.dataId).values = outValues;
return engine().makeTensorFromDataId(
outInfo.dataId, outInfo.shape, outInfo.dtype) as T;
}

I added/refactored some helper methods combining various comments above.

tfjs-backend-webgl/src/backend_webgl.ts, line 1732 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

should a utility function be made for these? there seem to be a lot of similar calls.

Done. (I didn't make it super generic due to some edge cases).

lina128

Thank you Jing for the fast iteration! Is test condition added to test-ci?

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)

lina128

LGTM!

annxingyuan

Thanks so much for the quick fix Jing! Just left a small suggestion.

Reviewable status: complete! 2 of 1 approvals obtained (waiting on @jinjingforever, @pyu10055, and @tafsiri)

tfjs-backend-webgl/src/backend_webgl.ts, line 2420 at r3 (raw file):

  private makeOutput<T extends Tensor>(
      shape: number[], dtype: DataType, values?: BackendValues): T {
    const {dataId} = this.makeTensorInfo(shape, dtype);

if you pass in values to makeTensorInfo you shoudn't need the if block below

jinjingforever

Reviewable status: complete! 2 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-backend-webgl/src/backend_webgl.ts, line 2420 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

if you pass in values to makeTensorInfo you shoudn't need the if block below

Nice! Done. Thank you!

tafsiri

Great! (left one suggestion about testing cpu forwarding on PRs as well as nightly)

Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 3 of 1 approvals obtained (waiting on @annxingyuan, @jinjingforever, and @pyu10055)

tfjs-backend-webgl/scripts/test-ci.sh, line 26 at r4 (raw file):

  yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": true}'
else
  yarn run-browserstack --browsers=bs_chrome_mac

Should we add the line added above to here so that we can catch these bugs like this on PRs and not just nightly?

jinjingforever

Reviewable status: complete! 3 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)

tfjs-backend-webgl/scripts/test-ci.sh, line 26 at r4 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Should we add the line added above to here so that we can catch these bugs like this on PRs and not just nightly?

Thanks! I asked @lina128 about this and she said we probably don't have enough resources on browserstack to run more tests for PRs.

pyu10055

Reviewed 4 of 8 files at r1, 17 of 18 files at r2, 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 4 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)

googlebot added the cla: yes label Sep 30, 2020

jinjingforever linked an issue Sep 30, 2020 that may be closed by this pull request

[webgl] CPU forwarding should work for unmodularized kernels. #3994

Closed

jinjingforever requested a review from annxingyuan September 30, 2020 22:47

jinjingforever commented Oct 1, 2020

View reviewed changes

jinjingforever removed the request for review from annxingyuan October 1, 2020 16:04

jinjingforever added 4 commits October 1, 2020 09:09

save

8933be4

add comments

0386e54

refactor

02053d2

save

4d2e8a4

jinjingforever added 3 commits October 7, 2020 09:09

merge

e19d36f

more

3bd2dc8

even more

47e8350

jinjingforever force-pushed the bugfix branch from f2cb128 to 47e8350 Compare October 7, 2020 19:15

jinjingforever marked this pull request as ready for review October 7, 2020 19:23

jinjingforever requested review from lina128, annxingyuan, tafsiri and pyu10055 October 7, 2020 19:23

annxingyuan reviewed Oct 7, 2020

View reviewed changes

tafsiri suggested changes Oct 7, 2020

View reviewed changes

pyu10055 requested changes Oct 7, 2020

View reviewed changes

lina128 reviewed Oct 7, 2020

View reviewed changes

pyu10055 reviewed Oct 7, 2020

View reviewed changes

annxingyuan reviewed Oct 7, 2020

View reviewed changes

address comments

a466d13

jinjingforever commented Oct 7, 2020

View reviewed changes

lina128 reviewed Oct 7, 2020

View reviewed changes

lina128 approved these changes Oct 7, 2020

View reviewed changes

This was linked to issues Oct 7, 2020

WebGL CPU forwarding failed for slice op #4029

Closed

'multiply' not yet implemented or not found in the registry #4030

Closed

annxingyuan approved these changes Oct 8, 2020

View reviewed changes

address comment

2dcd7e0

jinjingforever commented Oct 8, 2020

View reviewed changes

tafsiri approved these changes Oct 8, 2020

View reviewed changes

jinjingforever commented Oct 8, 2020

View reviewed changes

pyu10055 approved these changes Oct 8, 2020

View reviewed changes

pyu10055 merged commit 8c3836b into master Oct 8, 2020

pyu10055 deleted the bugfix branch October 8, 2020 03:31

tafsiri mentioned this pull request Oct 8, 2020

test webgl cpu forwarding #4031

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995

Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995

jinjingforever commented Sep 30, 2020 •

edited by nsthorat

jinjingforever commented Sep 30, 2020

annxingyuan commented Oct 1, 2020

jinjingforever left a comment

tafsiri commented Oct 7, 2020

jinjingforever commented Oct 7, 2020

annxingyuan left a comment •

edited

tafsiri left a comment

pyu10055 left a comment

lina128 left a comment

tafsiri commented Oct 7, 2020

pyu10055 left a comment

annxingyuan left a comment

jinjingforever left a comment

lina128 left a comment

lina128 left a comment

annxingyuan left a comment

jinjingforever left a comment

tafsiri left a comment

jinjingforever left a comment

pyu10055 left a comment

Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995

Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995

Conversation

jinjingforever commented Sep 30, 2020 • edited by nsthorat

jinjingforever commented Sep 30, 2020

annxingyuan commented Oct 1, 2020

jinjingforever left a comment

Choose a reason for hiding this comment

tafsiri commented Oct 7, 2020

jinjingforever commented Oct 7, 2020

annxingyuan left a comment • edited

Choose a reason for hiding this comment

tafsiri left a comment

Choose a reason for hiding this comment

pyu10055 left a comment

Choose a reason for hiding this comment

lina128 left a comment

Choose a reason for hiding this comment

tafsiri commented Oct 7, 2020

pyu10055 left a comment

Choose a reason for hiding this comment

annxingyuan left a comment

Choose a reason for hiding this comment

jinjingforever left a comment

Choose a reason for hiding this comment

lina128 left a comment

Choose a reason for hiding this comment

lina128 left a comment

Choose a reason for hiding this comment

annxingyuan left a comment

Choose a reason for hiding this comment

jinjingforever left a comment

Choose a reason for hiding this comment

tafsiri left a comment

Choose a reason for hiding this comment

jinjingforever left a comment

Choose a reason for hiding this comment

pyu10055 left a comment

Choose a reason for hiding this comment

jinjingforever commented Sep 30, 2020 •

edited by nsthorat

annxingyuan left a comment •

edited