Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995

Merged
merged 9 commits into from
Oct 8, 2020

Conversation

jinjingforever
Copy link
Collaborator

@jinjingforever jinjingforever commented Sep 30, 2020

This change is Reviewable

@jinjingforever
Copy link
Collaborator Author

Hi Ann,

Here is a draft PR where I attempt to fix the problem for one simple op "floor" as an example. This op has the exact problem you described because "floor" has been modularized in cpu (so this.cpuBackend.floor call would fail).

Please take a look and see if this PR is heading towards the right direction:) The basic idea is: for each cpu forwarding call from webgl (this.cpuBackend.xxx()), we need to check if the corresponding op has been modularized or not. If so (an error would be thrown), we need to call the shared cpu implementation instead.

Once you think this fix is good, I will continue working on this PR to apply the same logic to all the other places.

Thank you very much!

@annxingyuan
Copy link
Collaborator

Thank you @jinjingforever ! This is pretty much exactly what I was thinking :) The only feedback I have is - I think elsewhere in our codebase we have used typeof func === 'function' to check whether functions are defined, rather than using try... catch. Also, since we're manually modularizing each kernel, maybe we could just call the modularized CPU impl function where necessary, rather than going through a generic check for each kernel.

Would also be awesome if you could add a few e2e tests that explicitly test CPU forwarding!

Copy link
Collaborator Author

@jinjingforever jinjingforever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Ann for the feedback! Really appreciate it. I will continue working on this and will let you know when the fix is ready.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)

@jinjingforever jinjingforever removed the request for review from annxingyuan October 1, 2020 16:04
@tafsiri
Copy link
Contributor

tafsiri commented Oct 7, 2020

+1 to just calling the modularized cpu impl where available. However in general we should wrap the other (non modularized) cpu forwarding calls in a try-catch block before falling back to webgl implementations. Using typeof func === 'function' may not work because there would usually be a function definition inherited from the base class (that throws an exception)

@jinjingforever
Copy link
Collaborator Author

Some notes about this PR:

  • I had to pull some changes (mainly the changes in kernel_utils.ts and its callers) from Ann's PR ([webgl] Modularize cast, complex, concat, add, sub, multiply, notEqual, slice, fft, ifft. #3956) to make things work with minimum conflicts.

  • This PR also touches some webgl kernels that will be modularized (and fixed) in Ann's PR (e.g. multiply, concat). Depending which PR is merged first, one of us could just safely delete the code in backend_webgl.ts to resolve conflicts.

  • Last week, I added a bunch of e2e tests covering all kernels that directly do cpu forwarding. As you guys said, it is probably not necessary, but I will just leave them there for now:)

  • I tested locally with cpu forwarding turned-on (thanks to Yannick's draft PR) and things seem to be passing.

Copy link
Collaborator

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Jing! Do we want to also make the changes that we discussed from Yannick's draft PR? (#4031)

I think the changes are:

  • Change existing environments in backend_webgl_test_registry.ts to not explicitly set WEBGL_CPU_FORWARD
  • Add a third test environment to backend_webgl_test_registry.tsthat explicitly sets WEBGL_CPU_FORWARD to false
  • Add another run-browserstack invocation in tfjs-backend-webgl/scripts/test-ci.sh nightly to explicitly enable WEBGL_CPU_FORWARD
  • Disable IS_TEST check in the backend_webgl.ts shouldExecuteOnCPU function

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, @pyu10055, and @tafsiri)

Copy link
Contributor

@tafsiri tafsiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this PR. Just to confirm, this doesn't cover all the uses of CPU forwarding in the webgl backend right? (for example I was expecting to see return this.cpuBackend.greater(a, b); wrapped in a try catch).

Also could you merge the branch from my PR into this branch so we can see the output on CI and iterate on the config we want for testing locally and CI right in the context of this PR?

With regard to try-catch blocks I think we have a number of options:

  • The commits on my branch should catch unwrapped calls into the cpu backend (e.g. return this.cpuBackend.greater(a, b);) so in theory we do not need to wrap them. In fact leaving them unwrapped forces us to do the new style CPU forwarding in the same PR as modularizing a CPU kernel.
  • Alternatively we can wrap them so that even if they do fail webgl falls back onto a webgl implementation, this would cause a performance regression in those cases but will always allow the webgl backend to keep executing.
  • We can combine the two ideas above by re-throwing the exception in the catch block in tests (but in prod allowing fallback to webgl). The advantage of this is that even if something slips through our tests an op wouldn't fail on webgl. Here is a code snippet illustrating that.

backend_webgl greater()

greater(a: Tensor, b: Tensor): Tensor {
    if (this.shouldExecuteOnCPU([a, b])) {
      try {
        return this.cpuBackend.greater(a, b);
      } catch() {
         if env().getBool('IS_TEST') {
            throw new Exception('CPU forwarding failed')
         }
         // else we just fall back to webgl implementation.
      }
    }

    if (env().getBool('WEBGL_PACK_BINARY_OPERATIONS')) {
      return this.packedBinaryOp(a, b, binaryop_packed_gpu.GREATER, 'bool');
    }

    const program = new BinaryOpProgram(binaryop_gpu.GREATER, a.shape, b.shape);
    return this.compileAndRun(program, [a, b], 'bool');
  }

Thoughts? I lean towards the third option.

Reviewed 4 of 8 files at r1, 18 of 18 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @jinjingforever, @lina128, @pyu10055, and @tafsiri)


tfjs-backend-cpu/src/kernels/Ceil.ts, line 24 at r2 (raw file):

export const ceilImpl = createSimpleUnaryImpl((xi) => Math.ceil(xi));
export const ceilKernelFunc = unaryKernelFuncFromImpl(Ceil, ceilImpl);

we should stick to one convention for naming these exported kernel funcs. In existing code they do not have the kernelFunc suffix. I think we should stick to that so that on importing these in other cpu kernels we do have the kernelFunc suffix present (or alternatively we could have the suffix on all of them).


tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):

    if (this.shouldExecuteOnCPU([x])) {
      try {
        return this.cpuBackend.slice(x, begin, size);

We shouldn't need to have this and the code in the catch block below. We can just use the code in the catch block.

For kernels that aren't modularized in cpu we should try catch.


tfjs-backend-webgl/src/backend_webgl.ts, line 820 at r2 (raw file):

    // PR #3956 will modularize concat without this.
    // if (this.shouldExecuteOnCPU(tensors)) {
    //   return this.cpuBackend.concat(tensors, axis);

I think there should be a try catch block here. If it fails we skip the early return and let webgl proceed as if there was no cpu forwarding.


tfjs-backend-webgl/src/backend_webgl.ts, line 950 at r2 (raw file):

    if (this.shouldExecuteOnCPU([a, b])) {
      try {
        return this.cpuBackend.multiply(a, b);

same as above, if we can import multipleImplCPU we should not even try calling cpuBackend.multiply.


tfjs-backend-webgl/src/backend_webgl.ts, line 1732 at r2 (raw file):

        return this.cpuBackend.rsqrt(x);
      } catch (e) {
        const outValues = rsqrtImplCPU(

should a utility function be made for these? there seem to be a lot of similar calls.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @pyu10055)


tfjs-backend-webgl/src/backend_webgl.ts, line 1658 at r2 (raw file):

  expm1<T extends Tensor>(x: T): T {
    if (this.shouldExecuteOnCPU([x])) {

can we create a method something like
return cpuExecuteOrFallback(executeFunc, fallbackFunc)?
the executeFunc will be
() => this.cpuBackend.expm1(x)
the fallbackFunc will be
() => {
const outValues = expm1ImplCPU(
this.texData.get(x.dataId).values as TypedArray, x.dtype);
const outInfo = this.makeTensorInfo(x.shape, x.dtype);
this.texData.get(outInfo.dataId).values = outValues;
return engine().makeTensorFromDataId(
outInfo.dataId, outInfo.shape, outInfo.dtype) as T;
}

Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding test env in test registry, why can't we keep it. And then just add one line to test-ci.sh, yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": true}'

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @tafsiri)


tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

We shouldn't need to have this and the code in the catch block below. We can just use the code in the catch block.

For kernels that aren't modularized in cpu we should try catch.

+1

@tafsiri
Copy link
Contributor

tafsiri commented Oct 7, 2020

Chatted with @jinjingforever offline. I think we want to pick between the registerTestEnv approach and the explicit settings of flags approach. If we do the following (as Ann suggests)

registerTestEnv({
  name: 'webgl1',
  backendName: 'webgl',
  flags: {
    'WEBGL_VERSION': 1,
    'WEBGL_CPU_FORWARD': false,
    'WEBGL_SIZE_UPLOAD_UNIFORM': 0
  },
  isDataSync: true
});

registerTestEnv({
  name: 'webgl2',
  backendName: 'webgl',
  flags: {
    'WEBGL_VERSION': 2,
    'WEBGL_CPU_FORWARD': false,
    'WEBGL_SIZE_UPLOAD_UNIFORM': 0
  },
  isDataSync: true
});

registerTestEnv({
  name: 'webgl2',
  backendName: 'webgl',
  flags: {
    'WEBGL_VERSION': 2,
    'WEBGL_CPU_FORWARD': true,
    'WEBGL_SIZE_UPLOAD_UNIFORM': 0
  },
  isDataSync: true
});

Then we do not need to add a line with an explicit setting of 'WEBGL_CPU_FORWARD': true, this is because when no flags are passed the tests will run in all the test_envs listed above. I'm fine with that as its super safe and give immediate feedback (at the cost of longer test runs—but the functionality being tested seems pretty key).

However I do think there may be places where we have been running the same config twice, looking at our current test-ci for webgl

if [ "$NIGHTLY" = true ]; then
  yarn run-browserstack --browsers=bs_safari_mac,bs_ios_11 --testEnv webgl1 --flags '{"WEBGL_CPU_FORWARD": false, "WEBGL_SIZE_UPLOAD_UNIFORM": 0}'
  yarn run-browserstack --browsers=bs_firefox_mac,bs_chrome_mac
  yarn run-browserstack --browsers=bs_chrome_mac,win_10_chrome,bs_android_9 --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": false, "WEBGL_SIZE_UPLOAD_UNIFORM": 0}'
  
  yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_PACK": false}'
else
  yarn run-browserstack --browsers=bs_chrome_mac
fi

To my eyes, in the nightly build the bs_chrome_mac on the second run and bs_chrome_mac on the third line run under the same conditions. I'm not 100% sure of that so we don't have to resolve it in this PR. But overall being able to easily trace where to set things to get the conditions needed is useful, which is why i was initially in favor of primarily using one of these mechanisms. But I've come around to the approach described above.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registerTestEnv({
name: 'webgl2',
backendName: 'webgl',
flags: {
'WEBGL_VERSION': 2,
'WEBGL_CPU_FORWARD': false,
'WEBGL_SIZE_UPLOAD_UNIFORM': 0
},
isDataSync: true
});

registerTestEnv({
name: 'webgl2',
backendName: 'webgl',
flags: {
'WEBGL_VERSION': 2,
'WEBGL_CPU_FORWARD': true,
'WEBGL_SIZE_UPLOAD_UNIFORM': 0
},
isDataSync: true
});

There two webgl2 registered with different WEBGL_CPU_FORWARD flag values?

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever and @tafsiri)

Copy link
Collaborator

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually let's do as Na suggested and leave the backend_webgl test registry alone, but add another call to run-browserstack in the nightly test-ci for WebGL that explicitly turns CPU forwarding on.

Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever and @tafsiri)

Copy link
Collaborator Author

@jinjingforever jinjingforever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the comments! PTAL

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)


tfjs-backend-cpu/src/kernels/Ceil.ts, line 24 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

we should stick to one convention for naming these exported kernel funcs. In existing code they do not have the kernelFunc suffix. I think we should stick to that so that on importing these in other cpu kernels we do have the kernelFunc suffix present (or alternatively we could have the suffix on all of them).

Good point. I will go through them in another PR to make them consistent.


tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):

Previously, lina128 (Na Li) wrote…

+1

Ah, of course. Good point. Updated (here and other places)


tfjs-backend-webgl/src/backend_webgl.ts, line 820 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

I think there should be a try catch block here. If it fails we skip the early return and let webgl proceed as if there was no cpu forwarding.

For now, I just removed the cpu-forwarding logic, since Ann's PR doesn't have this after concat is modularized. Will sync with her about this offline.


tfjs-backend-webgl/src/backend_webgl.ts, line 950 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

same as above, if we can import multipleImplCPU we should not even try calling cpuBackend.multiply.

Done.


tfjs-backend-webgl/src/backend_webgl.ts, line 1658 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can we create a method something like
return cpuExecuteOrFallback(executeFunc, fallbackFunc)?
the executeFunc will be
() => this.cpuBackend.expm1(x)
the fallbackFunc will be
() => {
const outValues = expm1ImplCPU(
this.texData.get(x.dataId).values as TypedArray, x.dtype);
const outInfo = this.makeTensorInfo(x.shape, x.dtype);
this.texData.get(outInfo.dataId).values = outValues;
return engine().makeTensorFromDataId(
outInfo.dataId, outInfo.shape, outInfo.dtype) as T;
}

I added/refactored some helper methods combining various comments above.


tfjs-backend-webgl/src/backend_webgl.ts, line 1732 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

should a utility function be made for these? there seem to be a lot of similar calls.

Done. (I didn't make it super generic due to some edge cases).

Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Jing for the fast iteration! Is test condition added to test-ci?

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)

Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the quick fix Jing! Just left a small suggestion.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @jinjingforever, @pyu10055, and @tafsiri)


tfjs-backend-webgl/src/backend_webgl.ts, line 2420 at r3 (raw file):

  private makeOutput<T extends Tensor>(
      shape: number[], dtype: DataType, values?: BackendValues): T {
    const {dataId} = this.makeTensorInfo(shape, dtype);

if you pass in values to makeTensorInfo you shoudn't need the if block below

Copy link
Collaborator Author

@jinjingforever jinjingforever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-backend-webgl/src/backend_webgl.ts, line 2420 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

if you pass in values to makeTensorInfo you shoudn't need the if block below

Nice! Done. Thank you!

Copy link
Contributor

@tafsiri tafsiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! (left one suggestion about testing cpu forwarding on PRs as well as nightly)

Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 3 of 1 approvals obtained (waiting on @annxingyuan, @jinjingforever, and @pyu10055)


tfjs-backend-webgl/scripts/test-ci.sh, line 26 at r4 (raw file):

  yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": true}'
else
  yarn run-browserstack --browsers=bs_chrome_mac

Should we add the line added above to here so that we can catch these bugs like this on PRs and not just nightly?

Copy link
Collaborator Author

@jinjingforever jinjingforever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 3 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)


tfjs-backend-webgl/scripts/test-ci.sh, line 26 at r4 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Should we add the line added above to here so that we can catch these bugs like this on PRs and not just nightly?

Thanks! I asked @lina128 about this and she said we probably don't have enough resources on browserstack to run more tests for PRs.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 8 files at r1, 17 of 18 files at r2, 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 4 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)

@pyu10055 pyu10055 merged commit 8c3836b into master Oct 8, 2020
@pyu10055 pyu10055 deleted the bugfix branch October 8, 2020 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants