-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix webgl CPU forwarding for ops that have been modularized in cpu backend #3995
Conversation
Hi Ann, Here is a draft PR where I attempt to fix the problem for one simple op "floor" as an example. This op has the exact problem you described because "floor" has been modularized in cpu (so Please take a look and see if this PR is heading towards the right direction:) The basic idea is: for each cpu forwarding call from webgl ( Once you think this fix is good, I will continue working on this PR to apply the same logic to all the other places. Thank you very much! |
Thank you @jinjingforever ! This is pretty much exactly what I was thinking :) The only feedback I have is - I think elsewhere in our codebase we have used Would also be awesome if you could add a few e2e tests that explicitly test CPU forwarding! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Ann for the feedback! Really appreciate it. I will continue working on this and will let you know when the fix is ready.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)
+1 to just calling the modularized cpu impl where available. However in general we should wrap the other (non modularized) cpu forwarding calls in a try-catch block before falling back to webgl implementations. Using |
Some notes about this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Jing! Do we want to also make the changes that we discussed from Yannick's draft PR? (#4031)
I think the changes are:
- Change existing environments in
backend_webgl_test_registry.ts
to not explicitly setWEBGL_CPU_FORWARD
- Add a third test environment to
backend_webgl_test_registry.ts
that explicitly setsWEBGL_CPU_FORWARD
tofalse
- Add another run-browserstack invocation in
tfjs-backend-webgl/scripts/test-ci.sh
nightly to explicitly enableWEBGL_CPU_FORWARD
- Disable
IS_TEST
check in the backend_webgl.tsshouldExecuteOnCPU
function
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @lina128, @pyu10055, and @tafsiri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this PR. Just to confirm, this doesn't cover all the uses of CPU forwarding in the webgl backend right? (for example I was expecting to see return this.cpuBackend.greater(a, b);
wrapped in a try catch).
Also could you merge the branch from my PR into this branch so we can see the output on CI and iterate on the config we want for testing locally and CI right in the context of this PR?
With regard to try-catch blocks I think we have a number of options:
- The commits on my branch should catch unwrapped calls into the cpu backend (e.g.
return this.cpuBackend.greater(a, b);
) so in theory we do not need to wrap them. In fact leaving them unwrapped forces us to do the new style CPU forwarding in the same PR as modularizing a CPU kernel. - Alternatively we can wrap them so that even if they do fail webgl falls back onto a webgl implementation, this would cause a performance regression in those cases but will always allow the webgl backend to keep executing.
- We can combine the two ideas above by re-throwing the exception in the catch block in tests (but in prod allowing fallback to webgl). The advantage of this is that even if something slips through our tests an op wouldn't fail on webgl. Here is a code snippet illustrating that.
backend_webgl greater()
greater(a: Tensor, b: Tensor): Tensor {
if (this.shouldExecuteOnCPU([a, b])) {
try {
return this.cpuBackend.greater(a, b);
} catch() {
if env().getBool('IS_TEST') {
throw new Exception('CPU forwarding failed')
}
// else we just fall back to webgl implementation.
}
}
if (env().getBool('WEBGL_PACK_BINARY_OPERATIONS')) {
return this.packedBinaryOp(a, b, binaryop_packed_gpu.GREATER, 'bool');
}
const program = new BinaryOpProgram(binaryop_gpu.GREATER, a.shape, b.shape);
return this.compileAndRun(program, [a, b], 'bool');
}
Thoughts? I lean towards the third option.
Reviewed 4 of 8 files at r1, 18 of 18 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @jinjingforever, @lina128, @pyu10055, and @tafsiri)
tfjs-backend-cpu/src/kernels/Ceil.ts, line 24 at r2 (raw file):
export const ceilImpl = createSimpleUnaryImpl((xi) => Math.ceil(xi)); export const ceilKernelFunc = unaryKernelFuncFromImpl(Ceil, ceilImpl);
we should stick to one convention for naming these exported kernel funcs. In existing code they do not have the kernelFunc suffix. I think we should stick to that so that on importing these in other cpu kernels we do have the kernelFunc suffix present (or alternatively we could have the suffix on all of them).
tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):
if (this.shouldExecuteOnCPU([x])) { try { return this.cpuBackend.slice(x, begin, size);
We shouldn't need to have this and the code in the catch block below. We can just use the code in the catch block.
For kernels that aren't modularized in cpu we should try catch.
tfjs-backend-webgl/src/backend_webgl.ts, line 820 at r2 (raw file):
// PR #3956 will modularize concat without this. // if (this.shouldExecuteOnCPU(tensors)) { // return this.cpuBackend.concat(tensors, axis);
I think there should be a try catch block here. If it fails we skip the early return and let webgl proceed as if there was no cpu forwarding.
tfjs-backend-webgl/src/backend_webgl.ts, line 950 at r2 (raw file):
if (this.shouldExecuteOnCPU([a, b])) { try { return this.cpuBackend.multiply(a, b);
same as above, if we can import multipleImplCPU we should not even try calling cpuBackend.multiply.
tfjs-backend-webgl/src/backend_webgl.ts, line 1732 at r2 (raw file):
return this.cpuBackend.rsqrt(x); } catch (e) { const outValues = rsqrtImplCPU(
should a utility function be made for these? there seem to be a lot of similar calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @pyu10055)
tfjs-backend-webgl/src/backend_webgl.ts, line 1658 at r2 (raw file):
expm1<T extends Tensor>(x: T): T { if (this.shouldExecuteOnCPU([x])) {
can we create a method something like
return cpuExecuteOrFallback(executeFunc, fallbackFunc)?
the executeFunc will be
() => this.cpuBackend.expm1(x)
the fallbackFunc will be
() => {
const outValues = expm1ImplCPU(
this.texData.get(x.dataId).values as TypedArray, x.dtype);
const outInfo = this.makeTensorInfo(x.shape, x.dtype);
this.texData.get(outInfo.dataId).values = outValues;
return engine().makeTensorFromDataId(
outInfo.dataId, outInfo.shape, outInfo.dtype) as T;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding test env in test registry, why can't we keep it. And then just add one line to test-ci.sh, yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": true}'
Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever, @lina128, and @tafsiri)
tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
We shouldn't need to have this and the code in the catch block below. We can just use the code in the catch block.
For kernels that aren't modularized in cpu we should try catch.
+1
Chatted with @jinjingforever offline. I think we want to pick between the registerTestEnv approach and the explicit settings of flags approach. If we do the following (as Ann suggests)
Then we do not need to add a line with an explicit setting of However I do think there may be places where we have been running the same config twice, looking at our current test-ci for webgl
To my eyes, in the nightly build the bs_chrome_mac on the second run and bs_chrome_mac on the third line run under the same conditions. I'm not 100% sure of that so we don't have to resolve it in this PR. But overall being able to easily trace where to set things to get the conditions needed is useful, which is why i was initially in favor of primarily using one of these mechanisms. But I've come around to the approach described above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
registerTestEnv({
name: 'webgl2',
backendName: 'webgl',
flags: {
'WEBGL_VERSION': 2,
'WEBGL_CPU_FORWARD': false,
'WEBGL_SIZE_UPLOAD_UNIFORM': 0
},
isDataSync: true
});registerTestEnv({
name: 'webgl2',
backendName: 'webgl',
flags: {
'WEBGL_VERSION': 2,
'WEBGL_CPU_FORWARD': true,
'WEBGL_SIZE_UPLOAD_UNIFORM': 0
},
isDataSync: true
});
There two webgl2 registered with different WEBGL_CPU_FORWARD flag values?
Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever and @tafsiri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually let's do as Na suggested and leave the backend_webgl test registry alone, but add another call to run-browserstack in the nightly test-ci for WebGL that explicitly turns CPU forwarding on.
Reviewable status: 0 of 1 approvals obtained (waiting on @jinjingforever and @tafsiri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the comments! PTAL
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)
tfjs-backend-cpu/src/kernels/Ceil.ts, line 24 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
we should stick to one convention for naming these exported kernel funcs. In existing code they do not have the kernelFunc suffix. I think we should stick to that so that on importing these in other cpu kernels we do have the kernelFunc suffix present (or alternatively we could have the suffix on all of them).
Good point. I will go through them in another PR to make them consistent.
tfjs-backend-webgl/src/backend_webgl.ts, line 734 at r2 (raw file):
Previously, lina128 (Na Li) wrote…
+1
Ah, of course. Good point. Updated (here and other places)
tfjs-backend-webgl/src/backend_webgl.ts, line 820 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
I think there should be a try catch block here. If it fails we skip the early return and let webgl proceed as if there was no cpu forwarding.
For now, I just removed the cpu-forwarding logic, since Ann's PR doesn't have this after concat is modularized. Will sync with her about this offline.
tfjs-backend-webgl/src/backend_webgl.ts, line 950 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
same as above, if we can import multipleImplCPU we should not even try calling cpuBackend.multiply.
Done.
tfjs-backend-webgl/src/backend_webgl.ts, line 1658 at r2 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
can we create a method something like
return cpuExecuteOrFallback(executeFunc, fallbackFunc)?
the executeFunc will be
() => this.cpuBackend.expm1(x)
the fallbackFunc will be
() => {
const outValues = expm1ImplCPU(
this.texData.get(x.dataId).values as TypedArray, x.dtype);
const outInfo = this.makeTensorInfo(x.shape, x.dtype);
this.texData.get(outInfo.dataId).values = outValues;
return engine().makeTensorFromDataId(
outInfo.dataId, outInfo.shape, outInfo.dtype) as T;
}
I added/refactored some helper methods combining various comments above.
tfjs-backend-webgl/src/backend_webgl.ts, line 1732 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
should a utility function be made for these? there seem to be a lot of similar calls.
Done. (I didn't make it super generic due to some edge cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Jing for the fast iteration! Is test condition added to test-ci?
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the quick fix Jing! Just left a small suggestion.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @jinjingforever, @pyu10055, and @tafsiri)
tfjs-backend-webgl/src/backend_webgl.ts, line 2420 at r3 (raw file):
private makeOutput<T extends Tensor>( shape: number[], dtype: DataType, values?: BackendValues): T { const {dataId} = this.makeTensorInfo(shape, dtype);
if you pass in values to makeTensorInfo you shoudn't need the if block below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-backend-webgl/src/backend_webgl.ts, line 2420 at r3 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
if you pass in values to makeTensorInfo you shoudn't need the if block below
Nice! Done. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! (left one suggestion about testing cpu forwarding on PRs as well as nightly)
Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 3 of 1 approvals obtained (waiting on @annxingyuan, @jinjingforever, and @pyu10055)
tfjs-backend-webgl/scripts/test-ci.sh, line 26 at r4 (raw file):
yarn run-browserstack --browsers=bs_chrome_mac --testEnv webgl2 --flags '{"WEBGL_CPU_FORWARD": true}' else yarn run-browserstack --browsers=bs_chrome_mac
Should we add the line added above to here so that we can catch these bugs like this on PRs and not just nightly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 3 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)
tfjs-backend-webgl/scripts/test-ci.sh, line 26 at r4 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
Should we add the line added above to here so that we can catch these bugs like this on PRs and not just nightly?
Thanks! I asked @lina128 about this and she said we probably don't have enough resources on browserstack to run more tests for PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 8 files at r1, 17 of 18 files at r2, 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 4 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)
This change is