WebGPU Performance Issues #5689
@vladmandic Thanks for the good comments and data, as always!
Thanks again for your valuable feedback. We hope to hear more details from your side about the warmup regression (e.g., hardware configuration), and we look forward to more collaboration in the future!
Thank you for the notes, here are full details.

Performance Testing

Environment:

Notes:
Test Results:

{ message: 'initial', warmup: 3134, inference: 2638, tfjs: '3.9.0', backend: 'wasm', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 3119, inference: 2618, tfjs: '3.9.0', backend: 'wasm', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'initial', warmup: 11836, inference: 61, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 2665, inference: 60, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'initial', warmup: 6128, inference: 54, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [ { WEBGL_PACK_DEPTHWISECONV: false }, { WEBGL_USE_SHAPES_UNIFORMS: true } ] }
{ message: 'cached', warmup: 1202, inference: 67, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [ { WEBGL_PACK_DEPTHWISECONV: false }, { WEBGL_USE_SHAPES_UNIFORMS: true } ] }
{ message: 'initial', warmup: 5018, inference: 23, tfjs: '3.9.0', backend: 'webgpu', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 4454, inference: 22, tfjs: '3.9.0', backend: 'webgpu', tensors: 304, agent: 'Chrome/94', env: [] }

Issues

Using

Reproduction

Fully automated test in
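For context, here is a minimal sketch of how warmup vs. steady-state inference can be timed per backend; the model URL and input shape are placeholders, not the actual test harness used above:

```js
// hedged sketch of a warmup/inference benchmark; modelUrl and inputShape are hypothetical
import * as tf from '@tensorflow/tfjs';

async function benchmark(backend, modelUrl, inputShape) {
  await tf.setBackend(backend);
  await tf.ready();
  const model = await tf.loadGraphModel(modelUrl);
  const input = tf.zeros(inputShape);

  const t0 = performance.now();
  tf.dispose(await model.executeAsync(input)); // first run: shader/pipeline compilation dominates
  const warmup = Math.round(performance.now() - t0);

  const t1 = performance.now();
  tf.dispose(await model.executeAsync(input)); // second run: steady-state inference
  const inference = Math.round(performance.now() - t1);

  tf.dispose(input);
  console.log({ backend, warmup, inference, tensors: tf.memory().numTensors });
}

// usage (hypothetical model and shape):
// await benchmark('webgpu', 'https://example.com/model.json', [1, 224, 224, 3]);
```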
Above post is using a single model (can be re-tested using any model, I've used Inception v4 trained on ImageNet 1k)

However, when I try

My best guess is that some ops get executed on CPU, thus causing a major slowdown

You can try using the following URLs:
@vladmandic Can you put the Inception v4 model somewhere that I can access? It seems that

And for your demo app, I can reproduce the bad performance for webgpu. Thanks for reporting. I will take a look.
To keep it reproducible with a readily available public model, you can use any mid-complexity model,

{ message: 'initial', warmup: 2645, inference: 1908, tfjs: '3.9.0', backend: 'wasm', tensors: 394, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 2330, inference: 1808, tfjs: '3.9.0', backend: 'wasm', tensors: 394, agent: 'Chrome/94', env: [] }
{ message: 'initial', warmup: 20148, inference: 107, tfjs: '3.9.0', backend: 'webgl', tensors: 394, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 5374, inference: 105, tfjs: '3.9.0', backend: 'webgl', tensors: 394, agent: 'Chrome/94', env: [] }
{ message: 'initial', warmup: 7428, inference: 119, tfjs: '3.9.0', backend: 'webgl', tensors: 394, agent: 'Chrome/94', env: [ { WEBGL_PACK_DEPTHWISECONV: false }, { WEBGL_USE_SHAPES_UNIFORMS: true } ] }
{ message: 'cached', warmup: 2053, inference: 103, tfjs: '3.9.0', backend: 'webgl', tensors: 394, agent: 'Chrome/94', env: [ { WEBGL_PACK_DEPTHWISECONV: false }, { WEBGL_USE_SHAPES_UNIFORMS: true } ] }
{ message: 'initial', warmup: 5087, inference: 64, tfjs: '3.9.0', backend: 'webgpu', tensors: 394, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 4427, inference: 70, tfjs: '3.9.0', backend: 'webgpu', tensors: 394, agent: 'Chrome/94', env: [] }

As you can see, the data is pretty much the same as with the Inception v4 model
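For reference, a small sketch of how the env overrides shown in those runs can be applied before initializing the backend (values mirror the env entries above; illustrative only, not the benchmark code itself):

```js
// sketch: set the WebGL flags used in the runs above, then initialize the backend
import * as tf from '@tensorflow/tfjs';

tf.env().set('WEBGL_PACK_DEPTHWISECONV', false); // disable packed depthwise-conv shaders
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true); // pass tensor shapes as uniforms to cut shader recompiles
await tf.setBackend('webgl');
await tf.ready();
console.log(tf.env().getFlags()); // verify the overrides took effect
```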
I've traced it down - there are a couple of places where WebGPU is a touch slower than WebGL,

FYI, the NMS function params are (see the call sketch below):

boxes.shape = [896, 4]
scores.shape = [896]
maxOutputSize = 1
iouThreshold = 0.1
scoreThreshold = 0.2

Also, it seems like

For example, running a
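For illustration, a hedged sketch of an NMS call using the parameters listed above, with random placeholder tensors rather than real detector output:

```js
// sketch: time tf.image.nonMaxSuppressionAsync with the shapes/thresholds from the comment above
const boxes = tf.randomUniform([896, 4]); // placeholder box coordinates
const scores = tf.randomUniform([896]);   // placeholder confidence scores
const t0 = performance.now();
const selected = await tf.image.nonMaxSuppressionAsync(
  boxes, scores,
  1,    // maxOutputSize
  0.1,  // iouThreshold
  0.2,  // scoreThreshold
);
console.log({ backend: tf.getBackend(), nmsTime: Math.round(performance.now() - t0), indices: await selected.data() });
tf.dispose([boxes, scores, selected]);
```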
@vladmandic Thanks for the detailed information. I can run your benchmarks using
Thanks
Much appreciated!
Thanks for confirming
You're right, the perf problem is basically ANY first TF operation executed in JS code (outside of the model) - there is a massive latency penalty

Simple reproduction:

const numIterations = 50;
const arr = new Uint8Array(imageData?.data.buffer); // input data in my case is 4k imageData, but can be any dataset
const t0 = performance.now();
for (let i = 0; i < numIterations; i++) {
  const rgba = tf.tensor(arr, [imageData.width, imageData.height, 4], 'int32'); // create rgba tensor
  const rgb = tf.slice3d(rgba, [0, 0, 0], [-1, -1, 3]); // strip alpha channel
  const tensor = tf.expandDims(rgb, 0); // create standard image tensor [1, height, width, 3]
  // const data = await tensor.array(); // download data from gpu
  tf.dispose([rgba, rgb, tensor]); // just dispose everything
}
const t1 = performance.now();
const avgTime = Math.round((t1 - t0) / numIterations);
console.log({ backend: tf.getBackend(), average: avgTime });

This loop in

Setting

Note that when the disabled line that downloads data back from GPU is enabled, both

So running models in WebGPU is faster than WebGL, but preparing inputs and processing outputs adds a huge penalty at the moment
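As a variation on the loop above (same variables assumed), a sketch with the commented-out download enabled, so pending GPU work is forced to complete inside the timed region:

```js
// sketch: same preprocessing loop, but with readback enabled to synchronize GPU execution
const start = performance.now();
for (let i = 0; i < numIterations; i++) {
  const rgba = tf.tensor(arr, [imageData.width, imageData.height, 4], 'int32');
  const rgb = tf.slice3d(rgba, [0, 0, 0], [-1, -1, 3]);
  const tensor = tf.expandDims(rgb, 0);
  const data = await tensor.data(); // download forces the backend to flush queued GPU work
  tf.dispose([rgba, rgb, tensor]);
}
console.log({ backend: tf.getBackend(), averageWithReadback: Math.round((performance.now() - start) / numIterations) });
```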
I'm getting a warning even when running this simple code from above, with no model execution at all. The message is slightly different in Chrome 97 vs 94 and the maximum binding size is much bigger, but the error is pretty much the same:
There are no errors logged in
@qjia7 did you have a chance to look at the
@vladmandic Sorry, I can't reproduce the latency issue you mentioned. From the code snippet you pasted, it basically does nothing and executes instantaneously on my side.
Some updates on the warmup time:
@qjia7 thanks for the update! and the description of the chromium queue handling sounds like it could be the same root cause for what i'm seeing as extreme latency issues

for reproduction, i'm guessing your test failed since

here's a live link: https://vladmandic.github.io/tfjs-utils/src/latency-issue.html

and output on my notebook:
basically, for any "real" work, setting
@vladmandic Thanks for your live case. I can reproduce it now. After debugging, I find the time is mainly spent on
@vladmandic Jiawei in our team has fixed the

And for the long warmup time, we drafted a prototype for parallel compilation, showing almost 4x speedup for the warmup time. Currently, we are discussing how to expose this capability uniformly between webgl and webgpu. Will keep you updated.
Thanks @qjia7! I've tested with Chrome 99 and the latency is gone - WebGPU now performs on-par with WebGL

I'm looking forward to the other proposed changes (once the issue is resolved) for

Anyhow, I'm closing this issue as resolved...
PERF BUG: #5689

* webgpu: Use mapAsync instead of writeBuffer for uploading
* Correct test cases
* Ignore the promise rejection
* Fix buffer was not provided error
* Fix bots failure
* Recover some tests
* Remove unnecessary early-return and reset
* add benchmark test
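For background, a hedged sketch (raw WebGPU, not the actual tfjs-backend-webgpu code) of the two upload paths the first bullet refers to: queue.writeBuffer versus writing through a mapped staging buffer:

```js
// illustrative only; `device`, `gpuBuffer`, and `cpuData` (a TypedArray) are assumed to exist

// path 1: writeBuffer - the queue schedules a copy from the CPU array into the GPU buffer
device.queue.writeBuffer(gpuBuffer, 0, cpuData);

// path 2: mapped staging buffer - write directly into mapped memory, then copy on the GPU timeline
const staging = device.createBuffer({
  size: cpuData.byteLength,
  usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
  mappedAtCreation: true,
});
new Uint8Array(staging.getMappedRange()).set(new Uint8Array(cpuData.buffer, cpuData.byteOffset, cpuData.byteLength));
staging.unmap();
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(staging, 0, gpuBuffer, 0, cpuData.byteLength);
device.queue.submit([encoder.finish()]);
```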
i just tried new tfjs-backend-webgpu 0.0.1-alpha.8 on tfjs 3.9.0

environment: chrome 96 canary on windows 11

first, great job on adding tons of new ops - from perspective of supported kernel ops, webgpu is becoming usable!

however, switch to WGSL is anything but useful so far - it comes as a major performance degradation

overall, webgpu has gotten slower than webgl
(and webgl itself has become significantly slower since tfjs 3.4.0 - this is discussed separately in several open issues)

not to mention that new work that has gone into webgl to make it manageable (enable uniforms) has no effect on webgpu

comparing warmup times (fyi, my app by default uses 8 simple models running in parallel - total models size is actually tiny, below 30mb):

* webgl (default settings)
* webgl with WEBGL_PACK_DEPTHWISECONV=false and WEBGL_USE_SHAPES_UNIFORMS=true
* webgpu (default settings)
* webgpu with WEBGPU_USE_GLSL=true
* wasm (no real warmup, included for reference only)

imo, when developing a new backend, the goal should be that it's better than the previous one - not just that it passes unit tests

if webgpu is not significantly improved, it will be d.o.a. once released

cc @qjia7 and @xhcao due to work on webgpu
cc @pyu10055 as assignee on webgl performance degradation issue
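For completeness, a minimal sketch of selecting the alpha webgpu backend and toggling the flag mentioned above (flag name taken from this post; package import per the tfjs docs):

```js
// sketch: load the webgpu backend package and opt back into GLSL via the flag named in the post
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu'; // registers the 'webgpu' backend

tf.env().set('WEBGPU_USE_GLSL', true); // flag name as mentioned above; default behavior uses WGSL
await tf.setBackend('webgpu');
await tf.ready();
console.log(tf.getBackend(), tf.env().getFlags());
```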