[Web] WebGPU performance discrepancy between ONNX and ORT formats in browser: WASM ops dominate in ORT model #24475

Open
@grazder

Description

Describe the issue

I observe a significant difference in execution behavior when switching from ONNX to ORT model formats for WebGPU usage.

1. With ONNX format:

  • run() completes quickly, and most latency comes from await device.queue.onSubmittedWorkDone() (expected for GPU-bound work).
  • Profiler shows only GPU-related operations.

2. With ORT format:

  • The entire latency shifts to await model.run(), with no significant wait on GPU queue completion.
  • Profiler reveals many WASM operations instead of GPU-accelerated kernels.

I think that ORT format should leverage GPU execution providers (WebGPU/WebGL) similarly to ONNX format, without introducing unexpected WASM-based CPU fallbacks.

Profiling for the model attached under "To reproduce":
Image

Another model profiling (this one larger):
Image

  • Do conversion flags matter? Does conversion to ORT format pre-optimize the model for CPU/WASM by default?
  • Are there hidden constraints when using ORT format in browsers?

To reproduce

  1. Load an ONNX model and run inference with:
       await this._model.run({ input: tensor }, { output: this._outputTensor });
       await this._device.queue.onSubmittedWorkDone(); // Majority of latency here
  2. Convert the model to ORT format (using convert_onnx_models_to_ort.py).
  3. Load the ORT model and run the same inference code.
  4. Observe that the profiler shows WASM ops and that the latency has moved to await model.run().
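
For reference, the latency split described in the steps above can be measured with a small helper like the one below. This is only a sketch: `measureSplit`, `runFn` and `waitFn` are hypothetical names, standing in for the `run()` call and the `device.queue.onSubmittedWorkDone()` wait from step 1.

```javascript
// Hypothetical helper (not part of onnxruntime-web): times the awaited run()
// call and the subsequent GPU queue drain separately.
//   runFn  - async callback wrapping session.run(...)
//   waitFn - async callback wrapping device.queue.onSubmittedWorkDone()
async function measureSplit(runFn, waitFn) {
  const t0 = performance.now();
  const result = await runFn();   // time spent inside run()
  const t1 = performance.now();
  await waitFn();                 // time spent waiting for submitted GPU work
  const t2 = performance.now();
  return { result, runMs: t1 - t0, gpuWaitMs: t2 - t1 };
}

// Assumed usage with the objects from step 1:
// const { runMs, gpuWaitMs } = await measureSplit(
//   () => this._model.run({ input: tensor }, { output: this._outputTensor }),
//   () => this._device.queue.onSubmittedWorkDone(),
// );
// console.log({ runMs, gpuWaitMs });
```

With the ONNX model, `gpuWaitMs` is the larger of the two values; with the ORT model, `runMs` dominates.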

I have attached an example model in both ONNX and ORT formats that reproduces the issue: models.zip

I can reproduce it on version 1.20.0 and on commit d27fecd.

Urgency

Could you help me clarify:

  • What are these operations and why do they appear?
  • Does this mean that the model runs on the CPU?
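
One way to investigate where nodes end up is to create the session with verbose logging. The sketch below rests on an assumption: as far as I understand, verbose session logs include which execution provider each node is assigned to. `diagnosticSessionOptions` is a hypothetical helper; `executionProviders`, `logSeverityLevel` and `logVerbosityLevel` are existing session options in onnxruntime-web.

```javascript
// Hypothetical helper: returns session options for a diagnostic run.
// Caller-supplied options are kept, but the provider and log settings
// below always take precedence.
function diagnosticSessionOptions(base = {}) {
  return {
    ...base,
    executionProviders: ['webgpu'],
    logSeverityLevel: 0,  // 0 = verbose
    logVerbosityLevel: 0,
  };
}

// Assumed usage (requires onnxruntime-web to be loaded as `ort`):
// ort.env.logLevel = 'verbose';
// const session = await ort.InferenceSession.create(
//   'model.ort', diagnosticSessionOptions());
```

Comparing the log output for the ONNX and ORT versions of the same model should show whether the ORT model has nodes that are not assigned to WebGPU.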

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

d27fecd

Execution Provider

'webgpu' (WebGPU)

Labels

ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime web; typically submitted using template)
