Describe the issue
I observe a significant difference in execution behavior when switching from the ONNX to the ORT model format with the WebGPU execution provider.
1. With ONNX format:
   - `run()` completes quickly, and most latency comes from `await device.queue.onSubmittedWorkDone()` (expected for GPU-bound work).
   - Profiler shows only GPU-related operations.
2. With ORT format:
   - The entire latency shifts to `await model.run()`, with no significant wait on GPU queue completion.
   - Profiler reveals many WASM operations instead of GPU-accelerated kernels.
I think that the ORT format should leverage the GPU execution providers (WebGPU/WebGL) similarly to the ONNX format, without introducing unexpected WASM-based CPU fallbacks.
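For context, this is roughly how I create the session (a simplified sketch; the model URL, input name, and shape are placeholders, and the import path may differ depending on the bundle used):

```js
import * as ort from 'onnxruntime-web/webgpu'; // WebGPU-enabled bundle of onnxruntime-web

// Session creation is identical for both formats; only the model file changes
// ('model.onnx' vs. 'model.ort').
const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
});

// Placeholder input name and shape; the attached model uses its own.
const tensor = new ort.Tensor(
  'float32',
  new Float32Array(1 * 3 * 224 * 224),
  [1, 3, 224, 224]
);
const results = await session.run({ input: tensor });
```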
Profiling for the model attached in "To reproduce":
Profiling for another, larger model:
- Do conversion flags matter? Does the ORT format pre-optimize the model for CPU/WASM by default?
- Are there hidden constraints when using the ORT format in browsers?
To reproduce
- Load an ONNX model and run inference with:
  ```js
  await this._model.run({ input: tensor }, { output: this._outputTensor });
  await this._device.queue.onSubmittedWorkDone(); // Majority of latency here
  ```
- Convert the model to ORT format (using `convert_onnx_models_to_ort.py`).
- Load the ORT model and run the same inference code.
- Observe the profiler showing WASM ops and the latency moved to `await model.run()` (see the logging sketch after this list).
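A minimal sketch of how node-to-EP placement can be inspected from the JS side, assuming the `ort.env` logging switches behave as documented (the .ort path is a placeholder):

```js
import * as ort from 'onnxruntime-web/webgpu';

// Verbose logs should include graph-partitioning / node-placement messages,
// showing whether each node runs on the WebGPU EP or falls back to the
// CPU (WASM) EP.
ort.env.debug = true;
ort.env.logLevel = 'verbose';

const session = await ort.InferenceSession.create('model.ort', {
  executionProviders: ['webgpu'],
});
// ...run inference as in the first step above and inspect the console output.
```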
I attach example models in ONNX and ORT formats on which I face the issue: models.zip
I can reproduce it on 1.20.0 and d27fecd.
Urgency
Can you help me clarify:
- What are these operations and why do they appear?
- Does this mean that the model runs on the CPU?
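If it helps, I can also capture ONNX Runtime's built-in profile for both formats. A sketch of what I would run, assuming `enableProfiling` and `endProfiling` are supported by the web bindings:

```js
import * as ort from 'onnxruntime-web/webgpu';

const session = await ort.InferenceSession.create('model.ort', {
  executionProviders: ['webgpu'],
  enableProfiling: true, // record per-node timings, including the EP used
});

// Placeholder input, as in the sketch above.
const tensor = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]);
await session.run({ input: tensor });

// Stop profiling and flush the collected per-node records
// (console output in the web backend, if supported).
session.endProfiling();
```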
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.20.0 (also reproduced on commit d27fecd)
Execution Provider
'webgpu' (WebGPU)