[Web] WebGPU Incorrect predictions in ONNX model when using Electron on Intel devices #24442

Open
@grazder

Description

Describe the issue

We're using onnxruntime-web with WebGPU backend on different platforms and Electron is one of them.

We observe unstable/inaccurate predictions from an ONNX segmentation model when running inference via ONNX Runtime Web in Electron on specific Intel integrated GPUs (Gen-12LP, Gen-9, Gen-11). The issue does not occur in Chrome on the same devices. The problem manifests as significant tensor value mismatches (e.g., abs/rel errors) in convolution layers, leading to invalid segmentation masks.
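For context, the sessions are created roughly like this (a minimal sketch; the model file name, input name, and shape are placeholders, not our exact setup):

import * as ort from 'onnxruntime-web';

// Create an inference session on the WebGPU backend.
const session = await ort.InferenceSession.create('segmentation.onnx', {
  executionProviders: ['webgpu'],
});

// Run the model on a float32 input tensor (hypothetical input name and shape).
const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 320 * 512), [1, 3, 320, 512]);
const outputs = await session.run({ input });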

On 1.20.1 we faced this problem mostly on Intel Gen-12LP devices: i5-12400, i7-13700H, i7-11850H, i7-12700, i5-1235U, and many others.

I tried bisecting versions to find a fix for the devices above. I found that the issue persists up to 1.21.0-dev.20241107-6a295eb75b and is fixed starting from 1.21.0-dev.20241109-d3ad76b2cf.

After that I decided to use commit d27fecd3d3837864a268bc96f00f2b8dce294697, because everything seemed stable and the problem was solved for the devices above.

But after that we faced the problem on other devices. Examples:

  • gen-12lp: i7-12700H (breaks after model reinitialization), i3-1215U, i5-1035G1
  • gen-11: i5-11320H
  • gen-9: i3-7100U, i5-7200U, i7-8565U

I have also noticed similar problems elsewhere, for example model predictions that differ too much from the reference (atol > 0.1) on Ampere and Turing GPUs in Chrome, and on many devices when using fp16. But we face those problems much less often.

I also tried later versions, but faced similar problems, for example on the i7-13700H.

To help sort out this problem, I can provide more info such as WebGPU reports and more device examples, and can try additional commits on these devices.

To reproduce

I can reproduce on my own devices:

  • Mac M1 (metal-3)
  • NVIDIA GeForce RTX 3060 (ampere)
  • i5-12400 (gen-12lp) - this is where I can see the problems

I attach some Conv nodes from my model on which the tests fail: test_examples.zip

Master - 4d03aeff0ef86a62dacf02d67624cf26050125fd

git checkout 4d03aeff0ef86a62dacf02d67624cf26050125fd
cd onnxruntime/js
npm ci
cd common
npm ci
cd ../web
npm ci
npm run pull:wasm
npm run build

Move the test cases from above into onnxruntime/js/web/test/data/node/opset_20 (opset_20 is an arbitrary name that the testing scripts work with).
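For reference, each test case folder uses the standard ONNX node-test layout (the .pb file names below follow the usual convention; the actual contents come from test_examples.zip):

onnxruntime/js/web/test/data/node/opset_20/
  8_Conv/
    model.onnx
    test_data_set_0/
      input_0.pb
      output_0.pb
  21_Conv/
    ...
  31_Conv/
    ...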

Change onnxruntime/js/web/test/suite-test-list.jsonc to:

{
  "webgpu": {
    "onnx": [],
    "node": ["8_Conv", "21_Conv", "31_Conv"],
    "ops": []
  }
}

After that I run tests for these ops on all of my devices:

// gen-12lp
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

// ampere
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

// metal-3
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

After that I check out 6a295eb75b:

git checkout 6a295eb75b
js\build_jsep.bat r

// building etc

// metal-3
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

// gen-12lp
npm run test -- suite1 --backend webgpu --env electron 
// FAIL
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

// ampere
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

I see the following mismatch on gen-12lp (only the first 10 tensor values are printed here):

LOG: 'e Validator 2025-04-16T13:01:25.774Z|abs/rel check failed-- index:163839: actual=1.9458719491958618,expected=3.159862518310547'
LOG: 'e TestRunner 2025-04-16T13:01:25.774Z|Tensor mismatch:
ACTUAL: type=float32; dims=[1,16,80,128]; data=[-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426,-1.5702481269836426]
EXPECT: type=float32; dims=[1,16,80,128]; data=[0.6060669422149658,0.5686113834381104,0.5930850505828857,0.5984766483306885,0.5964930057525635,0.5918130874633789,0.5929081439971924,0.6105263233184814,0.6307907104492188,0.6446692943572998]'
LOG: 'e TestRunner 2025-04-16T13:01:25.774Z|  Result: FAILED'
LOG: 'e TestRunner 2025-04-16T13:01:25.774Z|Failed to run test data from folder: test_data_set_0. Error: [AssertionError: tensor data should match: expected false to be true]'
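For reference, the abs/rel check in the log corresponds to a tolerance comparison along these lines (a simplified TypeScript sketch, not the actual test-runner code; the atol/rtol defaults here are illustrative):

// Returns true if every element of `actual` is within absolute + relative tolerance of `expected`.
function tensorsMatch(actual: Float32Array, expected: Float32Array, atol = 1e-3, rtol = 1e-3): boolean {
  if (actual.length !== expected.length) return false;
  for (let i = 0; i < actual.length; i++) {
    const diff = Math.abs(actual[i] - expected[i]);
    // The failing case above: index 163839, actual=1.9459, expected=3.1599 -> diff ~1.21, far beyond tolerance.
    if (diff > atol + rtol * Math.abs(expected[i])) return false;
  }
  return true;
}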

After that I check out d3ad76b2cf:

git checkout d3ad76b2cf
js\build_jsep.bat r

// building etc

// metal-3
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

// gen-12lp
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

// ampere
npm run test -- suite1 --backend webgpu --env electron 
// SUCCESS
npm run test -- suite1 --backend webgpu --env chrome
// SUCCESS

So this fixes the issue for my device, but I assume that on the devices with incorrect predictions listed above we would face the same errors.

So it seems that convolutions are unstable in Electron on a lot of Intel devices.

Urgency

I'm working on a segmentation model, and on some devices I see weird model predictions, so this problem is very important, and I face it a lot. As a workaround I developed some tests that I run at initialization, so I can disable the model if it produces incorrect predictions.
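The workaround looks roughly like this (a minimal sketch; the reference input/output tensors and the 0.1 threshold are placeholders for the real test data we ship):

import * as ort from 'onnxruntime-web';

// Run a known input through the freshly created session and compare the result
// against a precomputed reference output; if the error is too large, we disable
// the model (or fall back to another backend) instead of serving bad masks.
async function webgpuLooksHealthy(
  session: ort.InferenceSession,
  refInput: ort.Tensor,
  refOutput: Float32Array,
  atol = 0.1,
): Promise<boolean> {
  const results = await session.run({ [session.inputNames[0]]: refInput });
  const actual = results[session.outputNames[0]].data as Float32Array;
  if (actual.length !== refOutput.length) return false;
  for (let i = 0; i < refOutput.length; i++) {
    if (Math.abs(actual[i] - refOutput[i]) > atol) return false;
  }
  return true;
}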

Here is a picture of the incorrect convolution behaviour (it's not because the model was trained badly; it's definitely caused by the incorrect predictions):

[Image: segmentation mask output showing the incorrect convolution behaviour]

So I think this problem is critical for onnxruntime-web usage on Electron.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

d27fecd

Execution Provider

'webgpu' (WebGPU)

Metadata

Labels

ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime Web; typically submitted using template)
