Description
Describe the feature request
We would like to request the re-introduction of the XNNPACK Execution Provider (EP) for ONNX Runtime Web.
Motivation
We benchmarked the default Wasm EP and the XNNPACK Wasm EP for browser-based inference and observed that the XNNPACK EP outperforms the current WebAssembly-based EP on MobileNetV3-like models across several laptop devices.
Thread1
| Device | Execution time reduction when switching to XNNPACK |
|---|---|
| TGL / i7-1165G7 | 54% |
| MTL / Ultra 7 155H | 76% |
| LNL / Ultra 7 268V | 79% |
In our benchmarks, the XNNPACK EP achieves over 50% lower latency than the MLAS Wasm EP on all three laptops tested.
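For readers who prefer speedup factors, the reductions above can be converted with a small helper. This is only a sketch: it assumes "execution time reduction" means `1 - t_xnnpack / t_wasm`, and `speedupFromReduction` is an illustrative name, not part of onnxruntime.

```typescript
// Convert a relative execution-time reduction (in percent) into a speedup factor.
// Assumption: reduction = 1 - t_xnnpack / t_wasm,
// so speedup = t_wasm / t_xnnpack = 1 / (1 - reduction).
function speedupFromReduction(reductionPercent: number): number {
  if (reductionPercent < 0 || reductionPercent >= 100) {
    throw new RangeError("reduction must be in [0, 100)");
  }
  return 1 / (1 - reductionPercent / 100);
}

// Reduction percentages copied from the table above, keyed by platform code name.
const reductions: Record<string, number> = { TGL: 54, MTL: 76, LNL: 79 };

for (const [device, pct] of Object.entries(reductions)) {
  console.log(`${device}: ${speedupFromReduction(pct).toFixed(2)}x`);
}
```

Under that assumption, the measured reductions correspond to roughly 2.2x, 4.2x, and 4.8x speedups, respectively.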
We have a local build that adds the XNNPACK EP back on top of the latest onnxruntime, and we are willing to contribute it. Would this change be welcome?
Describe scenario use case
Browser-based fp32 inference could benefit from the XNNPACK EP.