Describe the feature request
DequantizeLinear's current implementation is naive: it runs on a single thread and uses only scalar instructions.
Could we prioritize a multithreaded/vectorized implementation for this code path, to match the MlasQuantizeLinearKernel implementation?
@fajin-corp has already made some comments about this.
Describe scenario use case
We should see performance improvements across a wide range of use cases.
Recently, the QNN-EP made this code path its default for performance reasons as well; vectorization would likely help that effort even more.
https://github.com/microsoft/onnxruntime/releases/tag/v1.20.2
Another user recently reported performance gains with the Qwen 2.5 0.5B model, but that thread has since become stale, so I cannot add onto it.