[Feature Request] Parallelization of the CPU DequantizeLinear #24124

Open
@VamsiRatnakaramWork

Description

Describe the feature request

DequantizeLinear's current CPU implementation is naive: it is single-threaded and uses only scalar instructions.

#20901

Could we prioritize a multi-threaded, vectorized implementation of this code path to match the MlasQuantizeLinearKernel implementation?

@fajin-corp has already made some comments about this:

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/quantization/quantize_linear.cc#L302

Describe scenario use case

We should see performance improvements to a multitude of use cases.

Recently, the QNN EP made this code path the default for the execution provider, also for performance reasons. Vectorization would likely amplify those gains.

https://github.com/microsoft/onnxruntime/releases/tag/v1.20.2

Another user also recently reported performance gains with the Qwen 2.5 0.5B model:

#23395

That thread has since gone stale, so I cannot add to it.

Labels

ep:QNN (issues related to QNN execution provider), feature request (request for unsupported feature or enhancement), quantization (issues related to quantization)
