[Performance] Upstream MLAS backend optimization for better thread partitioning in multi-group or large batch convolutions #25152

Open
@zoeczy

Describe the issue

We have implemented a performance optimization in the MLAS backend of ONNX Runtime that improves CPU utilization for convolutions with multiple groups or large batch sizes. With this change, we observe near-linear performance scaling in these scenarios as intra_op_num_threads increases.

We would like to ask:

  1. Does this kind of optimization align with the upstream design goals for MLAS?
  2. If so, would the ONNX Runtime team be open to reviewing a PR that introduces this optimization?
  3. Are there any guidelines or preferred patterns for contributing such low-level performance improvements to MLAS?

To reproduce

This optimization targets CPU inference scenarios with:

  • Convolution models using group > 1, or
  • Convolutions with large batch size (e.g., batch ≥ 32)
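
The issue does not describe the partitioning scheme itself. Purely as an illustration of why the number of independent work items matters in these two cases, here is a hypothetical sketch that splits a pool of work items evenly across threads and counts how many threads actually receive work. The function names and the choice of batch × group as the unit of work are assumptions made for this example, not the MLAS implementation.

```python
def partition_work(thread_id, thread_count, total_work):
    """Return (start, count) for one thread.

    Work is split as evenly as possible; any remainder is handed out
    one item at a time to the lowest-numbered threads.
    """
    per_thread, extra = divmod(total_work, thread_count)
    if thread_id < extra:
        return thread_id * (per_thread + 1), per_thread + 1
    return thread_id * per_thread + extra, per_thread


def busy_threads(thread_count, total_work):
    """Count the threads that receive at least one work item."""
    return sum(
        1
        for t in range(thread_count)
        if partition_work(t, thread_count, total_work)[1] > 0
    )


if __name__ == "__main__":
    threads = 16
    # Assumption: one work item per group, batch size 1.
    print(busy_threads(threads, 4))       # 4 of 16 threads have work
    # Assumption: one work item per (batch, group) pair, batch 32, 4 groups.
    print(busy_threads(threads, 32 * 4))  # all 16 threads have work
```

With only 4 items, 12 of the 16 threads stay idle; with 128 items, every thread gets 8. Whether the proposed change exposes work at this granularity is something only the eventual PR can confirm.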

To observe the issue without optimization:

  1. Create an ort.SessionOptions() object, set its intra_op_num_threads attribute to N (e.g., 16 or 32), and pass it to ort.InferenceSession to explicitly control the number of intra-op threads.
  2. Benchmark the execution time and thread utilization of convolution operators with multiple groups or large batch sizes.
  3. Measure the overall inference latency of such models and compare core utilization across threads.
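
The steps above could be scripted roughly as follows. This is a minimal sketch, not the benchmark used for the numbers in this issue: the single-node model, its shapes (batch 32, 64 channels, 4 groups, 56×56 input), the file name, and the helper names are all assumptions chosen for illustration. It requires the `numpy`, `onnx`, and `onnxruntime` packages, which are imported inside the functions that need them.

```python
import statistics
import time


def build_grouped_conv_model(batch=32, channels=64, groups=4, size=56,
                             path="grouped_conv.onnx"):
    """Write a one-node ONNX model containing a single grouped 3x3 Conv."""
    import numpy as np
    import onnx
    from onnx import TensorProto, helper, numpy_helper

    # Conv weights have shape (out_channels, in_channels / group, kH, kW).
    weight = np.random.rand(channels, channels // groups, 3, 3).astype(np.float32)
    node = helper.make_node("Conv", ["x", "w"], ["y"], group=groups,
                            kernel_shape=[3, 3], pads=[1, 1, 1, 1])
    shape = [batch, channels, size, size]
    graph = helper.make_graph(
        [node], "grouped_conv",
        [helper.make_tensor_value_info("x", TensorProto.FLOAT, shape)],
        [helper.make_tensor_value_info("y", TensorProto.FLOAT, shape)],
        initializer=[numpy_helper.from_array(weight, name="w")],
    )
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    model.ir_version = 8  # conservative IR version so older runtimes can load it
    onnx.save(model, path)
    return path


def median_latency(path, num_threads, runs=20, warmup=3):
    """Median wall-clock seconds per run with a fixed intra-op thread count."""
    import numpy as np
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = num_threads
    sess = ort.InferenceSession(path, sess_options=opts,
                                providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    x = np.random.rand(*inp.shape).astype(np.float32)
    for _ in range(warmup):
        sess.run(None, {inp.name: x})
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        sess.run(None, {inp.name: x})
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def scaling_efficiency(baseline_seconds, seconds, num_threads):
    """Speedup over the 1-thread run divided by thread count (1.0 = linear)."""
    return (baseline_seconds / seconds) / num_threads


if __name__ == "__main__":
    model_path = build_grouped_conv_model()
    baseline = median_latency(model_path, 1)
    for n in (1, 2, 4, 8, 16, 32):
        seconds = baseline if n == 1 else median_latency(model_path, n)
        eff = scaling_efficiency(baseline, seconds, n)
        print(f"threads={n:2d}  latency={seconds * 1e3:8.2f} ms  efficiency={eff:.2f}")
```

An efficiency that falls well below 1.0 as the thread count grows indicates that the extra threads are not being kept busy. Per-core utilization is not captured by this script and would need an external tool such as `htop` or `perf` running alongside it.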

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04 (x86_64)

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

master

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Other / Unknown

Execution Provider Library Version

MLAS (default CPU execution provider)

Model File

No response

Is this a quantized model?

No

Labels

performance