
cpu: aarch64: prefer brgemm over jit for 1x1 convolutions with sve_256 #3411


Merged
merged 1 commit into uxlfoundation:main on Jun 18, 2025

Conversation


@Anallear Anallear commented Jun 10, 2025

Description

Moving brgemm_1x1_convolution_fwd_t<sve_256> before the jit_sve_1x1_convolution_fwd implementation based on benchmark results showing better performance.

Performance improvements

  • When run from PyTorch, the brgemm path is slower with stride 1, so we added a condition to use the brgemm 1x1 implementation only when the stride is greater than or equal to 2.
  • Performance results (measured on 16 threads; mb = mini-batch size, ic = input channels, S = stride):

    | S | mb | ic  | Brgconv (ms) | JIT_1X1 (ms) |
    |---|----|-----|--------------|--------------|
    | 1 | 64 | 384 | 21.25        | 21.62        |
    | 2 | 64 | 384 | 5.48         | 6.16         |
    | 3 | 64 | 384 | 2.57         | 5.87         |

Bug fixes

  • Fixes nightly regression failures

@Anallear Anallear requested review from a team as code owners June 10, 2025 14:07
@github-actions github-actions bot added platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 component:common labels Jun 10, 2025

@Ryo-not-rio Ryo-not-rio left a comment


Thanks for this fix, just a few minor formatting issues, and it looks like you need to run clang-format!

@@ -44,6 +44,7 @@ using namespace data_type;

template <cpu_isa_t isa>
status_t brgemm_1x1_convolution_fwd_t<isa>::pd_t::init(engine_t *engine) {


remove

Comment on lines 84 to 85



remove

Comment on lines 208 to 209



remove

@Anallear Anallear force-pushed the fixreg branch 2 times, most recently from d5ee1d9 to d94617f Compare June 12, 2025 12:49
Moving brgemm_1x1_convolution_fwd_t<sve_256> before the jit_sve_1x1_convolution_fwd implementation based on benchmark results showing better performance.
@Sqvid Sqvid merged commit fd8d5eb into uxlfoundation:main Jun 18, 2025
22 checks passed
5 participants