cpu: aarch64: Re-Enable JIT Depthwise Convolution for BF16 #3441
Description
This PR re-applies PR 3308, which was reverted due to failing nightly tests. It enables JIT depthwise convolution for bf16, preventing it from falling back to ref and significantly improving performance: from ~4% for one thread to ~2.5x for 32 threads, compared to the f32 JIT operation. These performance numbers were taken from a benchmark executed on an AWS c7g.16xlarge instance.

General
Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?

Performance improvements
Below are logs from a small test case demonstrating performance before and after the change.
OMP_NUM_THREADS=16 ./tests/benchdnn/benchdnn --conv --dt=bf16 --mode=p --alg=convolution_direct g32mb64_ic32oc32_ih112oh112kh3sh1dh0ph1_iw112ow112kw3sw1dw0pw1
bf16: total perf: min(ms):0.815918 avg(ms):0.826171
f32: total perf: min(ms):1.02075 avg(ms):1.03415
ref: total perf: min(ms):225.645 avg(ms):225.808
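As a quick sanity check on the speedup claims, the ratios implied by the `min(ms)` timings in the logs above can be computed directly. This is a small standalone sketch; the variable names are illustrative and not part of benchdnn or oneDNN:

```python
# Timings copied from the benchdnn logs above (min(ms) values).
bf16_min_ms = 0.815918  # JIT bf16 depthwise conv (this PR)
f32_min_ms = 1.02075    # JIT f32 baseline
ref_min_ms = 225.645    # reference implementation (the previous bf16 fallback)

# Speedup of the new JIT bf16 path relative to each baseline.
speedup_vs_f32 = f32_min_ms / bf16_min_ms
speedup_vs_ref = ref_min_ms / bf16_min_ms

print(f"bf16 vs f32 JIT: {speedup_vs_f32:.2f}x")
print(f"bf16 vs ref:     {speedup_vs_ref:.1f}x")
```

For this 16-thread case the bf16 JIT kernel is roughly 1.25x faster than the f32 JIT kernel, and orders of magnitude faster than the ref fallback it replaces.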