cpu: aarch64: prefer brgemm over jit for 1x1 convolutions with sve_256 #3411
Thanks for this fix. There are just a few minor formatting issues, and it looks like you need to run clang-format.
@@ -44,6 +44,7 @@ using namespace data_type;

template <cpu_isa_t isa>
status_t brgemm_1x1_convolution_fwd_t<isa>::pd_t::init(engine_t *engine) {
remove
remove
remove
d5ee1d9 to d94617f
Description
This PR moves brgemm_1x1_convolution_fwd_t<sve_256> before the jit_sve_1x1_convolution_fwd implementation, since benchmark results show it performs better.
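To illustrate why the ordering matters, here is a minimal sketch of a priority-ordered implementation list in which the first applicable entry is chosen. This is not the library's actual dispatch code: `conv_problem`, `impl_entry`, `pick_first`, `example_list`, and the entry names are hypothetical and exist only for this example. The sketch assumes both 1x1 kernels accept the same problems, so the one listed first wins.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a convolution problem descriptor.
struct conv_problem {
    int kernel_h;
    int kernel_w;
    bool has_sve_256;
};

// Hypothetical list entry: a name plus a predicate saying whether the
// implementation can handle a given problem.
struct impl_entry {
    std::string name;
    std::function<bool(const conv_problem &)> applicable;
};

// Walks the list in order and returns the name of the first applicable
// entry, or an empty string if nothing matches.
inline std::string pick_first(
        const std::vector<impl_entry> &list, const conv_problem &p) {
    for (const auto &e : list)
        if (e.applicable(p)) return e.name;
    return "";
}

// Example ordering after the change: the brgemm entry precedes the jit entry.
inline std::vector<impl_entry> example_list() {
    auto is_1x1_sve256 = [](const conv_problem &p) {
        return p.kernel_h == 1 && p.kernel_w == 1 && p.has_sve_256;
    };
    return {
            {"brgemm_1x1:sve_256", is_1x1_sve256},
            {"jit_1x1:sve_256", is_1x1_sve256},
            {"reference", [](const conv_problem &) { return true; }},
    };
}
```

With this ordering, `pick_first(example_list(), conv_problem{1, 1, true})` selects the brgemm entry, while a 3x3 problem falls through to the reference entry.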
Performance improvements
mb: minibatch size, ic: input channels, S: stride

Brgconv:
S=1, mb=64, ic=384, time=21.25 ms
S=2, mb=64, ic=384, time=5.48 ms
S=3, mb=64, ic=384, time=2.57 ms

JIT_1X1:
S=1, mb=64, ic=384, time=21.62 ms
S=2, mb=64, ic=384, time=6.16 ms
S=3, mb=64, ic=384, time=5.87 ms
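The timings above can be turned into speedup ratios with a small helper. This is only a sketch for reading the numbers: the `timing` struct, `kTimings`, `speedup`, and `print_speedups` are names made up for this example, and the values are copied from the list above.

```cpp
#include <cstdio>

// One benchmark row: stride plus the two measured times in milliseconds.
struct timing {
    int stride;
    double brgconv_ms;
    double jit_ms;
};

// Values copied from the reported results (mb=64, ic=384).
constexpr timing kTimings[] = {
        {1, 21.25, 21.62},
        {2, 5.48, 6.16},
        {3, 2.57, 5.87},
};

// Speedup of Brgconv relative to JIT_1X1; values above 1 mean Brgconv is faster.
constexpr double speedup(const timing &t) {
    return t.jit_ms / t.brgconv_ms;
}

// Prints one line per stride with the ratio rounded to two decimals.
inline void print_speedups() {
    for (const auto &t : kTimings)
        std::printf("S=%d: %.2fx\n", t.stride, speedup(t));
}
```

For these numbers the ratios come out to roughly 1.02x at S=1, 1.12x at S=2, and 2.28x at S=3, so most of the gain is in the strided cases.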
Bug fixes