Describe the feature request
x86 already has optimized convolution kernels written with AVX (for example, SconvKernelAvx512F.S). It would be useful to have equivalent SVE/NEON convolution kernels for aarch64.
This would benefit ONNX Runtime users running computer vision (CV) models on Apple M-series Macs, NVIDIA Grace, AWS Graviton, Fujitsu MONAKA, and similar aarch64 platforms.
I would like to hear from the Arm ISA experts whether this work is planned: @MaajidKhan @milpuz01 @chenfucn @snnn @yufenglee
Describe scenario use case
Models containing convolution operators would benefit from these optimized kernels when they run on aarch64 machines.