[Feature Request] Optimized Convolution kernels for aarch64

### Describe the feature request

x86 already has optimized convolution kernels written with AVX, for example [SconvKernelAvx512F.S](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/mlas/lib/x86_64/SconvKernelAvx512F.S). It would be good to have equivalent SVE/NEON convolution kernels for aarch64.
This would greatly benefit ONNX Runtime users running CV models on Apple M-series chips, NVIDIA Grace, AWS Graviton, Fujitsu-Monaka, etc.
Could the Arm ISA experts say whether this work is planned? @MaajidKhan @milpuz01 @chenfucn @snnn @yufenglee
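
To make the ask a bit more concrete, here is a rough sketch of the kind of inner loop I have in mind, written with NEON intrinsics rather than assembly. This is only an illustration under my own assumptions: `conv1d_row_f32` is a hypothetical helper and not an existing MLAS function, it handles a single float32 row with no channels, strides, padding, or blocking, and it falls back to plain scalar code on non-aarch64 builds. A real kernel would need to cover all of those cases.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of MLAS): 1-D "valid" convolution of a
 * float32 row, computed as cross-correlation like the ONNX Conv operator.
 * Assumption: `in` holds at least out_len + klen - 1 elements. */
void conv1d_row_f32(const float *in, const float *w, size_t klen,
                    float *out, size_t out_len)
{
    assert(klen > 0);
    size_t i = 0;
#if defined(__aarch64__)
    /* NEON path: compute 4 adjacent outputs per iteration. */
    for (; i + 4 <= out_len; i += 4) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (size_t k = 0; k < klen; ++k) {
            /* acc += in[i+k .. i+k+3] * w[k], as a fused multiply-add */
            acc = vfmaq_n_f32(acc, vld1q_f32(in + i + k), w[k]);
        }
        vst1q_f32(out + i, acc);
    }
#endif
    /* Scalar tail on aarch64; the whole loop on other architectures. */
    for (; i < out_len; ++i) {
        float sum = 0.0f;
        for (size_t k = 0; k < klen; ++k) {
            sum += in[i + k] * w[k];
        }
        out[i] = sum;
    }
}
```

An SVE variant would presumably replace the fixed 4-lane loop with a predicated, vector-length-agnostic loop, but I have not sketched that here.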

### Describe scenario use case

Models containing convolutional operators will benefit from these optimized kernels if they are running on aarch64 machines.

[Feature Request] Optimized Convolution kernels for aarch64 #24790
