
[Feature Request] Multi-Head Latent Attention (DeepSeek) support on CPU/NPU #23925

Open
@bkaruman

Description


Describe the feature request

DeepSeek's models use Multi-Head Latent Attention (MLA), but the current ONNX release (https://huggingface.co/onnxruntime/DeepSeek-R1-Distill-ONNX) uses GroupQueryAttention instead.

Is MLA on the roadmap for ONNX Runtime?

Describe scenario use case

Multi-Head Latent Attention lowers the KV-cache footprint, improving mobile and edge inference.
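To make the KV-cache saving concrete, here is a rough back-of-the-envelope sketch comparing per-token cache sizes for standard multi-head attention, grouped-query attention (as in the current ONNX release), and MLA. The head counts and latent dimensions below are assumptions chosen for illustration (loosely modeled on DeepSeek-V2's published MLA configuration), not values taken from the DeepSeek-R1-Distill ONNX model itself.

```python
# Per-token KV-cache size comparison (illustrative; all dims are assumptions).
def kv_cache_bytes_per_token(entries_per_token: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache stored per token, assuming fp16 (2 bytes/entry)."""
    return entries_per_token * dtype_bytes

# Standard multi-head attention: cache full K and V for every head.
n_heads, head_dim = 32, 128
mha = kv_cache_bytes_per_token(2 * n_heads * head_dim)

# Grouped-query attention: K/V are shared across groups of query heads,
# so only n_kv_heads K/V pairs are cached.
n_kv_heads = 8
gqa = kv_cache_bytes_per_token(2 * n_kv_heads * head_dim)

# Multi-head latent attention: cache one compressed latent vector per token
# plus a small decoupled rotary-position key (dims are assumptions).
d_latent, d_rope = 512, 64
mla = kv_cache_bytes_per_token(d_latent + d_rope)

print(mha, gqa, mla)  # MLA's cache is a fraction of both MHA's and GQA's
```

Under these assumed dimensions, MLA caches roughly 14x less than full MHA and about 3.5x less than GQA per token, which is the motivation for supporting it natively on memory-constrained CPU/NPU targets.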

Metadata

Assignees

No one assigned

    Labels

    feature request — request for unsupported feature or enhancement
    platform:mobile — issues related to ONNX Runtime mobile; typically submitted using template
