[Mobile] Dynamic Shape Challenge: Enabling LLM on QNN-HTP #23832

Open
@DakeQQ

Description

Describe the issue

The ONNX Runtime documentation states that the QNN-HTP backend does not support dynamic shapes.

My questions are:

  1. Are the Slice, Split, and Gather operators considered dynamic-shape operators?
  2. Is there a plan to support dynamic-shape computation? The Qualcomm QNN documentation now declares support for it.

According to both the ONNX Runtime and the official Qualcomm documentation, QNN-HTP supports these operators. However, we encountered a failure during the compilation stage when testing an LLM model, and even a simple test model, with correct ONNX Runtime and QNN-HTP backend settings (Android 14, ORT == 1.20.1, QNN SDK == 2.28.*, HTP = v73, SoC = 43).

We aim to run an LLM on QNN-HTP, but we are having trouble with dynamic inputs/outputs.

Attempts made:

  1. Using a dynamic-shape input tensor resulted in the expected compilation failure.
  2. Using a static-shape input tensor together with Slice, Split, and Gather operators also failed during compilation.
  3. Using a statically shaped, over-sized input tensor with a mask (processing the full tensor regardless of the actual input word count) led to heavy computation and extremely low performance.
  4. Modifying the code to run static-shape operations on HTP while moving the remaining operations to the CPU resulted in processing one token at a time during the prefill stage, which was extremely slow.
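For attempt 3, a common way to bound the padding overhead (an assumption on our side, not something from the ONNX Runtime docs) is shape bucketing: export one statically shaped graph per bucket length and pad each request only up to the smallest bucket that fits it. A minimal sketch of the bucket-selection and padding logic, with illustrative bucket sizes:

```python
def pick_bucket(seq_len, buckets=(32, 64, 128, 256)):
    """Return the smallest bucket length that fits seq_len."""
    for b in buckets:
        if seq_len <= b:
            return b
    raise ValueError(f"sequence length {seq_len} exceeds largest bucket")


def pad_to_bucket(token_ids, pad_id=0, buckets=(32, 64, 128, 256)):
    """Right-pad token_ids to the chosen bucket; also return a 0/1 attention mask."""
    target = pick_bucket(len(token_ids), buckets)
    n_pad = target - len(token_ids)
    padded = token_ids + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad
    return padded, mask
```

Each bucket length then gets its own statically shaped ONNX graph (or QNN context), and the mask keeps padded positions out of the attention result, so the wasted computation is bounded by the bucket granularity rather than the full maximum sequence length.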

We are looking for solutions or ideas to address these issues and believe the Microsoft team is brilliant and knowledgeable about ONNX Runtime.

To reproduce

# Export the Slice model for testing Android-QNN-HTP backend.

import torch
import torch.nn as nn


class SliceModel(nn.Module):
    def forward(self, data, start):
        return data[:, :, :, start:]   # Slice from 'start' to the end along the last axis.


# Instantiate the model
model = SliceModel()
exported_path = "slice_model.onnx"

# Create dummy inputs:
data = torch.zeros((1, 1, 1, 100), dtype=torch.int16)   # NCHW format
start = torch.tensor([10], dtype=torch.int64)           # graph input, so the Slice offsets are dynamic

# Export the model to ONNX
torch.onnx.export(
    model,
    (data, start),
    exported_path,
    input_names=["data", "start"],
    output_names=["slice_output"],
    opset_version=17
)

print("Exported the model to slice_model.onnx")

Urgency

No response

Platform

Android

OS Version

14

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

CMake=3.31.2, NDK=26.3.*

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

C++/C

Architecture

ARM64

Execution Provider

Other / Unknown, SNPE

Execution Provider Library Version

QNN SDK 2.28.*

    Labels

    ep:QNN (issues related to QNN execution provider), ep:SNPE (issues related to SNPE execution provider), platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)
