Description
Describe the issue
In the ONNX Runtime documentation, it's stated that QNN-HTP doesn't support dynamic shapes.
My questions are:
- Are the Slice, Split, and Gather operators considered dynamic-shape operators?
- Is there a plan to support dynamic-shape computation? The QNN documentation now declares support for it.
According to both the ONNX Runtime and Qualcomm official documentation, QNN-HTP supports these operators. However, we hit a failure at the compilation stage when testing both an LLM model and a simple test model with correct ONNX Runtime and QNN-HTP backend settings (Android 14, ORT==1.20.1, QNN_SDK=2.28.*, HTP=v73, SOC=43).
We aim to run an LLM model on QNN-HTP, but we're having trouble with dynamic input/output.
Attempts made:
- Using a dynamic-shape input tensor resulted in the expected compilation failure.
- Using a static-shape input tensor with Slice, Split, and Gather operators also failed during compilation.
- Using a static large-shape input tensor with a mask (processing the full tensor regardless of the input word count) led to heavy computation and extremely low performance.
- Modifying the code to process static shapes on HTP while moving the other operations to the CPU resulted in processing one token at a time during the prefill stage, which was extremely slow.
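One variant of the static-shape approach we are considering (a sketch, not something we have validated on HTP; the bucket sizes below are hypothetical): instead of padding every prefill to the single maximum length, pad to the nearest of a few fixed bucket lengths and compile one static-shape graph per bucket. This keeps all shapes static without paying the full-maximum-length cost on every call.

```python
import bisect

# Hypothetical bucket sizes; one static-shape graph would be compiled per size.
BUCKETS = [32, 64, 128, 256]

def pick_bucket(seq_len):
    """Smallest bucket that fits seq_len (falling back to the largest)."""
    i = bisect.bisect_left(BUCKETS, seq_len)
    return BUCKETS[min(i, len(BUCKETS) - 1)]

def pad_to_bucket(tokens, pad_id=0):
    """Right-pad a token list to its bucket length; also return the real length."""
    bucket = pick_bucket(len(tokens))
    return tokens + [pad_id] * (bucket - len(tokens)), len(tokens)

padded, real_len = pad_to_bucket(list(range(50)))
print(len(padded), real_len)  # 64 50
```

The attention mask (or `real_len`) then tells the model where padding begins, as in our masked-static attempt above, but the wasted computation is bounded by the bucket granularity rather than the maximum sequence length.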
We are looking for solutions or ideas to address these issues, and we believe the Microsoft team's deep knowledge of ONNX Runtime can help.
To reproduce
# Export the Slice model for testing the Android-QNN-HTP backend.
import torch
import torch.nn as nn

class SliceModel(nn.Module):
    def forward(self, data, start):
        return data[:, :, :, start:]  # Slice the data from 'start' to the end.

# Instantiate the model
model = SliceModel()
exported_path = "slice_model.onnx"

# Create dummy inputs:
data = torch.zeros((1, 1, 1, 100), dtype=torch.int16)  # NCHW format
start = torch.tensor([10], dtype=torch.int64)

# Export the model to ONNX
torch.onnx.export(
    model,
    (data, start),
    exported_path,
    input_names=["data", "start"],
    output_names=["slice_output"],
    opset_version=17,
)
print("Exported the model to slice_model.onnx")
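To make the dynamic-shape nature of this model concrete (plain Python, no torch required): the size of the output's last dimension depends on the runtime value of the `start` input, so the output shape cannot be fixed at compile time, which is presumably what the QNN-HTP compiler rejects.

```python
def slice_output_shape(input_shape, start):
    """Shape of data[:, :, :, start:] for an NCHW input.

    The last dimension shrinks by `start`, so the output shape is only
    known once the runtime value of `start` is known -- this is what
    makes the exported Slice a dynamic-shape op.
    """
    n, c, h, w = input_shape
    return (n, c, h, w - start)

# With the dummy input from the export script above:
print(slice_output_shape((1, 1, 1, 100), 10))  # (1, 1, 1, 90)
print(slice_output_shape((1, 1, 1, 100), 25))  # (1, 1, 1, 75)
```

By contrast, if `start` were a Python constant baked into the graph at export time, every Slice parameter would be static, at the cost of re-exporting the model for each start value.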
Urgency
No response
Platform
Android
OS Version
14
ONNX Runtime Installation
Built from Source
Compiler Version (if 'Built from Source')
CMake=3.31.2, NDK=26.3.*
Package Name (if 'Released Package')
onnxruntime-android
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
C++/C
Architecture
ARM64
Execution Provider
Other / Unknown, SNPE
Execution Provider Library Version
QNN SDK 2.28.*