[Mobile] Dynamic Shape Challenge: Enabling LLM on QNN-HTP #23832

Open
@DakeQQ

Description

Describe the issue

The ONNX Runtime documentation states that the QNN-HTP backend does not support dynamic shapes.

My questions are:

  1. Are the Slice, Split, and Gather operators considered dynamic-shape operators?
  2. Is there a plan to support dynamic-shape computation? The Qualcomm QNN documentation now declares support for it.

According to both the ONNX Runtime and the official Qualcomm documentation, QNN-HTP supports these operators. However, we encountered a failure during the compilation stage when testing an LLM model, and even a simple test model, with correct ONNX Runtime and QNN-HTP backend settings (Android 14, ORT == 1.20.1, QNN SDK == 2.28.*, HTP = v73, SoC = 43).

We aim to run an LLM on QNN-HTP, but we are having trouble with dynamic inputs/outputs.

Attempts made:

  1. Using a dynamic-shape input tensor resulted in the expected compilation failure.
  2. Using a static-shape input tensor together with Slice, Split, and Gather operators also failed during compilation.
  3. Using a statically shaped, over-sized input tensor with a mask (processing the full tensor regardless of the actual input word count) led to heavy computation and extremely low performance.
  4. Modifying the code to run static-shape operations on HTP while moving the remaining operations to the CPU resulted in processing one token at a time during the prefill stage, which was extremely slow.
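For attempt 3, a common way to bound the padding overhead (an assumption on our side, not something from the ONNX Runtime docs) is shape bucketing: export one statically shaped graph per bucket length and pad each request only up to the smallest bucket that fits it. A minimal sketch of the bucket-selection and padding logic, with illustrative bucket sizes:

```python
def pick_bucket(seq_len, buckets=(32, 64, 128, 256)):
    """Return the smallest bucket length that fits seq_len."""
    for b in buckets:
        if seq_len <= b:
            return b
    raise ValueError(f"sequence length {seq_len} exceeds largest bucket")


def pad_to_bucket(token_ids, pad_id=0, buckets=(32, 64, 128, 256)):
    """Right-pad token_ids to the chosen bucket; also return a 0/1 attention mask."""
    target = pick_bucket(len(token_ids), buckets)
    n_pad = target - len(token_ids)
    padded = token_ids + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad
    return padded, mask
```

Each bucket length then gets its own statically shaped ONNX graph (or QNN context), and the mask keeps padded positions out of the attention result, so the wasted computation is bounded by the bucket granularity rather than the full maximum sequence length.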

We are looking for solutions or ideas to address these issues and believe the Microsoft team is brilliant and knowledgeable about ONNX Runtime.

To reproduce

# Export the Slice model for testing Android-QNN-HTP backend.

import torch
import torch.nn as nn


class SliceModel(nn.Module):
    def forward(self, data, start):
        return data[:, :, :, start:]   # Slice from 'start' to the end along the last axis.


# Instantiate the model
model = SliceModel()
exported_path = "slice_model.onnx"

# Create dummy inputs:
data = torch.zeros((1, 1, 1, 100), dtype=torch.int16)   # NCHW format
start = torch.tensor([10], dtype=torch.int64)           # graph input, so the Slice offsets are dynamic

# Export the model to ONNX
torch.onnx.export(
    model,
    (data, start),
    exported_path,
    input_names=["data", "start"],
    output_names=["slice_output"],
    opset_version=17
)

print("Exported the model to slice_model.onnx")

Urgency

No response

Platform

Android

OS Version

14

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

CMake=3.31.2, NDK=26.3.*

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

C++/C

Architecture

ARM64

Execution Provider

Other / Unknown, SNPE

Execution Provider Library Version

QNN SDK 2.28.*

    Labels

    ep:QNN (issues related to QNN execution provider), ep:SNPE (issues related to SNPE execution provider), platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)
