
[Mobile] MatMulNBits Q8 errors out on Android #24769

Open
@DakeQQ


Describe the issue

Hi everyone,

I'm using matmul_nbits_quantizer with bits=8, and quantization itself completes without any errors. However, when I run the resulting model on an Android device, it crashes with the following error message:

[E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running MatMulNBits node. Name:'/k_proj/MatMul_Q8' Status Message: /home/iamj/Downloads/onnxruntime/onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc:442 Status onnxruntime::contrib::MatMulNBits<float>::ComputeBUnpacked(const Tensor *, const Tensor *, const Tensor *, const Tensor *, const Tensor *, const Tensor *, Tensor *, AllocatorPtr &, concurrency::ThreadPool *, const MatMulComputeHelper &) const [T1 = float] nbits_ == 4 was false. Only 4b quantization is supported for unpacked compute.

The same model quantized to 4 bits runs fine.
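The failing check, `nbits_ == 4`, is made by the MatMulNBits CPU kernel, and each MatMulNBits node records its bit width in its `bits` attribute. To confirm which nodes in the exported model carry 8-bit weights (for example `/k_proj/MatMul_Q8`), the graph can be scanned in a few lines. This is a minimal sketch that assumes the model is loaded with the `onnx` Python package; the helper names `matmulnbits_bits` and `non_4bit_nodes` are hypothetical and not part of ONNX Runtime.

```python
def matmulnbits_bits(graph):
    """Map each MatMulNBits node name in `graph` to its `bits` attribute.

    `graph` is assumed to behave like an onnx GraphProto, e.g.
    onnx.load(path).graph. `bits` is an INT attribute, so its value is read
    from AttributeProto.i. Nodes inside subgraphs (If/Loop bodies) are not
    visited. A node without an explicit `bits` attribute maps to None.
    """
    result = {}
    for node in graph.node:
        if node.op_type != "MatMulNBits":
            continue
        result[node.name] = next(
            (attr.i for attr in node.attribute if attr.name == "bits"), None
        )
    return result


def non_4bit_nodes(graph):
    """Names of MatMulNBits nodes whose explicit `bits` value is not 4."""
    return [
        name
        for name, bits in matmulnbits_bits(graph).items()
        if bits is not None and bits != 4
    ]
```

For example, `non_4bit_nodes(onnx.load(quanted_model_path).graph)` lists the MatMulNBits nodes written with a bit width other than 4.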

Details about my setup:

  • ONNX Runtime version: 1.22.0 (Android libonnxruntime.so built from source)
  • Quantization config: bits=8
  • Quantizer initialization:
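from onnxruntime.quantization import matmul_nbits_quantizer, quant_utils

# `model`, `quant_config` (a WeightOnlyQuantConfig) and `quanted_model_path`
# are defined earlier in the script (not shown here).
quant_config.bits = 8  # 8-bit weights; bits=4 with the same settings works
quant = matmul_nbits_quantizer.MatMulNBitsQuantizer(
    model,
    block_size=32,
    is_symmetric=False,
    accuracy_level=4,
    quant_format=quant_utils.QuantFormat.QOperator,
    algo_config=quant_config,
    nodes_to_exclude=None,
)
quant.process()
quant.model.save_model_to_file(
    quanted_model_path,
    use_external_data_format=True,
)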
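For reference, `bits=8`, `block_size=32` and `is_symmetric=False` describe blockwise asymmetric weight quantization: each run of 32 weights along the reduction (K) dimension gets its own scale and zero point, and every weight is stored as an integer in [0, 2^bits - 1]. The pure-Python sketch below illustrates that scheme with simple min/max rounding; it is a simplified assumption, not the MLAS/ONNX Runtime implementation, and the function names are hypothetical.

```python
def quantize_block_asym(block, bits):
    """Asymmetric min/max quantization of one block of float weights.

    Simplified sketch: the float range is widened to include 0.0, the scale
    maps it onto the integer range [0, 2**bits - 1], and the zero point is
    the integer that represents 0.0.
    """
    qmax = (1 << bits) - 1
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = min(max(round(-lo / scale), 0), qmax)
    q = [min(max(round(x / scale) + zero_point, 0), qmax) for x in block]
    return q, scale, zero_point


def dequantize_block(q, scale, zero_point):
    """Reconstruct approximate float weights from one quantized block."""
    return [(v - zero_point) * scale for v in q]


def quantize_blockwise(weights, block_size, bits):
    """Quantize a flat list of weights in consecutive blocks of `block_size`.

    Each block gets its own (q, scale, zero_point); a trailing partial block
    is quantized as-is.
    """
    return [
        quantize_block_asym(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]
```

With the same block, going from 4 to 8 bits divides the range by 255 instead of 15, so the per-weight rounding error shrinks accordingly.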

Thanks in advance for your help!


To reproduce

//

Urgency

//

Platform

Android

OS Version

14

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

cmake=3.31.6, NDK=26.3

Package Name (if 'Released Package')

None

ONNX Runtime Version or Commit ID

1.22.0

ONNX Runtime API

C++/C

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

1.22.0

Assignees

No one assigned

Labels

platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)
