[Bug] Invalid type for QuantizeLinear dtype post-ORT optimizations #25001

Open
@ShabnamSheikhha

Description

Describe the issue

After optimizing an ONNX model with ONNX Runtime, I encounter an error when creating an inference session for the optimized model. The error message indicates that the QuantizeLinear operator received a zero-point with an invalid dtype (int32). The issue does not exist in the original model and is introduced at the Basic graph-optimization level.

Error Message:

Traceback (most recent call last):
    ort_session = ort.InferenceSession(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int32)' of input parameter (dq0_x_zero_point) of operator (QuantizeLinear) in node (r_out_q) is invalid.

I believe the error is introduced by the QDQPropagationTransformer, specifically this line. QuantizeLinear does not support an int32 zero-point, while DequantizeLinear does. That line inserts a QuantizeLinear node by copying the parameters of its matching DequantizeLinear node, which propagates the invalid dtype onto the new node.
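
For reference, the invalid construct can be reproduced in isolation. The following is a minimal sketch (names are illustrative): a lone QuantizeLinear node with an int32 zero-point initializer is rejected by ORT's graph validation even with all optimizations disabled, which confirms that the type constraint itself fails once such a node has been inserted:

import numpy as np
import onnx.helper as helper
import onnx.numpy_helper as numpy_helper
from onnx import TensorProto
import onnxruntime as ort

# A lone QuantizeLinear whose zero-point initializer is int32 -- the same
# construct the transformer inserts as node 'r_out_q' in the repro below.
q = helper.make_node(
    'QuantizeLinear',
    ['x', 'y_scale', 'y_zero_point'],
    ['y'],
)
graph = helper.make_graph(
    nodes=[q],
    name='int32-zp-demo',
    inputs=[helper.make_tensor_value_info('x', TensorProto.FLOAT, [4])],
    outputs=[helper.make_tensor_value_info('y', TensorProto.INT32, [4])],
    initializer=[
        numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='y_scale'),
        numpy_helper.from_array(np.array(0, dtype=np.int32), name='y_zero_point'),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_operatorsetid('ai.onnx', 20)])

sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
# Raises the same INVALID_GRAPH type error despite optimizations being disabled.
ort.InferenceSession(model.SerializeToString(), sess_options=sess_options)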

To reproduce

Run the script below (note the commented-out line near the end):

import onnx.helper as helper
import onnx.numpy_helper as numpy_helper
from onnx import TensorProto
import numpy as np
import onnxruntime as ort

# DequantizeLinear accepts an int32 input and int32 zero-point (which must be zero).
dq0 = helper.make_node(
    'DequantizeLinear',
    ['dq0_x', 'dq0_x_scale', 'dq0_x_zero_point'],
    ['dq_out'],
)

# The QDQPropagationTransformer propagates the DQ across this Reshape by
# inserting a QuantizeLinear ('r_out_q') that copies dq0's int32 zero-point.
r = helper.make_node(
    'Reshape',
    ['dq_out', 'shape'],
    ['r_out'],
)

add = helper.make_node(
    'Add',
    ['in', 'r_out'],
    ['add_out'],
)

# QuantizeLinear does not allow an int32 zero-point.
q = helper.make_node(
    'QuantizeLinear',
    ['add_out', 'q_y_scale', 'q_y_zero_point'],
    ['q_out'],
)

dq1 = helper.make_node(
    'DequantizeLinear',
    ['q_out', 'dq1_x_scale', 'dq1_x_zero_point'],
    ['out'],
)

initializers = [
    # rand() -> int32 truncates to zeros; the actual values are irrelevant to the repro.
    numpy_helper.from_array(np.random.rand(1000).astype(np.int32), name='dq0_x'),
    numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='dq0_x_scale'),
    numpy_helper.from_array(np.array(0, dtype=np.int32), name='dq0_x_zero_point'),
    numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='dq1_x_scale'),
    numpy_helper.from_array(np.array(0, dtype=np.uint8), name='dq1_x_zero_point'),
    numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='q_y_scale'),
    numpy_helper.from_array(np.array(0, dtype=np.uint8), name='q_y_zero_point'),
    numpy_helper.from_array(np.array([1, 1000], dtype=np.int64), name='shape')
]

graph = helper.make_graph(
    nodes=[dq0, r, add, q, dq1],
    name='QDQ-bug',
    inputs=[helper.make_tensor_value_info('in', TensorProto.FLOAT, [1, 1000])],
    outputs=[helper.make_tensor_value_info('out', TensorProto.FLOAT, [1, 1000])],
    initializer=initializers
)
opset_imports = [helper.make_operatorsetid("ai.onnx", 20)]
model = helper.make_model(graph, opset_imports=opset_imports)

sess_options = ort.SessionOptions()
##### if you uncomment the next line, session creation succeeds #######
# sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession(
    model.SerializeToString(),
    sess_options=sess_options
)

Note: changing the opset to ai.onnx 21 also mitigates the error. Unclear to me why that is, since QuantizeLinear in opset 21 still does not support an int32 zero-point...?
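
As a narrower workaround than disabling all optimizations, the disabled_optimizers argument visible in the traceback above can be used to skip just this transformer. A sketch, assuming "QDQPropagationTransformer" is the registered optimizer name and reusing model from the repro script:

ort_session = ort.InferenceSession(
    model.SerializeToString(),
    sess_options=ort.SessionOptions(),
    # Assumed registered name of the transformer identified above.
    disabled_optimizers=["QDQPropagationTransformer"],
)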

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.4 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response
