Description
Describe the issue
After ONNX Runtime optimizes an ONNX model, the session fails to initialize with an error stating that the QuantizeLinear operator received a zero-point of an invalid dtype (int32). The issue does not exist in the original model and is introduced at the Basic graph optimization level (ORT_ENABLE_BASIC).
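For what it's worth, the problem is specific to the optimized graph: a minimal sanity check (sketch) against the model built in the repro script below should pass, since the unoptimized graph itself is well-formed:
import onnx
onnx.checker.check_model(model)  # no error expected: the original (unoptimized) graph is valid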
Error Message:
Traceback (most recent call last):
ort_session = ort.InferenceSession(
File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int32)' of input parameter (dq0_x_zero_point) of operator (QuantizeLinear) in node (r_out_q) is invalid.
I believe the error is introduced by the QDQPropagationTransformer optimization, specifically at this line. QuantizeLinear does not support zero-points of type int32, while DequantizeLinear does. That line inserts a QuantizeLinear node by copying the parameters of its matching DequantizeLinear node, which is what produces the invalid dtype.
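For reference, the operator schemas show the mismatch directly. The sketch below simply dumps the declared type constraints via onnx.defs (the constraint names and exact type lists come from the installed onnx package):
from onnx import defs

# Print the type constraints of both operators as of opset 20.
# QuantizeLinear's zero-point constraint does not list tensor(int32),
# while DequantizeLinear's does.
for op in ("QuantizeLinear", "DequantizeLinear"):
    schema = defs.get_schema(op, 20)
    print(op)
    for tc in schema.type_constraints:
        print(f"  {tc.type_param_str}: {sorted(tc.allowed_type_strs)}")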
To reproduce
Run the script below (note the commented line near the end):
import onnx.helper as helper
import onnx.numpy_helper as numpy_helper
from onnx import TensorProto
import numpy as np
import onnxruntime as ort
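# DequantizeLinear node whose zero-point initializer (dq0_x_zero_point) is int32:
# this is allowed for DequantizeLinear but not for QuantizeLinear.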
dq0 = helper.make_node(
'DequantizeLinear',
['dq0_x', 'dq0_x_scale', 'dq0_x_zero_point'],
['dq_out'],
)
r = helper.make_node(
'Reshape',
['dq_out', 'shape'],
['r_out'],
)
add = helper.make_node(
'Add',
['in', 'r_out'],
['add_out'],
)
q = helper.make_node(
'QuantizeLinear',
['add_out', 'q_y_scale', 'q_y_zero_point'],
['q_out'],
)
dq1 = helper.make_node(
'DequantizeLinear',
['q_out', 'dq1_x_scale', 'dq1_x_zero_point'],
['out'],
)
initializers = [
numpy_helper.from_array(np.random.rand(1000).astype(np.int32), name='dq0_x'),
numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='dq0_x_scale'),
numpy_helper.from_array(np.array(0, dtype=np.int32), name='dq0_x_zero_point'),
numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='dq1_x_scale'),
numpy_helper.from_array(np.array(0, dtype=np.uint8), name='dq1_x_zero_point'),
numpy_helper.from_array(np.array(0.1, dtype=np.float32), name='q_y_scale'),
numpy_helper.from_array(np.array(0, dtype=np.uint8), name='q_y_zero_point'),
numpy_helper.from_array(np.array([1, 1000], dtype=np.int64), name='shape')
]
graph = helper.make_graph(
nodes=[dq0, r, add, q, dq1],
name='QDQ-bug',
inputs=[helper.make_tensor_value_info('in', TensorProto.FLOAT, [1, 1000])],
outputs=[helper.make_tensor_value_info('out', TensorProto.FLOAT, [1, 1000])],
initializer=initializers
)
opset_imports = [helper.make_operatorsetid("ai.onnx", 20)]
model = helper.make_model(graph, opset_imports=opset_imports)
sess_options = ort.SessionOptions()
##### if you uncomment this line, the code works #######
# sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession(
model.SerializeToString(),
sess_options=sess_options
)
Note: changing to opset ai.onnx 21 also mitigates the error. It is unclear to me why, since QuantizeLinear in opset 21 still does not support an int32 zero-point.
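As a narrower workaround than disabling all optimizations, skipping just the offending transformer should also avoid the error. This is a sketch that relies on the undocumented disabled_optimizers keyword visible in the traceback above, with the transformer name as referenced earlier:
ort_session = ort.InferenceSession(
    model.SerializeToString(),
    sess_options=sess_options,
    # assumption: undocumented kwarg, passed through to _create_inference_session
    disabled_optimizers=["QDQPropagationTransformer"],
)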
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04.4 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response