Description
Describe the issue
A quantized ONNX model that works correctly with ONNX Runtime 1.19.1 now fails during inference in version 1.21.1 with the following error:
[E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running
QLinearConv node. Name:'/features/features.0/Conv_token_1' Status Message:
/onnxruntime_src/onnxruntime/core/providers/cpu/quantization/qlinearconv.cc:67 static void
onnxruntime::QLinearConv<ActType>::ComputeOffset(onnxruntime::OpKernelContext*, int64_t, ActType&,
ActType&, uint8_t&) [with ActType = unsigned char; int64_t = long int; uint8_t = unsigned char]
IsScalarOr1ElementVector(Y_zero_point) was false. QLinearConv : result zero point must be a scalar or 1D tensor of
size 1
ONNX version: 1.17.0
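A minimal sketch, assuming the same test_model.onnx used in the repro below, to confirm that the y_zero_point of every QLinearConv node in the unoptimized model is a scalar or a 1-element tensor (i.e. that the original graph satisfies the constraint the kernel complains about):
import onnx
from onnx import numpy_helper
model = onnx.load("test_model.onnx")
initializers = {init.name: init for init in model.graph.initializer}
for node in model.graph.node:
    if node.op_type == "QLinearConv":
        # Input 7 of QLinearConv is y_zero_point per the ONNX operator spec.
        zp = initializers.get(node.input[7])
        if zp is not None:
            arr = numpy_helper.to_array(zp)
            print(node.name, node.input[7], arr.shape)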
To reproduce
To reproduce the issue, run the following code:
import onnxruntime as ort
import numpy as np
sess = ort.InferenceSession("test_model.onnx", providers=["CPUExecutionProvider"])
output = sess.run(None, {"input.0": np.random.normal(size=(1, 1, 20, 20)).astype(np.float32)})
print(output)
- 1.19.1 – inference works
- 1.21.1 – inference fails with the error above
Additionally, I’ve found that if I disable ONNX Runtime graph optimization, inference works in 1.21.1. This suggests that the issue may be related to the graph optimizations applied to the quantized model:
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
sess = ort.InferenceSession("test_model.onnx", sess_options, providers=["CPUExecutionProvider"])
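As a possible way to narrow this down, a sketch (assuming an arbitrary output path of my choosing) that dumps the optimized graph via SessionOptions.optimized_model_filepath, so the transformed QLinearConv node and its y_zero_point can be inspected with the check above:
dump_options = ort.SessionOptions()
# Write out the graph produced by the default optimization level.
dump_options.optimized_model_filepath = "test_model_optimized.onnx"
# The reported failure happens at run() time, so session creation
# should still succeed and write the optimized model to disk.
sess_dump = ort.InferenceSession("test_model.onnx", dump_options, providers=["CPUExecutionProvider"])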
Urgency
This is a regression in ONNX Runtime between 1.19.1 and 1.21.1.
Platform
Linux
OS Version
Ubuntu 20.04.6 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.21.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response