When converting an int8-quantized model to ONNX, an error occurs at runtime #23879
Comments
If you view the model in Netron and search for that Add node, what data types do its inputs/outputs have? What version of ONNX Runtime are you using?
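The same check can also be done programmatically with the onnx Python API; a minimal sketch, where `model.onnx` is a placeholder path and the node name is taken from the error message below:

```python
# Minimal sketch (hypothetical path): print the element types of the tensors
# flowing into and out of a given node, as an alternative to Netron.
import onnx

m = onnx.shape_inference.infer_shapes(onnx.load("model.onnx"))  # "model.onnx" is a placeholder

# Map every known tensor name to its declared element type.
dtypes = {}
for vi in list(m.graph.input) + list(m.graph.output) + list(m.graph.value_info):
    dtypes[vi.name] = onnx.TensorProto.DataType.Name(vi.type.tensor_type.elem_type)
for init in m.graph.initializer:
    dtypes[init.name] = onnx.TensorProto.DataType.Name(init.data_type)

for node in m.graph.node:
    if node.name == "/self_attn/q_proj/Add":  # node name from the error message
        print("inputs :", [(t, dtypes.get(t, "unknown")) for t in node.input])
        print("outputs:", [(t, dtypes.get(t, "unknown")) for t in node.output])
```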
This is my onnx version: `C:\Users\30585>pip list | findstr onnx`. Its input was float16 and its output was float16, but it has some int32 values.
Not following where the int32 value comes into it. Are both inputs to the Add float16?
Okay, maybe we can set this issue aside for now, because I found a similar problem. In PyTorch I convert BFLOAT16 to float32, perform a Pow calculation, and then convert the result back to BFLOAT16 to multiply it with another BFLOAT16 tensor. Similar errors occur during this process. My analysis suggests that if my model mixes two different data types, for example casting a value to another type partway through the computation and then continuing the calculation, an error will occur. Do you have any good suggestions for solving this error? This is my code:
In this code, if I try to modify the last line of
In this process, since my model's baseline dtype is BF16 but there is a Pow calculation in between, I considered exporting my model directly without the type conversion, but that results in an error.
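A hypothetical sketch of the pattern being described (the original code block did not survive in this thread; the module and tensor names below are illustrative only):

```python
# Hypothetical repro of the pattern described above: a bfloat16 model whose
# forward pass upcasts to float32 for Pow and returns to bfloat16 for a multiply.
import torch

class PowThenMul(torch.nn.Module):
    def forward(self, x, scale):
        y = x.to(torch.float32).pow(2.0)     # Pow computed in float32
        return y.to(torch.bfloat16) * scale  # back to bf16 for the multiply

x = torch.randn(4, 8, dtype=torch.bfloat16)
scale = torch.randn(4, 8, dtype=torch.bfloat16)

# Exporting the bf16 graph directly; depending on the opset and the ONNX Runtime
# build, this export (or running the result in ORT) is where the error shows up.
torch.onnx.export(PowThenMul(), (x, scale), "pow_mul_bf16.onnx", opset_version=17)
```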
Now I am trying to replace the original BF16 input values with float16 or float32. This does not produce an error, but it seems to have some impact on the original model. Do you have any good suggestions for handling this conversion loss?
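A sketch of that float32 workaround, with a quick way to measure the drift it introduces; the model here is a placeholder stand-in, since the real model is not shown in the thread:

```python
# Cast a copy of the bf16 model (and its inputs) to float32 before export, so
# the ONNX graph stays in a single, widely supported dtype, then compare outputs.
import copy
import torch

model_bf16 = torch.nn.Linear(8, 8).to(torch.bfloat16).eval()  # placeholder model
example_bf16 = torch.randn(1, 8, dtype=torch.bfloat16)

# Module.to() casts in place, so deep-copy first to keep the bf16 original around.
model_fp32 = copy.deepcopy(model_bf16).to(torch.float32)
example_fp32 = example_bf16.to(torch.float32)

with torch.no_grad():
    drift = (model_fp32(example_fp32) - model_bf16(example_bf16).float()).abs().max()
print("max abs difference after the fp32 cast:", drift.item())

# Export the float32 copy; a single dtype avoids the mixed bf16/fp32 kernels.
torch.onnx.export(model_fp32, (example_fp32,), "model_fp32.onnx", opset_version=17)
```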
I am trying to convert my model to run with ONNX Runtime, and the model itself has been int8-quantized. At runtime the following error occurred:
NOT_IMPLEMENTED : Could not find an implementation for Add(14) node with name '/self_attn/q_proj/Add'
During debugging, I found that after quantization the linear layer became a QuantLinear layer, where qweight is int32 and bias is float16. Perhaps the error is caused by the mismatch between these two data types.
The model passed the check when loading, as shown in the following code; it did not report any error:
onnx.checker.check_model(self.model_path)
Does ONNX not support converting PyTorch's quantized models? If it is supported, what do I need to do?
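One commonly suggested route (an assumption, not something confirmed in this thread) is to export the model to ONNX in float precision first and then quantize the ONNX file with ONNX Runtime's own quantization tooling, so every int8 op has a matching ORT kernel:

```python
# Minimal sketch (not the author's pipeline): dynamically quantize an
# already-exported float ONNX model with ONNX Runtime's tooling.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "model_fp32.onnx",            # placeholder path to a float ONNX export
    "model_int8.onnx",            # output path for the int8-quantized model
    weight_type=QuantType.QInt8,  # quantize weights to signed int8
)
```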