When converting an int8-quantized model to ONNX, an error occurs at runtime #23879
Comments
If you view the model in Netron and search for that Add node, what data types do its inputs/outputs have? What version of ONNX Runtime are you using?
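The same check can also be done programmatically with the onnx Python API; a minimal sketch, where `model.onnx` is a placeholder path and the node name is taken from the error message below:

```python
# Minimal sketch (hypothetical path): print the element types of the tensors
# flowing into and out of a given node, as an alternative to Netron.
import onnx

m = onnx.shape_inference.infer_shapes(onnx.load("model.onnx"))  # "model.onnx" is a placeholder

# Map every known tensor name to its declared element type.
dtypes = {}
for vi in list(m.graph.input) + list(m.graph.output) + list(m.graph.value_info):
    dtypes[vi.name] = onnx.TensorProto.DataType.Name(vi.type.tensor_type.elem_type)
for init in m.graph.initializer:
    dtypes[init.name] = onnx.TensorProto.DataType.Name(init.data_type)

for node in m.graph.node:
    if node.name == "/self_attn/q_proj/Add":  # node name from the error message
        print("inputs :", [(t, dtypes.get(t, "unknown")) for t in node.input])
        print("outputs:", [(t, dtypes.get(t, "unknown")) for t in node.output])
```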
This is my onnx version: `C:\Users\30585>pip list | findstr onnx`. Its input was float16 and its output was float16, but it has some int32 values.
Not following where the int32 value comes into it. Are both inputs to the Add float16?
Okay, maybe we can set this issue aside for now, because I found a similar problem. In PyTorch I convert BFLOAT16 to float32, perform a Pow calculation, and then convert the result back to BFLOAT16 to multiply it with another BFLOAT16 tensor. Similar errors occur during this process. My analysis suggests that if my model mixes two different data types, for example casting a value to another type partway through the computation and then continuing the calculation, an error will occur. Do you have any good suggestions for solving this error? This is my code:
In this code, if I try to modify the last line of
In this process, since my model's baseline dtype is BF16 but there is a Pow calculation in between, I considered exporting my model directly without the type conversion, but that results in an error.
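A hypothetical sketch of the pattern being described (the original code block did not survive in this thread; the module and tensor names below are illustrative only):

```python
# Hypothetical repro of the pattern described above: a bfloat16 model whose
# forward pass upcasts to float32 for Pow and returns to bfloat16 for a multiply.
import torch

class PowThenMul(torch.nn.Module):
    def forward(self, x, scale):
        y = x.to(torch.float32).pow(2.0)     # Pow computed in float32
        return y.to(torch.bfloat16) * scale  # back to bf16 for the multiply

x = torch.randn(4, 8, dtype=torch.bfloat16)
scale = torch.randn(4, 8, dtype=torch.bfloat16)

# Exporting the bf16 graph directly; depending on the opset and the ONNX Runtime
# build, this export (or running the result in ORT) is where the error shows up.
torch.onnx.export(PowThenMul(), (x, scale), "pow_mul_bf16.onnx", opset_version=17)
```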
Now I am trying to replace the original BF16 input values with float16 or float32. This does not produce an error, but it seems to have some impact on the original model. Do you have any good suggestions for handling this conversion loss?
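A sketch of that float32 workaround, with a quick way to measure the drift it introduces; the model here is a placeholder stand-in, since the real model is not shown in the thread:

```python
# Cast a copy of the bf16 model (and its inputs) to float32 before export, so
# the ONNX graph stays in a single, widely supported dtype, then compare outputs.
import copy
import torch

model_bf16 = torch.nn.Linear(8, 8).to(torch.bfloat16).eval()  # placeholder model
example_bf16 = torch.randn(1, 8, dtype=torch.bfloat16)

# Module.to() casts in place, so deep-copy first to keep the bf16 original around.
model_fp32 = copy.deepcopy(model_bf16).to(torch.float32)
example_fp32 = example_bf16.to(torch.float32)

with torch.no_grad():
    drift = (model_fp32(example_fp32) - model_bf16(example_bf16).float()).abs().max()
print("max abs difference after the fp32 cast:", drift.item())

# Export the float32 copy; a single dtype avoids the mixed bf16/fp32 kernels.
torch.onnx.export(model_fp32, (example_fp32,), "model_fp32.onnx", opset_version=17)
```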
I am trying to convert my model to run with ONNX Runtime, and the model itself has been int8-quantized. At runtime the following error occurred:
NOT_IMPLEMENTED : Could not find an implementation for Add(14) node with name '/self_attn/q_proj/Add'
During debugging, I found that after quantization the linear layer became a QuantLinear layer, where qweight is int32 and bias is float16. Perhaps the error is caused by the mismatch between these two data types.
The model passed the check when loading, as shown in the following code; it did not report any error:
onnx.checker.check_model(self.model_path)
Does ONNX not support converting PyTorch's quantized models? If it is supported, what do I need to do?
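One commonly suggested route (an assumption, not something confirmed in this thread) is to export the model to ONNX in float precision first and then quantize the ONNX file with ONNX Runtime's own quantization tooling, so every int8 op has a matching ORT kernel:

```python
# Minimal sketch (not the author's pipeline): dynamically quantize an
# already-exported float ONNX model with ONNX Runtime's tooling.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "model_fp32.onnx",            # placeholder path to a float ONNX export
    "model_int8.onnx",            # output path for the int8-quantized model
    weight_type=QuantType.QInt8,  # quantize weights to signed int8
)
```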