Cannot run some network with spike #10
Comments
Hm I've never seen that "Error message: Protobuf parsing failed" before. That syscall seems to correspond to I cloned a fresh version of the repo (on centos7) and couldn't reproduce the issue, which is going to make this a bit harder to debug. (Just as a quick sanity-check, can you run The only significant difference I can see at the moment is that your version of Can you try building
Hopefully this works; if not, we can do some more digging (the fact that protobuf parsing fails seems to indicate that it's a pretty fundamental issue unrelated to the gemmini extension). As for the calibration script, it's written to assume that the Also do note that the Also last point for the mxnet-derived models is that I've found them to be very finicky and sensitive to quantization. You might want to get the non-quantized (i.e. floating point) model running first (for which you'll likely need to write your own runner), at which point you can start playing around with quantization. The accuracy for ResNet models is also a bit poor at the moment because we're limited to power-of-2 scale factors. I've also never really tried the quantization/calibration scripts with non-ImageNet models (although they should presumably work since they were modified from Microsoft's upstream implementation), so I'd be interested to hear how it turns out! |
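To make the power-of-2 scale-factor limitation concrete, here is a minimal sketch of what rounding a floating-point quantization scale to the nearest power of two can look like. This is not the repository's code: the function names `nearest_power_of_two` and `relative_error` are hypothetical, and rounding in log2 space is an assumption about how "nearest" is defined.

```python
import math


def nearest_power_of_two(scale: float) -> float:
    """Round a positive scale factor to the nearest power of two.

    "Nearest" is measured in log2 space (an assumption for this sketch).
    """
    if not math.isfinite(scale) or scale <= 0:
        raise ValueError("scale must be a positive finite number")
    exponent = round(math.log2(scale))
    return 2.0 ** exponent


def relative_error(scale: float) -> float:
    """Relative error introduced by snapping `scale` to a power of two."""
    approx = nearest_power_of_two(scale)
    return abs(approx - scale) / scale


if __name__ == "__main__":
    for s in (0.03, 0.1, 3.0):
        print(s, nearest_power_of_two(s), relative_error(s))
```

The error introduced per scale factor can be sizeable, which is one plausible reason a model that is sensitive to quantization loses accuracy under this constraint.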
Oh another thing you can try is to run it with qemu. If you download qemu and enable riscv-user space emulation via |
Thanks a lot. The md5 value did not match, so I re-downloaded the onnx file through a proxy, and now it works.
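For anyone hitting the same corrupted-download problem, the md5 check can be scripted. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the expected checksum has to come from wherever the model is published.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's md5 with an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch here suggests re-downloading the file before debugging anything else.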
Seems like this is relevant; please read through it (and any associated linked issues therein) and try the suggested fix? |
I have tried the fix, and it gives me another error. It seems that the problem is caused by the network too, because I also tried googlenet from the onnx model zoo and it succeeded. So maybe I will build a new network, convert it to onnx format, and then quantize it. If I do so, is there anything I should pay attention to so that I can avoid strange problems like the ones that happened during the arcface conversion? Thank you very much. |
What error did it give? I do recall that mxnet-based models had some batch-normalization version mismatch thing after quantization because they use some attribute that was removed in opsets newer than opset version 7 (and quantized operators require opset 10 or above). If that was the case you could try using this tool to convert it to a newer opset before running the quantization, but as you suggested it's probably a better idea to just export a new model from PyTorch with the latest opset.
I think just making sure to export the latest opset version (anything >= version 10 should be fine) is sufficient. Let me know if you run into any issues though. |
This is the error. It does not seem to be caused by the opset version, because the original network is opset version 8 and has the same error as the opset version 9 one, which is converted by
Hm that's very weird. Looks like it's a bug in the quantization script then. I can reproduce this on my end so I'll take a look and see if I can figure out what's going on. If you'd also like to try debugging this, maybe you could surround
with a try-except and set a pdb breakpoint in the except. That way you can print out the parameters and take a look. It seems weird that |
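The try-except plus pdb suggestion can be sketched generically as below. The wrapper name `call_with_postmortem` and its `debugger` parameter are hypothetical (they are not part of the calibration script); the only real API used is the standard library's `pdb.post_mortem`, which opens the debugger on a given traceback.

```python
import pdb
import sys


def call_with_postmortem(fn, *args, debugger=pdb.post_mortem, **kwargs):
    """Call fn(*args, **kwargs); on failure, print the inputs and open a debugger.

    `debugger` receives the traceback object. It defaults to pdb.post_mortem
    and can be swapped out, e.g. in non-interactive runs.
    """
    try:
        return fn(*args, **kwargs)
    except Exception:
        name = getattr(fn, "__name__", repr(fn))
        # Show the parameters that triggered the failure before debugging.
        print(f"{name} failed with args={args!r} kwargs={kwargs!r}", file=sys.stderr)
        debugger(sys.exc_info()[2])
        raise
```

Wrapping the failing call this way drops you into pdb at the frame that raised, where the offending parameters can be printed and inspected.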
Ok fixed it! It's a 1 line change :) The issue was that the calibration script was returning a non-numpy type in the edge-case where Please keep me posted on your results. I'd be interested to see how well the quantized model performs in terms of accuracy. There's a good chance the accuracy might be off at first due to several reasons
Sorry for the late reply. The network can now be quantized, but I have not yet managed to run it successfully. There are some errors, which seem to be caused by the neural network.
And I tried to use the |
That's an error that shouldn't happen since we make sure to round to the nearest power. Can you add a Edit: Ok I can reproduce this as well. I'll investigate and see what's happening. Also this is unrelated to the error you got, but it seems that the network takes as input a As for the Finally, with regard to
Maybe it's better to export a clean model from pytorch? It seems the original model might be broken as described in the issue you linked. Alternatively maybe try upgrading the opset version using |
Ok found the issue. It's again an issue with the calibration/quantization script and in the same line as before. In this case Fixed in commit 6475932 (and a further correction to that fix in c057799) After this I can successfully run the model using the runner script (I have not tried post-processing the output so I don't know if it's accurate though). I should probably also update the assertions in the cpp file since |
Good, now I can run the quantized network. But I cannot run the original network to get a comparison because of the |
Describe the bug
When I run the onnx models from https://github.com/pranav-prakash/onnxruntime-riscv/releases/tag/v0.01, I get bad syscall #131!
I have tried googlenet_quantized.onnx, mobilenet_quantized_optimized.onnx and resnet50_quantized.onnx; only the resnet runs normally.
System information
To Reproduce
from
https://github.com/pranav-prakash/onnxruntime-riscv/releases/tag/v0.01
spike --extension=gemmini pk ort_test -m googlenet_quantized.onnx -i images/cat.jpg -p caffe2 -x 1 -O 0
Here is the result
Expected behavior
Get some outputs.
Additional context
I also have some other questions. I want a network to do face verification, and I found https://github.com/onnx/models/tree/master/vision/body_analysis/arcface. But when I tried to run the onnx file directly, it gave me the same bad syscall #131. I also tried to quantize this network by running python3 calibrate.py --model_path arcfaceresnet100-8.onnx --dataset_path ./ --output_model_path arcfaceresnet100-8_quantized.onnx --static=True --data_preprocess=mxnet --mode=int8. It also failed. Here is the result
Why can't I run some networks, and why do I get an error when I try to quantize a network? How can I fix these problems? Thanks for helping.