I am using AutoModel.from_pretrained("Xenova/yolos-tiny") to load the Yolos model for object detection. Does transformers.js load the model_quantized.onnx by default? Would I be able to load model.onnx?
A related question: Is there a way to check which model is loaded once the model is loaded?
Right, by default, Transformers.js uses the 8-bit quantized model (model_quantized.onnx). With Transformers.js v2, you can specify { quantized: false } to use the unquantized (fp32) model.
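For example, a minimal sketch assuming the v2 `@xenova/transformers` package and the Xenova/yolos-tiny checkpoint from the question:

```js
import { AutoModel } from '@xenova/transformers';

// Default: downloads and runs the 8-bit quantized weights (model_quantized.onnx)
const quantized = await AutoModel.from_pretrained('Xenova/yolos-tiny');

// Pass { quantized: false } to load the full-precision weights (model.onnx) instead
const fp32 = await AutoModel.from_pretrained('Xenova/yolos-tiny', { quantized: false });
```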
I was trying to analyze the model (in this case, https://huggingface.co/Xenova/detr-resnet-50) using the Python onnx module. How do I verify the quantization type? The onnx module does not report anything for it.
There are three models there: model.onnx (fp32), model_fp16.onnx (fp16), and model_quantized.onnx (int8). You can open these models in Netron and look at the details.
For model_fp16.onnx, there is a Cast op at the very beginning that converts the input data from fp32 to fp16, and another Cast at the end that converts the output back from fp16 to fp32.
For model_quantized.onnx, you will see DynamicQuantizeLinear ops performing the quantization, and the MatMul ops replaced with MatMulInteger.
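If you prefer to check programmatically rather than in Netron, a rough sketch with the python onnx module is to count the op types in the graph and look for those quantization-specific operators (the local file name below is an assumption; point it at whichever .onnx file you downloaded from the repo):

```python
from collections import Counter

import onnx

# Load the exported graph and tally its operator types.
model = onnx.load("model_quantized.onnx")
ops = Counter(node.op_type for node in model.graph.node)

print(ops.most_common(10))

if "DynamicQuantizeLinear" in ops or "MatMulInteger" in ops:
    print("int8 dynamically-quantized graph")
elif "Cast" in ops:
    print("contains Cast ops - inspect them; fp16 models cast inputs/outputs")
else:
    print("no quantization-specific ops found - likely plain fp32")
```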