AutoModel.from_pretrained - Which model is loaded #849

Open
MayuraRam opened this issue Jul 16, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@MayuraRam

Question

I am using AutoModel.from_pretrained("Xenova/yolos-tiny") to load the YOLOS model for object detection. Does Transformers.js load model_quantized.onnx by default? Would I be able to load model.onnx instead?

A related question: once a model has been loaded, is there a way to check which model file was used?

MayuraRam added the question (Further information is requested) label on Jul 16, 2024
@xenova
Owner

xenova commented Jul 17, 2024

Right, by default, Transformers.js uses the 8-bit quantized model (model_quantized.onnx). With Transformers.js v2, you can specify { quantized: false } to use the unquantized (fp32) model:

AutoModel.from_pretrained("Xenova/yolos-tiny", { quantized: false })

In Transformers.js v3, this option is replaced by dtype:

AutoModel.from_pretrained("Xenova/yolos-tiny", { dtype: 'q8' }) // or 'fp32' or 'fp16' or ...

@MayuraRam
Author

MayuraRam commented Aug 1, 2024

I was trying to analyze the model (in this case, https://huggingface.co/Xenova/detr-resnet-50) using the Python onnx module. How do I verify the quantization type? The onnx module does not report anything about quantization.

@gyagp

gyagp commented Aug 9, 2024

There are three models in that repository: model.onnx (fp32), model_fp16.onnx (fp16), and model_quantized.onnx (int8). You can open them with Netron and look into the details.
For model_fp16.onnx, there is a Cast op at the very beginning that converts the input from fp32 to fp16, and another Cast at the end that converts the output back from fp16 to fp32.
For model_quantized.onnx, you will see the DynamicQuantizeLinear op performing quantization, and MatMul replaced with MatMulInteger.
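To check this programmatically with the Python onnx module rather than Netron, a minimal sketch along these lines infers the variant from the op types and weight dtypes in the graph. The file path is a placeholder for a locally downloaded copy of one of the three ONNX files:

import onnx
from collections import Counter
from onnx import TensorProto

# Placeholder path: point it at a downloaded model.onnx, model_fp16.onnx, or model_quantized.onnx.
model = onnx.load("model_quantized.onnx")

# Count the op types used in the graph.
ops = Counter(node.op_type for node in model.graph.node)

# Element types of the stored weights (initializers).
weight_dtypes = {init.data_type for init in model.graph.initializer}

if "DynamicQuantizeLinear" in ops or "MatMulInteger" in ops:
    print("int8 dynamic quantization (model_quantized.onnx)")
elif TensorProto.FLOAT16 in weight_dtypes:
    print("fp16 weights (model_fp16.onnx)")
else:
    print("fp32 (model.onnx)")

print("Cast ops:", ops.get("Cast", 0))

The Cast count reflects the point above: the fp16 model converts inputs and outputs at the graph boundaries.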
