Hi @mobicham, I don't think we put much effort into supporting float16 inference for models anymore. In general, bfloat16 is the new standard for model training, and float16 is increasingly a legacy format.
If you (or anyone reading this) can find a simple fix that will enable float16 inference for Gemma3, we can probably accept a PR for that, but we won't be able to prioritize debugging it and adding the feature ourselves.
@Rocketknight1 that is not exactly true. The efficient low-bit kernels used to run quantized models faster mainly support fp16, not bf16 (because of some limitations related to atomic addition with bf16); this includes gemlite and Marlin in vLLM.
I actually found the issue and will do a PR shortly.
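To make the constraint concrete, here is a hedged sketch of the kind of fp16-only quantized path being described; the checkpoint name is a placeholder and not something mentioned in this thread:

```python
from vllm import LLM, SamplingParams

# Placeholder GPTQ checkpoint (hypothetical id, for illustration only).
# The Marlin kernels behind vLLM's GPTQ path target float16 activations,
# which is why a bf16-only model is a problem for this use case.
llm = LLM(
    model="your-org/gemma-3-4b-it-GPTQ",  # hypothetical quantized model id
    dtype="float16",                       # Marlin-style kernels expect fp16, not bf16
    quantization="gptq_marlin",
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```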
System Info
transformers version: 4.50.0.dev0
Who can help?
Gemma 3 works fine with bfloat16 but the output is empty with float16.
@amyeroberts, @qubvel, @ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
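A minimal sketch of what triggers this; the checkpoint, prompt, and generation settings below are my own assumptions, not taken from the original report:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint chosen for illustration; any Gemma 3 model should behave the same way.
model_id = "google/gemma-3-1b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # generation works with torch.bfloat16, output is empty with float16
    device_map="cuda",
)

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)

# With float16 weights the newly generated portion comes back empty.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```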
Expected behavior
Gemma 3 should work with float16 weights too.