torch_dtype is actually used now? #36567
Comments
Yep, that was caused by one of the PRs to enable different dtypes for multimodal models. @ArthurZucker WDYT? If we revert, we either lose dtypes for multimodals or have to manually pass them.
FWIW I think it's a pretty big change to make silently, with no warnings, no notice in the release notes, etc. Especially because of this comment that was (and still is, actually) in the code: transformers/src/transformers/modeling_utils.py, lines 2795 to 2796 at commit 052e652
Also, it seems to behave differently when using different construction APIs.
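A minimal sketch of that comparison (my own illustration, not from the thread; it assumes a checkpoint whose config sets `torch_dtype: bfloat16`, with the Llama checkpoint below just as an example):

```python
# Sketch: compare the dtype you end up with under two construction APIs.
import transformers

name = "meta-llama/Llama-3.2-1B-Instruct"  # example; its config declares torch_dtype: bfloat16

# from_pretrained: on 4.49.0 this reportedly follows config.torch_dtype,
# so the parameters may come back in bf16 rather than fp32.
model = transformers.AutoModelForCausalLM.from_pretrained(name)
print(next(model.parameters()).dtype)

# from_config: weights are freshly initialized; historically this gives torch's
# default dtype (fp32). Whether it follows config.torch_dtype may differ by
# version, which is the inconsistency being described.
config = transformers.AutoConfig.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_config(config)
print(next(model.parameters()).dtype)
```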
Repeating mosaicml/llm-foundry#1734 here; see huggingface/transformers#36567 for more context. Also turns out that FSDP wrapping in Composer hangs if you mix dtypes on different ranks. Sorry for the type ignores, they came with the transformers upgrade. Not sure why pyright thinks all the special tokens are strings...

```
In [1]: import transformers

In [2]: llamat = transformers.AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')

In [3]: llamat.eos_token_id
Out[3]: 128009

In [4]: type(llamat.eos_token_id)
Out[4]: int

In [5]: transformers.__version__
Out[5]: '4.49.0'
```
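Not from llm-foundry, but this is the kind of sanity check I'd sketch before FSDP wrapping to fail loudly instead of hanging (a hypothetical helper of my own; it assumes `torch.distributed` is already initialized):

```python
# Hypothetical helper (my sketch, not from Composer): raise a readable error
# instead of hanging if ranks disagree on the parameter dtype.
import torch
import torch.distributed as dist

def assert_same_dtype_across_ranks(model: torch.nn.Module) -> None:
    local = str(next(model.parameters()).dtype)
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local)  # collect every rank's dtype string
    if len(set(gathered)) != 1:
        raise RuntimeError(f"Parameter dtypes differ across ranks: {gathered}")
```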
Sorry about that, I completely missed the notification... it is a big breaking change that should not have made it into the release.
As we did not get many reports, we are just going to update the release notes and the code there! Thanks a lot @dakinggg
FWIW, I'd be a bit worried that people just hadn't noticed, because sometimes this will error (e.g. you have different layers of a model initialized differently, and some part of the training stack doesn't like having different dtypes for different params), but if you don't have unit tests to check this and don't hit any error cases, you will just accidentally use bf16 weights instead of fp32 (likely fine if you are just doing inference, not fine if you are training).
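One defensive pattern for the training case (my own sketch, not an official recommendation): request fp32 explicitly instead of relying on the default, then assert on it.

```python
# Sketch: make the training dtype explicit so config.torch_dtype cannot silently win.
import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",  # example checkpoint
    torch_dtype=torch.float32,           # explicit fp32 regardless of the config/checkpoint
)

# Catch any parameters that still came back in a different dtype.
non_fp32 = {n: p.dtype for n, p in model.named_parameters() if p.dtype != torch.float32}
assert not non_fp32, f"Unexpected non-fp32 parameters: {non_fp32}"
```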
System Info
Different transformers versions; see description.
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Previously (v4.46.3, didn't check all versions), `torch_dtype` in the config was ignored, meaning that model weights would get loaded in fp32 by default (correct behavior for training). On the latest transformers version (v4.49.0), it seems it is now used, so the weights get loaded with whatever dtype is in the checkpoint. Was this change intentional? I recall previously seeing somewhere in the code that you weren't going to make the change to actually use torch_dtype until v5, and I didn't see anything in the release notes at a glance, although maybe I missed it.

Expected behavior
Not actually sure, would like to confirm what you expect now.