Closed
Description
System Info
I'm using PyTorch 2.6.0 on Linux.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Consider the following code:
```python
import transformers

model = transformers.AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")
tokenizer = transformers.AutoTokenizer.from_pretrained("google-bert/bert-base-cased")

text = "Hi! My name is Stinky Bob and I'm a [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Find the position of the [MASK] token and decode the top prediction
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs["logits"][0, masked_index].argmax(axis=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(predicted_token)
```
On `v4.49.0-Gemma-3` it produces the following output:

`man`
On `v4.49.0`, `v4.48.0`, `v4.40.0`, and `v4.30.0` it produces the following output:

`friend`
As far as I can tell, it breaks on `v4.49.0-Gemma-3` because checkpoint loading is broken: the weights for `model.cls.predictions.transform.LayerNorm` are not loaded from the checkpoint and are left default-initialized.
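If it helps triage, here is a sketch of how I checked this, using the `output_loading_info=True` flag of `from_pretrained` (which reports missing/unexpected keys) plus the fact that BERT's `_init_weights` sets LayerNorm weights to all ones, so an exactly-all-ones weight suggests the checkpoint value was never applied:

```python
import torch
import transformers

# output_loading_info=True additionally returns a dict with
# "missing_keys", "unexpected_keys", etc. describing what was
# (not) loaded from the checkpoint
model, info = transformers.AutoModelForMaskedLM.from_pretrained(
    "google-bert/bert-base-cased", output_loading_info=True
)
print(info["missing_keys"])

# LayerNorm is default-initialized to all ones; a trained weight
# would not be exactly 1.0 everywhere
w = dict(model.named_parameters())["cls.predictions.transform.LayerNorm.weight"]
print(torch.all(w == 1.0).item())
```

On a working version, `missing_keys` is empty and the all-ones check prints `False`.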
Expected behavior
I expect the BERT weights to be loaded correctly and the output to be consistent with previous versions of transformers.