BERT is broken on v4.49.0-Gemma-3 #36802

Closed
@koute

Description

System Info

I'm using PyTorch 2.6.0 on Linux.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Consider the following code:

import torch
import transformers

model = transformers.AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-cased")
tokenizer = transformers.AutoTokenizer.from_pretrained("google-bert/bert-base-cased")

text = "Hi! My name is Stinky Bob and I'm a [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Find the [MASK] position and decode the highest-scoring token there.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs["logits"][0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(predicted_token)

On v4.49.0-Gemma-3 it produces the following output:

man

On 4.49.0, 4.48.0, 4.40.0, and 4.30.0 it produces the following output:

friend

As far as I can see, it breaks on v4.49.0-Gemma-3 because checkpoint loading is broken: the weights for model.cls.predictions.transform.LayerNorm are not loaded properly and are left at their default initialization.
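A minimal sketch of how to confirm this, assuming the output_loading_info=True flag of from_pretrained (which reports missing and mismatched checkpoint keys) and relying on the fact that a freshly initialized PyTorch LayerNorm has weight 1 and bias 0:

import torch
import transformers

# Ask from_pretrained to report which checkpoint keys it failed to load.
model, loading_info = transformers.AutoModelForMaskedLM.from_pretrained(
    "google-bert/bert-base-cased", output_loading_info=True
)
print("missing keys:", loading_info["missing_keys"])
print("mismatched keys:", loading_info["mismatched_keys"])

# A default-initialized LayerNorm has weight == 1 and bias == 0, so both
# checks printing True would mean the checkpoint weights were dropped.
ln = model.cls.predictions.transform.LayerNorm
print("weight is default:", torch.allclose(ln.weight, torch.ones_like(ln.weight)))
print("bias is default:", torch.allclose(ln.bias, torch.zeros_like(ln.bias)))

If the LayerNorm parameters show up in the missing or mismatched keys, or the allclose checks print True, that would confirm the weights were never loaded from the checkpoint.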

Expected behavior

I expect the BERT weights to be loaded properly and the output to be consistent with previous versions of transformers.
