Weights not initialized correctly when instantiating model with a pretrained backbone

### System Info

- `transformers` version: 4.51.3
- Platform: macOS-14.4.1-arm64-arm-64bit
- Python version: 3.9.19
- Huggingface_hub version: 0.30.2
- Safetensors version: 0.4.3
- Accelerate version: 1.2.1
- Accelerate config: 	not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.7.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>

### Who can help?

@amyeroberts 
@qubvel 

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

I am trying to load a Mask2Former model with a pretrained backbone following [this PR](https://github.com/huggingface/transformers/pull/28214).

However, the backbone weights do not appear to be properly initialized when using `use_pretrained_backbone=True` in the config. Here's a minimal example:
```python
from transformers import (
    SwinForImageClassification,
    Mask2FormerForUniversalSegmentation,
    Mask2FormerConfig,
)

swin_model_name = "microsoft/swin-tiny-patch4-window7-224"

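def params_match(params1, params2):
    # Materialize the generators so zip() cannot silently hide a length mismatch
    params1, params2 = list(params1), list(params2)
    # Tensor.equal() is True only when both shape and values are identical
    return len(params1) == len(params2) and all(
        p1.equal(p2) for p1, p2 in zip(params1, params2)
    )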

# load pretrained swin model
swin_model = SwinForImageClassification.from_pretrained(swin_model_name)

# load Mask2Former with a pretrained swin backbone
config = Mask2FormerConfig(
    backbone=swin_model_name,
    use_pretrained_backbone=True,
)
m2f = Mask2FormerForUniversalSegmentation(config)

# Fails with AssertionError: the backbone parameters don't match the pretrained Swin weights
assert params_match(
    swin_model.base_model.encoder.parameters(),
    m2f.model.pixel_level_module.encoder.encoder.parameters(),
)

```
The Swin parameters in Mask2Former do not match those from the separately loaded Swin model, suggesting the backbone was not properly initialized.
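
To narrow down which weights differ, a name-based comparison can be more informative than a single boolean. The following is a minimal sketch, not part of the original report: `mismatched_params` is a hypothetical helper, and the `nn.Linear` modules are toy stand-ins for the two Swin encoders so that the snippet runs without downloading a checkpoint.

```python
import torch
from torch import nn


def mismatched_params(module_a: nn.Module, module_b: nn.Module):
    """Return names of parameters that are missing on one side or differ in value.

    Hypothetical helper for debugging; not part of transformers.
    """
    params_a = dict(module_a.named_parameters())
    params_b = dict(module_b.named_parameters())
    mismatched = []
    for name in sorted(params_a.keys() | params_b.keys()):
        p_a, p_b = params_a.get(name), params_b.get(name)
        # torch.equal returns False when shapes differ, so no separate shape check is needed
        if p_a is None or p_b is None or not torch.equal(p_a, p_b):
            mismatched.append(name)
    return mismatched


# Toy stand-ins for the two encoders being compared
reference = nn.Linear(4, 2)
same = nn.Linear(4, 2)
same.load_state_dict(reference.state_dict())  # copy of the reference weights
different = nn.Linear(4, 2)  # independently (randomly) initialized

print(mismatched_params(reference, same))       # []
print(mismatched_params(reference, different))  # ['bias', 'weight']
```

In the reproduction above, the same helper could be called as `mismatched_params(swin_model.base_model.encoder, m2f.model.pixel_level_module.encoder.encoder)` to see whether all backbone parameters differ or only some of them.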

However, if I explicitly load the backbone via the `load_backbone` function, the parameters do match:
```python
from transformers.utils.backbone_utils import load_backbone

m2f.model.pixel_level_module.encoder = load_backbone(config)

# Now passes
assert params_match(
    swin_model.base_model.encoder.parameters(),
    m2f.model.pixel_level_module.encoder.encoder.parameters(),
)
```
Could this be caused by `post_init()` being called during the instantiation of Mask2Former and re-initializing the backbone weights, even when a pretrained backbone has already been loaded?
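
To make the suspected mechanism concrete, here is a toy illustration in plain PyTorch. It is only a sketch of the hypothesis, not the actual `transformers` code path: `ToyModel`, `make_backbone`, and the `reinit_all` flag are hypothetical names, and whether Mask2Former behaves this way is exactly what this issue is asking.

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for a model that wraps a pretrained backbone."""

    def __init__(self, pretrained_backbone: nn.Module, reinit_all: bool):
        super().__init__()
        self.backbone = pretrained_backbone  # already holds "pretrained" weights
        self.head = nn.Linear(2, 1)          # newly created, needs initialization
        if reinit_all:
            # Mimics an init pass applied recursively to every submodule,
            # which also overwrites the backbone
            self.apply(self._init_weights)
        else:
            # Initializes only the newly created head
            self.head.apply(self._init_weights)

    @staticmethod
    def _init_weights(module):
        if isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, std=0.02)
            nn.init.zeros_(module.bias)


def make_backbone():
    """Build a backbone whose 'pretrained' weights are all ones, so changes are easy to detect."""
    backbone = nn.Linear(4, 2)
    with torch.no_grad():
        backbone.weight.fill_(1.0)
        backbone.bias.fill_(1.0)
    return backbone


clobbered = ToyModel(make_backbone(), reinit_all=True)
preserved = ToyModel(make_backbone(), reinit_all=False)

backbone_kept_clobbered = torch.equal(clobbered.backbone.weight, torch.ones(2, 4))
backbone_kept_preserved = torch.equal(preserved.backbone.weight, torch.ones(2, 4))
print(backbone_kept_clobbered, backbone_kept_preserved)  # False True
```

If something similar happens inside Mask2Former, it would explain why reassigning the encoder with `load_backbone(config)` after construction restores the matching weights.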

### Expected behavior

The backbone should be initialized with the pretrained weights when `use_pretrained_backbone=True` is specified in the config.


Weights not initialized correctly when instantiating model with a pretrained backbone #38061


Rocketknight1 commented on May 12, 2025

bvantuan commented on Jun 1, 2025

github-actions commented on Jun 25, 2025
