Open
System Info
- transformers version: 4.51.3
- Platform: macOS-14.4.1-arm64-arm-64bit
- Python version: 3.9.19
- Huggingface_hub version: 0.30.2
- Safetensors version: 0.4.3
- Accelerate version: 1.2.1
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.7.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I am trying to load a Mask2Former model with a pretrained backbone following this PR.
However, the backbone weights do not appear to be properly initialized when using use_pretrained_backbone=True in the config. Here's a minimal example:
from transformers import (
    SwinForImageClassification,
    Mask2FormerForUniversalSegmentation,
    Mask2FormerConfig,
)

swin_model_name = "microsoft/swin-tiny-patch4-window7-224"


def params_match(params1, params2):
    return all([(p1 == p2).all() for p1, p2 in zip(params1, params2)])


# load pretrained swin model
swin_model = SwinForImageClassification.from_pretrained(swin_model_name)

# load Mask2Former with a pretrained swin backbone
config = Mask2FormerConfig(
    backbone=swin_model_name,
    use_pretrained_backbone=True,
)
m2f = Mask2FormerForUniversalSegmentation(config)

# AssertionError: parameters don't match
assert params_match(
    swin_model.base_model.encoder.parameters(),
    m2f.model.pixel_level_module.encoder.encoder.parameters(),
)
The Swin parameters in Mask2Former do not match those from the separately loaded Swin model, suggesting the backbone was not properly initialized.
However, if I explicitly load the backbone via the load_backbone function, the parameters do match:
from transformers.utils.backbone_utils import load_backbone

m2f.model.pixel_level_module.encoder = load_backbone(config)

# Now passes
assert params_match(
    swin_model.base_model.encoder.parameters(),
    m2f.model.pixel_level_module.encoder.encoder.parameters(),
)
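A side note on the check itself: params_match pairs parameters by position with zip, so it stops at the shorter iterator and returns True if either side is empty. A stricter, name-based comparison could look like the sketch below. The helper name diff_parameters is hypothetical (it is not part of transformers), and it assumes both modules expose the same parameter names through named_parameters().

```python
import torch
from torch import nn


def diff_parameters(module_a: nn.Module, module_b: nn.Module) -> list:
    """Return names of parameters that are missing on one side or differ in value.

    Hypothetical helper for debugging; not part of transformers.
    """
    params_a = dict(module_a.named_parameters())
    params_b = dict(module_b.named_parameters())
    mismatched = []
    # Walk the union of names so a missing parameter is reported, not skipped.
    for name in sorted(params_a.keys() | params_b.keys()):
        if name not in params_a or name not in params_b:
            mismatched.append(name)
        elif not torch.equal(params_a[name], params_b[name]):
            # torch.equal is False for differing shapes as well as differing values.
            mismatched.append(name)
    return mismatched
```

Applied to the two encoders above (swin_model.base_model.encoder and m2f.model.pixel_level_module.encoder.encoder), an empty list would mean every parameter matches by name and value, and a non-empty list would show which layers differ.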
Could this be caused by the post_init() method being called during the instantiation of Mask2Former, even if a pretrained backbone is being loaded?
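To make the suspected mechanism concrete, here is a toy sketch in plain PyTorch. It does not reproduce the transformers internals: ToyModel, make_pretrained_backbone, and init_weights are hypothetical stand-ins, and whether Mask2Former actually behaves this way is exactly the open question. It only shows how a generic weight-init pass applied to the whole model could overwrite weights that a submodule already loaded.

```python
import torch
from torch import nn


def make_pretrained_backbone() -> nn.Linear:
    # Stand-in for a backbone with loaded weights: every value is set to 1.0.
    backbone = nn.Linear(4, 4)
    with torch.no_grad():
        backbone.weight.fill_(1.0)
        backbone.bias.fill_(1.0)
    return backbone


def init_weights(module: nn.Module) -> None:
    # Stand-in for a generic weight-init hook applied to each submodule.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        nn.init.zeros_(module.bias)


class ToyModel(nn.Module):
    def __init__(self, reinitialize: bool) -> None:
        super().__init__()
        self.backbone = make_pretrained_backbone()
        self.head = nn.Linear(4, 2)
        if reinitialize:
            # Applies the hook to every submodule, including the backbone.
            self.apply(init_weights)


reference = make_pretrained_backbone()
kept = ToyModel(reinitialize=False)
overwritten = ToyModel(reinitialize=True)

print(torch.equal(kept.backbone.weight, reference.weight))         # True
print(torch.equal(overwritten.backbone.weight, reference.weight))  # False
```

If something similar happens here, the backbone would be loaded correctly and then re-initialized afterwards, which would be consistent with load_backbone producing matching weights when called after construction.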
Expected behavior
As mentioned before, the backbone should be correctly initialized when specifying use_pretrained_backbone=True in the config.
Activity
Rocketknight1 commented on May 12, 2025
cc @qubvel and maybe @NielsRogge?
bvantuan commented on Jun 1, 2025
Hi @Rocketknight1 and @matteot11, I opened a PR to fix this issue. I’d love to hear your feedback.
github-actions commented on Jun 25, 2025
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.