
model.config.to_diff_dict() delivers different result to model.save_pretrained() #35426

Closed
@umarbutler

Description


System Info

  • transformers version: 4.48.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.12.5
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 4090

Who can help?

@ArthurZuc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I have a use case that requires model weights to always be encrypted in local storage and only ever be decrypted in memory. As a result, using model.from_pretrained(dir) is not an option.

Instead, my workaround has been to do:

import msgspec
from pyfakefs.fake_filesystem_unittest import Patcher as ffspatcher
from transformers import AutoConfig, AutoModelForSequenceClassification, PreTrainedModel

weights = {...} # Deserialized to `dict` from an encrypted file elsewhere.
config = {...} # Deserialized to `dict` from an encrypted file elsewhere.

json_encoder = msgspec.json.encode

with ffspatcher() as patcher:
    # Serve the decrypted config from an in-memory fake file system so that
    # AutoConfig.from_pretrained() never touches the real disk.
    fakepath = 'FAKE_FILE_SYSTEM://config.json'
    patcher.fs.create_file(fakepath, contents = json_encoder(config))
    config = AutoConfig.from_pretrained(fakepath)

model: PreTrainedModel = AutoModelForSequenceClassification.from_config(config)
model.load_state_dict(weights)

The problem I've noticed, however, is that when I serialize my config like so:

config = model.config.to_diff_dict()

The resulting config includes the key _attn_implementation_autoset set to True, whereas the config written by save_pretrained() does not include that key. As a result, when I load the serialized config with AutoConfig.from_pretrained(), the model does not use its default attention implementation, SDPA, which effectively gives me a different model with different logits.
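For reference, a minimal sketch of how to observe the discrepancy (the temporary directory here is used only to inspect what model.save_pretrained() writes and is not part of my actual workflow):

import json
import tempfile

# Compare the keys returned by to_diff_dict() with the keys that
# model.save_pretrained() actually writes to config.json.
diff_config = model.config.to_diff_dict()

with tempfile.TemporaryDirectory() as tmp_dir:
    model.save_pretrained(tmp_dir)

    with open(f'{tmp_dir}/config.json') as f:
        saved_config = json.load(f)

# The difference should contain _attn_implementation_autoset, which only
# to_diff_dict() emits.
print(set(diff_config) - set(saved_config))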

My current hotfix is to simply delete the _attn_implementation_autoset key from all of my configs, as sketched below. But is it really necessary for to_diff_dict() to add that key when save_pretrained() does not?
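Concretely, the hotfix looks roughly like this (encrypt_and_store() stands in for my own encryption step and is not a real API):

serializable_config = model.config.to_diff_dict()

# Drop the key that model.save_pretrained() would not have written.
serializable_config.pop('_attn_implementation_autoset', None)

encrypt_and_store(json_encoder(serializable_config))  # hypothetical encryption step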

Expected behavior

I get the same model, reproducibly, whether I serialize the config with to_diff_dict() or with save_pretrained().
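As a rough sketch of that expectation, reusing the pyfakefs pattern from above: a config round-tripped through to_diff_dict() should produce a model with the same attention implementation (SDPA) as the original (assuming the private _attn_implementation attribute reflects the implementation actually in use):

with ffspatcher() as patcher:
    fakepath = 'FAKE_FILE_SYSTEM://config.json'
    patcher.fs.create_file(fakepath, contents = json_encoder(model.config.to_diff_dict()))
    reloaded_config = AutoConfig.from_pretrained(fakepath)

reloaded_model = AutoModelForSequenceClassification.from_config(reloaded_config)

# Both models should end up on the same attention implementation.
assert reloaded_model.config._attn_implementation == model.config._attn_implementation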
