### System Info

- `transformers` version: 4.48.0.dev0
- Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.12.5
- Huggingface_hub version: 0.25.1
- Safetensors version: 0.4.5
- Accelerate version: 0.34.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA GeForce RTX 4090
### Who can help?

@ArthurZucker
### Information

- The official example scripts
- My own modified scripts

### Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)

### Reproduction
I have a use case that requires model weights to always be encrypted in local storage and only ever be decrypted in memory. As a result, using `model.from_pretrained(dir)` is not an option.

Instead, my workaround has been the following:
```python
import msgspec
from pyfakefs.fake_filesystem_unittest import Patcher as ffspatcher
from transformers import AutoConfig, AutoModelForSequenceClassification, PreTrainedModel

weights = {...}  # Deserialized to `dict` from an encrypted file elsewhere.
config = {...}   # Deserialized to `dict` from an encrypted file elsewhere.

json_encoder = msgspec.json.encode

with ffspatcher() as patcher:
    # Expose the config through an in-memory fake filesystem so nothing is written to disk.
    fakepath = 'FAKE_FILE_SYSTEM://config.json'
    patcher.fs.create_file(fakepath, contents=json_encoder(config))
    config = AutoConfig.from_pretrained(fakepath)
    model: PreTrainedModel = AutoModelForSequenceClassification.from_config(config)
    model.load_state_dict(weights)
```
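For context only, the two `{...}` placeholders above stand for something along these lines. This is a hypothetical sketch, not my actual code: the file names, the `load_encrypted` helper, and the use of Fernet/safetensors are illustrative assumptions; the real key handling lives elsewhere.

```python
import msgspec
from cryptography.fernet import Fernet
from safetensors.torch import load as load_safetensors

def load_encrypted(path: str, key: bytes) -> bytes:
    """Hypothetical helper: decrypt a file entirely in memory so plaintext never touches disk."""
    with open(path, "rb") as f:
        return Fernet(key).decrypt(f.read())

key = b"..."  # fetched from a secrets manager, not shown here
weights = load_safetensors(load_encrypted("model.safetensors.enc", key))  # dict of tensors
config = msgspec.json.decode(load_encrypted("config.json.enc", key))      # plain dict
```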
The problem I've noticed, however, is that when I serialize my config like so:

```python
config = model.config.to_diff_dict()
```

the resulting dict includes the key `_attn_implementation_autoset` set to `True`, whereas the model's actual config does not include that key. As a result, when I load this config with `AutoConfig.from_pretrained()`, the model ends up not using its default attention implementation, SDPA, which effectively gives me a different model with different logits.
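For reference, here is a minimal sketch of how the two serialization paths compare (the checkpoint name is only an example; the behavior does not seem specific to it):

```python
import json
import tempfile

from transformers import AutoConfig, AutoModelForSequenceClassification

# Example checkpoint only.
config = AutoConfig.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_config(config)

# In my environment this prints True: the flag shows up in the diff dict.
print("_attn_implementation_autoset" in model.config.to_diff_dict())

with tempfile.TemporaryDirectory() as tmp:
    model.config.save_pretrained(tmp)
    with open(f"{tmp}/config.json") as f:
        saved = json.load(f)
    # ...but it is absent from the config.json written by save_pretrained().
    print("_attn_implementation_autoset" in saved)
```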
My current hotfix is to simply delete the `_attn_implementation_autoset` key from all of my configs. But is it really necessary for `to_diff_dict()` to add that key when it is not added by `save_pretrained()`?
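Concretely, the hotfix is just stripping the key before serializing:

```python
config_dict = model.config.to_diff_dict()
# Remove the flag so that reloading with AutoConfig.from_pretrained() keeps the default SDPA attention.
config_dict.pop("_attn_implementation_autoset", None)
```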
### Expected behavior
I get the same model, reproducibly, whether I save the config with `to_diff_dict()` or with `save_pretrained()`.