
lm_head.weight missing from convert_mistral_weights_to_hf.STATE_DICT_MAPPING #36908

Open
jamesbraza opened this issue Mar 23, 2025 · 2 comments

@jamesbraza (Contributor)

System Info

  • transformers version: 4.49.0
  • Platform: Linux-5.15.0-112-generic-x86_64-with-glibc2.35
  • Python version: 3.12.9
  • Huggingface_hub version: 0.29.2
  • Safetensors version: 0.5.3
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: 0.15.3
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

@younesbelkada @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am working with https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501, and I seem to be unable to pass it into convert_mistral_weights_to_hf.py (details below).

Here's the model.safetensors.index.json:

In [1]: import json

In [2]: with open("model.safetensors.index.json") as f:
   ...:     index = json.load(f)

In [3]: index
Out[3]:
{'metadata': {'total_size': 47144806400},
 'weight_map': {'lm_head.weight': 'model-00010-of-00010.safetensors',
  'model.embed_tokens.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.input_layernorm.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.mlp.down_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.mlp.gate_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.mlp.up_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.post_attention_layernorm.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.k_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.o_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.q_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.v_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.1.input_layernorm.weight': 'model-00001-of-00010.safetensors',
  'model.layers.1.mlp.down_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.1.mlp.gate_proj.weight': 'model-00001-of-00010.safetensors',
  ...,
  'model.norm.weight': 'model-00010-of-00010.safetensors'}}
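
For reference, a minimal way to peek at the tensor names directly from one shard, without loading any weights (assuming the safetensors package is installed; the shard filename is taken from the index above):

from safetensors import safe_open

# List the keys stored in the last shard; per the index above it contains
# 'lm_head.weight' and 'model.norm.weight', i.e. Transformers-style names.
with safe_open("model-00010-of-00010.safetensors", framework="pt", device="cpu") as f:
    print(list(f.keys()))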

Then running:

> python /path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py /path/to/checkpoint .
Traceback (most recent call last):
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 282, in <module>
    main()
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 277, in main
    convert_and_write_model(args.input_dir, args.output_dir, args.max_position_embeddings, args.modules_are_split)
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 205, in convert_and_write_model
    new_dict = convert_state_dict(original_state_dict, config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 87, in convert_state_dict
    new_key = map_old_key_to_new(old_key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 59, in map_old_key_to_new
    raise ValueError(f"Key: {old_key} could not be mapped (check the mapping).")
ValueError: Key: lm_head.weight could not be mapped (check the mapping).

Expected behavior

lm_head.weight should be handled by convert_mistral_weights_to_hf.py, i.e. covered by its STATE_DICT_MAPPING.

@jamesbraza jamesbraza added the bug label Mar 23, 2025
@Cyrilvallez (Member)

Well, you are trying to use the converter on weights that are already converted, so this is expected to fail 😁
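
Since that repo already ships the sharded Transformers-format weights, it can be loaded directly; a minimal sketch (the path below is a placeholder for your local checkpoint directory):

from transformers import AutoModelForCausalLM, AutoTokenizer

# The model-XXXXX-of-XXXXX.safetensors shards are already in the Transformers
# layout, so no conversion step is needed before loading.
model = AutoModelForCausalLM.from_pretrained("/path/to/checkpoint", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("/path/to/checkpoint")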

@jamesbraza (Contributor, Author)

Haha I see, and thanks for the response. Perhaps we should update that ValueError's wording to mention this possibility.

To be honest, I don't quite understand yet; I thought the purpose of this script was to convert the many individual model safetensors shards into a consolidated.safetensors. Am I misunderstanding? Here is what my checkpoint directory contains:

config.json
generation_config.json
model-00001-of-00010.safetensors
model-00002-of-00010.safetensors
model-00003-of-00010.safetensors
model-00004-of-00010.safetensors
model-00005-of-00010.safetensors
model-00006-of-00010.safetensors
model-00007-of-00010.safetensors
model-00008-of-00010.safetensors
model-00009-of-00010.safetensors
model-00010-of-00010.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
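
If I understand correctly, the converter instead expects the raw Mistral release layout (params.json plus consolidated safetensors weights) rather than the sharded Transformers layout above. A rough heuristic for telling the two apart (my assumption, not taken from the script):

from pathlib import Path

def looks_like_raw_mistral(checkpoint_dir: str) -> bool:
    # Raw Mistral releases ship params.json and consolidated*.safetensors,
    # whereas the Transformers layout ships config.json plus sharded
    # model-XXXXX-of-XXXXX.safetensors files and model.safetensors.index.json.
    d = Path(checkpoint_dir)
    return (d / "params.json").exists() and any(d.glob("consolidated*.safetensors"))

print(looks_like_raw_mistral("."))  # False for the directory listed above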
