
lm_head.weight missing from convert_mistral_weights_to_hf.STATE_DICT_MAPPING #36908

Open
jamesbraza opened this issue Mar 23, 2025 · 2 comments

@jamesbraza (Contributor)

System Info

  • transformers version: 4.49.0
  • Platform: Linux-5.15.0-112-generic-x86_64-with-glibc2.35
  • Python version: 3.12.9
  • Huggingface_hub version: 0.29.2
  • Safetensors version: 0.5.3
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: 0.15.3
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

@younesbelkada @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am working with https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501, and I seem to be unable to pass it into convert_mistral_weights_to_hf.py (details below).

Here's the model.safetensors.index.json:

In [1]: import json

In [2]: with open("model.safetensors.index.json") as f:
   ...:     index = json.load(f)

In [3]: index
Out[3]:
{'metadata': {'total_size': 47144806400},
 'weight_map': {'lm_head.weight': 'model-00010-of-00010.safetensors',
  'model.embed_tokens.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.input_layernorm.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.mlp.down_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.mlp.gate_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.mlp.up_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.post_attention_layernorm.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.k_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.o_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.q_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.0.self_attn.v_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.1.input_layernorm.weight': 'model-00001-of-00010.safetensors',
  'model.layers.1.mlp.down_proj.weight': 'model-00001-of-00010.safetensors',
  'model.layers.1.mlp.gate_proj.weight': 'model-00001-of-00010.safetensors',
  ...,
  'model.norm.weight': 'model-00010-of-00010.safetensors'}}
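
For reference, a minimal way to peek at the tensor names directly from one shard, without loading any weights (assuming the safetensors package is installed; the shard filename is taken from the index above):

from safetensors import safe_open

# List the keys stored in the last shard; per the index above it contains
# 'lm_head.weight' and 'model.norm.weight', i.e. Transformers-style names.
with safe_open("model-00010-of-00010.safetensors", framework="pt", device="cpu") as f:
    print(list(f.keys()))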

Then running:

> python /path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py /path/to/checkpoint .
Traceback (most recent call last):
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 282, in <module>
    main()
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 277, in main
    convert_and_write_model(args.input_dir, args.output_dir, args.max_position_embeddings, args.modules_are_split)
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 205, in convert_and_write_model
    new_dict = convert_state_dict(original_state_dict, config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 87, in convert_state_dict
    new_key = map_old_key_to_new(old_key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 59, in map_old_key_to_new
    raise ValueError(f"Key: {old_key} could not be mapped (check the mapping).")
ValueError: Key: lm_head.weight could not be mapped (check the mapping).

Expected behavior

lm_head.weight should be handled by convert_mistral_weights_to_hf.py, i.e. covered by its STATE_DICT_MAPPING.

@jamesbraza jamesbraza added the bug label Mar 23, 2025
@Cyrilvallez (Member)

Well, you are trying to use the converter on weights that are already converted, so this is expected to fail 😁
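
Since that repo already ships the sharded Transformers-format weights, it can be loaded directly; a minimal sketch (the path below is a placeholder for your local checkpoint directory):

from transformers import AutoModelForCausalLM, AutoTokenizer

# The model-XXXXX-of-XXXXX.safetensors shards are already in the Transformers
# layout, so no conversion step is needed before loading.
model = AutoModelForCausalLM.from_pretrained("/path/to/checkpoint", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("/path/to/checkpoint")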

@jamesbraza (Contributor, Author)

Haha I see, and thanks for the response. Perhaps we should update that ValueError's wording to mention this possibility.

To be honest, I don't quite understand yet; I thought the purpose of this script was to convert the many individual model safetensors shards into a consolidated.safetensors. Am I misunderstanding? Here is what my checkpoint directory contains:

config.json
generation_config.json
model-00001-of-00010.safetensors
model-00002-of-00010.safetensors
model-00003-of-00010.safetensors
model-00004-of-00010.safetensors
model-00005-of-00010.safetensors
model-00006-of-00010.safetensors
model-00007-of-00010.safetensors
model-00008-of-00010.safetensors
model-00009-of-00010.safetensors
model-00010-of-00010.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
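
If I understand correctly, the converter instead expects the raw Mistral release layout (params.json plus consolidated safetensors weights) rather than the sharded Transformers layout above. A rough heuristic for telling the two apart (my assumption, not taken from the script):

from pathlib import Path

def looks_like_raw_mistral(checkpoint_dir: str) -> bool:
    # Raw Mistral releases ship params.json and consolidated*.safetensors,
    # whereas the Transformers layout ships config.json plus sharded
    # model-XXXXX-of-XXXXX.safetensors files and model.safetensors.index.json.
    d = Path(checkpoint_dir)
    return (d / "params.json").exists() and any(d.glob("consolidated*.safetensors"))

print(looks_like_raw_mistral("."))  # False for the directory listed above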
