size mismatch for lm_head when finetune QWEN2.5 #36550

Closed · minmie opened this issue Mar 5, 2025 · 8 comments

minmie commented Mar 5, 2025

System Info

  • transformers version: 4.49.0
  • Platform: Linux-6.6.0-72.0.0.64.oe2403.x86_64-x86_64-with-glibc2.38
  • Python version: 3.10.16
  • Huggingface_hub version: 0.29.1
  • Safetensors version: 0.5.3
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA L40

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I fine-tune Qwen2.5 using the following code:

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

dataset = load_dataset("trl-lib/Capybara", split="train")
dataset = dataset.select(range(500))
MODEL_ID = 'Qwen/Qwen2.5-0.5B'
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    modules_to_save=["lm_head", "embed_token"],
    task_type="CAUSAL_LM",
)
args = SFTConfig(
    output_dir="Qwen2.5-0.5B-SFT-Capybara",  # directory to save and repository id
    num_train_epochs=1,  # number of training epochs
    per_device_train_batch_size=4,  # batch size per device during training
    gradient_accumulation_steps=4,  # number of steps before performing a backward/update pass
    gradient_checkpointing=True,  # use gradient checkpointing to save memory
    optim="adamw_torch_fused",  # use fused adamw optimizer
    logging_steps=10,  # log every 10 steps
    save_strategy="epoch",  # save checkpoint every epoch
    bf16=True,  # use bfloat16 precision
    tf32=True,  # use tf32 precision
    learning_rate=2e-4,  # learning rate, based on QLoRA paper
    max_grad_norm=0.3,  # max gradient norm based on QLoRA paper
    warmup_ratio=0.03,  # warmup ratio based on QLoRA paper
    lr_scheduler_type="constant",  # use constant learning rate scheduler
    push_to_hub=False,  # push model to hub
    # report_to="tensorboard",  # report metrics to tensorboard
)

trainer = SFTTrainer(
    MODEL_ID,
    train_dataset=dataset,
    args=args,
    peft_config=peft_config
)

trainer.train()
print('end')

and I use the following code for inference:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

peft_model_id = "/home/chenjq/pythonWork/nlp/Qwen2.5-0.5B-SFT-Capybara/checkpoint-31"
# peft_model_id = args.output_dir
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Load Model with PEFT adapter
model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    device_map="auto",
    torch_dtype=torch.float16
)

prompt = "3的5倍是多少"
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=200
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
print(1)

An error occurs when loading the model with AutoPeftModelForCausalLM:


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Traceback (most recent call last):
  File "/home/chenjq/.pycharm_helpers/pydev/pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/chenjq/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/chenjq/pythonWork/nlp/test14.py", line 11, in <module>
    model = AutoPeftModelForCausalLM.from_pretrained(
  File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/auto.py", line 130, in from_pretrained
    return cls._target_peft_class.from_pretrained(
  File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/peft_model.py", line 581, in from_pretrained
    load_result = model.load_adapter(
  File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/peft_model.py", line 1239, in load_adapter
    load_result = set_peft_model_state_dict(
  File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 451, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 896]) from checkpoint, the shape in current model is torch.Size([151665, 896]).

Process finished with exit code 1

Expected behavior

Expected the model to predict normally.

@minmie minmie added the bug label Mar 5, 2025
@Rocketknight1 (Member)

Hi @minmie, it's possible there's a bug here, but can you simplify the reproducer code a little bit? For example, does making a PEFT adapter for a Qwen result in the bug, or do you need Trainer as well?
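
For reference, a Trainer-free reproducer might look roughly like the sketch below. This is a sketch, not code from this thread: the output directory name is made up, and it is an assumption that saving the tokenizer next to the adapter is enough to trigger the same embedding-resize path on reload.

# Sketch of a hypothetical Trainer-free reproducer: create a LoRA adapter with
# modules_to_save=["lm_head"], save it together with the tokenizer, then
# reload it through AutoPeftModelForCausalLM.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, AutoPeftModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    modules_to_save=["lm_head"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, peft_config)

# "qwen2.5-lora-repro" is an arbitrary local path used for illustration.
peft_model.save_pretrained("qwen2.5-lora-repro")
tokenizer.save_pretrained("qwen2.5-lora-repro")

# Reload the adapter; if the tokenizer-driven embedding resize is involved,
# the size mismatch should surface here rather than in the training code.
reloaded = AutoPeftModelForCausalLM.from_pretrained("qwen2.5-lora-repro")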

minmie (Author) commented Mar 7, 2025

Hi @Rocketknight1, thank you for your help. Sorry, I don't know how to simplify the reproducer code: the error is caused directly by loading the adapter with AutoPeftModelForCausalLM. If I use the following code to load the model, everything works fine.

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model_name ='/home/models/qwen/Qwen2.5-0.5B'
adapter_model_name = "/home/chenjq/pythonWork/nlp/Qwen2.5-0.5B-SFT-Capybara/checkpoint-31"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

You can get the adapter by running the training code on a small dataset, and then load the adapter to reproduce this error.

@Rocketknight1 (Member)

cc @BenjaminBossan @sayakpaul in that case!

@sayakpaul (Member)

@minmie I think you could further minimize the training procedure. Does it happen during loading or training? In any case, if you suspect the bug to be stemming from peft, please open an issue there.

@BenjaminBossan (Member)

It looks like the vocab / embedding size was extended. Not sure if trl may do this under the hood. When loading the model, we thus need to ensure that the vocab size is again extended to the same size it had before. The code would be something like:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-0.5B')
base_model.resize_token_embeddings(151936)  # 151936 = lm_head rows in the saved checkpoint (see traceback)
model = PeftModel.from_pretrained(base_model, peft_model_id)

ManukyanD (Contributor) commented Mar 8, 2025

Hi everyone! I did some research and found out that the error occurs because len(tokenizer) (151665) and the embedding size (151936) of Qwen/Qwen2.5-0.5B do not match. _BaseAutoPeftModel.from_pretrained resizes the base model embeddings to match the tokenizer (here) and as a result, it is unable to load the saved weights. I think a possible solution might be to only resize the base model embeddings if the tokenizer size differs from the base tokenizer size. What do you think?
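
A minimal check of the mismatch described above (a sketch; the two numbers in the comments are the ones quoted in this thread for the public Qwen/Qwen2.5-0.5B checkpoint):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

print(len(tokenizer))                                # 151665: tokens defined by the tokenizer
print(model.get_input_embeddings().weight.shape[0])  # 151936: rows in the (padded) embedding / lm_head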

@leosongwei

My reproduction on transformers==4.49.0:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, AutoConfig
from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM
import torch

model_dir = "./Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_dir)
model: Qwen2ForCausalLM = AutoModelForCausalLM.from_pretrained(
    model_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True
)

which prints the warning: Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.

After downgrading to 4.48.0, the warning disappeared.

minmie (Author) commented Mar 10, 2025

@minmie I think you could further minimize the training procedure. Does it happen during loading or training? In any case, if you suspect the bug to be stemming from peft, please open an issue there.

Hi @sayakpaul, thanks for your help. This error occurs during loading. I have opened an issue in peft.

@minmie minmie closed this as completed Mar 10, 2025